Need help extracting info from an input line using a regular expression

  • manishabh77
    New Member
    • Apr 2008
    • 3

    I want to extract some info from the following input line using a Perl regular expression. I would appreciate any help in doing so.

    input line:
    hg19_ensGene_ENST00000237247 range=chr1:67208779-67210057 5'pad=0 3'pad=0 strand=+ repeatMasking=none

    info to be extracted:
    chr1:67208779-67210057:+

    The Perl code I have so far, which works for the range, is:
    Code:
    while (<LOC>) {
        chomp;
        if (/^>(\w+)\s\w+\=(chr\w+)\:(\d+)\-(\d+)/) {
            $loc{$1} = "$2:$3:$4:$5";
            print $loc{$1}."\n";
        }
    }
    It extracts the following info: chr1:67208779-67210057
    However, I am unable to extract the + from the input line above.
    Last edited by numberwhun; Mar 19 '10, 12:23 PM. Reason: Please use code tags!
  • numberwhun
    Recognized Expert Moderator Specialist
    • May 2007
    • 3467

    #2
    You aren't extracting the "+" because your regex never asks for it. The "+" comes right after "strand=", and nothing in your pattern matches that part of the line. You may want to rework your regex a bit.
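
    As a rough illustration of that rework, here is a minimal sketch that captures the range and then the character after "strand=". The helper name extract_location is hypothetical (it is not from the original post), and the sketch assumes the strand is always a single "+" or "-" and that "range=" appears before "strand=" on the line. It also ignores the leading ">" and the ID capture from the original pattern.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the thread): returns "chr:start-end:strand"
    # for one input line, or undef if the line does not match.
    sub extract_location {
        my ($line) = @_;
        # range=<chr>:<start>-<end>, any text in between, then strand=<+ or ->
        if ($line =~ /\brange=(chr\w+):(\d+)-(\d+)\b.*?\bstrand=([+-])/) {
            return "$1:$2-$3:$4";
        }
        return;    # no match
    }

    my $line = "hg19_ensGene_ENST00000237247 range=chr1:67208779-67210057 "
             . "5'pad=0 3'pad=0 strand=+ repeatMasking=none";
    my $loc = extract_location($line);
    print "$loc\n" if defined $loc;    # prints chr1:67208779-67210057:+
    ```

    The same two extra pieces (a ".*?" to skip the padding fields and a "strand=([+-])" capture) could be appended to the pattern in the original loop if the ID in $1 is still needed as the hash key.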

    Regards,

    Jeff


    • RonB
      Recognized Expert Contributor
      • Jun 2009
      • 589

      #3
      Code:
      #!/usr/bin/perl
      
      use strict;
      use warnings;
      use Data::Dumper;
      
      my $str = "hg19_ensGene_ENST00000237247 range=chr1:67208779-67210057 5'pad=0 3'pad=0 strand=+ repeatMasking=none";
      # Split on whitespace or "="; field 2 is the range value and field 8 is the strand value.
      my ($range, $strand) = (split /[\s=]/, $str)[2,8];
      print Dumper ($range, $strand);


      • murugaperumal
        New Member
        • Mar 2010
        • 3

        #4
        Code:
         
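        my $var="hg19_ensGene_ENST00000237247 range=chr1:67208779-67210057 5'pad=0 3'pad=0 strand=+ repeatMasking=none";
        # $1 is the range, $2 is the ":" separator, and $3 is the "+" strand.
        # Note: this pattern only matches chr1 and a "+" strand.
        if($var=~/.*(chr1(:)[0-9-]+).*(\+) .*/)
        {
            print "$1$2$3";
        }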
