parsing text

  • idorjee
    New Member
    • Mar 2007
    • 76


    Hi,
    how can I get the second ID in the 4th column (i.e. "NP_047184.1") if the ID above it in the same column ("AAD12597.1") is what I have as my query? As you can see, rows with the same number in the second column are grouped together, which means they belong to the same entry, but each row has a different ID in the 4th column. I have one of the IDs; how can I get the other one?
    Thanks a lot.

    Here is what my input file (testfile) looks like (whitespace-delimited):
    Code:
    9	1246500		-	         AAD12597.1		3282737
    9	1246500	Provisional	    NP_047184.1	     10954455
    9	1246501		-	         AAD12599.1		3282739
    9	1246501	Provisional	    NP_047186.1	     10954457
    Code:
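    use strict;
    use warnings;
    my $infile = './testfile';
    my $search = 'AAD12597.1';    # the ID I already have
    open(my $fh, '<', $infile) or die "Cannot open $infile: $!";
    while (<$fh>) {
        # skip the line that holds my query ID
        if (/^\d+\s+\d+\s+-\s+\Q$search\E\s/) {
            next;
        }
        # print the 4th-column ID of the first line whose 3rd column is a word
        if (/^\d+\s+\d+\s+\w+\s+(\S+)/) {
            my $id = $1;
            print "$id\n";
            exit;
        }
    }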
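Since rows that belong together share the number in column 2, one option is to look the answer up by group. The following is a minimal sketch, assuming every data row has at least four whitespace-separated fields and that a group holds at most one Provisional row (only the first one seen is kept). It reads the whole file, remembers which group the query ID belongs to, and records each group's Provisional ID. The sub name `provisional_id_for` is hypothetical, and an in-memory filehandle stands in for `./testfile`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 4th-column ID of the "Provisional" row in
# the same column-2 group as the row whose 4th-column ID equals $query.
# Returns undef if the query ID or a matching Provisional row is missing.
sub provisional_id_for {
    my ($fh, $query) = @_;
    my %provisional;    # group number => first Provisional ID seen
    my $query_group;    # group number of the row holding $query
    while (my $line = <$fh>) {
        my @f = split ' ', $line;    # split on runs of whitespace (tabs or spaces)
        next unless @f >= 4;         # skip blank or short lines
        my ($group, $status, $id) = @f[1, 2, 3];
        $query_group = $group if $id eq $query;
        $provisional{$group} //= $id if $status eq 'Provisional';
    }
    return unless defined $query_group;
    return $provisional{$query_group};
}

my $data = <<'END';
9   1246500        -            AAD12597.1        3282737
9   1246500    Provisional     NP_047184.1       10954455
9   1246501        -            AAD12599.1        3282739
9   1246501    Provisional     NP_047186.1       10954457
END

open(my $fh, '<', \$data) or die "cannot open in-memory data: $!";
my $id = provisional_id_for($fh, 'AAD12597.1');
print defined $id ? "$id\n" : "not found\n";    # prints NP_047184.1
close($fh);
```

With the real file, open it with `open(my $fh, '<', './testfile')` instead. Because the whole file is read before the lookup, this also copes with a Provisional row that appears before the query row within its group.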
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    One possible way:

    Code:
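    use strict;
    use warnings;
    my $infile = './testfile';
    my $search = 'AAD12597.1';
    open(my $fh, '<', $infile) or die "Cannot open $infile: $!";
    while (<$fh>) {
       if ( /^\d+\s+\d+\s+-\s+\Q$search\E\s/ ) {
          # the wanted line is expected to be the very next one
          my $next_line = <$fh>;
          if ( defined $next_line && $next_line =~ /^\d+\s+\d+\s+\w+\s+(\S+)/ ) {
             my $id = $1;    # 4th column, e.g. NP_047184.1
             print "$id\n";
             close($fh);
             exit;
          }
       }
    }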


    • idorjee
      New Member
      • Mar 2007
      • 76

      #3
      Thank you so much, Kevin. That works fine when the data I want is on the immediately following line, but what do I do when the data I'm interested in is on the 3rd line, as in the example below:
      Code:
      9   1246501        -            AAD12597.1        3282737
      9   1246501        -            AAD12599.1        3282739
      9   1246501    Provisional     NP_047184.1       10954455
      I'm sorry, I should have told you that I'm looking for the data in the 4th column (NP_047184.1) of whichever line contains the word "Provisional", given that I already have the first ID in the same column (AAD12597.1).

      Cheers!


      • KevinADC
        Recognized Expert Specialist
        • Jan 2007
        • 4092

        #4
        It's difficult to hit a moving target. Are you sure the only other requirement is that the search term is found and then, x lines later, the word "Provisional" marks the line you want to extract data from? Or will the target move again?


        • idorjee
          New Member
          • Mar 2007
          • 76

          #5
          Hi Kevin,
          Yes, the word "Provisional" could be on the line immediately after the one where the query term (search term) is found, or it could be x lines later. So is this something I can't possibly do?
          Help, please, if you can. Thanks a lot.


          • nithinpes
            Recognized Expert Contributor
            • Dec 2007
            • 410

            #6
            From your initial code, where you call exit after printing the desired number, I assume you want only the first occurrence and don't want to continue searching. This code works whether the 'Provisional' line comes immediately after your search string's line or 'n' lines after it.

            Code:
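            use strict;
            use warnings;
            my $infile = 'testfile.txt';
            my $search = 'AAD12597.1';
            open(my $fh, '<', $infile) or die "failed to open $infile: $!";
            while (<$fh>) {
               if ( /^\d+\s+\d+\s+-\s+\Q$search\E\s/ ) {
                  # read ahead until a Provisional line or end of file
                  my $next_line = <$fh>;
                  until ( !defined $next_line
                          or $next_line =~ /^\d+\s+\d+\s+Provisional\s/ ) {
                     $next_line = <$fh>;
                  }
                  if ( defined $next_line
                       and $next_line =~ /^\d+\s+\d+\s+Provisional\s+(\S+)/ ) {
                     my $id = $1;    # 4th column, e.g. NP_047184.1
                     print "$id\n";
                     close($fh);
                     exit;
                  }
               }
            }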


            • KevinADC
              Recognized Expert Specialist
              • Jan 2007
              • 4092

              #7
              Originally posted by idorjee
              Hi Kevin,
              Yes, the word "Provisional" could be on the line immediately after the one where the query term (search term) is found, or it could be x lines later. So is this something I can't possibly do?
              Help, please, if you can. Thanks a lot.
              It is easily possible, but is there always an occurrence of "Provisional" within the set of lines you are searching? If so, nithinpes's code looks like it will work.
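One way to make "the set of lines you are searching" explicit is to bound the scan by the column-2 group number. This is a minimal sketch, assuming the rows of a group are contiguous and the Provisional row comes after the query row, as in the examples in this thread; `provisional_in_group` is a hypothetical name. Instead of reading on into the next group, it returns undef as soon as the group (or the file) ends without a Provisional row.

```perl
use strict;
use warnings;

# Hypothetical helper: after the row holding $query, scan forward only while
# rows stay in the same column-2 group. Returns that group's Provisional ID,
# or undef if the group (or the file) ends without one.
sub provisional_in_group {
    my ($fh, $query) = @_;
    my $group;    # set once the query row has been seen
    while (my $line = <$fh>) {
        my ($grp, $status, $id) = (split ' ', $line)[1, 2, 3];
        next unless defined $id;          # skip blank or short lines
        if (!defined $group) {
            $group = $grp if $id eq $query;
            next;
        }
        return if $grp ne $group;         # left the group: no Provisional row
        return $id if $status eq 'Provisional';
    }
    return;                               # end of file reached
}

my $data = <<'END';
9   1246501        -            AAD12597.1        3282737
9   1246501        -            AAD12599.1        3282739
9   1246501    Provisional     NP_047184.1       10954455
END

open(my $fh, '<', \$data) or die "cannot open in-memory data: $!";
my $id = provisional_in_group($fh, 'AAD12597.1');
print defined $id ? "$id\n" : "no Provisional row in this group\n";    # prints NP_047184.1
close($fh);
```

Stopping at the group boundary means a missing Provisional row is reported as "not found" rather than silently picking up the next group's ID.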


              • idorjee
                New Member
                • Mar 2007
                • 76

                #8
                Thank you so much to both of you for your help. My script works fine now.
                Cheers!
                ^ ^*
