perl string comparison

  • lilly07
    New Member
    • Jul 2008
    • 89

    perl string comparison

    Hi,
    I have one file containing a single column of strings and a second file with 5 columns per line. My goal is to check whether each item (line) of the first file appears in the 3rd column of the second file.
    I tried the following logic. It may be a roundabout way, but as a beginner this is what I came up with.

    The column in the first file holds data such as:
    Code:
    NS008_456_R0030_3008
    The 2nd file data is as follows:

    Code:
     
    +   test   NS008_456_R0030_3008   67   223
    My logic is as follows:

    I read the first file into an array. For each item I open the second file, scan through each line, and check whether the array element is equal to $v[2] of that line. The logic seems to work, although the search takes time.

    But I considered
    Code:
    NS008_456_R0030_3008
    as a string literal, and my if statement is as below:

    Code:
     
    if($rawdata[0] eq $v[2]) {
        # do something here
    }
    But it does not seem to work. Is there anything wrong with treating the data as a string literal? Or, when I read the file contents into an array, is some further manipulation needed before the string comparison? Please let me know. Regards
  • numberwhun
    Recognized Expert Moderator Specialist
    • May 2007
    • 3467

    #2
    Can you please post the rest of your code so that we can see how you got to this point? That will give us a better understanding all around.

    Regards,

    Jeff

    • KevinADC
      Recognized Expert Specialist
      • Jan 2007
      • 4092

      #3
      If you compare the strings using "eq" they must be an exact match, including spaces, control characters, and upper/lower case of any alpha characters. My guess is that you need to chomp() the records in the first file before comparing them to records in the second file, but why make us guess? Post the code. ;)
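
      To illustrate the point about exact matches, here is a minimal sketch. The variable names and the helper `same` are made up for the example, and the string simulates a line read from a file with its newline still attached.

```perl
use strict;
use warnings;

# Simulate a line as read from a file: it still carries its trailing newline.
my $from_file = "NS008_456_R0030_3008\n";
my $field     = "NS008_456_R0030_3008";

# Hypothetical helper: returns 1 if the two strings are exactly equal, else 0.
sub same { my ($x, $y) = @_; return $x eq $y ? 1 : 0 }

my $before = same($from_file, $field);   # the newline makes them differ
chomp $from_file;                        # strip the trailing newline in place
my $after  = same($from_file, $field);   # now the strings are identical

print "before chomp: $before, after chomp: $after\n";
```

      The comparison only succeeds once the invisible trailing newline is gone, which is why "eq" can appear to fail on strings that look the same when printed.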

      • lilly07
        New Member
        • Jul 2008
        • 89

        #4
        Thanks, Kevin. I tried chomping the data from the first file but it is still not working. Please find my code below. It does not produce any compilation errors. Note that I retyped it here rather than copying it.

        Code:
         
        #!/usr/bin/perl
        $first_data = "first.txt";
        open(DAT,$first_data) || die("Could not open file!");
        @search_data = <DAT>;
        $searchSize = scalar( @search_data);
        
        $second_file = "second.txt";
        for($count=0; $count < $searchSize; $count++) {
         open (RF, $second_file) || die("Could not open file!");
         
         $find_raw = @search_data[$count];
         $find = chomp $find_raw;
         
          while($line=<RF>) {
           chomp $line;
           @v = split(/\s+/,$line);
           
            if($v[2] eq $find){
             print "$line \n";
            }
          }
         
         close RF;
        }

        • lilly07
          New Member
          • Jul 2008
          • 89

          #5
          Actually the program works if I modify the following code
          Code:
          $find_raw = @search_data[$count];
          $find = chomp $find_raw;
          as below:

          Code:
          $find_raw = @search_data[$count]; 
          chomp $find_raw;
          Is there a trick or a shorter way to do this kind of search? It takes a long time. Thanks.

          • KevinADC
            Recognized Expert Specialist
            • Jan 2007
            • 4092

            #6
            chomp() modifies its argument in place and returns the number of characters it removed, not the chomped string. So in your case $find was probably either 0 or 1.

            This line:

            $find_raw = @search_data[$count];

            should be:

            $find_raw = $search_data[$count];

            Writing @search_data[$count] creates a one-element array slice, and Perl warns about it when warnings are enabled. Use $ for a single array element and @ for multiple array elements.
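
            A small sketch of both points, using made-up sample values in place of the file contents: chomp returns a count while editing its argument in place, and a single element is read with the $ sigil.

```perl
use strict;
use warnings;

# Sample data standing in for lines read from first.txt.
my @search_data = ("NS008_456_R0030_3008\n", "NS008_456_R0030_3678\n");

my $find_raw = $search_data[0];   # $ sigil for a single element
my $removed  = chomp $find_raw;   # chomp edits $find_raw and returns a count

print "removed=$removed value=[$find_raw]\n";

# chomp on an array strips every element and returns the total removed.
my $total = chomp @search_data;
print "total=$total\n";
```

            Comparing against $removed instead of $find_raw is the same mistake as the original `$find = chomp $find_raw;` line: the count is compared, not the string.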

            • KevinADC
              Recognized Expert Specialist
              • Jan 2007
              • 4092

              #7
              If neither file is too big you can do something like this:

              Code:
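              #!/usr/bin/perl
              use strict;
              use warnings;
              
              # Read every search string from the first file and strip newlines.
              my $first_data = "first.txt";
              open(DAT,$first_data) or die "Could not open file: $!";
              my @search_data = <DAT>;
              close DAT;
              chomp @search_data;
              
              my $second_file = "second.txt";
              open (RF, $second_file) or die "Could not open file: $!";
              while(my $line = <RF>) {
                 chomp $line;
                 # Extract the third column once per line, not once per search string.
                 my $v = (split(/\s+/,$line))[2];
                 next unless defined $v;   # skip lines with fewer than three columns
                 foreach my $find (@search_data) {
                    if ($v eq $find){
                       print "Found '$find' in second.txt at line number $. : [$line] \n";
                       last;   # stop scanning the array; move on to the next line
                    }
                 }
              }
              close RF;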

              • lilly07
                New Member
                • Jul 2008
                • 89

                #8
                Yes Kevin, you are right. After the chomp assignment the value was 1, which is why the change I made fixed it.

                My objective is to find every entry from the first file that appears in the second file and print all of them, so
                Code:
                 last;
                may not work in my case. I was just wondering whether I am going about this in a roundabout way. Thanks again.
                Cheers

                • KevinADC
                  Recognized Expert Specialist
                  • Jan 2007
                  • 4092

                  #9
                  Try the code I posted. "last" ends the "foreach" loop after an element in the array is found in the current line. The code then goes to the next line in the file and searches the entire array again. This entire process could probably be sped up considerably using a hash and/or the Memoize module.

                  Memoize - perldoc.perl.org
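
                  As a rough sketch of the hash idea (not tested against the real files): load the first file's strings into a hash once, then make a single pass over the second file with one lookup per line. The in-memory strings below are stand-ins for first.txt and second.txt, and the names %wanted and @matches are invented for the example.

```perl
use strict;
use warnings;

# Stand-ins for first.txt and second.txt, using sample data from this thread.
my $first_txt  = "NS008_456_R0030_3008\nNS008_456_R0030_3690\n";
my $second_txt = join "\n",
    "+   test   NS008_456_R0030_3008   67   223",
    "+   ghi    NS008_456_R0030_3678   17   678",
    "+   ghi    NS008_456_R0030_3690   17   280",
    "+   ghi    NS008_456_R0030_3690   15   267",
    "";

# Load every search string into a hash once.
open(my $dat, '<', \$first_txt) or die "Could not open first data: $!";
chomp(my @search_data = <$dat>);
close $dat;
my %wanted = map { $_ => 1 } @search_data;

# Single pass over the second file: one hash lookup per line.
my @matches;
open(my $rf, '<', \$second_txt) or die "Could not open second data: $!";
while (my $line = <$rf>) {
    chomp $line;
    my $key = (split /\s+/, $line)[2];
    next unless defined $key;              # skip short or empty lines
    push @matches, $line if exists $wanted{$key};
}
close $rf;

print "$_\n" for @matches;
```

                  With real files, the two scalar references would be replaced by the file names. The second file is read only once, so the cost no longer grows with the number of search strings multiplied by the number of lines.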

                  • lilly07
                    New Member
                    • Jul 2008
                    • 89

                    #10
                    Hi Kevin, Thanks for your help.

                    Basically my data file (the second file) looks as follows:

                    Code:
                     
                    +   test   NS008_456_R0030_3008   67   223 
                    +   ghi    NS008_456_R0030_3678   17   678
                    +   ggl    NS008_456_R0030_3678   17   270
                    +   ghi    NS008_456_R0030_3672   17   209
                    +   ghi    NS008_456_R0030_3690   17   280
                    +   ghi    NS008_456_R0030_3690   15   267
                    My objective is to find the records that have multiple entries in the 3rd column. For example, in the above case,
                    Code:
                    +   ghi    NS008_456_R0030_3678   17   678
                    +   ggl    NS008_456_R0030_3678   17   270
                    and
                    Code:
                    +   ghi    NS008_456_R0030_3690   17   280
                    +   ghi    NS008_456_R0030_3690   15   267
                    are the candidate records I am interested in.

                    And my logic is as follows:
                    1. I added the 3rd and 4th columns to a hash and checked all the values in the hash. If the value for a key is more than 1, I treat it as a record with multiple entries and store the 3rd column
                    Code:
                    NS008_456_R0030_3690
                    in a file ($first_file). Then I search for those records in the second file as I explained before. But this takes an enormous amount of time because the file is huge and the search is therefore extensive.
                    Is there any way to pick these up from the second file directly? I need the records that show multiple entries in the 3rd column. Please let me know.
                    I also tried your code, but the script is still running, so I thought I should explain the whole picture.
                    Thanks.
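
                    One possible way to pick the repeated records straight from the second file, offered as a sketch only: group lines by their 3rd column in a hash of arrays, then keep the groups with more than one line. It assumes the whole file fits in memory, and the names %lines_for, @order and @duplicates are invented for the example. The in-memory string stands in for second.txt.

```perl
use strict;
use warnings;

# Stand-in for second.txt, using the sample records above.
my $second_txt = join "\n",
    "+   test   NS008_456_R0030_3008   67   223",
    "+   ghi    NS008_456_R0030_3678   17   678",
    "+   ggl    NS008_456_R0030_3678   17   270",
    "+   ghi    NS008_456_R0030_3672   17   209",
    "+   ghi    NS008_456_R0030_3690   17   280",
    "+   ghi    NS008_456_R0030_3690   15   267",
    "";

# Group every line under its 3rd-column value, remembering first-seen order.
my (%lines_for, @order);
open(my $rf, '<', \$second_txt) or die "Could not open data: $!";
while (my $line = <$rf>) {
    chomp $line;
    my $key = (split /\s+/, $line)[2];
    next unless defined $key;                          # skip short or empty lines
    push @order, $key unless exists $lines_for{$key};  # record each key once
    push @{ $lines_for{$key} }, $line;
}
close $rf;

# Keep only the groups that contain more than one record.
my @duplicates = map  { @{ $lines_for{$_} } }
                 grep { @{ $lines_for{$_} } > 1 } @order;

print "$_\n" for @duplicates;
```

                    This reads the file once and needs no intermediate file of keys. If the file is too large to hold in memory, a variant could count keys in a first pass and print matching lines in a second pass.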

                    • lilly07
                      New Member
                      • Jul 2008
                      • 89

                      #11
                      I would like to know whether any shell script would do?
