Compare Two csv files using perl

  • Vasuki Masilamani
    New Member
    • Dec 2006
    • 18

    Compare Two csv files using perl

    Hi,

    Can anyone help me write a Perl script that compares two CSV files and picks out the records that differ?

    Any responses would be appreciated.

    Thanks,
    Vasuki
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    Post your current code and someone will probably help.


    • Vasuki Masilamani
      New Member
      • Dec 2006
      • 18

      #3
      I tried it myself and got a complete script working. It runs fine now. Please find the script below.

      [CODE=perl]
      use strict;
      use warnings;

      # Input files to compare and the output file for unmatched records.
      my $f1      = 'C:\Vasuki\chm_dirx_bud_28.csv';
      my $f2      = 'C:\Vasuki\chm_dirx_bud_29.csv';
      my $outfile = 'C:\Vasuki\chm_dirx_bud.csv';

      open my $fh1, '<', $f1 or die "Could not open $f1: $!\n";
      open my $fh2, '<', $f2 or die "Could not open $f2: $!\n";

      my @outlines;

      # For every line of file 1, rescan file 2 from the top for an identical line.
      while (my $outer_text = <$fh1>) {
          my $found = 0;

          seek($fh2, 0, 0) or die "Could not rewind $f2: $!\n";

          while (my $inner_text = <$fh2>) {
              if ($outer_text eq $inner_text) {
                  $found = 1;
                  print "Match Found \n";
                  last;
              }
          }

          # Keep the lines of file 1 that never appear in file 2.
          if (!$found) {
              print "No Match Found \n";
              push @outlines, $outer_text;
          }
      }

      open my $out, '>', $outfile or die "Cannot open $outfile for writing: $!\n";
      print {$out} @outlines;
      close $out or die "Could not close $outfile: $!\n";

      close $fh1;
      close $fh2;
      [/CODE]

      This script runs very slowly when there are a large number of records. Can anyone suggest ways to speed it up? Thanks in advance.
      Last edited by miller; May 17 '07, 05:54 PM. Reason: Code Tag and ReFormatting


      • miller
        Recognized Expert Top Contributor
        • Oct 2006
        • 1086

        #4
        Well, of course it's slow. You're scanning through a large portion of file2 for every line in file1, which means your execution time is proportional to the product of the two file sizes (roughly the square of the size when the files are similar in length).

        Ignoring your current algorithm for now, though, I would suggest that you look into a CPAN module to do this for you.

        cpan Text::Diff


        The fact that your files are CSV files is irrelevant to what you're trying to do, so just treat this as a plain file comparison. I don't know what type of output this module provides, but I'm almost certain it can be adapted to achieve the results you desire.

        - Miller


        • KevinADC
          Recognized Expert Specialist
          • Jan 2007
          • 4092

          #5
          If the file isn't too large, I would try reading the first file into a hash and then incrementing the hash entries while reading the second file. I think Text::Diff might be overkill if it's just a simple comparison of matching lines between the two files. Text::Diff also has the unfortunate behavior of slurping all files into memory, which may or may not be a problem.
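A minimal sketch of the hash idea above, assuming whole lines must match exactly (as in the script from post #3) and that the first file fits in memory. The sub name `unmatched_lines` and the throwaway demo files are hypothetical, not something from the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the lines of $file1 that have no identical line in $file2.
# (Hypothetical helper; the name and arguments are assumptions.)
sub unmatched_lines {
    my ($file1, $file2) = @_;

    # Pass 1: remember every line of the first file, in order, with a zero count.
    my (%count, @order);
    open my $fh1, '<', $file1 or die "Could not open $file1: $!\n";
    while (my $line = <$fh1>) {
        push @order, $line;
        $count{$line} = 0;
    }
    close $fh1;

    # Pass 2: bump the count for each line of the second file also seen in the first.
    open my $fh2, '<', $file2 or die "Could not open $file2: $!\n";
    while (my $line = <$fh2>) {
        $count{$line}++ if exists $count{$line};
    }
    close $fh2;

    # Lines whose count is still zero never appeared in the second file.
    return grep { $count{$_} == 0 } @order;
}

# Tiny demo with two temporary files (contents are made up for illustration).
my ($tmp1, $name1) = tempfile(UNLINK => 1);
my ($tmp2, $name2) = tempfile(UNLINK => 1);
print {$tmp1} "a,1\n", "b,2\n", "c,3\n";
print {$tmp2} "c,3\n", "a,1\n", "d,4\n";
close $tmp1;
close $tmp2;

print unmatched_lines($name1, $name2);    # lines of file 1 missing from file 2
```

With the demo data, only the `b,2` line is printed. Each file is read once, so the work grows roughly with the total number of lines instead of with their product, at the cost of holding the first file in memory.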


          • AdrianH
            Recognized Expert Top Contributor
            • Feb 2007
            • 1251

            #6
            Originally posted by KevinADC
            If the file isn't too large, I would try reading the first file into a hash and then incrementing the hash entries while reading the second file. I think Text::Diff might be overkill if it's just a simple comparison of matching lines between the two files. Text::Diff also has the unfortunate behavior of slurping all files into memory, which may or may not be a problem.
            The easiest way is to use something that is already made.

            Try using diff. It is a Unix utility designed for this sort of work.

            Of course, it will not work if the records are not in the same order, in which case you would have to go back to Perl.


            Adrian


            • AdrianH
              Recognized Expert Top Contributor
              • Feb 2007
              • 1251

              #7
              Originally posted by AdrianH
              The easiest way is to use something that is already made.

              Try using diff. It is a Unix utility designed for this sort of work.

              Of course, it will not work if the records are not in the same order, in which case you would have to go back to Perl.


              Adrian
              Rethinking this, if the key is at the beginning of each line, you could sort both files and then use diff.


              Adrian
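For anyone who ends up staying in Perl, the sort-then-compare idea can be sketched roughly as below. This is not the diff utility itself, just a lockstep walk over two sorted lists, assuming whole-line string comparison. The sub name `sorted_diff` and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Sort both sets of lines, then walk them in lockstep.
# Returns (\@only_in_first, \@only_in_second).
# (Hypothetical helper; the name and arguments are assumptions.)
sub sorted_diff {
    my ($lines1, $lines2) = @_;
    my @left  = sort @$lines1;
    my @right = sort @$lines2;
    my (@only_left, @only_right);
    my ($i, $j) = (0, 0);

    while ($i < @left && $j < @right) {
        my $cmp = $left[$i] cmp $right[$j];
        if    ($cmp == 0) { $i++; $j++ }                        # same line in both
        elsif ($cmp < 0)  { push @only_left,  $left[$i++] }     # missing from second
        else              { push @only_right, $right[$j++] }    # missing from first
    }

    # Whatever is left over on either side has no partner.
    push @only_left,  @left[$i .. $#left];
    push @only_right, @right[$j .. $#right];
    return (\@only_left, \@only_right);
}

# Made-up records, deliberately in a different order in each list.
my ($only_first, $only_second) = sorted_diff(
    [ "b,2\n", "a,1\n", "c,3\n" ],
    [ "c,3\n", "a,1\n", "d,4\n" ],
);
print "Only in first:  ", @$only_first;
print "Only in second: ", @$only_second;
```

With the sample records this reports `b,2` as only in the first list and `d,4` as only in the second. Sorting dominates the cost, so it should scale much better than rescanning one file for every line of the other.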


              • KevinADC
                Recognized Expert Specialist
                • Jan 2007
                • 4092

                #8
                Why are you assuming Unix? Looks like Windows to me.

                $f1 = 'C:\Vasuki\chm_dirx_bud_28.csv';


                • AdrianH
                  Recognized Expert Top Contributor
                  • Feb 2007
                  • 1251

                  #9
                  Originally posted by KevinADC
                  Why are you assuming Unix? Looks like Windows to me.

                  $f1 = 'C:\Vasuki\chm_dirx_bud_28.csv';
                  I'm not assuming Unix. There are GNU ports of Unix utilities all over the place.


                  Adrian


                  • KevinADC
                    Recognized Expert Specialist
                    • Jan 2007
                    • 4092

                    #10
                    True enough

                    (filler for message too short)


                    • ghostdog74
                      Recognized Expert Contributor
                      • Apr 2006
                      • 511

                      #11
                      You could try memory mapping the files.


                      • ad4x2l
                        New Member
                        • Sep 2007
                        • 1

                        #12
                        csvdiff, a GPL Perl tool
