Compare Two CSV Files Using Perl

**KevinADC** · May 17 '07, 05:04 AM

Post your current code and someone will probably help.

**Vasuki Masilamani** · May 17 '07, 06:36 AM

I tried and got the whole script working. It works fine now. Please find the script below.

[CODE=perl]
use strict;
use warnings;

my $f1      = 'C:\Vasuki\chm_dirx_bud_28.csv';
my $f2      = 'C:\Vasuki\chm_dirx_bud_29.csv';
my $outfile = 'C:\Vasuki\chm_dirx_bud.csv';

open my $in1, '<', $f1 or die "Could not open $f1: $!\n";
open my $in2, '<', $f2 or die "Could not open $f2: $!\n";

my @outlines;    # lines of file 1 that have no exact match in file 2

while (my $outer_text = <$in1>) {
    my $found = 0;

    # Rewind file 2 and rescan it for every line of file 1.
    seek($in2, 0, 0);

    while (my $inner_text = <$in2>) {
        if ($outer_text eq $inner_text) {
            $found = 1;
            print "Match Found\n";
            last;
        }
    }

    unless ($found) {
        print "No Match Found\n";
        push @outlines, $outer_text;
    }
}

close $in1;
close $in2;

open my $out, '>', $outfile or die "Cannot open $outfile for writing: $!\n";
print {$out} @outlines;
close $out;
[/CODE]

This script runs very slowly when there is a large number of records. Can anyone suggest some ideas to fine-tune it? Thanks in advance.

**miller** · May 17 '07, 06:05 PM

Well, of course it's slow. You're scanning through a large portion of file2 for every line in file1. This means that your execution time grows with the product of the two file sizes, which is roughly the square of the size of the files.

Ignoring your current algorithm for now, though, I would suggest that you look into a CPAN module to do this for you.

cpan Text::Diff

The fact that your files are CSV files is irrelevant to what you're trying to do, so just treat this as a plain file comparison. I don't know what type of output this module will provide, but I'm almost certain that it can be adapted to achieve the results you desire.

- Miller

**KevinADC** · May 17 '07, 08:44 PM

If the file isn't too large, I would try reading the first file into a hash and just increment the hash value for each matching line while reading the second file. I think Text::Diff might be overkill if it's just a simple comparison of matching lines between the two files. Text::Diff also has the unfortunate behavior of slurping all files into memory, which may or may not be a problem.
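The hash idea above could be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name `lines_missing_from_second` is made up for this example, lines are compared as exact strings (the same as the `eq` test in the original script), and the first file is small enough to hold in memory. Each file is read once, so the run time should grow roughly linearly with the file sizes rather than with their product.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the lines of $file1
# that never occur in $file2, in their original order.
sub lines_missing_from_second {
    my ($file1, $file2) = @_;

    # Pass 1: remember every line of file 1 with a match count of zero.
    my (%seen, @order);
    open my $in1, '<', $file1 or die "Could not open $file1: $!\n";
    while (my $line = <$in1>) {
        push @order, $line;
        $seen{$line} = 0;
    }
    close $in1;

    # Pass 2: bump the count for each line of file 2 that file 1 also has.
    open my $in2, '<', $file2 or die "Could not open $file2: $!\n";
    while (my $line = <$in2>) {
        $seen{$line}++ if exists $seen{$line};
    }
    close $in2;

    # Lines whose count is still zero were never matched.
    return grep { $seen{$_} == 0 } @order;
}
```

Printing the returned list to the output file should give the same unmatched lines as the nested-loop script, assuming both files fit the constraints above.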

**AdrianH** · May 18 '07, 05:54 PM

The easiest way is to use something that is already made.

Try using diff. It is a Unix utility and is designed for this sort of work.

Of course, it will not work if the records are not in the same order, in which case you would have to go back to Perl.

Adrian

**AdrianH** · May 18 '07, 06:00 PM

Rethinking this, if the key is at the beginning of each line, you could sort both files and then use diff.

Adrian
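For readers who would rather stay in Perl, the sort-first idea can be approximated with a sort followed by a lockstep walk over both lists. This is not what diff itself does; it is a swapped-in merge-style comparison, shown only as a sketch. The helper name `only_in_first_sorted` is hypothetical, lines are compared as whole strings, and the result comes back in sorted order rather than in the original file order.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): given two array refs of lines,
# sorts copies of both and walks them in step, returning the lines that
# appear only in the first list.
sub only_in_first_sorted {
    my ($lines1, $lines2) = @_;
    my @a = sort @$lines1;
    my @b = sort @$lines2;

    my @only;
    my ($i, $j) = (0, 0);
    while ($i < @a) {
        if ($j >= @b || $a[$i] lt $b[$j]) {
            # Nothing equal can still come up in @b, so $a[$i] is unmatched.
            push @only, $a[$i++];
        }
        elsif ($a[$i] gt $b[$j]) {
            $j++;    # @b is behind; advance it
        }
        else {
            $i++;    # equal: matched; keep $j so repeated lines in @a also match
        }
    }
    return @only;
}
```

The sorting dominates the cost here, so this should still scale far better than rescanning the second file for every line of the first.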

**KevinADC** · May 18 '07, 07:00 PM

Why are you assuming Unix? Looks like Windows to me.

$f1 = 'C:\Vasuki\chm_dirx_bud_28.csv';

**AdrianH** · May 18 '07, 08:42 PM

I'm not assuming Unix. There are GNU ports of Unix utilities all over the place.

Adrian

**KevinADC** · May 18 '07, 09:05 PM

True enough

(filler for message too short)

**ghostdog74** · May 20 '07, 12:24 AM

You can try memory mapping.

**ad4x2l** · Sep 27 '07, 08:42 AM

csvdiff, a GPL Perl tool
