Filtering out Duplicate IDs

**KevinADC** · Apr 10 '07, 03:59 AM

What have you tried so far?

**idorjee** · Apr 10 '07, 06:19 PM

This is what I did, but it doesn't do anything: the output is just a copy of the input file.

Code:

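while (<INFILE>) {
	# capture the ID (first field) and the rest of the line
	if ($_ =~ /(\S+)\t(.+)/) {
		my $qa = $1;
		my $rest = $2;
		my $lowest = $qa;
		# $lowest was just set to $qa, so this condition is never true
		$lowest = $qa if $qa ne $lowest;
		print OUTFILE "$lowest\t$rest\n";
	}
}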

Thanks.

**KevinADC** · Apr 10 '07, 06:41 PM

You need to use a hash to keep track of which IDs you have already "seen" so you don't print them again:

Code:

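my %seen = ();    # IDs already written to OUTFILE
while (<INFILE>) {
    # the ID is everything before the first tab
    if (/^(\S+)\t/) {
        # skip this line if the ID has been seen before
        next if ++$seen{$1} > 1;
    }
    print OUTFILE;    # prints $_ (the current line)
}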
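For anyone finding this thread later, here is a minimal self-contained sketch of the same "seen" hash approach. It assumes the ID is the first tab-separated field and keeps only the first line for each ID; lines without a tab are copied unchanged. The sub name `filter_duplicate_ids` and the sample data are hypothetical, and it uses lexical filehandles with three-argument `open` because INFILE and OUTFILE are never opened in the snippets above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, keeping only the
# first line for each ID (assumed to be the first tab-separated field).
# Lines that do not match the pattern are copied unchanged.
sub filter_duplicate_ids {
    my ($in, $out) = @_;
    my %seen;    # ID => how many times it has been encountered
    while (my $line = <$in>) {
        if ($line =~ /^(\S+)\t/) {
            # skip every occurrence of an ID after the first one
            next if ++$seen{$1} > 1;
        }
        print {$out} $line;
    }
    return;
}

# Demo on in-memory "files" (the sample data is made up for illustration).
my $input  = "id1\tfoo\nid2\tbar\nid1\tbaz\nid3\tqux\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot read input: $!";
open my $out, '>', \$output or die "Cannot write output: $!";
filter_duplicate_ids($in, $out);
close $in;
close $out;
print $output;    # the second id1 line is dropped
```

To run it on real files, open `$in` and `$out` on file paths instead of the in-memory scalars, e.g. `open my $in, '<', 'input.txt' or die $!;` (file name hypothetical).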

**idorjee** · Apr 11 '07, 12:15 AM

Thanks a lot, Kevin,
that worked fine.
^ ^
