sorting of array for duplicacy

**deep022in** · Oct 3 '06, 06:33 AM

Originally posted by mamoon

I need a syntax to sort the elements of an array and remove any duplicates.
I tried sort -u to sort a file, BUT I need to do this sorting on an array.
Please help me if it can be done.
With regards

hi,

Have you tried using the function sort?

sort will sort the given array and return a sorted array.

Suppose you have an array named array1 with duplicate data in unsorted order.

@array2 = sort(@array1);
open(fp1, ">file.txt") || die "could not open the file for writing";
#write the sorted array, because uniq only removes adjacent duplicates
foreach my $element (@array2)
{
print fp1 $element;
print fp1 "\n";
}
close(fp1);
#run the uniq command on the file and redirect the output to a new file
system("uniq file.txt > file1.txt");
#read the content of the new file into an array
open(fp2, "file1.txt") || die "could not open file1.txt";
@array3 = <fp2>;
close(fp2);
#strip the trailing newline from every element
chomp(@array3);

#array3 contains the unique sorted data
#you can also use other logic to pick unique elements from the array.

Let me know if this approach solves your problem.

**sstouk** · Oct 4 '06, 03:23 PM

my(@array1) = ("1 one","1 one","2 two","3 three","3 three","4 four");
my(@array2);
my(%hash);
foreach (@array1) {$hash{$_}++};
foreach (sort keys %hash) {push @array2, $_};
print "\@array1 = @array1\n";
print "\@array2 = @array2\n";

**homesick123** · Oct 12 '06, 06:02 AM

What should I do if I want the array elements to remain in the order they were in before sorting, with the duplicates removed?
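One common way to do this is to drop the sort and filter with a hash that remembers what has already been seen. This is only a minimal sketch: the sample data and the names %seen and @unique are made up for illustration.

```perl
use strict;
use warnings;

# Sample data (hypothetical), deliberately unsorted and with repeats.
my @array1 = ("3 three", "1 one", "3 three", "2 two", "1 one", "4 four");

# %seen counts how often each element has appeared so far.
my %seen;

# grep keeps an element only when its count was 0 before the increment,
# i.e. on its first occurrence, so the original order is preserved.
my @unique = grep { !$seen{$_}++ } @array1;

print "@unique\n";
```

Because nothing is sorted, the first occurrence of each element stays where it was and only the later repeats are dropped.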

**mamoon** · Oct 13 '06, 10:03 AM

hi,
thanks. I tried this script earlier, but it is too long.
I needed a short script that sorts the array without involving lots of file handling and redirecting.
Well, the alternate script is:

system("sort -u file.txt > file1.txt");

The above script will sort file.txt into file1.txt, removing duplicates, BUT the order will change.
bye

**mamoon** · Oct 13 '06, 10:26 AM

hi sstouk,
thanks a lot. I got so many approaches to this problem, but the puzzle still persists.
I am giving both the input and the expected output. Can you suggest a Perl script?

input

Code:

imp_185#0.0063
imp_184#0.018
imp_185#0.59
imp_184#0.59
amla_33#2.5
imp_378#2.4
imp_83#6.9

output

Code:

 
amla_33#2.5
imp_184#0.018
imp_185#0.0063
imp_378#2.4
imp_83#6.9

I mean to remove duplicates on the left-hand side of the #, but according to the values on the right-hand side of the #.
waiting eagerly
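One possible reading of the request above is "for each key left of the #, keep the line with the smallest value right of the #, and list the result sorted by key". That reading is an assumption, though it does reproduce the output shown. A minimal sketch, with %best and @result as made-up names:

```perl
use strict;
use warnings;

# The input records from the post above.
my @lines = (
    "imp_185#0.0063", "imp_184#0.018", "imp_185#0.59", "imp_184#0.59",
    "amla_33#2.5",    "imp_378#2.4",   "imp_83#6.9",
);

# %best maps each key (left of '#') to the smallest value seen (right of '#').
my %best;
for my $line (@lines) {
    my ($key, $value) = split /#/, $line, 2;
    next unless defined $value;    # skip records without a '#'
    $best{$key} = $value
        if !exists $best{$key} || $value < $best{$key};
}

# Rebuild the records, sorted by key as plain strings.
my @result = map { join('#', $_, $best{$_}) } sort keys %best;
print "$_\n" for @result;
```

If the rule is really "keep the first line seen for each key", the numeric comparison can be dropped and only the exists check kept.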

**miller** · Oct 16 '06, 08:22 PM

Code:

#!/usr/bin/perl

my $inFile = $ARGV[0] or die "no file specified";
my $outFile = $inFile . '.unique';
my $dupFile = $inFile . '.dup';

local (*IN, *OUT, *DUP);

open(IN, "<$inFile") or die "open $inFile: $!";
open(OUT, ">$outFile") or die "open >$outFile: $!";
open(DUP, ">$dupFile") or die "open >$dupFile: $!";

my %beenSeen;
while (my $line = <IN>) {
	next unless $line =~ m{(.*)#.*}; # Skip lines without a '#' (e.g. empty lines)
	
	if (! $beenSeen{$1}++) {
		print OUT $line or die "write $outFile: $!";
	} else {
		print DUP $line or die "write $dupFile: $!";
	}
}

close(IN) or die "close $inFile: $!";
close(OUT) or die "close $outFile: $!";
close(DUP) or die "close $dupFile: $!";

1;

__END__

The above script takes a file name parameter and writes two new files, one with the extension ".unique" and the other with the extension ".dup". The first line found for each key in the source file is routed to the unique output, and all subsequent duplicates are routed to the dup output. This should be more than enough code for you to figure out how to match your specifications more accurately.

<rant>
Please note that next time you ask a question, it would help if you more accurately stated what your problem is. The subject of this request "sorting of array for duplicacy" was actually not what you wanted. Instead what you desired was "Removing duplicates from a file". Obviously the sort utility in unix almost achieves this, but fixation on this attempted solution clouded your request and actually introduced new problems. Instead, be sure to state exactly what you want next time, and you'll be more likely to get a solution in a timely manner.
</rant>

**gerardjp** · Dec 1 '06, 05:26 PM

Dear Sstouk,

I came across your piece of code in this thread. The two lines below help me perfectly to get the highest group id from "/etc/group" after having spunged it with "endpwent() ". However I have not been able to figure out exactly what happens with the arrays and hash hidden in there somewhere ... :)

foreach (@array1) {$hash{$_}++};
foreach (sort keys %hash) {push @array2, $_};

Would you be so kind to explain in a nutshell what these do?

Thanx a lot!

Regards,

Gerard.
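For readers with the same question, the two quoted lines can be traced on a small example. This is only an illustrative sketch with made-up sample data, not sstouk's own explanation.

```perl
use strict;
use warnings;

# Made-up sample data with repeats.
my @array1 = ("b", "a", "b", "c", "a");
my (%hash, @array2);

# Line 1: each element becomes a hash key; the value counts occurrences.
# A hash holds each key only once, so duplicates collapse here.
foreach (@array1) { $hash{$_}++ }
# %hash now holds: a => 2, b => 2, c => 1

# Line 2: walk the distinct keys in sorted order and collect them.
foreach (sort keys %hash) { push @array2, $_ }

print "@array2\n";
```

Note that sort compares keys as strings by default, so numeric data such as group ids would need sort { $a <=> $b } to come out in numeric order.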

**Studentmadhura05** · Dec 6 '06, 05:05 PM

Miller,
I came across the code that you posted earlier in this thread. I want to do something similar to what you are doing there, but I did not want to copy and use the code blindly, so I am trying to understand what it does.
I am having trouble understanding the key line:

if (! $beenSeen{$1}++ ) {

I am kind of confused as to where you assign any value to %beenSeen before you use it in the if statement. What does {$1} mean in this context?

I would really appreciate it if you could explain.
Thanks
M

**miller** · Dec 7 '06, 06:38 PM

Code:

my %beenSeen;
while (my $line = <IN>) {
	# This matches his specific record type, ex: "imp_185#0.0063"
	# - It extracts the value he wants to filter by, and assigns that to $1
	next unless $line =~ m{(.*)#.*}; # Skip lines without a '#' (e.g. empty lines)

	# The key to understanding this line is the order of operations.
	# The ++ here is a post-increment: the test uses the old value, and the
	# count is incremented only afterwards.
	# Therefore the test is true the first time each value of $1 is seen;
	# on every later occurrence %beenSeen already holds a nonzero count,
	# so the test is false and the line goes to the DUP file.
	if (! $beenSeen{$1}++) {
		print OUT $line or die "write $outFile: $!";
	} else {
		print DUP $line or die "write $dupFile: $!";
	}
}
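To make the two questions concrete (where %beenSeen gets its values, and what $1 is), here is a small stand-alone trace. It is only a sketch: it works on a hard-coded list instead of the file handles, and $key, $old and @log are names invented for the illustration.

```perl
use strict;
use warnings;

my %beenSeen;    # starts empty; entries spring into existence on first use
my @log;

for my $line ("imp_185#0.0063", "imp_184#0.018", "imp_185#0.59") {
    # The parentheses capture the text before '#'; Perl stores it in $1.
    next unless $line =~ m{(.*)#.*};
    my $key = $1;

    # Post-increment returns the OLD count (0, which is false, the first
    # time a key is used) and only then stores the new count.
    my $old = $beenSeen{$key}++;
    push @log, sprintf("%s old=%d new=%d %s",
        $key, $old, $beenSeen{$key}, $old ? "duplicate" : "first");
}

print "$_\n" for @log;
```

So no explicit assignment is needed: using $beenSeen{$1} with ++ creates the entry, and negating the old value with ! is what separates the first occurrence from the duplicates.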
