sorting of array for duplicacy

  • mamoon
    New Member
    • Sep 2006
    • 18

    I need a way to sort the elements of an array and remove any duplicates.
    I tried sort -u, which sorts a file, BUT I need to do this on an array.
    Please help me if possible.
    With regards
  • deep022in
    New Member
    • Sep 2006
    • 23

    #2

    Hi,

    Have you tried using the sort function?

    sort takes a list and returns a sorted copy of it; on its own it does not remove duplicates.

    Suppose you have an array named @array1 containing unsorted data with duplicates.

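    my @array2 = sort @array1;   # sort first: uniq only removes *adjacent* duplicates
    open(my $fp1, '>', 'file.txt') or die "could not open file.txt for writing: $!";
    foreach my $element (@array2)
    {
        print $fp1 "$element\n";   # write the sorted elements, one per line
    }
    close($fp1);
    # run the uniq command on the file and redirect the output to a new file
    system("uniq file.txt > file1.txt") == 0 or die "uniq failed: $?";
    # read the contents of the new file back into an array
    open(my $fp2, '<', 'file1.txt') or die "could not open file1.txt: $!";
    my @array3 = <$fp2>;
    close($fp2);
    chomp @array3;   # drop the trailing newlines read back from the file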

    # @array3 now contains the unique, sorted data.
    # You can also use other logic to pick the unique elements from the array.

    Let me know if this approach solves your problem.
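If the goal is only to avoid the temporary files, the same sort-then-uniq idea can be done entirely inside Perl. A minimal sketch, assuming string equality (`eq`) is the right notion of "duplicate"; the sample data and the names `@sorted` and `@unique` are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample data with duplicates, in no particular order.
my @array1 = ('pear', 'apple', 'pear', 'fig', 'apple');

# Step 1: sort, so equal elements end up next to each other
# (the same precondition the external uniq command relies on).
my @sorted = sort @array1;

# Step 2: like uniq, keep an element only if it differs from
# the last element kept.
my @unique;
for my $element (@sorted) {
    push @unique, $element
        unless @unique && $unique[-1] eq $element;
}

print "@unique\n";   # apple fig pear
```

Because only neighbouring elements are compared, this is correct only after the sort, exactly as with `uniq`.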


    • sstouk
      New Member
      • Oct 2006
      • 3

      #3
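      my @array1 = ("1 one","1 one","2 two","3 three","3 three","4 four");
      my @array2;
      my %hash;                                       # empty hash (assigning undef would add a bogus "" key)
      foreach (@array1) { $hash{$_}++ }               # one key per distinct element, value = count
      foreach (sort keys %hash) { push @array2, $_ }  # the distinct keys, in sorted order
      print "\@array1 = @array1\n";
      print "\@array2 = @array2\n";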
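Since a shorter script was wanted, the same hash technique fits in two statements. A sketch, where `%seen` is a hypothetical name: `map` builds one `element => 1` pair per element, and repeated elements simply collapse onto the same key:

```perl
use strict;
use warnings;

my @array1 = ("1 one","1 one","2 two","3 three","3 three","4 four");

# One (element => 1) pair per element; repeated elements overwrite
# the same key, so %seen ends up with one entry per distinct value.
my %seen = map { $_ => 1 } @array1;

# The keys are the distinct elements; sort puts them in string order.
my @array2 = sort keys %seen;

print "\@array2 = @array2\n";   # @array2 = 1 one 2 two 3 three 4 four
```

As with the two-loop version, the original order is lost: the output order comes from `sort`, not from `@array1`.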


      • homesick123
        New Member
        • Oct 2006
        • 8

        #4
        What should I do if I want the array elements to stay in their original order (as before sorting), but with the duplicates still removed?
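One common way to keep the original order is to filter with `grep` and a "seen" hash instead of sorting the keys. A minimal sketch, assuming "duplicate" means string-equal; the reordered sample data and the name `%seen` are hypothetical:

```perl
use strict;
use warnings;

my @array1 = ("3 three","1 one","3 three","2 two","1 one","4 four");

# %seen counts how often each element has been encountered so far.
# The post-increment returns the old count, which is 0 (false) the
# first time an element appears, so !$seen{$_}++ is true only for
# the first occurrence and grep keeps exactly those.
my %seen;
my @array2 = grep { !$seen{$_}++ } @array1;

print "\@array2 = @array2\n";   # @array2 = 3 three 1 one 2 two 4 four
```

Each element is kept at the position of its first occurrence, and no sort is involved.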


        • mamoon
          New Member
          • Sep 2006
          • 18

          #5
          Hi,
          thanks. I tried this script earlier, but it is too long.
          I needed a short script that sorts an array without lots of file handling and redirection.
          The alternative I found is:

          system("sort -u file.txt > file1.txt");

          This sorts file.txt into file1.txt and removes the duplicates, BUT the order changes.
          Bye


          • mamoon
            New Member
            • Sep 2006
            • 18

            #6

            Hi sstouk,
            thanks a lot. I have received many approaches to this problem, but the puzzle still persists.
            I am giving both the input and the expected output. Can you suggest a Perl script?
            Input:
            Code:
            imp_185#0.0063
            imp_184#0.018
            imp_185#0.59
            imp_184#0.59
            amla_33#2.5
            imp_378#2.4
            imp_83#6.9
            Output:
            Code:
             
            amla_33#2.5
            imp_184#0.018
            imp_185#0.0063
            imp_378#2.4
            imp_83#6.9
            I mean: remove duplicates based on the part to the left of the #, but decide which line to keep according to the values on the right-hand side of the #.
            Waiting eagerly.
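The sample output keeps, for each key left of the `#`, the line with the smallest value on the right (for example `imp_185#0.0063` rather than `imp_185#0.59`), and lists the keys in sorted order. A minimal sketch under that assumption (smallest numeric value wins; if "according to the values" means a different rule, the comparison is the line to change). The records are held in an array here, and the names `%best` and `@output` are hypothetical:

```perl
use strict;
use warnings;

# Input records from the question, in "key#value" form.
my @records = ('imp_185#0.0063', 'imp_184#0.018', 'imp_185#0.59',
               'imp_184#0.59',   'amla_33#2.5',   'imp_378#2.4',
               'imp_83#6.9');

# For each key, remember the smallest value seen so far (assumed rule).
my %best;
for my $record (@records) {
    my ($key, $value) = split /#/, $record, 2;
    next unless defined $value;    # skip records without a '#'
    $best{$key} = $value
        if !exists $best{$key} || $value < $best{$key};
}

# Rebuild "key#value" strings, ordered by key (plain string sort).
my @output = map { $_ . '#' . $best{$_} } sort keys %best;
print "$_\n" for @output;
```

The values are stored as the original strings, so `0.0063` is printed exactly as it appeared in the input.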


            • miller
              Recognized Expert Top Contributor
              • Oct 2006
              • 1086

              #7
              Code:
              #!/usr/bin/perl
              use strict;
              use warnings;
              
              my $inFile = $ARGV[0] or die "no file specified";
              my $outFile = $inFile . '.unique';
              my $dupFile = $inFile . '.dup';
              
              open(IN, '<', $inFile) or die "open $inFile: $!";
              open(OUT, '>', $outFile) or die "open >$outFile: $!";
              open(DUP, '>', $dupFile) or die "open >$dupFile: $!";
              
              my %beenSeen;
              while (my $line = <IN>) {
                  next unless $line =~ m{(.*)#.*}; # $1 = key before the '#'; skip lines without one
                  
                  if (! $beenSeen{$1}++) {
                      print OUT $line or die "write $outFile: $!";
                  } else {
                      print DUP $line or die "write $dupFile: $!";
                  }
              }
              
              close(IN) or die "close $inFile: $!";
              close(OUT) or die "close $outFile: $!";
              close(DUP) or die "close $dupFile: $!";
              
              1;
              
              __END__
              The above script takes a file name parameter and writes two new files: one with the extension ".unique", the other with the extension ".dup". The first line found for each key in the source file is routed to the unique output, and all subsequent duplicates are routed to the dup output. This should be more than enough code for you to figure out how to match whatever your exact specifications are.

              <rant>
              Please note that next time you ask a question, it would help if you more accurately stated what your problem is. The subject of this request "sorting of array for duplicacy" was actually not what you wanted. Instead what you desired was "Removing duplicates from a file". Obviously the sort utility in unix almost achieves this, but fixation on this attempted solution clouded your request and actually introduced new problems. Instead, be sure to state exactly what you want next time, and you'll be more likely to get a solution in a timely manner.
              </rant>
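Since the original question was about arrays, the same first-occurrence filter can also run on an in-memory list with no files at all. A sketch, assuming the records are already in an array; `@lines`, `@unique` and `@dups` are hypothetical names standing in for the input file and the two output files:

```perl
use strict;
use warnings;

# The records from the question, held in an array instead of a file.
my @lines = ('imp_185#0.0063', 'imp_184#0.018', 'imp_185#0.59',
             'imp_184#0.59',   'amla_33#2.5',   'imp_378#2.4',
             'imp_83#6.9');

my (%beenSeen, @unique, @dups);
for my $line (@lines) {
    next unless $line =~ m{(.*)#.*};   # $1 = the key before the '#'
    if (! $beenSeen{$1}++) {
        push @unique, $line;           # first line with this key
    } else {
        push @dups, $line;             # every later line with the same key
    }
}

print "unique: @unique\n";
print "dups:   @dups\n";
```

Note that this keeps the *first* line for each key in input order; it does not compare the values on the right of the `#`.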


              • gerardjp
                New Member
                • Nov 2006
                • 1

                #8
                Dear Sstouk,

                I came across your piece of code in this thread. The two lines below work perfectly for me to get the highest group id from "/etc/group" after having sponged it with "endpwent()". However, I have not been able to figure out exactly what happens with the arrays and the hash hidden in there somewhere ... :)

                foreach (@array1) {$hash{$_}++};
                foreach (sort keys %hash) {push @array2, $_};

                Would you be so kind as to explain in a nutshell what these do?

                Thanks a lot!

                Regards,

                Gerard.
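One way to see what the two lines do is to print the hash between them. A sketch using the sample array from post #3; the `printf` line is an addition for illustration only:

```perl
use strict;
use warnings;

my @array1 = ("1 one","1 one","2 two","3 three","3 three","4 four");
my (%hash, @array2);

# Line 1: every element becomes a hash key. Keys are unique, so a
# repeated element just bumps the existing counter instead of adding
# a new entry.
foreach (@array1) { $hash{$_}++ }

# Show the hash: each distinct element and how often it occurred.
printf "%-8s => %d\n", $_, $hash{$_} for sort keys %hash;

# Line 2: walk the (duplicate-free) keys in sorted order and copy
# them into @array2.
foreach (sort keys %hash) { push @array2, $_ }

print "\@array2 = @array2\n";
```

Note that plain `sort` compares strings, so if the keys are numeric ids, "1000" sorts before "999". When the highest numeric id is what matters, `sort { $a <=> $b } keys %hash` gives numeric order.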


                • Studentmadhura05
                  New Member
                  • Nov 2006
                  • 25

                  #9
                  Miller,
                  I came across the code you posted a few days back in this thread. I want to do something similar, but I don't want to copy and use the code blindly, so I am trying to understand what you are doing here.
                  I am having trouble understanding the key line:

                  if (! $beenSeen{$1}++ ) {

                  I am confused about where you assign any value to %beenSeen before you use it in the if statement. What does {$1} mean in this context?

                  I would really appreciate it if you could explain.
                  Thanks
                  M



                  • miller
                    Recognized Expert Top Contributor
                    • Oct 2006
                    • 1086

                    #10
                    Code:
                    my %beenSeen;
                    while (my $line = <IN>) {
                        # This matches his specific record type, ex: "imp_185#0.0063".
                        # It extracts the value he wants to filter by (the part before
                        # the '#') and assigns that to $1. Lines without a '#' are skipped.
                        next unless $line =~ m{(.*)#.*};
                    
                        # The only challenge to understanding this line is the order of
                        # operations. The ++ is a post-increment: the expression yields the
                        # value *before* the increment, and the hash entry is created (if
                        # it did not exist yet) and incremented as a side effect, so no
                        # earlier assignment is needed. The first time a given $1 is seen
                        # the old value is 0, so the ! makes the test true. On every later
                        # line with the same $1 the stored count is at least 1, so the
                        # test is false.
                        if (! $beenSeen{$1}++) {
                            print OUT $line or die "write $outFile: $!";
                        } else {
                            print DUP $line or die "write $dupFile: $!";
                        }
                    }
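The post-increment behaviour can be checked directly by recording the value before and after each `++`. A small sketch with hypothetical keys taken from the sample data; `@log` is an illustrative name:

```perl
use strict;
use warnings;

my %beenSeen;
my @log;
for my $key ('imp_185', 'imp_184', 'imp_185') {
    # $old receives the value *before* ++ runs. For a key not yet in
    # the hash Perl treats it as 0 (false); ++ then creates the entry
    # with value 1.
    my $old = $beenSeen{$key}++;
    push @log, sprintf "%s: before=%d after=%d -> %s",
        $key, $old, $beenSeen{$key}, $old ? 'duplicate' : 'first time';
}
print "$_\n" for @log;
```

This prints `first time` for the first two keys and `duplicate` for the repeated `imp_185`, which is exactly the condition the `if (! $beenSeen{$1}++)` test relies on.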
