How to do a quicker array replace

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • chaarmann
    Recognized Expert Contributor
    • Nov 2007
    • 785

    How to do a quicker array replace

    Is there a faster way for replacing cats with dogs in every string of the array?
    Is there a way with a shorter expression, but not necessarily faster?
    Code:
    @temp = map { $_ =~ s/cats/dogs/g; $_ } @temp;
  • Oralloy
    Recognized Expert Contributor
    • Jun 2010
    • 988

    #2
    @chaarmann,

    How about this:

    Code:
    for(@temp){s/cats/dogs/g}
    I'm not sure it's quicker, but it's definitely a few characters smaller.

    Comment

    • chaarmann
      Recognized Expert Contributor
      • Nov 2007
      • 785

      #3
      @Oralloy

      Thank you very much.
      It's not only smaller but also quicker (more than double speed).

      Here is the test result:
      Code:
                 Rate test_map test_for
      test_map 6.25/s       --     -55%
      test_for 14.0/s     125%       --
      Here is the test:
      Code:
      package test;
      
      use warnings; 
      use strict;
      use Benchmark qw(:all);
      
      # main program
      my @array;
      foreach my $number (0..9999) {
      	$array[$number] = "cats...cats......cats";
      }
       
      cmpthese(-1, {
        'test_map' => \&test_map,
        'test_for' => \&test_for,
      });
      
      sub test_map {
      	@array = map { $_ =~ s/cats/dogs/g; $_ } @array;
      	@array = map { $_ =~ s/dogs/cats/g; $_ } @array;
      }
      
      sub test_for {
      	for (@array) { s/cats/dogs/g };
      	for (@array) { s/dogs/cats/g };
      }
      An explanation could be that it makes a deep copy with "map", but not with "for".

      Comment

      • Oralloy
        Recognized Expert Contributor
        • Jun 2010
        • 988

        #4
        @chaarmann,

        I'm not sure that you need to do the reassignment after executing your maps, given that map() exposes the actual element to the substitution. If it does, then what you're effectively doing is overwriting @array, which is already modified, with the output of map, and thus changing the array's spine.

        I'm pretty sure that the times will be a lot closer, if you drop the assignments in test_map.

        Now, the question is, of course, does the alternate map implementation work? Perl always surprises me by where it has side effects (in this case, the string updates).

        Cheers!

        Comment

        • chaarmann
          Recognized Expert Contributor
          • Nov 2007
          • 785

          #5
          @Oralloy,

          I was assuming wrongly that Perl would copy the modified lines of the map-command to a new array, but it's replacing them. Thanks for pointing that out. So I added a third test:
          Code:
          sub test_map_unassigned {
          	map { $_ =~ s/cats/dogs/g; $_ } @array;
          	map { $_ =~ s/dogs/cats/g; $_ } @array;
          }
          The result is:

          Code:
                                Rate         test_map test_map_unassigned         test_for
          test_map            6.00/s               --                -32%             -58%
          test_map_unassigned 8.82/s              47%                  --             -38%
          test_for            14.2/s             136%                 60%               --
          That is strange. I was expecting the second test as quick as the first. A command like "@a=@b" should only copy references (shallow copy), that means the array address itself and not all its elements. This is very quick and should take no measurable time overhead.

          Comment

          • Oralloy
            Recognized Expert Contributor
            • Jun 2010
            • 988

            #6
            @Chaarmann,

            I'm a belt and suspenders guy, so I'd say you should inspect the computed values to make sure that the updates are happening as I believe they're supposed to. I've screwed myself more than once with Perl by assuming (or not assuming) a side effect.

            That said, there are a couple possibilities - first is the garbage collector - is it being invoked during one of the tests? Garbage collection can take a lot of time at random times, so repeating tests is also a good thing.

            Also, the reason map is probably slower than the for loop is that it is building a result list, even though your code is dropping it on the floor (void context). That means that you pay the cost of memory allocation as an unexpected side effect.

            Re-assigning to @array doubles that allocation cost, and suffers the cost of the reference count updates as the old spine of @array is destroyed. That explains the execution time differences, I think.

            I wonder if the release of the spine of an array is O(n) time, or O(n^2). That would explain a lot of the time cost you're seeing for the test.

            BTW, what happens if you test 100,000 times, rather than just 10,000 times?

            Cheers!
            Last edited by Oralloy; Sep 9 '10, 02:35 PM. Reason: change "@test" to "@array"; added additional musing

            Comment

            • chaarmann
              Recognized Expert Contributor
              • Nov 2007
              • 785

              #7
              @Oralloy.
              Thank's, that's a good explanation.
              So as what I understand is that
              1. if I use "for", it doesn't change the array spine (object), only all its string references to the new strings.
              2. When I use "map without assignment", it creates a new array object, assigns to the new strings, and when finished it drops the old array object and assigns the new array object to variable @array.
              3. When I use "map with assignment", it does the same as "map without assignment", then additionally builds a new array (a third object) and loops through all string references again to assign all strings from the second to the new array, then drops the second array object.


              Question: Is Perl not doing any performance optimization when compiling the code?
              By the way:

              I have verified the values with
              Code:
              print "map: array[0]=$array[0]\n";
              in all tests after every array, and it's really changing cats to dogs, and backward later on, in all test. I just commented out these lines before rerunning for performance, and I stripped all comments before listing it here, to make it small.

              I repeated the test a lot of times, I even switched the order of the tests, but always the same results. So garbage collection is no issue (it's a powerful Sun Solaris station with a lot of RAM).

              I tested with 100,000, but then it takes more than 1 second for a sigle run so the benchmark warns me that the result is not reliable:
              Code:
                          (warning: too few iterations for a reliable count)
                          (warning: too few iterations for a reliable count)
                          (warning: too few iterations for a reliable count)
                                  s/iter         test_map test_map_unassigned         test_for
              test_map              1.68               --                -32%             -59%
              test_map_unassigned   1.14              47%                  --             -39%
              test_for             0.695             142%                 64%               --
              Here is the output with 1,000 elements array size:
              Code:
                                    Rate         test_map test_map_unassigned         test_for
              test_map            66.0/s               --                -27%             -54%
              test_map_unassigned 90.7/s              37%                  --             -36%
              test_for             142/s             115%                 57%               --

              Comment

              • Oralloy
                Recognized Expert Contributor
                • Jun 2010
                • 988

                #8
                @chaarman,

                Thanks for rounding out the tests. It is good to see that they scale approximately linearly. I guess my faith in programmers isn't destroyed yet. :)

                That said, I appreciate going through the exercise with you. I learned something about Perl, too. It's always good to sharpen our pencils.

                Thanks and Good Luck!

                Comment

                • KennB
                  New Member
                  • Jan 2013
                  • 2

                  #9
                  @temp = map { $_ =~ s/cats/dogs/g; $_ } @temp;

                  I think that what is happening is that, if a replacement occurs, the value '1' gets placed into @Temp. Otherwise, a null value gets inserted. And for some reason, the original array gets the replacement changes.

                  Romano

                  Comment

                  • KennB
                    New Member
                    • Jan 2013
                    • 2

                    #10
                    update on my reply

                    @temp = map { $_ =~ s/cats/dogs/g; $_ } @temp;

                    I think that what is happening is that, if a substitution occurs, the $Count value for the number of substitutions gets placed into @temp. Otherwise, a null value gets inserted. And for some reason, the original array gets the replacement changes.

                    My solution was to make a copy of the original array and a third copy to receive the $Count values.

                    @Test = ("XXXXX", "NNNNN", "XNNNN", "XXNNN");
                    @Test1 = @Test;
                    @Test2 = map(s/X/x/g, @Test1);

                    for($i = 0; $i <= $#Test; $i++)
                    {
                    print "$i Test = $Test[$i] Test1 = $Test1[$i] Test2 = $Test2[$i]\n";
                    }


                    0 Test = XXXXX Test1 = xxxxx Test2 = 5
                    1 Test = NNNNN Test1 = NNNNN Test2 =
                    2 Test = XNNNN Test1 = xNNNN Test2 = 1
                    3 Test = XXNNN Test1 = xxNNN Test2 = 2

                    @Test is unchanged. @Test1 gets the substitution changes and @Test2 gets the $Count values.
                    One can discard @Test2 or use it to get indices for the strings in which substitutions occurred.

                    Both of these statements give the same result:
                    @Test2 = map(~s/X/x/g, @Test1);
                    @Test2 = map($_ =~s/X/x/g, @Test1);

                    I do not know why the original array values gets changed.

                    Romano

                    Comment

                    Working...