New perl article: Generate Unique Strings of Random Numbers

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    New perl article: Generate Unique Strings of Random Numbers

    Having too much time on my hands recently, I wrote this article. I didn't put a whole lot of effort into it but hopefully its not too terrible. Any comments or additions/corrections are welcome.

    <Begin article>

    A code snippet that will generate n numbers of strings of n length. No string will repeat a number and no string will be repeated. The code produces a list of strings of numbers something like this:

    49-3-13-2-12
    38-22-3-36-44
    22-32-46-30-7
    22-39-18-30-47


    See below for a discussion of the code.

    Code:
    use warnings;
    use strict;
    
    my $length = 5; # the amount of numbers per string
    my $max = 1000; # the number of strings
    my @strings; # array to store the strings
    for (1..$max) {
       push @strings, rand_nums($length);
    }
    print "$_\n" for @strings;
    
    {
       my %cache; # create a "static" variable
       sub rand_nums {
          my %local_cache; # create a new hash
          my ($length) = @_; # the length of the random number string passed to the sub
          $local_cache{int(rand(49))+1} = 1; # get the first number 
          for (2..$length) {# start at 2 because we already have the first number
             my $num = int(rand(49))+1; # get a new random number 
             redo if exists $local_cache{$num}; # redo the loop if the number already exists
             $local_cache{$num} = 1; # store the number in the hash 
          }
          my $key = join '-', keys %local_cache; # make a hash key by serializing (converting to a string) the numbers 
          rand_nums($length) if exists $cache{$key}; # redo the function if the key already exists
          $cache{$key}=1; # store the new key in the hash (%cache)
          return $key;
       }
    }
    The reason I posted this snippet is because there is a lot going on and the same logic can be applied to a wide variety of requirements. While there are no indepth explanations the discussion touches on a few important subjects:
    • recursive function
    • serialization
    • static variable
    • using hashes to create unique results


    Whenever you think unique with perl a hash is invariably going to be used. This is because the keys of a hash must be unique. I used two hashes, one to keep track of each string of numbers (%local_cache) to insure there are no repeated numbers in any one string, and another one to insure that all the strings are unique (%cache).

    %cache is wrapped in a block {} around the function (rand_nums) to create a perlish version of a C static variable. Declaring it with "my" inside the block makes it lexically scoped to the function but hides it from the rest of the code/file. Since the code snippet above is not doing anything else this is not really necessary, but its an example of how you can create static variables with perl. Perl 5.10 also has a new feature that allows you to create static variables called state.

    %local_cache is declared with "my" inside the function because we want to declare a new hash each time the function is called. This insures that each string of numbers has no repeated values. %cache insures that there are no repeated strings of numbers.

    $length and $max could easily be used to accept some input to the script making those variables dynamic each time the script is run

    Inside the rand_nums() function you see a call back to the function:

    rand_nums($leng th) if exists $cache{$key};

    This just tells perl to run the function again if the string already exists. The function runs again, produces another string of numbers and checks again if it already exists or not. This is a simple example of a recursive perl function. If the string of numbers does not already exist the code continues to run and eventually returns a string back to be added to the array @strings.

    Backing up a few lines in the snippet, you will see this line:

    $local_cache{in t(rand(49))+1} = 1;

    This is an example of how perl automatically converts an expression into a string when you put the expression inside the curly brackets when defining a hash key. Perl will convert the expression into a string that equals whatever random number the expression returns. See the rand function link below to understand how rand works.

    Another line of the code that is of interest is this one:

    my $key = join '-', keys %local_cache;

    It joins the keys of the %local_cache hash to make a string to be used as a hash key in the %cache hash. Joining a list of strings together like this to create a single string is called serialization. Its important to note that the string when serialized does not put the numbers in the order that the code generated them. Thats because hashes are unordered lists. If you needed the strings to be in the same order the code generates the numbers you would need to use an ordered list, an array, to temporarily store the numbers before serializing them into a hash key.

    I think the comments in the code snippet help describe what is happening in the code but if anyone has questions or comments please post them.

    A list of pragmas and perl builtin functions used in the code:

    Perl functions :

    Pragmas :
    • strict - Perl pragma to restrict unsafe constructs
    • warnings - Perl pragma to control optional warnings


    <End Article>
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    The code needs a little work as I now realize there is a problem with serializing the numbers to create a key from the keys of hash %local_hash. Its a subtle problem that took me a little while to even see. To create truely random strings of numbers you can't join the keys of a hash, even if the numbers were generated using rand(). Because while the order of hash keys is unordered (or has no guaranteed order) they are not random. So when you build a hash and join the keys, there is a repetition in the order of the keys which causes the joined string of numbers to no longer be random but to follow a pattern. While no two strings in the end are repeated, the probability of numbers occuring in any position of the joined strings favors a pattern: the pattern of the order of the hash keys. On my system the number 8 has the highest probability to be in the first position, and the number 5 will never occur in the first position and has the highest probability of occuring in the last position.

    Comment

    • KevinADC
      Recognized Expert Specialist
      • Jan 2007
      • 4092

      #3
      Modified code and article adjusted accordingly:

      <begin article>

      The reason I posted this snippet is because there is a lot going on in a short amount of code and the same logic can be applied to a wide variety of requirements. While there are no indepth explanations the discussion touches on a few important subjects:
      • recursive function
      • serialization
      • static variable
      • using hashes to create unique results

      Code:
      use warnings;
      use strict;
      
      my $length = 7; # the amount of numbers per string
      my $max = 1000; # the number of strings
      my @strings; # array to store the strings
      for (1..$max) {
         push @strings, rand_nums($length);
      }
      print "$_\n" for @strings;
      {
         my %cache; # create a "static" variable
         sub rand_nums {
            my %local_cache; # create a new hash
            my ($length) = @_; # the length of the random number string passed to the sub
            my $serial = int(rand(49))+1; # get the first number for the serialized string
            $local_cache{$serial} = 1; # store the first number in the hash
            for (2..$length) {# start at 2 because we already have the first number
               my $num = int(rand(49))+1; # get a new random number
               redo if exists $local_cache{$num}; # redo the loop if the number already exists
               $local_cache{$num} = 1; # store the number in the hash
               $serial .= "-$num"; # append to the serialized string
            }
            rand_nums($length) if exists $cache{$serial}; # redo the function if the key already exists
            $cache{$serial} = 1; # store the new key in the hash (%cache)
            return $serial;
         }
      }


      Whenever you think unique with perl a hash is invariably going to be used. This is because the keys of a hash must be unique. I used two hashes, one to keep track of each string of numbers (%local_cache) to insure there are no repeated numbers in any one string, and another one to insure that all the strings are unique (%cache).

      %cache is wrapped in a block {} around the function rand_nums() to create a perlish version of a C static variable. Declaring it with "my" inside the block but outside the function makes it lexically scoped to the function but hides it from the rest of the code/file. If we declared the hash inside the function block it would not work as each time the function runs a new hash would be created.

      Since the code snippet above is not doing anything else this is not really necessary, but its an example of how you can create static variables with perl. Perl 5.10 also has a new feature that allows you to create static variables called state.

      %local_cache is declared with "my" inside the function because we want to declare a new hash each time the function is called. This insures that each string of numbers has no repeated values. %cache insures that there are no repeated strings of numbers.

      $length and $max could easily be used to accept some input to the script making those variables dynamic each time the script is run

      Inside the rand_nums() function you see a call back to the function:

      rand_nums($leng th) if exists $cache{$serial} ;

      This just tells perl to run the function again if the string already exists. The function runs again, produces another string of numbers and checks again if it already exists or not. This is a simple example of a recursive perl function. If the string of numbers does not already exist the code continues to run and eventually returns a string back to be added to the array @strings.

      I mentioned serialization at the beginning. Serialization is the joining together of arbitrary strings to make a single string. $serial will be the serialized string, which is the joined list of random number. It starts at this line:

      my $serial = int(rand(49))+1 ; # get the first number for the serialized string

      The string is serialized by using the concatenation operator together with the assignment operator:

      $serial .= "-$num";

      We end up with a string something like this:

      3-22-45-9-11-33-48

      That string is used as the hash key to make sure there are no duplicated strings in the final list. The reason we don’t join the keys of %local_cache to make the serialized string is because while hashes have no guaranteed order, they don’t have a random order. The resultant strings of random numbers created by joining the hash keys together would no longer be truly random but would favor the probability of numbers occurring in certain positions of the strings. You will end up with strings that have the same numbers in the same positions over and over even though no two strings will be duplicated.

      I think the comments in the code snippet help describe what is happening in the code but if anyone has questions or comments please post them.

      A list of pragmas and perl builtin functions used in the code:

      Perl functions :

      Pragmas :
      • strict - Perl pragma to restrict unsafe constructs
      • warnings - Perl pragma to control optional warnings


      <end article>

      Comment

      • KevinADC
        Recognized Expert Specialist
        • Jan 2007
        • 4092

        #4
        After some rewrtting I posted the article in the perl insights forum. If anyone cares to comment, see the article:

        Comment

        Working...