Generating a large random string

  • Peter Otten

    #16
    Re: Generating a large random string

    Paul Rubin wrote:

    > Oops, per other post, it gives strings of bytes and needs filtering.
    > The following runs in about 1.2 seconds on my machine, but has a
    > small (infinitesimal) chance of failure:
    >
    > import string,array,time
    > t=time.time()
    > ttab = string.letters*4 + '\0'*48
    > a = array.array('B', open("/dev/urandom").read(1500000).translate(ttab))
    > a = array.array('B', filter(abs,a)).tostring()[:1000000]
    > print time.time()-t

    from __future__ import division
    import array, random, string, sys

    identity = string.maketrans("", "")
    ld = 256//len(string.letters)
    rest = 256 % len(string.letters)
    # the first ld*len(string.letters) byte values map onto letters ...
    ttab = string.letters*ld + '\0'*rest
    # ... and the remaining byte values are deleted by translate()
    dtab = identity[-rest:]

    # a fully functional variant of your approach
    def randstrUnix(length, extra=1.25):
        a = open("/dev/urandom").read(int(length*extra)).translate(ttab, dtab)
        while len(a) < length:
            a += randstrUnix(length-len(a), 1.3)
        return a[:length]

    twoletters = [c+d for c in string.letters for d in string.letters]

    # the fastest pure-python version I was able to produce
    def randstrPure(length):
        r = random.random
        n = len(twoletters)
        l2 = length//2
        lst = [None] * l2
        for i in xrange(l2):
            lst[i] = twoletters[int(r() * n)]
        if length & 1:
            lst.append(random.choice(string.letters))
        return "".join(lst)

    The timings:

    $ timeit.py -s"import randchoice as r" "r.randstrUnix(1000000)"
    10 loops, best of 3: 2.29e+05 usec per loop
    $ timeit.py -s"import randchoice as r" "r.randstrPure(1000000)"
    10 loops, best of 3: 6.51e+05 usec per loop

    A factor of 3 would hardly justify the OS dependency in most cases.
    Note that using twoletters[int(r() * n)], as seen in Sean Ross' version,
    instead of random.choice(twoletters) doubled the speed.
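
For anyone reading this on Python 3, here is a rough sketch of the same two ideas. It is not the code timed above: it assumes os.urandom in place of opening /dev/urandom, bytes.translate with a delete argument in place of the Python 2 str.translate, and random.choices in place of the two-letter lookup table. The function names randstr_urandom and randstr_pure are made up for this example.

```python
import os
import random
import string

LETTERS = string.ascii_letters.encode("ascii")   # 52 bytes
REPEAT, REST = divmod(256, len(LETTERS))          # 4 full copies, 48 byte values left over
# Byte values 0..207 map onto letters; the last 48 byte values are deleted
# so that every letter stays equally likely.
TABLE = LETTERS * REPEAT + b"\0" * REST
DELETE = bytes(range(256 - REST, 256))


def randstr_urandom(length, extra=1.3):
    """Return `length` random ASCII letters drawn from os.urandom."""
    chunks = []
    have = 0
    while have < length:
        need = length - have
        # Ask for a bit more than needed, because some bytes get deleted.
        raw = os.urandom(int(need * extra) + 1)
        kept = raw.translate(TABLE, DELETE)[:need]
        chunks.append(kept)
        have += len(kept)
    return b"".join(chunks).decode("ascii")


def randstr_pure(length, rng=random):
    """Return `length` random ASCII letters using the random module."""
    return "".join(rng.choices(string.ascii_letters, k=length))
```

The loop replaces the recursion in randstrUnix, so a short read is topped up rather than silently truncated. No timings are given here, since the numbers above were measured on Python 2 and would not carry over.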

    Peter


    • Chris

      #17
      Re: Generating a large random string

      If you're looking to have a string that you can write to as a stream, like a
      file, you might try StringIO:

      >>> import random, string
      >>> from cStringIO import StringIO
      >>> s = StringIO()
      >>> for i in xrange(1000000):
      ...     s.write(random.choice(string.letters))
      ...
      >>> len(s.getvalue())
      1000000

      This works fine for strings up to 10 MB; after that you might want to
      consider stashing your data to disk and reading/writing in chunks.
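
A minimal sketch of the same buffer idea on Python 3, assuming io.StringIO (the cStringIO module no longer exists there) and writing one chunk at a time instead of one character at a time. The function name, the chunk size and the seed parameter are illustrative choices, not anything from the session above.

```python
import io
import random
import string


def random_text_via_stringio(length, chunk=65536, seed=None):
    """Build `length` random letters by writing chunks into a StringIO."""
    rng = random.Random(seed)
    buf = io.StringIO()
    remaining = length
    while remaining > 0:
        n = min(chunk, remaining)
        # One write per chunk keeps the number of Python-level calls small.
        buf.write("".join(rng.choices(string.ascii_letters, k=n)))
        remaining -= n
    return buf.getvalue()


text = random_text_via_stringio(1000000)
print(len(text))
```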

      Chris



      "Andreas Lobinger" <andreas.lobinger@netsurf.de> wrote in message
      news:4035DC5F.FBACD4F8@netsurf.de...
      > Aloha,
      >
      > Andreas Lobinger wrote:
      > > How to generate (memory and time)-efficient a string containing
      > > random characters?
      >
      > > d = [random.choice(string.letters) for x in xrange(3000)]
      > > s = "".join(d)
      >
      > 1) Sorry for starting a new thread, but there were calls for more spec.
      > 2) Thanks for all replies so far.
      >
      > To be more specific about
      > - OS/availability of /dev/random
      > I'm looking for an elegant solution for all types. At the moment I'm
      > developing in parallel on a slow (250MHz) i86/Win95 notebook, a typical
      > i86/linux box and a couple of SUNs (up to 16GB main-mem...).
      >
      > - using random / crypto-safe-strings
      > The use of random is intended, because it generates a known sequence.
      > I can store the seed and re-generate the sequence any time, and afaik
      > the sequence is OS/machinetype independent.
      >
      > As I wrote in the original post, the random string is only a prerequisite
      > for another question.
      >
      > - What to do with the string
      > I use the string as a test pattern for a tokenizer/parser. So
      > string.letters is only used as an example here.
      >
      > The main question was:
      > How to concat a string without constantly reallocating it.
      >
      > Wishing a happy day
      > LOBI
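
On the main question in the quoted message (building the string without constant reallocation, from a stored seed), one possible approach on Python 3 is to allocate a bytearray once and fill it in place. This is only a sketch under those assumptions; seeded_random_bytes and its parameters are hypothetical names, and it says nothing about whether the sequence is identical across Python versions.

```python
import random
import string

ALPHABET = string.ascii_letters.encode("ascii")


def seeded_random_bytes(length, seed, chunk=65536):
    """Fill a preallocated buffer with random letters from a seeded generator."""
    rng = random.Random(seed)        # private generator, so the seed fully determines the output
    out = bytearray(length)          # allocated once, never resized
    for start in range(0, length, chunk):
        stop = min(start + chunk, length)
        # Same-length slice assignment overwrites in place.
        out[start:stop] = bytes(rng.choices(ALPHABET, k=stop - start))
    return out


first = seeded_random_bytes(3000, seed=42)
again = seeded_random_bytes(3000, seed=42)
print(first == again)   # True: same seed, same interpreter, same sequence
```

The result is mutable; call bytes(first) or first.decode("ascii") if the tokenizer under test needs an immutable object.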



      • Roger Binns

        #18
        Re: Generating a large random string

        > > How to generate (memory and time)-efficient a string containing
        > > random characters?

        It depends how random you need it to be.

        The approach I take in my test harness (which generates a CSV file
        with random contents) is to create a 30,000 character string the
        old-fashioned way:

        "".join([random.choice(item) for i in range(30000)])

        item is a string of the characters to choose from (some fields
        are phone numbers, some are names, some are email addresses etc).

        To generate a random string I then take random slices of that
        30,000 character object. If I need longer strings, I glue
        the random slices together (in a cStringIO).
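
The slicing scheme described above might look roughly like this on Python 3. It is a guess at the shape of the approach, not the actual test harness: make_pool, random_from_pool and max_slice are invented for the example, and io.StringIO stands in for cStringIO.

```python
import io
import random
import string


def make_pool(alphabet, size=30000, rng=random):
    """Build the pool once, one random character at a time."""
    return "".join(rng.choice(alphabet) for _ in range(size))


def random_from_pool(pool, length, rng=random, max_slice=1000):
    """Glue random slices of `pool` together until `length` characters exist."""
    if not pool:
        raise ValueError("pool must not be empty")
    limit = min(max_slice, len(pool))
    buf = io.StringIO()
    remaining = length
    while remaining > 0:
        n = min(remaining, rng.randint(1, limit))     # random slice length
        start = rng.randrange(len(pool) - n + 1)      # random slice position
        buf.write(pool[start:start + n])
        remaining -= n
    return buf.getvalue()


phone_pool = make_pool(string.digits, size=500)
print(len(random_from_pool(phone_pool, 2345)))
```

The output is only as random as the pool, which is the trade-off the post opens with: long runs repeat material from the same 30,000 characters.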

        Roger



        • Josiah Carlson

          #19
          Re: Generating a large random string

          > This works fine for strings up to 10 MB; after that you might want to
          > consider stashing your data to disk and reading/writing in chunks.

          Or he could use mmap. It handles all of that for you, is mutable, and
          can be used as a replacement for strings in most places.
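
A small sketch of the mmap suggestion on Python 3, assuming an anonymous mapping (file descriptor -1) rather than a mapping of a real file. random_letters_mmap and its parameters are hypothetical, and length must be greater than zero for the mapping to be created.

```python
import mmap
import random
import string

ALPHABET = string.ascii_letters.encode("ascii")


def random_letters_mmap(length, seed=None, chunk=65536):
    """Return an anonymous mmap of `length` bytes filled with random letters."""
    rng = random.Random(seed)
    buf = mmap.mmap(-1, length)      # anonymous memory, fixed size
    for start in range(0, length, chunk):
        stop = min(start + chunk, length)
        # Slice assignment must supply exactly stop - start bytes.
        buf[start:stop] = bytes(rng.choices(ALPHABET, k=stop - start))
    return buf


m = random_letters_mmap(100000, seed=1)
print(len(m))          # 100000
print(m[:10])          # slicing gives a bytes object
print(m.find(b"ab"))   # string-like search; -1 if the pair never occurs
```

Mapping a real file instead would let the operating system page the data to disk, which is the "handles all of that for you" part; that variant is not shown here.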

          - Josiah


          • Chris

            #20
            Re: Generating a large random string

            I suppose this would be cheating?

            >>> import random, sys
            >>> class ReallyRandomString:
            ...     "Hey Rocky, watch me pull a random string out of my hat!"
            ...     def __init__(self, item):
            ...         self.__item = item
            ...     def __getitem__(self, key):
            ...         if type(key) is int:
            ...             return random.choice(self.__item)
            ...         elif type(key) is slice:
            ...             return ''.join([random.choice(self.__item)
            ...                             for i in xrange(*key.indices(sys.maxint))])
            ...         else:
            ...             raise TypeError('Time to get a new hat!')
            ...
            >>> ReallyRandomString('spam')[1000:1060]
            'mapsmpspapaaamppmapammaasapmaspmspmpppmpmamsmpsaspamaamssspm'

            Chris

            "Roger Binns" <rogerb@rogerbinns.com> wrote in message
            news:v99gg1-g7g.ln1@home.rogerbinns.com...
            > > > How to generate (memory and time)-efficient a string containing
            > > > random characters?
            >
            > It depends how random you need it to be.
            >
            > The approach I take in my test harness (which generates a CSV file
            > with random contents) is to create a 30,000 character string the
            > old-fashioned way:
            >
            > "".join([random.choice(item) for i in range(30000)])
            >
            > item is a string of the characters to choose from (some fields
            > are phone numbers, some are names, some are email addresses etc).
            >
            > To generate a random string I then take random slices of that
            > 30,000 character object. If I need longer strings, I glue
            > the random slices together (in a cStringIO).
            >
            > Roger



            • Chris Herborth

              #21
              Re: Generating a large random string

              Sean Ross wrote:

              > I'd be interested to see how
              >
              > s = open("/dev/urandom").read(3000)
              >
              > compares, and, if better, whether something similar can
              > be done on Windows.

              Works on my Windows boxes:

              chris@chrish [501]: uname -a
              CYGWIN_NT-5.1 chrish 1.5.7(0.109/3/2) 2004-01-30 19:32 i686 unknown unknown Cygwin
              chris@chrish [502]: python
              Python 2.3.3 (#1, Dec 30 2003, 08:29:25)
              [GCC 3.3.1 (cygming special)] on cygwin
              Type "help", "copyright", "credits" or "license" for more information.
              >>> s = open("/dev/urandom").read(3000)
              >>> len(s)
              3000
              >>> s[0]
              ']'
              >>> s[1]
              'W'

              ;-)
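
For readers without Cygwin: later Python versions ship os.urandom, which asks the operating system for random bytes on Windows as well as on Unix, so no device file is needed. A minimal Python 3 sketch, not part of the session above:

```python
import os

s = os.urandom(3000)      # bytes object from the OS random source
print(len(s))             # 3000
print(type(s[0]))         # <class 'int'>: indexing bytes yields integers on Python 3
```

As with /dev/urandom, the result is arbitrary bytes, so it still needs filtering or translating if only letters are wanted.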

              --
              Chris Herborth chrish@cryptocard.com
              Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/
              Never send a monster to do the work of an evil scientist.


              • Chris

                #22
                Re: Generating a large random string

                You're not running Windows; you've been infected with the Cygwin virus, a
                fiendish creation by anti-Win32 API hackers that irreparably damages the
                proprietary nature of the Windows platform and, worse yet, could
                potentially allow applications to be built without Visual Studio, using a
                GNU tool chain. Better format your drive immediately before it spreads.

                Chris

                "Chris Herborth" <chrish@cryptocard.com> wrote in message
                news:yssZb.11629$Cd6.823992@news20.bellglobal.com...
                > Sean Ross wrote:
                >
                > > I'd be interested to see how
                > >
                > > s = open("/dev/urandom").read(3000)
                > >
                > > compares, and, if better, whether something similar can
                > > be done on Windows.
                >
                > Works on my Windows boxes:
                >
                > chris@chrish [501]: uname -a
                > CYGWIN_NT-5.1 chrish 1.5.7(0.109/3/2) 2004-01-30 19:32 i686 unknown unknown Cygwin
                > chris@chrish [502]: python
                > Python 2.3.3 (#1, Dec 30 2003, 08:29:25)
                > [GCC 3.3.1 (cygming special)] on cygwin
                > Type "help", "copyright", "credits" or "license" for more information.
                > >>> s = open("/dev/urandom").read(3000)
                > >>> len(s)
                > 3000
                > >>> s[0]
                > ']'
                > >>> s[1]
                > 'W'
                >
                > ;-)
                >
                > --
                > Chris Herborth chrish@cryptocard.com
                > Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/
                > Never send a monster to do the work of an evil scientist.

