Generating a large random string

**Peter Otten** · Jul 18 '05, 08:37 AM

Re: Generating a large random string

Paul Rubin wrote:
> Oops, per other post, it gives strings of bytes and needs filtering.
> The following runs in about 1.2 seconds on my machine, but has a
> small (infinitesimal) chance of failure:
>
> import string,array,time
> t=time.time()
> ttab = string.letters*4 + '\0'*48
> a = array.array('B', open("/dev/urandom").read(1500000).translate(ttab))
> a = array.array('B', filter(abs,a)).tostring()[:1000000]
> print time.time()-t

from __future__ import division
import array, random, string, sys

identity = string.maketrans("", "")
ld = 256//len(string.letters)
rest = 256 % len(string.letters)
# byte values below 256-rest map onto letters; the top `rest` values are deleted
ttab = string.letters*ld + '\0'*rest
dtab = identity[-rest:]

# a fully functional variant of your approach
def randstrUnix(length, extra=1.25):
    a = open("/dev/urandom").read(int(length*extra)).translate(ttab, dtab)
    while len(a) < length:
        a += randstrUnix(length-len(a), 1.3)
    return a[:length]

twoletters = [c+d for c in string.letters for d in string.letters]

# the fastest pure-python version I was able to produce
def randstrPure(length):
    r = random.random
    n = len(twoletters)
    l2 = length//2
    lst = [None] * l2
    for i in xrange(l2):
        lst[i] = twoletters[int(r() * n)]
    if length & 1:
        lst.append(random.choice(string.letters))
    return "".join(lst)

The timings:

$ timeit.py -s"import randchoice as r" "r.randstrUnix(1000000)"
10 loops, best of 3: 2.29e+05 usec per loop
$ timeit.py -s"import randchoice as r" "r.randstrPure(1000000)"
10 loops, best of 3: 6.51e+05 usec per loop

A factor of 3 would hardly justify the OS-dependency in most cases.
Note that using twoletters[int(r() * n)] as seen in Sean Ross' version
instead of random.choice(twoletters) doubled the speed.
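For readers on Python 3, here is a rough port of both functions above. It is a sketch under a few assumptions: Python 3.6 or later, `os.urandom` in place of reading `/dev/urandom` directly (it also works on Windows), `string.ascii_letters` in place of the locale-dependent `string.letters`, and `bytes.translate(table, delete)` doing the same translate-and-delete step as Peter's `ttab`/`dtab`. The names `randstr_urandom` and `randstr_pure` are hypothetical, not from the thread.

```python
import os
import random
import string

LETTERS = string.ascii_letters.encode("ascii")   # 52 letter bytes
REPEAT = 256 // len(LETTERS)                      # 4
FULL = REPEAT * len(LETTERS)                      # 208: byte values 0..207 are usable
# Byte values 0..207 map onto the letters, each letter exactly 4 times.
# Positions 208..255 are padding; those input bytes are deleted before mapping.
TABLE = LETTERS * REPEAT + b"\0" * (256 - FULL)
DELETE = bytes(range(FULL, 256))

def randstr_urandom(length, extra=1.25):
    """Letters from the OS random source; bytes >= 208 are discarded to avoid bias."""
    out = bytearray()
    while len(out) < length:
        need = length - len(out)
        # Over-read a little so that one pass usually suffices.
        chunk = os.urandom(int(need * extra) + 1)
        out += chunk.translate(TABLE, DELETE)
    return out[:length].decode("ascii")

TWOLETTERS = [c + d for c in string.ascii_letters for d in string.ascii_letters]

def randstr_pure(length, rng=random):
    """Two-letter table trick: one random() call per pair of letters."""
    r = rng.random
    n = len(TWOLETTERS)
    lst = [TWOLETTERS[int(r() * n)] for _ in range(length // 2)]
    if length & 1:
        lst.append(rng.choice(string.ascii_letters))
    return "".join(lst)
```

Passing a seeded `random.Random` instance as `rng` makes `randstr_pure` repeatable. On Python 3.6+, `random.choices(string.ascii_letters, k=length)` is a simpler pure-Python alternative; how it compares in speed with the two-letter trick would need measuring.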

Peter

**Chris** · Jul 18 '05, 08:37 AM

Re: Generating a large random string

If you're looking for a string that you can write to as a stream, like a
file, you might try StringIO:

>>> import random, string
>>> from cStringIO import StringIO
>>> s = StringIO()
>>> for i in xrange(1000000):
...     s.write(random.choice(string.letters))
...
>>> len(s.getvalue())
1000000

This works fine for strings up to 10 MB; beyond that, you might want to
consider stashing your data on disk and reading/writing it in chunks.
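A minimal Python 3 sketch of the chunked-to-disk idea: instead of one `write` per character, it generates a block of letters at a time and appends each block to a file, so only one chunk is in memory at once. The function name `write_random_letters`, the 64 KiB chunk size and the use of a seeded `random.Random` are assumptions for illustration.

```python
import random
import string

def write_random_letters(path, length, chunk_size=64 * 1024, seed=None):
    """Write `length` random ASCII letters to `path`, one chunk at a time."""
    rng = random.Random(seed)   # seeded generator, so the file can be reproduced
    remaining = length
    with open(path, "w", encoding="ascii") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            # Build one chunk in a single join, then hand it to the file.
            f.write("".join(rng.choices(string.ascii_letters, k=n)))
            remaining -= n
```

Reading the file back in fixed-size blocks (`f.read(chunk_size)` in a loop) keeps the consumer side equally bounded in memory.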

Chris

"Andreas Lobinger" <andreas.lobinger@netsurf.de> wrote in message
news:4035DC5F.FBACD4F8@netsurf.de...
> Aloha,
>
> Andreas Lobinger wrote:
> > How to generate (memory and time)-efficient a string containing
> > random characters?
>
> > d = [random.choice(string.letters) for x in xrange(3000)]
> > s = "".join(d)
>
> 1) Sorry for starting a new thread, but there were calls for more specifics.
> 2) Thanks for all the replies so far.
>
> To be more specific about:
> - OS/availability of /dev/random
> I'm looking for an elegant solution that works on all of them. At the
> moment I'm developing in parallel on a slow (250 MHz) i86/Win95 notebook,
> a typical i86/Linux box and a couple of Suns (up to 16 GB main memory...).
>
> - using random / crypto-safe strings
> The use of random is intentional, because it generates a known sequence.
> I can store the seed and regenerate the sequence at any time, and AFAIK
> the sequence is independent of OS/machine type.
>
> As I wrote in the original post, the random string is only a prerequisite
> for another question.
>
> - What to do with the string
> I use the string as a test pattern for a tokenizer/parser, so
> string.letters is only used as an example here.
>
> The main question was:
> How do I build up a string without constantly reallocating it?
>
> Wishing a happy day
> LOBI
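Andreas's requirements (a sequence reproducible from a stored seed, an alphabet that is only an example, and no repeated reallocation) can be met in Python 3.6+ by drawing all characters from a private `random.Random(seed)` in one call and joining once. This is a sketch; the function name `reproducible_random_string` is hypothetical. One caveat: the `random` module documents that `random()` itself stays reproducible for a given seed across Python versions, but higher-level methods such as `choices` are not guaranteed to, so pin the Python version if the exact string matters.

```python
import random
import string

def reproducible_random_string(length, seed, alphabet=string.ascii_letters):
    """Draw `length` characters in one call and join once (no repeated +=)."""
    rng = random.Random(seed)   # independent generator; the seed fixes the sequence
    return "".join(rng.choices(alphabet, k=length))
```

For tokenizer test patterns, `alphabet` can be any string of token characters, for example `"()+-*/ 0123456789"`.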

**Roger Binns** · Jul 18 '05, 08:38 AM

Re: Generating a large random string

> > How to generate (memory and time)-efficient a string containing
> > random characters?

It depends how random you need it to be.

The approach I take in my test harness (which generates a CSV file
with random contents) is to create a 30,000-character string the
old-fashioned way:

"".join([random.choice(item) for i in range(30000)])

item is a string of the characters to choose from (some fields
are phone numbers, some are names, some are email addresses, etc.).

To generate a random string I then take random slices of that
30,000-character object. If I need longer strings, I glue
the random slices together (in a cStringIO).
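Roger's pool-and-slices approach might look like this in Python 3, with `io.StringIO` standing in for `cStringIO`. The helper names `make_pool` and `random_from_pool`, and the cap of 1000 characters per slice, are assumptions; the original does not say how slice lengths are chosen. The trade-off is the one Roger hints at: the output reuses substrings of the pool, so it is cheap but less random than drawing every character.

```python
import io
import random
import string

def make_pool(alphabet, size=30000, rng=random):
    """Build the character pool once, 'the old-fashioned way'."""
    return "".join([rng.choice(alphabet) for _ in range(size)])

def random_from_pool(pool, length, max_slice=1000, rng=random):
    """Glue random slices of `pool` together until `length` characters are collected."""
    buf = io.StringIO()
    remaining = length
    while remaining > 0:
        # Random slice length, capped by what is still needed and by the pool size.
        n = rng.randint(1, min(remaining, max_slice, len(pool)))
        # Random start such that the whole slice lies inside the pool.
        start = rng.randrange(len(pool) - n + 1)
        buf.write(pool[start:start + n])
        remaining -= n
    return buf.getvalue()
```

For example, `random_from_pool(make_pool(string.digits), 100000)` gives a 100,000-character string of digits for phone-number-like fields.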

Roger

**Josiah Carlson** · Jul 18 '05, 08:38 AM

Re: Generating a large random string

> This works fine for strings up to 10 MB, after that you might want to
> consider stashing your data to disk and reading/writing in chunks.

Or he could use mmap. It handles all of that for you, is mutable, and
can be used as a replacement for strings in most places.
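A minimal Python 3 sketch of the mmap suggestion: an anonymous memory map (file descriptor `-1`) is allocated once at its final size and filled in place chunk by chunk, so nothing is reallocated. The function name `random_letters_mmap`, the chunk size and the seeded generator are assumptions. In Python 3 an mmap behaves like a mutable bytes buffer (slicing returns `bytes`), which is what makes it usable where strings are expected in many places.

```python
import mmap
import random
import string

def random_letters_mmap(length, chunk_size=64 * 1024, seed=None):
    """Fill an anonymous memory map with random ASCII letters, chunk by chunk."""
    if length <= 0:
        raise ValueError("length must be positive")
    rng = random.Random(seed)
    m = mmap.mmap(-1, length)            # anonymous map of the final size
    for start in range(0, length, chunk_size):
        n = min(chunk_size, length - start)
        # Slice assignment must supply exactly n bytes.
        m[start:start + n] = "".join(rng.choices(string.ascii_letters, k=n)).encode("ascii")
    return m
```

The caller owns the map and should call `m.close()` when done; `m[:20]` or `m.find(b"abc")` work much as they would on `bytes`.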

- Josiah

**Chris** · Jul 18 '05, 08:38 AM

Re: Generating a large random string

I suppose this would be cheating?
>>> import random, sys
>>> class ReallyRandomString:
...     "Hey Rocky, watch me pull a random string out of my hat!"
...     def __init__(self, item):
...         self.__item = item
...     def __getitem__(self, key):
...         if type(key) is int:
...             return random.choice(self.__item)
...         elif type(key) is slice:
...             return ''.join([random.choice(self.__item)
...                             for i in xrange(*key.indices(sys.maxint))])
...         else:
...             raise TypeError('Time to get a new hat!')
...
>>> ReallyRandomString('spam')[1000:1060]
'mapsmpspapaaamppmapammaasapmaspmspmpppmpmamsmpsaspamaamssspm'
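The same trick ported to Python 3, as a sketch: `isinstance` replaces the `type(...) is` checks, `sys.maxsize` stands in for the removed `sys.maxint`, and an optional `rng` parameter (an assumption, not in the original) allows a seeded generator. Nothing is stored, so the same index gives a different character on every access, and an open-ended slice such as `s[5:]` would try to produce `sys.maxsize` characters.

```python
import random
import sys

class ReallyRandomString:
    """Every index or slice is drawn fresh from `item`; nothing is stored."""

    def __init__(self, item, rng=random):
        self._item = item
        self._rng = rng

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._rng.choice(self._item)
        if isinstance(key, slice):
            # Treat the "string" as having length sys.maxsize, like sys.maxint before.
            start, stop, step = key.indices(sys.maxsize)
            return "".join(self._rng.choice(self._item)
                           for _ in range(start, stop, step))
        raise TypeError("Time to get a new hat!")
```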

Chris


**Chris Herborth** · Jul 18 '05, 08:38 AM

Re: Generating a large random string

Sean Ross wrote:
> I'd be interested to see how
>
> s = open("/dev/urandom").read(3000)
>
> compares, and, if better, whether something similar can
> be done on Windows.

Works on my Windows boxes:

chris@chrish [501]: uname -a
CYGWIN_NT-5.1 chrish 1.5.7(0.109/3/2) 2004-01-30 19:32 i686 unknown unknown
Cygwin
chris@chrish [502]: python
Python 2.3.3 (#1, Dec 30 2003, 08:29:25)
[GCC 3.3.1 (cygming special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = open("/dev/urandom").read(3000)
>>> len(s)
3000
>>> s[0]
']'
>>> s[1]
'W'

;-)

--
Chris Herborth chrish@cryptocard.com
Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.
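On native Windows (no Cygwin), the portable route is `os.urandom`, added in Python 2.4, which reads the operating system's random source on both Unix and Windows. As the session above shows, the result is arbitrary bytes such as `']'`, not just letters, so a test pattern still needs filtering. A minimal Python 3 sketch that simply keeps the bytes that happen to be ASCII letters (so the result is shorter than the input, roughly 52/256 of it on average):

```python
import os
import string

LETTER_BYTES = set(string.ascii_letters.encode("ascii"))

raw = os.urandom(3000)                                   # 3000 arbitrary bytes, 0..255
letters_only = bytes(b for b in raw if b in LETTER_BYTES)  # filtered, variable length
```

Because every byte value is equally likely, each of the 52 letters remains equally likely after filtering.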

**Chris** · Jul 18 '05, 08:38 AM

Re: Generating a large random string

You're not running Windows, you've been infected with the Cygwin virus, a
fiendish creation by anti-Win32 API hackers that irreparably damages the
proprietary nature of the Windows platform, and worse yet, could
potentially allow applications to be built without Visual Studio, using a
GNU tool chain. Better format your drive immediately before it spreads.

Chris

