SHA-based subclass for random module

**Raymond Hettinger** · Jul 18 '05, 09:30 AM

Re: SHA-based subclass for random module

[Paul Rubin][color=blue]
> This is intended to be less predicable/have fewer correlations than
> the default Mersenne Twister or Wichmann-Hill generators. Comments
> are appreciated.[/color]

Because SHA-1 is a digest function, additional input can be tacked on
without impacting invertibility. That gives you an opportunity to
incorporate the Mersenne Twister to take advantage of its provably
long period.

Here is a rough cut that extracts a 53 bit float on every pass and
leaves the remaining 107 bits of state hidden:

import struct
import sha
import random

def srandom(seed):
gen = sha.new(seed)
digest = gen.digest
update = gen.update
random.seed(see d)
mt = random.random
unpack = struct.unpack
tofloat = 1.0 / 2 ** 53
while 1:
state = digest()
randint = unpack('Q', state[:8])[0] >> 11 # grab 53 bits
yield randint * tofloat
update(state)
update(str(mt() )) # twister never hurts, but can help period

r = srandom('the quick brown fox jumped')
for i in xrange(10):
print r.next()

Raymond Hettinger

**Paul Rubin** · Jul 18 '05, 09:31 AM

Re: SHA-based subclass for random module

python@rcn.com (Raymond Hettinger) writes:[color=blue]
> Because SHA-1 is a digest function, additional input can be tacked on
> without impacting invertibility. That gives you an opportunity to
> incorporate the Mersenne Twister to take advantage of its provably
> long period.[/color]

I don't see any point to including Mersenne Twister output in the SHA
update. If you want to make sure of a long period, it's easier to
just mix in a simple counter. Note that the generator I posted should
have period about 2**160 since in order to repeat, two different SHA
states would have to repeat simultaneously.

**Raymond Hettinger** · Jul 18 '05, 09:33 AM

Re: SHA-based subclass for random module

[Paul Rubin][color=blue]
> This is intended to be less predicable/have fewer correlations than
> the default Mersenne Twister or Wichmann-Hill generators. Comments
> are appreciated.[/color]

I offer this as an alternative:

from random import Random
from struct import unpack
import md5

class MD5Random(Rando m):
def newrandom(self, tofloat = 1.0 / 2 ** 53):
plaintxt = str(Random.rand om(self))
ciphertxt = md5.new(plaintx t).digest()
randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
return randint * tofloat

Advantages over the original:

* Much faster

* Much shorter code

* More readable code

* Threadsafe

* Takes advantage of MT's proven equidistributio n, of its passing
known tests for randomness, and of its known period (in contrast,
SHA-1 was designed as a digest that makes it computationally
intractable to find a different message giving the same signature --
that does *not* imply the non-existence of short cycles for some
keys).

* It uses both MD5 and the Mersenne Twister as they were designed (in
contrast, my google searches show *no* research on using SHA-1 in OFB
and reapplying SHA-1 again to conceal the output).

Raymond Hettinger

**Raymond Hettinger** · Jul 18 '05, 09:33 AM

Re: SHA-based subclass for random module

Correction to the last posting.
The method name should have been random() instead of newrandom().

Raymond

**Paul Rubin** · Jul 18 '05, 09:33 AM

Re: SHA-based subclass for random module

python@rcn.com (Raymond Hettinger) writes:[color=blue]
> I offer this as an alternative:
>
> from random import Random
> from struct import unpack
> import md5[/color]

Why md5 instead of sha?
[color=blue]
> class MD5Random(Rando m):
> def newrandom(self, tofloat = 1.0 / 2 ** 53):
> plaintxt = str(Random.rand om(self))[/color]

What's Random.random(s elf) supposed to do?
[color=blue]
> ciphertxt = md5.new(plaintx t).digest()[/color]

I think you mean update.
[color=blue]
> randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
> return randint * tofloat
>
> Advantages over the original:
>
> * Much faster[/color]

Yes, other version wasn't optimized. Both call the hash function
approx once per output though.
[color=blue]
> * Much shorter code[/color]

But doesn't support all the operations specified in the Random API.
What's up with that?
[color=blue]
> * More readable code[/color]

Perhaps so. It took me a little while to figure out in the previous
version that the security came from revealing only part of the output
of each hash (107 secret bits for SHA). In the md5 version there's
only 75 secret bits, which is cutting things a little close. It may
be harder to do a concrete security proof for this construction too
(I'm not sure, I've never done one).
[color=blue]
> * Threadsafe[/color]

Always a plus.
[color=blue]
> * Takes advantage of MT's proven equidistributio n, of its passing
> known tests for randomness, and of its known period (in contrast,
> SHA-1 was designed as a digest that makes it computationally
> intractable to find a different message giving the same signature --
> that does *not* imply the non-existence of short cycles for some
> keys).[/color]

I'm missing something here--if you're just trying to avoid possible
cycles, why do you want to use MT instead of a sequential counter?
The counter is guaranteed to never cycle (unless you overflow the 512
bit compression function input, which won't happen in the life of the
universe).

Note that while there's probably some short cycles in the double-SHA
construction, the chance of finding one is infinitesimally small (it
means finding a 320 bit birthday collision). Hitting one by accident
is comparable to guessing an AES key by accident. Finding one by
brute force or by cryptanalysis is again comparable to breaking AES.
[color=blue]
> * It uses both MD5 and the Mersenne Twister as they were designed (in
> contrast, my google searches show *no* research on using SHA-1 in OFB
> and reapplying SHA-1 again to conceal the output).[/color]

Applying a collision resistant hash function to itself is a standard
construction for making a PRF and is the basis of HMAC. I believe
the relevant paper is

Bellare, M., Canetti, R., and Krawczyk, H., "Pseudorand om Functions
Revisited: The Cascade Construction". Proc. of the 37th IEEE Symp. on
Foundation of Computer Science, 1996, webbed (PostScript) at:

http://www.research.ibm.com/security/bck1.ps

but I haven't slogged all the way through it yet. It has a concrete
security proof and I've always found those things incredibly fussy and
pedantic compared to normal math papers. I haven't yet figured out
the reason why they're like that, or if they're really need to be.
I'd like to attempt to do one for this generator though, once we
settle on an algorithm.

See also the other references to RFC 2104, notably Krawczyk et al's
original HMAC paper from Crypto 96.

My guess is the reason you haven't found any publications specific to
SHA1 with this construction is that nobody has found any results
against SHA1. As far as anyone can tell, the output is
undistinguishab le from random.

**Jeff Epler** · Jul 18 '05, 09:34 AM

Re: SHA-based subclass for random module

Since the size of plaintext is only 2^53, can't I just calculate
all 2^53 md5 values in advance, and invert the output of MD5Random to
get MT outputs, then attack MT just like any LFSR?

A time-space tradeoff could be made between the number of precalculated
hashes and the number of outputs from the MD5Random PRNG needed, too.
A quick search on the internet indicates that about 2n bits are needed
to find the initial state of an n-bit LSFR[1], so something like 40000 bits
would be needed to find the internal state of MT, or 380 53-bit outputs.

So let's say you can precalculate 2^30 MD5 inverses, the chance of any
particular output being in the table is 2^-23. So you'd need something
like 2^31 outputs from MD5Random(). I'm sure the analysis is also
complicated by the fact that your 2n bits won't be consecutive outputs,
but will be widely separated, but it's likely to remain possible.

2^53 is a big number, but it's well less than the 2^128 range for
all MD5 signatures or the 2^64 signatures needed to find a collision.
2^30 storage and 2^31 calls to random is something many of us could at
work, if not at home.

IANAC,
Jeff
[1] http://www.ciphersbyritter.com/RES/COMBCORR.HTM
"(The Berlekamp-Massey algorithm will recover the unknown state of a
simple n-bit LFSR, and its feedback polynomial, with just 2n known
bits.)"

**Josiah Carlson** · Jul 18 '05, 09:34 AM

Re: SHA-based subclass for random module

> What's Random.random(s elf) supposed to do?

Generate a random number using the Mersenne Twister random number generator.
[color=blue][color=green]
>> ciphertxt = md5.new(plaintx t).digest()[/color]
>
> I think you mean update.[/color]

Perhaps yes, perhaps no. Certainly the digest of both are dependant on
two inputs (the current internal state of MD5 and the random number
generated by MT). However, unless you can store the series of updates
to MD5, then getstate() followed by setstate() would not be sufficient
to get the same series of random numbers. This is also a "possible
issue" shared by the double-sha algorithm.
[color=blue]
> But doesn't support all the operations specified in the Random API.
> What's up with that?[/color]

That is what the subclass was for:[color=blue]
> class MD5Random(Rando m):[/color]

- Josiah

**Paul Rubin** · Jul 18 '05, 09:34 AM

Re: SHA-based subclass for random module

Josiah Carlson <jcarlson@nospa m.uci.edu> writes:[color=blue][color=green]
> > What's Random.random(s elf) supposed to do?[/color]
>
> Generate a random number using the Mersenne Twister random number
> generator.[/color]

According to the Random doc, Random.random() doesn't take an argument,
so I'm asking what the "self" is for.
[color=blue][color=green][color=darkred]
> >> ciphertxt = md5.new(plaintx t).digest()[/color]
> > I think you mean update.[/color]
>
> Perhaps yes, perhaps no. Certainly the digest of both are dependant
> on two inputs (the current internal state of MD5 and the random number
> generated by MT). However, unless you can store the series of updates
> to MD5, then getstate() followed by setstate() would not be sufficient
> to get the same series of random numbers. This is also a "possible
> issue" shared by the double-sha algorithm.[/color]

Without the update, all that's happening is you're reading an MT output
and hashing it, so there's no secret state without some security assumption
on MT. And the only security assumption I'm willing to make about MT
is that it's completely insecure ;-).
[color=blue][color=green]
> > But doesn't support all the operations specified in the Random API.
> > What's up with that?[/color]
>
> That is what the subclass was for:[color=green]
> > class MD5Random(Rando m):[/color][/color]

But the methods in the superclass don't do anything to manage the state
of subclass instances.

**Josiah Carlson** · Jul 18 '05, 09:35 AM

Re: SHA-based subclass for random module

>>>What's Random.random(s elf) supposed to do?[color=blue][color=green]
>>
>>Generate a random number using the Mersenne Twister random number
>>generator.[/color]
>
> According to the Random doc, Random.random() doesn't take an argument,
> so I'm asking what the "self" is for.[/color]
[color=blue][color=green][color=darkred]
>>>But doesn't support all the operations specified in the Random API.
>>>What's up with that?[/color]
>>
>>That is what the subclass was for:[color=darkred]
>> > class MD5Random(Rando m):[/color][/color]
>
> But the methods in the superclass don't do anything to manage the state
> of subclass instances.[/color]

Perhaps you should spend some more time learning how Python classes work.

Random is a class.
Random.random is an unbound class method.
Random.random(s elf) calls the unbound class method 'Random.random' with
the first argument being the MD5Random instance.

You should satisfy yourself that the below is correct behavior for
Python classes.
[color=blue][color=green][color=darkred]
>>> class A:[/color][/color][/color]
.... def blah(self):
.... print self.__class__
....[color=blue][color=green][color=darkred]
>>> class B(A):[/color][/color][/color]
.... def blah(self):
.... print "I'm in B, calling A"
.... A.blah(self)
....[color=blue][color=green][color=darkred]
>>> a = A()
>>> a.blah()[/color][/color][/color]
__main__.A[color=blue][color=green][color=darkred]
>>> b = B()
>>> b.blah()[/color][/color][/color]
I'm in B, calling A
__main__.B

Can you figure out why the following is also correct?
[color=blue][color=green][color=darkred]
>>> class C:[/color][/color][/color]
.... def blah(self):
.... print "I'm in C, calling A"
.... A.blah(self)
....[color=blue][color=green][color=darkred]
>>> c = C()
>>> c.blah()[/color][/color][/color]
I'm in C, calling A
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 4, in blah
TypeError: unbound method blah() must be called with A instance as first
argument (got C instance instead)

Perhaps now you understand why he can use Random.random(s elf) in the
earlier version, and Random.getrandb its(self, 128) in the current version.

- Josiah

**Raymond Hettinger** · Jul 18 '05, 09:35 AM

Re: SHA-based subclass for random module

> Why md5 instead of sha?

You could go either way. Make the decision based on how many bits you
need (128 should be plenty) for generating a single 53-bit float and
then take into account speed considerations (MD5 vs SHA-1) and the
time to generate the bits going into the message digest (it takes
longer to generate 160 bits than 128).

[color=blue]
> What's Random.random(s elf) supposed to do?[/color]

It's called subclassing ;-)

In this case, it inherits all the other random functionality (being
sure to include my corrected method name from newrandom() to
random()).

[color=blue][color=green]
> > ciphertxt = md5.new(plaintx t).digest()[/color]
>
> I think you mean update.[/color]

Nope, I meant digest(), but it could be done with update as well.

[color=blue]
> But doesn't support all the operations specified in the Random API.
> What's up with that?[/color]

Sure it does. That is what the subclassing is for. Again be sure to
incorporate the two submitted revisions resulting in:

from random import Random
from struct import unpack
import md5

class MD5Random(Rando m):
def random(self, tofloat = 1.0 / 2 ** 53):
plaintxt = str(Random.getr andbits(self, 128))
ciphertxt = md5.new(plaintx t).digest()
randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
return randint * tofloat

[color=blue][color=green]
> > * Takes advantage of MT's proven equidistributio n, of its passing
> > known tests for randomness, and of its known period (in contrast,
> > SHA-1 was designed as a digest that makes it computationally
> > intractable to find a different message giving the same signature --
> > that does *not* imply the non-existence of short cycles for some
> > keys).[/color]
>
> I'm missing something here--if you're just trying to avoid possible
> cycles, why do you want to use MT instead of a sequential counter?[/color]

Either my version or your original requires something that generates a
sequence of randoms from a seed (mine does it with MT and yours with
s0=sha1(s0)). The advantages of MT are proven properties, and it can
more efficiently generate an arbitrary number of bits (thanks to
genrandbits()) without the messy pure python code for extracting a
byte at a time. It was also nice to not have to override jumpahead(),
getstate(), setstate(), etc.

For the base generator, MT or s0-sha1(s0), no cryptography is
necessary; you're getting that from the second application of sha1 or
md5.

If you use sha1 for the base generator, you gain cryptographic
strength to the left which is useless for your application (lottery
numbers are selected in a way that is strong to the left and right,
but no one cares about inferring last weeks numbers from this weeks,
you make money only if you can predict the following week). It is
doubly useless, because the second application of SHA1 gives you what
you need.

[color=blue][color=green]
> > * It uses both MD5 and the Mersenne Twister as they were designed (in
> > contrast, my google searches show *no* research on using SHA-1 in OFB
> > and reapplying SHA-1 again to conceal the output).[/color]
>
> Applying a collision resistant hash function to itself is a standard
> construction for making a PRF and is the basis of HMAC.[/color]

Re-reading your sentence carefully, does it say that SHA-1 is
collision resistant. The design goal for SHA-1, of necessity,
produces collisions for messages over 160 bits in length and AFAICT
there are no citations showing collision resistance at 160 bits.
[color=blue]
> I believe
> the relevant paper is
>
> Bellare, M., Canetti, R., and Krawczyk, H., "Pseudorand om Functions
> Revisited: The Cascade Construction". Proc. of the 37th IEEE Symp. on
> Foundation of Computer Science, 1996, webbed (PostScript) at:
>
> http://www.research.ibm.com/security/bck1.ps[/color]

Good paper, but read it carefully. Basically what it starts out
saying is that not all trapdoor functions make good PRFs, and then it
sets about how to construct functions that are both trapdoor and
collision resistant (with positive implications for the minimum
period).

[color=blue]
> My guess is the reason you haven't found any publications specific to
> SHA1 with this construction is that nobody has found any results
> against SHA1. As far as anyone can tell, the output is
> undistinguishab le from random.[/color]

I wish you would abandond this line of reasoning. In security
applications, proofs rule and the absence of published flaws means
nothing (consider the case of Engima where the designers relied on
"nobody has found any results
against" their cipher).

In this particular case, the finding a short cycle would not really be
a major flaw; the function still meets its design goal of producing a
message digest for a given message such that it is difficult to
produce another message with the same digest. A few known 160-bit
collisions would not be an issue except for those exact 160 bit
messages. IOW, even if some collisions are known, don't expect to
find a paper on it.

All of that is moot, MT is fast, subclasses nicely, is threadsafe, has
proven distribution properties, and has a known (large) period. Why
not digest it and produce your shielded random numbers? Or is is just
a matter of being glued to your originally published code (which would
be correct if collision resistance is known)?

Raymond Hettinger

**Anton Vredegoor** · Jul 18 '05, 09:36 AM

Re: SHA-based subclass for random module

python@rcn.com (Raymond Hettinger) wrote:
[color=blue]
>from random import Random
>from struct import unpack
>import md5
>
>class MD5Random(Rando m):
> def random(self, tofloat = 1.0 / 2 ** 53):
> plaintxt = str(Random.getr andbits(self, 128))
> ciphertxt = md5.new(plaintx t).digest()
> randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
> return randint * tofloat[/color]

I wonder why it is necessary to do the bit twiddling once one has
chosen to multiply by a fraction instead of directly putting the bits
inside a float. If 1.0 / 2**53 is just some arbitrary value (and why
not: A floats' smallest value is a lot smaller than that?) then we can
generate integers of other sizes and divide them by other floats, for
example:

from random import Random
import md5

class MD5Random(Rando m):

def random(self, tofloat = 1.0/2**128, prf = md5.new()):
prf.update(repr (Random.random( self)))
return int(prf.hexdige st(),16) * tofloat

def test():
random = MD5Random().ran dom
for i in range(10):
print random()

if __name__=='__ma in__':
test()

Anton

**Paul Rubin** · Jul 18 '05, 09:36 AM

Re: SHA-based subclass for random module

python@rcn.com (Raymond Hettinger) writes:[color=blue][color=green]
> > Why md5 instead of sha?[/color]
>
> You could go either way. Make the decision based on how many bits you
> need (128 should be plenty) for generating a single 53-bit float and
> then take into account speed considerations (MD5 vs SHA-1) and the
> time to generate the bits going into the message digest (it takes
> longer to generate 160 bits than 128).[/color]

I think it's best to stick with SHA, because of its longer hash,
because it's a US federal standard, and because MD5 is already known
to not meet its security goals (see Dobbertin's paper on finding
collisions in MD5's compression function). Python has already screwed
up choosing its PRNG twice out of speed considerations, first with
WHrandom, and when that wasn't random enough, then with MT, which
still isn't random enough. If we're selecting a PRNG for a third
time, let's stop cutting corners and take care of the problem once and
for all. Even the very non-optimized code that I posted runs at
reasonable speed.
[color=blue][color=green]
> > What's Random.random(s elf) supposed to do?[/color]
>
> It's called subclassing ;-)[/color]

OK, I misunderstood at first what was going on.
[color=blue][color=green]
> > But doesn't support all the operations specified in the Random API.
> > What's up with that?[/color]
>
> Sure it does. That is what the subclassing is for. Again be sure to
> incorporate the two submitted revisions resulting in:[/color]

Yes, I understand now. You're simply taking MT output and hashing it.
[color=blue][color=green]
> > I'm missing something here--if you're just trying to avoid possible
> > cycles, why do you want to use MT instead of a sequential counter?[/color]
>
> Either my version or your original requires something that generates a
> sequence of randoms from a seed (mine does it with MT and yours with
> s0=sha1(s0)). The advantages of MT are proven properties, and it can
> more efficiently generate an arbitrary number of bits (thanks to
> genrandbits()) without the messy pure python code for extracting a
> byte at a time. It was also nice to not have to override jumpahead(),
> getstate(), setstate(), etc.[/color]

But MT has no security properties at all. It's equivalent to just
generating the whole output stream by hashing a sequential counter
with a secret seed.
[color=blue]
> If you use sha1 for the base generator, you gain cryptographic
> strength to the left which is useless for your application (lottery
> numbers are selected in a way that is strong to the left and right,
> but no one cares about inferring last weeks numbers from this weeks,[/color]

No, I wanted cryptographic strength to both the left and right,
otherwise I would have just hashed a counter. I do care about
inferring last week's numbers from this week's. If you're playing
poker with someone, you not only don't want him to see your cards
unless he bets against you, you also don't want him to see the cards
you were holding in the last hand when he folded.

See Bruce Schneier's paper on Yarrow for desirable PRNG criteria:

Yarrow-160 - Schneier on Security

http://www.schneier.com/paper-yarrow.html

ABSTRACT: We describe the design of Yarrow, a family of cryptographic pseudo-random number generators (PRNG). We describe the concept of a PRNG as a separate cryptographic primitive, and the design principles used to develop Yarrow. We then discuss the ways that PRNGs can fail in practice, which motivates our discussion of the components of Yarrow and how they make Yarrow secure. Next, we define a specific instance of a PRNG in the Yarrow family that makes use of available technology today. We conclude with a brief listing of open questions and intended improvements in future releases...

[color=blue][color=green]
> > Applying a collision resistant hash function to itself is a standard
> > construction for making a PRF and is the basis of HMAC.[/color]
>
> Re-reading your sentence carefully, does it say that SHA-1 is
> collision resistant. The design goal for SHA-1, of necessity,
> produces collisions for messages over 160 bits in length and AFAICT
> there are no citations showing collision resistance at 160 bits.[/color]

Collision resistance means there's no practical way to FIND
collisions. Obviously it doesn't mean that no collision exists. It's
just like in cryptography, you don't want the adversary to be able to
find the secret key. That doesn't mean there is no key.
[color=blue][color=green]
> > My guess is the reason you haven't found any publications specific to
> > SHA1 with this construction is that nobody has found any results
> > against SHA1. As far as anyone can tell, the output is
> > undistinguishab le from random.[/color]
>
> I wish you would abandond this line of reasoning. In security
> applications, proofs rule and the absence of published flaws means
> nothing (consider the case of Engima where the designers relied on
> "nobody has found any results against" their cipher).[/color]

In that case, where's your proof that MT feeding SHA is secure? I
haven't even seen any assertion made that MT even preserves all the
entropy in the seed that you supply. The only thing I've heard about
MT is that it passes some statistical tests and is guaranteed to have
a long period because of its large internal state. For all I know, it
takes the seed the you give it and crunches it down to 32 bits before
initializing that state. Maybe this question can be answered with a
detailed analysis of MT, but it's simpler if we can avoid the need for
such analysis.
[color=blue]
> In this particular case, the finding a short cycle would not really be
> a major flaw; the function still meets its design goal of producing a
> message digest for a given message such that it is difficult to
> produce another message with the same digest. A few known 160-bit
> collisions would not be an issue except for those exact 160 bit
> messages. IOW, even if some collisions are known, don't expect to
> find a paper on it.[/color]

What do you mean by that? Finding a collision in SHA would be a major
cryptanalysis result, like the result against MD5. Even a
distinguisher (i.e. given N gigabytes of output, a way to tell, with
probability greater than 50%, whether it's SHA output or true random
output) would be a publishable result (there are now several such
results published against RC4 and so RC4 is now deprecated for new
applications).

Or do you mean that if a collision were found by a spook agency, then
we wouldn't hear about it? That will be true of any function we come
up with. At least in SHA's case, if the NSA finds some problem,
they'll probably advise us to stop using SHA (without telling us the
reason), like they did with SHA-0. They sure won't tell us that for
some home-cooked function.

Anyway, the current version of the double SHA generator (which
incorporates a sequential counter into the input of the first hash) is
guaranteed to have no short cycles, because of the sequential counter.

Note that the MT-MD5 generator has a distinguisher after about 2**64
calls, because the MD5 call can see birthday collisions on both its
input and its output rather than only on its output. So it will
repeat outputs slightly more often than expected.
[color=blue]
> All of that is moot, MT is fast, subclasses nicely, is threadsafe, has
> proven distribution properties, and has a known (large) period. Why
> not digest it and produce your shielded random numbers? Or is is just
> a matter of being glued to your originally published code (which would
> be correct if collision resistance is known)?[/color]

I have no attachment to that code, which was provided as a sample
implementation--a real version should be much better optimized, and
maybe even written in C. But I have to ask you the same question: are
you glued to MT for some reason? I'm in favor of staying independent
of MT, for these reasons:

1) I'd rather use primitives that were designed from the beginning as
security primitives; and also to use as few primitives as
possible, so that there's fewer parts to go wrong.

2) The double SHA construction is surely easier to analyze for
security, at least if we assume SHA meets its design criteria. MT
doesn't have any such criteria for us to assume.

3) The PRNG's goal is to generate a reproducable sequence, and we
should take that to mean that if someone wants to port an
application to some other system, they should be able to generate
the same sequence. It's one thing to depend on SHA, which is a
standard and available in lots of environments, but if MT is
needed then its implementation needs to be ported, and its code is
a mess. I don't know if even Jython supports MT, much less
whether any non-Python systems support it. It's best to keep the
generator simple and free-standing, that maybe can even be used as
a standard.

Basically making the PRNG depend on a hairy and obscure function like
MT for no reason other than a little implementation convenience
strikes me as "worse as better" in a rather pure form. If we decide
we don't care about "leftward" security, it's simplest to just feed
SHA with a sequential counter.

**Paul Rubin** · Jul 18 '05, 09:36 AM

Re: SHA-based subclass for random module

anton@vredegoor .doge.nl (Anton Vredegoor) writes:[color=blue]
> I wonder why it is necessary to do the bit twiddling once one has
> chosen to multiply by a fraction instead of directly putting the bits
> inside a float. If 1.0 / 2**53 is just some arbitrary value (and why
> not: A floats' smallest value is a lot smaller than that?) then we can
> generate integers of other sizes and divide them by other floats, for
> example:[/color]

I think life gets very complicated once you start trying to generate
denormalized floats. They aren't uniformly distributed on the unit
interval, so if you make random numbers by dividing uniformly
distributed integers by some constant, you'll get way too many
denorms. If your application is using so many random numbers that it
much chance of ever seeing one below 2**-53, maybe you need to use
extended precision or something. Note this whole thing gets into the
precise properties of machine floats, and Python in principle is not
supposed to be IEEE-dependent. (The magic number 2**-53 is an IEEE
artifact).

**Raymond Hettinger** · Jul 18 '05, 09:37 AM

Re: SHA-based subclass for random module

[Paul Rubin][color=blue]
> I think it's best to stick with SHA, because of its longer hash,
> because it's a US federal standard, and because MD5 is already known
> to not meet its security goals (see Dobbertin's paper on finding
> collisions in MD5's compression function).[/color]

Okay, that is reasonable.

[color=blue]
> Python has already screwed
> up choosing its PRNG twice out of speed considerations, first with
> WHrandom, and when that wasn't random enough, then with MT, which
> still isn't random enough.[/color]

The cause won't be advanced by becoming a zealot or by insulting
people volunteering time to review your patch. The three principal
maintainers of the module have been me, Tim, and Guido -- bashing all
three in one sentence is an unusual strategy ;-)

Also, the criticism of MT is baseless.

[color=blue]
> If we're selecting a PRNG for a third
> time, let's stop cutting corners and take care of the problem once and
> for all.[/color]

No corners were cut. MT is a state-of-the-art psuedo random number
generator. The problem is that you want encryption.

Haskell, Perl, Ruby, and SML also have not gone down your suggested
path. A post to their newsgroups telling them that they are screwing
up and cutting corners likely will not be well received. Of course,
that could just be part of the unusual strategy ;-)

[color=blue]
> But MT has no security properties at all. It's equivalent to just
> generating the whole output stream by hashing a sequential counter
> with a secret seed.[/color]

Right. All of the security comes from the hashing step.

[color=blue]
> No, I wanted cryptographic strength to both the left and right,
> otherwise I would have just hashed a counter. I do care about
> inferring last week's numbers from this week's. If you're playing
> poker with someone, you not only don't want him to see your cards
> unless he bets against you, you also don't want him to see the cards
> you were holding in the last hand when he folded.[/color]

That makes sense. Still, the hashing step protects you. Given a
series of 10,000 bits from the generator, do you have any means of
deducing either a) the state of MT, b) what came before, or c) what
comes after?

I'm keeping an open mind but will abandon this patch unless the tone
improves.

Raymond