SHA-based subclass for random module

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Paul Rubin

    SHA-based subclass for random module



    This is intended to be less predicable/have fewer correlations than
    the default Mersenne Twister or Wichmann-Hill generators. Comments
    are appreciated.
  • Raymond Hettinger

    #2
    Re: SHA-based subclass for random module

    [Paul Rubin][color=blue]
    > This is intended to be less predicable/have fewer correlations than
    > the default Mersenne Twister or Wichmann-Hill generators. Comments
    > are appreciated.[/color]

    Because SHA-1 is a digest function, additional input can be tacked on
    without impacting invertibility. That gives you an opportunity to
    incorporate the Mersenne Twister to take advantage of its provably
    long period.

    Here is a rough cut that extracts a 53 bit float on every pass and
    leaves the remaining 107 bits of state hidden:

    import struct
    import sha
    import random

    def srandom(seed):
    gen = sha.new(seed)
    digest = gen.digest
    update = gen.update
    random.seed(see d)
    mt = random.random
    unpack = struct.unpack
    tofloat = 1.0 / 2 ** 53
    while 1:
    state = digest()
    randint = unpack('Q', state[:8])[0] >> 11 # grab 53 bits
    yield randint * tofloat
    update(state)
    update(str(mt() )) # twister never hurts, but can help period

    r = srandom('the quick brown fox jumped')
    for i in xrange(10):
    print r.next()


    Raymond Hettinger

    Comment

    • Paul Rubin

      #3
      Re: SHA-based subclass for random module

      python@rcn.com (Raymond Hettinger) writes:[color=blue]
      > Because SHA-1 is a digest function, additional input can be tacked on
      > without impacting invertibility. That gives you an opportunity to
      > incorporate the Mersenne Twister to take advantage of its provably
      > long period.[/color]

      I don't see any point to including Mersenne Twister output in the SHA
      update. If you want to make sure of a long period, it's easier to
      just mix in a simple counter. Note that the generator I posted should
      have period about 2**160 since in order to repeat, two different SHA
      states would have to repeat simultaneously.

      Comment

      • Raymond Hettinger

        #4
        Re: SHA-based subclass for random module

        [Paul Rubin][color=blue]
        > This is intended to be less predicable/have fewer correlations than
        > the default Mersenne Twister or Wichmann-Hill generators. Comments
        > are appreciated.[/color]

        I offer this as an alternative:


        from random import Random
        from struct import unpack
        import md5

        class MD5Random(Rando m):
        def newrandom(self, tofloat = 1.0 / 2 ** 53):
        plaintxt = str(Random.rand om(self))
        ciphertxt = md5.new(plaintx t).digest()
        randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
        return randint * tofloat


        Advantages over the original:

        * Much faster

        * Much shorter code

        * More readable code

        * Threadsafe

        * Takes advantage of MT's proven equidistributio n, of its passing
        known tests for randomness, and of its known period (in contrast,
        SHA-1 was designed as a digest that makes it computationally
        intractable to find a different message giving the same signature --
        that does *not* imply the non-existence of short cycles for some
        keys).

        * It uses both MD5 and the Mersenne Twister as they were designed (in
        contrast, my google searches show *no* research on using SHA-1 in OFB
        and reapplying SHA-1 again to conceal the output).


        Raymond Hettinger

        Comment

        • Raymond Hettinger

          #5
          Re: SHA-based subclass for random module

          Correction to the last posting.
          The method name should have been random() instead of newrandom().


          Raymond

          Comment

          • Paul Rubin

            #6
            Re: SHA-based subclass for random module

            python@rcn.com (Raymond Hettinger) writes:[color=blue]
            > I offer this as an alternative:
            >
            > from random import Random
            > from struct import unpack
            > import md5[/color]

            Why md5 instead of sha?
            [color=blue]
            > class MD5Random(Rando m):
            > def newrandom(self, tofloat = 1.0 / 2 ** 53):
            > plaintxt = str(Random.rand om(self))[/color]

            What's Random.random(s elf) supposed to do?
            [color=blue]
            > ciphertxt = md5.new(plaintx t).digest()[/color]

            I think you mean update.
            [color=blue]
            > randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
            > return randint * tofloat
            >
            > Advantages over the original:
            >
            > * Much faster[/color]

            Yes, other version wasn't optimized. Both call the hash function
            approx once per output though.
            [color=blue]
            > * Much shorter code[/color]

            But doesn't support all the operations specified in the Random API.
            What's up with that?
            [color=blue]
            > * More readable code[/color]

            Perhaps so. It took me a little while to figure out in the previous
            version that the security came from revealing only part of the output
            of each hash (107 secret bits for SHA). In the md5 version there's
            only 75 secret bits, which is cutting things a little close. It may
            be harder to do a concrete security proof for this construction too
            (I'm not sure, I've never done one).
            [color=blue]
            > * Threadsafe[/color]

            Always a plus.
            [color=blue]
            > * Takes advantage of MT's proven equidistributio n, of its passing
            > known tests for randomness, and of its known period (in contrast,
            > SHA-1 was designed as a digest that makes it computationally
            > intractable to find a different message giving the same signature --
            > that does *not* imply the non-existence of short cycles for some
            > keys).[/color]

            I'm missing something here--if you're just trying to avoid possible
            cycles, why do you want to use MT instead of a sequential counter?
            The counter is guaranteed to never cycle (unless you overflow the 512
            bit compression function input, which won't happen in the life of the
            universe).

            Note that while there's probably some short cycles in the double-SHA
            construction, the chance of finding one is infinitesimally small (it
            means finding a 320 bit birthday collision). Hitting one by accident
            is comparable to guessing an AES key by accident. Finding one by
            brute force or by cryptanalysis is again comparable to breaking AES.
            [color=blue]
            > * It uses both MD5 and the Mersenne Twister as they were designed (in
            > contrast, my google searches show *no* research on using SHA-1 in OFB
            > and reapplying SHA-1 again to conceal the output).[/color]

            Applying a collision resistant hash function to itself is a standard
            construction for making a PRF and is the basis of HMAC. I believe
            the relevant paper is

            Bellare, M., Canetti, R., and Krawczyk, H., "Pseudorand om Functions
            Revisited: The Cascade Construction". Proc. of the 37th IEEE Symp. on
            Foundation of Computer Science, 1996, webbed (PostScript) at:



            but I haven't slogged all the way through it yet. It has a concrete
            security proof and I've always found those things incredibly fussy and
            pedantic compared to normal math papers. I haven't yet figured out
            the reason why they're like that, or if they're really need to be.
            I'd like to attempt to do one for this generator though, once we
            settle on an algorithm.

            See also the other references to RFC 2104, notably Krawczyk et al's
            original HMAC paper from Crypto 96.

            My guess is the reason you haven't found any publications specific to
            SHA1 with this construction is that nobody has found any results
            against SHA1. As far as anyone can tell, the output is
            undistinguishab le from random.

            Comment

            • Jeff Epler

              #7
              Re: SHA-based subclass for random module

              Since the size of plaintext is only 2^53, can't I just calculate
              all 2^53 md5 values in advance, and invert the output of MD5Random to
              get MT outputs, then attack MT just like any LFSR?

              A time-space tradeoff could be made between the number of precalculated
              hashes and the number of outputs from the MD5Random PRNG needed, too.
              A quick search on the internet indicates that about 2n bits are needed
              to find the initial state of an n-bit LSFR[1], so something like 40000 bits
              would be needed to find the internal state of MT, or 380 53-bit outputs.

              So let's say you can precalculate 2^30 MD5 inverses, the chance of any
              particular output being in the table is 2^-23. So you'd need something
              like 2^31 outputs from MD5Random(). I'm sure the analysis is also
              complicated by the fact that your 2n bits won't be consecutive outputs,
              but will be widely separated, but it's likely to remain possible.

              2^53 is a big number, but it's well less than the 2^128 range for
              all MD5 signatures or the 2^64 signatures needed to find a collision.
              2^30 storage and 2^31 calls to random is something many of us could at
              work, if not at home.

              IANAC,
              Jeff
              [1] http://www.ciphersbyritter.com/RES/COMBCORR.HTM
              "(The Berlekamp-Massey algorithm will recover the unknown state of a
              simple n-bit LFSR, and its feedback polynomial, with just 2n known
              bits.)"

              Comment

              • Josiah Carlson

                #8
                Re: SHA-based subclass for random module

                > What's Random.random(s elf) supposed to do?

                Generate a random number using the Mersenne Twister random number generator.
                [color=blue][color=green]
                >> ciphertxt = md5.new(plaintx t).digest()[/color]
                >
                > I think you mean update.[/color]

                Perhaps yes, perhaps no. Certainly the digest of both are dependant on
                two inputs (the current internal state of MD5 and the random number
                generated by MT). However, unless you can store the series of updates
                to MD5, then getstate() followed by setstate() would not be sufficient
                to get the same series of random numbers. This is also a "possible
                issue" shared by the double-sha algorithm.
                [color=blue]
                > But doesn't support all the operations specified in the Random API.
                > What's up with that?[/color]

                That is what the subclass was for:[color=blue]
                > class MD5Random(Rando m):[/color]

                - Josiah

                Comment

                • Paul Rubin

                  #9
                  Re: SHA-based subclass for random module

                  Josiah Carlson <jcarlson@nospa m.uci.edu> writes:[color=blue][color=green]
                  > > What's Random.random(s elf) supposed to do?[/color]
                  >
                  > Generate a random number using the Mersenne Twister random number
                  > generator.[/color]

                  According to the Random doc, Random.random() doesn't take an argument,
                  so I'm asking what the "self" is for.
                  [color=blue][color=green][color=darkred]
                  > >> ciphertxt = md5.new(plaintx t).digest()[/color]
                  > > I think you mean update.[/color]
                  >
                  > Perhaps yes, perhaps no. Certainly the digest of both are dependant
                  > on two inputs (the current internal state of MD5 and the random number
                  > generated by MT). However, unless you can store the series of updates
                  > to MD5, then getstate() followed by setstate() would not be sufficient
                  > to get the same series of random numbers. This is also a "possible
                  > issue" shared by the double-sha algorithm.[/color]

                  Without the update, all that's happening is you're reading an MT output
                  and hashing it, so there's no secret state without some security assumption
                  on MT. And the only security assumption I'm willing to make about MT
                  is that it's completely insecure ;-).
                  [color=blue][color=green]
                  > > But doesn't support all the operations specified in the Random API.
                  > > What's up with that?[/color]
                  >
                  > That is what the subclass was for:[color=green]
                  > > class MD5Random(Rando m):[/color][/color]

                  But the methods in the superclass don't do anything to manage the state
                  of subclass instances.

                  Comment

                  • Josiah Carlson

                    #10
                    Re: SHA-based subclass for random module

                    >>>What's Random.random(s elf) supposed to do?[color=blue][color=green]
                    >>
                    >>Generate a random number using the Mersenne Twister random number
                    >>generator.[/color]
                    >
                    > According to the Random doc, Random.random() doesn't take an argument,
                    > so I'm asking what the "self" is for.[/color]
                    [color=blue][color=green][color=darkred]
                    >>>But doesn't support all the operations specified in the Random API.
                    >>>What's up with that?[/color]
                    >>
                    >>That is what the subclass was for:[color=darkred]
                    >> > class MD5Random(Rando m):[/color][/color]
                    >
                    > But the methods in the superclass don't do anything to manage the state
                    > of subclass instances.[/color]

                    Perhaps you should spend some more time learning how Python classes work.

                    Random is a class.
                    Random.random is an unbound class method.
                    Random.random(s elf) calls the unbound class method 'Random.random' with
                    the first argument being the MD5Random instance.

                    You should satisfy yourself that the below is correct behavior for
                    Python classes.
                    [color=blue][color=green][color=darkred]
                    >>> class A:[/color][/color][/color]
                    .... def blah(self):
                    .... print self.__class__
                    ....[color=blue][color=green][color=darkred]
                    >>> class B(A):[/color][/color][/color]
                    .... def blah(self):
                    .... print "I'm in B, calling A"
                    .... A.blah(self)
                    ....[color=blue][color=green][color=darkred]
                    >>> a = A()
                    >>> a.blah()[/color][/color][/color]
                    __main__.A[color=blue][color=green][color=darkred]
                    >>> b = B()
                    >>> b.blah()[/color][/color][/color]
                    I'm in B, calling A
                    __main__.B

                    Can you figure out why the following is also correct?
                    [color=blue][color=green][color=darkred]
                    >>> class C:[/color][/color][/color]
                    .... def blah(self):
                    .... print "I'm in C, calling A"
                    .... A.blah(self)
                    ....[color=blue][color=green][color=darkred]
                    >>> c = C()
                    >>> c.blah()[/color][/color][/color]
                    I'm in C, calling A
                    Traceback (most recent call last):
                    File "<stdin>", line 1, in ?
                    File "<stdin>", line 4, in blah
                    TypeError: unbound method blah() must be called with A instance as first
                    argument (got C instance instead)


                    Perhaps now you understand why he can use Random.random(s elf) in the
                    earlier version, and Random.getrandb its(self, 128) in the current version.

                    - Josiah

                    Comment

                    • Raymond Hettinger

                      #11
                      Re: SHA-based subclass for random module

                      > Why md5 instead of sha?

                      You could go either way. Make the decision based on how many bits you
                      need (128 should be plenty) for generating a single 53-bit float and
                      then take into account speed considerations (MD5 vs SHA-1) and the
                      time to generate the bits going into the message digest (it takes
                      longer to generate 160 bits than 128).

                      [color=blue]
                      > What's Random.random(s elf) supposed to do?[/color]

                      It's called subclassing ;-)

                      In this case, it inherits all the other random functionality (being
                      sure to include my corrected method name from newrandom() to
                      random()).

                      [color=blue][color=green]
                      > > ciphertxt = md5.new(plaintx t).digest()[/color]
                      >
                      > I think you mean update.[/color]

                      Nope, I meant digest(), but it could be done with update as well.


                      [color=blue]
                      > But doesn't support all the operations specified in the Random API.
                      > What's up with that?[/color]

                      Sure it does. That is what the subclassing is for. Again be sure to
                      incorporate the two submitted revisions resulting in:


                      from random import Random
                      from struct import unpack
                      import md5

                      class MD5Random(Rando m):
                      def random(self, tofloat = 1.0 / 2 ** 53):
                      plaintxt = str(Random.getr andbits(self, 128))
                      ciphertxt = md5.new(plaintx t).digest()
                      randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
                      return randint * tofloat

                      [color=blue][color=green]
                      > > * Takes advantage of MT's proven equidistributio n, of its passing
                      > > known tests for randomness, and of its known period (in contrast,
                      > > SHA-1 was designed as a digest that makes it computationally
                      > > intractable to find a different message giving the same signature --
                      > > that does *not* imply the non-existence of short cycles for some
                      > > keys).[/color]
                      >
                      > I'm missing something here--if you're just trying to avoid possible
                      > cycles, why do you want to use MT instead of a sequential counter?[/color]

                      Either my version or your original requires something that generates a
                      sequence of randoms from a seed (mine does it with MT and yours with
                      s0=sha1(s0)). The advantages of MT are proven properties, and it can
                      more efficiently generate an arbitrary number of bits (thanks to
                      genrandbits()) without the messy pure python code for extracting a
                      byte at a time. It was also nice to not have to override jumpahead(),
                      getstate(), setstate(), etc.

                      For the base generator, MT or s0-sha1(s0), no cryptography is
                      necessary; you're getting that from the second application of sha1 or
                      md5.

                      If you use sha1 for the base generator, you gain cryptographic
                      strength to the left which is useless for your application (lottery
                      numbers are selected in a way that is strong to the left and right,
                      but no one cares about inferring last weeks numbers from this weeks,
                      you make money only if you can predict the following week). It is
                      doubly useless, because the second application of SHA1 gives you what
                      you need.

                      [color=blue][color=green]
                      > > * It uses both MD5 and the Mersenne Twister as they were designed (in
                      > > contrast, my google searches show *no* research on using SHA-1 in OFB
                      > > and reapplying SHA-1 again to conceal the output).[/color]
                      >
                      > Applying a collision resistant hash function to itself is a standard
                      > construction for making a PRF and is the basis of HMAC.[/color]

                      Re-reading your sentence carefully, does it say that SHA-1 is
                      collision resistant. The design goal for SHA-1, of necessity,
                      produces collisions for messages over 160 bits in length and AFAICT
                      there are no citations showing collision resistance at 160 bits.
                      [color=blue]
                      > I believe
                      > the relevant paper is
                      >
                      > Bellare, M., Canetti, R., and Krawczyk, H., "Pseudorand om Functions
                      > Revisited: The Cascade Construction". Proc. of the 37th IEEE Symp. on
                      > Foundation of Computer Science, 1996, webbed (PostScript) at:
                      >
                      > http://www.research.ibm.com/security/bck1.ps[/color]

                      Good paper, but read it carefully. Basically what it starts out
                      saying is that not all trapdoor functions make good PRFs, and then it
                      sets about how to construct functions that are both trapdoor and
                      collision resistant (with positive implications for the minimum
                      period).

                      [color=blue]
                      > My guess is the reason you haven't found any publications specific to
                      > SHA1 with this construction is that nobody has found any results
                      > against SHA1. As far as anyone can tell, the output is
                      > undistinguishab le from random.[/color]

                      I wish you would abandond this line of reasoning. In security
                      applications, proofs rule and the absence of published flaws means
                      nothing (consider the case of Engima where the designers relied on
                      "nobody has found any results
                      against" their cipher).

                      In this particular case, the finding a short cycle would not really be
                      a major flaw; the function still meets its design goal of producing a
                      message digest for a given message such that it is difficult to
                      produce another message with the same digest. A few known 160-bit
                      collisions would not be an issue except for those exact 160 bit
                      messages. IOW, even if some collisions are known, don't expect to
                      find a paper on it.

                      All of that is moot, MT is fast, subclasses nicely, is threadsafe, has
                      proven distribution properties, and has a known (large) period. Why
                      not digest it and produce your shielded random numbers? Or is is just
                      a matter of being glued to your originally published code (which would
                      be correct if collision resistance is known)?


                      Raymond Hettinger

                      Comment

                      • Anton Vredegoor

                        #12
                        Re: SHA-based subclass for random module

                        python@rcn.com (Raymond Hettinger) wrote:
                        [color=blue]
                        >from random import Random
                        >from struct import unpack
                        >import md5
                        >
                        >class MD5Random(Rando m):
                        > def random(self, tofloat = 1.0 / 2 ** 53):
                        > plaintxt = str(Random.getr andbits(self, 128))
                        > ciphertxt = md5.new(plaintx t).digest()
                        > randint = unpack('<Q', ciphertxt[:8])[0] >> 11 # grab 53 bits
                        > return randint * tofloat[/color]


                        I wonder why it is necessary to do the bit twiddling once one has
                        chosen to multiply by a fraction instead of directly putting the bits
                        inside a float. If 1.0 / 2**53 is just some arbitrary value (and why
                        not: A floats' smallest value is a lot smaller than that?) then we can
                        generate integers of other sizes and divide them by other floats, for
                        example:

                        from random import Random
                        import md5

                        class MD5Random(Rando m):

                        def random(self, tofloat = 1.0/2**128, prf = md5.new()):
                        prf.update(repr (Random.random( self)))
                        return int(prf.hexdige st(),16) * tofloat

                        def test():
                        random = MD5Random().ran dom
                        for i in range(10):
                        print random()

                        if __name__=='__ma in__':
                        test()

                        Anton

                        Comment

                        • Paul Rubin

                          #13
                          Re: SHA-based subclass for random module

                          python@rcn.com (Raymond Hettinger) writes:[color=blue][color=green]
                          > > Why md5 instead of sha?[/color]
                          >
                          > You could go either way. Make the decision based on how many bits you
                          > need (128 should be plenty) for generating a single 53-bit float and
                          > then take into account speed considerations (MD5 vs SHA-1) and the
                          > time to generate the bits going into the message digest (it takes
                          > longer to generate 160 bits than 128).[/color]

                          I think it's best to stick with SHA, because of its longer hash,
                          because it's a US federal standard, and because MD5 is already known
                          to not meet its security goals (see Dobbertin's paper on finding
                          collisions in MD5's compression function). Python has already screwed
                          up choosing its PRNG twice out of speed considerations, first with
                          WHrandom, and when that wasn't random enough, then with MT, which
                          still isn't random enough. If we're selecting a PRNG for a third
                          time, let's stop cutting corners and take care of the problem once and
                          for all. Even the very non-optimized code that I posted runs at
                          reasonable speed.
                          [color=blue][color=green]
                          > > What's Random.random(s elf) supposed to do?[/color]
                          >
                          > It's called subclassing ;-)[/color]

                          OK, I misunderstood at first what was going on.
                          [color=blue][color=green]
                          > > But doesn't support all the operations specified in the Random API.
                          > > What's up with that?[/color]
                          >
                          > Sure it does. That is what the subclassing is for. Again be sure to
                          > incorporate the two submitted revisions resulting in:[/color]

                          Yes, I understand now. You're simply taking MT output and hashing it.
                          [color=blue][color=green]
                          > > I'm missing something here--if you're just trying to avoid possible
                          > > cycles, why do you want to use MT instead of a sequential counter?[/color]
                          >
                          > Either my version or your original requires something that generates a
                          > sequence of randoms from a seed (mine does it with MT and yours with
                          > s0=sha1(s0)). The advantages of MT are proven properties, and it can
                          > more efficiently generate an arbitrary number of bits (thanks to
                          > genrandbits()) without the messy pure python code for extracting a
                          > byte at a time. It was also nice to not have to override jumpahead(),
                          > getstate(), setstate(), etc.[/color]

                          But MT has no security properties at all. It's equivalent to just
                          generating the whole output stream by hashing a sequential counter
                          with a secret seed.
                          [color=blue]
                          > If you use sha1 for the base generator, you gain cryptographic
                          > strength to the left which is useless for your application (lottery
                          > numbers are selected in a way that is strong to the left and right,
                          > but no one cares about inferring last weeks numbers from this weeks,[/color]

                          No, I wanted cryptographic strength to both the left and right,
                          otherwise I would have just hashed a counter. I do care about
                          inferring last week's numbers from this week's. If you're playing
                          poker with someone, you not only don't want him to see your cards
                          unless he bets against you, you also don't want him to see the cards
                          you were holding in the last hand when he folded.

                          See Bruce Schneier's paper on Yarrow for desirable PRNG criteria:

                          ABSTRACT: We describe the design of Yarrow, a family of cryptographic pseudo-random number generators (PRNG). We describe the concept of a PRNG as a separate cryptographic primitive, and the design principles used to develop Yarrow. We then discuss the ways that PRNGs can fail in practice, which motivates our discussion of the components of Yarrow and how they make Yarrow secure. Next, we define a specific instance of a PRNG in the Yarrow family that makes use of available technology today. We conclude with a brief listing of open questions and intended improvements in future releases...

                          [color=blue][color=green]
                          > > Applying a collision resistant hash function to itself is a standard
                          > > construction for making a PRF and is the basis of HMAC.[/color]
                          >
                          > Re-reading your sentence carefully, does it say that SHA-1 is
                          > collision resistant. The design goal for SHA-1, of necessity,
                          > produces collisions for messages over 160 bits in length and AFAICT
                          > there are no citations showing collision resistance at 160 bits.[/color]

                          Collision resistance means there's no practical way to FIND
                          collisions. Obviously it doesn't mean that no collision exists. It's
                          just like in cryptography, you don't want the adversary to be able to
                          find the secret key. That doesn't mean there is no key.
                          [color=blue][color=green]
                          > > My guess is the reason you haven't found any publications specific to
                          > > SHA1 with this construction is that nobody has found any results
                          > > against SHA1. As far as anyone can tell, the output is
                          > > undistinguishab le from random.[/color]
                          >
                          > I wish you would abandond this line of reasoning. In security
                          > applications, proofs rule and the absence of published flaws means
                          > nothing (consider the case of Engima where the designers relied on
                          > "nobody has found any results against" their cipher).[/color]

                          In that case, where's your proof that MT feeding SHA is secure? I
                          haven't even seen any assertion made that MT even preserves all the
                          entropy in the seed that you supply. The only thing I've heard about
                          MT is that it passes some statistical tests and is guaranteed to have
                          a long period because of its large internal state. For all I know, it
                          takes the seed the you give it and crunches it down to 32 bits before
                          initializing that state. Maybe this question can be answered with a
                          detailed analysis of MT, but it's simpler if we can avoid the need for
                          such analysis.
                          [color=blue]
                          > In this particular case, the finding a short cycle would not really be
                          > a major flaw; the function still meets its design goal of producing a
                          > message digest for a given message such that it is difficult to
                          > produce another message with the same digest. A few known 160-bit
                          > collisions would not be an issue except for those exact 160 bit
                          > messages. IOW, even if some collisions are known, don't expect to
                          > find a paper on it.[/color]

                          What do you mean by that? Finding a collision in SHA would be a major
                          cryptanalysis result, like the result against MD5. Even a
                          distinguisher (i.e. given N gigabytes of output, a way to tell, with
                          probability greater than 50%, whether it's SHA output or true random
                          output) would be a publishable result (there are now several such
                          results published against RC4 and so RC4 is now deprecated for new
                          applications).

                          Or do you mean that if a collision were found by a spook agency, then
                          we wouldn't hear about it? That will be true of any function we come
                          up with. At least in SHA's case, if the NSA finds some problem,
                          they'll probably advise us to stop using SHA (without telling us the
                          reason), like they did with SHA-0. They sure won't tell us that for
                          some home-cooked function.

                          Anyway, the current version of the double SHA generator (which
                          incorporates a sequential counter into the input of the first hash) is
                          guaranteed to have no short cycles, because of the sequential counter.

                          Note that the MT-MD5 generator has a distinguisher after about 2**64
                          calls, because the MD5 call can see birthday collisions on both its
                          input and its output rather than only on its output. So it will
                          repeat outputs slightly more often than expected.
                          [color=blue]
                          > All of that is moot, MT is fast, subclasses nicely, is threadsafe, has
                          > proven distribution properties, and has a known (large) period. Why
                          > not digest it and produce your shielded random numbers? Or is is just
                          > a matter of being glued to your originally published code (which would
                          > be correct if collision resistance is known)?[/color]

                          I have no attachment to that code, which was provided as a sample
                          implementation--a real version should be much better optimized, and
                          maybe even written in C. But I have to ask you the same question: are
                          you glued to MT for some reason? I'm in favor of staying independent
                          of MT, for these reasons:

                          1) I'd rather use primitives that were designed from the beginning as
                          security primitives; and also to use as few primitives as
                          possible, so that there's fewer parts to go wrong.

                          2) The double SHA construction is surely easier to analyze for
                          security, at least if we assume SHA meets its design criteria. MT
                          doesn't have any such criteria for us to assume.

                          3) The PRNG's goal is to generate a reproducable sequence, and we
                          should take that to mean that if someone wants to port an
                          application to some other system, they should be able to generate
                          the same sequence. It's one thing to depend on SHA, which is a
                          standard and available in lots of environments, but if MT is
                          needed then its implementation needs to be ported, and its code is
                          a mess. I don't know if even Jython supports MT, much less
                          whether any non-Python systems support it. It's best to keep the
                          generator simple and free-standing, that maybe can even be used as
                          a standard.

                          Basically making the PRNG depend on a hairy and obscure function like
                          MT for no reason other than a little implementation convenience
                          strikes me as "worse as better" in a rather pure form. If we decide
                          we don't care about "leftward" security, it's simplest to just feed
                          SHA with a sequential counter.

                          Comment

                          • Paul Rubin

                            #14
                            Re: SHA-based subclass for random module

                            anton@vredegoor .doge.nl (Anton Vredegoor) writes:[color=blue]
                            > I wonder why it is necessary to do the bit twiddling once one has
                            > chosen to multiply by a fraction instead of directly putting the bits
                            > inside a float. If 1.0 / 2**53 is just some arbitrary value (and why
                            > not: A floats' smallest value is a lot smaller than that?) then we can
                            > generate integers of other sizes and divide them by other floats, for
                            > example:[/color]

                            I think life gets very complicated once you start trying to generate
                            denormalized floats. They aren't uniformly distributed on the unit
                            interval, so if you make random numbers by dividing uniformly
                            distributed integers by some constant, you'll get way too many
                            denorms. If your application is using so many random numbers that it
                            much chance of ever seeing one below 2**-53, maybe you need to use
                            extended precision or something. Note this whole thing gets into the
                            precise properties of machine floats, and Python in principle is not
                            supposed to be IEEE-dependent. (The magic number 2**-53 is an IEEE
                            artifact).

                            Comment

                            • Raymond Hettinger

                              #15
                              Re: SHA-based subclass for random module

                              [Paul Rubin][color=blue]
                              > I think it's best to stick with SHA, because of its longer hash,
                              > because it's a US federal standard, and because MD5 is already known
                              > to not meet its security goals (see Dobbertin's paper on finding
                              > collisions in MD5's compression function).[/color]

                              Okay, that is reasonable.


                              [color=blue]
                              > Python has already screwed
                              > up choosing its PRNG twice out of speed considerations, first with
                              > WHrandom, and when that wasn't random enough, then with MT, which
                              > still isn't random enough.[/color]

                              The cause won't be advanced by becoming a zealot or by insulting
                              people volunteering time to review your patch. The three principal
                              maintainers of the module have been me, Tim, and Guido -- bashing all
                              three in one sentence is an unusual strategy ;-)

                              Also, the criticism of MT is baseless.


                              [color=blue]
                              > If we're selecting a PRNG for a third
                              > time, let's stop cutting corners and take care of the problem once and
                              > for all.[/color]

                              No corners were cut. MT is a state-of-the-art psuedo random number
                              generator. The problem is that you want encryption.

                              Haskell, Perl, Ruby, and SML also have not gone down your suggested
                              path. A post to their newsgroups telling them that they are screwing
                              up and cutting corners likely will not be well received. Of course,
                              that could just be part of the unusual strategy ;-)


                              [color=blue]
                              > But MT has no security properties at all. It's equivalent to just
                              > generating the whole output stream by hashing a sequential counter
                              > with a secret seed.[/color]

                              Right. All of the security comes from the hashing step.


                              [color=blue]
                              > No, I wanted cryptographic strength to both the left and right,
                              > otherwise I would have just hashed a counter. I do care about
                              > inferring last week's numbers from this week's. If you're playing
                              > poker with someone, you not only don't want him to see your cards
                              > unless he bets against you, you also don't want him to see the cards
                              > you were holding in the last hand when he folded.[/color]

                              That makes sense. Still, the hashing step protects you. Given a
                              series of 10,000 bits from the generator, do you have any means of
                              deducing either a) the state of MT, b) what came before, or c) what
                              comes after?


                              I'm keeping an open mind but will abandon this patch unless the tone
                              improves.



                              Raymond

                              Comment

                              Working...