change of random state when pyc created??

  • Peter Otten

    #16
    Re: change of random state when pyc created??

    Alan Isaac wrote:
    > This may seem very strange, but it is true.
    > If I delete a .pyc file, my program executes with a different state!
    > Can someone explain this to me?

    There is nothing wrong with the random module -- you get the same numbers on
    every run. When there is no pyc-file Python uses some RAM to create it and
    therefore your GridPlayer instances are located in different memory
    locations and get different hash values. This in turn affects the order in
    which they occur when you iterate over the GridPlayer.players_played set.

    Here is a minimal example:

    import test  # sic: test.py imports itself, which makes Python write test.pyc

    class T:
        def __init__(self, name):
            self.name = name
        def __repr__(self):
            return "T(name=%r)" % self.name

    if __name__ == "__main__":
        print set(T(i) for i in range(4))

    $ python2.5 test.py
    set([T(name=2), T(name=1), T(name=0), T(name=3)])
    $ python2.5 test.py
    set([T(name=3), T(name=1), T(name=0), T(name=2)])
    $ python2.5 test.py
    set([T(name=3), T(name=1), T(name=0), T(name=2)])
    $ rm test.pyc
    $ python2.5 test.py
    set([T(name=2), T(name=1), T(name=0), T(name=3)])

    Peter
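
A minimal sketch of why the instances behave this way, assuming CPython 3 (the thread itself uses Python 2.5): a class that defines neither `__eq__` nor `__hash__` is compared and hashed by object identity, so two instances with the same name are still distinct set members. The class `T` mirrors the example above; `a` and `b` are illustrative names.

```python
class T:
    """Same shape as the class in the example above; no __eq__ or __hash__."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "T(name=%r)" % (self.name,)

a = T(0)
b = T(0)

# Default equality is identity: equal names do not make equal objects.
print(a == b)       # False
# Both objects are therefore kept as separate set members.
print(len({a, b}))  # 2
# The default hash is derived from the object's identity (its address in
# CPython), so the values printed here can change from run to run.
print(hash(a), hash(b))
```

Because the hash values depend on where the objects happen to live in memory, anything that shifts allocations (such as compiling a .pyc first) can change the iteration order of a set of such objects.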


    • Alan Isaac

      #17
      Re: change of random state when pyc created??


      "Peter Otten" <__peter__@web.de> wrote in message
      news:f1rt61$kfg$03$1@news.t-online.com...
      > Alan Isaac wrote:
      > There is nothing wrong with the random module -- you get the same numbers
      > on every run. When there is no pyc-file Python uses some RAM to create it and
      > therefore your GridPlayer instances are located in different memory
      > locations and get different hash values. This in turn affects the order in
      > which they occur when you iterate over the GridPlayer.players_played set.

      Thanks!!
      This also explains Steven's results.

      If I sort the set before iterating over it,
      the "anomaly" disappears.

      This means that currently the use of sets
      (and, I assume, dictionaries) as iterators
      compromises replicability. Is that a fair
      statement?

      For me (and apparently for a few others)
      this was a very subtle problem. Is there
      a warning anywhere in the docs? Should
      there be?

      Thanks again!!

      Alan Isaac
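
Sorting before iterating, as described above, can be sketched as follows. This is an illustrative Python 3 example, not the original GridPlayer code; `Player` and its `name` attribute are hypothetical stand-ins.

```python
class Player:
    """Hypothetical stand-in for a class that is hashed by identity."""
    def __init__(self, name):
        self.name = name

players = {Player(name) for name in ("carol", "alice", "bob")}

# Iterating over `players` directly visits the objects in an order that
# depends on their identity-based hashes. Sorting on an attribute that is
# the same on every run gives a reproducible order instead.
ordered = sorted(players, key=lambda p: p.name)
print([p.name for p in ordered])  # ['alice', 'bob', 'carol']
```

The sort key has to be something stable across runs (a name, an index, a creation counter); sorting by `id()` or by the default hash would reintroduce the same problem.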



      • Diez B. Roggisch

        #18
        Re: change of random state when pyc created??

        Alan Isaac wrote:
        > "Peter Otten" <__peter__@web.de> wrote in message
        > news:f1rt61$kfg$03$1@news.t-online.com...
        >> Alan Isaac wrote:
        >> There is nothing wrong with the random module -- you get the same numbers
        >> on every run. When there is no pyc-file Python uses some RAM to create it
        >> and therefore your GridPlayer instances are located in different memory
        >> locations and get different hash values. This in turn affects the order
        >> in which they occur when you iterate over the GridPlayer.players_played
        >> set.
        >
        > Thanks!!
        > This also explains Steven's results.
        >
        > If I sort the set before iterating over it,
        > the "anomaly" disappears.
        >
        > This means that currently the use of sets
        > (and, I assume, dictionaries) as iterators
        > compromises replicability. Is that a fair
        > statement?

        Yes.

        > For me (and apparently for a few others)
        > this was a very subtle problem. Is there
        > a warning anywhere in the docs? Should
        > there be?

        Not really, but that depends on what you know about the concept of sets and
        maps as collections of course.

        The contract for sets and dicts doesn't imply any order whatsoever. Which is
        essentially the reason why

        set(xrange(10))[0]

        doesn't exist, and quite a few times cries for an ordered dictionary as part
        of the standard libraries were made.

        Diez
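
The point about indexing can be tried directly. A small sketch in Python 3 (where `range` takes the place of the thread's `xrange`); `numbers` and `some_element` are illustrative names.

```python
numbers = set(range(10))

try:
    numbers[0]
except TypeError as exc:
    # Sets define no positions, so indexing is rejected outright.
    print("TypeError:", exc)

# To get "an" element, ask for one explicitly; which one comes back
# is unspecified.
some_element = next(iter(numbers))
print(some_element in numbers)  # True
```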


        • Alan G Isaac

          #19
          Re: change of random state when pyc created??

          Diez B. Roggisch wrote:
          > Not really, but that depends on what you know about the concept of sets and
          > maps as collections of course.
          >
          > The contract for sets and dicts doesn't imply any order whatsoever. Which is
          > essentially the reason why
          >
          > set(xrange(10))[0]
          >
          > doesn't exist, and quite a few times cries for an ordered dictionary as part
          > of the standard libraries were made.

          It seems to me that you are missing the point,
          but maybe I am missing your point.

          The question of whether a set or dict guarantees
          some order seems quite different from the question
          of whether rerunning an **unchanged program** yields the
          **unchanged results**. The latter question is the question
          of replicability.

          Again I point out that some sophisticated users
          (among which I am not numbering myself) did not
          see into the source of this "anomaly". This
          suggests that an explicit warning is warranted.

          Cheers,
          Alan Isaac

          PS I know ordered dicts are under discussion;
          what about ordered sets?
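
On the ordered-set question: in Python 3.7 and later, which postdates this thread, the language guarantees that dicts preserve insertion order, so a dict whose values are ignored is a common stand-in for an insertion-ordered set. A sketch under that assumption; `names` is an illustrative variable.

```python
names = ["delta", "alpha", "charlie", "alpha", "bravo", "delta"]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed by the language from Python 3.7 on).
ordered_unique = list(dict.fromkeys(names))
print(ordered_unique)  # ['delta', 'alpha', 'charlie', 'bravo']

# A plain set holds the same members but promises nothing about order.
print(set(names) == set(ordered_unique))  # True
```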


          • Robert Kern

            #20
            Re: change of random state when pyc created??

            Alan G Isaac wrote:
            > Diez B. Roggisch wrote:
            >> Not really, but that depends on what you know about the concept of sets and
            >> maps as collections of course.
            >>
            >> The contract for sets and dicts doesn't imply any order whatsoever. Which is
            >> essentially the reason why
            >>
            >> set(xrange(10))[0]
            >>
            >> doesn't exist, and quite a few times cries for an ordered dictionary as part
            >> of the standard libraries were made.
            >
            > It seems to me that you are missing the point,
            > but maybe I am missing your point.
            >
            > The question of whether a set or dict guarantees
            > some order seems quite different from the question
            > of whether rerunning an **unchanged program** yields the
            > **unchanged results**. The latter question is the question
            > of replicability.
            >
            > Again I point out that some sophisticated users
            > (among which I am not numbering myself) did not
            > see into the source of this "anomaly". This
            > suggests that an explicit warning is warranted.

            http://docs.python.org/lib/typesmapping.html
            """
            Keys and values are listed in an arbitrary order which is non-random, varies
            across Python implementations, and depends on the dictionary's history of
            insertions and deletions.
            """

            The sets documentation is a bit less explicit, though.


            """
            Like other collections, sets support x in set, len(set), and for x in set. Being
            an unordered collection, sets do not record element position or order of insertion.
            """

            --
            Robert Kern

            "I have come to believe that the whole world is an enigma, a harmless enigma
            that is made terrible by our own mad attempt to interpret it as though it had
            an underlying truth."
            -- Umberto Eco


            • Alan G Isaac

              #21
              Re: change of random state when pyc created??

              Robert Kern wrote:
              > http://docs.python.org/lib/typesmapping.html
              > """
              > Keys and values are listed in an arbitrary order which is non-random, varies
              > across Python implementations, and depends on the dictionary's history of
              > insertions and deletions.
              > """

              Even this does not tell me that if I use a specified implementation
              that my results can vary from run to run. That is, it still does
              not communicate that rerunning an *unchanged* program with an
              *unchanged* implementation can produce a change in results.

              Alan Isaac


              • Chris Mellon

                #22
                Re: change of random state when pyc created??

                On 5/9/07, Alan G Isaac <aisaac@american.edu> wrote:
                > Robert Kern wrote:
                >> http://docs.python.org/lib/typesmapping.html
                >> """
                >> Keys and values are listed in an arbitrary order which is non-random, varies
                >> across Python implementations, and depends on the dictionary's history of
                >> insertions and deletions.
                >> """
                >
                > Even this does not tell me that if I use a specified implementation
                > that my results can vary from run to run. That is, it still does
                > not communicate that rerunning an *unchanged* program with an
                > *unchanged* implementation can produce a change in results.

                Well, now you know. I'm not sure why you expect any given program to
                be idempotent unless you take specific measures to ensure that anyway.


                • Robert Kern

                  #23
                  Re: change of random state when pyc created??

                  Alan G Isaac wrote:
                  > Robert Kern wrote:
                  >> http://docs.python.org/lib/typesmapping.html
                  >> """
                  >> Keys and values are listed in an arbitrary order which is non-random, varies
                  >> across Python implementations, and depends on the dictionary's history of
                  >> insertions and deletions.
                  >> """
                  >
                  > Even this does not tell me that if I use a specified implementation
                  > that my results can vary from run to run. That is, it still does
                  > not communicate that rerunning an *unchanged* program with an
                  > *unchanged* implementation can produce a change in results.

                  The last clause does tell me that.

                  --
                  Robert Kern

                  "I have come to believe that the whole world is an enigma, a harmless enigma
                  that is made terrible by our own mad attempt to interpret it as though it had
                  an underlying truth."
                  -- Umberto Eco


                  • Carsten Haese

                    #24
                    Re: change of random state when pyc created??

                    On Wed, 2007-05-09 at 15:35 -0500, Alan G Isaac wrote:
                    > Robert Kern wrote:
                    >> http://docs.python.org/lib/typesmapping.html
                    >> """
                    >> Keys and values are listed in an arbitrary order which is non-random, varies
                    >> across Python implementations, and depends on the dictionary's history of
                    >> insertions and deletions.
                    >> """
                    >
                    > Even this does not tell me that if I use a specified implementation
                    > that my results can vary from run to run. That is, it still does
                    > not communicate that rerunning an *unchanged* program with an
                    > *unchanged* implementation can produce a change in results.

                    It doesn't say that rerunning the program won't produce a change in
                    results. It doesn't say that the order depends *only* on those factors
                    in a deterministic and reproducible manner.

                    The documentation shouldn't be expected to list every little thing that
                    might change the order of keys in a dictionary. The documentation does
                    say explicitly what *is* guaranteed: Order of keys is preserved as long
                    as no intervening modifications happen to the dictionary. Tearing down
                    the interpreter, starting it back up, and rebuilding the dictionary from
                    scratch is very definitely an intervening modification.

                    Regards,

                    --
                    Carsten Haese
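
The guarantee described above (order is stable as long as the dictionary is not modified in between) can be checked directly. A small Python 3 sketch; the dict `d` is an illustrative example.

```python
d = {"spam": 1, "eggs": 2, "ham": 3}

keys = list(d.keys())
values = list(d.values())

# With no modification in between, the i-th key matches the i-th value.
print(list(zip(keys, values)) == list(d.items()))  # True

# Iterating twice over the unmodified dict gives the same order both times.
print(list(d) == list(d))  # True
```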




                    • Alan Isaac

                      #25
                      Re: change of random state when pyc created??

                      >Robert Kern wrote:
                      >>http://docs.python.org/lib/typesmapping.html
                      >>"""
                      >>Keys and values are listed in an arbitrary order which is non-random, varies
                      >>across Python implementations, and depends on the dictionary's history of
                      >>insertions and deletions.
                      >>"""

                      Alan G Isaac wrote:
                      >Even this does not tell me that if I use a specified implementation
                      >that my results can vary from run to run. That is, it still does
                      >not communicate that rerunning an *unchanged* program with an
                      >*unchanged* implementation can produce a change in results.

                      "Robert Kern" <robert.kern@gmail.com> wrote in message
                      news:mailman.7488.1178744519.32031.python-list@python.org...
                      >The last clause does tell me that.

                      1. About your reading of the current language:
                      I believe you, of course, but can you tell me **how** it tells you that?
                      To be concrete, let us suppose parallel language were added to
                      the description of sets. What about that language should allow
                      me to anticipate Peter's example (in this thread)?

                      2. About possibly changing the docs:
                      You are much more sophisticated than ordinary users.
                      Did this thread not demonstrate that even sophisticated users
                      do not see into this "implication" immediately? Replicability
                      of results is a huge deal in some circles. I think the docs
                      for sets and dicts should include a red flag: do not use
                      these as iterators if you want replicable results.
                      (Side note to Carsten: this does not require listing "every little thing".)

                      Cheers,
                      Alan Isaac



                      • Robert Kern

                        #26
                        Re: change of random state when pyc created??

                        Alan Isaac wrote:
                        >> Robert Kern wrote:
                        >>> http://docs.python.org/lib/typesmapping.html
                        >>> """
                        >>> Keys and values are listed in an arbitrary order which is non-random, varies
                        >>> across Python implementations, and depends on the dictionary's history of
                        >>> insertions and deletions.
                        >>> """
                        >
                        > Alan G Isaac wrote:
                        >> Even this does not tell me that if I use a specified implementation
                        >> that my results can vary from run to run. That is, it still does
                        >> not communicate that rerunning an *unchanged* program with an
                        >> *unchanged* implementation can produce a change in results.
                        >
                        > "Robert Kern" <robert.kern@gmail.com> wrote in message
                        > news:mailman.7488.1178744519.32031.python-list@python.org...
                        >> The last clause does tell me that.
                        >
                        > 1. About your reading of the current language:
                        > I believe you, of course, but can you tell me **how** it tells you that?
                        > To be concrete, let us suppose parallel language were added to
                        > the description of sets. What about that language should allow
                        > me to anticipate Peter's example (in this thread)?

                        Actually, the root cause of Peter's specific example is the fact that the
                        default implementations of __hash__() and __eq__() rely on identity comparisons.
                        Two separate invocations of the same script give different objects by identity
                        and thus the "history of insertions and deletions" is different.

                        > 2. About possibly changing the docs:
                        > You are much more sophisticated than ordinary users.
                        > Did this thread not demonstrate that even sophisticated users
                        > do not see into this "implication" immediately?

                        Well, if you had a small test case that demonstrated the problem, we would have.
                        Your example was large, complicated, and involved other semi-deterministic red
                        herrings (the PRNG). It's quite easy to see the problem with Peter's example.

                        > Replicability
                        > of results is a huge deal in some circles. I think the docs
                        > for sets and dicts should include a red flag: do not use
                        > these as iterators if you want replicable results.
                        > (Side note to Carsten: this does not require listing "every little thing".)

                        They do. They say very explicitly that they are not ordered and that the
                        sequence of iteration should not be relied upon. The red flags are there.

                        But I'm not going to stop you from writing up something that's even more explicit.

                        --
                        Robert Kern

                        "I have come to believe that the whole world is an enigma, a harmless enigma
                        that is made terrible by our own mad attempt to interpret it as though it had
                        an underlying truth."
                        -- Umberto Eco
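
Since the root cause named above is identity-based `__hash__()` and `__eq__()`, one way to take memory addresses out of the picture is to define both in terms of the object's value. A minimal Python 3 sketch, assuming the `name` attribute is what identifies an instance (an assumption made for illustration, not something stated about GridPlayer):

```python
class T:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Compare by value instead of by identity.
        if not isinstance(other, T):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.name)

    def __repr__(self):
        return "T(name=%r)" % (self.name,)

print(T(1) == T(1))                      # True
print(len({T(1), T(1)}))                 # 1
print(T(2) in {T(i) for i in range(4)})  # True
```

With integer names the hashes are the same on every CPython run. String hashes, however, are randomized per process in Python 3 unless `PYTHONHASHSEED` is set, so set order is still not something to rely on; sorting remains the safe choice when order matters.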


                        • Carsten Haese

                          #27
                          Re: change of random state when pyc created??

                          On Thu, 2007-05-10 at 01:25 +0000, Alan Isaac wrote:
                          > Did this thread not demonstrate that even sophisticated users
                          > do not see into this "implication" immediately?

                          Knowing that maps don't have reproducible ordering is one thing.
                          Realizing that that's the cause of the problem that's arbitrarily and
                          wrongly attributed to the 'random' module, in a piece of code that's not
                          posted to the public, and presumably not trimmed down to the shortest
                          possible example of the problem, is quite another.

                          I'll venture the guess that most Python programmers with a modicum of
                          experience will, when asked point blank if it's safe to rely on a
                          dictionary to be iterated in a particular order, answer no.

                          > Replicability
                          > of results is a huge deal in some circles.

                          Every software engineer wants their results to be replicable. Software
                          engineers also know that they can only expect their results to be
                          replicable if they use deterministic functions. You wouldn't expect
                          time.time() to return the same result just because you're running the
                          same code, would you?

                          > I think the docs
                          > for sets and dicts should include a red flag: do not use
                          > these as iterators if you want replicable results.

                          It does, at least for dicts: "Keys and values are listed in an arbitrary
                          order." If this wording is not present for sets, something to this
                          effect should be added.

                          Regards,

                          --
                          Carsten Haese




                          • Steven D'Aprano

                            #28
                            Re: change of random state when pyc created??

                            On Wed, 09 May 2007 16:01:02 -0500, Robert Kern wrote:
                            > Alan G Isaac wrote:
                            >> Robert Kern wrote:
                            >>> http://docs.python.org/lib/typesmapping.html
                            >>> """
                            >>> Keys and values are listed in an arbitrary order which is non-random, varies
                            >>> across Python implementations, and depends on the dictionary's history of
                            >>> insertions and deletions.
                            >>> """
                            >>
                            >> Even this does not tell me that if I use a specified implementation
                            >> that my results can vary from run to run. That is, it still does
                            >> not communicate that rerunning an *unchanged* program with an
                            >> *unchanged* implementation can produce a change in results.
                            >
                            > The last clause does tell me that.

                            Actually it doesn't. If you run a program twice, with the same inputs,
                            and no other source of randomness (or at most have pseudo-randomness
                            starting with the same seed), then the dictionary will have the same
                            history of insertions and deletions from run to run.

                            Go back to Peter Otten's diagnosis of the issue:

                            "... your GridPlayer instances are located in different memory locations
                            and get different hash values. This in turn affects the order in which
                            they occur when you iterate over the GridPlayer.players_played set."

                            There is nothing in there about the dictionary having a different history
                            of insertions and deletions. It is having the same insertions and
                            deletions each run, but the items being inserted are located at different
                            memory locations, and _that_ changes their hash value and hence the order
                            they occur in when you iterate over the set.

                            That's quite a subtle thread to follow, and with all respect Robert, it's
                            easy to say it is obvious in hindsight, but I didn't notice you solving
                            the problem in the first place. Maybe you would have, if you had tried...
                            and maybe you would have scratched your head too. Who can tell?

                            As Carsten Haese says in another post:

                            "The documentation shouldn't be expected to list every little thing that
                            might change the order of keys in a dictionary. The documentation does say
                            explicitly what *is* guaranteed: Order of keys is preserved as long as no
                            intervening modifications happen to the dictionary. Tearing down the
                            interpreter, starting it back up, and rebuilding the dictionary from
                            scratch is very definitely an intervening modification."

                            That's all very true, but nevertheless it is a significant gotcha. It is
                            natural to expect two runs of any program to give the same result if there
                            are (1) no random numbers involved; (2) the same input data; (3) and no
                            permanent storage from run to run. One doesn't normally expect the output
                            of a well-written, bug-free program to depend on the memory location of
                            objects. And that's the gotcha -- with dicts and sets, they can.



                            --
                            Steven.
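
The distinction drawn above (same insertions, different hash values) can be imitated without relying on memory addresses. In this Python 3 sketch the hash is passed in explicitly to stand in for an address-derived hash; `Item`, `fake_hash` and `build` are hypothetical names invented for the illustration.

```python
class Item:
    """Hypothetical element whose hash is supplied by the caller."""
    def __init__(self, name, fake_hash):
        self.name = name
        self._fake_hash = fake_hash

    def __hash__(self):
        return self._fake_hash

    def __eq__(self, other):
        return isinstance(other, Item) and self.name == other.name

def build(hashes):
    """Insert 'a' then 'b' into a fresh set; return names in iteration order."""
    s = set()
    s.add(Item("a", hashes[0]))
    s.add(Item("b", hashes[1]))
    return [item.name for item in s]

order_1 = build((1, 2))
order_2 = build((2, 1))

# Same elements, same sequence of insertions, different hash values.
print(order_1, order_2)
```

On CPython the two lists typically come out in opposite orders, because small sets are laid out by hash value; that layout is an implementation detail, which is exactly why the iteration order should not be relied upon.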


                            • Steven D'Aprano

                              #29
                              Re: change of random state when pyc created??

                              On Wed, 09 May 2007 21:18:25 -0500, Robert Kern wrote:
                              > Actually, the root cause of Peter's specific example is the fact that the
                              > default implementations of __hash__() and __eq__() rely on identity comparisons.
                              > Two separate invocations of the same script give different objects by identity
                              > and thus the "history of insertions and deletions" is different.

                              The history is the same. The objects inserted are the same (by equality).
                              The memory address those objects are located at is different.

                              Would you expect that "hello world".find("w") should depend on the address
                              of the string "w"? No, of course not. Programming in a high level language
                              like Python, we hope to never need to think about memory addresses. And
                              that's the gotcha.



                              --
                              Steven.


                              • Alan Isaac

                                #30
                                Re: change of random state when pyc created??


                                "Robert Kern" <robert.kern@gmail.com> wrote in message
                                news:mailman.7497.1178763539.32031.python-list@python.org...
                                > Actually, the root cause of Peter's specific example is the fact that the
                                > default implementations of __hash__() and __eq__() rely on identity
                                > comparisons. Two separate invocations of the same script give different
                                > objects by identity and thus the "history of insertions and deletions"
                                > is different.

                                OK. Thank you.
                                Alan


