dictionary keys, __hash__, __cmp__

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Jan-Erik Meyer-Lütgens

    dictionary keys, __hash__, __cmp__

    In the Python Language Reference, I found the following
    statements about using objects as dictionary keys:

    1. "__hash__() should return a 32-bit integer."

    2. "The only required property is that objects which
    compare equal have the same hash value."

    3. "If a class does not define a __cmp__() method it
    should not define a __hash__() operation either."


    Can I asume that:

    -- it is guaranteed that id(obj) returns a unique 32-bit integer

    -- keys are interchangeable (equivalent),
    if the following is valid:

    hash(key1) == hash(key2) and key1 == key2

    -- I can ignore the 2nd statement, if I am aware of
    the fact that: if objects are equal it dosn't mean that
    they are the same key.

    -- I can savely ignore the 3rd statement, because python
    falls back to cmp(id(obj1), id(obj2)), if __cmp__()
    is not defined.

  • Miika Keskinen

    #2
    Re: dictionary keys, __hash__, __cmp__

    On Tue, 04 Nov 2003 21:52:42 +0100, Jan-Erik Meyer-Lütgens wrote:
    [color=blue]
    > In the Python Language Reference, I found the following statements about
    > using objects as dictionary keys:
    >
    > 1. "__hash__() should return a 32-bit integer."
    >
    > 2. "The only required property is that objects which
    > compare equal have the same hash value."
    >
    > 3. "If a class does not define a __cmp__() method it
    > should not define a __hash__() operation either."
    >
    >
    > Can I asume that:
    >
    > -- it is guaranteed that id(obj) returns a unique 32-bit integer[/color]

    Yes it is - id returns current address of object in memory.

    **SNIP**
    Help on built-in function id:

    id(...)
    id(object) -> integer

    Return the identity of an object. This is guaranteed to be unique
    among simultaneously existing objects. (Hint: it's the object's
    memory address.)

    **SNIP**
    [color=blue]
    > -- keys are interchangeable (equivalent),
    > if the following is valid:
    >
    > hash(key1) == hash(key2) and key1 == key2[/color]

    Yes. note that key1 == key2 implies hash(key1) == hash(key2), but if
    hash(key1) == hash(key2) there is no guarantee that key1 == key2. If you
    do define __hash__() build-in function hash would use it and thus you need
    to be sure about quality of your hash-method. However if you do not define
    __hash__ hash() will just fall back into id() that is guaranteed to be
    unique.

    [color=blue]
    > -- I can ignore the 2nd statement, if I am aware of
    > the fact that: if objects are equal it dosn't mean that they are
    > the same key.[/color]

    So you're introducing scenario where different objects are considered
    equal in means of __cmp__ while having different hash. I think that's not
    normal.

    Since if you do not define __cmp__ / __hash__ python will use id and thus
    second rule is valid.

    If you define cmp but not hash you will most likely get TypeError. If you
    define both you should follow second rule as if I'm right most of the
    internal data structures will depend on second rule.

    [color=blue]
    > -- I can savely ignore the 3rd statement, because python
    > falls back to cmp(id(obj1), id(obj2)), if __cmp__() is not defined.[/color]

    Yes. id(obj1) != id(obj2), so obj1 != obj2. Only requirement left is that
    __hash__() returns 32 bit integer. Personally i would emphasis word SHOULD
    NOT in third rule. I'm sure there is situations where it's perfectly
    normal to use id-value's and custom hashes. Anyways you can redefine
    __cmp__() simply to (and thus avoiding to break against third rule):

    def __cmp__(self, other):
    return id(self).__cmp_ _(id(other))



    --
    Miika

    Comment

    • Jan-Erik Meyer-Lütgens

      #3
      Re: dictionary keys, __hash__, __cmp__

      Miika Keskinen wrote:[color=blue]
      > On Tue, 04 Nov 2003 21:52:42 +0100, Jan-Erik Meyer-Lütgens wrote:
      >[color=green]
      >>In the Python Language Reference, I found the following statements about
      >>using objects as dictionary keys:
      >>
      >> 1. "__hash__() should return a 32-bit integer."
      >>
      >> 2. "The only required property is that objects which
      >> compare equal have the same hash value."
      >>
      >> 3. "If a class does not define a __cmp__() method it
      >> should not define a __hash__() operation either."
      >>
      >>
      >>Can I asume that:
      >>
      >> -- keys are interchangeable (equivalent),
      >> if the following is valid:
      >>
      >> hash(key1) == hash(key2) and key1 == key2[/color]
      >
      > Yes. note that key1 == key2 implies hash(key1) == hash(key2)[/color]

      .... only if I define __cmp__() and __hash__() appropriate
      (and not breaking the 2nd rule)

      [color=blue][color=green]
      >> -- I can ignore the 2nd statement, if I am aware of
      >> the fact that: if objects are equal it dosn't mean that they are
      >> the same key.[/color]
      >
      >
      > So you're introducing scenario where different objects are considered
      > equal in means of __cmp__ while having different hash. I think that's not
      > normal.
      >[/color]
      That's the point. I know that this in not normal. But is it possible?
      [color=blue][color=green]
      >> -- I can savely ignore the 3rd statement, because python
      >> falls back to cmp(id(obj1), id(obj2)), if __cmp__() is not defined.[/color]
      >
      >
      > Yes. id(obj1) != id(obj2), so obj1 != obj2. Only requirement left is that
      > __hash__() returns 32 bit integer. Personally i would emphasis word SHOULD
      > NOT in third rule. I'm sure there is situations where it's perfectly
      > normal to use id-value's and custom hashes. Anyways you can redefine
      > __cmp__() simply to (and thus avoiding to break against third rule):
      >
      > def __cmp__(self, other):
      > return id(self).__cmp_ _(id(other))
      >
      >[/color]

      But I want __cmp__() for another comparison. As an real world example
      I have a file object which should be stored in a dictionary. __cmp__()
      compare the contents of two files. Thus I must define the __hash__()
      method. I use id(obj) as the hash function.

      So I've breaking the rule: Objects which compare equal have the same hash.

      I use the file objects as dictionary keys and it seems to work.
      My assumption is that keys are interchangeable (equivalent), if
      hash(key1) == hash(key2) and key1 == key2. In my example keys are
      equivalent, when they are identical.
      id(file_object1 ) == id(file_object2 ) and file1 have the same contents
      as file2.

      Is this assumption valid? Or is there some sophistication in the
      python implemention, that break things sometimes? Resulting in
      deletion of important files :-)

      --
      Jan-Erik

      Comment

      • Michael Hudson

        #4
        Re: dictionary keys, __hash__, __cmp__

        Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:
        [color=blue]
        > In the Python Language Reference, I found the following
        > statements about using objects as dictionary keys:
        >
        > 1. "__hash__() should return a 32-bit integer."
        >
        > 2. "The only required property is that objects which
        > compare equal have the same hash value."
        >
        > 3. "If a class does not define a __cmp__() method it
        > should not define a __hash__() operation either."
        >
        >
        > Can I asume that:
        >
        > -- it is guaranteed that id(obj) returns a unique 32-bit integer[/color]

        No. For instance it doesn't on a 64-bit platform (and on a ridiculous
        64 bit platform *cough*Win64*co ugh* it might even return a long).
        [color=blue]
        > -- keys are interchangeable (equivalent),
        > if the following is valid:
        >
        > hash(key1) == hash(key2) and key1 == key2[/color]

        Not quite sure what you mean by equivalent, but if that is true
        a key inserted under key1 can be retrieved with key2.
        [color=blue]
        > -- I can ignore the 2nd statement, if I am aware of
        > the fact that: if objects are equal it dosn't mean that
        > they are the same key.[/color]

        Um, I don't understand you, but I think the answer is "no".

        If you ever have a situation where hash(a) != hash(b) but a == b then
        you are very much breaking the rules.
        [color=blue]
        > -- I can savely ignore the 3rd statement, because python
        > falls back to cmp(id(obj1), id(obj2)), if __cmp__()
        > is not defined.[/color]

        Don't understand.

        Cheers,
        mwh

        --
        I wouldn't trust the Anglo-Saxons for much anything else. Given
        they way English is spelled, who could trust them on _anything_ that
        had to do with writing things down, anyway?
        -- Erik Naggum, comp.lang.lisp

        Comment

        • Michael Hudson

          #5
          Re: dictionary keys, __hash__, __cmp__

          Michael Hudson <mwh@python.net > writes:
          [color=blue]
          > Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:[/color]
          [snippety][color=blue][color=green]
          > > -- I can ignore the 2nd statement, if I am aware of
          > > the fact that: if objects are equal it dosn't mean that
          > > they are the same key.[/color]
          >
          > Um, I don't understand you, but I think the answer is "no".
          >
          > If you ever have a situation where hash(a) != hash(b) but a == b then
          > you are very much breaking the rules.[/color]

          Having thought about it a bit more, I think the way you're doing
          things *is* actually safe given a platform where sizeof(int) ==
          sizeof(PyObject *) and the current implementation, but sheesh, I
          wouldn't want to rely on it.

          Cheers,
          mwh

          --
          In short, just business as usual in the wacky world of floating
          point <wink>. -- Tim Peters, comp.lang.pytho n

          Comment

          • Jan-Erik Meyer-Lütgens

            #6
            Re: dictionary keys, __hash__, __cmp__

            Michael Hudson wrote:[color=blue]
            > Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:[color=green]
            >>
            >> 3. "If a class does not define a __cmp__() method it
            >> should not define a __hash__() operation either."
            >>
            >>
            >>Can I asume that:
            >>
            >> -- I can savely ignore the 3rd statement, because python
            >> falls back to cmp(id(obj1), id(obj2)), if __cmp__()
            >> is not defined.[/color]
            >
            >
            > Don't understand.
            >[/color]

            Ok, let me ask the following question: What is the reason
            for that rule?


            --
            Jan-Erik

            Comment

            • Michael Hudson

              #7
              Re: dictionary keys, __hash__, __cmp__

              Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:
              [color=blue]
              > Michael Hudson wrote:[color=green]
              > > Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:[color=darkred]
              > >>
              > >> 3. "If a class does not define a __cmp__() method it
              > >> should not define a __hash__() operation either."
              > >>
              > >>
              > >>Can I asume that:
              > >> -- I can savely ignore the 3rd statement, because python
              > >> falls back to cmp(id(obj1), id(obj2)), if __cmp__()
              > >> is not defined.[/color]
              > > Don't understand.
              > >[/color]
              >
              > Ok, let me ask the following question: What is the reason
              > for that rule?[/color]

              Well, the idea is that dictionary lookup is done by equality. If you
              don't define __cmp__ then equality is in fact identity (well, that's
              very nearly true...) so the default implementation of __hash__ (based
              on the pointer address) is as good as it can get (you have hash
              equality iff you have object equality).

              I think.

              Cheers,
              mwh

              --
              Every now and then, Google doesn't throw up what I need so I start
              checking Altavista, Yahoo, etc. In almost every single case, I am
              brutally reminded why I use Google in the first place.
              -- John Riddoch, asr

              Comment

              • Jan-Erik Meyer-Lütgens

                #8
                Re: dictionary keys, __hash__, __cmp__

                Michael Hudson wrote:[color=blue]
                > Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:[color=green]
                >>Michael Hudson wrote:[color=darkred]
                >>>Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:
                >>>
                >>>> 3. "If a class does not define a __cmp__() method it
                >>>> should not define a __hash__() operation either."
                >>>>[/color]
                >>
                >>Ok, let me ask the following question: What is the reason
                >>for that rule?[/color]
                >
                >
                > Well, the idea is that dictionary lookup is done by equality. If you
                > don't define __cmp__ then equality is in fact identity (well, that's
                > very nearly true...) so the default implementation of __hash__ (based[/color]

                ....and the full truth is? :-)
                [color=blue]
                > on the pointer address) is as good as it can get (you have hash
                > equality iff you have object equality).
                >
                > I think.
                >[/color]

                So, this rule is a hint, only. It could break performance,
                not functionality, if I define my own hash function, right?


                To make things totally clear, I repeat my question:

                The only thing to be aware of when working with keys (besides
                of object immutability) seems the following:

                Keys are equivalent (in the sense of: a key inserted under
                key1 can be retrieved with key2), if this is valid:

                hash(key1) == hash(key2) and key1 == key2

                Is this guaranteed? In future implementations ?


                --
                Jan-Erik

                Comment

                • Michael Hudson

                  #9
                  Re: dictionary keys, __hash__, __cmp__

                  Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:
                  [color=blue]
                  > Michael Hudson wrote:[color=green]
                  > > Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:[color=darkred]
                  > >>Michael Hudson wrote:
                  > >>>Jan-Erik Meyer-Lütgens <python@meyer-luetgens.de> writes:
                  > >>>
                  > >>>> 3. "If a class does not define a __cmp__() method it
                  > >>>> should not define a __hash__() operation either."
                  > >>>>
                  > >>
                  > >>Ok, let me ask the following question: What is the reason
                  > >>for that rule?[/color]
                  > > Well, the idea is that dictionary lookup is done by equality. If you
                  > > don't define __cmp__ then equality is in fact identity (well, that's
                  > > very nearly true...) so the default implementation of __hash__ (based[/color]
                  >
                  > ...and the full truth is? :-)[/color]

                  Oh, something to do with __coerce__ and instances of old-style classes
                  (well, I didn't mention __eq__ either, but I assume you know about
                  that).
                  [color=blue][color=green]
                  > > on the pointer address) is as good as it can get (you have hash
                  > > equality iff you have object equality).
                  > > I think.
                  > >[/color]
                  >
                  > So, this rule is a hint, only. It could break performance,
                  > not functionality, if I define my own hash function, right?
                  >
                  >
                  > To make things totally clear, I repeat my question:
                  >
                  > The only thing to be aware of when working with keys (besides
                  > of object immutability) seems the following:
                  >
                  > Keys are equivalent (in the sense of: a key inserted under
                  > key1 can be retrieved with key2), if this is valid:
                  >
                  > hash(key1) == hash(key2) and key1 == key2
                  >
                  > Is this guaranteed?[/color]

                  Yes.
                  [color=blue]
                  > In future implementations ?[/color]

                  One can't predict the entire future of course -- hey, maybe dicts will
                  be splay trees or something one day -- but *I* would be happy
                  depending on it.

                  Cheers,
                  mwh

                  --
                  42. You can measure a programmer's perspective by noting his
                  attitude on the continuing vitality of FORTRAN.
                  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

                  Comment

                  Working...