about Equals and GetHashCode

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Tony Johansson

    about Equals and GetHashCode

    Hello!

    I can't figure out what point it is to use GetHashCode. I know that this
    GetHashCode is used for obtaining a unique integer
    value.
    Can somebody give me an example that prove the usefulness of this
    GetHashCode or it it of no use at all?

    A mean that if I get an integer from current time in some way what should I
    use it for?

    One more question should these Equals and GetHashCode be used in pair?

    //Tony



  • Marc Gravell

    #2
    Re: about Equals and GetHashCode

    No; GetHashCode() does not guarantee uniqueness - for starters, it is
    an Int32 - it would take a trivial example involving Int64 and i++ to
    show that they aren't unique.

    The point is that if a.GetHashCode() and b.GetHashCode() are
    *different* then a nd b definitely aren't the same. If the hashcodes
    are the same, then you need to check a and b. This provides an
    efficient way of narrowing down the amount of things we actually need
    to check, by first excluding everything that doesn't have the same
    hashcode,
    leaving a much smaller set.
    Further, advanced structures such as hash-tables can use the hash-code
    to create "buckets" - so, for example, based on the number of items,
    it might decide to split it into 20 "buckets" by taking the modulo of
    the hashcode - i.e. each item goes into the bucket with index
    GetHashCode() % bucketCount.

    Then, to find a given item, it can do:
    * get the haschode of the search key
    * calculate the modulo of that hashcode - we now know which bucket to
    look in [down to 1/20th of the data already]
    * look in that bucket for everything with the correct hashcode
    [probably down to 1 or 2 items now]
    * perform the actual Equals between the search key and the stored
    items, to find the one we want

    So: GetHashCode() is very rarely used without Equals() [although
    Equals() can be used without GetHashCode()]; it is definitely very
    important, and underpins all the fast lookups from Dictionary<,>, etc.
    Linear search (testing each item) would be very, very slow on a long
    list.

    Marc

    Comment

    • Marc Gravell

      #3
      Re: about Equals and GetHashCode

      (for info, Dictionary<,and HashTable actually use a more
      sophisticated algorithm - but that gives an overall picture of how a
      hashcode might be used in theory)

      Comment

      • Peter Morris

        #4
        Re: about Equals and GetHashCode

        If you override GetHashCode you should also override Equals, to ensure that
        when two objects are equal their hashcodes are the same as well. As for an
        example of its use, it is used in Hashtable and Dictionary<,>



        --
        Pete
        =============== =============== ===========
        I use Enterprise Core Objects (Domain driven design)

        =============== =============== ===========


        Comment

        • Tony Johansson

          #5
          Re: about Equals and GetHashCode

          Hello!

          When I override the Equals method I should also overide the GetHashCode
          according to others and all documentaion that I have read but why?
          I mean in the Equals method I implement what I mean to say that these two
          object are equals I can't see
          any point what the GetHashCode can add to my implementation of the Equals
          method?

          I mean if I for example have a Person with an integer of age in this class
          Person.
          I just check if the ages in these two object are equal if so then return
          true else false from the Equals method.
          But here if I also override the GetHashCode what can this method add to this
          checking if the two ages in my two
          objects are equal or not?


          //Tony

          "Marc Gravell" <marc.gravell@g mail.comskrev i meddelandet
          news:f7dcfafc-483b-4571-9bd0-1a3ed9ef2e9b@s5 0g2000hsb.googl egroups.com...
          (for info, Dictionary<,and HashTable actually use a more
          sophisticated algorithm - but that gives an overall picture of how a
          hashcode might be used in theory)


          Comment

          • Jon Skeet [C# MVP]

            #6
            Re: about Equals and GetHashCode

            On Jun 12, 4:23 pm, "Tony Johansson" <johansson.ande rs...@telia.com >
            wrote:
            When I override the Equals method I should also overide the GetHashCode
            according to others and all documentaion that I have read but why?
            I mean in the Equals method I implement what I mean to say that these two
            object are equals I can't see
            any point what the GetHashCode can add to my implementation of the Equals
            method?
            Two objects which are considered equal should return the same hash
            code. If they do, that type can be used in a hashtable (such as
            Dictionary). If they don't, a hash lookup will (potentially) fail for
            equal keys, which violates the point of a hashtable.
            I mean if I for example have a Person with an integer of age in this class
            Person.
            I just check if the ages in these two object are equal if so then return
            true else false from the Equals method.
            But here if I also override the GetHashCode what can this method add to this
            checking if the two ages in my two
            objects are equal or not?
            If Equals is just checking the person's age, then returning the age as
            the hash code is a reasonable start.

            Generally you build a hash up from the values which are compared for
            equality.

            Jon

            Comment

            • Tony Johansson

              #7
              Re: about Equals and GetHashCode

              Very good answer Jon
              short and consist exact what I wanted.

              Many thanks!!

              //Tony

              "Jon Skeet [C# MVP]" <skeet@pobox.co mskrev i meddelandet
              news:bc652b09-51d0-4d9e-ab2e-5fadca8a8a97@34 g2000hsf.google groups.com...
              On Jun 12, 4:23 pm, "Tony Johansson" <johansson.ande rs...@telia.com >
              wrote:
              >When I override the Equals method I should also overide the GetHashCode
              >according to others and all documentaion that I have read but why?
              >I mean in the Equals method I implement what I mean to say that these two
              >object are equals I can't see
              >any point what the GetHashCode can add to my implementation of the Equals
              >method?
              >
              Two objects which are considered equal should return the same hash
              code. If they do, that type can be used in a hashtable (such as
              Dictionary). If they don't, a hash lookup will (potentially) fail for
              equal keys, which violates the point of a hashtable.
              >
              >I mean if I for example have a Person with an integer of age in this
              >class
              >Person.
              >I just check if the ages in these two object are equal if so then return
              >true else false from the Equals method.
              >But here if I also override the GetHashCode what can this method add to
              >this
              >checking if the two ages in my two
              >objects are equal or not?
              >
              If Equals is just checking the person's age, then returning the age as
              the hash code is a reasonable start.
              >
              Generally you build a hash up from the values which are compared for
              equality.
              >
              Jon

              Comment

              • Tony Johansson

                #8
                Re: about Equals and GetHashCode

                Hello!

                One more thing how will this GetHashCode be called I assume that it's not
                called explicitly like someObject.GetH asCode();

                //Tony


                "Jon Skeet [C# MVP]" <skeet@pobox.co mskrev i meddelandet
                news:bc652b09-51d0-4d9e-ab2e-5fadca8a8a97@34 g2000hsf.google groups.com...
                On Jun 12, 4:23 pm, "Tony Johansson" <johansson.ande rs...@telia.com >
                wrote:
                >When I override the Equals method I should also overide the GetHashCode
                >according to others and all documentaion that I have read but why?
                >I mean in the Equals method I implement what I mean to say that these two
                >object are equals I can't see
                >any point what the GetHashCode can add to my implementation of the Equals
                >method?
                >
                Two objects which are considered equal should return the same hash
                code. If they do, that type can be used in a hashtable (such as
                Dictionary). If they don't, a hash lookup will (potentially) fail for
                equal keys, which violates the point of a hashtable.
                >
                >I mean if I for example have a Person with an integer of age in this
                >class
                >Person.
                >I just check if the ages in these two object are equal if so then return
                >true else false from the Equals method.
                >But here if I also override the GetHashCode what can this method add to
                >this
                >checking if the two ages in my two
                >objects are equal or not?
                >
                If Equals is just checking the person's age, then returning the age as
                the hash code is a reasonable start.
                >
                Generally you build a hash up from the values which are compared for
                equality.
                >
                Jon

                Comment

                • Jon Skeet [C# MVP]

                  #9
                  Re: about Equals and GetHashCode

                  On Jun 12, 5:35 pm, "Tony Johansson" <johansson.ande rs...@telia.com >
                  wrote:
                  One more thing how will this GetHashCode be called I assume that it's not
                  called explicitly like someObject.GetH asCode();
                  It certainly could be. It's just a normal method. It will usually be
                  called by either:
                  a) an implementation of a hashtable, such as Dictionary
                  b) another object which contains an instance of your type, is using
                  that as part of equality comparison, and wants to use it to build its
                  own hashcode.

                  As an example of b) if your person had a name as well as an age, you
                  might do:

                  public override int GetHashCode()
                  {
                  int ret = 17;
                  ret = 37*ret + age;
                  ret = 37*ret + name.GetHashCod e();
                  return ret;
                  }

                  Jon

                  Comment

                  • Brian Gideon

                    #10
                    Re: about Equals and GetHashCode

                    On Jun 12, 11:48 am, "Jon Skeet [C# MVP]" <sk...@pobox.co mwrote:
                    It certainly could be. It's just a normal method. It will usually be
                    called by either:
                    a) an implementation of a hashtable, such as Dictionary
                    b) another object which contains an instance of your type, is using
                    that as part of equality comparison, and wants to use it to build its
                    own hashcode.
                    >
                    As an example of b) if your person had a name as well as an age, you
                    might do:
                    >
                    public override int GetHashCode()
                    {
                        int ret = 17;
                        ret = 37*ret + age;
                        ret = 37*ret + name.GetHashCod e();
                        return ret;
                    >
                    }
                    >
                    Jon
                    Maybe it's just a typo, but did you mean to use age as-is or did you
                    mean to call GetHashCode on it too? The reason I'm asking is because
                    the way it is written would cause people with the same name to cluster
                    together. The effect would be more severe had age been merged into
                    the hash code after name.

                    Also, in the interest of playing the Monday morning quarterback, I've
                    wondered whether Microsoft would have been better off forcing all
                    dictionary keys to implement a special interface, say IDictionaryKey,
                    that inherits IEquatable and provides the GetHashCode method. That
                    way the requirement for implementing GetHashCode and Equals in tandem
                    could be enforced declaratively instead of having a special case the
                    compiler must check for. Opinion?

                    Comment

                    • Peter Duniho

                      #11
                      Re: about Equals and GetHashCode

                      On Thu, 12 Jun 2008 12:10:22 -0700, Brian Gideon <briangideon@ya hoo.com>
                      wrote:
                      >public override int GetHashCode()
                      >{
                      >    int ret = 17;
                      >    ret = 37*ret + age;
                      >    ret = 37*ret + name.GetHashCod e();
                      >    return ret;
                      >>
                      >}
                      >>
                      >Jon
                      >
                      Maybe it's just a typo, but did you mean to use age as-is or did you
                      mean to call GetHashCode on it too? The reason I'm asking is because
                      the way it is written would cause people with the same name to cluster
                      together. The effect would be more severe had age been merged into
                      the hash code after name.
                      Jon can answer for himself, of course. But that's never stopped me from
                      jumping in before. :)

                      I'm not sure what you mean about "cluster". The general rule is that a
                      type that's already effectively an int can just be used directly as its
                      own hash code. I'd be surprised if int.GetHashCode () does anything except
                      return the int value anyway, but regardless, there's no reason not to just
                      use the int directly as its own hash code.

                      I _do_ believe that pre-initializing the hashcode with 629 is pointless.
                      You could start at 0 just as easily, and the values would mathematically
                      be distributed the same, I believe. At least, the last time I brought
                      that up (in the Java newsgroup) no one provided any counter-proof, and the
                      only reponses were in agreement with my statement. It could be that just
                      means they are all as dumb as me, but I don't think there's an actual
                      mathematical difference.
                      Also, in the interest of playing the Monday morning quarterback, I've
                      wondered whether Microsoft would have been better off forcing all
                      dictionary keys to implement a special interface, say IDictionaryKey,
                      that inherits IEquatable and provides the GetHashCode method. That
                      way the requirement for implementing GetHashCode and Equals in tandem
                      could be enforced declaratively instead of having a special case the
                      compiler must check for. Opinion?
                      I'm also not sure what you mean here. Equals and GetHashCode are defined
                      in Object. All types have a default implementation (for reference types,
                      the default implementation is simply reference equality), so the compiler
                      doesn't need to check for them at all. The default implementation may or
                      may not be what's desired with respect to a dictionary key, but that's not
                      something the compiler can figure out anyway. There's no "special case
                      the compiler must check for".

                      So, no...I don't see the value in having an IDictionaryKey interface.

                      Pete

                      Comment

                      • Peter Morris

                        #12
                        Re: about Equals and GetHashCode

                        >Maybe it's just a typo, but did you mean to use age as-is or did you
                        >mean to call GetHashCode on it too?
                        GetHashCode returns an Int32. int == Int32. Therefore the value of an int
                        is already as unique as you can represent given 4 bytes. Therefore:

                        public struct Int32
                        {
                        ...
                        public override int GetHashCode()
                        {
                        return this;
                        }
                        ...
                        }




                        --
                        Pete
                        =============== =============== ===========
                        I use Enterprise Core Objects (Domain driven design)

                        =============== =============== ===========


                        Comment

                        • Brian Gideon

                          #13
                          Re: about Equals and GetHashCode

                          On Jun 12, 2:35 pm, "Peter Duniho" <NpOeStPe...@nn owslpianmk.com>
                          wrote:
                          I'm not sure what you mean about "cluster".  The general rule is that a  
                          type that's already effectively an int can just be used directly as its  
                          own hash code.  I'd be surprised if int.GetHashCode () does anything except  
                          return the int value anyway, but regardless, there's no reason not to just 
                          use the int directly as its own hash code.
                          You're right Int32 returns a hash code that is equal to the value so
                          it doesn't matter at all if GetHashCode is called. I'm using the term
                          clustering to describe the situation that arises when a set of objects
                          yield hash codes that tend to bunch together within a smaller subset
                          of the entire Int32 range because of similarities in there
                          attributes. Clustering causes inefficiency in a lot hash tables
                          implementations .

                          In the example people with the same name, but different ages would
                          yield hash codes that would cluster around around a value dominated by
                          the name and spread out over range of no more than 3700 (since 100 is
                          a reasonable life expectancy). The effect is that the hashtable may
                          become no better than a linked list if it contains people with the
                          same name.
                          >
                          I _do_ believe that pre-initializing the hashcode with 629 is pointless.  
                          You could start at 0 just as easily, and the values would mathematically  
                          be distributed the same, I believe.  At least, the last time I brought  
                          that up (in the Java newsgroup) no one provided any counter-proof, and the 
                          only reponses were in agreement with my statement.  It could be that just  
                          means they are all as dumb as me, but I don't think there's an actual  
                          mathematical difference.
                          Don't know.
                          I'm also not sure what you mean here.  Equals and GetHashCode are defined  
                          in Object.  All types have a default implementation (for reference types,  
                          the default implementation is simply reference equality), so the compiler  
                          doesn't need to check for them at all.  The default implementation may or  
                          may not be what's desired with respect to a dictionary key, but that's not 
                          something the compiler can figure out anyway.  There's no "special case  
                          the compiler must check for".
                          >
                          So, no...I don't see the value in having an IDictionaryKey interface.
                          >
                          The compiler has to be handling this as a special case otherwise you
                          wouldn't get the warning that "...overrid es Object.Equals but does not
                          override Object.GetHashC ode". An interface based approach would
                          eliminate this special case and be able to strictly enforce this
                          relationship between GetHashCode and Equals. Not mention that it
                          would encourage developers to put more thought into using custom
                          objects as dictionary keys (which they probably should be anyway).

                          Comment

                          • Marc Gravell

                            #14
                            Re: about Equals and GetHashCode

                            > Not mention that it
                            would encourage developers to put more thought into using custom
                            objects as dictionary keys (which they probably should be anyway).
                            My only problem with that is that people tend to forget that the
                            hashcode should be immutable, at least once it has been used as a key
                            - otherwise it can become unfindable...

                            Re the clustering - the most common mistake people make here would be
                            using xor - any matched pairs come out at zero, making for a great big
                            collision. I don't expect I'm telling anybody involved in this
                            dicsussion anything they don't know, but for the list's benefit ;-p

                            Marc

                            Comment

                            • Jon Skeet [C# MVP]

                              #15
                              Re: about Equals and GetHashCode

                              Brian Gideon <briangideon@ya hoo.comwrote:
                              public override int GetHashCode()
                              {
                                  int ret = 17;
                                  ret = 37*ret + age;
                                  ret = 37*ret + name.GetHashCod e();
                                  return ret;

                              }
                              Maybe it's just a typo, but did you mean to use age as-is or did you
                              mean to call GetHashCode on it too? The reason I'm asking is because
                              the way it is written would cause people with the same name to cluster
                              together. The effect would be more severe had age been merged into
                              the hash code after name.
                              I'd probably just leave it at that. Calling GetHashCode on age wouldn't
                              have don't anything, as Int32's implementation of GetHashCode is just
                              to return the original value.

                              I'd hope that a good hashtable implementation would effectively ignore
                              the "closeness" of hashcodes to some extent, for instance by bucketing
                              with some appropriate mod function.
                              Also, in the interest of playing the Monday morning quarterback, I've
                              wondered whether Microsoft would have been better off forcing all
                              dictionary keys to implement a special interface, say IDictionaryKey,
                              that inherits IEquatable and provides the GetHashCode method. That
                              way the requirement for implementing GetHashCode and Equals in tandem
                              could be enforced declaratively instead of having a special case the
                              compiler must check for. Opinion?
                              Agreed. Ironically I was thinking about blogging precisely this point,
                              although for slightly different reasons. Many types - most types, even
                              - aren't really suitable as hash keys unless you're actually talking
                              about identity. That can still be useful, but it's probably worth
                              knowing what you're talking about.

                              So long as you made an implementation of IEqualityCompar er available
                              which took the "native" implementation of the current
                              object.GetHashC ode available to cope with the identity part, I think it
                              would be entirely reasonable to not have GetHashCode and Equals as part
                              of object itself. I think of it as a mistake that Java made and then
                              ..NET copied.

                              (There may have been platforms before Java which had a single type
                              hierarchy where the uber-type had hashcode and equality defined - I
                              just don't know of it.)

                              --
                              Jon Skeet - <skeet@pobox.co m>
                              Web site: http://www.pobox.com/~skeet
                              Blog: http://www.msmvps.com/jon.skeet
                              C# in Depth: http://csharpindepth.com

                              Comment

                              Working...