valid types for Keys (Hashtable, Dictionary)

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Joseph Bergevin

    valid types for Keys (Hashtable, Dictionary)

    Why can't I use an int[] for a key? Doing so evidently doesn't result in
    unique IDs being generated. I can convert the array into a delimited string,
    which works fine, but then I have a good deal of overhead parsing/casting the
    ints back.
    I'd obviously use a (jagged)multidi mensional array of my Value type instead
    of a key:value setup, but the int[] sets aren't tightly grouped, so I'd have
    a lot (maybe 85%) of empty slots in the array. Still, speed is my priority,
    so maybe it'd be best to just eat the memory cost? A full array would contain
    ~4 million items.
  • Joseph Bergevin

    #2
    RE: valid types for Keys (Hashtable, Dictionary)

    Ok. I just put the ints into a struct and used that as a key. Problem solved.
    Why didn't I think of that earlier?

    Comment

    • Bruce Wood

      #3
      Re: valid types for Keys (Hashtable, Dictionary)

      Depending upon what you are doing, that may have been the wrong
      solution. structs have very specific applications in .NET, and using
      them without due consideration can lead to grief.

      The reason that your int[]s didn't generate unique hash keys is that
      Array uses the default GetHashCode() method inherited from Object,
      which is documented as being no good for hashing. From the
      documentation:

      "The default implementation of GetHashCode does not guarantee
      uniqueness or consistency; therefore, it must not be used as a unique
      object identifier for hashing purposes."

      The point is that there was no way for the .NET team to come up with a
      one-size-fits all hash function, so they provided merely the signature
      and left it up to individual classes to decide how to generate hash
      keys.

      So, the solution to your original problem is to create your own _class_
      (probably not struct) to hold whatever it is you need to hold. That
      class should then override GetHashCode() (and Equals()... they go hand
      in hand) so that it can be used as a hash key.

      Comment

      • Joseph Bergevin

        #4
        Re: valid types for Keys (Hashtable, Dictionary)

        > Depending upon what you are doing, that may have been the wrong[color=blue]
        > solution. structs have very specific applications in .NET, and using
        > them without due consideration can lead to grief.[/color]

        So what is the specific utility of structs? I assumed that they were chiefly
        a handy way to group value types, like Point and Vector3 do. It seemed
        wasteful to me to make a class for three ints. I just assumed that object
        creation is slower, but I guess that everything in .NET is an object anyway.
        [color=blue]
        > "The default implementation of GetHashCode does not guarantee
        > uniqueness or consistency; therefore, it must not be used as a unique
        > object identifier for hashing purposes."[/color]

        Aha. I never came across that in my MS Press C# book. Most of the things
        I've read about Hashtables stop short of explaining the nuts and bolts.
        They're just some magical search tree to me.
        [color=blue]
        > So, the solution to your original problem is to create your own _class_
        > (probably not struct) to hold whatever it is you need to hold. That
        > class should then override GetHashCode() (and Equals()... they go hand
        > in hand) so that it can be used as a hash key.[/color]

        In my particular situation, I'm primarily interested in being able to look
        up a given object with a specific 3 int location, without having to create
        throwaway object intermediaries for comparisons.
        Perhaps for my purposes, since I know that none of the ints will be larger
        than, say, X, I could create a unique number from my int[a,b,c] that is
        a + (b*(X+1)) + (c*(X+1)*(X+1))
        and use this in the overrides you mentioned.
        I wish I knew exactly how .NET generated the default hash codes.

        Thanks for the feedback.

        Comment

        • Bruce Wood

          #5
          Re: valid types for Keys (Hashtable, Dictionary)


          Joseph Bergevin wrote:[color=blue][color=green]
          > > Depending upon what you are doing, that may have been the wrong
          > > solution. structs have very specific applications in .NET, and using
          > > them without due consideration can lead to grief.[/color]
          >
          > So what is the specific utility of structs? I assumed that they were chiefly
          > a handy way to group value types, like Point and Vector3 do. It seemed
          > wasteful to me to make a class for three ints. I just assumed that object
          > creation is slower, but I guess that everything in .NET is an object anyway.[/color]

          When you declare a struct, you create a _value_ type. That means that
          it acts like an int or a double. Whenever you assign it:

          MyValudType b = new MyValueType(... );
          MyValueType a = b;

          it is copied, just like an int or a double. Whenever you put it in a
          collection (such as an ArrayList or a Hashtable) it is boxed. In
          particular, if you make your value type mutable (not recommended) then
          you have to be very, very careful that you understand the exact
          semantics so that when you change the value you know that it changes
          and you know exactly where it changes and where it doesn't. (Using the
          example above, if you were allowed to say a.MyProperty = 5 then b would
          not change, because the contents of b were copied into a on
          assignment.)

          If in your case your three ints are spacial coordinates (x, y, z) and
          they were immutable (that is, you couldn't say coord.X = 5), then a
          struct would probably be the right way to go. If you make them mutable
          (coord.X = 5 is legal) then you could use a struct, but you would have
          to be very careful. (Search this newsgroup to see the kind of confusion
          that the behaviour of System.Drawing. Point causes.)

          The bottom line is that in .NET structs come with important semantic
          considerations: value semantics versus reference semantics. As such,
          they shouldn't be used in an offhand manner as a sort of "classes lite"
          because they introduce major changes in behaviour.
          [color=blue]
          > In my particular situation, I'm primarily interested in being able to look
          > up a given object with a specific 3 int location, without having to create
          > throwaway object intermediaries for comparisons.[/color]

          Well, keep in mind that the only thing you really save by using structs
          is memory management costs of allocating space for an object and then
          garbage collecting it later. That is, unless your structs are being
          boxed (as they would be if you used them in a Hashtable either as keys
          or values). In that case you don't save anything.

          Microsoft made System.Drawing. Point a struct because (IMHO) they do a
          lot of mathematics with points, and didn't want to create a gazillion
          Point objects as a result of intermediate calculation results and then
          have to garbage collect them all later. If it's just a question of the
          Hashtable, then I would say that it's a non-issue: your decision as to
          whether to use struct or class should rest with higher-level conceptual
          issues, not with fine points of efficiency.
          [color=blue]
          > Perhaps for my purposes, since I know that none of the ints will be larger
          > than, say, X, I could create a unique number from my int[a,b,c] that is
          > a + (b*(X+1)) + (c*(X+1)*(X+1))
          > and use this in the overrides you mentioned.[/color]

          Yes... that would probably be a good strategy, so long as X never
          changes during a particular run of your program. Another way is to
          leverage off of the existing GetHashCode method in int:

          public override int GetHashCode()
          {
          return a.GetHashCode() + b.GetHashCode() + c.GetHashCode() ;
          }

          or even

          public override int GetHashCode()
          {
          return a + b + c;
          }

          Remember... you're not after a unique value, just a value that will
          provide a nice, even distribution across all ints.
          [color=blue]
          > I wish I knew exactly how .NET generated the default hash codes.[/color]

          Well, the default for Object probably just returns zero or something,
          or maybe a fixed value for each object type. Each of the Framework
          classes that overrides GetHashCode has its own algorithm. I tend to
          assume that those are good, and leverage off of them when I can. For
          example, if my class needs to hash on two key string fields, I just say

          return string1.GetHash Code() + string2.GetHash Code();

          assuming that Microsoft pays math wizards big bucks to come up with
          good hash functions for strings. :-)

          Comment

          Working...