HashTable

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Sreekanth

    HashTable

    Hello,
    Is there any better collection than HashTable in terms of performance, when
    the type of the key is integer?
    Regards,
    Sreekanth.


  • Jon Skeet [C# MVP]

    #2
    Re: HashTable

    Sreekanth <sreekanth@yaho o.com> wrote:[color=blue]
    > Is there any better collection than HashTable in terms of performance, when
    > the type of the key is integer?[/color]

    Well, you *could* write your own version - but have you made sure that
    the bottleneck in your code is in Hashtable first?

    --
    Jon Skeet - <skeet@pobox.co m>
    Pobox has been discontinued as a separate service, and all existing customers moved to the Fastmail platform.

    If replying to the group, please do not mail me too

    Comment

    • Dennis Myrén

      #3
      Re: HashTable

      I have written such a Hashtable.
      It uses Int32 for keys and values(the type to use for the values one could
      change), beating System.Collecti ons.Hashtable in terms of performance.

      Let me know if this is anything of interest for you.

      --
      Regards,
      Dennis JD Myrén
      Oslo Kodebureau
      "Sreekanth" <sreekanth@yaho o.com> wrote in message
      news:csnc6b$f07 $1@news.mch.sbs .de...[color=blue]
      > Hello,
      > Is there any better collection than HashTable in terms of performance,
      > when
      > the type of the key is integer?
      > Regards,
      > Sreekanth.
      >
      >[/color]


      Comment

      • Ollie Riches

        #4
        Re: HashTable

        By how much?


        "Dennis Myrén" <dennis@oslokb. no> wrote in message
        news:K4KHd.5221 $IW4.107296@new s2.e.nsc.no...[color=blue]
        > I have written such a Hashtable.
        > It uses Int32 for keys and values(the type to use for the values one could
        > change), beating System.Collecti ons.Hashtable in terms of performance.
        >
        > Let me know if this is anything of interest for you.
        >
        > --
        > Regards,
        > Dennis JD Myrén
        > Oslo Kodebureau
        > "Sreekanth" <sreekanth@yaho o.com> wrote in message
        > news:csnc6b$f07 $1@news.mch.sbs .de...[color=green]
        > > Hello,
        > > Is there any better collection than HashTable in terms of performance,
        > > when
        > > the type of the key is integer?
        > > Regards,
        > > Sreekanth.
        > >
        > >[/color]
        >
        >[/color]


        Comment

        • Dennis Myrén

          #5
          Re: HashTable

          You mean by how much, in performance?
          If so, i think it is almost twice as fast.

          If you mean the price, of course it is free.

          --
          Regards,
          Dennis JD Myrén
          Oslo Kodebureau
          "Ollie Riches" <ollie_riches@h otmail.com> wrote in message
          news:uhoGygt$EH A.824@TK2MSFTNG P11.phx.gbl...[color=blue]
          > By how much?
          >
          >
          > "Dennis Myrén" <dennis@oslokb. no> wrote in message
          > news:K4KHd.5221 $IW4.107296@new s2.e.nsc.no...[color=green]
          >> I have written such a Hashtable.
          >> It uses Int32 for keys and values(the type to use for the values one
          >> could
          >> change), beating System.Collecti ons.Hashtable in terms of performance.
          >>
          >> Let me know if this is anything of interest for you.
          >>
          >> --
          >> Regards,
          >> Dennis JD Myrén
          >> Oslo Kodebureau
          >> "Sreekanth" <sreekanth@yaho o.com> wrote in message
          >> news:csnc6b$f07 $1@news.mch.sbs .de...[color=darkred]
          >> > Hello,
          >> > Is there any better collection than HashTable in terms of performance,
          >> > when
          >> > the type of the key is integer?
          >> > Regards,
          >> > Sreekanth.
          >> >
          >> >[/color]
          >>
          >>[/color]
          >
          >[/color]


          Comment

          • Ollie Riches

            #6
            Re: HashTable

            when you 'twice as fast' how many values did you test up to? 100, 1000,
            10000, 1000000 etc

            Cheers

            Ollie

            "Dennis Myrén" <dennis@oslokb. no> wrote in message
            news:ZBLHd.5229 $IW4.107503@new s2.e.nsc.no...[color=blue]
            > You mean by how much, in performance?
            > If so, i think it is almost twice as fast.
            >
            > If you mean the price, of course it is free.
            >
            > --
            > Regards,
            > Dennis JD Myrén
            > Oslo Kodebureau
            > "Ollie Riches" <ollie_riches@h otmail.com> wrote in message
            > news:uhoGygt$EH A.824@TK2MSFTNG P11.phx.gbl...[color=green]
            > > By how much?
            > >
            > >
            > > "Dennis Myrén" <dennis@oslokb. no> wrote in message
            > > news:K4KHd.5221 $IW4.107296@new s2.e.nsc.no...[color=darkred]
            > >> I have written such a Hashtable.
            > >> It uses Int32 for keys and values(the type to use for the values one
            > >> could
            > >> change), beating System.Collecti ons.Hashtable in terms of performance.
            > >>
            > >> Let me know if this is anything of interest for you.
            > >>
            > >> --
            > >> Regards,
            > >> Dennis JD Myrén
            > >> Oslo Kodebureau
            > >> "Sreekanth" <sreekanth@yaho o.com> wrote in message
            > >> news:csnc6b$f07 $1@news.mch.sbs .de...
            > >> > Hello,
            > >> > Is there any better collection than HashTable in terms of[/color][/color][/color]
            performance,[color=blue][color=green][color=darkred]
            > >> > when
            > >> > the type of the key is integer?
            > >> > Regards,
            > >> > Sreekanth.
            > >> >
            > >> >
            > >>
            > >>[/color]
            > >
            > >[/color]
            >
            >[/color]


            Comment

            • Dennis Myrén

              #7
              Re: HashTable

              Well, first i tested adding a million slots to my hashtable and
              System.Collecti ons.Hashtable.
              I repeated that action 5 times.
              The results(on my computer) is as follows:
              Baretta.Int32Ha shtable added 1000000 slots in 500ms
              Baretta.Int32Ha shtable added 1000000 slots in 490ms
              Baretta.Int32Ha shtable added 1000000 slots in 550ms
              Baretta.Int32Ha shtable added 1000000 slots in 490ms
              Baretta.Int32Ha shtable added 1000000 slots in 590ms
              System.Collecti ons.Hashtable added 1000000 slots in 540ms
              System.Collecti ons.Hashtable added 1000000 slots in 741ms
              System.Collecti ons.Hashtable added 1000000 slots in 931ms
              System.Collecti ons.Hashtable added 1000000 slots in 801ms
              System.Collecti ons.Hashtable added 1000000 slots in 821ms

              So, when adding the keys and values to the table, my implementation
              is a little, but not much faster.

              But, the real performance gain we get when we want to access these keys and
              values.
              I looped a million times and accessed each slot in the hashtables.
              I repeated that action 5 times.
              These are the results here:
              Baretta.Int32Ha shtable read 1000000 slots in 40ms
              Baretta.Int32Ha shtable read 1000000 slots in 30ms
              Baretta.Int32Ha shtable read 1000000 slots in 40ms
              Baretta.Int32Ha shtable read 1000000 slots in 40ms
              Baretta.Int32Ha shtable read 1000000 slots in 30ms
              System.Collecti ons.Hashtable read 1000000 slots in 290ms
              System.Collecti ons.Hashtable read 1000000 slots in 280ms
              System.Collecti ons.Hashtable read 1000000 slots in 430ms
              System.Collecti ons.Hashtable read 1000000 slots in 340ms
              System.Collecti ons.Hashtable read 1000000 slots in 280ms

              As you can see, now we are starting to really earn in terms of performance.

              You are free to obtain my implementation and do your own benchmarks.


              --
              Regards,
              Dennis JD Myrén
              Oslo Kodebureau
              "Ollie Riches" <ollie_riches@h otmail.com> wrote in message
              news:ePTUFst$EH A.3596@TK2MSFTN GP12.phx.gbl...[color=blue]
              > when you 'twice as fast' how many values did you test up to? 100, 1000,
              > 10000, 1000000 etc
              >
              > Cheers
              >
              > Ollie
              >
              > "Dennis Myrén" <dennis@oslokb. no> wrote in message
              > news:ZBLHd.5229 $IW4.107503@new s2.e.nsc.no...[color=green]
              >> You mean by how much, in performance?
              >> If so, i think it is almost twice as fast.
              >>
              >> If you mean the price, of course it is free.
              >>
              >> --
              >> Regards,
              >> Dennis JD Myrén
              >> Oslo Kodebureau
              >> "Ollie Riches" <ollie_riches@h otmail.com> wrote in message
              >> news:uhoGygt$EH A.824@TK2MSFTNG P11.phx.gbl...[color=darkred]
              >> > By how much?
              >> >
              >> >
              >> > "Dennis Myrén" <dennis@oslokb. no> wrote in message
              >> > news:K4KHd.5221 $IW4.107296@new s2.e.nsc.no...
              >> >> I have written such a Hashtable.
              >> >> It uses Int32 for keys and values(the type to use for the values one
              >> >> could
              >> >> change), beating System.Collecti ons.Hashtable in terms of performance.
              >> >>
              >> >> Let me know if this is anything of interest for you.
              >> >>
              >> >> --
              >> >> Regards,
              >> >> Dennis JD Myrén
              >> >> Oslo Kodebureau
              >> >> "Sreekanth" <sreekanth@yaho o.com> wrote in message
              >> >> news:csnc6b$f07 $1@news.mch.sbs .de...
              >> >> > Hello,
              >> >> > Is there any better collection than HashTable in terms of[/color][/color]
              > performance,[color=green][color=darkred]
              >> >> > when
              >> >> > the type of the key is integer?
              >> >> > Regards,
              >> >> > Sreekanth.
              >> >> >
              >> >> >
              >> >>
              >> >>
              >> >
              >> >[/color]
              >>
              >>[/color]
              >
              >[/color]


              Comment

              • Helge Jensen

                #8
                Re: HashTable

                Dennis Myrén wrote:
                [color=blue]
                > You mean by how much, in performance?
                > If so, i think it is almost twice as fast.[/color]

                Is there a place I can see this implementation?

                Have you any idea about how much of this performance gain came from not
                doing boxing?

                I have implemented a hashtable mapping object to object, and have
                achieved about 10-30% better performance than
                System.Collecti ons.Hashtable, while using less memory. Testing Add's and
                Remove's was done using ints both as key and value, either ordered or
                randomly selected, sizes of 100 upto 3M was tested, showing my
                implementation to be consistently faster and less memory expending than
                System.Collecti ons.Hashtable.

                System.Collecti ons.Hashtable didn't work very well with sizes above 3M
                which is acceptable for me. BTW: I have 512M real-mem.

                The (important) choice that made a performance impact was the choice of
                datatstructure and indexing: hash in object[2*sz*Math.Log/Math.Log(sz)],
                index using key: x*y_sz+y, value: x*y_sz+y+1, instead of [x,y].

                The resulting datastructure has really good locality of reference for
                searching the collided keys, they are litterally next to eachother in
                memory.

                Some day, when i switch to .NET2 i'll try the impl out as a
                IDictionary<T>, and see what I gain from that in the "int"-case.

                --
                Helge

                Comment

                • Helge Jensen

                  #9
                  Re: HashTable

                  Helge Jensen wrote:
                  [color=blue]
                  > Is there a place I can see this implementation?[/color]

                  Rudely, I forgot to show you mine before asking you to show me yours.
                  Sorry, forgot to put the URL in.

                  Straight from the subversion repo:


                  WARNING: while the code passes all my tests, there may be bugs.
                  Certainly some areas of it need to get cleaned up, (and now for the
                  exuses) but while experinmenting to gain performance strange things
                  happens to code, and in not in a hurry to fix it, since it seems to work
                  so...

                  --
                  Helge

                  Comment

                  • Dennis Myrén

                    #10
                    Re: HashTable

                    I will be happy to see your implementation.

                    I have now put mine in a ZIP so you can download it.

                    Regarding your question about how much i earn from not boxing/unboxing,
                    yes i think actually i earn a lot from that.
                    In .NET/C# 2.0, i might not earn anything from this implementation anymore,
                    since the support for Generics will solve the boxing/unboxing "issue".

                    I also posted a small article to DeveloperLand as i had written this
                    implementation,
                    you can read here if you want:

                    But, do *not* download that source, but rather download from the URL below,
                    as the version posted to DeveloperLand is old.
                    So, use this one, updated today:


                    Please let me know of any eventual bugs you might find.

                    --
                    Regards,
                    Dennis JD Myrén
                    Oslo Kodebureau
                    "Helge Jensen" <helge.jensen@s log.dk> wrote in message
                    news:ObTY79t$EH A.3028@TK2MSFTN GP10.phx.gbl...[color=blue]
                    > Helge Jensen wrote:
                    >[color=green]
                    >> Is there a place I can see this implementation?[/color]
                    >
                    > Rudely, I forgot to show you mine before asking you to show me yours.
                    > Sorry, forgot to put the URL in.
                    >
                    > Straight from the subversion repo:
                    > https://slog.dk/svn/home/jensen/graphutil/trunk/
                    >
                    > WARNING: while the code passes all my tests, there may be bugs. Certainly
                    > some areas of it need to get cleaned up, (and now for the exuses) but
                    > while experinmenting to gain performance strange things happens to code,
                    > and in not in a hurry to fix it, since it seems to work so...
                    >
                    > --
                    > Helge[/color]


                    Comment

                    • Helge Jensen

                      #11
                      Re: HashTable

                      Dennis Myrén wrote:[color=blue]
                      > I will be happy to see your implementation.[/color]
                      [color=blue]
                      > Please let me know of any eventual bugs you might find.[/color]

                      I can see that your implementation uses a linked list for colliding
                      keys. You can try my idea out, with using an array, it really made quite
                      a lot of runtime difference, esp. on the lookups and on big
                      data-structures, since the list of collided keys to be searched when
                      looking up would be on the same 1 or 2 pages of memory.

                      The 2 optimizations: data-structure and and removal of boxing are
                      independant, and you should probably see a considerable performance benefit.

                      --
                      Helge

                      Comment

                      • Dennis Myrén

                        #12
                        Re: HashTable

                        Helge,

                        That is probably a good idea.

                        I will look into it.

                        Thank you for your advice.


                        --
                        Regards,
                        Dennis JD Myrén
                        Oslo Kodebureau
                        "Helge Jensen" <helge.jensen@s log.dk> wrote in message
                        news:41EF9A38.7 000708@slog.dk. ..[color=blue]
                        > Dennis Myrén wrote:[color=green]
                        >> I will be happy to see your implementation.[/color]
                        >[color=green]
                        >> Please let me know of any eventual bugs you might find.[/color]
                        >
                        > I can see that your implementation uses a linked list for colliding keys.
                        > You can try my idea out, with using an array, it really made quite a lot
                        > of runtime difference, esp. on the lookups and on big data-structures,
                        > since the list of collided keys to be searched when looking up would be on
                        > the same 1 or 2 pages of memory.
                        >
                        > The 2 optimizations: data-structure and and removal of boxing are
                        > independant, and you should probably see a considerable performance
                        > benefit.
                        >
                        > --
                        > Helge[/color]


                        Comment

                        • Helge Jensen

                          #13
                          Re: HashTable

                          [color=blue]
                          > Please let me know of any eventual bugs you might find.[/color]

                          Here are a few things that jumped into my eyes:

                          * try/catch:

                          try
                          {
                          return _host [_keys [_i]];
                          }
                          catch (System.IndexOu tOfRangeExcepti on e)
                          {
                          throw e;
                          }

                          Could be written just as:

                          return _host [_keys [_i]];

                          There is really no good reason to catch an rethrow that exception, is there?

                          If you need to do some work on the way out in a callstack, you can use
                          finally, or when you need to use the Exception (for logging for
                          example), using "throw" instead of "throw e" preserves the original
                          stacktrace.

                          * Enumerator:

                          You define enumeration to be sorted, there is no reason for that, the
                          basic iteration of hashtables or dictionaries are not guranteed to be
                          sorted, sorting is expensive (in this context where operations are O(1)
                          :) and requires you to spen O(capacity) memory, which is pretty much for
                          large hashed.

                          To do sorted iteration, the user can easily do:

                          int[] keys = new int[dict.Count]
                          dict.keys.CopyT o(keys, 0);
                          foreach ( int key in keys ) {
                          object val = dict[key];
                          }

                          or you can define a class SortedCollectio n that accepts an ICollection
                          and provides sorted Enumeration.

                          * indexing:

                          why do your do "& int.MaxValue"?

                          int i = (key & int.MaxValue) % _slots.Length;

                          Rather than

                          int index = (uint)key % _slots.Length;

                          It's less expensive, and doesn't disregard the most-significant bit of
                          your key

                          * .Values

                          Return a new instance of ICollection that simply iterates the hashtable,
                          returning only the keys. This saves a lot of memory.

                          If the user want's an array he can do dict.Values.Cop yTo(array, index).

                          * int this[int key] { set }

                          * Add(key, value, overwrite)

                          Why do you accept ref's to key and value?

                          * Clear:

                          use the System.Array methods for clearing an array, or simple assign
                          a new array to _slots

                          --
                          Helge

                          Comment

                          • Dennis Myrén

                            #14
                            Re: HashTable

                            Thank you, Helge.
                            I have done all changes you suggested except in get_Values.
                            The ZIP is updated:


                            The reason of the Add method of signature:
                            Add ( ref int, ref int, bool )
                            is that it is a private method used internally only.
                            I use ref to prevent a copy of the Int32 being made, doing what i can to
                            gain performance.

                            --
                            Regards,
                            Dennis JD Myrén
                            Oslo Kodebureau
                            "Helge Jensen" <helge.jensen@s log.dk> wrote in message
                            news:OR2Maku$EH A.3120@TK2MSFTN GP12.phx.gbl...[color=blue]
                            >[color=green]
                            >> Please let me know of any eventual bugs you might find.[/color]
                            >
                            > Here are a few things that jumped into my eyes:
                            >
                            > * try/catch:
                            >
                            > try
                            > {
                            > return _host [_keys [_i]];
                            > }
                            > catch (System.IndexOu tOfRangeExcepti on e)
                            > {
                            > throw e;
                            > }
                            >
                            > Could be written just as:
                            >
                            > return _host [_keys [_i]];
                            >
                            > There is really no good reason to catch an rethrow that exception, is
                            > there?
                            >
                            > If you need to do some work on the way out in a callstack, you can use
                            > finally, or when you need to use the Exception (for logging for example),
                            > using "throw" instead of "throw e" preserves the original stacktrace.
                            >
                            > * Enumerator:
                            >
                            > You define enumeration to be sorted, there is no reason for that, the
                            > basic iteration of hashtables or dictionaries are not guranteed to be
                            > sorted, sorting is expensive (in this context where operations are O(1) :)
                            > and requires you to spen O(capacity) memory, which is pretty much for
                            > large hashed.
                            >
                            > To do sorted iteration, the user can easily do:
                            >
                            > int[] keys = new int[dict.Count]
                            > dict.keys.CopyT o(keys, 0);
                            > foreach ( int key in keys ) {
                            > object val = dict[key];
                            > }
                            >
                            > or you can define a class SortedCollectio n that accepts an ICollection and
                            > provides sorted Enumeration.
                            >
                            > * indexing:
                            >
                            > why do your do "& int.MaxValue"?
                            >
                            > int i = (key & int.MaxValue) % _slots.Length;
                            >
                            > Rather than
                            >
                            > int index = (uint)key % _slots.Length;
                            >
                            > It's less expensive, and doesn't disregard the most-significant bit of
                            > your key
                            >
                            > * .Values
                            >
                            > Return a new instance of ICollection that simply iterates the hashtable,
                            > returning only the keys. This saves a lot of memory.
                            >
                            > If the user want's an array he can do dict.Values.Cop yTo(array, index).
                            >
                            > * int this[int key] { set }
                            >
                            > * Add(key, value, overwrite)
                            >
                            > Why do you accept ref's to key and value?
                            >
                            > * Clear:
                            >
                            > use the System.Array methods for clearing an array, or simple assign a
                            > new array to _slots
                            >
                            > --
                            > Helge[/color]


                            Comment

                            • Richard Blewett [DevelopMentor]

                              #15
                              Re: HashTable

                              Aren't you just putting a 32 bit reference rather than a 32 bit integer on the stack instead? I don't see the benefit myself

                              Regards

                              Richard Blewett - DevelopMentor



                              Thank you, Helge.
                              I have done all changes you suggested except in get_Values.
                              The ZIP is updated:


                              The reason of the Add method of signature:
                              Add ( ref int, ref int, bool )
                              is that it is a private method used internally only.
                              I use ref to prevent a copy of the Int32 being made, doing what i can to
                              gain performance.

                              Comment

                              Working...