fastest searchable datastructure?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Pieter

    fastest searchable datastructure?

    Hi,

    I need some type of array/list/... In which I can store objects together
    with a unique key. The most important thing is performance: I will need to
    do the whole time searches in the list of given keys. Which datastructure
    will be best suited for this? A Hashtable? The list may contain upto 10^12
    items, bit more proably most of the time +- 10^9 of even 10^6...

    Thanks a lot in advance,


    Pieter


  • Pieter

    #2
    Re: fastest searchable datastructure?

    Thansk both for your answer!

    Actually: it's not a type: it could go theoretical upto 6*10^12 :-)
    But as I said: I expect a practical implementation of +- 10^6...

    I will use indeed SQL Server when the amoutn will be too big... But in case
    it's underneath 10^6: Will a dictionary be better than a Hashtable? Or
    soemthing else?

    "Marc Gravell" <marc.gravell@g mail.comwrote in message
    news:e0a26adb-39fe-4600-bdf4-261730e28d81@h1 1g2000prf.googl egroups.com...
    >I need some type of array/list/... In which I can store objects together
    >with a unique key.
    Sounds like Dictionary<TKey , TValue>...
    >
    >The list may contain upto 10^12
    Seriously? You do realise that even at one byte per item, with no
    index overhead, padding, etc that's a TB?
    >
    I'm going to assume that is a typo - but even so, for large numbers
    you may be better using a database approach, with a non-clustered
    unique index on this field (only; don't span - let it use a bookmark
    lookup to get the value) and ideally with the index on a different
    file-group to the main data.
    >
    Marc

    Comment

    • =?Utf-8?B?UGV0ZXIgQnJvbWJlcmcgW0MjIE1WUF0=?=

      #3
      RE: fastest searchable datastructure?

      Do you know how much memory 10^12 of these "items" is going to require?
      Because if you run out of physical RAM, your whole process will either blow
      up or slow down dramatically due to paging.
      So the best alternative may be a database to hold your items. A search on a
      table with a clustered index on a primary key in this case will be the
      fastest.
      -- Peter
      Site: http://www.eggheadcafe.com
      UnBlog: http://petesbloggerama.blogspot.com
      MetaFinder: http://www.blogmetafinder.com


      "Pieter" wrote:
      Hi,
      >
      I need some type of array/list/... In which I can store objects together
      with a unique key. The most important thing is performance: I will need to
      do the whole time searches in the list of given keys. Which datastructure
      will be best suited for this? A Hashtable? The list may contain upto 10^12
      items, bit more proably most of the time +- 10^9 of even 10^6...
      >
      Thanks a lot in advance,
      >
      >
      Pieter
      >
      >
      >

      Comment

      • Pieter

        #4
        Re: fastest searchable datastructure?

        Hehe I do know :-)
        The problem is: it will be for an experiment, and not every possiblity will
        happen as much as the others. So I want definetly put the most popular in
        some kind of local cache... See it as putting the first block of a B-tree in
        the RAM memory...

        "Peter Bromberg [C# MVP]" <pbromberg@yaho o.NoSpamMaam.co mwrote in message
        news:E081A83D-31AB-4309-A50A-DCEEAD905AD2@mi crosoft.com...
        Do you know how much memory 10^12 of these "items" is going to require?
        Because if you run out of physical RAM, your whole process will either
        blow
        up or slow down dramatically due to paging.
        So the best alternative may be a database to hold your items. A search on
        a
        table with a clustered index on a primary key in this case will be the
        fastest.
        -- Peter
        Site: http://www.eggheadcafe.com
        UnBlog: http://petesbloggerama.blogspot.com
        MetaFinder: http://www.blogmetafinder.com
        >
        >
        "Pieter" wrote:
        >
        >Hi,
        >>
        >I need some type of array/list/... In which I can store objects together
        >with a unique key. The most important thing is performance: I will need
        >to
        >do the whole time searches in the list of given keys. Which datastructure
        >will be best suited for this? A Hashtable? The list may contain upto
        >10^12
        >items, bit more proably most of the time +- 10^9 of even 10^6...
        >>
        >Thanks a lot in advance,
        >>
        >>
        >Pieter
        >>
        >>
        >>

        Comment

        • =?ISO-8859-1?Q?Lasse_V=E5gs=E6ther_Karlsen?=

          #5
          Re: fastest searchable datastructure?

          Pieter wrote:
          Thansk both for your answer!
          >
          Actually: it's not a type: it could go theoretical upto 6*10^12 :-)
          But as I said: I expect a practical implementation of +- 10^6...
          >
          I will use indeed SQL Server when the amoutn will be too big... But in case
          it's underneath 10^6: Will a dictionary be better than a Hashtable? Or
          soemthing else?
          >
          "Marc Gravell" <marc.gravell@g mail.comwrote in message
          news:e0a26adb-39fe-4600-bdf4-261730e28d81@h1 1g2000prf.googl egroups.com...
          >>I need some type of array/list/... In which I can store objects together
          >>with a unique key.
          >Sounds like Dictionary<TKey , TValue>...
          >>
          >>The list may contain upto 10^12
          >Seriously? You do realise that even at one byte per item, with no
          >index overhead, padding, etc that's a TB?
          >>
          >I'm going to assume that is a typo - but even so, for large numbers
          >you may be better using a database approach, with a non-clustered
          >unique index on this field (only; don't span - let it use a bookmark
          >lookup to get the value) and ideally with the index on a different
          >file-group to the main data.
          >>
          >Marc
          >
          >
          I would carefully avoid solutions that require you to write the same
          type of code multiple times, one type for < 10^6, one for <10^9 and one
          for <10^12. If you need as many as 10^12, use a database.

          If you absolutely want to draw the best performance out of everything,
          abstract out the storage of this list to a new class, so that you can
          inherit for it for an in-memory data structure, and inherit from it for
          a database structure.

          --
          Lasse Vågsæther Karlsen
          mailto:lasse@vk arlsen.no
          Blogger ist ein Veröffentlichungs-Tool von Google, mit dem du ganz einfach deine Gedanken der Welt mitteilen kannst. Mit Blogger kannst du problemlos Texte, Fotos und Videos in deinem persönlichen Blog oder deinem Team-Blog veröffentlichen.

          PGP KeyID: 0xBCDEA2E3

          Comment

          • Pieter

            #6
            Re: fastest searchable datastructure?

            Thanks Lasse. What it actually will do is: every object will be another
            possivble state. But some states will be much more needed than others. So
            the 'popular' ones will be put in a local cache. But it's the structure of
            that cache that I'm worrying about: As mayb 80% of the searching will happen
            in there, it shoudl be as fast as possible: So: which structure to use?

            "Lasse Vågsæther Karlsen" <lasse@vkarlsen .nowrote in message
            news:%23DjAODFW IHA.4712@TK2MSF TNGP04.phx.gbl. ..
            I would carefully avoid solutions that require you to write the same type
            of code multiple times, one type for < 10^6, one for <10^9 and one for
            <10^12. If you need as many as 10^12, use a database.
            >
            If you absolutely want to draw the best performance out of everything,
            abstract out the storage of this list to a new class, so that you can
            inherit for it for an in-memory data structure, and inherit from it for a
            database structure.
            >
            --
            Lasse Vågsæther Karlsen
            mailto:lasse@vk arlsen.no
            Blogger ist ein Veröffentlichungs-Tool von Google, mit dem du ganz einfach deine Gedanken der Welt mitteilen kannst. Mit Blogger kannst du problemlos Texte, Fotos und Videos in deinem persönlichen Blog oder deinem Team-Blog veröffentlichen.

            PGP KeyID: 0xBCDEA2E3

            Comment

            • Alberto Poblacion

              #7
              Re: fastest searchable datastructure?

              "Pieter" <pieterNOSPAMco ucke@hotmail.co mwrote in message
              news:OHd%23$LEW IHA.5448@TK2MSF TNGP04.phx.gbl. ..
              I need some type of array/list/... In which I can store objects together
              with a unique key. The most important thing is performance: I will need to
              do the whole time searches in the list of given keys. Which datastructure
              will be best suited for this? A Hashtable? The list may contain upto 10^12
              items, bit more proably most of the time +- 10^9 of even 10^6...
              You will get the fastest performance with a Hash Table. Assuming that
              you choose a good algorithm to assign the hash values, the hash table has
              the advantage that the average number of accesses to the table to find a key
              does not depend on the size of the table, but only on the percent of the
              table that is full. With a table that is not terribly full (say 80 o 90%, I
              don't remember the figures), the average number of accesses is one point
              something. This beats a search on a B-Tree, which requires a number of
              accesses that grows with the logarithm of the size of the table. Note that
              I'm speaking about the *average* number of accesses. The *worst-case*
              scenario is horribly bad, since it would require a full-scan of the hash
              table. Fortunately, this is extremely improbable, on the condition that the
              hashing algorithm and the collisions algorithm are properly chosen.

              If you are going to deal with in-memory elemets using .Net, you can use
              a Dictionary<key, value>, which will automatically use a hash table when the
              number of stored elements is larger than some internally coded threshold.
              You will need to use 64-bit code (and run it in a huge machine) if you want
              to address 10^12 elements. If you need to store your data on disk, a
              properly programmed hashing algorithm against a flat file can outperform a
              database server, which uses trees to store its indices.

              Comment

              • =?Utf-8?B?UGV0ZXIgQnJvbWJlcmcgW0MjIE1WUF0=?=

                #8
                RE: fastest searchable datastructure?

                I'd think that a Generic List or Generic Dictionary would be faster as there
                would be no boxing / unboxing involved in adding or retrieving typed elements.
                -- Peter
                Site: http://www.eggheadcafe.com
                UnBlog: http://petesbloggerama.blogspot.com
                MetaFinder: http://www.blogmetafinder.com


                "Pieter" wrote:
                Hi,
                >
                I need some type of array/list/... In which I can store objects together
                with a unique key. The most important thing is performance: I will need to
                do the whole time searches in the list of given keys. Which datastructure
                will be best suited for this? A Hashtable? The list may contain upto 10^12
                items, bit more proably most of the time +- 10^9 of even 10^6...
                >
                Thanks a lot in advance,
                >
                >
                Pieter
                >
                >
                >

                Comment

                • Cor Ligthert[MVP]

                  #9
                  Re: fastest searchable datastructure?

                  Pieter,

                  In past there was in this newsgroup somebody from Belgie active who was
                  always answering this question with.

                  Sorted List

                  Represents a collection of key/value pairs that are sorted by the keys and are accessible by key and by index.


                  I have no expirience with that

                  Cor

                  Comment

                  • Kevin Spencer

                    #10
                    Re: fastest searchable datastructure?

                    A Hashtable would be faster in terms of searching, as long as one was
                    searching by a key value.

                    --
                    HTH,

                    Kevin Spencer
                    Chicken Salad Surgeon
                    Microsoft MVP

                    "Peter Bromberg [C# MVP]" <pbromberg@yaho o.NoSpamMaam.co mwrote in message
                    news:E9AA9507-D257-421F-A894-185ED54A0BFF@mi crosoft.com...
                    I'd think that a Generic List or Generic Dictionary would be faster as
                    there
                    would be no boxing / unboxing involved in adding or retrieving typed
                    elements.
                    -- Peter
                    Site: http://www.eggheadcafe.com
                    UnBlog: http://petesbloggerama.blogspot.com
                    MetaFinder: http://www.blogmetafinder.com
                    >
                    >
                    "Pieter" wrote:
                    >
                    >Hi,
                    >>
                    >I need some type of array/list/... In which I can store objects together
                    >with a unique key. The most important thing is performance: I will need
                    >to
                    >do the whole time searches in the list of given keys. Which datastructure
                    >will be best suited for this? A Hashtable? The list may contain upto
                    >10^12
                    >items, bit more proably most of the time +- 10^9 of even 10^6...
                    >>
                    >Thanks a lot in advance,
                    >>
                    >>
                    >Pieter
                    >>
                    >>
                    >>

                    Comment

                    • Michel Posseth  [MCP]

                      #11
                      Re: fastest searchable datastructure?

                      Hello Pieter

                      Balena did some tests and posted the results on his website , however with
                      the transition form VB2themax to dotnet2themax the article seems to be
                      disapeared
                      However the result was that the hashtable was and still is the fastest key
                      value pair object , the higher the amount of data was the more efiicient the
                      hashtable was in comparisation to other methods for sowell insertion as
                      retrieving of the key value pairs ( Balena timed both the time it took to
                      built up the data tree as the time it took to retrieve n elements ) ,
                      however with the hughe amounts you are talking about a database would be a
                      much bether idea


                      HTH

                      Michel







                      "Pieter" <pieterNOSPAMco ucke@hotmail.co mschreef in bericht
                      news:OHd%23$LEW IHA.5448@TK2MSF TNGP04.phx.gbl. ..
                      Hi,
                      >
                      I need some type of array/list/... In which I can store objects together
                      with a unique key. The most important thing is performance: I will need to
                      do the whole time searches in the list of given keys. Which datastructure
                      will be best suited for this? A Hashtable? The list may contain upto 10^12
                      items, bit more proably most of the time +- 10^9 of even 10^6...
                      >
                      Thanks a lot in advance,
                      >
                      >
                      Pieter
                      >

                      Comment

                      • Michel Posseth  [MCP]

                        #12
                        Re: fastest searchable datastructure?


                        Found some numbers ( needed to do some translation )

                        The test was with 100.000 items

                        compared was ArrayList, HashTable en SortedList

                        Arraylist was 4 times faster as the hashtable , hashtable was 8 - 100 times
                        faster as the sortedlist

                        please if you write some comparisation routines yourself then don`t forget
                        to set compile options in VB.Net remove overflow checks and enable
                        optimizations
                        otherwise we get again trolling messages here that C# is so much faster as
                        VB.Net with these options the same as they are in C# they should evaluate at
                        the same speed




                        "Pieter" <pieterNOSPAMco ucke@hotmail.co mschreef in bericht
                        news:OHd%23$LEW IHA.5448@TK2MSF TNGP04.phx.gbl. ..
                        Hi,
                        >
                        I need some type of array/list/... In which I can store objects together
                        with a unique key. The most important thing is performance: I will need to
                        do the whole time searches in the list of given keys. Which datastructure
                        will be best suited for this? A Hashtable? The list may contain upto 10^12
                        items, bit more proably most of the time +- 10^9 of even 10^6...
                        >
                        Thanks a lot in advance,
                        >
                        >
                        Pieter
                        >

                        Comment

                        • Marc Gravell

                          #13
                          Re: fastest searchable datastructure?

                          Arraylist was 4 times faster as the hashtable
                          The other question would be: "at what, exactly?"

                          If the timings include the insertion time, then yes: a hash-table
                          (Hashtable or Dictionary<TKey ,TValue>) will take longer: it needs to
                          obtain hashes, build buckets, etc (an ArrayList or List<Tjust has to
                          copy an array a few times). But the *read* performance should then be
                          /significantly/ better, assuming that the items are being fetched by
                          key (and not by looping).

                          Without inspectable, reproducable code I'm going to take this with a
                          pinch of salt...

                          Marc


                          Comment

                          • Michel Posseth  [MCP]

                            #14
                            Re: fastest searchable datastructure?


                            Before anyone takes this out of perspective

                            1 . these tests were done by Balena not by me , however the results and code
                            are not annymore on the website , but i have read on a dutch website that
                            you can find this in his 2003 book ( i have the 2002 and 2005 version )

                            2 . The conclusion was that the hashtable was the fastest when values were
                            retrieved with the key , however a arraylist was faster when looping
                            through the values

                            "Although the statistics might tell you that the Amazone rivers average
                            depth is 50 cm this does not mean that you can walk to the other side"

                            Michel




                            "Marc Gravell" <marc.gravell@g mail.comschreef in bericht
                            news:%23ym20SPW IHA.5984@TK2MSF TNGP06.phx.gbl. ..
                            >Arraylist was 4 times faster as the hashtable
                            The other question would be: "at what, exactly?"
                            >
                            If the timings include the insertion time, then yes: a hash-table
                            (Hashtable or Dictionary<TKey ,TValue>) will take longer: it needs to
                            obtain hashes, build buckets, etc (an ArrayList or List<Tjust has to
                            copy an array a few times). But the *read* performance should then be
                            /significantly/ better, assuming that the items are being fetched by key
                            (and not by looping).
                            >
                            Without inspectable, reproducable code I'm going to take this with a pinch
                            of salt...
                            >
                            Marc
                            >

                            Comment

                            • Jon Skeet [C# MVP]

                              #15
                              Re: fastest searchable datastructure?

                              Michel Posseth [MCP] <MSDN@posseth.c omwrote:
                              Before anyone takes this out of perspective
                              >
                              1 . these tests were done by Balena not by me , however the results and code
                              are not annymore on the website , but i have read on a dutch website that
                              you can find this in his 2003 book ( i have the 2002 and 2005 version )
                              >
                              2 . The conclusion was that the hashtable was the fastest when values were
                              retrieved with the key , however a arraylist was faster when looping
                              through the values
                              Right. That makes a lot more sense. As far as I'm aware, this thread is
                              about looking up by key, at which point the hashtable should win
                              easily.

                              --
                              Jon Skeet - <skeet@pobox.co m>
                              http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                              World class .NET training in the UK: http://iterativetraining.co.uk

                              Comment

                              Working...