Benchmarks

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Phil Carmody

    #31
    Re: Benchmarks

    Juha Nieminen <nospam@thanks. invalidwrites:
    user923005 wrote:
    >Cuckoo hashing has guaranteed O(1) lookup and delete and amortized
    >O(1) insert.
    >
    Wikipedia says: "Lookup requires inspection of just two locations in
    the hash table, which takes constant time in the worst case".
    >
    I don't understand how that is possible. Suppose an element is
    inserted into the table. Then later a new element displaces this first
    element to a second location. Then later another new element, to be put
    in this second location, displaces this first element to a third
    location. How exactly do you find this first element anymore? It's not
    in either of the two alternative locations defined by the two hashing
    functions for that element.
    The insert can be quite a complicated procedure. It might be
    amortized O(1), but worst-case, it's O(epic fail).

    Phil
    --
    We must respect the other fellow's religion, but only in the sense and to the
    extent that we respect his theory that his wife is beautiful and his children
    smart. -- Henry Louis Mencken (1880-1956), American editor and critic

    Comment

    • Jerry Coffin

      #32
      Re: Benchmarks

      In article <87zlkbph9z.fsf @nonospaz.fatph il.org>,
      thefatphil_demu nged@yahoo.co.u k says...

      [ ... hash table operations ]
      The insert can be quite a complicated procedure. It might be
      amortized O(1), but worst-case, it's O(epic fail).
      That depends on how the hash table is designed. It's entirely possible
      (for one example) to create a hash table that uses balanced trees for
      collision resolution. This gives a worst case of O(log N), which isn't
      really all that awful at all.

      --
      Later,
      Jerry.

      The universe is a figment of its own imagination.

      Comment

      • user923005

        #33
        Re: Benchmarks

        On Nov 7, 10:54 am, Jerry Coffin <jcof...@taeus. comwrote:
        In article <87zlkbph9z.... @nonospaz.fatph il.org>,
        thefatphil_demu n...@yahoo.co.u k says...
        >
        [ ... hash table operations ]
        >
        The insert can be quite a complicated procedure. It might be
        amortized O(1), but worst-case, it's O(epic fail).
        >
        That depends on how the hash table is designed. It's entirely possible
        (for one example) to create a hash table that uses balanced trees for
        collision resolution. This gives a worst case of O(log N), which isn't
        really all that awful at all.
        I like this scheme:
        Say that you want a hash table of 2^20th elements...
        Create a hash table of 2^20th elements,
        Create a shadow hash table of 2^16th elements
        Create a shadow hash table of 2^12th elements
        Create a shadow hash table of 2^8th elements, each of which is a
        skiplist holding the data type.

        You form your 32 bit hash using some nice algorithm like Bob Jenkin's
        hash or Goulburn hash.
        You mask off the lower 20 bits and this is the address for the large
        table.
        If there is a collision, you mask off 4 more bits and use the second
        largest table.
        If there is a collision, you mask off 4 more bits and use the third
        largest table.
        If there is a collision you take the low byte of the hash and insert
        into that skiplist.

        If you get a lot more data than you expect, you construct a new larger
        table of 2^24th elements and put it in the front as the next new
        layer.

        Of course, if you want, you can rehash the data each time, but I think
        it is better to simply mask. That way, if you suddenly start growing
        your skiplists and the first level tables are lightly populated, then
        you can detect what is obviously a denial of service attack.



        Comment

        • Phil Carmody

          #34
          Re: Benchmarks

          Jerry Coffin <jcoffin@taeus. comwrites:
          In article <87zlkbph9z.fsf @nonospaz.fatph il.org>,
          thefatphil_demu nged@yahoo.co.u k says...
          >
          [ ... hash table operations ]
          uhuh - specifically cuckoo hash operations.
          >The insert can be quite a complicated procedure. It might be
          >amortized O(1), but worst-case, it's O(epic fail).
          >
          That depends on how the hash table is designed. It's entirely possible
          [... to not use a cuckoo hash ...]

          Yes, but then it wouldn't be the hash design whose Big-Ohs were
          being debated.

          Phil
          --
          We must respect the other fellow's religion, but only in the sense and to the
          extent that we respect his theory that his wife is beautiful and his children
          smart. -- Henry Louis Mencken (1880-1956), American editor and critic

          Comment

          • Juha Nieminen

            #35
            Re: Benchmarks

            Richard Harter wrote:
            It doesn't work like that, though you had me perplexed for a
            moment. The first element goes back to its original location,
            displacing the second. Thus, suppose that h0, h1, etc are
            locations and the first three element are e0, e1, e2, and that
            their address pairs are:
            >
            e0: h0,h1
            e1: h0,h2
            e2: h1,h3
            >
            Initially e0 is inserted in location h0. When e1 is inserted, e0
            is moved to h1, and e1 is inserted in h0. The contents of the
            table look like this:
            >
            h0: e1
            h1: e0
            h2: -
            h3: -
            >
            Now e2 is inserted; it goes into location h1, displacing e0. In
            turn e0 goes back to its alternate location h0, displacing e1,
            which has to go to its alternate location, h2. The table now
            looks like this:
            >
            h0: e0
            h1: e2
            h2: e1
            h3: -
            I'm not 100% sure I fully understand it, but anyways, it sound to me
            like insertion is O(n) like that (each displaced element could
            potentially displace an existing element, which could potentially
            displace an existing element, etc, potentially going through O(n)
            elements without it going into an infinite loop).

            I don't even see how insertion is amortized constant-time. It sounds
            to me like insertion is not only O(n), but also could potentially be
            amortized linear-time, especially if poor hashing functions are chosen.

            There was no conditional ("with properly designed hashing functions")
            in the description given by wikipedia for the amortized constant time
            insertion, but it made it sound like it's *always* amortized constant
            time, regardless of the chosen hashing functions... I can't understand
            how it's possible.

            Heck, in fact I think it's rather trivial to demonstrate that
            insertion can be O(infinite), by simply using these two hashing functions:

            f1(x) = 0
            f2(x) = 1

            With these hashing functions it's impossible to insert more than two
            elements in the hash table. Trying to insert a third one is impossible.

            Of course these two functions are absolutely extreme, but I think they
            demonstrate that the hash table can *not* have constant-time insertion
            (or even linear-time, for that matter) for all possible hashing
            functions. At most this may be possible for *some* hashing functions.

            Comment

            • CBFalconer

              #36
              Re: Benchmarks

              Phil Carmody wrote:
              Jerry Coffin <jcoffin@taeus. comwrites:
              >thefatphil_demu nged@yahoo.co.u k says...
              >>
              >[ ... hash table operations ]
              >
              uhuh - specifically cuckoo hash operations.
              >
              >>The insert can be quite a complicated procedure. It might be
              >>amortized O(1), but worst-case, it's O(epic fail).
              >>
              >That depends on how the hash table is designed. It's entirely
              >possible ...
              >
              [... to not use a cuckoo hash ...]
              >
              Yes, but then it wouldn't be the hash design whose Big-Ohs were
              being debated.
              I gather this sub-thread applies specifically to the cuckoo hash,
              with which I am not familiar. However note that the design of
              hashlib specifically prevents the hashtable being more than about
              88% full, and normally is in the 44 to 88% range. This
              specifically prevents the occurence of long trails with any
              reasonable hash function. The organization does specifically
              requires table reorganization at intervals. The result is O(1)
              operation. The reorganization is considerably cheaper than the
              initial construction of that table, because it is not necessary to
              check equality in the rehashing operation (we know that the
              existing table has no duplicates). At any rate, see:

              <http://cbfalconer.home .att.net/download/hashlib.zip>

              --
              [mail]: Chuck F (cbfalconer at maineline dot net)
              [page]: <http://cbfalconer.home .att.net>
              Try the download section.

              Comment

              • user923005

                #37
                Re: Benchmarks

                On Nov 7, 2:47 pm, Juha Nieminen <nos...@thanks. invalidwrote:
                Richard Harter wrote:
                It doesn't work like that, though you had me perplexed for a
                moment.  The first element goes back to its original location,
                displacing the second.  Thus, suppose that h0, h1, etc are
                locations and the first three element are e0, e1, e2, and that
                their address pairs are:
                >
                e0: h0,h1
                e1: h0,h2
                e2: h1,h3
                >
                Initially e0 is inserted in location h0.  When e1 is inserted, e0
                is moved to h1, and e1 is inserted in h0.  The contents of the
                table look like this:
                >
                   h0: e1
                   h1: e0
                   h2: -
                   h3: -
                >
                Now e2 is inserted; it goes into location h1, displacing e0.  In
                turn e0 goes back to its alternate location h0, displacing e1,
                which has to go to its alternate location, h2.  The table now
                looks like this:
                >
                   h0: e0
                   h1: e2
                   h2: e1
                   h3: -
                >
                  I'm not 100% sure I fully understand it, but anyways, it sound to me
                like insertion is O(n) like that (each displaced element could
                potentially displace an existing element, which could potentially
                displace an existing element, etc, potentially going through O(n)
                elements without it going into an infinite loop).
                >
                  I don't even see how insertion is amortized constant-time. It sounds
                to me like insertion is not only O(n), but also could potentially be
                amortized linear-time, especially if poor hashing functions are chosen.
                >
                  There was no conditional ("with properly designed hashing functions")
                in the description given by wikipedia for the amortized constant time
                insertion, but it made it sound like it's *always* amortized constant
                time, regardless of the chosen hashing functions... I can't understand
                how it's possible.
                >
                  Heck, in fact I think it's rather trivial to demonstrate that
                insertion can be O(infinite), by simply using these two hashing functions:
                >
                f1(x) = 0
                f2(x) = 1
                >
                  With these hashing functions it's impossible to insert more than two
                elements in the hash table. Trying to insert a third one is impossible.
                >
                  Of course these two functions are absolutely extreme, but I think they
                demonstrate that the hash table can *not* have constant-time insertion
                (or even linear-time, for that matter) for all possible hashing
                functions. At most this may be possible for *some* hashing functions.
                Sure. If hash(x) == return 1; then bad behavior is to be expected.
                The assumption is of maximally cascade free hash functions like UMAC
                or Bob Jenkin's hash.

                There are also little tweaky things done to make the "real-life"
                behavior of Cuckoo hash better (there is an article with a title
                something like "Hash with a Stash" that explains it in detail.)

                Big-O analysis is an examination of average case behavior. If your
                hash table is too small and you do not make provision to enlarge it,
                then it can suffer from attack by an opponent, and so we should also
                carefully examine worst-case behavior.

                For example, suppose you have a small hash table with 2000 entries
                that overflows into link-lists.
                Someone tests one trillion strings and finds three million strings
                that map to 4 distinct hash codes (because they have somehow learned
                your hash algorithm).
                The opponent sends these three million strings to your hash table.
                Obviously, we are going to get some long chains, and we will have to
                linear search them to discover if an item is in the table or not.
                These attacks are realistic and do happen in real-life (it turns out
                that the world has a surprisingly large population of whinging twits).

                Now, if we had made the table a lot larger (e.g. 64000 entries) it
                would be a lot harder to find strings that map to the same value.
                And if we had made the spill strategy more robust (e.g. a skiplist
                instead of an ordinary linked list) then the behavior never goes
                catastrophic.

                It's too bad that we have to plan for whinging twits when designing
                data structures, but bad or skewed sequences do happen from time to
                time in real life also, so it is not entirely wasted effort.

                Comment

                • Richard Heathfield

                  #38
                  Re: Benchmarks

                  Nick Keighley said:

                  <snip>
                  CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION
                  >
                  I may get this printed on a tee shirt
                  If you do, fist get it spell-checked.

                  --
                  Richard Heathfield <http://www.cpax.org.uk >
                  Email: -http://www. +rjh@
                  Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
                  "Usenet is a strange place" - dmr 29 July 1999

                  Comment

                  • Richard Heathfield

                    #39
                    Re: Benchmarks

                    Jerry Coffin said:
                    In article <gf120o$jnq$2@h oshi.visyn.net> , spamtrap@gmx.de says...
                    >Juha Nieminen wrote:
                    [...]
                    Personally I prefer to write 100 lines of clear, readable and
                    maintainable code, even if it's 1% slower than 1000 lines of code
                    worthy of the IOCCC.
                    >>
                    > It's always easier to make a bug-free program fast than to
                    > bug-fix a fast program. :)
                    >
                    Oft repeated, but not really true.
                    All generalisations are false...

                    Omitting nail-you-to-the-wall words like "always", I would argue that it's
                    normally easier to make a *clear* program correct than it is to make a
                    correct program clear, and it's normally easier to make a correct program
                    fast than to make a fast program correct. Therefore, my priorities when
                    writing code are:

                    [1] Clarity
                    [2] Correctness
                    [3] Performance

                    These are NOT mutually exclusive! The job isn't done until the program is
                    sufficiently clear *and* sufficiently correct *and* sufficiently fast.

                    <snip>

                    --
                    Richard Heathfield <http://www.cpax.org.uk >
                    Email: -http://www. +rjh@
                    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
                    "Usenet is a strange place" - dmr 29 July 1999

                    Comment

                    • Juha Nieminen

                      #40
                      Re: Benchmarks

                      user923005 wrote:
                      Big-O analysis is an examination of average case behavior.
                      Not true. Big-O denotes the asymptotic worst case behavior.

                      One can say "the algorithm is O(n lg n) *in average*", but AFAIK
                      that's a rather shaky definition. The big-O notation always denotes the
                      upper bound of the asymptotic behavior of the algorithm. In other words,
                      any asymptotic behavior of the algorithm with any given data will always
                      fall below the big-O function curve. If with some data the algorithm
                      behaves asymptotically slower than the given function, then that's not
                      the true big-O function for that algorithm.

                      For example quicksort is O(n^2). (Don't believe what people say. It
                      *is* O(n^2), period.)

                      Comment

                      • Kai-Uwe Bux

                        #41
                        Re: Benchmarks

                        Juha Nieminen wrote:
                        user923005 wrote:
                        >Big-O analysis is an examination of average case behavior.
                        >
                        Not true. Big-O denotes the asymptotic worst case behavior.
                        >
                        One can say "the algorithm is O(n lg n) *in average*", but AFAIK
                        that's a rather shaky definition. The big-O notation always denotes the
                        upper bound of the asymptotic behavior of the algorithm. In other words,
                        any asymptotic behavior of the algorithm with any given data will always
                        fall below the big-O function curve. If with some data the algorithm
                        behaves asymptotically slower than the given function, then that's not
                        the true big-O function for that algorithm.
                        >
                        For example quicksort is O(n^2). (Don't believe what people say. It
                        *is* O(n^2), period.)
                        Big-O notation is oblivious to whether it is used to say something about the
                        worst case or about the average case. It also does not care whether you say
                        something about time or space complexity. In fact, Big-O is a purely
                        mathematical concept: Given two functions f and g, we say that f is a
                        O(g) if there is a constant C such that f(x) <= C g(x) for all x.

                        For computer science, f is often the worst case runtime of an algorithm
                        depending on some input length n (in some measure). So, we say that this
                        algorithm has worst case time complexity O(n^2) if the worst case run time
                        is bounded from above by C n^2 for some C. However, f could be the
                        average runtime or it could be the worst case space complexity, or some
                        other function. Thus, you can use Big-O notation also to say something
                        about worst case space complexity, average case time complexity, or other
                        complexities. The Big-O won't care.


                        Best

                        Kai-Uwe Bux

                        Comment

                        • CBFalconer

                          #42
                          Re: Benchmarks

                          Juha Nieminen wrote:
                          user923005 wrote:
                          >
                          >Big-O analysis is an examination of average case behavior.
                          >
                          Not true. Big-O denotes the asymptotic worst case behavior.
                          >
                          One can say "the algorithm is O(n lg n) *in average*", but AFAIK
                          that's a rather shaky definition. The big-O notation always
                          denotes the upper bound of the asymptotic behavior of the
                          algorithm. In other words, any asymptotic behavior of the
                          algorithm with any given data will always fall below the big-O
                          function curve. If with some data the algorithm behaves
                          asymptotically slower than the given function, then that's not
                          the true big-O function for that algorithm.
                          >
                          For example quicksort is O(n^2). (Don't believe what people say.
                          It *is* O(n^2), period.)
                          I suspect you know what you are talking about, but you are leaving
                          the wrong impression. The sort of sequence that produces O(n*n)
                          quicksort performance is extremely rare, and very slight controls
                          make it even rarer (such as selecting the median of 3 items as the
                          value to partition about).

                          The result is that for most cases quicksort will be the fastest
                          sort. Right behind it are other O(nLOGn) methods. I prefer
                          mergesort and data input in lists, because then I don't have to
                          worry about the size to be sorted. In addition, mergesort is
                          stable (quicksort is not).

                          --
                          [mail]: Chuck F (cbfalconer at maineline dot net)
                          [page]: <http://cbfalconer.home .att.net>
                          Try the download section.

                          Comment

                          • CBFalconer

                            #43
                            Re: Benchmarks

                            Richard Heathfield wrote:
                            Nick Keighley said:
                            >
                            <snip>
                            >
                            >CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION
                            >>
                            >I may get this printed on a tee shirt
                            >
                            If you do, fist get it spell-checked.
                            ^^^^
                            ?? :-) F'ups set.

                            --
                            [mail]: Chuck F (cbfalconer at maineline dot net)
                            [page]: <http://cbfalconer.home .att.net>
                            Try the download section.

                            Comment

                            • Phil Carmody

                              #44
                              Re: Benchmarks

                              CBFalconer <cbfalconer@yah oo.comwrites:
                              Richard Heathfield wrote:
                              >Nick Keighley said:
                              >>
                              ><snip>
                              >>
                              >>CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION
                              >>>
                              >>I may get this printed on a tee shirt
                              >>
                              >If you do, fist get it spell-checked.
                              ^^^^
                              ?? :-) F'ups set.
                              Set, followed, and had fun with.



                              extern volatile poster Chuck;

                              void Richard(post fromNick)
                              {
                              string s("W");
                              while(!Chuck.po sted()) { s+="o"; }
                              s+="sh";
                              Chuck.respond(s );
                              }

                              Phil
                              --
                              I tried the Vista speech recognition by running the tutorial. I was
                              amazed, it was awesome, recognised every word I said. Then I said the
                              wrong word ... and it typed the right one. It was actually just
                              detecting a sound and printing the expected word! -- pbhj on /.

                              Comment

                              • Richard Heathfield

                                #45
                                Re: Benchmarks

                                CBFalconer said:
                                Richard Heathfield wrote:
                                >Nick Keighley said:
                                >>
                                ><snip>
                                >>
                                >>CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION
                                >>>
                                >>I may get this printed on a tee shirt
                                >>
                                >If you do, fist get it spell-checked.
                                ^^^^
                                ?? :-)
                                Since all spelling flames are supposed to include spelling errors, if one
                                doesn't arise naturally it must be inserted, as on this occasion.

                                --
                                Richard Heathfield <http://www.cpax.org.uk >
                                Email: -http://www. +rjh@
                                Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
                                "Usenet is a strange place" - dmr 29 July 1999

                                Comment

                                Working...