Benchmarks

**Phil Carmody** · Nov 7 '08, 05:45 PM

Re: Benchmarks

Juha Nieminen <nospam@thanks. invalidwrites:

user923005 wrote:

>Cuckoo hashing has guaranteed O(1) lookup and delete and amortized
>O(1) insert.

>
Wikipedia says: "Lookup requires inspection of just two locations in
the hash table, which takes constant time in the worst case".
>
I don't understand how that is possible. Suppose an element is
inserted into the table. Then later a new element displaces this first
element to a second location. Then later another new element, to be put
in this second location, displaces this first element to a third
location. How exactly do you find this first element anymore? It's not
in either of the two alternative locations defined by the two hashing
functions for that element.

The insert can be quite a complicated procedure. It might be
amortized O(1), but worst-case, it's O(epic fail).

Phil
--
We must respect the other fellow's religion, but only in the sense and to the
extent that we respect his theory that his wife is beautiful and his children
smart. -- Henry Louis Mencken (1880-1956), American editor and critic

**Jerry Coffin** · Nov 7 '08, 06:55 PM

Re: Benchmarks

In article <87zlkbph9z.fsf @nonospaz.fatph il.org>,
thefatphil_demu nged@yahoo.co.u k says...

[ ... hash table operations ]

The insert can be quite a complicated procedure. It might be
amortized O(1), but worst-case, it's O(epic fail).

That depends on how the hash table is designed. It's entirely possible
(for one example) to create a hash table that uses balanced trees for
collision resolution. This gives a worst case of O(log N), which isn't
really all that awful at all.

--
Later,
Jerry.

The universe is a figment of its own imagination.

**user923005** · Nov 7 '08, 08:15 PM

Re: Benchmarks

On Nov 7, 10:54 am, Jerry Coffin <jcof...@taeus. comwrote:

In article <87zlkbph9z.... @nonospaz.fatph il.org>,
thefatphil_demu n...@yahoo.co.u k says...
>
[ ... hash table operations ]
>

The insert can be quite a complicated procedure. It might be
amortized O(1), but worst-case, it's O(epic fail).

>
That depends on how the hash table is designed. It's entirely possible
(for one example) to create a hash table that uses balanced trees for
collision resolution. This gives a worst case of O(log N), which isn't
really all that awful at all.

I like this scheme:
Say that you want a hash table of 2^20th elements...
Create a hash table of 2^20th elements,
Create a shadow hash table of 2^16th elements
Create a shadow hash table of 2^12th elements
Create a shadow hash table of 2^8th elements, each of which is a
skiplist holding the data type.

You form your 32 bit hash using some nice algorithm like Bob Jenkin's
hash or Goulburn hash.
You mask off the lower 20 bits and this is the address for the large
table.
If there is a collision, you mask off 4 more bits and use the second
largest table.
If there is a collision, you mask off 4 more bits and use the third
largest table.
If there is a collision you take the low byte of the hash and insert
into that skiplist.

If you get a lot more data than you expect, you construct a new larger
table of 2^24th elements and put it in the front as the next new
layer.

Of course, if you want, you can rehash the data each time, but I think
it is better to simply mask. That way, if you suddenly start growing
your skiplists and the first level tables are lightly populated, then
you can detect what is obviously a denial of service attack.

**Phil Carmody** · Nov 7 '08, 10:35 PM

Re: Benchmarks

Jerry Coffin <jcoffin@taeus. comwrites:

In article <87zlkbph9z.fsf @nonospaz.fatph il.org>,
thefatphil_demu nged@yahoo.co.u k says...
>
[ ... hash table operations ]

uhuh - specifically cuckoo hash operations.

>The insert can be quite a complicated procedure. It might be
>amortized O(1), but worst-case, it's O(epic fail).

>
That depends on how the hash table is designed. It's entirely possible

[... to not use a cuckoo hash ...]

Yes, but then it wouldn't be the hash design whose Big-Ohs were
being debated.

Phil
--
We must respect the other fellow's religion, but only in the sense and to the
extent that we respect his theory that his wife is beautiful and his children
smart. -- Henry Louis Mencken (1880-1956), American editor and critic

**Juha Nieminen** · Nov 7 '08, 10:55 PM

Re: Benchmarks

Richard Harter wrote:

It doesn't work like that, though you had me perplexed for a
moment. The first element goes back to its original location,
displacing the second. Thus, suppose that h0, h1, etc are
locations and the first three element are e0, e1, e2, and that
their address pairs are:
>
e0: h0,h1
e1: h0,h2
e2: h1,h3
>
Initially e0 is inserted in location h0. When e1 is inserted, e0
is moved to h1, and e1 is inserted in h0. The contents of the
table look like this:
>
h0: e1
h1: e0
h2: -
h3: -
>
Now e2 is inserted; it goes into location h1, displacing e0. In
turn e0 goes back to its alternate location h0, displacing e1,
which has to go to its alternate location, h2. The table now
looks like this:
>
h0: e0
h1: e2
h2: e1
h3: -

I'm not 100% sure I fully understand it, but anyways, it sound to me
like insertion is O(n) like that (each displaced element could
potentially displace an existing element, which could potentially
displace an existing element, etc, potentially going through O(n)
elements without it going into an infinite loop).

I don't even see how insertion is amortized constant-time. It sounds
to me like insertion is not only O(n), but also could potentially be
amortized linear-time, especially if poor hashing functions are chosen.

There was no conditional ("with properly designed hashing functions")
in the description given by wikipedia for the amortized constant time
insertion, but it made it sound like it's *always* amortized constant
time, regardless of the chosen hashing functions... I can't understand
how it's possible.

Heck, in fact I think it's rather trivial to demonstrate that
insertion can be O(infinite), by simply using these two hashing functions:

f1(x) = 0
f2(x) = 1

With these hashing functions it's impossible to insert more than two
elements in the hash table. Trying to insert a third one is impossible.

Of course these two functions are absolutely extreme, but I think they
demonstrate that the hash table can *not* have constant-time insertion
(or even linear-time, for that matter) for all possible hashing
functions. At most this may be possible for *some* hashing functions.

**CBFalconer** · Nov 7 '08, 11:25 PM

Re: Benchmarks

Phil Carmody wrote:

Jerry Coffin <jcoffin@taeus. comwrites:

>thefatphil_demu nged@yahoo.co.u k says...
>>
>[ ... hash table operations ]

>
uhuh - specifically cuckoo hash operations.
>

>>The insert can be quite a complicated procedure. It might be
>>amortized O(1), but worst-case, it's O(epic fail).

>>
>That depends on how the hash table is designed. It's entirely
>possible ...

>
[... to not use a cuckoo hash ...]
>
Yes, but then it wouldn't be the hash design whose Big-Ohs were
being debated.

I gather this sub-thread applies specifically to the cuckoo hash,
with which I am not familiar. However note that the design of
hashlib specifically prevents the hashtable being more than about
88% full, and normally is in the 44 to 88% range. This
specifically prevents the occurence of long trails with any
reasonable hash function. The organization does specifically
requires table reorganization at intervals. The result is O(1)
operation. The reorganization is considerably cheaper than the
initial construction of that table, because it is not necessary to
check equality in the rehashing operation (we know that the
existing table has no duplicates). At any rate, see:

<http://cbfalconer.home .att.net/download/hashlib.zip>

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home .att.net>
Try the download section.

**user923005** · Nov 7 '08, 11:35 PM

Re: Benchmarks

On Nov 7, 2:47 pm, Juha Nieminen <nos...@thanks. invalidwrote:

Richard Harter wrote:

It doesn't work like that, though you had me perplexed for a
moment. The first element goes back to its original location,
displacing the second. Thus, suppose that h0, h1, etc are
locations and the first three element are e0, e1, e2, and that
their address pairs are:

>

e0: h0,h1
e1: h0,h2
e2: h1,h3

>

Initially e0 is inserted in location h0. When e1 is inserted, e0
is moved to h1, and e1 is inserted in h0. The contents of the
table look like this:

>

h0: e1
h1: e0
h2: -
h3: -

>

Now e2 is inserted; it goes into location h1, displacing e0. In
turn e0 goes back to its alternate location h0, displacing e1,
which has to go to its alternate location, h2. The table now
looks like this:

>

h0: e0
h1: e2
h2: e1
h3: -

>
I'm not 100% sure I fully understand it, but anyways, it sound to me
like insertion is O(n) like that (each displaced element could
potentially displace an existing element, which could potentially
displace an existing element, etc, potentially going through O(n)
elements without it going into an infinite loop).
>
I don't even see how insertion is amortized constant-time. It sounds
to me like insertion is not only O(n), but also could potentially be
amortized linear-time, especially if poor hashing functions are chosen.
>
There was no conditional ("with properly designed hashing functions")
in the description given by wikipedia for the amortized constant time
insertion, but it made it sound like it's *always* amortized constant
time, regardless of the chosen hashing functions... I can't understand
how it's possible.
>
Heck, in fact I think it's rather trivial to demonstrate that
insertion can be O(infinite), by simply using these two hashing functions:
>
f1(x) = 0
f2(x) = 1
>
With these hashing functions it's impossible to insert more than two
elements in the hash table. Trying to insert a third one is impossible.
>
Of course these two functions are absolutely extreme, but I think they
demonstrate that the hash table can *not* have constant-time insertion
(or even linear-time, for that matter) for all possible hashing
functions. At most this may be possible for *some* hashing functions.

Sure. If hash(x) == return 1; then bad behavior is to be expected.
The assumption is of maximally cascade free hash functions like UMAC
or Bob Jenkin's hash.

There are also little tweaky things done to make the "real-life"
behavior of Cuckoo hash better (there is an article with a title
something like "Hash with a Stash" that explains it in detail.)

Big-O analysis is an examination of average case behavior. If your
hash table is too small and you do not make provision to enlarge it,
then it can suffer from attack by an opponent, and so we should also
carefully examine worst-case behavior.

For example, suppose you have a small hash table with 2000 entries
that overflows into link-lists.
Someone tests one trillion strings and finds three million strings
that map to 4 distinct hash codes (because they have somehow learned
your hash algorithm).
The opponent sends these three million strings to your hash table.
Obviously, we are going to get some long chains, and we will have to
linear search them to discover if an item is in the table or not.
These attacks are realistic and do happen in real-life (it turns out
that the world has a surprisingly large population of whinging twits).

Now, if we had made the table a lot larger (e.g. 64000 entries) it
would be a lot harder to find strings that map to the same value.
And if we had made the spill strategy more robust (e.g. a skiplist
instead of an ordinary linked list) then the behavior never goes
catastrophic.

It's too bad that we have to plan for whinging twits when designing
data structures, but bad or skewed sequences do happen from time to
time in real life also, so it is not entirely wasted effort.

**Richard Heathfield** · Nov 8 '08, 05:45 AM

Re: Benchmarks

Nick Keighley said:

<snip>

CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION
>
I may get this printed on a tee shirt

If you do, fist get it spell-checked.

--
Richard Heathfield <http://www.cpax.org.uk >
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

**Richard Heathfield** · Nov 8 '08, 05:45 AM

Re: Benchmarks

Jerry Coffin said:

In article <gf120o$jnq$2@h oshi.visyn.net> , spamtrap@gmx.de says...

>Juha Nieminen wrote:

[...]
Personally I prefer to write 100 lines of clear, readable and
maintainable code, even if it's 1% slower than 1000 lines of code
worthy of the IOCCC.

>>
> It's always easier to make a bug-free program fast than to
> bug-fix a fast program. :)

>
Oft repeated, but not really true.

All generalisations are false...

Omitting nail-you-to-the-wall words like "always", I would argue that it's
normally easier to make a *clear* program correct than it is to make a
correct program clear, and it's normally easier to make a correct program
fast than to make a fast program correct. Therefore, my priorities when
writing code are:

[1] Clarity
[2] Correctness
[3] Performance

These are NOT mutually exclusive! The job isn't done until the program is
sufficiently clear *and* sufficiently correct *and* sufficiently fast.

<snip>

--
Richard Heathfield <http://www.cpax.org.uk >
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

**Juha Nieminen** · Nov 8 '08, 11:05 PM

Re: Benchmarks

user923005 wrote:

Big-O analysis is an examination of average case behavior.

Not true. Big-O denotes the asymptotic worst case behavior.

One can say "the algorithm is O(n lg n) *in average*", but AFAIK
that's a rather shaky definition. The big-O notation always denotes the
upper bound of the asymptotic behavior of the algorithm. In other words,
any asymptotic behavior of the algorithm with any given data will always
fall below the big-O function curve. If with some data the algorithm
behaves asymptotically slower than the given function, then that's not
the true big-O function for that algorithm.

For example quicksort is O(n^2). (Don't believe what people say. It
*is* O(n^2), period.)

**Kai-Uwe Bux** · Nov 9 '08, 12:15 AM

Re: Benchmarks

Juha Nieminen wrote:

user923005 wrote:

>Big-O analysis is an examination of average case behavior.

>
Not true. Big-O denotes the asymptotic worst case behavior.
>
One can say "the algorithm is O(n lg n) *in average*", but AFAIK
that's a rather shaky definition. The big-O notation always denotes the
upper bound of the asymptotic behavior of the algorithm. In other words,
any asymptotic behavior of the algorithm with any given data will always
fall below the big-O function curve. If with some data the algorithm
behaves asymptotically slower than the given function, then that's not
the true big-O function for that algorithm.
>
For example quicksort is O(n^2). (Don't believe what people say. It
*is* O(n^2), period.)

Big-O notation is oblivious to whether it is used to say something about the
worst case or about the average case. It also does not care whether you say
something about time or space complexity. In fact, Big-O is a purely
mathematical concept: Given two functions f and g, we say that f is a
O(g) if there is a constant C such that f(x) <= C g(x) for all x.

For computer science, f is often the worst case runtime of an algorithm
depending on some input length n (in some measure). So, we say that this
algorithm has worst case time complexity O(n^2) if the worst case run time
is bounded from above by C n^2 for some C. However, f could be the
average runtime or it could be the worst case space complexity, or some
other function. Thus, you can use Big-O notation also to say something
about worst case space complexity, average case time complexity, or other
complexities. The Big-O won't care.

Best

Kai-Uwe Bux

**CBFalconer** · Nov 9 '08, 12:55 AM

Re: Benchmarks

Juha Nieminen wrote:

user923005 wrote:
>

>Big-O analysis is an examination of average case behavior.

>
Not true. Big-O denotes the asymptotic worst case behavior.
>
One can say "the algorithm is O(n lg n) *in average*", but AFAIK
that's a rather shaky definition. The big-O notation always
denotes the upper bound of the asymptotic behavior of the
algorithm. In other words, any asymptotic behavior of the
algorithm with any given data will always fall below the big-O
function curve. If with some data the algorithm behaves
asymptotically slower than the given function, then that's not
the true big-O function for that algorithm.
>
For example quicksort is O(n^2). (Don't believe what people say.
It *is* O(n^2), period.)

I suspect you know what you are talking about, but you are leaving
the wrong impression. The sort of sequence that produces O(n*n)
quicksort performance is extremely rare, and very slight controls
make it even rarer (such as selecting the median of 3 items as the
value to partition about).

The result is that for most cases quicksort will be the fastest
sort. Right behind it are other O(nLOGn) methods. I prefer
mergesort and data input in lists, because then I don't have to
worry about the size to be sorted. In addition, mergesort is
stable (quicksort is not).

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home .att.net>
Try the download section.

**CBFalconer** · Nov 9 '08, 12:55 AM

Re: Benchmarks

Richard Heathfield wrote:

Nick Keighley said:
>
<snip>
>

>CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION
>>
>I may get this printed on a tee shirt

>
If you do, fist get it spell-checked.

^^^^
?? :-) F'ups set.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home .att.net>
Try the download section.

**Phil Carmody** · Nov 9 '08, 01:05 AM

Re: Benchmarks

CBFalconer <cbfalconer@yah oo.comwrites:

Richard Heathfield wrote:

>Nick Keighley said:
>>
><snip>
>>

>>CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION
>>>
>>I may get this printed on a tee shirt

>>
>If you do, fist get it spell-checked.

^^^^
?? :-) F'ups set.

Set, followed, and had fun with.

extern volatile poster Chuck;

void Richard(post fromNick)
{
string s("W");
while(!Chuck.po sted()) { s+="o"; }
s+="sh";
Chuck.respond(s );
}

Phil
--
I tried the Vista speech recognition by running the tutorial. I was
amazed, it was awesome, recognised every word I said. Then I said the
wrong word ... and it typed the right one. It was actually just
detecting a sound and printing the expected word! -- pbhj on /.

**Richard Heathfield** · Nov 9 '08, 04:25 AM

Re: Benchmarks

CBFalconer said:

Richard Heathfield wrote:

>Nick Keighley said:
>>
><snip>
>>

>>CAREFUL ALGORITH SELECTION CAN CREAM MICRO-OPTIMISATION
>>>
>>I may get this printed on a tee shirt

>>
>If you do, fist get it spell-checked.

^^^^
?? :-)

Since all spelling flames are supposed to include spelling errors, if one
doesn't arise naturally it must be inserted, as on this occasion.

--
Richard Heathfield <http://www.cpax.org.uk >
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Benchmarks

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment