Benchmarks

  • Nick Keighley

    #16
    Re: Benchmarks

On 6 Nov, 15:53, s0s...@gmail.com wrote:
    The task: Write a program that reads a set of words from standard
    input and prints the number of distinct words.
    >
    I came across a website that listed a few programs to accomplish this
    task: http://unthought.net/c++/c_vs_c++.html (ignore all the language
    flaming :-), and thought that all of them did unnecessary operations,
    so I wrote my own. But for some reason, my version turned out slower
    than ALL of the versions on the website, even though it seems to
    perform fewer operations (yes, I benchmarked them on my own computer).
    >
    According to the website, the slowest version is:
    >
    #include <set>
    #include <string>
    #include <iostream>

    int main(int argc, char **argv)
    {
            // Declare and Initialize some variables
            std::string word;
            std::set<std::string> wordcount;
            // Read words and insert in rb-tree
            while (std::cin >> word) wordcount.insert(word);
            // Print the result
            std::cout << "Words: " << wordcount.size() << std::endl;
            return 0;
    }
    the above uses an rb tree (or equivalent) in the set class

    My version is about 12 times slower than that. It uses lower-level
    constructs. Here it is:
    [snip version using linear search]
    Any ideas as to what causes the big slowdown?

    this is a very important lesson. Print it out in big letters
    and post it on your wall.

    CAREFUL ALGORITHM SELECTION CAN CREAM MICRO-OPTIMISATION

    I may get this printed on a tee shirt

    --
    Nick Keighley


    • CBFalconer

      #17
      Re: Benchmarks

      Juha Nieminen wrote:
      s0suk3@gmail.com wrote:
      >
      >Any ideas as to what causes the big slowdown?
      >
      Why do you expect that searching a linked list could be even
      close to the speed of searching a balanced binary tree?
      Which is O(log N), as compared to O(N). However a hashtable is
      even faster, being O(1) for suitable organization.

      F'ups set to c.l.c. only.

      --
      [mail]: Chuck F (cbfalconer at maineline dot net)
      [page]: <http://cbfalconer.home.att.net>
      Try the download section.
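
      The hash-table approach Chuck mentions can be sketched against the same
      task as the std::set program quoted upthread. This is an illustrative
      sketch, not code from the thread; it uses C++11's std::unordered_set
      (a hash table with expected O(1) insert/lookup), and factors the work
      into a function so it can be tested on any stream:

      ```cpp
      #include <iostream>
      #include <sstream>
      #include <string>
      #include <unordered_set>

      // Count distinct whitespace-separated words on the stream.
      // std::unordered_set is a hash table: expected O(1) per insert,
      // versus O(log N) for the rb-tree-backed std::set version.
      std::size_t count_distinct(std::istream& in)
      {
          std::unordered_set<std::string> words;
          std::string word;
          while (in >> word)
              words.insert(word);
          return words.size();
      }

      int main()
      {
          std::istringstream in("the quick brown fox the lazy dog the");
          std::cout << "Words: " << count_distinct(in) << '\n';  // prints "Words: 6"
          return 0;
      }
      ```

      Feeding std::cin instead of the istringstream reproduces the original
      program's behaviour.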


      • CBFalconer

        #18
        Re: Benchmarks

        user923005 wrote:
        s0s...@gmail.com wrote:
        >
        >The task: Write a program that reads a set of words from
        >standard input and prints the number of distinct words.
        >
        [snip]
        >
        I am surprised chuck has not chimed in here.
        A hash table is *ideal* for this.
        I just did, a few minutes ago. Been having problems with the
        news-server.

        --
        [mail]: Chuck F (cbfalconer at maineline dot net)
        [page]: <http://cbfalconer.home.att.net>
        Try the download section.


        • Juha Nieminen

          #19
          Re: Benchmarks

          CBFalconer wrote:
          Juha Nieminen wrote:
          >s0suk3@gmail.com wrote:
          >>
          >>Any ideas as to what causes the big slowdown?
          >Why do you expect that searching a linked list could be even
          >close to the speed of searching a balanced binary tree?
          >
          Which is O(log N), as compared to O(N). However a hashtable is
          even faster, being O(1) for suitable organization.
          >
          F'ups set to c.l.c. only.
          Why? Because computational complexities only apply to C?


          • Phil Carmody

            #20
            Re: Benchmarks

            Juha Nieminen <nospam@thanks.invalid> writes:
            user923005 wrote:
            >The hash table solution is O(n).
            >
            You would have a hard time proving that.
            He shouldn't. It should be pretty clear. No hash table needs
            to examine each element more than once.

            Phil
            --
            We must respect the other fellow's religion, but only in the sense and to the
            extent that we respect his theory that his wife is beautiful and his children
            smart. -- Henry Louis Mencken (1880-1956), American editor and critic


            • Juha Nieminen

              #21
              Re: Benchmarks

              Nick Keighley wrote:
              CAREFUL ALGORITHM SELECTION CAN CREAM MICRO-OPTIMISATION
              That's so true. However, we shouldn't completely dismiss
              "micro-optimization" and "hacker-optimization" once we have come
              up with the fastest possible overall algorithm. Even though two
              implementations might both be e.g. O(n lg n) with a rather small
              constant factor, one of them might still be 10 times faster than
              the other if it performs clever low-level tricks inside the
              innermost loop (which might be run e.g. millions of times).

              Of course when dealing with hacker optimization, there's always
              a point where it's too detrimental to the readability and
              maintainability of the code in relation to the speed benefit it
              gives. If the code becomes significantly more obfuscated while
              giving a speed boost of 1%, it might not be worth the trouble.
              Personally I prefer to write 100 lines of clear, readable and
              maintainable code, even if it's 1% slower than 1000 lines of code
              worthy of the IOCCC.


              • Phil Carmody

                #22
                Re: Benchmarks

                Phil Carmody <thefatphil_demunged@yahoo.co.uk> writes:
                Juha Nieminen <nospam@thanks.invalid> writes:
                >user923005 wrote:
                >>The hash table solution is O(n).
                >>
                > You would have a hard time proving that.
                >
                He shouldn't. It should be pretty clear. No hash table needs
                to examine each element more than once.
                Oooops, suffering from Falconeritis - but you did over-snip rather.
                The whole task is indeed o(n^2), but omega(n).

                Phil
                --
                We must respect the other fellow's religion, but only in the sense and to the
                extent that we respect his theory that his wife is beautiful and his children
                smart. -- Henry Louis Mencken (1880-1956), American editor and critic


                • Hendrik Schober

                  #23
                  Re: Benchmarks

                  Juha Nieminen wrote:
                  [...]
                  Personally I prefer to write 100 lines of clear, readable and
                  maintainable code, even if it's 1% slower than 1000 lines of code
                  worthy of the IOCCC.
                  It's always easier to make a bug-free program fast than to
                  bug-fix a fast program. :)

                  Schobi


                  • Juha Nieminen

                    #24
                    Re: Benchmarks

                    user923005 wrote:
                    Cuckoo hashing has guaranteed O(1) lookup and delete and amortized
                    O(1) insert.
                    Wikipedia says: "Lookup requires inspection of just two locations in
                    the hash table, which takes constant time in the worst case".

                    I don't understand how that is possible. Suppose an element is
                    inserted into the table. Then later a new element displaces this first
                    element to a second location. Then later another new element, to be put
                    in this second location, displaces this first element to a third
                    location. How exactly do you find this first element anymore? It's not
                    in either of the two alternative locations defined by the two hashing
                    functions for that element.


                    • James Kanze

                      #25
                      Re: Benchmarks

                      On Nov 7, 10:25 am, Juha Nieminen <nos...@thanks.invalid> wrote:
                      Nick Keighley wrote:
                      CAREFUL ALGORITHM SELECTION CAN CREAM MICRO-OPTIMISATION
                      That's so true. However, we shouldn't completely dismiss
                      "micro-optimization" and "hacker-optimization" once we have
                      come up with the fastest possible overall algorithm.
                      Yes, but this should only be undertaken when necessary, under
                      the control of a profiler---what optimizes on one platform may
                      slow down on another. And don't forget that a good compiler
                      will do many of these optimizations for you. A long time ago, I
                      tried replacing "h = 127 * h + *p" (in a hashing algorithm) with
                      "h = (h << 7) - h + *p", knowing that my hardware didn't have
                      hardware multiply. The results were slower than the original;
                      the compiler also converted the multiplication into a shift and
                      subtract, and since it knew the final purpose in those
                      operations, was actually able to save one machine instruction
                      over an explicit shift and subtract.
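
                      The two variants Kanze compares can be written out. This is a
                      reconstruction for illustration (function names are mine; only
                      the multiplier 127 and the shift-and-subtract identity
                      127*h == (h << 7) - h come from his post):

                      ```cpp
                      #include <cassert>
                      #include <cstdint>
                      #include <string>

                      // Multiplicative hash as posted: h = 127 * h + *p.
                      // A decent compiler strength-reduces the multiply itself
                      // on hardware without a fast multiplier.
                      std::uint32_t hash_mul(const std::string& s)
                      {
                          std::uint32_t h = 0;
                          for (unsigned char c : s)
                              h = 127u * h + c;
                          return h;
                      }

                      // Hand-written shift-and-subtract: 127*h == (h << 7) - h.
                      std::uint32_t hash_shift(const std::string& s)
                      {
                          std::uint32_t h = 0;
                          for (unsigned char c : s)
                              h = (h << 7) - h + c;
                          return h;
                      }

                      int main()
                      {
                          // Both compute identical values (mod 2^32); the point of
                          // the anecdote is that the compiler's version was no
                          // slower, so the hand optimization bought nothing.
                          assert(hash_mul("benchmark") == hash_shift("benchmark"));
                          return 0;
                      }
                      ```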
                      Even though two implementations might both be e.g. O(n lg n)
                      with a rather small constant factor, one of them might still
                      be 10 times faster than the other if it performs clever
                      low-level tricks inside the innermost loop (which might be run
                      eg. millions of times).
                      I've never seen a factor of 10 here. Even a factor of 2 is
                      exceptional (and usually indicative of a very poor optimizer in
                      the compiler).

                      --
                      James Kanze (GABI Software) email: james.kanze@gmail.com
                      Conseils en informatique orientée objet/
                      Beratung in objektorientierter Datenverarbeitung
                      9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


                      • James Dow Allen

                        #26
                        Re: Benchmarks

                        On Nov 7, 4:51 am, c...@tiac.net (Richard Harter) wrote:
                        Just as a note, the common claim that hash table are O(1) per
                        access whereas search trees are O(log n) per access is
                        misleading, ...computing the hash code ...
                        is necessarily greater
                        than log2(n)....
                        hash table      - O(log n)
                        Note that, for sufficiently pedantic O(.)
                        estimates, many of the usual O(.) measures are wrong.
                        A single arithmetic op may be considered to require
                        O(log N) time since log N bits are needed to represent
                        an index on a set of size N.

                        Returning to OP's problem, I wonder: are Compact
                        Hash Trees in wide use? If there's interest
                        I'll post my implementation.

                        James Dow Allen


                        • Juha Nieminen

                          #27
                          Re: Benchmarks

                          James Kanze wrote:
                          >Even though two implementations might both be e.g. O(n lg n)
                          >with a rather small constant factor, one of them might still
                          >be 10 times faster than the other if it performs clever
                          >low-level tricks inside the innermost loop (which might be run
                          >eg. millions of times).
                          >
                          I've never seen a factor of 10 here. Even a factor of 2 is
                          exceptional (and usually indicative of a very poor optimizer in
                          the compiler).
                          There's just so much a compiler can do. There are cases where only the
                          programmer (rather than the compiler) can understand what's going on and
                          be able to optimize something.

                          For example, comparing two strings for inequality is O(1) if the
                          maximum size of the string is fixed (eg. you know that no string has
                          more than 30 characters), but still much slower than comparing two
                          integers. If you can somehow change the string comparison to an integer
                          comparison, and these comparisons are performed millions of times in the
                          innermost loop of your code, the program may speed up considerably. This
                          is a kind of optimization which is completely out of the realm of
                          compiler optimizations.
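
                          The string-to-integer transformation described above is
                          essentially string interning. A minimal sketch of the idea
                          (the class and names are mine, not from the thread): map each
                          distinct string to a small id once, then compare ids.

                          ```cpp
                          #include <cassert>
                          #include <string>
                          #include <unordered_map>

                          // Intern pool: assigns each distinct string a small integer id.
                          // After interning, equality tests in a hot loop are single
                          // integer compares instead of character-by-character compares.
                          class Interner {
                          public:
                              int intern(const std::string& s)
                              {
                                  auto it = ids_.find(s);
                                  if (it != ids_.end())
                                      return it->second;
                                  int id = static_cast<int>(ids_.size());
                                  ids_.emplace(s, id);
                                  return id;
                              }
                          private:
                              std::unordered_map<std::string, int> ids_;
                          };

                          int main()
                          {
                              Interner pool;
                              int a = pool.intern("benchmark");
                              int b = pool.intern("optimise");
                              int c = pool.intern("benchmark");
                              assert(a == c);  // same string, same id: one int compare
                              assert(a != b);
                              return 0;
                          }
                          ```

                          The interning cost is paid once per string; the payoff comes
                          when the same values are compared millions of times afterwards.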


                          • Jerry Coffin

                            #28
                            Re: Benchmarks

                            In article <79b10b2f-6abf-4440-9d04-f731e6b51bf2
                            @f40g2000pri.googlegroups.com>, james.kanze@gmail.com says...

                            [ ... ]
                            Even though two implementations might both be e.g. O(n lg n)
                            with a rather small constant factor, one of them might still
                            be 10 times faster than the other if it performs clever
                            low-level tricks inside the innermost loop (which might be run
                            eg. millions of times).
                            >
                            I've never seen a factor of 10 here. Even a factor of 2 is
                            exceptional (and usually indicative of a very poor optimizer in
                            the compiler).
                            I've seen a factor of 10, and even a _little_ bit more -- but it was
                            _extremely_ hardware specific, depending almost entirely on the cache
                            organization of the processor. As the data was originally placed in
                            memory, two crucial items ended up contending for the same cache lines.
                            Rearranging the data allowed those items to live in different cache
                            lines.

                            In this case, the CPU involved wasn't even made when the compiler was
                            written, so I have a hard time blaming the compiler for the poor
                            optimization. Unfortunately, I don't know of any newer compilers that
                            would do much better (and when they do, it's mostly by accident...)

                            --
                            Later,
                            Jerry.

                            The universe is a figment of its own imagination.


                            • Jerry Coffin

                              #29
                              Re: Benchmarks

                              In article <gf120o$jnq$2@hoshi.visyn.net>, spamtrap@gmx.de says...
                              Juha Nieminen wrote:
                              [...]
                              Personally I prefer to write 100 lines of clear, readable and
                              maintainable code, even if it's 1% slower than 1000 lines of code
                              worthy of the IOCCC.
                              >
                              It's always easier to make a bug-free program fast than to
                              bug-fix a fast program. :)
                              Oft repeated, but not really true. Sometimes a fast program still
                              contains a trivial bug, but I can remember one program in particular
                              that was bug-free as far as I know, but a horrible mess, to the point
                              that by _far_ the easiest way to improve it was write a set of
                              specifications for what it did, and start over. By the time I realized
                              that, however, I'd spent more time trying to optimize one small part
                              than it eventually took to rewrite the whole thing...

                              --
                              Later,
                              Jerry.

                              The universe is a figment of its own imagination.


                              • Richard Harter

                                #30
                                Re: Benchmarks

                                On Fri, 07 Nov 2008 09:41:08 GMT, Juha Nieminen
                                <nospam@thanks.invalid> wrote:
                                >user923005 wrote:
                                >Cuckoo hashing has guaranteed O(1) lookup and delete and amortized
                                >O(1) insert.
                                >
                                Wikipedia says: "Lookup requires inspection of just two locations in
                                >the hash table, which takes constant time in the worst case".
                                >
                                I don't understand how that is possible. Suppose an element is
                                >inserted into the table. Then later a new element displaces this first
                                >element to a second location. Then later another new element, to be put
                                >in this second location, displaces this first element to a third
                                >location. How exactly do you find this first element anymore? It's not
                                >in either of the two alternative locations defined by the two hashing
                                >functions for that element.
                                It doesn't work like that, though you had me perplexed for a
                                moment. The first element goes back to its original location,
                                displacing the second. Thus, suppose that h0, h1, etc are
                                locations and the first three elements are e0, e1, e2, and that
                                their address pairs are:

                                e0: h0,h1
                                e1: h0,h2
                                e2: h1,h3

                                Initially e0 is inserted in location h0. When e1 is inserted, e0
                                is moved to h1, and e1 is inserted in h0. The contents of the
                                table look like this:

                                h0: e1
                                h1: e0
                                h2: -
                                h3: -

                                Now e2 is inserted; it goes into location h1, displacing e0. In
                                turn e0 goes back to its alternate location h0, displacing e1,
                                which has to go to its alternate location, h2. The table now
                                looks like this:

                                h0: e0
                                h1: e2
                                h2: e1
                                h3: -
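
                                The displacement sequence above corresponds to the
                                standard cuckoo insertion loop. A toy sketch for
                                illustration (class name, table size, and the two hash
                                functions are mine, not from the thread; a real table
                                would rehash on failure):

                                ```cpp
                                #include <cassert>
                                #include <string>
                                #include <vector>

                                // Toy cuckoo hash set of strings. Each key has exactly two
                                // candidate slots, h1(k) and h2(k), so lookup inspects at
                                // most two locations: the worst-case O(1) claim.
                                class CuckooSet {
                                public:
                                    explicit CuckooSet(std::size_t n) : slots_(n) {}

                                    bool contains(const std::string& k) const
                                    {
                                        return slots_[h1(k)] == k || slots_[h2(k)] == k;
                                    }

                                    // Returns false on a displacement cycle
                                    // (a real implementation would rehash).
                                    bool insert(std::string k)
                                    {
                                        if (contains(k)) return true;
                                        std::size_t pos = h1(k);
                                        for (std::size_t i = 0; i < slots_.size(); ++i) {
                                            std::swap(k, slots_[pos]);   // place k, evict occupant
                                            if (k.empty()) return true;  // slot was free: done
                                            // Evicted key moves to its *other* candidate slot,
                                            // exactly as in the h0/h1/h2 walkthrough above.
                                            pos = (pos == h1(k)) ? h2(k) : h1(k);
                                        }
                                        return false;
                                    }

                                private:
                                    std::size_t h1(const std::string& k) const
                                    {
                                        return std::hash<std::string>{}(k) % slots_.size();
                                    }
                                    std::size_t h2(const std::string& k) const
                                    {
                                        return (std::hash<std::string>{}(k) / slots_.size() + 1)
                                               % slots_.size();
                                    }
                                    std::vector<std::string> slots_;  // "" marks an empty slot
                                };

                                int main()
                                {
                                    CuckooSet t(16);
                                    assert(t.insert("e0"));
                                    assert(t.insert("e1"));
                                    assert(t.insert("e2"));
                                    assert(t.contains("e0") && t.contains("e1") && t.contains("e2"));
                                    assert(!t.contains("e3"));
                                    return 0;
                                }
                                ```

                                Note how insert never loses an element: each eviction sends
                                the displaced key to its alternate slot, which is what makes
                                the two-probe lookup sound.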

                                Richard Harter, cri@tiac.net
                                http://home.tiac.net/~cri, http://www.varinoma.com
                                Save the Earth now!!
                                It's the only planet with chocolate.

