Hash function of structs

**James Dow Allen** · Jun 27 '08, 07:37 PM

Re: Hash function of structs

On May 19, 3:10 pm, CBFalconer <cbfalco...@yahoo.com> wrote:
> Alexander Mahone wrote:
>> Hello, I'm looking for a hash function to be used for a hash
>> table that will contain structs of a certain kind. I've looked
>> into Sourceforge.net, but so far I've found only hash functions
>> for strings (string->index).
>
> Try: <http://cbfalconer.home.att.net/download/hashlib.zip>
>
> Written in standard C, and released under GPL.

Alexander has keys which are not strings
(terminated by null characters) but are defined by lengths.
Chuck's hashlib handles strings, so you will need to code
your own routines to handle your keys (structs).
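
For illustration only, here is a minimal sketch of what such a routine for a length-defined key might look like. FNV-1a is swapped in as a simple, well-known byte hash; it is not taken from hashlib or from Alexander's code, and `struct point_key`, `fnv1a_update` and `hash_point_key` are hypothetical names. The members are hashed one by one rather than hashing `sizeof(struct)` raw bytes, because padding bytes inside a struct have unspecified values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key type; the original poster's struct is not shown. */
struct point_key {
    int32_t x;
    int32_t y;
    char    tag;   /* probably followed by padding bytes */
};

/* Feed len bytes into a running 32-bit FNV-1a state and return the new state. */
uint32_t fnv1a_update(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;            /* FNV prime */
    }
    return h;
}

/* Hash the key member by member so that padding never enters the hash. */
uint32_t hash_point_key(const struct point_key *k)
{
    uint32_t h = 2166136261u;      /* FNV offset basis */

    h = fnv1a_update(h, &k->x, sizeof k->x);
    h = fnv1a_update(h, &k->y, sizeof k->y);
    h = fnv1a_update(h, &k->tag, sizeof k->tag);
    return h;
}
```

The table index would then be the hash reduced to the table size; the matching comparison routine should compare the same members that were hashed.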

As others have mentioned, there are many good hash-table
handling routines available. I do *not* recommend GNU's
hsearch() (or hsearch_r() for reentrance), but will mention
it anyway since it is "standard" (available in some form
in standard GNU and BSD libraries), and very similar to
Chuck's in most ways: GPL license, handles strings, etc.
Like Chuck's hashlib, FSF hsearch() is available
in source code form.

hsearch() has a much *much* simpler interface than
Chuck's hashlib. Chuck's overly complex interface will
just get in the way unless you need it -- and if you do
need a more flexible interface than hsearch_r()'s you're
likely to find Chuck's interface also inadequate.
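
To give a feel for how small that interface is, here is a hedged sketch using the POSIX `hsearch()` family from `<search.h>` (`hsearch_r()` is a GNU extension with the same shape plus an explicit table argument). The `table_*` wrapper names are hypothetical. Note that the keys are C strings, which is exactly the limitation discussed above for struct keys.

```c
#include <search.h>
#include <stddef.h>

/* Create the single global table; nel is a capacity hint. Returns 0 on failure. */
int table_open(size_t nel)
{
    return hcreate(nel);
}

/* Insert key -> value. If the key is already present, hsearch() returns the
   existing entry unchanged. Returns NULL if the table is full. */
ENTRY *table_put(char *key, void *value)
{
    ENTRY item;

    item.key = key;
    item.data = value;
    return hsearch(item, ENTER);
}

/* Look a key up; returns the stored data pointer, or NULL if absent. */
void *table_get(char *key)
{
    ENTRY item, *found;

    item.key = key;
    item.data = NULL;
    found = hsearch(item, FIND);
    return found ? found->data : NULL;
}

void table_close(void)
{
    hdestroy();
}
```

The table stores only the pointers it is given, so the key strings must outlive the table.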

hsearch_r() is not appropriate when memory cost is an
issue: it does no "quotienting" and typically wastes
8 bytes of overhead on every table entry. Again,
however, Chuck's hashlib suffers the same deficiency.

But the main reason I find it impossible to recommend
Chuck's code is that it is infested with gross time
inefficiencies. I'm sure it would get a passing grade
in an undergraduate programming class when the
instructor stipulates that speed is of no concern,
but I find it bizarre that anyone would tout this as a
developed routine for professional use.

As one example of gross time inefficiency, Chuck's code
performs a completely unnecessary division on *every*
reprobe! Of course these gross inefficiencies won't
matter to you if speed isn't critical, but still
certainly make one wonder about the competence and sincerity
of the coding.
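
hashlib's source is not quoted in this thread, so the following is only an illustrative sketch of the general point, not the actual code or the actual fix. In a double-hashing reprobe, wrapping the index with `%` costs a division on every step. When both the current index and the step are already smaller than the table size, one compare and one subtract give the same result. The function names are hypothetical.

```c
#include <stddef.h>

/* Reprobe that reduces with a modulo (a division on most CPUs) every step. */
size_t reprobe_div(size_t index, size_t step, size_t size)
{
    return (index + step) % size;
}

/* Same result without dividing. Valid when index < size and step < size,
   so index + step < 2 * size and one subtraction is enough. */
size_t reprobe_nodiv(size_t index, size_t step, size_t size)
{
    index += step;
    if (index >= size)
        index -= size;
    return index;
}
```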

The bizarre division (which will be a severe time waster
on some architectures) was pointed out to Chuck several
years ago, along with the trivial source code fix required,
but AFAIK he's never bothered to fix this bug. Again, this
makes one wonder why we should be expected to treat hashlib
as a serious "product".

Without trying to impugn Chuck specifically, I think
many programmers should hone their skills with simple
arithmetic. No special expertise is needed to see and fix
the time-waster in Chuck's hashlib.

On the topic of facility with simple arithmetic, in
<1137404157.948899.75770@o13g2000cwo.googlegroups.com>
two years ago I described a neat multiplication method:

> If you've not seen the trick before, consider it a puzzle
> to reverse engineer it from the fragment:
>     c = arr[x+y] - arr[x-y]; /* c = x * y */
> It gives competitive performance even on some machines with
> blazingly fast multiplies.

Try to solve this puzzle yourself if you didn't see it then.
Several responders were intrigued by this method,
although it took some follow-on messages to confirm
that the single line above did the trick with no further
testing or shifting.
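
(Spoiler for the puzzle.) The earlier message is not reproduced here, so this is only one reading that is consistent with the fragment: the classic quarter-square identity, with `arr[n]` holding floor(n*n/4). Because x+y and x-y always have the same parity, the two truncations cancel and the difference is exactly x*y. The table bounds, `qs_init` and `qs_mul` are hypothetical names chosen for this sketch.

```c
#include <stdint.h>

#define QS_MAX  255                /* largest |x| and |y| supported */
#define QS_SPAN (2 * QS_MAX)       /* largest |x + y| or |x - y| */

static int32_t qs_table[2 * QS_SPAN + 1];

/* arr points at the middle of the table so negative indices are valid. */
static int32_t *const arr = qs_table + QS_SPAN;

/* Fill arr[n] = floor(n * n / 4) for every n in [-QS_SPAN, QS_SPAN]. */
void qs_init(void)
{
    int32_t n;

    for (n = -QS_SPAN; n <= QS_SPAN; n++)
        arr[n] = (n * n) / 4;      /* n * n is never negative */
}

/* Multiply with two table lookups and one subtraction, no testing or shifting. */
int32_t qs_mul(int x, int y)
{
    return arr[x + y] - arr[x - y];   /* c = x * y */
}
```

The table must be filled once with `qs_init()` before `qs_mul()` is used, and both operands must stay within `QS_MAX` in magnitude.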

Amusingly, Chuck Falconer posted *three* follow-ups to this
code, first complaining that it didn't work, then finally
saying he'd need to review algebra to confirm it did work!

Summary: Get your hash routine from someone who understands
simple arithmetic.

James Dow Allen

**Flash Gordon** · Jun 27 '08, 07:37 PM

Re: Hash function of structs

CBFalconer wrote, On 21/05/08 05:42:
> Eligiusz Narutowicz wrote:
>> Harald van Dĳk <truedfx@gmail.com> writes:
>>> CBFalconer wrote:
>>>> Eligiusz Narutowicz wrote:
>>>>> CBFalconer <cbfalconer@yahoo.com> writes:
>>>>>>
>>>>>> I suspect you will find hashlib better than anything on
>>>>>> sourceforge (but I could be wrong). The problem is the finding.
>>>>> You are very wrong. Your hashlib is very primitive. There are
>>>>> much better offering things in various packages on source forge.
>>>> You are a troll, and have no knowledge of these things.
>>> Calling anyone a troll for suggesting your hashlib is not the best
>>> hashlib in the world is an extremely poor way to be taken
>>> seriously, if you ask me.
>
> True enough. I lost my head at the idiotic comment. EN has
> obviously never read and/or used hashlib.
>
>> Someone explains already better than my English could do about
>> what is on source forge. I find this CBFalconer to be a bit of
>> a big head and more wrong than right in his postings to these
>> groups. There are many of projects in source forge which contain
>> code from far better programmers than CBFalconer is every wanting
>> to be capable of. And for wider issues than he can imagine. I
>> have seen his code and it is not of the best calibre to be honest
>> with you. It is ok for sure but is not of the commercial types
>> found in source forge projects like rdbms sw.
>
> Similarly your comments above. When you can't make specific
> complaints, the comment is totally worthless. For example, I tried
> to take a look at:
>
> <http://sourceforge.net/projects/uthash/>
>
> recommended by Dann Corbit a while back. That turns out to be a
> non-portable system (it depends on Posix) implemented by macros. I
> couldn't find a way to access any documentation on it.

It has documentation now. It specifies Posix, but the only non-standard
things I can see are "exit(-1)" in a couple of places (which could easily
be changed), some conversion of pointers to long (change that to
intptr_t), and the build system for the tests. I changed the build system
to add "-ansi -pedantic -Wall -Wextra" and got warnings about:
  - Reaching the end of main without a return
  - Unused argv/argc parameters
  - In one test, a statement with no effect
  - In some (but not all) tests, comparisons between signed and unsigned
  - In a few tests, an unused variable 'key'
  - String lengths greater than 1024 in some tests

I don't believe any of those are indications that the hashing code is
actually dependent on Posix.

The macros use the "do { } while (0)" trick to make them easy to use.
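
For readers who have not met the idiom, here is a small sketch of why it helps. The macro and struct below are hypothetical and are not taken from uthash. Wrapping a multi-statement macro in `do { } while (0)` makes it behave as a single statement, so it can be followed by a semicolon and used inside an unbraced if/else without breaking the else.

```c
#include <stddef.h>

/* Hypothetical list node, not uthash's data structure. */
struct node {
    struct node *next;
    int value;
};

/* Push item onto the list at head. The do { } while (0) wrapper turns the
   two assignments into one statement. */
#define LIST_PUSH(head, item)      \
    do {                           \
        (item)->next = (head);     \
        (head) = (item);           \
    } while (0)

/* With plain braces instead of do/while, the ';' after the macro call
   would end the if statement and orphan the else. */
int push_if_positive(struct node **head, struct node *item)
{
    if (item->value > 0)
        LIST_PUSH(*head, item);
    else
        return 0;
    return 1;
}
```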

> Hashlib is
> a LIBRARY, very compact (multiple data-bases don't require multiple
> loads of the library) and well isolated. It is written in purely
> standard C, so it is extremely portable. After linking to it you
> can't get at the critical data to foul it up with legitimate code.

uthash looks pretty small to me so is unlikely to impact significantly
on size (the largest executable in the test set is 33K *with* debugging
information, only 13K optimised). Being macros rather than function
calls gives the compiler more opportunities to optimise.

It may be that hashlib is better than uthash, but you have just been
guilty of criticising a package without checking it yourself.

I'm sure that some of the quality OSS databases will have hashing
libraries as part of the package, and I believe that someone mentioned
one of these being on sourceforge.
--
Flash Gordon

**user923005** · Jun 27 '08, 07:37 PM

Re: Hash function of structs

On May 21, 12:29 am, Flash Gordon <s...@flash-gordon.me.uk> wrote:
> <snip hashlib vs uthash>
>
> It may be that hashlib is better than uthash, but you have just been
> guilty of criticising a package without checking it yourself.
>
> I'm sure that some of the quality OSS databases will have hashing
> libraries as part of the package, and I believe that someone mentioned
> one of these being on sourceforge.

I was able to build it on Windows in a few minutes. Had to make a few
simple changes to get it to work.
There are lots of other alternatives if you don't like uthash. One
nice thing about SourceForge is that there are lots of choices for
license types. You can usually find what you need.
This query gives 138 results:
(hash hashing hashmap) AND -has_file:(0)

**user923005** · Jun 27 '08, 07:37 PM

Re: Hash function of structs

On May 21, 11:22 am, user923005 <dcor...@connx.com> wrote:
> <snip hashlib vs uthash>
>
> There are lots of other alternatives if you don't like uthash. One
> nice thing about SourceForge is that there are lots of choices for
> license types. You can usually find what you need.
> This query gives 138 results:
> (hash hashing hashmap) AND -has_file:(0)

Here is another BSD alternative:
http://sourceforge.net/projects/goog-sparsehash/
Moved to here:
http://code.google.com/p/google-sparsehash/
It's C++, so maybe it does not fit the user requirements.

This is also BSD:
http://sourceforge.net/projects/libcfu/

This has an MIT license, which is very similar to BSD:
http://sourceforge.net/projects/libtc/

There are quite a few LGPL alternatives as well, which should be
acceptable for commercial applications.

**CBFalconer** · Jun 27 '08, 07:37 PM

Re: Hash function of structs

Flash Gordon wrote:
> CBFalconer wrote, On 21/05/08 05:42:
>
.... snip ...
>
>> Hashlib is a LIBRARY, very compact (multiple data-bases don't
>> require multiple loads of the library) and well isolated. It is
>> written in purely standard C, so it is extremely portable. After
>> linking to it you can't get at the critical data to foul it up
>> with legitimate code.
>
> uthash looks pretty small to me so is unlikely to impact
> significantly on size (the largest executable in the test set is
> 33K *with* debugging information, only 13K optimised). Being
> macros rather than function calls gives the compiler more
> opportunities to optimise.
>
> It may be that hashlib is better than uthash, but you have just
> been guilty of criticising a package without checking it yourself.
>
> I'm sure that some of the quality OSS databases will have hashing
> libraries as part of the package, and I believe that someone
> mentioned one of these being on sourceforge.

For comparison, the hashlib object code (linkable and relocatable)
amounts to 1964 bytes (under djgpp). With all the possible
debuggery etc. data included in the object file (i.e. before strip)
it expands to 11856 bytes. It will use more memory to store the
items, but that is totally unavoidable. To me, this makes 33k look
monstrous. :-)

Don't forget that that code is reusable. Multiple databases don't
require multiple loads of the object code. The data stored is all
in the user's memory.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

** Posted from http://www.teranews.com **

**user923005** · Jun 27 '08, 07:37 PM

Re: Hash function of structs

On May 21, 12:03 pm, user923005 <dcor...@connx.com> wrote:
> <snip>
>
> There are quite a few LGPL alternatives as well, which should be
> acceptable for commercial applications.

This is an interesting project (libmba) with an MIT license:

http://www.ioplex.com/~miallen/libmba/

**Flash Gordon** · Jun 27 '08, 07:37 PM

Re: Hash function of structs

CBFalconer wrote, On 21/05/08 20:31:
> Flash Gordon wrote:
>> CBFalconer wrote, On 21/05/08 05:42:
>
> For comparison, the hashlib object code (linkable and relocatable)
> amounts to 1964 bytes (under djgpp). With all the possible
> debuggery etc. data included in the object file (i.e. before strip)
> it expands to 11856 bytes. It will use more memory to store the
> items, but that is totally unavoidable. To me, this makes 33k look
> monstrous. :-)
>
> Don't forget that that code is reusable. Multiple databases don't
> require multiple loads of the object code. The data stored is all
> in the user's memory.

One app I work on is very small by modern standards, and it is a few
megabytes. Adding 33K or even 500K to that would be a drop in the ocean.
At that level I would be concerned with other things before executable size.
--
Flash Gordon

**CBFalconer** · Jun 27 '08, 07:37 PM

Re: Hash function of structs

Flash Gordon wrote:
> CBFalconer wrote, On 21/05/08 20:31:
>> Flash Gordon wrote:
>>> CBFalconer wrote, On 21/05/08 05:42:
>
> <snip hashlib vs uthash>
>
>> For comparison, the hashlib object code (linkable and relocatable)
>> amounts to 1964 bytes (under djgpp). With all the possible
>> debuggery etc. data included in the object file (i.e. before strip)
>> it expands to 11856 bytes. It will use more memory to store the
>> items, but that is totally unavoidable. To me, this makes 33k look
>> monstrous. :-)
>>
>> Don't forget that that code is reusable. Multiple databases don't
>> require multiple loads of the object code. The data stored is all
>> in the user's memory.
>
> One app I work on is very small by modern standards, and it is a few
> megabytes. Adding 33K or even 500K to that would be a drop in the
> ocean. At that level I would be concerned with other things before
> executable size.

However, small object size is not a disadvantage.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.


**user923005** · Jun 27 '08, 07:37 PM

Re: Hash function of structs

On May 21, 2:54 pm, CBFalconer <cbfalco...@yahoo.com> wrote:
> Flash Gordon wrote:
>> <snip>
>>
>> One app I work on is very small by modern standards, and it is a few
>> megabytes. Adding 33K or even 500K to that would be a drop in the
>> ocean. At that level I would be concerned with other things before
>> executable size.
>
> However, small object size is not a disadvantage.

It may even be a requirement (e.g. embedded work).

**Flash Gordon** · Jun 27 '08, 07:38 PM

Re: Hash function of structs

user923005 wrote, On 21/05/08 23:27:
> On May 21, 2:54 pm, CBFalconer <cbfalco...@yahoo.com> wrote:
>> Flash Gordon wrote:
>>> <snip hashlib vs uthash>
>>>
>>> One app I work on is very small by modern standards, and it is a few
>>> megabytes. Adding 33K or even 500K to that would be a drop in the
>>> ocean. At that level I would be concerned with other things before
>>> executable size.
>>
>> However, small object size is not a disadvantage.
>
> It may even be a requirement (e.g. embedded work).

I did not say that small is a disadvantage. What I did say is that
uthash is not large and that the optimiser might be able to play some
additional tricks because it is macros.
--
Flash Gordon
