STL hash_map hash

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Murali

    STL hash_map hash

    I have a requirement where I have to use two unsigned ints as a key in a STL
    hash map.
    A couple of ways to do this is
    1. create a struct with two unsigned ints and use that as key (write my own
    HashFcn and EqualKey template args) or,
    2. convert the two unsigned ints to char*s, concatenate them and use that as
    Key.

    For method 1, the difficulty I am having is in writing the HashFcn. HashFcn
    requires the following method
    size_t operator()(cons t T& x)
    I cannot create a unique hash for two unsigned ints for the hash_map that is
    a size_t.

    To try method 2, I thought of first trying with just one unsigned int
    converted to a char* and use that as Key.
    The result of it was that the hash key generated happened to be the same for
    two different unsigned ints
    causing one of them to be dropped.

    Please suggest solution that would work to create an STL hash_map with 2
    unsigned ints as Key.

    Here is the code that does that:

    #include <iostream>
    #include <ext/hash_map>

    using namespace std;
    using namespace __gnu_cxx;

    struct eqstr
    {
    bool operator()(cons t char* s1, const char* s2) const
    {
    return strcmp(s1, s2) == 0;
    }
    };

    int main()
    {
    hash_map<const char*, unsigned int, hash<const char*>, eqstr> hash_obj;

    hash<const char*> hashfun;

    char key[32];

    sprintf(key, "%u", (((unsigned int)~0) / 2) + 250000000);
    hash_obj[key] = (((unsigned int)~0) / 2) + 250000000;
    cout << "Hash for " << key << " = " << hashfun(key) << endl;

    sprintf(key, "%u", (((unsigned int)~0) / 2) + 300000000);
    hash_obj[key] = (((unsigned int)~0) / 2) + 300000000;
    cout << "Hash for " << key << " = " << hashfun(key) << endl;

    for(hash_map<co nst char*, unsigned int, hash<const char*>,
    eqstr>::const_i terator i = hash_obj.begin( );
    i != hash_obj.end(); i++)
    {
    cout << "hash_obj[" << (*i).first << "] = " << (*i).second << endl;
    }
    }

    /* g++ /usr/include/c++/3.3 -o post post.cc -lstdc++ */

    The output was
    Hash for 2397483647 = 123096165
    Hash for 2447483647 = 123096165
    hash_obj[2447483647] = 2447483647



  • Jeff Schwab

    #2
    Re: STL hash_map hash

    Murali wrote:[color=blue]
    > I have a requirement where I have to use two unsigned ints as a key in a STL
    > hash map.[/color]

    There is no "hash map" in the STL.

    Comment

    • David Harmon

      #3
      Re: STL hash_map hash

      On Sun, 08 Feb 2004 20:10:03 GMT in comp.lang.c++, "Murali"
      <murali@email.c om> was alleged to have written:[color=blue]
      >I cannot create a unique hash for two unsigned ints for the hash_map that is
      >a size_t.[/color]

      You fundamentally cannot create a unique hash of size N bits for all
      keys of size M > N bits. That is part of what hashing is all about.
      You want a hash function such that the mapping of the keys to the hash
      values is uncorrelated with and looks random relative to the key values
      that you are likely to encounter.

      Doesn't hash_map come with a suitable default hashing function? It's
      not part of standard C++ yet so I wouldn't know. ;-)

      Lacking a default I would probably try using CRC32 over all the bytes of
      the key. Google "CRC32 source code"
      [color=blue]
      >The result of it was that the hash key generated happened to be the same for
      >two different unsigned ints causing one of them to be dropped.[/color]

      Neither should be dropped if EqualKey says they are not equal, even if
      the hash is the same.

      Don't waste time converting your keys to strings, if you can help it.

      Comment

      • John Harrison

        #4
        Re: STL hash_map hash


        "Murali" <murali@email.c om> wrote in message
        news:vIwVb.748$ t16.890106@news svr28.news.prod igy.com...[color=blue]
        > I have a requirement where I have to use two unsigned ints as a key in a[/color]
        STL[color=blue]
        > hash map.
        > A couple of ways to do this is
        > 1. create a struct with two unsigned ints and use that as key (write my[/color]
        own[color=blue]
        > HashFcn and EqualKey template args) or,
        > 2. convert the two unsigned ints to char*s, concatenate them and use that[/color]
        as[color=blue]
        > Key.
        >
        > For method 1, the difficulty I am having is in writing the HashFcn.[/color]
        HashFcn[color=blue]
        > requires the following method
        > size_t operator()(cons t T& x)
        > I cannot create a unique hash for two unsigned ints for the hash_map that[/color]
        is[color=blue]
        > a size_t.[/color]

        You don't have to create a unique hash. Hash tables work perfectly well
        without unique hash values. Just exclusive-or the two integers together,
        that's a prefectly good hash function.
        [color=blue]
        >
        > To try method 2, I thought of first trying with just one unsigned int
        > converted to a char* and use that as Key.
        > The result of it was that the hash key generated happened to be the same[/color]
        for[color=blue]
        > two different unsigned ints
        > causing one of them to be dropped.[/color]

        No that does not happen when you have duplicate hash values. Duplicate hash
        values are called 'collisions'. The skill in writing a hashing algorithm is
        in dealing with the collisions, but they do not stop a hash table working.
        [color=blue]
        >
        > Please suggest solution that would work to create an STL hash_map with 2
        > unsigned ints as Key.
        >
        > Here is the code that does that:
        >
        > #include <iostream>
        > #include <ext/hash_map>
        >
        > using namespace std;
        > using namespace __gnu_cxx;
        >
        > struct eqstr
        > {
        > bool operator()(cons t char* s1, const char* s2) const
        > {
        > return strcmp(s1, s2) == 0;
        > }
        > };
        >
        > int main()
        > {
        > hash_map<const char*, unsigned int, hash<const char*>, eqstr>[/color]
        hash_obj;[color=blue]
        >
        > hash<const char*> hashfun;
        >
        > char key[32];
        >
        > sprintf(key, "%u", (((unsigned int)~0) / 2) + 250000000);
        > hash_obj[key] = (((unsigned int)~0) / 2) + 250000000;
        > cout << "Hash for " << key << " = " << hashfun(key) << endl;
        >
        > sprintf(key, "%u", (((unsigned int)~0) / 2) + 300000000);[/color]

        Here's your problem, you are using key as your key, but you are overwriting
        it when you add the second value.

        Try this

        char key1[32];
        char key2[32];

        sprintf(key1, "%u", (((unsigned int)~0) / 2) + 250000000);
        hash_obj[key1] = (((unsigned int)~0) / 2) + 250000000;
        cout << "Hash for " << key << " = " << hashfun(key1) << endl;
        sprintf(key2, "%u", (((unsigned int)~0) / 2) + 300000000);
        hash_obj[key2] = (((unsigned int)~0) / 2) + 300000000;
        cout << "Hash for " << key << " = " << hashfun(key2) << endl;>

        or just go back to using two unsigned int, instead of strings.

        Incidentally how dod you think that the STL generates unique hash values for
        indefinte length strings and fits them into a size_t? Impossible, no?
        [color=blue]
        > hash_obj[key] = (((unsigned int)~0) / 2) + 300000000;
        > cout << "Hash for " << key << " = " << hashfun(key) << endl;>
        > for(hash_map<co nst char*, unsigned int, hash<const char*>,
        > eqstr>::const_i terator i = hash_obj.begin( );
        > i != hash_obj.end(); i++)
        > {
        > cout << "hash_obj[" << (*i).first << "] = " << (*i).second <<[/color]
        endl;[color=blue]
        > }
        > }
        >
        > /* g++ /usr/include/c++/3.3 -o post post.cc -lstdc++ */
        >
        > The output was
        > Hash for 2397483647 = 123096165
        > Hash for 2447483647 = 123096165
        > hash_obj[2447483647] = 2447483647
        >
        >[/color]

        john



        Comment

        Working...