Hash map implementation problem

**weaknessforcats** · Jun 5 '14, 04:33 PM

This looks like you need a second map. A map of groups:

Code:

map<pair<group_id, vector<hash_key> > groups

;

Now if there is a group, say group 34, you access the groups map using the group_id. There, you find the vector of hash_keys that represent the strings in the group. You iterate that vector and find 26,73,450. Using those numbers in that order you access the string database to fetch the strings for those hash keys. You create a new string and append the string for hash 25, then the string for hash 73 and finally the string for hash 450. And there is your group string.

**ushdusai** · Jun 5 '14, 05:23 PM

Then if in group[1] i have 26,73,450 and in group[2] i have 15,28,3, and i have to move hashkey 3 into the group of hashkey 73, I should not search in which group hashkey 3 and 73 are?this means linear search in group[1]..ecc until group[n], this is what i want to avoid, becuase i very slow, but maybe it's the only way to do it

**weaknessforcats** · Jun 5 '14, 10:45 PM

OK. Here is where you use an association index.

One side of the association index shows which hash_keys belong to which groups. This is

Code:

map<pair<hash_key, vector<group_id> >
The data in your example is:
3 <2>
15 <2>
26 <1>
28 <2>
73 <1>
450 <1>

The second side of the index associates group_id with hash_key.

Code:

map<pair<group_id, vector<hash_key> >

Code:

1 <26, 73, 450>
2 <15,28, 3>

So to move a string from group 2 to group 1 you:
1) Access the map<<pair<hash_ key, vector<group_id > > using 3 as the hash_key and remove 2 from the vector<2>. Then add 1 to this vector to get 3 <1>
2) Access the other index using the destination group_id, which is 1 and add hash_key 3 to the vector<26, 73, 450>
in the correct position. Maybe in the middle to get <26, 73,3, 450>
3) Access the other index again using the source group_id, which is 2 and remove 3 from the vector<15,28,3> to get vector<15,28>.

and your string has moved between groups. If you keep hash_key 3 in both groups, then you have done a copy just by having the hash_key appear in the group_id vector of 2 groups.

Notice you never did have to actually find the string itself.

**weaknessforcats** · Jun 5 '14, 10:59 PM

Check this out:

Associative entity - Wikipedia

http://en.wikipedia.org/wiki/Junction_table

**ushdusai** · Jun 6 '14, 01:40 AM

Ok i tried in this way, but with a vector as a groups instead of a map, so when i have the group_id, i can access to it directly, vector[group_id], and in each element of the vector i saved a map instead of a vector, with all the hashkeys, because now i have to search the element and the map's find is faster. I'll do some test to see if there is an improvement in the speed

**weaknessforcats** · Jun 6 '14, 04:27 AM

There should be a big improvement. A map<> uses a red/black tree based on a B-tree. The number of accesses using the tree should be much smaller than iterating a vector sequentially.

Let me know what happens.

**ushdusai** · Jun 6 '14, 05:01 PM

I tryed with
vector< map <int,int> > groups; // vector[group_id]= map<hash_key, string.size>
map<int,int> keyGroup; // map<hash_key, group_id>
It returns me correct values but is not faster than a vector<vector> that have a list of hashkey inside each position, where i have to do a linear search to find the hashkeys in input.
That's strange, maybe there is a waste of time with the indexing after merge.

**weaknessforcats** · Jun 7 '14, 02:09 AM

That's probably because there is no map method to advance to the next element. With a map each seek starts with the root of the three. A tree does not have a "next in the list" item.

Advancing a vector is just pointer arithmetic from the current position.

I would think a map would be faster if you are trying to locate a single key whereas a vector would be faster when iterating the entire vector.

BTW: A vector must be implemented as an array. That means overhead in expanding the array by adding elements. You might try a list<> instead of a vector<> since list<> is just a double linked list. list<> might be faster to add but slower to iterate. vector<> might be slower to add but faster to iterate.

Of course, you could create your vector reserving the maximum elements you expect. That would remove the slower to add but would add bloat to your app.

I am pleased you are getting correct values.

**ushdusai** · Jun 7 '14, 02:02 PM

what i'm not understanding is if i can use for my problem an hashtable instead of a map. Now i'm using two maps that have O(log(n)) for insert, find and remove because map is a RB-tree, while an hash table, that use buckets, for the same functions have O(1).
I think you have suggested this to me already, but is not clear to me the implementation of it for my problem. You think i can have improvments with this one and it's applicable to my problem? Considering also that my hashfunctions returns me high int value, and i don't use a modulus.

**weaknessforcats** · Jun 7 '14, 04:11 PM

You have a many-to-many situation. A string can belong to many groups and a group can have many strings. This makes the association index the solution for the situation.

First question: Does your program work?
If not, then get it working.

Here I would be writing a class with these containers as member variables which are accessed only through member functions of this class. This is basic encapsulation.

I would no code whatsoever outside this class member functions the used any of the maps, vectors, etc..

Once this is done, then you can alter the implementation of the class (but not the public interface) and experiment with various data access schemes.

One that occurs to me is to use something like Pro*C for this class and implement Oracle. Or MySQL, etc... These are all set up for foreign keys, which is what you are using.

**ushdusai** · Jun 8 '14, 09:44 PM

OK i'm here again, for the happiness of weaknessforcats . Yes it works, but it's slow. I made some modifications, that have speeded up my program of about 20 seconds, now it take about 1min to finish the test, but it have to take 20sec or less to finish.
I have used a vector<vector<i nt>>group for insert, search and merge the hashkeys and a map<int,int> for indexing the key with the hashkeys and the value with the groups id. Unless I change it, I think that, less than this, I can't do with this structure.
Now i'm asking what can i use, maybe some pointer, vector of class pointer. I'm sorry but I do not understand well what you said here:
"Here I would be writing a class with these containers as member variables which are accessed only through member functions of this class. This is basic encapsulation.
I would no code whatsoever outside this class member functions the used any of the maps, vectors, etc..".
You're suggesting to use some class method? Why would I benefit?

**weaknessforcats** · Jun 8 '14, 10:48 PM

You should hide the implementation of your vectors and maps.

For example:

obj.insert("wea knessforcats", "petshop");

Let's say that this method:
1)hashes the string and inserts it the string database
2)adds the string to the petshop group
3)adds the petshop group to the string groups. So how is the method implemented?

Answer: You don't care.

There are no hash keys and group ids. These have all been buried. The application uses strings.

If your code looks like this you can change the structure and implementation of obj without impacting any code that uses methods on this object.

You might even decide to assign specific virtual processors to handle the various databases and indexes instead of having the program thread do everything sequentially.

If you make a change in obj all you do is rebuild your code with the new obj code and your change is implemented in your program.

On the other hand, if you expose all these vector and map calls in 500 places in your program , now it's not so easy to make a change.

Part of using C++ is reuse. How can I reuse this already written and debugged code?

You are now branching off to object-oriented programming, which is beyond the subject of this thread. If you are not familiar with these concepts, this is a golden opportunity to research some of them and use them in the solution of your problem.

Hash map implementation problem

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment