Hash map implementation problem

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • ushdusai
    New Member
    • Jun 2014
    • 12

    Hash map implementation problem

    My problem is to find a good implementation that will allow me, once transformed strings in hash keys, to do, if I had at the beginning this:
    Group 1: string12, string21
    Group 2: string364 string128
    and after i do this:
    move string128 string21
    the result will be:
    Group 1: string12, string21, string128
    Group 2: string364
    I know, nothing so complicated but my implementation is still slow, it should be faster.
    I'm using a vector<vector<i nt>>, where i save the hash_key.
    What should i use?(i can't use unordered_map of c++11!)
    I have to save the hash keys in a map?in a vector? or i have to use a vector<vector> or vector<map>??? I'm confused and I hope someone can help me because it's very important for me.
  • weaknessforcats
    Recognized Expert Expert
    • Mar 2007
    • 9214

    #2
    If your hash keys are in vector<vector<i nt> then you need t iterate the vector to find the vector<int> for the hash key you are looking up.

    Have you tried a map<pair <int, vector<int> >?

    You should be able to quickly look up the key and then iterate vector<int> to fetch up all the strings that hashed to the key value.

    Comment

    • ushdusai
      New Member
      • Jun 2014
      • 12

      #3
      I have not explained the problem well because i can have also this case:

      Group 1: string12, string21, string128
      Group 2: string364
      and after i do this:
      move string128 string364
      the result will be:
      Group 1: string12, string21
      Group 2: string364 string128
      So i have to iterate over all vectors to search for the key??

      Comment

      • weaknessforcats
        Recognized Expert Expert
        • Mar 2007
        • 9214

        #4
        The idea is to use a chain-link table. One implementation is an array and a linked list. In this scenario let's assume your hash vales are between 0 and 1000. You create an array where the element index is the hash value. Then when you hash to, say,275 you go to element 275 of the array and there is the address of a linked list of all items that have hashed to 275. Then you iterate the linked list to find the one you want.

        My suggestion was to use a map<> instead of an array and to use a vector<> (or perhaps a list<>) for the linked list.

        In my suggestion when you hash to 275 you locate the pair in the map<> and then iterate the vector<> in that pair<>. You would need to iterate only this one vector<>. Depending upon how many keys are in the map<> and the efficiency of your hash algorithm, this vector<> may have very few elements.

        There are new hash templates for C++ that might help you.

        Comment

        • lu8890
          New Member
          • Jun 2014
          • 1

          #5
          can you do something like this (in c#):

          dictionary<stri ng, List<string>>?

          or, if input strings needs to be unique, then
          dictionary<stri ng, dictionary<stri ng, string>>?

          Comment

          • weaknessforcats
            Recognized Expert Expert
            • Mar 2007
            • 9214

            #6
            We are trying to do this in C++.

            Start a thread in C# forum to ask C# questions.

            Comment

            • ushdusai
              New Member
              • Jun 2014
              • 12

              #7
              I've tried using many data structures but is still slow, nothing has increased the speed. Other operation is also(thought it was not necessary but maybe it is), merge group of string1 with group of string2. Example:
              Group 1: string12, string21, string128
              Group 2: string364
              merge string128 string364
              Group 1:
              Group 2: string364,strin g128,string12,s tring21
              I've used(most successful):
              - vector< vector<int>> but is slow! (linear search N*N)
              - vector<map<hash _key,int>> but is also slow, more than vector?!(maybe operations like insert or erase are not costant like vector, but logN)
              Weaknessforcats what i have to save in the vector, the hask_key right?

              Comment

              • weaknessforcats
                Recognized Expert Expert
                • Mar 2007
                • 9214

                #8
                Did you try map<pair<hash_k ey, vector<string> >?

                vector<string> will contain all strings hashed to the hash_key value.

                Comment

                • ushdusai
                  New Member
                  • Jun 2014
                  • 12

                  #9
                  Yes i did , but if i have:
                  string1 = hashKey1
                  string2 = hashKey2
                  ...
                  stringN = hashKeyN

                  and this case:
                  move string2 in string 1
                  move string3 in string 1
                  move string4 in string 1
                  move string5 in string 2
                  move string6 in string 2
                  hashKey1,vector <string>{string 2,string3,strin g4}
                  hashKey2,vector <string>{string 5,string6}
                  hashKey3,vector <string>{}
                  hashKey4,vector <string>{}
                  hashKey5,vector <string>{}
                  hashKey6,vector <string>{}

                  now if i want to merge string4 with stringN or print with many elements string4 is, first i have to iterate over all vectors to find string4, is this what you mean? I thought there was a way to accelerate all. anyway thanks

                  Comment

                  • weaknessforcats
                    Recognized Expert Expert
                    • Mar 2007
                    • 9214

                    #10
                    How would you know there is a string4? All that's in the vector is the string contents. If you already know the string contents, then you don't need to look the data up.

                    All the data has been hashed. Presumably all you have are the hash keys so you can look up the data. There is no string4.

                    string4 is a variable in your program. All that's in your executable is binary content. The "string4" is for the compiler.

                    For lookup you start with the hash key. Presumably, the hash keys have been saved and not the strings. You locate the has key in the map and if more than one string has hashed to that hash value, there will be a vector of strings that have hashed to the same value.

                    This is not a good setup to go look for data without hash key.

                    Comment

                    • ushdusai
                      New Member
                      • Jun 2014
                      • 12

                      #11
                      I'm confused, you said to use map<pair<hash_k ey, vector<string> >, so I figured to put inside the vector, the string and not the hash value, btw was an example. I believe that my solution is more like an hash table, that i'm studying , because i can find an element in O(1). But if i give you this case, the simplest one, so you can understand if i'm thinking wrong,if i have: move string1 in string2
                      1) where this string must be saved, in the vector? and vector of which hashkey?
                      2) if i have: print element of string1, it must print 2 and string2 too, because they are in the same ipotetic "group", this means to insert string1 in string2 and vieversa?
                      thank very much for your help, sorry my bad english, but i'm having a little bit of trouble with this argouments

                      Comment

                      • weaknessforcats
                        Recognized Expert Expert
                        • Mar 2007
                        • 9214

                        #12
                        You need to make a pair<>. The first member is the hash key and the second member is the vector<string>. You put the string in the vector.

                        The pair<> is inserted in the map<>. Next time you hash a string you access the map to see if the hash key already exists. If it does, you add the new string to the vector. Otherwise you create a new pair<> and add it to the map<>.

                        So the structure is the pair<> which associates a vector of all strings that hash to the same value. The pairs are inserted in the map so you can access based on hash key.

                        BTW: Your English is fine.

                        Comment

                        • ushdusai
                          New Member
                          • Jun 2014
                          • 12

                          #13
                          What you are saying is for when i have two strings with the same hashkey, so i have to save in the same hashkey a vector with all the strings that are referred to that hashkey. This is not a real problem for me because i use a function(and tested and it works) that returns me a unique hashkey for the all strings that i have in input. The problem is when i have to move, merge the strings. I need a fast looukup that knows where the string is, or that it may know through a link.

                          Comment

                          • weaknessforcats
                            Recognized Expert Expert
                            • Mar 2007
                            • 9214

                            #14
                            If your keys are unique then don't use a vector in the pair<>. Instead use the string itself:

                            pair<hash_key, string>

                            Insert this in the map and you can access the string by using the unique hash key.

                            Remember, pair<> has two members 1)first and 2)second. So you So if your map is now:

                            Code:
                            map<pair<hash_key, string> >  database;
                            then database[251] or database.find(2 51) will access the pair<> with that hash_key. Therefore, database[251].first is your hash_key and database[251].second is your string.

                            Comment

                            • ushdusai
                              New Member
                              • Jun 2014
                              • 12

                              #15
                              Ok perfect, this is what i did, a map<pair<hash_k ey, string> > with all the hashkey and the strings. But now, i should do a vector(that i figured that will be my groups) with the string that i move from a group to another, that i merge ecc..
                              so in vector[1] i put hashkey1 and in vector[2] hashkey2.., when i have to move string1 with string2 i move hashkey2 from vector[2] into vector[1], idem with the merge. But problem is that i have to scan all the vectors to find the hashkeys in input in which vector(figured how groups) are, and this is very very slow.

                              Comment

                              Working...