Multidimensional sparse arrays using linked lists

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • skr13
    New Member
    • Mar 2010
    • 7

    Multidimensional sparse arrays using linked lists

    Hi,

    I am implementing an aggregation model to compute data cubes.
    I have implemented the multiway array aggregation statically. That is if there are, say, 4 dimensions each of cardinality 16 and the chunk size of 4, I create chunkArray[4][4][4][4] and insert the measure at the appropriate position and aggregate the measure. The chunkArray is a sparse array and static implementation is not efficient. In order the dynamically create the chunkArray with measure stored only for the instances that occur I need the offset of each of the cell in the chunk.

    Eg: Dimension 3
    A[8], B[8], C[8] - each dimension with cardinality 8.
    If the relational database table is fully populated then there will be 8*8*8 instances in the table.
    If not, then some of the instances will be missing.
    a1,b1,c1,20
    a1,b1,c2,3
    .......
    .......
    a1,b1,c8,12

    This will be like the truth table with 3 variables each of which can take upto 8 values as compared to 2 values in the digital truth table.

    The aggregation will be done on chunks. If I have instances a1,b1,c1 - a1,b1,c3 - a1,b1,c8 - some of the instances will be missing. What I need then is an offset of the instance from the first instance.
    a1,b1,c1 offset=0
    a1,b1,c3 offset=2
    a1,b1,c8 offset=7

    Suppose the chunk size is 2 in all dimensions, the the total number of chunks will be 8. I will bring in the first 8 chunk cells and aggregate. But the problem is, in the first 8 cells only 3 are dense and rest 5 are not there. I am stuck in finding the offset of the chunk cell. If you have any idea, please let me know.
  • jkmyoung
    Recognized Expert Top Contributor
    • Mar 2006
    • 2057

    #2
    Could you give us a better idea of how you will be using the chunks in the future? I'm not sure what the difference is between this and a table sorted on a then b then c.

    Comment

    • skr13
      New Member
      • Mar 2010
      • 7

      #3
      I will be using the chunks to aggregate the values present in them. For eg,

      chunkArray[0][0][0] = 5;
      chunkArray[0][0][7] = 9;

      when i aggregate keeping the first 2 indices constant, i ll get 13.

      Say A represents CITY, B represent ITEM and C represents TIME.
      These are the 3 dimensions for the chunk.

      CITY ITEM TIME
      0- X 0-i1 0-t1
      1- Y 1-i2 1-t2
      2- Z 2-i3 2-t3

      the measure may be sales. keeing A=0 B=0 and aggregating alon C, I will get the sales in city X for item Y for all the times

      Comment

      • Banfa
        Recognized Expert Expert
        • Feb 2006
        • 9067

        #4
        Are you using C or C++? If you are using C++ and you expect the array to be quite sparse I imagine a map would work quite well.

        On the other hand if the array is only slightly sparse (only a few missing values) then it might be just as easy and less complicated to use a static array.

        Comment

        • weaknessforcats
          Recognized Expert Expert
          • Mar 2007
          • 9214

          #5
          One suggestion is to use an array of struct variables. One member is your data and the other is a binary that is true of the data is present and zero otherwise. This allows you to have zombie elements.

          That way your calculations to get to an element will always work out.

          Comment

          • skr13
            New Member
            • Mar 2010
            • 7

            #6
            I used linked lists to create an element in the list only if the data is present and used the dimensionality to calculate the offset. Now i am storing the data and the offset in each element of the list, thereby I have saved lot of memory.

            Comment

            • jkmyoung
              Recognized Expert Top Contributor
              • Mar 2006
              • 2057

              #7
              I agree with using an array (sorted) that only contains the values as opposed to a linked list, giving a small memory savings, and negating the need for offsets.

              The advantages I can see to linked lists occur if you are querying the first few elements repeatedly, or adding many elements after the initial creation.

              Code:
              struct Row{
              short a, b, c, value;
              }
              
              Row[] Table = new Row[100];
              This does really feel like database work

              Comment

              • skr13
                New Member
                • Mar 2010
                • 7

                #8
                I need to query the table. For eg, SELECT FROM TABLE where CITY=X and TIME=t1.
                For this i need to pre-compute the data cubes and store them. Actually I have the table for chunks.

                struct chunks
                {
                int value,offset;
                struct chunks *link;
                };

                It would be easier if the used arrays to compute data cubes but the problem is I will know the array dimension only during runtime. I would appreciate any other idea to compute the data cubes(aggregati on).

                Comment

                • weaknessforcats
                  Recognized Expert Expert
                  • Mar 2007
                  • 9214

                  #9
                  Originally posted by skr13
                  It would be easier if the used arrays to compute data cubes but the problem is I will know the array dimension only during runtime.
                  You can create the array at run-time after you know the dimensions.

                  I have written a SQL cursor manager to capture the results of a SELECT using arrays. You really need to avoid linked lists as they are too slow. You also avoid the stack since memory there is often restricted.

                  BTW: You didn't answer Banfa about whether you are using C or C++. If C++ you should be using a vector.

                  Comment

                  • skr13
                    New Member
                    • Mar 2010
                    • 7

                    #10
                    I am using C++. For each attribute in the table there will be 1 dimension. If the table has 20 dimensions then I will need a 20-dimensional array. This 20 dimensional array will be a sparse array. Linked lists are slow but the memory will not be wasted. And just curious, how do you create 20D array at the runtime?

                    Comment

                    • weaknessforcats
                      Recognized Expert Expert
                      • Mar 2007
                      • 9214

                      #11
                      C and C++ have only one-dimensional arrays. Read: http://bytes.com/topic/c/insights/77...rrays-revealed

                      The easiest thing to do is create an inheritance hierarchy for your attributes. The Attribute is the base class with derived classes for the varoius kinds-of attributes.

                      You create an object of a derived class an use it through a base class pointer.


                      So, create a vector<Attribut e*> and add all of the attributes for a row of your array:

                      Code:
                      vector<Attribute*> aThing;
                      Attribute* ptr = new Color(RED);   //Color derives from Attribute
                      aThing. push_back(ptr);
                      //
                      //Add the other attrutes that apply to aThing.
                      //
                      Then add the vector<Attribut e*> to your database:


                      Code:
                      vector< vector<Attribute*> > theDataBase;
                      
                      theDataBase.push_back(aThing);


                      Now your array has just the attributes for each elemen of the database.

                      You may need to use a Visitor with your Attribute hierarchy. There is an Insight article about how to do that.

                      Comment

                      • jkmyoung
                        Recognized Expert Top Contributor
                        • Mar 2006
                        • 2057

                        #12
                        I don't think there's an understanding of how we suggest you use the array:

                        Vector contents (V):

                        index : row(city, item, time, value)
                        [0] : (1, 1, 1, 20)
                        [1] : (1, 1, 2, 3)
                        [2] : (1, 1, 7, 5)
                        [3] : (1, 2, 2, 4)
                        ...
                        etc.

                        So if we want to sum where CITY=X and TIME=t1, we go:
                        Code:
                        long Sum(int city, int item, int time)
                        {
                        sum = 0;
                        for(int i = 0; i < V.length; i++){
                          Row r = V.ElementAt(i);
                          if (  ((city == 0) | | (city == r.city)) && //city restriction
                                ((item == 0) | | (item == r.item)) && //item restriction
                                ((time== 0) | | (time == r.time))) && //time restriction 
                              sum += r.value;
                        }
                        return sum;
                        }
                        
                        call this function like 
                        Sum(X, 0, t1)
                        0 means that the argument is not used.

                        Comment

                        • skr13
                          New Member
                          • Mar 2010
                          • 7

                          #13
                          I do get what you are saying. But I need to precompute the data cubes and store them and the fire the queries. If there are, say 3 dimension(attri butes), I need to abstract the dimensions and compute the cuboids AB, BC, AC, A, B and C. I have done this statically. After that I use a bit map indexing scheme to handle the queries. But I can't imagine n dimensions using linked lists. I have this idea. In the linked list, each node will have n pointers 1 for each dimension. It will be basically a tree wit multiple children. When I traverse along a path, I will be aggregating that particular direction. I would appreciate any idea regarding how to go about this implementation

                          Comment

                          • jkmyoung
                            Recognized Expert Top Contributor
                            • Mar 2006
                            • 2057

                            #14
                            Alright, basing it on your previous implementation somewhat, are you looking for something like:
                            Code:
                             class Node
                                {
                                    int index, value;
                                    ArrayList links;
                                    
                                    Node(){
                                        index = 0;
                                        value = 0;
                                        links = new ArrayList();
                                    }
                            
                                    Node(ArrayList indexes, int v)
                                    {
                                        int i = (int)(indexes[0]);
                                        index = i;
                                        value = v;
                                        links = new ArrayList();
                                        if (indexes.Count != 0)
                                        {
                                            indexes.RemoveAt(0);
                                            links.Add(new Node(indexes, v));
                                        }
                                    }
                                    
                                    void AddNode(ArrayList indexes, int v)
                                    {
                                        int i = (int)(indexes[0]);
                                        Node n = FindNode(i);
                                        if (n == null) //make new node
                                        {
                                            links.Add(new Node(indexes, v));
                                        } else { // add to existing node
                                            indexes.RemoveAt(0);
                                            n.addNode(indexes, v);
                                        }
                                        value += v;
                                    }
                            
                                    Node FindNode(int index){
                                        for (int i = 0; i < links.Count; i++)
                                        {
                                            Node n = (Node)(links[i]);
                                            if (n.index == index)
                                                return n;
                                        }
                                        return null;
                                    }
                            
                                    int QueryNode(ArrayList indexes)
                                    {
                                        if (indexes.Count == 0)
                                            return value;
                                        int i = (int)(indexes[0]);
                                        Node n = FindNode(i);
                                        if (n == null)
                                            return 0;
                            
                                        indexes.RemoveAt(0);
                                        return n.QueryNode(indexes);
                                    }
                            }
                            
                            Start with Node root = new Node(), and add, query your nodes from root. 
                            Eg City=X and TIME=t1
                            create new array list q: (x, t1);
                            return root.QueryNode(q);
                            Sorry, this is in C#; don't have a C++ editor at the moment. Just refactor ArrayLists to a Vector or int[].
                            realized after writing it, that I should have put the indexes in reverse order, but you can do that if you want.

                            Comment

                            • weaknessforcats
                              Recognized Expert Expert
                              • Mar 2007
                              • 9214

                              #15
                              index : row(city, item, time, value)
                              [0] : (1, 1, 1, 20)
                              [1] : (1, 1, 2, 3)
                              [2] : (1, 1, 7, 5)
                              [3] : (1, 2, 2, 4)
                              What I had in mind was:

                              1)create an array where [0] is city, [1] is item, [2] is time and [3] is value:

                              Code:
                                 //[0] : (1, 1, 1, 20)
                                  //[1] : (1, 1, 2, 3)
                                 // [2] : (1, 1, 7, 5)
                                 // [3] : (1, 2, 2, 4)
                              	int arr0[4] = {1,1,1,20};
                              	vector<int> v0(arr0, arr0+4);
                              	int arr1[4] = {1,1,2,3};
                              	vector<int> v1(arr1, arr1+4);
                              	int arr2[4] = {1,1,7,5};
                              	vector<int> v2(arr2, arr2+4);
                              	int arr3[4] = {1,2,2,4};
                              	vector<int> v3(arr2, arr3+4);
                              Then add these arrays to a larger array which is the database:

                              Code:
                              	vector<vector<int> > DataBase;
                              	DataBase.push_back(v0);
                              	DataBase.push_back(v1);
                              	DataBase.push_back(v2);
                              	DataBase.push_back(v3);
                              Then create a multimap for each attribute, Here is the one for city:

                              Code:
                              	pair<int, vector<int> > KeyAndValue;
                              	multimap<int, vector<int> > CityMap;
                              
                              	KeyAndValue.first = 1;
                              	KeyAndValue.second = DataBase[0];	
                              	CityMap.insert(KeyAndValue);
                              
                              	KeyAndValue.first = 1;
                              	KeyAndValue.second = DataBase[1];	
                              	CityMap.insert(KeyAndValue);
                              	
                              	KeyAndValue.first = 1;
                              	KeyAndValue.second = DataBase[2];	
                              	CityMap.insert(KeyAndValue);
                              	
                              	KeyAndValue.first = 1;
                              	KeyAndValue.second = DataBase[3];	
                              	CityMap.insert(KeyAndValue);
                              CityMap now has an entry for all city=1 in the DataBase.

                              To look for City = X, you just query the CityMap for all city=1. You do this with iterators. First, get the lower bound. Second, get the upper bound. Betweem these two iterators are all the entries for city = X:
                              Code:
                              	//Look for City = X:
                                  int X = 1;
                              	multimap<int, vector<int> >::iterator lower;
                              	lower = CityMap.lower_bound(1);
                              	multimap<int, vector<int> >::iterator upper;
                              	upper = CityMap.upper_bound(1);
                              Next, for time = t1, just iterator between the lower and upper bound checking each entry for a time == t1.
                              Code:
                                              int t1 = 1;
                              
                              	//All the City= X are between the upper and lower iterator
                              	multimap<int, vector<int> >::iterator  temp;
                              	temp = lower;
                              	while (temp != upper)
                              	{
                                      //check for time = t1
                              		//temp is actually pointing at a vector<int>.
                              		vector<int>::iterator itr = (temp->second).begin();
                              		while (itr != (temp->second).end())
                              		{
                              			//[0] is city [1] is item [2] is time and [3] is value
                                           if (itr[2] == t1)
                              			 {
                                                     //found a city=X and time=t1
                              			 }
                              			 ++itr;
                              		}
                              		++temp;
                              	}

                              Comment

                              Working...