Multidimensional sparse arrays using linked lists

**jkmyoung** · Mar 1 '10, 08:29 PM

Could you give us a better idea of how you will be using the chunks in the future? I'm not sure what the difference is between this and a table sorted on a then b then c.

**skr13** · Mar 1 '10, 08:36 PM

I will be using the chunks to aggregate the values present in them. For eg,

chunkArray[0][0][0] = 5;
chunkArray[0][0][7] = 9;

when i aggregate keeping the first 2 indices constant, i ll get 13.

Say A represents CITY, B represent ITEM and C represents TIME.
These are the 3 dimensions for the chunk.

CITY ITEM TIME
0- X 0-i1 0-t1
1- Y 1-i2 1-t2
2- Z 2-i3 2-t3

the measure may be sales. keeing A=0 B=0 and aggregating alon C, I will get the sales in city X for item Y for all the times

**Banfa** · Mar 1 '10, 11:51 PM

Are you using C or C++? If you are using C++ and you expect the array to be quite sparse I imagine a map would work quite well.

On the other hand if the array is only slightly sparse (only a few missing values) then it might be just as easy and less complicated to use a static array.

**weaknessforcats** · Mar 2 '10, 05:14 PM

One suggestion is to use an array of struct variables. One member is your data and the other is a binary that is true of the data is present and zero otherwise. This allows you to have zombie elements.

That way your calculations to get to an element will always work out.

**skr13** · Mar 2 '10, 08:05 PM

I used linked lists to create an element in the list only if the data is present and used the dimensionality to calculate the offset. Now i am storing the data and the offset in each element of the list, thereby I have saved lot of memory.

**jkmyoung** · Mar 2 '10, 10:34 PM

I agree with using an array (sorted) that only contains the values as opposed to a linked list, giving a small memory savings, and negating the need for offsets.

The advantages I can see to linked lists occur if you are querying the first few elements repeatedly, or adding many elements after the initial creation.

Code:

struct Row{
short a, b, c, value;
}

Row[] Table = new Row[100];

This does really feel like database work

**skr13** · Mar 2 '10, 10:45 PM

I need to query the table. For eg, SELECT FROM TABLE where CITY=X and TIME=t1.
For this i need to pre-compute the data cubes and store them. Actually I have the table for chunks.

struct chunks
{
int value,offset;
struct chunks *link;
};

It would be easier if the used arrays to compute data cubes but the problem is I will know the array dimension only during runtime. I would appreciate any other idea to compute the data cubes(aggregati on).

**weaknessforcats** · Mar 3 '10, 05:26 PM

Originally posted by skr13

It would be easier if the used arrays to compute data cubes but the problem is I will know the array dimension only during runtime.

You can create the array at run-time after you know the dimensions.

I have written a SQL cursor manager to capture the results of a SELECT using arrays. You really need to avoid linked lists as they are too slow. You also avoid the stack since memory there is often restricted.

BTW: You didn't answer Banfa about whether you are using C or C++. If C++ you should be using a vector.

**skr13** · Mar 3 '10, 05:37 PM

I am using C++. For each attribute in the table there will be 1 dimension. If the table has 20 dimensions then I will need a 20-dimensional array. This 20 dimensional array will be a sparse array. Linked lists are slow but the memory will not be wasted. And just curious, how do you create 20D array at the runtime?

**weaknessforcats** · Mar 3 '10, 06:24 PM

C and C++ have only one-dimensional arrays. Read: http://bytes.com/topic/c/insights/77...rrays-revealed

The easiest thing to do is create an inheritance hierarchy for your attributes. The Attribute is the base class with derived classes for the varoius kinds-of attributes.

You create an object of a derived class an use it through a base class pointer.

So, create a vector<Attribut e*> and add all of the attributes for a row of your array:

Code:

vector<Attribute*> aThing;
Attribute* ptr = new Color(RED);   //Color derives from Attribute
aThing. push_back(ptr);
//
//Add the other attrutes that apply to aThing.
//

Then add the vector<Attribut e*> to your database:

Code:

vector< vector<Attribute*> > theDataBase;

theDataBase.push_back(aThing);

Now your array has just the attributes for each elemen of the database.

You may need to use a Visitor with your Attribute hierarchy. There is an Insight article about how to do that.

**jkmyoung** · Mar 3 '10, 07:01 PM

I don't think there's an understanding of how we suggest you use the array:

Vector contents (V):

index : row(city, item, time, value)
[0] : (1, 1, 1, 20)
[1] : (1, 1, 2, 3)
[2] : (1, 1, 7, 5)
[3] : (1, 2, 2, 4)
...
etc.

So if we want to sum where CITY=X and TIME=t1, we go:

Code:

long Sum(int city, int item, int time)
{
sum = 0;
for(int i = 0; i < V.length; i++){
  Row r = V.ElementAt(i);
  if (  ((city == 0) | | (city == r.city)) && //city restriction
        ((item == 0) | | (item == r.item)) && //item restriction
        ((time== 0) | | (time == r.time))) && //time restriction 
      sum += r.value;
}
return sum;
}

call this function like 
Sum(X, 0, t1)

0 means that the argument is not used.

**skr13** · Mar 4 '10, 03:52 AM

I do get what you are saying. But I need to precompute the data cubes and store them and the fire the queries. If there are, say 3 dimension(attri butes), I need to abstract the dimensions and compute the cuboids AB, BC, AC, A, B and C. I have done this statically. After that I use a bit map indexing scheme to handle the queries. But I can't imagine n dimensions using linked lists. I have this idea. In the linked list, each node will have n pointers 1 for each dimension. It will be basically a tree wit multiple children. When I traverse along a path, I will be aggregating that particular direction. I would appreciate any idea regarding how to go about this implementation

**jkmyoung** · Mar 4 '10, 06:03 PM

Alright, basing it on your previous implementation somewhat, are you looking for something like:

Code:

 class Node
    {
        int index, value;
        ArrayList links;
        
        Node(){
            index = 0;
            value = 0;
            links = new ArrayList();
        }

        Node(ArrayList indexes, int v)
        {
            int i = (int)(indexes[0]);
            index = i;
            value = v;
            links = new ArrayList();
            if (indexes.Count != 0)
            {
                indexes.RemoveAt(0);
                links.Add(new Node(indexes, v));
            }
        }
        
        void AddNode(ArrayList indexes, int v)
        {
            int i = (int)(indexes[0]);
            Node n = FindNode(i);
            if (n == null) //make new node
            {
                links.Add(new Node(indexes, v));
            } else { // add to existing node
                indexes.RemoveAt(0);
                n.addNode(indexes, v);
            }
            value += v;
        }

        Node FindNode(int index){
            for (int i = 0; i < links.Count; i++)
            {
                Node n = (Node)(links[i]);
                if (n.index == index)
                    return n;
            }
            return null;
        }

        int QueryNode(ArrayList indexes)
        {
            if (indexes.Count == 0)
                return value;
            int i = (int)(indexes[0]);
            Node n = FindNode(i);
            if (n == null)
                return 0;

            indexes.RemoveAt(0);
            return n.QueryNode(indexes);
        }
}

Start with Node root = new Node(), and add, query your nodes from root. 
Eg City=X and TIME=t1
create new array list q: (x, t1);
return root.QueryNode(q);

Sorry, this is in C#; don't have a C++ editor at the moment. Just refactor ArrayLists to a Vector or int[].
realized after writing it, that I should have put the indexes in reverse order, but you can do that if you want.

**weaknessforcats** · Mar 4 '10, 06:13 PM

index : row(city, item, time, value)
[0] : (1, 1, 1, 20)
[1] : (1, 1, 2, 3)
[2] : (1, 1, 7, 5)
[3] : (1, 2, 2, 4)

What I had in mind was:

1)create an array where [0] is city, [1] is item, [2] is time and [3] is value:

Code:

   //[0] : (1, 1, 1, 20)
    //[1] : (1, 1, 2, 3)
   // [2] : (1, 1, 7, 5)
   // [3] : (1, 2, 2, 4)
	int arr0[4] = {1,1,1,20};
	vector<int> v0(arr0, arr0+4);
	int arr1[4] = {1,1,2,3};
	vector<int> v1(arr1, arr1+4);
	int arr2[4] = {1,1,7,5};
	vector<int> v2(arr2, arr2+4);
	int arr3[4] = {1,2,2,4};
	vector<int> v3(arr2, arr3+4);

Then add these arrays to a larger array which is the database:

Code:

	vector<vector<int> > DataBase;
	DataBase.push_back(v0);
	DataBase.push_back(v1);
	DataBase.push_back(v2);
	DataBase.push_back(v3);

Then create a multimap for each attribute, Here is the one for city:

Code:

	pair<int, vector<int> > KeyAndValue;
	multimap<int, vector<int> > CityMap;

	KeyAndValue.first = 1;
	KeyAndValue.second = DataBase[0];	
	CityMap.insert(KeyAndValue);

	KeyAndValue.first = 1;
	KeyAndValue.second = DataBase[1];	
	CityMap.insert(KeyAndValue);
	
	KeyAndValue.first = 1;
	KeyAndValue.second = DataBase[2];	
	CityMap.insert(KeyAndValue);
	
	KeyAndValue.first = 1;
	KeyAndValue.second = DataBase[3];	
	CityMap.insert(KeyAndValue);

CityMap now has an entry for all city=1 in the DataBase.

To look for City = X, you just query the CityMap for all city=1. You do this with iterators. First, get the lower bound. Second, get the upper bound. Betweem these two iterators are all the entries for city = X:

Code:

	//Look for City = X:
    int X = 1;
	multimap<int, vector<int> >::iterator lower;
	lower = CityMap.lower_bound(1);
	multimap<int, vector<int> >::iterator upper;
	upper = CityMap.upper_bound(1);

Next, for time = t1, just iterator between the lower and upper bound checking each entry for a time == t1.

Code:

                int t1 = 1;

	//All the City= X are between the upper and lower iterator
	multimap<int, vector<int> >::iterator  temp;
	temp = lower;
	while (temp != upper)
	{
        //check for time = t1
		//temp is actually pointing at a vector<int>.
		vector<int>::iterator itr = (temp->second).begin();
		while (itr != (temp->second).end())
		{
			//[0] is city [1] is item [2] is time and [3] is value
             if (itr[2] == t1)
			 {
                       //found a city=X and time=t1
			 }
			 ++itr;
		}
		++temp;
	}

Multidimensional sparse arrays using linked lists

Multidimensional sparse arrays using linked lists

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment