Hi,
we use DB2 V8.1 on a Linux machine.
Situation: To store salary data we have designed a star schema
containing the dimension tables products, offices, time and the fact
table salaries comprising the measures.
The schema is like this:
products(id, product_code, product_name, product_category)
offices(id, office_code, location, state, country)
time(id,date)
salaries(product_id, office_id, time_id, salary)
Assume that the salaries table is quite large, containing up to
30,000,000 tuples.
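For reference, a minimal DDL sketch of this schema (the column types and constraints are assumptions for illustration only; the real definitions may differ):

```sql
CREATE TABLE products (
    id               INTEGER NOT NULL PRIMARY KEY,
    product_code     VARCHAR(20),
    product_name     VARCHAR(100),
    product_category VARCHAR(50)
);

CREATE TABLE offices (
    id          INTEGER NOT NULL PRIMARY KEY,
    office_code VARCHAR(20),
    location    VARCHAR(100),
    state       VARCHAR(50),
    country     VARCHAR(50)
);

CREATE TABLE time (
    id   INTEGER NOT NULL PRIMARY KEY,
    date DATE
);

-- fact table: one salary measure per product/office/time combination
CREATE TABLE salaries (
    product_id INTEGER NOT NULL REFERENCES products(id),
    office_id  INTEGER NOT NULL REFERENCES offices(id),
    time_id    INTEGER NOT NULL REFERENCES time(id),
    salary     DECIMAL(10,2)
);
```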
Problem: There is external software available for more advanced
analysis of the data, but it needs to be fed a matrix like this:
          | office 1  ...  office n
----------+------------------------
product 1 | corresponding
   ...    | salaries
product m |
The time is predefined by the user for such an analysis and is thus fixed.
The number of offices n varies with the user-specified country and
currently ranges between 20 and 130.
The number of products m is fixed for every office i (1<=i<=n) and
currently ranges between 20,000 and 30,000, depending on the
user-specified product category.
The attempt to join the product salaries for each office is very
time-consuming and tedious. A query looks like this:
select product_code, t1.salary, ..., tn.salary
from products a, salaries t1, ..., salaries tn
where a.product_category = '...'
and a.id = ti.product_id
and ti.office_id = i  --(for 1<=i<=n)
and ti.time_id = x
The last three lines appear n times (once for each office).
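Written out concretely for n = 2 offices (office ids 1 and 2; x stands for the user-chosen time id), the pattern looks like this:

```sql
-- one self-join of salaries per office; every ti is restricted
-- to a single office and the fixed time id
SELECT a.product_code, t1.salary, t2.salary
FROM products a, salaries t1, salaries t2
WHERE a.product_category = '...'
  AND t1.product_id = a.id
  AND t1.office_id = 1
  AND t1.time_id = x
  AND t2.product_id = a.id
  AND t2.office_id = 2
  AND t2.time_id = x
```

With n up to 130 offices, this grows to a 130-way self-join of the fact table.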
In addition, such queries often fail with the message "Query too
complex" - I think the query is too large.
Therefore, we create temp tables each containing the salaries for 5 offices,
temp(product_id, salary_office1, ..., salary_office5), and so on,
and then recursively materialize these temp tables into a new temp table
until the complete matrix is materialized.
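For illustration, one such materialization step looks roughly like this (table and column names are simplified, and only two offices per step are shown instead of our real five):

```sql
-- step 1: salaries for offices 1 and 2 side by side, keyed by product
INSERT INTO temp_a (product_id, salary_office1, salary_office2)
SELECT t1.product_id, t1.salary, t2.salary
FROM salaries t1, salaries t2
WHERE t1.product_id = t2.product_id
  AND t1.office_id = 1
  AND t2.office_id = 2
  AND t1.time_id = x
  AND t2.time_id = x;

-- step 2: widen the matrix by joining two such temp tables on product_id
INSERT INTO temp_ab (product_id, salary_office1, salary_office2,
                     salary_office3, salary_office4)
SELECT a.product_id, a.salary_office1, a.salary_office2,
       b.salary_office3, b.salary_office4
FROM temp_a a, temp_b b
WHERE a.product_id = b.product_id;
```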
We also tried to restructure the data with a Java program we wrote,
but it fails with "OutOfMemoryException".
We would be thankful for any hint on a better way to do this task.
Toralf