Re: MDC vs. cluster index: which to use?

**Henry J.** · Oct 21 '08, 02:25 PM

Re: MDC vs. cluster index: which to use?

On Oct 21, 1:24 am, Ian <ianb...@mobile audio.comwrote:

Henry J. wrote:
>

MDC on type_id? So cluster index is not a good idea?

>
MDC *guarantees* clustering, whereas a table with a clustering index
will eventually require maintenance (a.k.a. reorg) to maintain the
cluster ratio.
>
That's not to say that a clustering index isn't still valuable
(especially for high cardinality columns that aren't a reasonable
candidate as an MDC dimension).

For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column for MDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?

**ChrisC** · Oct 21 '08, 03:25 PM

Re: MDC vs. cluster index: which to use?

On Oct 21, 7:18 am, "Henry J." <tank209...@yah oo.comwrote:

For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column for MDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?

MDC still might be faster - probably faster at loading and equivalent
for searches based on type_id. For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC + index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.

You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. Pick a new one each day until you've found the right
combination.

-Chris

**Henry J.** · Oct 21 '08, 05:05 PM

Re: MDC vs. cluster index: which to use?

On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:

On Oct 21, 7:18 am, "Henry J." <tank209...@yah oo.comwrote:
>

For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column for MDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?

>
MDC still might be faster - probably faster at loading and equivalent
for searches based on type_id. For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC + index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.
>
You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. Pick a new one each day until you've found the right
combination.
>
-Chris

Thanks Chris.

The primary key of the table is (type_id, product_id). Now if we add
a MDC on type_id, will the MDC be actually used if we query like the
following, or the primary key will be used only? I guess even if only
the primary key is used, the I/O will still be benefited.

select client.client_i d, sum( prod.quantity * prod.price *
client.ordersiz e )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_ id
group by client.client_i d

Also, would the following query benefit from both the MDC and the
primary key?

select *
from prod
where prod.type_id = 4 and prod.product_id = 101

Or, instead of the primary key, just use an index on (product_id)
would cover both of the above queries?

I'll certainly test to find out. But I'd like to gain better
understanding about indexing. Also, the turnaround of making such
changes are not as fast as I want and I hope to get to the best
solution a little faster. Thanks.

**Ian** · Oct 21 '08, 06:35 PM

Re: MDC vs. cluster index: which to use?

Henry J. wrote:

On Oct 21, 1:24 am, Ian <ianb...@mobile audio.comwrote:

>Henry J. wrote:
>>

>>MDC on type_id? So cluster index is not a good idea?

>MDC *guarantees* clustering, whereas a table with a clustering index
>will eventually require maintenance (a.k.a. reorg) to maintain the
>cluster ratio.
>>
>That's not to say that a clustering index isn't still valuable
>(especially for high cardinality columns that aren't a reasonable
>candidate as an MDC dimension).

>
For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column for MDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?

By purging it every day, you're going to make it nearly impossible
for a clustering index to be useful. Unless your data comes in
already sorted by type_id,product _id the cluster ratio is probably
going to be very low, regardless of what you set pctfree to.

Furthermore, will a clustering index really help with queries that
aggregating on the clustering key? Maybe, but maybe not -- that
depends on whether the query is doing an index scan / fetch. If
it's just doing a tablescan, you'll still need to do a sort.

**Henry J.** · Oct 21 '08, 10:55 PM

Re: MDC vs. cluster index: which to use?

On Oct 21, 2:32 pm, Ian <ianb...@mobile audio.comwrote:

Henry J. wrote:

On Oct 21, 1:24 am, Ian <ianb...@mobile audio.comwrote:

Henry J. wrote:

>

>MDC on type_id? So cluster index is not a good idea?
MDC *guarantees* clustering, whereas a table with a clustering index
will eventually require maintenance (a.k.a. reorg) to maintain the
cluster ratio.

>

That's not to say that a clustering index isn't still valuable
(especially for high cardinality columns that aren't a reasonable
candidate as an MDC dimension).

>

For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column for MDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?

>
By purging it every day, you're going to make it nearly impossible
for a clustering index to be useful. Unless your data comes in
already sorted by type_id,product _id the cluster ratio is probably
going to be very low, regardless of what you set pctfree to.
>
Furthermore, will a clustering index really help with queries that
aggregating on the clustering key? Maybe, but maybe not -- that
depends on whether the query is doing an index scan / fetch. If
it's just doing a tablescan, you'll still need to do a sort.- Hide quotedtext -
>
- Show quoted text -

I'm totally confused -- doesn't a clustering index make sure the rows
are inserted as sorted and hence with high cluster ratio? And only
updates/deletes/inserts over time would reduce the cluster ratio? And
what I understood is that a truly clustered index would avoid a sort.

**Henry J.** · Oct 28 '08, 05:55 PM

Re: MDC vs. cluster index: which to use?

On Oct 21, 1:05 pm, "Henry J." <tank209...@yah oo.comwrote:

On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:
>
>
>
>
>

On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:

>

For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column for MDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?

>

MDC still might be faster - probably faster at loading and equivalent
for searches based on type_id. For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC + index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.

>

You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. Pick a new one each day until you've found the right
combination.

>

-Chris

>
Thanks Chris.
>
The primary key of the table is (type_id, product_id). Now if we add
a MDC on type_id, will the MDC be actually used if we query like the
following, or the primary key will be used only? I guess even if only
the primary key is used, the I/O will still be benefited.
>
select client.client_i d, sum( prod.quantity * prod.price *
client.ordersiz e )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_ id
group by client.client_i d
>
Also, would the following query benefit from both the MDC and the
primary key?
>
select *
from prod
where prod.type_id = 4 and prod.product_id = 101
>
Or, instead of the primary key, just use an index on (product_id)
would cover both of the above queries?
>
I'll certainly test to find out. But I'd like to gain better
understanding about indexing. Also, the turnaround of making such
changes are not as fast as I want and I hope to get to the best
solution a little faster. Thanks.- Hide quoted text -
>
- Show quoted text -

Now I've changed the primary key of the table to (type_id, product_id)
while making (type_id) as MDC.

When examing the plan, I found that the following query:

select client.client_i d, sum( prod.quantity * prod.price
* client.ordersiz e )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_ id
group by client.client_i d

does use the new primary key on (type_id, product_id). However, if I
use a bind variable for type_id, i.e.:

select client.client_i d, sum( prod.quantity * prod.price
* client.ordersiz e )
from prod, client
where prod.type_id = ? and prod.product_id =
client.product_ id
group by client.client_i d

It only uses the MDC on (type_id).

Now, my questions are:

1) should I drop the bind variable on type_id so it can utilize the
primary key? The cardinality of type_id is about 5 to 8.

2) Would an addition on (product_id) help?

Thanks.

**Henry J.** · Oct 28 '08, 07:55 PM

Re: MDC vs. cluster index: which to use?

On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:

On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:
>

For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column forMDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?

>
MDCstill might be faster - probably faster at loading and equivalent
for searches based on type_id. For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC+ index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.
>
You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. Pick a new one each day until you've found the right
combination.
>
-Chris

I have the primary key changed to (type_id, product_id), and added a
MDC on type_id. Now the following query:

select client.client_i d, sum( prod.quantity * prod.price *
client.ordersiz e )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_ id
group by client.client_i d

it will utilize the primary key. However, if I use a host variable
for type_id, it will only pick up the MDC on type_id, which I think is
not optimal.

My questions are:

1) should I add an index on (product_id) which is what's really needed
rather than the primary key on (type_id, product_id). Note that the
cardinality of type_id is pretty small, around 6.

2) should I drop the host variable on type_id? Note that each type_id
has the same number of product_id's. Perhaps this is not a good case
for host variable. BTW, this SQL is embedded in a Java (as static SQL
I think?).

Thanks!

Thanks!

**Henry J.** · Oct 28 '08, 08:05 PM

Re: MDC vs. cluster index: which to use?

On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:

On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:
>

For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column forMDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?

>
MDCstill might be faster - probably faster at loading and equivalent
for searches based on type_id. For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC+ index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.
>
You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. Pick a new one each day until you've found the right
combination.
>
-Chris

I have the primary key changed to (type_id, product_id), and added a
MDC on type_id. Now the following query:

select client.client_i d, sum( prod.quantity * prod.price *
client.ordersiz e )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_ id
group by client.client_i d

will utilize the primary key. However, if I use a host variable for
type_id (i.e., type_id = ?), it will only pick up the MDC on type_id,
which I think is not optimal.

My questions are:

1) should I add an index on (product_id) which is what's really needed
rather than the primary key on (type_id, product_id). Note that the
cardinality of type_id is pretty small, around 6.

2) should I drop the host variable on type_id? Note that each type_id
has the same number of product_id's. Perhaps this is not a good case
for host variable. BTW, this SQL is embedded in a Java (as static SQL
I think?).

Thanks!

**Saurabh.Nr@gmail.com** · Oct 29 '08, 02:05 PM

Re: MDC vs. cluster index: which to use?

On Oct 29, 12:56 am, "Henry J." <tank209...@yah oo.comwrote:

On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:
>
>
>
>
>

On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:

>

For my case, the table is purged and re-populated every day. And the
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column forMDC. Then it won't be as helpful as
a cluster index on (type_id, product_id)?

>

MDCstill might be faster - probably faster at loading and equivalent
for searches based on type_id. For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC+ index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.

>

You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. Pick a new one each day until you've found the right
combination.

>

-Chris

>
I have the primary key changed to (type_id, product_id), and added a
MDC on type_id. Now the following query:
>
select client.client_i d, sum( prod.quantity * prod.price *
client.ordersiz e )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_ id
group by client.client_i d
>
will utilize the primary key. However, if I use a host variable for
type_id (i.e., type_id = ?), it will only pick up the MDC on type_id,
which I think is not optimal.
>
My questions are:
>
1) should I add an index on (product_id) which is what's really needed
rather than the primary key on (type_id, product_id). Note that the
cardinality of type_id is pretty small, around 6.
>
2) should I drop the host variable on type_id? Note that each type_id
has the same number of product_id's. Perhaps this is not a good case
for host variable. BTW, this SQL is embedded in a Java (as static SQL
I think?).
>
Thanks!- Hide quoted text -
>
- Show quoted text -

Hi,
there is a shell script know extent utilisation.sh which can be
downloaded from IBM website.it will answers all your queries regarding
which will be a better column and how much space will you save and how
will it improve the performance.

**Henry J.** · Oct 29 '08, 08:15 PM

Re: MDC vs. cluster index: which to use?

On Oct 29, 9:55 am, Saurabh...@gmai l.com wrote:

On Oct 29, 12:56 am, "HenryJ." <tank209...@yah oo.comwrote:
>
>
>
>
>

On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:

>

On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:

>

For my case, the table is purged and re-populated every day. Andthe
other column, product_id, has high cardinality.
Perhaps a cluster index makes more sense? I guess I'd choose type_id
if I have to pick one column forMDC. Then it won't be as helpfulas
a cluster index on (type_id, product_id)?

>

MDCstill might be faster - probably faster at loading and equivalent
for searches based on type_id. For searches on type_id and
product_id, it could outperform the cluster index, if you also create
an index on product_id - and the combination (MDC+ index on
product_id) will definitely outperform a cluster index on type_id,
product_id for product_id searches.

>

You'll want to verify these claims - but since you rebuild the table
every day, you are in a perfect position to try these various options
out. Pick a new one each day until you've found the right
combination.

>

-Chris

>

I have the primary key changed to (type_id, product_id), and added a
MDCon type_id. Now the following query:

>

select client.client_i d, sum( prod.quantity * prod.price*
client.ordersiz e )
from prod, client
where prod.type_id = 4 and prod.product_id =
client.product_ id
group by client.client_i d

>

will utilize the primary key. However, if I use a host variable for
type_id (i.e., type_id = ?), it will only pick up theMDCon type_id,
which I think is not optimal.

>

My questions are:

>

1) should I add an index on (product_id) which is what's really needed
rather than the primary key on (type_id, product_id). Note that the
cardinality of type_id is pretty small, around 6.

>

2) should I drop the host variable on type_id? Note that each type_id
has the same number of product_id's. Perhaps this is not a good case
for host variable. BTW, this SQL is embedded in a Java (as static SQL
I think?).

>

Thanks!- Hide quoted text -

>

- Show quoted text -

>
Hi,
there is a shell script know extent utilisation.sh which can be
downloaded from IBM website.it will answers all your queries regarding
which will be a better column and how much space will you save and how
will it improve the performance.- Hide quoted text -
>
- Show quoted text -

Could you please give me the link? I couldn't find it. Thanks.

Re: MDC vs. cluster index: which to use?

Re: MDC vs. cluster index: which to use?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment