Distributed statistics and optimizer

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • P. Adhia

    Distributed statistics and optimizer

    Hi,

    Consider the following query

    select *
    from t1
    where c1 = ?
    and c2 = ?


    table t1 has a high number of rows where c1 is null, but rows where c1
    is not null, the filter factor is relatively low. The table has
    distributed statistics collected on table and key columns.

    Now, if I have an index i1 on t1(c1), access path produced by db2exfmt
    shows the correct filter factor for c1 = ? predicate, presumably
    because optimizer knows the fact that predicate c1 = ? cannot be
    satisfied by rows which have null values for c1.

    However, if I have an index i2 on t1(c1,c2), the explain shows a very
    high filter factor for the same predicate. I can tell that it was
    derived from fullkeycard of index i2, which includes the null values.
    Optimizer completely ignores the fact that predicate c1 = ? cannot
    select null values.

    The actual query is a little bit complex, involving join of table t1
    with 5 other tables, but the incorrect filter factor calculations
    causes optimizer to select wrong table as the outer table in case
    index i2 exists.

    Has any one encountered similar situation before? Any suggestions?

    The DB2 manual says runstats command doesn't support collecting
    distributed statistics on a column group, but DB2 catalog has
    syscat.colgroup % tables, so I am confused, is it the runstats command
    limitation or the optimizer limitation? If it is just runstats
    limitation, any external ways to populate distributed column group
    statistics?

    TIA

    P. Adhia
Working...