Question about MIN/MAX optimization

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Michel Esber

    Question about MIN/MAX optimization

    Hi all,

    Db2 v8 FP15 LUW .

    create table T (ID varchar (24), ABC timestamp)

    There is an index for (ID, ABC), allowing reverse Scans.

    My application needs to determine MIN and MAX(ABC) for a given ID. We
    are currently using a simple statement:

    select MIN(abc), MAX(abc) from T where ID = ? for read only

    Table T has 100+ million rows, and several other applications are
    reading/deleting data from it. The statement above runs with UR
    isolation, however it takes a very long time to complete (5-10
    minutes, or more).

    I have studied the access plan, and it looks OK:

    Access Plan:
    -----------

    Total Cost: 25.6855
    Query Degree: 1

    Rows
    RETURN
    ( 1)
    Cost
    I/O
    |
    1
    NLJOIN
    ( 2)
    25.6855
    3.99557
    /-----+-----\
    1 1
    GRPBY GRPBY
    ( 3) ( 5)
    12.8262 12.8587
    1.99557 2
    | |
    46.1442 46.1442
    IXSCAN IXSCAN
    ( 4) ( 6)
    12.8544 12.8544
    2 2
    | |
    973678 973678
    INDEX: RTM INDEX: RTM
    IPSSTAT_WIN IPSSTAT_WIN


    I am looking for some magic SQL or hint that will allow me to improve
    this rather 'simple' query.

    PS: Yes, table and indexes do have updated statistics.

    Thanks in advance,

    -Michel
  • Michel Esber

    #2
    Re: Question about MIN/MAX optimization

    Hi.
    >
    Try this. It will probably generate one indexs can instead of two.
    >
    with temp (abc) as
        ( select abc from T were id  = ?)
    select  max (abc), min(abc) from temp
    >
    /dg

    DG, DB2 optimized the original statement into two index scans.

    Original Statement:
    ------------------
    with temp (COLLECT_TIME) as
    (select COLLECT_TIME
    from RTM.TBL_COLLECT _PSSTAT_WIN_RTM
    where MACHINE_ID= ? and COLLECT_TIME ?)
    select MIN(COLLECT_TIM E), MAX(COLLECT_TIM E)
    from temp


    Optimized Statement:
    -------------------
    SELECT Q6.$C0, Q3.$C0
    FROM
    (SELECT MAX(Q2.$C0)
    FROM
    (SELECT Q1.COLLECT_TIME
    FROM RTM.TBL_COLLECT _PSSTAT_WIN_RTM AS Q1
    WHERE (:? < Q1.COLLECT_TIME ) AND (Q1.MACHINE_ID = :?)) AS Q2) AS
    Q3,
    (SELECT MIN(Q5.$C0)
    FROM
    (SELECT Q4.COLLECT_TIME
    FROM RTM.TBL_COLLECT _PSSTAT_WIN_RTM AS Q4
    WHERE (:? < Q4.COLLECT_TIME ) AND (Q4.MACHINE_ID = :?)) AS Q5) AS
    Q6

    Access Plan:
    -----------
    Total Cost: 51.2774
    Query Degree: 1

    Rows
    RETURN
    ( 1)
    Cost
    I/O
    |
    1
    NLJOIN
    ( 2)
    51.2774
    7.9774
    /-----+-----\
    1 1
    GRPBY GRPBY
    ( 3) ( 5)
    25.6383 25.6383
    3.98869 3.98871
    | |
    448.726 448.726
    IXSCAN IXSCAN
    ( 4) ( 6)
    87.5883 87.5879
    13.6267 13.6267
    | |
    1.48278e+06 1.48278e+06
    INDEX: RTM INDEX: RTM
    IPSSTAT_WIN IPSSTAT_WIN

    The overall cost is higher than the original statement :(

    Thanks

    Comment

    • Michel Esber

      #3
      Re: Question about MIN/MAX optimization

      This is the optimal plan we want. Can you reproduce it on DB2 V8 with
      this DDL and DML?

      Here is the plan with MIN/MAX:

      Database Context:
      ----------------
      Parallelism: None
      CPU Speed: 3.778754e-07
      Comm Speed: 0
      Buffer Pool size: 165240
      Sort Heap size: 1024
      Database Heap size: 1024
      Lock List size: 100
      Maximum Lock List: 10
      Average Applications: 1
      Locks Available: 1020

      Package Context:
      ---------------
      SQL Type: Dynamic
      Optimization Level: 5
      Blocking: Block All Cursors
      Isolation Level: Cursor Stability



      ---------------- STATEMENT 1 SECTION 203 ----------------
      QUERYNO: 1
      QUERYTAG:
      Statement Type: Select
      Updatable: No
      Deletable: No
      Query Degree: 1

      Original Statement:
      ------------------
      select MIN(ABC), MAX(ABC)
      from T
      where ID=?


      Optimized Statement:
      -------------------
      SELECT Q6.$C0, Q3.$C0
      FROM
      (SELECT MAX(Q2.$C0)
      FROM
      (SELECT Q1.ABC
      FROM DB2INST1.T AS Q1
      WHERE (Q1.ID = :?)) AS Q2) AS Q3,
      (SELECT MIN(Q5.$C0)
      FROM
      (SELECT Q4.ABC
      FROM DB2INST1.T AS Q4
      WHERE (Q4.ID = :?)) AS Q5) AS Q6

      Access Plan:
      -----------
      Total Cost: 12.8453
      Query Degree: 1

      Rows
      RETURN
      ( 1)
      Cost
      I/O
      |
      1
      NLJOIN
      ( 2)
      12.8453
      2
      /-----+-----\
      1 1
      GRPBY GRPBY
      ( 3) ( 5)
      6.42335 6.42127
      1 1
      | |
      2.92 2.92
      IXSCAN IXSCAN
      ( 4) ( 6)
      6.42298 6.42298
      1 1
      | |
      73 73
      INDEX: DB2INST1 INDEX: DB2INST1
      I I




      Extended Diagnostic Information:
      --------------------------------

      Diagnostic Identifier: 1
      Diagnostic Details: EXP0022W Index has no statistics. The index
      "DB2INST1". "I" has not had runstats run on it.
      This
      can lead to poor cardinality and predicate
      filtering estimates.

      Plan Details:
      -------------


      1) RETURN: (Return Result)
      Cumulative Total Cost: 12.8453
      Cumulative CPU Cost: 124849
      Cumulative I/O Cost: 2
      Cumulative Re-Total Cost: 0.0064907
      Cumulative Re-CPU Cost: 17176.8
      Cumulative Re-I/O Cost: 0
      Cumulative First Row Cost: 12.8441
      Estimated Bufferpool Buffers: 3

      Arguments:
      ---------
      BLDLEVEL: (Build level)
      DB2 v8.1.2.136 : special_19546
      HEAPUSE : (Maximum Statement Heap Usage)
      56 Pages
      STMTHEAP: (Statement heap size)
      4096

      Input Streams:
      -------------
      7) From Operator #2

      Estimated number of rows: 1
      Number of columns: 2
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q7.$C1+Q7.$C0


      2) NLJOIN: (Nested Loop Join)
      Cumulative Total Cost: 12.8453
      Cumulative CPU Cost: 124849
      Cumulative I/O Cost: 2
      Cumulative Re-Total Cost: 0.0064907
      Cumulative Re-CPU Cost: 17176.8
      Cumulative Re-I/O Cost: 0
      Cumulative First Row Cost: 12.8441
      Estimated Bufferpool Buffers: 3

      Arguments:
      ---------
      EARLYOUT: (Early Out flag)
      NONE
      FETCHMAX: (Override for FETCH MAXPAGES)
      IGNORE
      ISCANMAX: (Override for ISCAN MAXPAGES)
      IGNORE

      Input Streams:
      -------------
      3) From Operator #3

      Estimated number of rows: 1
      Number of columns: 1
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q3.$C0

      6) From Operator #5

      Estimated number of rows: 1
      Number of columns: 1
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q6.$C0


      Output Streams:
      --------------
      7) To Operator #1

      Estimated number of rows: 1
      Number of columns: 2
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q7.$C1+Q7.$C0


      3) GRPBY : (Group By)
      Cumulative Total Cost: 6.42335
      Cumulative CPU Cost: 61783.5
      Cumulative I/O Cost: 1
      Cumulative Re-Total Cost: 0.00503426
      Cumulative Re-CPU Cost: 13322.5
      Cumulative Re-I/O Cost: 0
      Cumulative First Row Cost: 6.4226
      Estimated Bufferpool Buffers: 2

      Arguments:
      ---------
      AGGMODE : (Aggregration Mode)
      COMPLETE
      GROUPBYC: (Group By columns)
      FALSE
      GROUPBYN: (Number of Group By columns)
      0
      JN INPUT: (Join input leg)
      OUTER
      ONEFETCH: (One Fetch flag)
      FALSE

      Input Streams:
      -------------
      2) From Operator #4

      Estimated number of rows: 2.92
      Number of columns: 1
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q2.$C0


      Output Streams:
      --------------
      3) To Operator #2

      Estimated number of rows: 1
      Number of columns: 1
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q3.$C0


      4) IXSCAN: (Index Scan)
      Cumulative Total Cost: 6.42298
      Cumulative CPU Cost: 60803.5
      Cumulative I/O Cost: 1
      Cumulative Re-Total Cost: 0.00466394
      Cumulative Re-CPU Cost: 12342.5
      Cumulative Re-I/O Cost: 0
      Cumulative First Row Cost: 6.42108
      Estimated Bufferpool Buffers: 2

      Arguments:
      ---------
      MAXPAGES: (Maximum pages for prefetch)
      1
      PREFETCH: (Type of Prefetch)
      NONE
      ROWLOCK : (Row Lock intent)
      NEXT KEY SHARE
      SCANDIR : (Scan Direction)
      FORWARD
      TABLOCK : (Table Lock intent)
      INTENT SHARE

      Predicates:
      ----------
      2) Start Key Predicate
      Relational Operator: Equal (=)
      Subquery Input Required: No
      Filter Factor: 0.04

      Predicate Text:
      --------------
      (Q1.ID = :?)

      2) Stop Key Predicate
      Relational Operator: Equal (=)
      Subquery Input Required: No
      Filter Factor: 0.04

      Predicate Text:
      --------------
      (Q1.ID = :?)


      Input Streams:
      -------------
      1) From Object DB2INST1.I

      Estimated number of rows: 73
      Number of columns: 3
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q1.$RID$+Q1.ID +Q1.ABC



      Output Streams:
      --------------
      2) To Operator #3

      Estimated number of rows: 2.92
      Number of columns: 1
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q2.$C0


      5) GRPBY : (Group By)
      Cumulative Total Cost: 6.42127
      Cumulative CPU Cost: 61285.6
      Cumulative I/O Cost: 1
      Cumulative Re-Total Cost: 0.000783828
      Cumulative Re-CPU Cost: 2074.3
      Cumulative Re-I/O Cost: 0
      Cumulative First Row Cost: 6.42117
      Estimated Bufferpool Buffers: 2

      Arguments:
      ---------
      AGGMODE : (Aggregration Mode)
      COMPLETE
      GROUPBYC: (Group By columns)
      FALSE
      GROUPBYN: (Number of Group By columns)
      0
      JN INPUT: (Join input leg)
      INNER
      ONEFETCH: (One Fetch flag)
      TRUE

      Input Streams:
      -------------
      5) From Operator #6

      Estimated number of rows: 2.92
      Number of columns: 1
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q5.$C0(A)


      Output Streams:
      --------------
      6) To Operator #2

      Estimated number of rows: 1
      Number of columns: 1
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q6.$C0


      6) IXSCAN: (Index Scan)
      Cumulative Total Cost: 6.42298
      Cumulative CPU Cost: 60803.5
      Cumulative I/O Cost: 1
      Cumulative Re-Total Cost: 0.00466394
      Cumulative Re-CPU Cost: 12342.5
      Cumulative Re-I/O Cost: 0
      Cumulative First Row Cost: 6.42108
      Estimated Bufferpool Buffers: 2

      Arguments:
      ---------
      MAXPAGES: (Maximum pages for prefetch)
      1
      PREFETCH: (Type of Prefetch)
      NONE
      ROWLOCK : (Row Lock intent)
      NEXT KEY SHARE
      SCANDIR : (Scan Direction)
      FORWARD
      TABLOCK : (Table Lock intent)
      INTENT SHARE

      Predicates:
      ----------
      3) Start Key Predicate
      Relational Operator: Equal (=)
      Subquery Input Required: No
      Filter Factor: 0.04

      Predicate Text:
      --------------
      (Q4.ID = :?)

      3) Stop Key Predicate
      Relational Operator: Equal (=)
      Subquery Input Required: No
      Filter Factor: 0.04

      Predicate Text:
      --------------
      (Q4.ID = :?)


      Input Streams:
      -------------
      4) From Object DB2INST1.I

      Estimated number of rows: 73
      Number of columns: 3
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q4.ABC(A)+Q4.$ RID$+Q4.ID


      Output Streams:
      --------------
      5) To Operator #5

      Estimated number of rows: 2.92
      Number of columns: 1
      Subquery predicate ID: Not
      Applicable

      Column Names:
      ------------
      +Q5.$C0(A)


      Objects Used in Access Plan:
      ---------------------------

      Schema: DB2INST1
      Name: T
      Type: Table (reference only)

      Schema: DB2INST1
      Name: I
      Type: Index
      Time of creation:
      2008-04-26-12.06.48.096459
      Last statistics update:
      Number of columns: 2
      Number of rows: 73
      Width of rows: -1
      Number of buffer pool pages: 1
      Distinct row values: No
      Tablespace name:
      IOSTATDATIDX
      Tablespace overhead: 6.000000
      Tablespace transfer rate: 0.400000
      Source for statistics: Single Node
      Prefetch page count: 128
      Container extent page count: 32
      Index clustering statistic: 80.000000
      Index leaf pages: 2
      Index tree levels: 2
      Index full key cardinality: 25
      Index first key cardinality: 25
      Index first 2 keys cardinality: -1
      Index first 3 keys cardinality: -1
      Index first 4 keys cardinality: -1
      Index sequential pages: 2
      Index page density: 100
      Index avg sequential pages: -1
      Index avg gap between sequences:-1
      Index avg random pages: -1
      Fetch avg sequential pages: -1
      Fetch avg gap between sequences:-1
      Fetch avg random pages: -1
      Index RID count: 0
      Index deleted RID count: 0
      Index empty leaf pages: 0
      Base Table Schema: DB2INST1
      Base Table Name: T
      Columns in index:
      ID
      ABC

      Base Table For Index Not Already Shown:
      ---------------------------------------

      Schema: DB2INST1
      Name: T
      Time of creation:
      2008-04-26-12.06.12.993169
      Last statistics update:
      Number of columns: 2
      Number of rows: -1
      Number of pages: -1
      Number of pages with rows: -1
      Tablespace name: IOSTATDATIDX
      Tablespace overhead: 6.000000
      Tablespace transfer rate: 0.400000
      Prefetch page count: 128
      Container extent page count: 32
      Table overflow record count: -1



      Both plans are very similar. In terms of performance (CPU/Disk IO),
      how do they compare ?

      Thanks,

      Comment

      • Serge Rielau

        #4
        Re: Question about MIN/MAX optimization

        Michel Esber wrote:
        >This is the optimal plan we want. Can you reproduce it on DB2 V8 with
        >this DDL and DML?
        >
        Here is a v8 plan using your solution. I will post another message
        with the plan for MIN/MAX and group by.
        Total Cost: 12.846
        Query Degree: 1
        >
        Rows
        RETURN
        ( 1)
        Cost
        I/O
        |
        1
        NLJOIN
        ( 2)
        12.846
        2
        /-----+-----\
        1 1
        IXSCAN IXSCAN
        ( 3) ( 4)
        6.42298 6.42298
        1 1
        | |
        73 73
        INDEX: DB2INST1 INDEX: DB2INST1
        I I
        OK, so now the question is where is the difference between your original
        scenario (not the best plan) and mine (best plan).
        I recommend morphing it step by step and see where you loose your way.
        Obviosuly the first step is to add data and statistics.

        Cheers
        Serge

        --
        Serge Rielau
        DB2 Solutions Development
        IBM Toronto Lab

        Comment

        Working...