Avoiding Table Scan on Update with correlated subselect

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Mike L. Bell

    Avoiding Table Scan on Update with correlated subselect

    Query:

    update table1 t1
    set end_time = (
    select end_time
    from table2 t2
    where t2.key1 = t1.key1
    and t2.key2 = t1.key2
    )
    where exists
    (
    select 1
    from table2 t2
    where t2.key1 = t1.key1
    and t2.key2 = t2.key2
    )


    T1 and T2 share the same primary key. T2 is a volatile table that has
    zero or hundreds of rows in it. T1 is large table with millions of
    rows.

    DB2 explain facility shows table scan on large table. I need this to
    be fast to avoid concurrency/locking issues. I'm using the full
    primary key in the first correlated subselect, and in the "exists"
    clause, and am confused why the optimizer would choose to do a table
    scan.

    Is there a better way to write this to avoid table scans?

    Platform is 7.2 Fixpack 11.

    Thanks,
    Mike
  • Philip Sherman

    #2
    Re: Avoiding Table Scan on Update with correlated subselect

    Greatest concurrency occurs with minimal locking. Locking is minimized
    when only a single row at a time is locked. Since you are updating only
    "a few hundred rows" and T2 is "a volatile table" I'd assume that you
    will be deleting the contents of T2 when finished and that T2 can't be
    updated during your processing. You don't state where you are interested
    on minimizing locking - T1 or T2.

    A stored procedure with the following logic will do this.

    1. Construct a parameterized statement to update T1.
    2. Construct a cursor to read T2 - WITH HOLD clause will be needed.
    3. Lock T2.
    4. For each row read from T2
    5. Update T1
    6. Commit T1
    5. Delete rows from T2.

    The updates will be done using the primary key which should lock a
    single row of T1. Commits will release the locks immediately after the
    update, minimizing T1's locked time. The lock on T2 will prevent updates
    during your use of the table. Taking a commit after each update of T1 is
    not the best performer but does maximize concurrency on T1. A compromise
    of locking and performance is to maintain a rows updated counter and
    commit after every n updates.

    Phil Sherman


    Mike L. Bell wrote:[color=blue]
    > Query:
    >
    > update table1 t1
    > set end_time = (
    > select end_time
    > from table2 t2
    > where t2.key1 = t1.key1
    > and t2.key2 = t1.key2
    > )
    > where exists
    > (
    > select 1
    > from table2 t2
    > where t2.key1 = t1.key1
    > and t2.key2 = t2.key2
    > )
    >
    >
    > T1 and T2 share the same primary key. T2 is a volatile table that has
    > zero or hundreds of rows in it. T1 is large table with millions of
    > rows.
    >
    > DB2 explain facility shows table scan on large table. I need this to
    > be fast to avoid concurrency/locking issues. I'm using the full
    > primary key in the first correlated subselect, and in the "exists"
    > clause, and am confused why the optimizer would choose to do a table
    > scan.
    >
    > Is there a better way to write this to avoid table scans?
    >
    > Platform is 7.2 Fixpack 11.
    >
    > Thanks,
    > Mike[/color]

    Comment

    Working...