optimization challenge

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • aloha.kakuikanu

    optimization challenge

    create table T (
    x integer,
    y integer,
    );

    create index Ti on T(x,y);

    insert into t
    select i/1000 as x, mod(i,1000) as y
    from Integers where i < 1000000

    select * from T
    where x = 1 and y = x+1

    What is the best access path? How do you influence it if optimizerd
    didn't find it by default?

  • Lennart

    #2
    Re: optimization challenge


    aloha.kakuikanu wrote:
    create table T (
    x integer,
    y integer,
    );
    >
    create index Ti on T(x,y);
    >
    insert into t
    select i/1000 as x, mod(i,1000) as y
    from Integers where i < 1000000
    >
    select * from T
    where x = 1 and y = x+1
    >
    What is the best access path? How do you influence it if optimizerd
    didn't find it by default?
    I'm not sure I understand the question. The optimizer rewrites this to:

    select * from T where x = 1 and y = 2

    and does and index scan. Are you hoping for something better than this?
    >>>>
    Optimized Statement:
    -------------------
    SELECT 1 AS "X", Q1.Y AS "Y"
    FROM LTJN.T AS Q1
    WHERE (Q1.Y = 2) AND (Q1.X = 1)

    Access Plan:
    -----------
    Total Cost: 25.7351
    Query Degree: 1

    Rows
    RETURN
    ( 1)
    Cost
    I/O
    |
    1
    IXSCAN
    ( 2)
    25.7351
    2
    |
    1e+06
    >>>>
    drop table T;
    create table T (
    x integer,
    y integer
    );

    create index Ti on T(x,y);

    insert into t(x,y)
    with integers(i) as (
    values 0
    union all
    select i+1 from integers
    where i<1000000
    )
    select i/1000 as x, mod(i,1000) as y
    from Integers where i < 1000000;

    runstats on table ltjn.t AND SAMPLED DETAILED INDEXES ALL;

    Comment

    • Knut Stolze

      #3
      Re: optimization challenge

      aloha.kakuikanu wrote:
      create table T (
      x integer,
      y integer,
      );
      >
      create index Ti on T(x,y);
      >
      insert into t
      select i/1000 as x, mod(i,1000) as y
      from Integers where i < 1000000
      >
      select * from T
      where x = 1 and y = x+1
      >
      What is the best access path?
      That depends. Without knowing more details of the whole environment, I'd
      say that an index-only scan would suffice. I.e. the query could be
      answered by the index alone and no table lookup is necessary.
      How do you influence it if optimizerd
      didn't find it by default?
      Have you collected statistics on the table and indexes? Followed the usual
      performance analysis and tuning steps?

      If the plan is still bad, you could ask here with the details on your access
      plan and the usual environment-specific details. Other than that, a bad
      plan is an optimizer problem and not a user problem. So if everything else
      is in order, I would suggest to open a PMR.

      --
      Knut Stolze
      DB2 Information Integration Development
      IBM Germany

      Comment

      • aloha.kakuikanu

        #4
        Re: optimization challenge


        Lennart wrote:
        aloha.kakuikanu wrote:
        create table T (
        x integer,
        y integer,
        );

        create index Ti on T(x,y);

        insert into t
        select i/1000 as x, mod(i,1000) as y
        from Integers where i < 1000000

        select * from T
        where x = 1 and y = x+1

        What is the best access path? How do you influence it if optimizerd
        didn't find it by default?
        >
        I'm not sure I understand the question. The optimizer rewrites this to:
        >
        select * from T where x = 1 and y = 2
        >
        and does and index scan. Are you hoping for something better than this?
        >
        >>>
        OK, I greatly underestimated DB2 optimizer capabilities. How about
        wrapping x+1 into a function f and changing the predicate y = x+1 into
        y = f(x)? Still rewrites?

        Comment

        • Lennart

          #5
          Re: optimization challenge


          aloha.kakuikanu wrote:
          >
          OK, I greatly underestimated DB2 optimizer capabilities. How about
          wrapping x+1 into a function f and changing the predicate y = x+1 into
          y = f(x)? Still rewrites?
          Guess it depends on how complicated f is. Something simple like:

          create function my_f2 (x int)
          returns int
          language sql
          contains sql
          deterministic
          return case when mod(x,2)=0 then x*x+3 else -1*x end;

          works well

          select *
          from T
          where x = 1 and y = my_f2(x)

          Optimized Statement:
          -------------------
          SELECT 1 AS "X", Q2.$C0 AS "Y"
          FROM
          (SELECT Q1.Y
          FROM LTJN.T AS Q1
          WHERE (Q1.X = 1) AND (Q1.Y =
          CASE
          WHEN ("SYSFUN "."MOD"(1, 2) = 0)
          THEN 4
          ELSE -1 END )) AS Q2

          Access Plan:
          -----------
          Total Cost: 25.7351
          Query Degree: 1

          Rows
          RETURN
          ( 1)
          Cost
          I/O
          |
          1
          IXSCAN
          :Executing Connect Reset -- Connect Reset was Successful.
          ( 2)
          25.7351
          2
          |
          1e+06
          INDEX: LTJN
          TI

          But I guess that it is rather easy to construct a function where it
          would fail. Perhaps it could be the next challenge in this thread :-)

          I also assume that the optimizer will have a much tougher job if we are
          allowed to vary x in the query


          /Lennart

          Comment

          • aloha.kakuikanu

            #6
            Re: optimization challenge


            Lennart wrote:
            aloha.kakuikanu wrote:
            >

            OK, I greatly underestimated DB2 optimizer capabilities. How about
            wrapping x+1 into a function f and changing the predicate y = x+1 into
            y = f(x)? Still rewrites?
            >
            Guess it depends on how complicated f is. Something simple like:
            >
            create function my_f2 (x int)
            returns int
            language sql
            contains sql
            deterministic
            return case when mod(x,2)=0 then x*x+3 else -1*x end;
            >
            works well
            Wow, that is still pretty impressive -- DB2 optimizer indeed kicks a**!

            Comment

            • Lennart

              #7
              Re: optimization challenge


              aloha.kakuikanu wrote:
              [...]
              Wow, that is still pretty impressive -- DB2 optimizer indeed kicks a**!
              Sssch, not so loud. Serge might here you, think his work is done, and
              start doing something else ;-)

              /Lennart

              Comment

              • aloha.kakuikanu

                #8
                Re: optimization challenge

                Just to doblecheck. It also does this rewrite when x is supplied as
                bind variable, right?

                select x,y from T
                where x = :1 and y = x+1

                Comment

                • Lennart

                  #9
                  Re: optimization challenge


                  aloha.kakuikanu wrote:
                  Just to doblecheck. It also does this rewrite when x is supplied as
                  bind variable, right?
                  >
                  select x,y from T
                  where x = :1 and y = x+1
                  Sure:

                  Original Statement:
                  ------------------
                  select *
                  from T
                  where x = ? and y = my_f2(x+1)


                  Optimized Statement:
                  -------------------
                  SELECT :? AS "X", Q2.$C0 AS "Y"
                  FROM
                  (SELECT Q1.Y
                  FROM LTJN.T AS Q1
                  WHERE (Q1.X = :?) AND (Q1.Y =
                  CASE
                  WHEN ("SYSFUN "."MOD"($C1 , 2) = 0)
                  THEN $C2
                  ELSE $C3 END )) AS Q2

                  Access Plan:
                  -----------
                  Total Cost: 25.7351
                  Query Degree: 1

                  Rows
                  RETURN
                  ( 1)
                  Cost
                  I/O
                  |
                  1
                  IXSCAN
                  ( 2)
                  25.7351
                  2
                  |
                  1e+06
                  INDEX: LTJN
                  TI

                  Comment

                  Working...