why this 2-column index not used

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Henry J.

    why this 2-column index not used

    I have a composite index on two columns in a table. However, the
    index is not used in a query that restricts the 2nd column to a
    constant. If both columns are linked with columns in other join
    tables, the index will be used.

    To illustrate it with an example, I have a query like this:

    select s.ticker, p.quantity
    from stock s, positions p
    where s.type_id = 4
    and s.stock_id = p.stock_id

    There is an index on stock: (stock_id, type_id). In the table stocks,
    there are about 100 different type_ids evenly distributed, and ten of
    thousand different stock_ids (each stock_id could map to every one of
    the 100 type_ids).

    From the plan, the above query does not use the index, but the
    following query does use the index:

    select s.ticker, p.quantity
    from stock s, positions p
    where s.type_id = p.pos_type_id
    and s.stock_id = p.stock_id

    The only difference is that in the first query, type_id is constant 4,
    while in the second it is linked with another column in the second
    table.

    The stats are good. Is there anything else that might have caused the
    above?

  • Henry J.

    #2
    Re: why this 2-column index not used

    Even if I add another index stock (stock_id), the first query still
    doesn't use any index.

    However, if the query is like this:

    select s.ticker, p.quantity
    from stock s, positions p
    where s.stock_id = p.stock_id

    The index on stock (stock) will be used (of course). If the query is
    like this:

    select s.ticker, p.quantity
    from stock s, positions p
    where s.stock_id = 101
    s.type_id = 4

    then the index on stock (stock_id, type_id) is used.

    BTW, there is a foreign key in stock that links the stock_id field to
    the positions (stock_id).

    Anybody can explain the above?


    Comment

    • Mark A

      #3
      Re: why this 2-column index not used

      "Henry J." <tank209209@yah oo.comwrote in message
      news:02e9b6d6-229f-459a-b8f7-0097b19ba2b1@m4 5g2000hsb.googl egroups.com...
      >I have a composite index on two columns in a table. However, the
      index is not used in a query that restricts the 2nd column to a
      constant. If both columns are linked with columns in other join
      tables, the index will be used.
      >
      To illustrate it with an example, I have a query like this:
      >
      select s.ticker, p.quantity
      from stock s, positions p
      where s.type_id = 4
      and s.stock_id = p.stock_id
      >
      There is an index on stock: (stock_id, type_id). In the table stocks,
      there are about 100 different type_ids evenly distributed, and ten of
      thousand different stock_ids (each stock_id could map to every one of
      the 100 type_ids).
      >
      From the plan, the above query does not use the index, but the
      following query does use the index:
      >
      select s.ticker, p.quantity
      from stock s, positions p
      where s.type_id = p.pos_type_id
      and s.stock_id = p.stock_id
      >
      The only difference is that in the first query, type_id is constant 4,
      while in the second it is linked with another column in the second
      table.
      >
      The stats are good. Is there anything else that might have caused the
      above?
      You are confusing the use of an index with automatically getting better
      performance than not using an index. Sometimes using an index will result in
      better performance, and sometimes not. If DB2 determines that it is faster
      to not use an index, then it will not use one. This typically happens when
      DB2 will have read every data page (rows are stored in pages) to satisfy the
      query.

      Keep in mind that you are returning all rows from the table (you don't
      qualify which type_id or which stock_id's you want), so it is safe to assume
      that DB2 will have to read not just every page, but in fact every row, to
      satisfy the query.

      Assuming you did a runstats as follows:
      runstats on table <table-namewith distribution on key columns and indexes
      all
      then DB2 almost always makes the correct decision.


      Comment

      • Henry J.

        #4
        Re: why this 2-column index not used

        On Sep 13, 12:34 am, "Mark A" <some...@someon e.comwrote:
        "Henry J." <tank209...@yah oo.comwrote in message
        >
        news:02e9b6d6-229f-459a-b8f7-0097b19ba2b1@m4 5g2000hsb.googl egroups.com...
        >
        >
        >
        I have a composite index on two columns in a table.  However, the
        index is not used in a query that restricts the 2nd column to a
        constant.  If both columns are linked with columns in other join
        tables, the index will be used.
        >
        To illustrate it with an example, I have a query like this:
        >
        select s.ticker, p.quantity
        from stock s, positions p
             where s.type_id = 4
                      and s.stock_id = p.stock_id
        >
        There is an index on stock: (stock_id, type_id).  In the table stocks,
        there are about 100 different type_ids evenly distributed, and ten of
        thousand different stock_ids (each stock_id could map to every one of
        the 100 type_ids).
        >
        From the plan, the above query does not use the index, but the
        following query does use the index:
        >
        select s.ticker, p.quantity
        from stock s, positions p
             where s.type_id = p.pos_type_id
                      and s.stock_id = p.stock_id
        >
        The only difference is that in the first query, type_id is constant 4,
        while in the second it is linked with another column in the second
        table.
        >
        The stats are good.  Is there anything else that might have caused the
        above?
        >
        You are confusing the use of an index with automatically getting better
        performance than not using an index. Sometimes using an index will resultin
        better performance, and sometimes not. If DB2 determines that it is faster
        to not use an index, then it will not use one. This typically happens when
        DB2 will have read every data page (rows are stored in pages) to satisfy the
        query.
        >
        Keep in mind that you are returning all rows from the table (you don't
        qualify which type_id or which stock_id's you want), so it is safe to assume
        that DB2 will have to read not just every page, but in fact every row, to
        satisfy the query.
        >
        Assuming you did a runstats as follows:
        runstats on table <table-namewith distribution on key columns and indexes
        all
        then DB2 almost always makes the correct decision.
        My first query is not to return all data? It's only returning on
        s.type_id = 4.

        Actually my questions arised when I changed the indexes on the table
        stock and the same queries run more than twice as long as before.
        Then I found the main index is not used at all. Since the tables are
        huge (40 mil rows), I don't see why DB2 decides it's faster not to use
        the index.

        Maybe I should create an index on stock (type_id)? Actually it
        eliminates a table scan on stock. But why it is not using (stock_id,
        type_id) even though both columns are used in where clause.

        I know I must have some mis-conceptions. I just don't know which ones.

        Comment

        • yongleig@gmail.com

          #5
          Re: why this 2-column index not used

          On Sep 13, 10:07 am, "Henry J." <tank209...@yah oo.comwrote:
          On Sep 13, 12:34 am, "Mark A" <some...@someon e.comwrote:
          >
          >
          >
          "Henry J." <tank209...@yah oo.comwrote in message
          >
          news:02e9b6d6-229f-459a-b8f7-0097b19ba2b1@m4 5g2000hsb.googl egroups.com....
          >
          >I have a composite index on two columns in a table.  However, the
          index is not used in a query that restricts the 2nd column to a
          constant.  If both columns are linked with columns in other join
          tables, the index will be used.
          >
          To illustrate it with an example, I have a query like this:
          >
          select s.ticker, p.quantity
          from stock s, positions p
               where s.type_id = 4
                        and s.stock_id = p.stock_id
          >
          There is an index on stock: (stock_id, type_id).  In the table stocks,
          there are about 100 different type_ids evenly distributed, and ten of
          thousand different stock_ids (each stock_id could map to every one of
          the 100 type_ids).
          >
          From the plan, the above query does not use the index, but the
          following query does use the index:
          >
          select s.ticker, p.quantity
          from stock s, positions p
               where s.type_id = p.pos_type_id
                        and s.stock_id = p.stock_id
          >
          The only difference is that in the first query, type_id is constant 4,
          while in the second it is linked with another column in the second
          table.
          >
          The stats are good.  Is there anything else that might have caused the
          above?
          >
          You are confusing the use of an index with automatically getting better
          performance than not using an index. Sometimes using an index will result in
          better performance, and sometimes not. If DB2 determines that it is faster
          to not use an index, then it will not use one. This typically happens when
          DB2 will have read every data page (rows are stored in pages) to satisfy the
          query.
          >
          Keep in mind that you are returning all rows from the table (you don't
          qualify which type_id or which stock_id's you want), so it is safe to assume
          that DB2 will have to read not just every page, but in fact every row, to
          satisfy the query.
          >
          Assuming you did a runstats as follows:
          runstats on table <table-namewith distribution on key columns and indexes
          all
          then DB2 almost always makes the correct decision.
          >
          My first query is not to return all data?  It's only returning on
          s.type_id = 4.
          >
          Actually my questions arised when I changed the indexes on the table
          stock and the same queries run more than twice as long as before.
          Then I found the main index is not used at all.   Since the tables are
          huge (40 mil rows), I don't see why DB2 decides it's faster not to use
          the index.
          >
          Maybe I should create an index on stock (type_id)?  Actually it
          eliminates a table scan on stock.   But why it is not using (stock_id,
          type_id) even though both columns are used in where clause.
          >
          I know I must have some mis-conceptions.  I just don't know which ones.
          You should create index (type_id, stock_id) on stock.

          Yonglei

          Comment

          • Henry J.

            #6
            Re: why this 2-column index not used

            On Sep 14, 10:51 am, yongl...@gmail. com wrote:
            On Sep 13, 10:07 am, "Henry J." <tank209...@yah oo.comwrote:
            >
            >
            >
            On Sep 13, 12:34 am, "Mark A" <some...@someon e.comwrote:
            >
            "Henry J." <tank209...@yah oo.comwrote in message
            >
            >news:02e9b6d 6-229f-459a-b8f7-0097b19ba2b1@m4 5g2000hsb.googl egroups.com....
            >
            I have a composite index on two columns in a table.  However, the
            index is not used in a query that restricts the 2nd column to a
            constant.  If both columns are linked with columns in other join
            tables, the index will be used.
            >
            To illustrate it with an example, I have a query like this:
            >
            select s.ticker, p.quantity
            from stock s, positions p
                 where s.type_id = 4
                          and s.stock_id = p.stock_id
            >
            There is an index on stock: (stock_id, type_id).  In the table stocks,
            there are about 100 different type_ids evenly distributed, and ten of
            thousand different stock_ids (each stock_id could map to every one of
            the 100 type_ids).
            >
            From the plan, the above query does not use the index, but the
            following query does use the index:
            >
            select s.ticker, p.quantity
            from stock s, positions p
                 where s.type_id = p.pos_type_id
                          and s.stock_id = p.stock_id
            >
            The only difference is that in the first query, type_id is constant4,
            while in the second it is linked with another column in the second
            table.
            >
            The stats are good.  Is there anything else that might have caused the
            above?
            >
            You are confusing the use of an index with automatically getting better
            performance than not using an index. Sometimes using an index will result in
            better performance, and sometimes not. If DB2 determines that it is faster
            to not use an index, then it will not use one. This typically happenswhen
            DB2 will have read every data page (rows are stored in pages) to satisfy the
            query.
            >
            Keep in mind that you are returning all rows from the table (you don't
            qualify which type_id or which stock_id's you want), so it is safe toassume
            that DB2 will have to read not just every page, but in fact every row, to
            satisfy the query.
            >
            Assuming you did a runstats as follows:
            runstats on table <table-namewith distribution on key columns and indexes
            all
            then DB2 almost always makes the correct decision.
            >
            My first query is not to return all data?  It's only returning on
            s.type_id = 4.
            >
            Actually my questions arised when I changed the indexes on the table
            stock and the same queries run more than twice as long as before.
            Then I found the main index is not used at all.   Since the tables are
            huge (40 mil rows), I don't see why DB2 decides it's faster not to use
            the index.
            >
            Maybe I should create an index on stock (type_id)?  Actually it
            eliminates a table scan on stock.   But why it is not using (stock_id,
            type_id) even though both columns are used in where clause.
            >
            I know I must have some mis-conceptions.  I just don't know which ones.
            >
            You should create index (type_id, stock_id) on stock.
            >
            Yonglei
            I think you are right -- if I create an index stock (type_id,
            stock_id), the index will be used and no more table scan is done.

            I created the index on (stock_id, type_id) because stock_id is much
            more selective. Can't the DB2 optimizer figure out that this index
            should be used?

            BTW, as (stock_id, type_id) was the only index on the table, I'm able
            to get rid of the table scan by making the table volatile.

            I know DB2 optimizer is good, so what do I miss?

            Comment

            • yongleig@gmail.com

              #7
              Re: why this 2-column index not used

              On Sep 14, 3:35 pm, "Henry J." <tank209...@yah oo.comwrote:
              On Sep 14, 10:51 am, yongl...@gmail. com wrote:
              >
              >
              >
              On Sep 13, 10:07 am, "Henry J." <tank209...@yah oo.comwrote:
              >
              On Sep 13, 12:34 am, "Mark A" <some...@someon e.comwrote:
              >
              "Henry J." <tank209...@yah oo.comwrote in message
              >
              news:02e9b6d6-229f-459a-b8f7-0097b19ba2b1@m4 5g2000hsb.googl egroups.com...
              >
              >I have a composite index on two columns in a table.  However, the
              index is not used in a query that restricts the 2nd column to a
              constant.  If both columns are linked with columns in other join
              tables, the index will be used.
              >
              To illustrate it with an example, I have a query like this:
              >
              select s.ticker, p.quantity
              from stock s, positions p
                   where s.type_id = 4
                            and s.stock_id = p.stock_id
              >
              There is an index on stock: (stock_id, type_id).  In the table stocks,
              there are about 100 different type_ids evenly distributed, and ten of
              thousand different stock_ids (each stock_id could map to every one of
              the 100 type_ids).
              >
              From the plan, the above query does not use the index, but the
              following query does use the index:
              >
              select s.ticker, p.quantity
              from stock s, positions p
                   where s.type_id = p.pos_type_id
                            and s.stock_id = p.stock_id
              >
              The only difference is that in the first query, type_id is constant 4,
              while in the second it is linked with another column in the second
              table.
              >
              The stats are good.  Is there anything else that might have caused the
              above?
              >
              You are confusing the use of an index with automatically getting better
              performance than not using an index. Sometimes using an index will result in
              better performance, and sometimes not. If DB2 determines that it isfaster
              to not use an index, then it will not use one. This typically happens when
              DB2 will have read every data page (rows are stored in pages) to satisfy the
              query.
              >
              Keep in mind that you are returning all rows from the table (you don't
              qualify which type_id or which stock_id's you want), so it is safe to assume
              that DB2 will have to read not just every page, but in fact every row, to
              satisfy the query.
              >
              Assuming you did a runstats as follows:
              runstats on table <table-namewith distribution on key columns andindexes
              all
              then DB2 almost always makes the correct decision.
              >
              My first query is not to return all data?  It's only returning on
              s.type_id = 4.
              >
              Actually my questions arised when I changed the indexes on the table
              stock and the same queries run more than twice as long as before.
              Then I found the main index is not used at all.   Since the tables are
              huge (40 mil rows), I don't see why DB2 decides it's faster not to use
              the index.
              >
              Maybe I should create an index on stock (type_id)?  Actually it
              eliminates a table scan on stock.   But why it is not using (stock_id,
              type_id) even though both columns are used in where clause.
              >
              I know I must have some mis-conceptions.  I just don't know which ones.
              >
              You should create index (type_id, stock_id) on stock.
              >
              Yonglei
              >
              I think you are right -- if I create an index stock (type_id,
              stock_id), the index will be used and no more table scan is done.
              >
              I created the index on (stock_id, type_id) because stock_id is much
              more selective.   Can't the DB2 optimizer figure out that this index
              should be used?
              >
              BTW, as (stock_id, type_id) was the only index on the table, I'm able
              to get rid of the table scan by making the table volatile.
              >
              I know DB2 optimizer is good, so what do I miss?
              For index1 (stock_id, type_id), DB2 needs to scan the whole index and
              keep keys with type_id=4, while for index2 (type_id, stock_id), it
              only needs to scan portion of index where type_id=4.

              That's huge difference.

              Comment

              Working...