Re: MDC vs. cluster index: which to use?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Ian

    Re: MDC vs. cluster index: which to use?

    Henry J. wrote:
    >
    MDC on type_id? So cluster index is not a good idea?
    MDC *guarantees* clustering, whereas a table with a clustering index
    will eventually require maintenance (a.k.a. reorg) to maintain the
    cluster ratio.

    That's not to say that a clustering index isn't still valuable
    (especially for high cardinality columns that aren't a reasonable
    candidate as an MDC dimension).

  • Henry J.

    #2
    Re: MDC vs. cluster index: which to use?

    On Oct 21, 1:24 am, Ian <ianb...@mobile audio.comwrote:
    Henry J. wrote:
    >
    MDC on type_id?  So cluster index is not a good idea?
    >
    MDC *guarantees* clustering, whereas a table with a clustering index
    will eventually require maintenance (a.k.a. reorg) to maintain the
    cluster ratio.
    >
    That's not to say that a clustering index isn't still valuable
    (especially for high cardinality columns that aren't a reasonable
    candidate as an MDC dimension).
    For my case, the table is purged and re-populated every day. And the
    other column, product_id, has high cardinality.
    Perhaps a cluster index makes more sense? I guess I'd choose type_id
    if I have to pick one column for MDC. Then it won't be as helpful as
    a cluster index on (type_id, product_id)?

    Comment

    • ChrisC

      #3
      Re: MDC vs. cluster index: which to use?

      On Oct 21, 7:18 am, "Henry J." <tank209...@yah oo.comwrote:
      For my case, the table is purged and re-populated every day.  And the
      other column, product_id, has high cardinality.
      Perhaps a cluster index makes more sense?  I guess I'd choose type_id
      if I have to pick one column for MDC.  Then it won't be as helpful as
      a cluster index on (type_id, product_id)?
      MDC still might be faster - probably faster at loading and equivalent
      for searches based on type_id. For searches on type_id and
      product_id, it could outperform the cluster index, if you also create
      an index on product_id - and the combination (MDC + index on
      product_id) will definitely outperform a cluster index on type_id,
      product_id for product_id searches.

      You'll want to verify these claims - but since you rebuild the table
      every day, you are in a perfect position to try these various options
      out. Pick a new one each day until you've found the right
      combination.

      -Chris

      Comment

      • Henry J.

        #4
        Re: MDC vs. cluster index: which to use?

        On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:
        On Oct 21, 7:18 am, "Henry J." <tank209...@yah oo.comwrote:
        >
        For my case, the table is purged and re-populated every day.  And the
        other column, product_id, has high cardinality.
        Perhaps a cluster index makes more sense?  I guess I'd choose type_id
        if I have to pick one column for MDC.  Then it won't be as helpful as
        a cluster index on (type_id, product_id)?
        >
        MDC still might be faster - probably faster at loading and equivalent
        for searches based on type_id.  For searches on type_id and
        product_id, it could outperform the cluster index, if you also create
        an index on product_id - and the combination (MDC + index on
        product_id) will definitely outperform a cluster index on type_id,
        product_id for product_id searches.
        >
        You'll want to verify these claims - but since you rebuild the table
        every day, you are in a perfect position to try these various options
        out.  Pick a new one each day until you've found the right
        combination.
        >
        -Chris
        Thanks Chris.

        The primary key of the table is (type_id, product_id). Now if we add
        a MDC on type_id, will the MDC be actually used if we query like the
        following, or the primary key will be used only? I guess even if only
        the primary key is used, the I/O will still be benefited.

        select client.client_i d, sum( prod.quantity * prod.price *
        client.ordersiz e )
        from prod, client
        where prod.type_id = 4 and prod.product_id =
        client.product_ id
        group by client.client_i d

        Also, would the following query benefit from both the MDC and the
        primary key?

        select *
        from prod
        where prod.type_id = 4 and prod.product_id = 101

        Or, instead of the primary key, just use an index on (product_id)
        would cover both of the above queries?

        I'll certainly test to find out. But I'd like to gain better
        understanding about indexing. Also, the turnaround of making such
        changes are not as fast as I want and I hope to get to the best
        solution a little faster. Thanks.



        Comment

        • Ian

          #5
          Re: MDC vs. cluster index: which to use?

          Henry J. wrote:
          On Oct 21, 1:24 am, Ian <ianb...@mobile audio.comwrote:
          >Henry J. wrote:
          >>
          >>MDC on type_id? So cluster index is not a good idea?
          >MDC *guarantees* clustering, whereas a table with a clustering index
          >will eventually require maintenance (a.k.a. reorg) to maintain the
          >cluster ratio.
          >>
          >That's not to say that a clustering index isn't still valuable
          >(especially for high cardinality columns that aren't a reasonable
          >candidate as an MDC dimension).
          >
          For my case, the table is purged and re-populated every day. And the
          other column, product_id, has high cardinality.
          Perhaps a cluster index makes more sense? I guess I'd choose type_id
          if I have to pick one column for MDC. Then it won't be as helpful as
          a cluster index on (type_id, product_id)?
          By purging it every day, you're going to make it nearly impossible
          for a clustering index to be useful. Unless your data comes in
          already sorted by type_id,product _id the cluster ratio is probably
          going to be very low, regardless of what you set pctfree to.

          Furthermore, will a clustering index really help with queries that
          aggregating on the clustering key? Maybe, but maybe not -- that
          depends on whether the query is doing an index scan / fetch. If
          it's just doing a tablescan, you'll still need to do a sort.


          Comment

          • Henry J.

            #6
            Re: MDC vs. cluster index: which to use?

            On Oct 21, 2:32 pm, Ian <ianb...@mobile audio.comwrote:
            Henry J. wrote:
            On Oct 21, 1:24 am, Ian <ianb...@mobile audio.comwrote:
            Henry J. wrote:
            >
            >MDC on type_id?  So cluster index is not a good idea?
            MDC *guarantees* clustering, whereas a table with a clustering index
            will eventually require maintenance (a.k.a. reorg) to maintain the
            cluster ratio.
            >
            That's not to say that a clustering index isn't still valuable
            (especially for high cardinality columns that aren't a reasonable
            candidate as an MDC dimension).
            >
            For my case, the table is purged and re-populated every day.  And the
            other column, product_id, has high cardinality.
            Perhaps a cluster index makes more sense?  I guess I'd choose type_id
            if I have to pick one column for MDC.  Then it won't be as helpful as
            a cluster index on (type_id, product_id)?
            >
            By purging it every day, you're going to make it nearly impossible
            for a clustering index to be useful.  Unless your data comes in
            already sorted by type_id,product _id the cluster ratio is probably
            going to be very low, regardless of what you set pctfree to.
            >
            Furthermore, will a clustering index really help with queries that
            aggregating on the clustering key?  Maybe, but maybe not -- that
            depends on whether the query is doing an index scan / fetch.  If
            it's just doing a tablescan, you'll still need to do a sort.- Hide quotedtext -
            >
            - Show quoted text -
            I'm totally confused -- doesn't a clustering index make sure the rows
            are inserted as sorted and hence with high cluster ratio? And only
            updates/deletes/inserts over time would reduce the cluster ratio? And
            what I understood is that a truly clustered index would avoid a sort.

            Comment

            • Henry J.

              #7
              Re: MDC vs. cluster index: which to use?

              On Oct 21, 1:05 pm, "Henry J." <tank209...@yah oo.comwrote:
              On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:
              >
              >
              >
              >
              >
              On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:
              >
              For my case, the table is purged and re-populated every day.  And the
              other column, product_id, has high cardinality.
              Perhaps a cluster index makes more sense?  I guess I'd choose type_id
              if I have to pick one column for MDC.  Then it won't be as helpful as
              a cluster index on (type_id, product_id)?
              >
              MDC still might be faster - probably faster at loading and equivalent
              for searches based on type_id.  For searches on type_id and
              product_id, it could outperform the cluster index, if you also create
              an index on product_id - and the combination (MDC + index on
              product_id) will definitely outperform a cluster index on type_id,
              product_id for product_id searches.
              >
              You'll want to verify these claims - but since you rebuild the table
              every day, you are in a perfect position to try these various options
              out.  Pick a new one each day until you've found the right
              combination.
              >
              -Chris
              >
              Thanks Chris.
              >
              The primary key of the table is (type_id, product_id).  Now if we add
              a MDC on type_id, will the MDC be actually used if we query like the
              following, or the primary key will be used only?  I guess even if only
              the primary key is used, the I/O will still be benefited.
              >
                          select client.client_i d, sum( prod.quantity * prod.price *
              client.ordersiz e )
                           from prod, client
                           where prod.type_id = 4 and prod.product_id =
              client.product_ id
                           group by client.client_i d
              >
              Also, would the following query benefit from both the MDC and the
              primary key?
              >
                   select *
                         from prod
                           where prod.type_id = 4 and prod.product_id =  101
              >
              Or, instead of the primary key, just use an index on (product_id)
              would cover both of the above queries?
              >
              I'll certainly test to find out.  But I'd like to gain better
              understanding about indexing.  Also, the turnaround of making such
              changes are not as fast as I want and I hope to get to the best
              solution a little faster.  Thanks.- Hide quoted text -
              >
              - Show quoted text -
              Now I've changed the primary key of the table to (type_id, product_id)
              while making (type_id) as MDC.

              When examing the plan, I found that the following query:

              select client.client_i d, sum( prod.quantity * prod.price
              * client.ordersiz e )
              from prod, client
              where prod.type_id = 4 and prod.product_id =
              client.product_ id
              group by client.client_i d

              does use the new primary key on (type_id, product_id). However, if I
              use a bind variable for type_id, i.e.:

              select client.client_i d, sum( prod.quantity * prod.price
              * client.ordersiz e )
              from prod, client
              where prod.type_id = ? and prod.product_id =
              client.product_ id
              group by client.client_i d

              It only uses the MDC on (type_id).

              Now, my questions are:

              1) should I drop the bind variable on type_id so it can utilize the
              primary key? The cardinality of type_id is about 5 to 8.

              2) Would an addition on (product_id) help?

              Thanks.

              Comment

              • Henry J.

                #8
                Re: MDC vs. cluster index: which to use?

                On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:
                On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:
                >
                For my case, the table is purged and re-populated every day.  And the
                other column, product_id, has high cardinality.
                Perhaps a cluster index makes more sense?  I guess I'd choose type_id
                if I have to pick one column forMDC.  Then it won't be as helpful as
                a cluster index on (type_id, product_id)?
                >
                MDCstill might be faster - probably faster at loading and equivalent
                for searches based on type_id.  For searches on type_id and
                product_id, it could outperform the cluster index, if you also create
                an index on product_id - and the combination (MDC+ index on
                product_id) will definitely outperform a cluster index on type_id,
                product_id for product_id searches.
                >
                You'll want to verify these claims - but since you rebuild the table
                every day, you are in a perfect position to try these various options
                out.  Pick a new one each day until you've found the right
                combination.
                >
                -Chris
                I have the primary key changed to (type_id, product_id), and added a
                MDC on type_id. Now the following query:


                select client.client_i d, sum( prod.quantity * prod.price *
                client.ordersiz e )
                from prod, client
                where prod.type_id = 4 and prod.product_id =
                client.product_ id
                group by client.client_i d

                it will utilize the primary key. However, if I use a host variable
                for type_id, it will only pick up the MDC on type_id, which I think is
                not optimal.

                My questions are:

                1) should I add an index on (product_id) which is what's really needed
                rather than the primary key on (type_id, product_id). Note that the
                cardinality of type_id is pretty small, around 6.

                2) should I drop the host variable on type_id? Note that each type_id
                has the same number of product_id's. Perhaps this is not a good case
                for host variable. BTW, this SQL is embedded in a Java (as static SQL
                I think?).

                Thanks!

                Thanks!

                Comment

                • Henry J.

                  #9
                  Re: MDC vs. cluster index: which to use?

                  On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:
                  On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:
                  >
                  For my case, the table is purged and re-populated every day.  And the
                  other column, product_id, has high cardinality.
                  Perhaps a cluster index makes more sense?  I guess I'd choose type_id
                  if I have to pick one column forMDC.  Then it won't be as helpful as
                  a cluster index on (type_id, product_id)?
                  >
                  MDCstill might be faster - probably faster at loading and equivalent
                  for searches based on type_id.  For searches on type_id and
                  product_id, it could outperform the cluster index, if you also create
                  an index on product_id - and the combination (MDC+ index on
                  product_id) will definitely outperform a cluster index on type_id,
                  product_id for product_id searches.
                  >
                  You'll want to verify these claims - but since you rebuild the table
                  every day, you are in a perfect position to try these various options
                  out.  Pick a new one each day until you've found the right
                  combination.
                  >
                  -Chris
                  I have the primary key changed to (type_id, product_id), and added a
                  MDC on type_id. Now the following query:


                  select client.client_i d, sum( prod.quantity * prod.price *
                  client.ordersiz e )
                  from prod, client
                  where prod.type_id = 4 and prod.product_id =
                  client.product_ id
                  group by client.client_i d

                  will utilize the primary key. However, if I use a host variable for
                  type_id (i.e., type_id = ?), it will only pick up the MDC on type_id,
                  which I think is not optimal.

                  My questions are:

                  1) should I add an index on (product_id) which is what's really needed
                  rather than the primary key on (type_id, product_id). Note that the
                  cardinality of type_id is pretty small, around 6.

                  2) should I drop the host variable on type_id? Note that each type_id
                  has the same number of product_id's. Perhaps this is not a good case
                  for host variable. BTW, this SQL is embedded in a Java (as static SQL
                  I think?).

                  Thanks!

                  Comment

                  • Saurabh.Nr@gmail.com

                    #10
                    Re: MDC vs. cluster index: which to use?

                    On Oct 29, 12:56 am, "Henry J." <tank209...@yah oo.comwrote:
                    On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:
                    >
                    >
                    >
                    >
                    >
                    On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:
                    >
                    For my case, the table is purged and re-populated every day.  And the
                    other column, product_id, has high cardinality.
                    Perhaps a cluster index makes more sense?  I guess I'd choose type_id
                    if I have to pick one column forMDC.  Then it won't be as helpful as
                    a cluster index on (type_id, product_id)?
                    >
                    MDCstill might be faster - probably faster at loading and equivalent
                    for searches based on type_id.  For searches on type_id and
                    product_id, it could outperform the cluster index, if you also create
                    an index on product_id - and the combination (MDC+ index on
                    product_id) will definitely outperform a cluster index on type_id,
                    product_id for product_id searches.
                    >
                    You'll want to verify these claims - but since you rebuild the table
                    every day, you are in a perfect position to try these various options
                    out.  Pick a new one each day until you've found the right
                    combination.
                    >
                    -Chris
                    >
                    I have the primary key changed to (type_id, product_id), and added a
                    MDC on type_id.  Now the following query:
                    >
                           select client.client_i d, sum( prod.quantity * prod.price *
                    client.ordersiz e )
                                 from prod, client
                                 where prod.type_id = 4 and prod.product_id =
                    client.product_ id
                                 group by client.client_i d
                    >
                    will utilize the primary key.  However, if I use a host variable for
                    type_id (i.e., type_id = ?), it will only pick up the MDC on type_id,
                    which I think is not optimal.
                    >
                    My questions are:
                    >
                    1) should I add an index on (product_id) which is what's really needed
                    rather than the primary key on (type_id, product_id).  Note that the
                    cardinality of type_id is pretty small, around 6.
                    >
                    2) should I drop the host variable on type_id?  Note that each type_id
                    has the same number of product_id's.  Perhaps this is not a good case
                    for host variable.  BTW, this SQL is embedded in a Java (as static SQL
                    I think?).
                    >
                    Thanks!- Hide quoted text -
                    >
                    - Show quoted text -
                    Hi,
                    there is a shell script know extent utilisation.sh which can be
                    downloaded from IBM website.it will answers all your queries regarding
                    which will be a better column and how much space will you save and how
                    will it improve the performance.

                    Comment

                    • Henry J.

                      #11
                      Re: MDC vs. cluster index: which to use?

                      On Oct 29, 9:55 am, Saurabh...@gmai l.com wrote:
                      On Oct 29, 12:56 am, "HenryJ." <tank209...@yah oo.comwrote:
                      >
                      >
                      >
                      >
                      >
                      On Oct 21, 11:19 am, ChrisC <cunningham...@ gmail.comwrote:
                      >
                      On Oct 21, 7:18 am, "HenryJ." <tank209...@yah oo.comwrote:
                      >
                      For my case, the table is purged and re-populated every day.  Andthe
                      other column, product_id, has high cardinality.
                      Perhaps a cluster index makes more sense?  I guess I'd choose type_id
                      if I have to pick one column forMDC.  Then it won't be as helpfulas
                      a cluster index on (type_id, product_id)?
                      >
                      MDCstill might be faster - probably faster at loading and equivalent
                      for searches based on type_id.  For searches on type_id and
                      product_id, it could outperform the cluster index, if you also create
                      an index on product_id - and the combination (MDC+ index on
                      product_id) will definitely outperform a cluster index on type_id,
                      product_id for product_id searches.
                      >
                      You'll want to verify these claims - but since you rebuild the table
                      every day, you are in a perfect position to try these various options
                      out.  Pick a new one each day until you've found the right
                      combination.
                      >
                      -Chris
                      >
                      I have the primary key changed to (type_id, product_id), and added a
                      MDCon type_id.  Now the following query:
                      >
                             select client.client_i d, sum( prod.quantity * prod.price*
                      client.ordersiz e )
                                   from prod, client
                                   where prod.type_id = 4 and prod.product_id =
                      client.product_ id
                                   group by client.client_i d
                      >
                      will utilize the primary key.  However, if I use a host variable for
                      type_id (i.e., type_id = ?), it will only pick up theMDCon type_id,
                      which I think is not optimal.
                      >
                      My questions are:
                      >
                      1) should I add an index on (product_id) which is what's really needed
                      rather than the primary key on (type_id, product_id).  Note that the
                      cardinality of type_id is pretty small, around 6.
                      >
                      2) should I drop the host variable on type_id?  Note that each type_id
                      has the same number of product_id's.  Perhaps this is not a good case
                      for host variable.  BTW, this SQL is embedded in a Java (as static SQL
                      I think?).
                      >
                      Thanks!- Hide quoted text -
                      >
                      - Show quoted text -
                      >
                      Hi,
                      there is a shell script know extent utilisation.sh which can be
                      downloaded from IBM website.it will answers all your queries regarding
                      which will be a better column and how much space will you save and how
                      will it improve the performance.- Hide quoted text -
                      >
                      - Show quoted text -
                      Could you please give me the link? I couldn't find it. Thanks.

                      Comment

                      Working...