Indexing strategy

  • Ryan

    Indexing strategy

    I have a bit of a problem with regard to an indexing strategy. Well,
    basically there is no indexing strategy on the set of data I have at
    work. Now, I didn't create the design; if I had, I would have allowed for
    this.

    OK, so there are primary key (clustered) indexes (mainly composite
    keys), but no other indexes on the tables. As you would expect, the
    performance leaves a lot to be desired. A hell of a lot. We have
    several million rows in a lot of the tables. None of the queries seem
    to be overly complex so we can work through the applications at a later
    stage.

    We have multiple copies (one per client per country) of the same
    structure (I may have considered combining these to allow better
    performance). One specific SP that I need to run takes 40+ hours
    without indexes and 5 hours with some (130) basic indexes to get us
    started on a better design. These 130 indexes are the minimum I suspect
    we need and from there, we can start to work out which ones we need.

    Now the test database (our worst performer) doubles in size to 20Gb,
    but the performance is much much better (as expected). The original
    thinking behind the design was for storage concerns (server space
    recently upgraded) and for performance with bulk inserts.

    We have a lot of bulk inserts, but I suspect that these are not too
    bad, and the time taken maintaining the indexes during these is
    negligible compared to the performance gains. I strongly suspect that we
    have a considerable number of table scans going on, but the problem is
    that people here don't understand indexing (yet) or don't have the time
    (probably because it's all taken up using the poorly designed system).
    That's a whole separate issue for me to address.

    So, finally getting round to my questions...

    Is there any good reference explaining in layman's terms why you need
    basic (or advanced) indexing? Any links would be appreciated. I need
    this to help explain to colleagues why a disk space increase plus
    indexing will be far better than spending thousands on a new box and
    doing the same (a common problem, I suspect).

    How can I accurately estimate the time taken to update an index once
    data is bulk inserted? I don't want to re-index from scratch as this may
    take too long. Indexing my database the first time round takes about
    1 hour 30 minutes.

    It's all part of an ongoing bit of digging into the system and re-doing
    it to make it work properly. I'm sure most of you will have been there
    at some point or another.

    Thanks


    Ryan

  • Stu

    #2
    Re: Indexing strategy

    Hey Ryan,

    I can't think of any layman's resources for indexing strategies, but I
    know that Kalen Delaney's book "Inside SQL Server" has some very good
    explanations of how things work under the hood (including indexes);
    might be useful.

    However, I am a bit perplexed as to why a single stored procedure would
    need 130+ indexes; is this because of the redundant data structure per
    client? Can you post some sample table structures? It sounds as if the
    clustered index is actually a poor candidate, given the volume of data
    and the amount of time a bulk insert takes.

    Stu




    • Erland Sommarskog

      #3
      Re: Indexing strategy

      Ryan (ryanofford@hotmail.com) writes:
      > Is there any good reference explaining in Layman's terms why you need
      > basic (or advanced) indexing ? Any links would be appreciated. I need
      > this to help explain to colleagues why a disk space increase and
      > indexing will be far better than spending thousands on a new box and
      > doing the same (a common problem I suspect).

      I echo Stu and recommend Kalen's book.

      Understanding indexing correctly is absolutely essential to getting
      the best performance out of SQL Server (or any other RDBMS, for that
      matter).
      > How can I accurately estimate the amount of time taken to update an
      > index once data is bulk inserted. I don't want to re-index from scratch
      > as this may take too long. Indexing my database first time round takes
      > about 1 hour 30 minutes.

      For 130 indexes? That's not too bad.

      In any case, the best way to find out the cost of an index for bulk
      insert is to benchmark.
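
      A minimal way to run such a benchmark is to time the load directly;
      the table and file names below are placeholders, not from the thread:

      ```sql
      -- Sketch of a timing harness: run once with the candidate index in
      -- place and once without, and compare the elapsed times.
      DECLARE @start DATETIME;
      SET @start = GETDATE();

      BULK INSERT dbo.YourTable
      FROM 'C:\load\batch.dat';   -- placeholder path to the bulk file

      SELECT DATEDIFF(ms, @start, GETDATE()) AS elapsed_ms;
      ```

      Dropping the index, loading, and rebuilding it afterwards is a third
      variant worth timing, since for very large batches it can beat
      maintaining the index row by row.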


      --
      Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

      Books Online for SQL Server 2005 at

      Books Online for SQL Server 2000 at


      • Ryan

        #4
        Re: Indexing strategy

        It's 130 basic indexes across all of the tables, some of which are
        used in the SP. I examined all of the tables to work out the minimum;
        from there we can start working on a proper strategy. Example table:

        if exists (select * from dbo.sysobjects
                   where id = object_id(N'[dbo].[DEALER_SOURCE_DATA_VALUES]')
                   and OBJECTPROPERTY(id, N'IsUserTable') = 1)
            drop table [dbo].[DEALER_SOURCE_DATA_VALUES]
        GO

        CREATE TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] (
            [DEALER_SOURCE_DATA_ID] [int] NOT NULL ,
            [FIELD_CODE] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
            [FIELD_VALUE] [numeric](15, 5) NOT NULL
        ) ON [PRIMARY]
        GO

        ALTER TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] WITH NOCHECK ADD
            CONSTRAINT [PK_DEALER_SOURCE_DATA_VALUES] PRIMARY KEY CLUSTERED
            (
                [DEALER_SOURCE_DATA_ID],
                [FIELD_CODE]
            ) WITH FILLFACTOR = 90 ON [PRIMARY]
        GO

        This table has approx 25 million rows. I would have added indexes to
        DEALER_SOURCE_DATA_ID and FIELD_CODE separately. This table holds the
        data ID, which can be related back to a row in another table for the
        year, month and dealer ID. It also holds all lines and the value
        against those lines. Quite often queries are needed for everything
        by DEALER_SOURCE_DATA_ID, hence the first index I would add. Also,
        we want to sum (for example) all of a specific FIELD_CODE, hence the
        other index I would add.

        We would normally have approx 4,000 unique FIELD_CODEs per
        DEALER_SOURCE_DATA_ID.

        The PK is correct, but not good for querying what we need, as it
        would only be used when we specify both DEALER_SOURCE_DATA_ID and
        FIELD_CODE in order to return a value. Correct? Anything more would
        force a table scan, if I'm right. Proper indexing should allow us to
        scan at most 4,000 rows using one index and 6,250 rows using the
        other.
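
        Sketched as DDL, the two indexes described above might look like
        this (the index names are illustrative, not from the actual
        database):

        ```sql
        -- Hypothetical starting indexes for the two access patterns:
        -- everything for one DEALER_SOURCE_DATA_ID, and sums over one
        -- FIELD_CODE. (A later reply notes the first may be redundant,
        -- since it is the leading column of the clustered PK.)
        CREATE NONCLUSTERED INDEX IX_DSDV_DEALER_SOURCE_DATA_ID
            ON dbo.DEALER_SOURCE_DATA_VALUES (DEALER_SOURCE_DATA_ID);

        CREATE NONCLUSTERED INDEX IX_DSDV_FIELD_CODE
            ON dbo.DEALER_SOURCE_DATA_VALUES (FIELD_CODE);
        ```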


        Ryan



        • francois.bourdages@harfan.com

          #5
          Re: Indexing strategy

          A good rule of thumb:
          1) Index all foreign keys.

          If you have time and money: analyse all your queries, and you will
          be able to work out all the indexes you need.

          If you don't have time or money: start with the longest-running
          queries and optimize them first. You will get big results fast.

          90% of all the performance problems I have seen in the past come
          from queries (the query itself, or logic in the design), not from
          the network, RAM, CPU, or disk. So start there.
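
          On SQL Server 2005, a catalog query along these lines can list
          foreign key columns that lack a supporting index (a simplified
          sketch, not an exhaustive check):

          ```sql
          -- List FK columns with no index having that column as the
          -- leading key; uses the standard 2005 catalog views.
          SELECT OBJECT_NAME(fkc.parent_object_id) AS table_name,
                 COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS fk_column
          FROM   sys.foreign_key_columns fkc
          WHERE  NOT EXISTS (
                   SELECT *
                   FROM   sys.index_columns ic
                   WHERE  ic.object_id  = fkc.parent_object_id
                     AND  ic.column_id  = fkc.parent_column_id
                     AND  ic.key_ordinal = 1
                 );
          ```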


          • Erland Sommarskog

            #6
            Re: Indexing strategy

            Ryan (ryanofford@hotmail.com) writes:
            > ALTER TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] WITH NOCHECK ADD
            > CONSTRAINT [PK_DEALER_SOURCE_DATA_VALUES] PRIMARY KEY CLUSTERED
            > (
            > [DEALER_SOURCE_DATA_ID],
            > [FIELD_CODE]
            > ) WITH FILLFACTOR = 90 ON [PRIMARY]
            > GO
            >
            > This table has approx 25 million rows. I would have added indexes to
            > DEALER_SOURCE_DATA_ID and FIELD_CODE separately. This table holds the
            > data ID which can be related back to a row in another table for the
            > year, month and Dealer ID. It also holds all lines and the value
            > against those lines. Quite often queries would be needed for everything
            > by the DEALER_SOURCE_DATA_ID hence the first index I would add. Also,
            > we want to sum (for example) all of a specific FIELD_CODE, hence the
            > other index I would add.
            >
            > We would normally have approx 4,000 unique FIELD_CODEs per
            > DEALER_SOURCE_DATA_ID.
            >
            > The PK is correct, but not good for querying what we need to as it
            > would only be used when we specify both DEALER_SOURCE_DATA_ID and
            > FIELD_CODE in order to return a value. Correct ?

            Not really. A query only by DEALER_SOURCE_DATA_ID would use the
            clustered index. Unless you do queries like:

            SELECT COUNT(*), DEALER_SOURCE_DATA_ID
            FROM tbl
            GROUP BY DEALER_SOURCE_DATA_ID

            there is no reason to add an NC index on DEALER_SOURCE_DATA_ID.

            But you are correct that the clustered index is not good for
            queries on a certain FIELD_CODE.
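
            For that FIELD_CODE pattern, a composite nonclustered index
            that also carries the value column would let the SUM queries be
            answered from the index alone (the name is illustrative; on
            SQL Server 2005 an INCLUDE clause could hold FIELD_VALUE
            instead of making it a key column):

            ```sql
            -- Hypothetical covering index: SUM(FIELD_VALUE) grouped or
            -- filtered by FIELD_CODE can be satisfied entirely from this
            -- index, never touching the 25-million-row base table.
            CREATE NONCLUSTERED INDEX IX_DSDV_FIELD_CODE_VALUE
                ON dbo.DEALER_SOURCE_DATA_VALUES (FIELD_CODE, FIELD_VALUE);
            ```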


            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

            Books Online for SQL Server 2005 at

            Books Online for SQL Server 2000 at


            • DickChristoph

              #7
              Re: Indexing strategy

              > Is there any good reference explaining in Layman's terms why you need
              > basic (or advanced) indexing ?

              I assume you need to explain this to the people who spend the money.

              Do you know how a binary search works? I usually explain the
              benefits of indexing like this: suppose you are searching the
              phonebook. You aren't going to check every entry on every page
              until you find the name you are looking for (that's a full
              table scan).

              Instead, you could split the book in half and determine whether
              the name you are looking for is in the left or the right half,
              then keep doing that until you get to the right page. You will
              have at worst log base 2 of N comparisons, N being the total
              number of entries. You can search 1,000,000 entries with 20
              comparisons, versus an average of 500,000 without an index.

              Or I just say it's like those little tabs in some dictionaries
              (which it isn't, really), but it gets across the point that you
              are making less work for SQL Server to do.
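
              The arithmetic behind the phonebook analogy checks out as a
              one-liner (CEILING and LOG are standard T-SQL built-ins):

              ```sql
              -- Worst-case comparisons for a binary search over 1,000,000
              -- entries: ceil(log2(1000000)) = 20, since 2^20 = 1,048,576
              -- is the first power of two past a million.
              SELECT CEILING(LOG(1000000.0) / LOG(2.0)) AS worst_case_comparisons;
              ```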

              --
              -Dick Christoph



              • Mike C#

                #8
                Re: Indexing strategy

                Use the 80/20 rule to start. Usually 20% of your queries will
                account for 80% of your interactions with the database.
                Identify the 20% that are used most often and create indexes
                that optimize them. You can grab a query plan (I prefer the
                graphical query plans in Query Analyzer) for each query to
                identify the bottlenecks and where indexes would be most
                effective.
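
                Outside the GUI, the standard session options give the same
                information in text form (a sketch; the query and the
                'EXAMPLE' literal are placeholders for one of your own):

                ```sql
                -- Report logical/physical reads per table for a candidate
                -- query; scans show up as large logical-read counts.
                SET STATISTICS IO ON;
                GO
                SELECT SUM(FIELD_VALUE)
                FROM   dbo.DEALER_SOURCE_DATA_VALUES
                WHERE  FIELD_CODE = 'EXAMPLE';   -- placeholder value
                GO
                SET STATISTICS IO OFF;
                GO
                ```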


