Indexing strategy

  • Ryan

    Indexing strategy

    I have a bit of a problem with regard to an indexing strategy. Well,
    basically there is no indexing strategy on the set of data I have at
    work. Now, I didn't create the design; if I had, I would have allowed for
    this.

    OK, so there are primary key (clustered) indexes (mainly composite
    keys), but no other indexes on the tables. As you would expect, the
    performance leaves a lot to be desired. A hell of a lot. We have
    several million rows in a lot of the tables. None of the queries seem
    to be overly complex so we can work through the applications at a later
    stage.

    We have multiple copies (one per client per country) of the same
    structure (I may have considered combining these to allow better
    performance). One specific SP that I need to run takes 40+ hours
    without indexes and 5 hours with some (130) basic indexes to get us
    started on a better design. These 130 indexes are the minimum I suspect
    we need and from there, we can start to work out which ones we need.

    Now the test database (our worst performer) doubles in size to 20Gb,
    but the performance is much much better (as expected). The original
    thinking behind the design was for storage concerns (server space
    recently upgraded) and for performance with bulk inserts.

    We have a lot of bulk inserts, but I suspect that these are not too
    bad, and the time taken maintaining the indexes during these is
    negligible compared to the performance gains. I strongly suspect that we
    have a considerable number of table scans going on, but the problem is
    that people here don't understand indexing (yet) or don't have the time
    (probably because it's all taken up using the poorly designed system).
    That's a whole separate issue for me to address.

    So, finally getting round to my questions...

    Is there any good reference explaining in layman's terms why you need
    basic (or advanced) indexing? Any links would be appreciated. I need
    this to help explain to colleagues why a disk space increase plus
    indexing will be far better than spending thousands on a new box and
    doing the same (a common problem, I suspect).

    How can I accurately estimate the time taken to update an index once
    data is bulk inserted? I don't want to re-index from scratch as this may
    take too long. Indexing my database the first time round takes about
    1 hour 30 minutes.

    It's all part of an ongoing bit of digging into the system and re-doing
    it to make it work properly. I'm sure most of you will have been there
    at some point or another.

    Thanks


    Ryan

  • Stu

    #2
    Re: Indexing strategy

    Hey Ryan,

    I can't think of any layman's resources for indexing strategies, but I
    know that Kalen Delaney's book "Inside SQL Server" has some very good
    explanations of how things work under the hood (including indexes);
    might be useful.

    However, I am a bit perplexed as to why a single stored procedure would
    need 130+ indexes; is this because of the redundant data structure per
    client? Can you post some sample table structures? It sounds as if the
    clustered index is actually a poor candidate, given the volume of data
    and the amount of time a bulk insert takes.

    Stu




    • Erland Sommarskog

      #3
      Re: Indexing strategy

      Ryan (ryanofford@hotmail.com) writes:
      > Is there any good reference explaining in Layman's terms why you need
      > basic (or advanced) indexing ? Any links would be appreciated. I need
      > this to help explain to colleagues why a disk space increase and
      > indexing will be far better than spending thousands on a new box and
      > doing the same (a common problem I suspect).

      I echo Stu and recommend Kalen's book.

      Understanding indexing correctly is absolutely essential to getting
      the best performance out of SQL Server (or any other RDBMS, for that
      matter).
      > How can I accurately estimate the amount of time taken to update an
      > index once data is bulk inserted. I don't want to re-index from scratch
      > as this may take too long. Indexing my database first time round takes
      > about 1 hour 30 minutes.

      For 130 indexes? That's not too bad.

      In any case, the best way to find out the cost of an index for bulk
      insert is to benchmark.
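
      A minimal way to run such a benchmark is to time the load directly;
      the table and file names below are placeholders, not from the thread:

      ```sql
      -- Sketch of a timing harness: run once with the candidate index in
      -- place and once without, and compare the elapsed times.
      DECLARE @start DATETIME;
      SET @start = GETDATE();

      BULK INSERT dbo.YourTable
      FROM 'C:\load\batch.dat';   -- placeholder path to the bulk file

      SELECT DATEDIFF(ms, @start, GETDATE()) AS elapsed_ms;
      ```

      Dropping the index, loading, and rebuilding it afterwards is a third
      variant worth timing, since for very large batches it can beat
      maintaining the index row by row.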


      --
      Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

      Books Online for SQL Server 2005 at

      Books Online for SQL Server 2000 at


      • Ryan

        #4
        Re: Indexing strategy

        It's 130 basic indexes across all of the tables, some of which are
        used in the SP. I examined all of the tables to work out the minimum;
        from there we can start working on a proper strategy. Example table:

        if exists (select * from dbo.sysobjects
                   where id = object_id(N'[dbo].[DEALER_SOURCE_DATA_VALUES]')
                   and OBJECTPROPERTY(id, N'IsUserTable') = 1)
            drop table [dbo].[DEALER_SOURCE_DATA_VALUES]
        GO

        CREATE TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] (
            [DEALER_SOURCE_DATA_ID] [int] NOT NULL ,
            [FIELD_CODE] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
            [FIELD_VALUE] [numeric](15, 5) NOT NULL
        ) ON [PRIMARY]
        GO

        ALTER TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] WITH NOCHECK ADD
            CONSTRAINT [PK_DEALER_SOURCE_DATA_VALUES] PRIMARY KEY CLUSTERED
            (
                [DEALER_SOURCE_DATA_ID],
                [FIELD_CODE]
            ) WITH FILLFACTOR = 90 ON [PRIMARY]
        GO

        This table has approx 25 million rows. I would have added indexes to
        DEALER_SOURCE_DATA_ID and FIELD_CODE separately. This table holds the
        data ID, which can be related back to a row in another table for the
        year, month and dealer ID. It also holds all lines and the value
        against those lines. Quite often queries are needed for everything
        by DEALER_SOURCE_DATA_ID, hence the first index I would add. Also,
        we want to sum (for example) all of a specific FIELD_CODE, hence the
        other index I would add.

        We would normally have approx 4,000 unique FIELD_CODEs per
        DEALER_SOURCE_DATA_ID.

        The PK is correct, but not good for querying what we need, as it
        would only be used when we specify both DEALER_SOURCE_DATA_ID and
        FIELD_CODE in order to return a value. Correct? Anything more would
        force a table scan, if I'm right. Proper indexing should allow us to
        scan at most 4,000 rows using one index and 6,250 rows using the
        other.
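
        Sketched as DDL, the two indexes described above might look like
        this (the index names are illustrative, not from the actual
        database):

        ```sql
        -- Hypothetical starting indexes for the two access patterns:
        -- everything for one DEALER_SOURCE_DATA_ID, and sums over one
        -- FIELD_CODE. (A later reply notes the first may be redundant,
        -- since it is the leading column of the clustered PK.)
        CREATE NONCLUSTERED INDEX IX_DSDV_DEALER_SOURCE_DATA_ID
            ON dbo.DEALER_SOURCE_DATA_VALUES (DEALER_SOURCE_DATA_ID);

        CREATE NONCLUSTERED INDEX IX_DSDV_FIELD_CODE
            ON dbo.DEALER_SOURCE_DATA_VALUES (FIELD_CODE);
        ```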


        Ryan



        • francois.bourdages@harfan.com

          #5
          Re: Indexing strategy

          A good rule of thumb:
          1) Index all foreign keys.

          If you have time and money: analyse all your queries, and you will
          be able to work out all the indexes you need.

          If you don't have time or money: start with the longest-running
          queries and optimize them first. You will get big results fast.

          90% of all the performance problems I have seen in the past come
          from queries (the query itself, or logic in the design), not from
          the network, RAM, CPU, or disk. So start there.
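
          On SQL Server 2005, a catalog query along these lines can list
          foreign key columns that lack a supporting index (a simplified
          sketch, not an exhaustive check):

          ```sql
          -- List FK columns with no index having that column as the
          -- leading key; uses the standard 2005 catalog views.
          SELECT OBJECT_NAME(fkc.parent_object_id) AS table_name,
                 COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS fk_column
          FROM   sys.foreign_key_columns fkc
          WHERE  NOT EXISTS (
                   SELECT *
                   FROM   sys.index_columns ic
                   WHERE  ic.object_id  = fkc.parent_object_id
                     AND  ic.column_id  = fkc.parent_column_id
                     AND  ic.key_ordinal = 1
                 );
          ```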


          • Erland Sommarskog

            #6
            Re: Indexing strategy

            Ryan (ryanofford@hotmail.com) writes:
            > ALTER TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] WITH NOCHECK ADD
            > CONSTRAINT [PK_DEALER_SOURCE_DATA_VALUES] PRIMARY KEY CLUSTERED
            > (
            > [DEALER_SOURCE_DATA_ID],
            > [FIELD_CODE]
            > ) WITH FILLFACTOR = 90 ON [PRIMARY]
            > GO
            >
            > This table has approx 25 million rows. I would have added indexes to
            > DEALER_SOURCE_DATA_ID and FIELD_CODE separately. This table holds the
            > data ID which can be related back to a row in another table for the
            > year, month and Dealer ID. It also holds all lines and the value
            > against those lines. Quite often queries would be needed for everything
            > by the DEALER_SOURCE_DATA_ID hence the first index I would add. Also,
            > we want to sum (for example) all of a specific FIELD_CODE, hence the
            > other index I would add.
            >
            > We would normally have approx 4,000 unique FIELD_CODEs per
            > DEALER_SOURCE_DATA_ID.
            >
            > The PK is correct, but not good for querying what we need to as it
            > would only be used when we specify both DEALER_SOURCE_DATA_ID and
            > FIELD_CODE in order to return a value. Correct ?

            Not really. A query only by DEALER_SOURCE_DATA_ID would use the
            clustered index. Unless you do queries like:

            SELECT COUNT(*), DEALER_SOURCE_DATA_ID
            FROM tbl
            GROUP BY DEALER_SOURCE_DATA_ID

            there is no reason to add an NC index on DEALER_SOURCE_DATA_ID.

            But you are correct that the clustered index is not good for
            queries on a certain FIELD_CODE.
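
            For that FIELD_CODE pattern, a composite nonclustered index
            that also carries the value column would let the SUM queries be
            answered from the index alone (the name is illustrative; on
            SQL Server 2005 an INCLUDE clause could hold FIELD_VALUE
            instead of making it a key column):

            ```sql
            -- Hypothetical covering index: SUM(FIELD_VALUE) grouped or
            -- filtered by FIELD_CODE can be satisfied entirely from this
            -- index, never touching the 25-million-row base table.
            CREATE NONCLUSTERED INDEX IX_DSDV_FIELD_CODE_VALUE
                ON dbo.DEALER_SOURCE_DATA_VALUES (FIELD_CODE, FIELD_VALUE);
            ```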


            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

            Books Online for SQL Server 2005 at

            Books Online for SQL Server 2000 at


            • DickChristoph

              #7
              Re: Indexing strategy

              > Is there any good reference explaining in Layman's terms why you need
              > basic (or advanced) indexing ?

              I assume you need to explain this to the people who spend the money.

              Do you know how a binary search works? I usually explain the
              benefits of indexing like this: suppose you are searching the
              phonebook. You aren't going to check every entry on every page
              until you find the name you are looking for (that's a full
              table scan).

              Instead, you could split the book in half and determine whether
              the name you are looking for is in the left or the right half,
              then keep doing that until you get to the right page. You will
              have at worst log base 2 of N comparisons, N being the total
              number of entries. You can search 1,000,000 entries with 20
              comparisons, versus an average of 500,000 without an index.

              Or I just say it's like those little tabs in some dictionaries
              (which it isn't, really), but it gets across the point that you
              are making less work for SQL Server to do.
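
              The arithmetic behind the phonebook analogy checks out as a
              one-liner (CEILING and LOG are standard T-SQL built-ins):

              ```sql
              -- Worst-case comparisons for a binary search over 1,000,000
              -- entries: ceil(log2(1000000)) = 20, since 2^20 = 1,048,576
              -- is the first power of two past a million.
              SELECT CEILING(LOG(1000000.0) / LOG(2.0)) AS worst_case_comparisons;
              ```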

              --
              -Dick Christoph



              • Mike C#

                #8
                Re: Indexing strategy

                Use the 80/20 rule to start. Usually 20% of your queries will
                account for 80% of your interactions with the database.
                Identify the 20% that are used most often and create indexes
                that optimize them. You can grab a query plan (I prefer the
                graphical query plans in Query Analyzer) for each query to
                identify the bottlenecks and where indexes would be most
                effective.
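
                Outside the GUI, the standard session options give the same
                information in text form (a sketch; the query and the
                'EXAMPLE' literal are placeholders for one of your own):

                ```sql
                -- Report logical/physical reads per table for a candidate
                -- query; scans show up as large logical-read counts.
                SET STATISTICS IO ON;
                GO
                SELECT SUM(FIELD_VALUE)
                FROM   dbo.DEALER_SOURCE_DATA_VALUES
                WHERE  FIELD_CODE = 'EXAMPLE';   -- placeholder value
                GO
                SET STATISTICS IO OFF;
                GO
                ```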


