Optimizator and indexes

**Gert-Jan Strik** · Jul 23 '05, 07:39 AM

Re: Optimizator and indexes

Igor,

If you create the tables (and indexes) dynamically, then it should be no
problem for you to name your indexes and constraints. And if you can
name your indexes/constraints, then you can use index hints.

BTW: There is no way to force the use of a particular index without
using hints. Such option would make no sense at all.

SQL-Server uses a cost based optimizer and will minimize I/O, it will
choose indexes that will minimize query execution times.

With a hot cache, an index hint can in some situations increase
performance. Note however that the index hint can become troublesome
when the amount of data grows (or unused space increases), or when the
cache size shrinks (because other processes need the server's memory).

If the query that you are actually using is more complex then the one
below, then please post it. There may be other options...

Hope this helps,
Gert-Jan

Igor wrote:[color=blue]
>
> Is there a way to force optimizer to use indexes without hints? (some
> server setting or index type...)
> I'll give an example to clarify :
>
> I have a table with fields
> Customer_Code char(10) not null
> Invoice_Number int not null
> and an index on those fields IX_1.
>
> there are about 2,000,000 records in the table and those two fields are
> not unique, and I cant use clustered index because of the nature of the
> table (need it for something else).
>
> When I use query like :
>
> Select * from CustInv where Customer_Code=' ABC' and invoice_Number= 2
>
> depending on the ammount of data statistics computes, query is executed
> using the IX_1 or IX_1 is ignored. The documentation says that using of
> indexes is determened by the uniqueness of the data, but my tests show
> that every usage of index is faster.
> I can fool the optimizer when using query like
>
> Select * from CustInv where Customer_Code=' ABC' and invoice_Number> =2
> and invoice_number< =2
>
> but all that it does is extends the optimizer's tolerance (IX_1) is
> ignored for some queries in that form also.
>
> Because my database and indexes are dynamically created, I cannot use
> hints for indexes, so if anyone knows a way to tell the database
> something like : "If you have an index on where fields, use it ALWAYS",
> the knowledge would be gratly appretiated! :)
>
> Thanx in advance! Sorry for a longer post and maybe fuzzy explanation of
> the problem...
>
> Igor[/color]

**Erland Sommarskog** · Jul 23 '05, 07:39 AM

Re: Optimizator and indexes

Igor (bija@baja.com) writes:[color=blue]
> Because my database and indexes are dynamically created, I cannot use
> hints for indexes, so if anyone knows a way to tell the database
> something like : "If you have an index on where fields, use it ALWAYS",
> the knowledge would be gratly appretiated! :)[/color]

Believe me, you don't want that. Say that you have:

SELECT * FROM tbl WHERE nonclusteredind exedcol = @value

If there is a good distribution of values in nonclusteredind exedcol,
the index will be good. But say that for 30% of the rows, the value
is the one the same. The cost for using the index rather than scanning
the table, will be considerably lower than using the index, since when
using the index the same page will be accessed more than once.

It is true, though, that the optimizer is overly conservative when it
comes to using non-clustered index, and in my opinion, the threshold
when it switches to table scan is a little low.

For a table where I know my data, I don't shudder for using an index
hint. But for a dynamically created database like yours, this sounds
risky.

A better alternative might be to build non-clustered indexes which
comprises all columns you need. If that is all columns of the table,
that is in practice a second clustered index on the table. It may
not be good for update speed though.

You could also investigate indexed views.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

**Igor** · Jul 23 '05, 07:39 AM

Re: Optimizator and indexes

Thanks for the reply.

Unfortunately, the query is as simple as I posted.
I didn't mean forcing usage of an PARTICULAR index, but telling the
optimizer if there is an index on the table with my where clause, use
it. :)
What I ment by dynamically creating indexes is that some tables index
themselves depending on the usage and data in them. The index names are
created dynamically and for me to choose one would mean scanning the
where clause and index names table...

And, if the only way to get the optimizer to actually optimize is to
turn it off, I can use DBF+NTX :))

btw, the irony is that the more complex the query, the smarter the
optimizer gets! :((

Gert-Jan Strik wrote:[color=blue]
> Igor,
>
> If you create the tables (and indexes) dynamically, then it should be no
> problem for you to name your indexes and constraints. And if you can
> name your indexes/constraints, then you can use index hints.
>
> BTW: There is no way to force the use of a particular index without
> using hints. Such option would make no sense at all.
>
> SQL-Server uses a cost based optimizer and will minimize I/O, it will
> choose indexes that will minimize query execution times.
>
> With a hot cache, an index hint can in some situations increase
> performance. Note however that the index hint can become troublesome
> when the amount of data grows (or unused space increases), or when the
> cache size shrinks (because other processes need the server's memory).
>
> If the query that you are actually using is more complex then the one
> below, then please post it. There may be other options...
>
> Hope this helps,
> Gert-Jan[/color]

**Igor** · Jul 23 '05, 07:39 AM

Re: Optimizator and indexes

Thanks for the reply!
I agree with you on the threshold of the optimizer. But my problem
manifests itself like this :

My table is about 2,000,000 records, with an estimated growth of about
250,000 a month (don't think a giant index would be nice :))

When a result set will contain 3 records the query time is about 20secs
(stupid thing scans the table)

When a result set will contain 50,000 records (or so the optimizer
thinks), the query time is about 5 secs!
(uses index)
When I show this to my user, I get beat up by a stick! :))

Also, the server has only 1G of ram, so it cannot cache everything (if
it could, i wouldn't have noticed this) and is a dedicated server (only
SQL2000).
Not dealing with the inteligence (stupidity?) of the optimizer, is there
a way to increase the threshold for the table scan???

database is optimized every night and is currently about 10G!

Erland Sommarskog wrote:[color=blue]
> Igor (bija@baja.com) writes:
>[color=green]
>>Because my database and indexes are dynamically created, I cannot use
>>hints for indexes, so if anyone knows a way to tell the database
>>something like : "If you have an index on where fields, use it ALWAYS",
>>the knowledge would be gratly appretiated! :)[/color]
>
>
> Believe me, you don't want that. Say that you have:
>
> SELECT * FROM tbl WHERE nonclusteredind exedcol = @value
>
> If there is a good distribution of values in nonclusteredind exedcol,
> the index will be good. But say that for 30% of the rows, the value
> is the one the same. The cost for using the index rather than scanning
> the table, will be considerably lower than using the index, since when
> using the index the same page will be accessed more than once.
>
> It is true, though, that the optimizer is overly conservative when it
> comes to using non-clustered index, and in my opinion, the threshold
> when it switches to table scan is a little low.
>
> For a table where I know my data, I don't shudder for using an index
> hint. But for a dynamically created database like yours, this sounds
> risky.
>
> A better alternative might be to build non-clustered indexes which
> comprises all columns you need. If that is all columns of the table,
> that is in practice a second clustered index on the table. It may
> not be good for update speed though.
>
> You could also investigate indexed views.
>[/color]

**Tzvika Barenholz** · Jul 23 '05, 07:39 AM

Re: Optimizator and indexes

Hi Igor
have you tried manually updating statistics more often?

**Igor** · Jul 23 '05, 07:39 AM

Re: Optimizator and indexes

Tzvika Barenholz wrote:[color=blue]
> Hi Igor
> have you tried manually updating statistics more often?
>[/color]

Yes. Database has a plan to do it every night. but optimizer decides on
the basis of the statistics which is bad in my case... :(

**Gert-Jan Strik** · Jul 23 '05, 07:39 AM

Re: Optimizator and indexes

Igor,

If table CustInv has a nonclustered compound index on
(Customer_Code, Invoice_Number) and you run the query

Select * from CustInv where Customer_Code=' ABC' and invoice_Number= 2

and this query (0) will only return a few rows, then I would be *very*
surprised if the optimizer did not use the nonclustered index.

However, if what you are actually doing is this (1)

EXEC MySProc 'ABC', 2

or this (2)

Declare @Customer_Code char(10)
Declare @invoice_Number int
Set @Customer_Code= 'ABC'
Set @invoice_Number =2
Select * from CustInv where Customer_Code=@ Customer_Code and
invoice_Number= @invoice_Number

then that is a different story.

Situation 1 is different because of parameter sniffing (check out Google
for more info). If you are using that, adding WITH RECOMPILE to the
stored procdure definition could be a solution.

In situation 2, it is not the statistics of the value pair ('ABC',2)
that is used, but the selectivity of the entire index, because the
variable values are not known compile time. The query optimizer will
then create a plan for the worst case scenario. If your data is evenly
distributed, then it could use the nonclustered index. However, if your
data distribution is skewed, and there is a combination with a very high
occurrence, then SQL-Server will probably choose a query plan with a
clustered index scan. Whatever query plan is chosen, it is used for all
different variable values.

Please let us know which type of query is actually causing the bad
performance: Type 0, 1 or 2.

Gert-Jan

Igor wrote:[color=blue]
>
> Thanks for the reply.
>
> Unfortunately, the query is as simple as I posted.
> I didn't mean forcing usage of an PARTICULAR index, but telling the
> optimizer if there is an index on the table with my where clause, use
> it. :)
> What I ment by dynamically creating indexes is that some tables index
> themselves depending on the usage and data in them. The index names are
> created dynamically and for me to choose one would mean scanning the
> where clause and index names table...
>
> And, if the only way to get the optimizer to actually optimize is to
> turn it off, I can use DBF+NTX :))
>
> btw, the irony is that the more complex the query, the smarter the
> optimizer gets! :((
>
> Gert-Jan Strik wrote:[color=green]
> > Igor,
> >
> > If you create the tables (and indexes) dynamically, then it should be no
> > problem for you to name your indexes and constraints. And if you can
> > name your indexes/constraints, then you can use index hints.
> >
> > BTW: There is no way to force the use of a particular index without
> > using hints. Such option would make no sense at all.
> >
> > SQL-Server uses a cost based optimizer and will minimize I/O, it will
> > choose indexes that will minimize query execution times.
> >
> > With a hot cache, an index hint can in some situations increase
> > performance. Note however that the index hint can become troublesome
> > when the amount of data grows (or unused space increases), or when the
> > cache size shrinks (because other processes need the server's memory).
> >
> > If the query that you are actually using is more complex then the one
> > below, then please post it. There may be other options...
> >
> > Hope this helps,
> > Gert-Jan[/color][/color]

**Igor** · Jul 23 '05, 07:39 AM

Re: Optimizator and indexes

Gert-Jan,
Unfortunately, query (0) is the one with a problem! :)
But, i've tried q(2) and this works fine????
Also, I've tried this on a box with 4G of ram - works perfectly (not the
speed, the plan)... If only I could find a way to increase the threshold
for I/O performance... :)

Anyway, thanks a lot for a great post - i'll play around with it some
more, but my nerves are breaking... :)

Gert-Jan Strik wrote:[color=blue]
> Igor,
>
> If table CustInv has a nonclustered compound index on
> (Customer_Code, Invoice_Number) and you run the query
>
> Select * from CustInv where Customer_Code=' ABC' and invoice_Number= 2
>
> and this query (0) will only return a few rows, then I would be *very*
> surprised if the optimizer did not use the nonclustered index.
>
> However, if what you are actually doing is this (1)
>
> EXEC MySProc 'ABC', 2
>
> or this (2)
>
> Declare @Customer_Code char(10)
> Declare @invoice_Number int
> Set @Customer_Code= 'ABC'
> Set @invoice_Number =2
> Select * from CustInv where Customer_Code=@ Customer_Code and
> invoice_Number= @invoice_Number
>
> then that is a different story.
>
> Situation 1 is different because of parameter sniffing (check out Google
> for more info). If you are using that, adding WITH RECOMPILE to the
> stored procdure definition could be a solution.
>
> In situation 2, it is not the statistics of the value pair ('ABC',2)
> that is used, but the selectivity of the entire index, because the
> variable values are not known compile time. The query optimizer will
> then create a plan for the worst case scenario. If your data is evenly
> distributed, then it could use the nonclustered index. However, if your
> data distribution is skewed, and there is a combination with a very high
> occurrence, then SQL-Server will probably choose a query plan with a
> clustered index scan. Whatever query plan is chosen, it is used for all
> different variable values.
>
> Please let us know which type of query is actually causing the bad
> performance: Type 0, 1 or 2.
>
> Gert-Jan
>
>
> Igor wrote:
>[color=green]
>>Thanks for the reply.
>>
>>Unfortunately , the query is as simple as I posted.
>>I didn't mean forcing usage of an PARTICULAR index, but telling the
>>optimizer if there is an index on the table with my where clause, use
>>it. :)
>>What I ment by dynamically creating indexes is that some tables index
>>themselves depending on the usage and data in them. The index names are
>>created dynamically and for me to choose one would mean scanning the
>>where clause and index names table...
>>
>>And, if the only way to get the optimizer to actually optimize is to
>>turn it off, I can use DBF+NTX :))
>>
>>btw, the irony is that the more complex the query, the smarter the
>>optimizer gets! :((
>>
>>Gert-Jan Strik wrote:
>>[color=darkred]
>>>Igor,
>>>
>>>If you create the tables (and indexes) dynamically, then it should be no
>>>problem for you to name your indexes and constraints. And if you can
>>>name your indexes/constraints, then you can use index hints.
>>>
>>>BTW: There is no way to force the use of a particular index without
>>>using hints. Such option would make no sense at all.
>>>
>>>SQL-Server uses a cost based optimizer and will minimize I/O, it will
>>>choose indexes that will minimize query execution times.
>>>
>>>With a hot cache, an index hint can in some situations increase
>>>performanc e. Note however that the index hint can become troublesome
>>>when the amount of data grows (or unused space increases), or when the
>>>cache size shrinks (because other processes need the server's memory).
>>>
>>>If the query that you are actually using is more complex then the one
>>>below, then please post it. There may be other options...
>>>
>>>Hope this helps,
>>>Gert-Jan[/color][/color][/color]

**Erland Sommarskog** · Jul 23 '05, 07:40 AM

Re: Optimizator and indexes

Igor (bija@baja.com) writes:[color=blue]
> Thanks for the reply!
> I agree with you on the threshold of the optimizer. But my problem
> manifests itself like this :
>
> My table is about 2,000,000 records, with an estimated growth of about
> 250,000 a month (don't think a giant index would be nice :))
>
> When a result set will contain 3 records the query time is about 20secs
> (stupid thing scans the table)
>
> When a result set will contain 50,000 records (or so the optimizer
> thinks), the query time is about 5 secs!
> (uses index)
> When I show this to my user, I get beat up by a stick! :))[/color]

Not seeing the table, its indexes and its plan, it's difficult to say
something useful.

But it is worth keeping in mind that stastistics are normally maintained
for single columns, if your query has two conditions of which each
value is fairly common, but the combination is not, the optimizer
may be confused. You can create two-column statistics, but I don't know
whether they actually have any practical importance.

Assuming that your query is dynamically generated, you could add an
index hint like:

WITH INDEX (2, 3, 4, ...)

where you simply include the index ids of the non-clustered indexes
of the table. You would have to query sysindexes to find them. (You
need to exclude statistics and hypothetical indexes.)

If that doesn't work out, you probably need to reconsider your
strategy with dynamic indexing.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

Optimizator and indexes

Optimizator and indexes

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment