Optimizator and indexes

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Igor

    Optimizator and indexes

    Is there a way to force optimizer to use indexes without hints? (some
    server setting or index type...)
    I'll give an example to clarify :

    I have a table with fields
    Customer_Code char(10) not null
    Invoice_Number int not null
    and an index on those fields IX_1.

    there are about 2,000,000 records in the table and those two fields are
    not unique, and I cant use clustered index because of the nature of the
    table (need it for something else).

    When I use query like :

    Select * from CustInv where Customer_Code=' ABC' and invoice_Number= 2

    depending on the ammount of data statistics computes, query is executed
    using the IX_1 or IX_1 is ignored. The documentation says that using of
    indexes is determened by the uniqueness of the data, but my tests show
    that every usage of index is faster.
    I can fool the optimizer when using query like

    Select * from CustInv where Customer_Code=' ABC' and invoice_Number> =2
    and invoice_number< =2

    but all that it does is extends the optimizer's tolerance (IX_1) is
    ignored for some queries in that form also.

    Because my database and indexes are dynamically created, I cannot use
    hints for indexes, so if anyone knows a way to tell the database
    something like : "If you have an index on where fields, use it ALWAYS",
    the knowledge would be gratly appretiated! :)

    Thanx in advance! Sorry for a longer post and maybe fuzzy explanation of
    the problem...

    Igor
  • Gert-Jan Strik

    #2
    Re: Optimizator and indexes

    Igor,

    If you create the tables (and indexes) dynamically, then it should be no
    problem for you to name your indexes and constraints. And if you can
    name your indexes/constraints, then you can use index hints.

    BTW: There is no way to force the use of a particular index without
    using hints. Such option would make no sense at all.

    SQL-Server uses a cost based optimizer and will minimize I/O, it will
    choose indexes that will minimize query execution times.

    With a hot cache, an index hint can in some situations increase
    performance. Note however that the index hint can become troublesome
    when the amount of data grows (or unused space increases), or when the
    cache size shrinks (because other processes need the server's memory).

    If the query that you are actually using is more complex then the one
    below, then please post it. There may be other options...

    Hope this helps,
    Gert-Jan


    Igor wrote:[color=blue]
    >
    > Is there a way to force optimizer to use indexes without hints? (some
    > server setting or index type...)
    > I'll give an example to clarify :
    >
    > I have a table with fields
    > Customer_Code char(10) not null
    > Invoice_Number int not null
    > and an index on those fields IX_1.
    >
    > there are about 2,000,000 records in the table and those two fields are
    > not unique, and I cant use clustered index because of the nature of the
    > table (need it for something else).
    >
    > When I use query like :
    >
    > Select * from CustInv where Customer_Code=' ABC' and invoice_Number= 2
    >
    > depending on the ammount of data statistics computes, query is executed
    > using the IX_1 or IX_1 is ignored. The documentation says that using of
    > indexes is determened by the uniqueness of the data, but my tests show
    > that every usage of index is faster.
    > I can fool the optimizer when using query like
    >
    > Select * from CustInv where Customer_Code=' ABC' and invoice_Number> =2
    > and invoice_number< =2
    >
    > but all that it does is extends the optimizer's tolerance (IX_1) is
    > ignored for some queries in that form also.
    >
    > Because my database and indexes are dynamically created, I cannot use
    > hints for indexes, so if anyone knows a way to tell the database
    > something like : "If you have an index on where fields, use it ALWAYS",
    > the knowledge would be gratly appretiated! :)
    >
    > Thanx in advance! Sorry for a longer post and maybe fuzzy explanation of
    > the problem...
    >
    > Igor[/color]

    Comment

    • Erland Sommarskog

      #3
      Re: Optimizator and indexes

      Igor (bija@baja.com) writes:[color=blue]
      > Because my database and indexes are dynamically created, I cannot use
      > hints for indexes, so if anyone knows a way to tell the database
      > something like : "If you have an index on where fields, use it ALWAYS",
      > the knowledge would be gratly appretiated! :)[/color]

      Believe me, you don't want that. Say that you have:

      SELECT * FROM tbl WHERE nonclusteredind exedcol = @value

      If there is a good distribution of values in nonclusteredind exedcol,
      the index will be good. But say that for 30% of the rows, the value
      is the one the same. The cost for using the index rather than scanning
      the table, will be considerably lower than using the index, since when
      using the index the same page will be accessed more than once.

      It is true, though, that the optimizer is overly conservative when it
      comes to using non-clustered index, and in my opinion, the threshold
      when it switches to table scan is a little low.

      For a table where I know my data, I don't shudder for using an index
      hint. But for a dynamically created database like yours, this sounds
      risky.

      A better alternative might be to build non-clustered indexes which
      comprises all columns you need. If that is all columns of the table,
      that is in practice a second clustered index on the table. It may
      not be good for update speed though.

      You could also investigate indexed views.

      --
      Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

      Books Online for SQL Server SP3 at
      Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

      Comment

      • Igor

        #4
        Re: Optimizator and indexes

        Thanks for the reply.

        Unfortunately, the query is as simple as I posted.
        I didn't mean forcing usage of an PARTICULAR index, but telling the
        optimizer if there is an index on the table with my where clause, use
        it. :)
        What I ment by dynamically creating indexes is that some tables index
        themselves depending on the usage and data in them. The index names are
        created dynamically and for me to choose one would mean scanning the
        where clause and index names table...

        And, if the only way to get the optimizer to actually optimize is to
        turn it off, I can use DBF+NTX :))

        btw, the irony is that the more complex the query, the smarter the
        optimizer gets! :((


        Gert-Jan Strik wrote:[color=blue]
        > Igor,
        >
        > If you create the tables (and indexes) dynamically, then it should be no
        > problem for you to name your indexes and constraints. And if you can
        > name your indexes/constraints, then you can use index hints.
        >
        > BTW: There is no way to force the use of a particular index without
        > using hints. Such option would make no sense at all.
        >
        > SQL-Server uses a cost based optimizer and will minimize I/O, it will
        > choose indexes that will minimize query execution times.
        >
        > With a hot cache, an index hint can in some situations increase
        > performance. Note however that the index hint can become troublesome
        > when the amount of data grows (or unused space increases), or when the
        > cache size shrinks (because other processes need the server's memory).
        >
        > If the query that you are actually using is more complex then the one
        > below, then please post it. There may be other options...
        >
        > Hope this helps,
        > Gert-Jan[/color]

        Comment

        • Igor

          #5
          Re: Optimizator and indexes

          Thanks for the reply!
          I agree with you on the threshold of the optimizer. But my problem
          manifests itself like this :

          My table is about 2,000,000 records, with an estimated growth of about
          250,000 a month (don't think a giant index would be nice :))

          When a result set will contain 3 records the query time is about 20secs
          (stupid thing scans the table)

          When a result set will contain 50,000 records (or so the optimizer
          thinks), the query time is about 5 secs!
          (uses index)
          When I show this to my user, I get beat up by a stick! :))

          Also, the server has only 1G of ram, so it cannot cache everything (if
          it could, i wouldn't have noticed this) and is a dedicated server (only
          SQL2000).
          Not dealing with the inteligence (stupidity?) of the optimizer, is there
          a way to increase the threshold for the table scan???

          database is optimized every night and is currently about 10G!

          Erland Sommarskog wrote:[color=blue]
          > Igor (bija@baja.com) writes:
          >[color=green]
          >>Because my database and indexes are dynamically created, I cannot use
          >>hints for indexes, so if anyone knows a way to tell the database
          >>something like : "If you have an index on where fields, use it ALWAYS",
          >>the knowledge would be gratly appretiated! :)[/color]
          >
          >
          > Believe me, you don't want that. Say that you have:
          >
          > SELECT * FROM tbl WHERE nonclusteredind exedcol = @value
          >
          > If there is a good distribution of values in nonclusteredind exedcol,
          > the index will be good. But say that for 30% of the rows, the value
          > is the one the same. The cost for using the index rather than scanning
          > the table, will be considerably lower than using the index, since when
          > using the index the same page will be accessed more than once.
          >
          > It is true, though, that the optimizer is overly conservative when it
          > comes to using non-clustered index, and in my opinion, the threshold
          > when it switches to table scan is a little low.
          >
          > For a table where I know my data, I don't shudder for using an index
          > hint. But for a dynamically created database like yours, this sounds
          > risky.
          >
          > A better alternative might be to build non-clustered indexes which
          > comprises all columns you need. If that is all columns of the table,
          > that is in practice a second clustered index on the table. It may
          > not be good for update speed though.
          >
          > You could also investigate indexed views.
          >[/color]

          Comment

          • Tzvika Barenholz

            #6
            Re: Optimizator and indexes

            Hi Igor
            have you tried manually updating statistics more often?

            Comment

            • Igor

              #7
              Re: Optimizator and indexes

              Tzvika Barenholz wrote:[color=blue]
              > Hi Igor
              > have you tried manually updating statistics more often?
              >[/color]

              Yes. Database has a plan to do it every night. but optimizer decides on
              the basis of the statistics which is bad in my case... :(

              Comment

              • Gert-Jan Strik

                #8
                Re: Optimizator and indexes

                Igor,

                If table CustInv has a nonclustered compound index on
                (Customer_Code, Invoice_Number) and you run the query

                Select * from CustInv where Customer_Code=' ABC' and invoice_Number= 2

                and this query (0) will only return a few rows, then I would be *very*
                surprised if the optimizer did not use the nonclustered index.

                However, if what you are actually doing is this (1)

                EXEC MySProc 'ABC', 2

                or this (2)

                Declare @Customer_Code char(10)
                Declare @invoice_Number int
                Set @Customer_Code= 'ABC'
                Set @invoice_Number =2
                Select * from CustInv where Customer_Code=@ Customer_Code and
                invoice_Number= @invoice_Number

                then that is a different story.

                Situation 1 is different because of parameter sniffing (check out Google
                for more info). If you are using that, adding WITH RECOMPILE to the
                stored procdure definition could be a solution.

                In situation 2, it is not the statistics of the value pair ('ABC',2)
                that is used, but the selectivity of the entire index, because the
                variable values are not known compile time. The query optimizer will
                then create a plan for the worst case scenario. If your data is evenly
                distributed, then it could use the nonclustered index. However, if your
                data distribution is skewed, and there is a combination with a very high
                occurrence, then SQL-Server will probably choose a query plan with a
                clustered index scan. Whatever query plan is chosen, it is used for all
                different variable values.

                Please let us know which type of query is actually causing the bad
                performance: Type 0, 1 or 2.

                Gert-Jan


                Igor wrote:[color=blue]
                >
                > Thanks for the reply.
                >
                > Unfortunately, the query is as simple as I posted.
                > I didn't mean forcing usage of an PARTICULAR index, but telling the
                > optimizer if there is an index on the table with my where clause, use
                > it. :)
                > What I ment by dynamically creating indexes is that some tables index
                > themselves depending on the usage and data in them. The index names are
                > created dynamically and for me to choose one would mean scanning the
                > where clause and index names table...
                >
                > And, if the only way to get the optimizer to actually optimize is to
                > turn it off, I can use DBF+NTX :))
                >
                > btw, the irony is that the more complex the query, the smarter the
                > optimizer gets! :((
                >
                > Gert-Jan Strik wrote:[color=green]
                > > Igor,
                > >
                > > If you create the tables (and indexes) dynamically, then it should be no
                > > problem for you to name your indexes and constraints. And if you can
                > > name your indexes/constraints, then you can use index hints.
                > >
                > > BTW: There is no way to force the use of a particular index without
                > > using hints. Such option would make no sense at all.
                > >
                > > SQL-Server uses a cost based optimizer and will minimize I/O, it will
                > > choose indexes that will minimize query execution times.
                > >
                > > With a hot cache, an index hint can in some situations increase
                > > performance. Note however that the index hint can become troublesome
                > > when the amount of data grows (or unused space increases), or when the
                > > cache size shrinks (because other processes need the server's memory).
                > >
                > > If the query that you are actually using is more complex then the one
                > > below, then please post it. There may be other options...
                > >
                > > Hope this helps,
                > > Gert-Jan[/color][/color]

                Comment

                • Igor

                  #9
                  Re: Optimizator and indexes

                  Gert-Jan,
                  Unfortunately, query (0) is the one with a problem! :)
                  But, i've tried q(2) and this works fine????
                  Also, I've tried this on a box with 4G of ram - works perfectly (not the
                  speed, the plan)... If only I could find a way to increase the threshold
                  for I/O performance... :)

                  Anyway, thanks a lot for a great post - i'll play around with it some
                  more, but my nerves are breaking... :)


                  Gert-Jan Strik wrote:[color=blue]
                  > Igor,
                  >
                  > If table CustInv has a nonclustered compound index on
                  > (Customer_Code, Invoice_Number) and you run the query
                  >
                  > Select * from CustInv where Customer_Code=' ABC' and invoice_Number= 2
                  >
                  > and this query (0) will only return a few rows, then I would be *very*
                  > surprised if the optimizer did not use the nonclustered index.
                  >
                  > However, if what you are actually doing is this (1)
                  >
                  > EXEC MySProc 'ABC', 2
                  >
                  > or this (2)
                  >
                  > Declare @Customer_Code char(10)
                  > Declare @invoice_Number int
                  > Set @Customer_Code= 'ABC'
                  > Set @invoice_Number =2
                  > Select * from CustInv where Customer_Code=@ Customer_Code and
                  > invoice_Number= @invoice_Number
                  >
                  > then that is a different story.
                  >
                  > Situation 1 is different because of parameter sniffing (check out Google
                  > for more info). If you are using that, adding WITH RECOMPILE to the
                  > stored procdure definition could be a solution.
                  >
                  > In situation 2, it is not the statistics of the value pair ('ABC',2)
                  > that is used, but the selectivity of the entire index, because the
                  > variable values are not known compile time. The query optimizer will
                  > then create a plan for the worst case scenario. If your data is evenly
                  > distributed, then it could use the nonclustered index. However, if your
                  > data distribution is skewed, and there is a combination with a very high
                  > occurrence, then SQL-Server will probably choose a query plan with a
                  > clustered index scan. Whatever query plan is chosen, it is used for all
                  > different variable values.
                  >
                  > Please let us know which type of query is actually causing the bad
                  > performance: Type 0, 1 or 2.
                  >
                  > Gert-Jan
                  >
                  >
                  > Igor wrote:
                  >[color=green]
                  >>Thanks for the reply.
                  >>
                  >>Unfortunately , the query is as simple as I posted.
                  >>I didn't mean forcing usage of an PARTICULAR index, but telling the
                  >>optimizer if there is an index on the table with my where clause, use
                  >>it. :)
                  >>What I ment by dynamically creating indexes is that some tables index
                  >>themselves depending on the usage and data in them. The index names are
                  >>created dynamically and for me to choose one would mean scanning the
                  >>where clause and index names table...
                  >>
                  >>And, if the only way to get the optimizer to actually optimize is to
                  >>turn it off, I can use DBF+NTX :))
                  >>
                  >>btw, the irony is that the more complex the query, the smarter the
                  >>optimizer gets! :((
                  >>
                  >>Gert-Jan Strik wrote:
                  >>[color=darkred]
                  >>>Igor,
                  >>>
                  >>>If you create the tables (and indexes) dynamically, then it should be no
                  >>>problem for you to name your indexes and constraints. And if you can
                  >>>name your indexes/constraints, then you can use index hints.
                  >>>
                  >>>BTW: There is no way to force the use of a particular index without
                  >>>using hints. Such option would make no sense at all.
                  >>>
                  >>>SQL-Server uses a cost based optimizer and will minimize I/O, it will
                  >>>choose indexes that will minimize query execution times.
                  >>>
                  >>>With a hot cache, an index hint can in some situations increase
                  >>>performanc e. Note however that the index hint can become troublesome
                  >>>when the amount of data grows (or unused space increases), or when the
                  >>>cache size shrinks (because other processes need the server's memory).
                  >>>
                  >>>If the query that you are actually using is more complex then the one
                  >>>below, then please post it. There may be other options...
                  >>>
                  >>>Hope this helps,
                  >>>Gert-Jan[/color][/color][/color]

                  Comment

                  • Erland Sommarskog

                    #10
                    Re: Optimizator and indexes

                    Igor (bija@baja.com) writes:[color=blue]
                    > Thanks for the reply!
                    > I agree with you on the threshold of the optimizer. But my problem
                    > manifests itself like this :
                    >
                    > My table is about 2,000,000 records, with an estimated growth of about
                    > 250,000 a month (don't think a giant index would be nice :))
                    >
                    > When a result set will contain 3 records the query time is about 20secs
                    > (stupid thing scans the table)
                    >
                    > When a result set will contain 50,000 records (or so the optimizer
                    > thinks), the query time is about 5 secs!
                    > (uses index)
                    > When I show this to my user, I get beat up by a stick! :))[/color]

                    Not seeing the table, its indexes and its plan, it's difficult to say
                    something useful.

                    But it is worth keeping in mind that stastistics are normally maintained
                    for single columns, if your query has two conditions of which each
                    value is fairly common, but the combination is not, the optimizer
                    may be confused. You can create two-column statistics, but I don't know
                    whether they actually have any practical importance.

                    Assuming that your query is dynamically generated, you could add an
                    index hint like:

                    WITH INDEX (2, 3, 4, ...)

                    where you simply include the index ids of the non-clustered indexes
                    of the table. You would have to query sysindexes to find them. (You
                    need to exclude statistics and hypothetical indexes.)

                    If that doesn't work out, you probably need to reconsider your
                    strategy with dynamic indexing.


                    --
                    Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                    Books Online for SQL Server SP3 at
                    Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                    Comment

                    Working...