Auto created statistics and missing statistics

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Jesper Jensen

    Auto created statistics and missing statistics

    Hello group.

    I have an issue, which has bothered me for a while now:

    I'm wondering why the column statistics, which SQL Server wants me to
    create, if I turn off auto-created statistics, are so important to the
    optimizer?

    Example: from Northwind (with auto create stats off), I do the following:

    SELECT * FROM Customers WHERE Country = 'Sweden'

    My query plan show a clustered index scan, which is expected - no index
    exists for Country. BUT, the query plan also shows, that the optimizer is
    missing a statistic on Country, which tells me, that the optimizer would
    benefit from knowing this.

    I cannot see why? (and I've been trying for a while now).

    If I create the missing statistics, nothing happens in the query plan (and
    why should it?). I could understand it, if the optimizer suggested an index
    on Country - this would make sense, but if creating the missing index, query
    analyzer creates the statistics with an empty index, which seems to me to be
    less than usable.

    I've been thinking long and hard about this, but haven't been able to reach
    a conclusion :) It has some relevance to my work, because allowing the
    optimizer to create missing statistics limits my options for designing
    indexes (e.g. covering) for some rather wide tables, so I'm thinking why not
    turn it off altogether. But I would like to know the consequences - hope
    somebody has already delved into this, and knows a good explanation.

    Rgds
    Jesper


  • Simon Hayes

    #2
    Re: Auto created statistics and missing statistics


    "Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message
    news:40727f2e$0 $237$edfadb0f@d read12.news.tel e.dk...[color=blue]
    > Hello group.
    >
    > I have an issue, which has bothered me for a while now:
    >
    > I'm wondering why the column statistics, which SQL Server wants me to
    > create, if I turn off auto-created statistics, are so important to the
    > optimizer?
    >
    > Example: from Northwind (with auto create stats off), I do the following:
    >
    > SELECT * FROM Customers WHERE Country = 'Sweden'
    >
    > My query plan show a clustered index scan, which is expected - no index
    > exists for Country. BUT, the query plan also shows, that the optimizer is
    > missing a statistic on Country, which tells me, that the optimizer would
    > benefit from knowing this.
    >
    > I cannot see why? (and I've been trying for a while now).
    >
    > If I create the missing statistics, nothing happens in the query plan (and
    > why should it?). I could understand it, if the optimizer suggested an[/color]
    index[color=blue]
    > on Country - this would make sense, but if creating the missing index,[/color]
    query[color=blue]
    > analyzer creates the statistics with an empty index, which seems to me to[/color]
    be[color=blue]
    > less than usable.
    >
    > I've been thinking long and hard about this, but haven't been able to[/color]
    reach[color=blue]
    > a conclusion :) It has some relevance to my work, because allowing the
    > optimizer to create missing statistics limits my options for designing
    > indexes (e.g. covering) for some rather wide tables, so I'm thinking why[/color]
    not[color=blue]
    > turn it off altogether. But I would like to know the consequences - hope
    > somebody has already delved into this, and knows a good explanation.
    >
    > Rgds
    > Jesper
    >
    >[/color]



    Simon



    Comment

    • Jesper Jensen

      #3
      Re: Auto created statistics and missing statistics

      Thanks, Simon, informative article, but ...

      .... it doesn't really explain the stuff, that I wrote. The closest I get to
      an explanation, when reading this is 'These statistics are created for
      columns where the optimizer would have to estimate the approximate density
      or distribution otherwise'.

      I knew this, but I still do not know, why the optimizer needs to know the
      density and/or distribution?? I can see no valid reason, and therefore I can
      see no good reason for enabling auto-creation of stats.

      What I probably looking for is a good example, where the use of an
      automatically created stat saves time, cycles and IOs :)

      Best Rgds - Jesper

      "Simon Hayes" <sql@hayes.ch > skrev i en meddelelse
      news:4072f05a$1 _2@news.bluewin .ch...[color=blue]
      >
      > "Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message
      > news:40727f2e$0 $237$edfadb0f@d read12.news.tel e.dk...[color=green]
      > > Hello group.
      > >
      > > I have an issue, which has bothered me for a while now:
      > >
      > > I'm wondering why the column statistics, which SQL Server wants me to
      > > create, if I turn off auto-created statistics, are so important to the
      > > optimizer?
      > >
      > > Example: from Northwind (with auto create stats off), I do the[/color][/color]
      following:[color=blue][color=green]
      > >
      > > SELECT * FROM Customers WHERE Country = 'Sweden'
      > >
      > > My query plan show a clustered index scan, which is expected - no index
      > > exists for Country. BUT, the query plan also shows, that the optimizer[/color][/color]
      is[color=blue][color=green]
      > > missing a statistic on Country, which tells me, that the optimizer would
      > > benefit from knowing this.
      > >
      > > I cannot see why? (and I've been trying for a while now).
      > >
      > > If I create the missing statistics, nothing happens in the query plan[/color][/color]
      (and[color=blue][color=green]
      > > why should it?). I could understand it, if the optimizer suggested an[/color]
      > index[color=green]
      > > on Country - this would make sense, but if creating the missing index,[/color]
      > query[color=green]
      > > analyzer creates the statistics with an empty index, which seems to me[/color][/color]
      to[color=blue]
      > be[color=green]
      > > less than usable.
      > >
      > > I've been thinking long and hard about this, but haven't been able to[/color]
      > reach[color=green]
      > > a conclusion :) It has some relevance to my work, because allowing the
      > > optimizer to create missing statistics limits my options for designing
      > > indexes (e.g. covering) for some rather wide tables, so I'm thinking why[/color]
      > not[color=green]
      > > turn it off altogether. But I would like to know the consequences - hope
      > > somebody has already delved into this, and knows a good explanation.
      > >
      > > Rgds
      > > Jesper
      > >
      > >[/color]
      >
      >[/color]
      http://msdn.microsoft.com/library/de.../statquery.asp[color=blue]
      >
      > Simon
      >
      >
      >[/color]


      Comment

      • Simon Hayes

        #4
        Re: Auto created statistics and missing statistics


        "Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message
        news:4072f91b$0 $300$edfadb0f@d read12.news.tel e.dk...[color=blue]
        > Thanks, Simon, informative article, but ...
        >
        > ... it doesn't really explain the stuff, that I wrote. The closest I get[/color]
        to[color=blue]
        > an explanation, when reading this is 'These statistics are created for
        > columns where the optimizer would have to estimate the approximate density
        > or distribution otherwise'.
        >
        > I knew this, but I still do not know, why the optimizer needs to know the
        > density and/or distribution?? I can see no valid reason, and therefore I[/color]
        can[color=blue]
        > see no good reason for enabling auto-creation of stats.
        >
        > What I probably looking for is a good example, where the use of an
        > automatically created stat saves time, cycles and IOs :)
        >
        > Best Rgds - Jesper
        >[/color]

        OK, here's another informative article :-)



        In summary, index statistics exist only for the first column in an index,
        but auto-created (or manually created) statistics can exist for any column.
        This gives the optimizer extra information, which might mean it chooses a
        different, more efficient index for a query.

        Check out the example on the second page of the article - on my system, this
        reduced the logical reads required for the query from 104 to 43.

        But you're correct to consider that there can be an impact on performance in
        some situations:



        Simon


        Comment

        • Jesper Jensen

          #5
          Re: Auto created statistics and missing statistics

          Thanks, Simon, that one did the trick.

          One less mystery.

          On my machine, QA tells me that the two queries (the index scan on
          ProductID/Quantity vs. the clustered index scan) takes 43.65 and 56.35% cost
          respectively. I would argue, that this saving is not worth the 'used up
          index space'. In my professional life, I've seen tables, which are wide
          enough (200+ columns) to demand, that precious index space is saved.

          Basically, I think there are too many 'ifs' before an auto-created index
          saves performance, but I appreciate the optimization idea behind it.

          Thanks - Jesper

          "Simon Hayes" <sql@hayes.ch > skrev i en meddelelse
          news:4073121e_1 @news.bluewin.c h...[color=blue]
          >
          > "Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message
          > news:4072f91b$0 $300$edfadb0f@d read12.news.tel e.dk...[color=green]
          > > Thanks, Simon, informative article, but ...
          > >
          > > ... it doesn't really explain the stuff, that I wrote. The closest I get[/color]
          > to[color=green]
          > > an explanation, when reading this is 'These statistics are created for
          > > columns where the optimizer would have to estimate the approximate[/color][/color]
          density[color=blue][color=green]
          > > or distribution otherwise'.
          > >
          > > I knew this, but I still do not know, why the optimizer needs to know[/color][/color]
          the[color=blue][color=green]
          > > density and/or distribution?? I can see no valid reason, and therefore I[/color]
          > can[color=green]
          > > see no good reason for enabling auto-creation of stats.
          > >
          > > What I probably looking for is a good example, where the use of an
          > > automatically created stat saves time, cycles and IOs :)
          > >
          > > Best Rgds - Jesper
          > >[/color]
          >
          > OK, here's another informative article :-)
          >
          > http://www.winnetmag.com/SQLServer/A...075/22075.html
          >
          > In summary, index statistics exist only for the first column in an index,
          > but auto-created (or manually created) statistics can exist for any[/color]
          column.[color=blue]
          > This gives the optimizer extra information, which might mean it chooses a
          > different, more efficient index for a query.
          >
          > Check out the example on the second page of the article - on my system,[/color]
          this[color=blue]
          > reduced the logical reads required for the query from 104 to 43.
          >
          > But you're correct to consider that there can be an impact on performance[/color]
          in[color=blue]
          > some situations:
          >
          > http://support.microsoft.com/default...b;en-us;195565
          >
          > Simon
          >
          >[/color]


          Comment

          • Simon Hayes

            #6
            Re: Auto created statistics and missing statistics

            "Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message news:<40732877$ 0$274$edfadb0f@ dread12.news.te le.dk>...[color=blue]
            > Thanks, Simon, that one did the trick.
            >
            > One less mystery.
            >
            > On my machine, QA tells me that the two queries (the index scan on
            > ProductID/Quantity vs. the clustered index scan) takes 43.65 and 56.35% cost
            > respectively. I would argue, that this saving is not worth the 'used up
            > index space'. In my professional life, I've seen tables, which are wide
            > enough (200+ columns) to demand, that precious index space is saved.
            >
            > Basically, I think there are too many 'ifs' before an auto-created index
            > saves performance, but I appreciate the optimization idea behind it.
            >
            > Thanks - Jesper
            >[/color]

            <snip>

            Well, you have to be careful about reaching conclusions based on
            simple queries using small data sets. It's possible that a complex
            join involving millions of rows would give a more significant
            difference. To get a definite answer for your environment, you would
            have to do some benchmarking, with and without statistics.

            Simon

            Comment

            Working...