Auto created statistics and missing statistics

**Simon Hayes** · Jul 20 '05, 03:22 AM

Re: Auto created statistics and missing statistics

"Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message
news:40727f2e$0 $237$edfadb0f@d read12.news.tel e.dk...[color=blue]
> Hello group.
>
> I have an issue, which has bothered me for a while now:
>
> I'm wondering why the column statistics, which SQL Server wants me to
> create, if I turn off auto-created statistics, are so important to the
> optimizer?
>
> Example: from Northwind (with auto create stats off), I do the following:
>
> SELECT * FROM Customers WHERE Country = 'Sweden'
>
> My query plan show a clustered index scan, which is expected - no index
> exists for Country. BUT, the query plan also shows, that the optimizer is
> missing a statistic on Country, which tells me, that the optimizer would
> benefit from knowing this.
>
> I cannot see why? (and I've been trying for a while now).
>
> If I create the missing statistics, nothing happens in the query plan (and
> why should it?). I could understand it, if the optimizer suggested an[/color]
index[color=blue]
> on Country - this would make sense, but if creating the missing index,[/color]
query[color=blue]
> analyzer creates the statistics with an empty index, which seems to me to[/color]
be[color=blue]
> less than usable.
>
> I've been thinking long and hard about this, but haven't been able to[/color]
reach[color=blue]
> a conclusion :) It has some relevance to my work, because allowing the
> optimizer to create missing statistics limits my options for designing
> indexes (e.g. covering) for some rather wide tables, so I'm thinking why[/color]
not[color=blue]
> turn it off altogether. But I would like to know the consequences - hope
> somebody has already delved into this, and knows a good explanation.
>
> Rgds
> Jesper
>
>[/color]

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/statquery.asp

Simon

**Jesper Jensen** · Jul 20 '05, 03:22 AM

Re: Auto created statistics and missing statistics

Thanks, Simon, informative article, but ...

.... it doesn't really explain the stuff, that I wrote. The closest I get to
an explanation, when reading this is 'These statistics are created for
columns where the optimizer would have to estimate the approximate density
or distribution otherwise'.

I knew this, but I still do not know, why the optimizer needs to know the
density and/or distribution?? I can see no valid reason, and therefore I can
see no good reason for enabling auto-creation of stats.

What I probably looking for is a good example, where the use of an
automatically created stat saves time, cycles and IOs :)

Best Rgds - Jesper

"Simon Hayes" <sql@hayes.ch > skrev i en meddelelse
news:4072f05a$1 _2@news.bluewin .ch...[color=blue]
>
> "Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message
> news:40727f2e$0 $237$edfadb0f@d read12.news.tel e.dk...[color=green]
> > Hello group.
> >
> > I have an issue, which has bothered me for a while now:
> >
> > I'm wondering why the column statistics, which SQL Server wants me to
> > create, if I turn off auto-created statistics, are so important to the
> > optimizer?
> >
> > Example: from Northwind (with auto create stats off), I do the[/color][/color]
following:[color=blue][color=green]
> >
> > SELECT * FROM Customers WHERE Country = 'Sweden'
> >
> > My query plan show a clustered index scan, which is expected - no index
> > exists for Country. BUT, the query plan also shows, that the optimizer[/color][/color]
is[color=blue][color=green]
> > missing a statistic on Country, which tells me, that the optimizer would
> > benefit from knowing this.
> >
> > I cannot see why? (and I've been trying for a while now).
> >
> > If I create the missing statistics, nothing happens in the query plan[/color][/color]
(and[color=blue][color=green]
> > why should it?). I could understand it, if the optimizer suggested an[/color]
> index[color=green]
> > on Country - this would make sense, but if creating the missing index,[/color]
> query[color=green]
> > analyzer creates the statistics with an empty index, which seems to me[/color][/color]
to[color=blue]
> be[color=green]
> > less than usable.
> >
> > I've been thinking long and hard about this, but haven't been able to[/color]
> reach[color=green]
> > a conclusion :) It has some relevance to my work, because allowing the
> > optimizer to create missing statistics limits my options for designing
> > indexes (e.g. covering) for some rather wide tables, so I'm thinking why[/color]
> not[color=green]
> > turn it off altogether. But I would like to know the consequences - hope
> > somebody has already delved into this, and knows a good explanation.
> >
> > Rgds
> > Jesper
> >
> >[/color]
>
>[/color]
http://msdn.microsoft.com/library/de.../statquery.asp[color=blue]
>
> Simon
>
>
>[/color]

**Simon Hayes** · Jul 20 '05, 03:22 AM

Re: Auto created statistics and missing statistics

"Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message
news:4072f91b$0 $300$edfadb0f@d read12.news.tel e.dk...[color=blue]
> Thanks, Simon, informative article, but ...
>
> ... it doesn't really explain the stuff, that I wrote. The closest I get[/color]
to[color=blue]
> an explanation, when reading this is 'These statistics are created for
> columns where the optimizer would have to estimate the approximate density
> or distribution otherwise'.
>
> I knew this, but I still do not know, why the optimizer needs to know the
> density and/or distribution?? I can see no valid reason, and therefore I[/color]
can[color=blue]
> see no good reason for enabling auto-creation of stats.
>
> What I probably looking for is a good example, where the use of an
> automatically created stat saves time, cycles and IOs :)
>
> Best Rgds - Jesper
>[/color]

OK, here's another informative article :-)

http://www.winnetmag.com/SQLServer/Article/ArticleID/22075/22075.html

In summary, index statistics exist only for the first column in an index,
but auto-created (or manually created) statistics can exist for any column.
This gives the optimizer extra information, which might mean it chooses a
different, more efficient index for a query.

Check out the example on the second page of the article - on my system, this
reduced the logical reads required for the query from 104 to 43.

But you're correct to consider that there can be an impact on performance in
some situations:

Error - Office.com

http://support.microsoft.com/default.aspx?scid=kb;en-us;195565

Simon

**Jesper Jensen** · Jul 20 '05, 03:22 AM

Re: Auto created statistics and missing statistics

Thanks, Simon, that one did the trick.

One less mystery.

On my machine, QA tells me that the two queries (the index scan on
ProductID/Quantity vs. the clustered index scan) takes 43.65 and 56.35% cost
respectively. I would argue, that this saving is not worth the 'used up
index space'. In my professional life, I've seen tables, which are wide
enough (200+ columns) to demand, that precious index space is saved.

Basically, I think there are too many 'ifs' before an auto-created index
saves performance, but I appreciate the optimization idea behind it.

Thanks - Jesper

"Simon Hayes" <sql@hayes.ch > skrev i en meddelelse
news:4073121e_1 @news.bluewin.c h...[color=blue]
>
> "Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message
> news:4072f91b$0 $300$edfadb0f@d read12.news.tel e.dk...[color=green]
> > Thanks, Simon, informative article, but ...
> >
> > ... it doesn't really explain the stuff, that I wrote. The closest I get[/color]
> to[color=green]
> > an explanation, when reading this is 'These statistics are created for
> > columns where the optimizer would have to estimate the approximate[/color][/color]
density[color=blue][color=green]
> > or distribution otherwise'.
> >
> > I knew this, but I still do not know, why the optimizer needs to know[/color][/color]
the[color=blue][color=green]
> > density and/or distribution?? I can see no valid reason, and therefore I[/color]
> can[color=green]
> > see no good reason for enabling auto-creation of stats.
> >
> > What I probably looking for is a good example, where the use of an
> > automatically created stat saves time, cycles and IOs :)
> >
> > Best Rgds - Jesper
> >[/color]
>
> OK, here's another informative article :-)
>
> http://www.winnetmag.com/SQLServer/A...075/22075.html
>
> In summary, index statistics exist only for the first column in an index,
> but auto-created (or manually created) statistics can exist for any[/color]
column.[color=blue]
> This gives the optimizer extra information, which might mean it chooses a
> different, more efficient index for a query.
>
> Check out the example on the second page of the article - on my system,[/color]
this[color=blue]
> reduced the logical reads required for the query from 104 to 43.
>
> But you're correct to consider that there can be an impact on performance[/color]
in[color=blue]
> some situations:
>
> http://support.microsoft.com/default...b;en-us;195565
>
> Simon
>
>[/color]

**Simon Hayes** · Jul 20 '05, 03:23 AM

Re: Auto created statistics and missing statistics

"Jesper Jensen" <moellemand@pos t.tdcadsl.dk> wrote in message news:<40732877$ 0$274$edfadb0f@ dread12.news.te le.dk>...[color=blue]
> Thanks, Simon, that one did the trick.
>
> One less mystery.
>
> On my machine, QA tells me that the two queries (the index scan on
> ProductID/Quantity vs. the clustered index scan) takes 43.65 and 56.35% cost
> respectively. I would argue, that this saving is not worth the 'used up
> index space'. In my professional life, I've seen tables, which are wide
> enough (200+ columns) to demand, that precious index space is saved.
>
> Basically, I think there are too many 'ifs' before an auto-created index
> saves performance, but I appreciate the optimization idea behind it.
>
> Thanks - Jesper
>[/color]

<snip>

Well, you have to be careful about reaching conclusions based on
simple queries using small data sets. It's possible that a complex
join involving millions of rows would give a more significant
difference. To get a definite answer for your environment, you would
have to do some benchmarking, with and without statistics.

Simon

Auto created statistics and missing statistics

Auto created statistics and missing statistics

Comment

Comment

Comment

Comment

Comment