Very slow distinct select

**Dan Guzman** · Jul 26 '05, 01:25 AM

Re: Very slow distinct select

You can speed up this query with a non-clustered index on REMOTE_ADDR.

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Rich" <no@spam.invali d> wrote in message
news:DbfFe.4879 2$4o.35562@fed1 read06...[color=blue]
> My table looks like this:
> char(150) HTTP_REF,
> char(250) HTTP_USER,
> char(150) REMOTE_ADDR,
> char(150) REMOTE_HOST,
> char(150) URL,
> smalldatetime TIME_STAMP
>
> There are no indexes on this table and there are only 293,658 records
> total.
>
> When I do a select like this it takes forever:
>
> SELECT COUNT(DISTINCT REMOTE_ADDR)
>
> Takes 2 minutes. Is there anyway to speed that up?
>
> Thanks
>
>[/color]

**Rich** · Jul 26 '05, 01:55 AM

Re: Very slow distinct select

REMOTE_ADDR has duplicate values. So I added a numeric identity column and
did this:

CREATE UNIQUE NONCLUSTERED INDEX Requests_Remote Addr ON Requests
(REMOTE_ADDR, ID)

But that didn't seem to help the speed any.

Any other ideas?

Thanks.

"Dan Guzman" <guzmanda@nospa m-online.sbcgloba l.net> wrote in message
news:09gFe.2034 $PC7.1092@newss vr29.news.prodi gy.net...[color=blue]
> You can speed up this query with a non-clustered index on REMOTE_ADDR.
>
> --
> Hope this helps.
>
> Dan Guzman
> SQL Server MVP
>
> "Rich" <no@spam.invali d> wrote in message
> news:DbfFe.4879 2$4o.35562@fed1 read06...[color=green]
> > My table looks like this:
> > char(150) HTTP_REF,
> > char(250) HTTP_USER,
> > char(150) REMOTE_ADDR,
> > char(150) REMOTE_HOST,
> > char(150) URL,
> > smalldatetime TIME_STAMP
> >
> > There are no indexes on this table and there are only 293,658 records
> > total.
> >
> > When I do a select like this it takes forever:
> >
> > SELECT COUNT(DISTINCT REMOTE_ADDR)
> >
> > Takes 2 minutes. Is there anyway to speed that up?
> >
> > Thanks
> >
> >[/color]
>
>[/color]

**--CELKO--** · Jul 26 '05, 02:05 AM

Re: Very slow distinct select

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications. It is very hard to debug code when you do not let us
see it.

You did not so much as tell us the **name* of this table! There are
not indexes?

Does your boss -- who is paying you, unlike us who are doing your work
for you for free -- give sooooooo little to work with?

**Rich** · Jul 26 '05, 02:15 AM

Re: Very slow distinct select

What the heck is DDL???
Didn't I give enough info in my two posts?
And what does the name matter?

"--CELKO--" <jcelko212@eart hlink.net> wrote in message
news:1122342953 .732318.259740@ g44g2000cwa.goo glegroups.com.. .[color=blue]
> Please post DDL, so that people do not have to guess what the keys,
> constraints, Declarative Referential Integrity, data types, etc. in
> your schema are. Sample data is also a good idea, along with clear
> specifications. It is very hard to debug code when you do not let us
> see it.
>
> You did not so much as tell us the **name* of this table! There are
> not indexes?
>
> Does your boss -- who is paying you, unlike us who are doing your work
> for you for free -- give sooooooo little to work with?
>[/color]

**Dan Guzman** · Jul 26 '05, 03:15 AM

Re: Very slow distinct select

> REMOTE_ADDR has duplicate values. So I added a numeric identity column

A non-unique index will optimize your COUNT(DISTINCT REMOTE_ADDR). No need
to add a unique-ifier identity column.

How many distinct values? If it is fairly high, a scan of many index leaf
nodes is needed and this might not perform much better than a table scan.
In that case, you can create a view with the aggregation and then create an
index on that view. For Example

CREATE VIEW Requests_Remote Addr
WITH SCHEMABINDING AS
SELECT REMOTE_ADDR, COUNT_BIG(*) AS CoungBig
FROM dbo.Requests
GROUP BY REMOTE_ADDR
GO

CREATE UNIQUE CLUSTERED INDEX Requests_Remote Addr_Index ON
WebLog(REMOTE_A DDR)
GO

Your query can automatically use the view index when accessing the base
table if you are using SQL 2000 Enterprise Edition. If you are using
another edition, you'll need to access the view and add the NOEXPAND hint:

SELECT COUNT(DISTINCT REMOTE_ADDR)
FROM dbo.Results_Vie w WITH (NOEXPAND)

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Rich" <no@spam.invali d> wrote in message
news:IBgFe.4880 0$4o.29531@fed1 read06...[color=blue]
> REMOTE_ADDR has duplicate values. So I added a numeric identity column and
> did this:
>
> CREATE UNIQUE NONCLUSTERED INDEX Requests_Remote Addr ON Requests
> (REMOTE_ADDR, ID)
>
> But that didn't seem to help the speed any.
>
> Any other ideas?
>
> Thanks.
>
>
> "Dan Guzman" <guzmanda@nospa m-online.sbcgloba l.net> wrote in message
> news:09gFe.2034 $PC7.1092@newss vr29.news.prodi gy.net...[color=green]
>> You can speed up this query with a non-clustered index on REMOTE_ADDR.
>>
>> --
>> Hope this helps.
>>
>> Dan Guzman
>> SQL Server MVP
>>
>> "Rich" <no@spam.invali d> wrote in message
>> news:DbfFe.4879 2$4o.35562@fed1 read06...[color=darkred]
>> > My table looks like this:
>> > char(150) HTTP_REF,
>> > char(250) HTTP_USER,
>> > char(150) REMOTE_ADDR,
>> > char(150) REMOTE_HOST,
>> > char(150) URL,
>> > smalldatetime TIME_STAMP
>> >
>> > There are no indexes on this table and there are only 293,658 records
>> > total.
>> >
>> > When I do a select like this it takes forever:
>> >
>> > SELECT COUNT(DISTINCT REMOTE_ADDR)
>> >
>> > Takes 2 minutes. Is there anyway to speed that up?
>> >
>> > Thanks
>> >
>> >[/color]
>>
>>[/color]
>
>[/color]

**Erland Sommarskog** · Jul 26 '05, 08:55 AM

Re: Very slow distinct select

Rich (no@spam.invali d) writes:[color=blue]
> What the heck is DDL???
> Didn't I give enough info in my two posts?
> And what does the name matter?[/color]

DDL = Data Definition Laguage. That is CREATE TABLE etc. Which it appears
that you did include.

Too many SQL savvies just throw this funny TLAs(*) around them, and
don't give a damn if people they pretend to help understand them or not.

(*) TLA = Three-letter abbriviation.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

**Erland Sommarskog** · Jul 26 '05, 09:05 AM

Re: Very slow distinct select

Rich (no@spam.invali d) writes:[color=blue]
> My table looks like this:
> char(150) HTTP_REF,
> char(250) HTTP_USER,
> char(150) REMOTE_ADDR,
> char(150) REMOTE_HOST,
> char(150) URL,
> smalldatetime TIME_STAMP
>
> There are no indexes on this table and there are only 293,658 records
> total.
>
> When I do a select like this it takes forever:
>
> SELECT COUNT(DISTINCT REMOTE_ADDR)
>
> Takes 2 minutes. Is there anyway to speed that up?[/color]

Assuming that these columns are not really of fixed length, there is
quite something to win by changing char to varchar. char is fixed
length. Your row size is 854 bytes, which means that you can get at
most 9 rows per page. Thus your table requires 32628 pages @ 8192
bytes for a total of 267 MB. Which is a small table.

Say that the average length of all columns is 40 bytes. Then your average
row size is 5*(40+2) + 4 =214, giving 37 rows per page. This reduces the
size of the table to 65 MB.

The smaller the pages, the fewer pages to read, and thus the shorter the
response time.

Adding a non-clustered index as Dan suggested is of course even better
because then the table you traverse is only the index. And if you
use varchar instead that index is even smaller.

Note that I and Dan assume that your query really is

SELECT COUNT(DISTINCT REMOTE_ADDR) FROM tbl

and nothing else. If there is a WHERE clause involved that refers to
other columns, then the index is not that useful.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

**Rich** · Jul 26 '05, 10:45 PM

Re: Very slow distinct select

Thanks Dan and Erland... it has gone from 2 minutes all the way down to 1
second!
But in the process, I accidentally truncated a lot of URLs to 50. OH WELL.
At least its faster :) Thanks.

"Rich" <no@spam.invali d> wrote in message
news:DbfFe.4879 2$4o.35562@fed1 read06...[color=blue]
> My table looks like this:
> char(150) HTTP_REF,
> char(250) HTTP_USER,
> char(150) REMOTE_ADDR,
> char(150) REMOTE_HOST,
> char(150) URL,
> smalldatetime TIME_STAMP
>
> There are no indexes on this table and there are only 293,658 records[/color]
total.[color=blue]
>
> When I do a select like this it takes forever:
>
> SELECT COUNT(DISTINCT REMOTE_ADDR)
>
> Takes 2 minutes. Is there anyway to speed that up?
>
> Thanks
>
>[/color]

Very slow distinct select

Very slow distinct select

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment