Very slow distinct select

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Rich

    Very slow distinct select

    My table looks like this:
    char(150) HTTP_REF,
    char(250) HTTP_USER,
    char(150) REMOTE_ADDR,
    char(150) REMOTE_HOST,
    char(150) URL,
    smalldatetime TIME_STAMP

    There are no indexes on this table and there are only 293,658 records total.

    When I do a select like this it takes forever:

    SELECT COUNT(DISTINCT REMOTE_ADDR)

    Takes 2 minutes. Is there anyway to speed that up?

    Thanks


  • Dan Guzman

    #2
    Re: Very slow distinct select

    You can speed up this query with a non-clustered index on REMOTE_ADDR.

    --
    Hope this helps.

    Dan Guzman
    SQL Server MVP

    "Rich" <no@spam.invali d> wrote in message
    news:DbfFe.4879 2$4o.35562@fed1 read06...[color=blue]
    > My table looks like this:
    > char(150) HTTP_REF,
    > char(250) HTTP_USER,
    > char(150) REMOTE_ADDR,
    > char(150) REMOTE_HOST,
    > char(150) URL,
    > smalldatetime TIME_STAMP
    >
    > There are no indexes on this table and there are only 293,658 records
    > total.
    >
    > When I do a select like this it takes forever:
    >
    > SELECT COUNT(DISTINCT REMOTE_ADDR)
    >
    > Takes 2 minutes. Is there anyway to speed that up?
    >
    > Thanks
    >
    >[/color]


    Comment

    • Rich

      #3
      Re: Very slow distinct select

      REMOTE_ADDR has duplicate values. So I added a numeric identity column and
      did this:

      CREATE UNIQUE NONCLUSTERED INDEX Requests_Remote Addr ON Requests
      (REMOTE_ADDR, ID)

      But that didn't seem to help the speed any.

      Any other ideas?

      Thanks.


      "Dan Guzman" <guzmanda@nospa m-online.sbcgloba l.net> wrote in message
      news:09gFe.2034 $PC7.1092@newss vr29.news.prodi gy.net...[color=blue]
      > You can speed up this query with a non-clustered index on REMOTE_ADDR.
      >
      > --
      > Hope this helps.
      >
      > Dan Guzman
      > SQL Server MVP
      >
      > "Rich" <no@spam.invali d> wrote in message
      > news:DbfFe.4879 2$4o.35562@fed1 read06...[color=green]
      > > My table looks like this:
      > > char(150) HTTP_REF,
      > > char(250) HTTP_USER,
      > > char(150) REMOTE_ADDR,
      > > char(150) REMOTE_HOST,
      > > char(150) URL,
      > > smalldatetime TIME_STAMP
      > >
      > > There are no indexes on this table and there are only 293,658 records
      > > total.
      > >
      > > When I do a select like this it takes forever:
      > >
      > > SELECT COUNT(DISTINCT REMOTE_ADDR)
      > >
      > > Takes 2 minutes. Is there anyway to speed that up?
      > >
      > > Thanks
      > >
      > >[/color]
      >
      >[/color]


      Comment

      • --CELKO--

        #4
        Re: Very slow distinct select

        Please post DDL, so that people do not have to guess what the keys,
        constraints, Declarative Referential Integrity, data types, etc. in
        your schema are. Sample data is also a good idea, along with clear
        specifications. It is very hard to debug code when you do not let us
        see it.

        You did not so much as tell us the **name* of this table! There are
        not indexes?

        Does your boss -- who is paying you, unlike us who are doing your work
        for you for free -- give sooooooo little to work with?

        Comment

        • Rich

          #5
          Re: Very slow distinct select

          What the heck is DDL???
          Didn't I give enough info in my two posts?
          And what does the name matter?


          "--CELKO--" <jcelko212@eart hlink.net> wrote in message
          news:1122342953 .732318.259740@ g44g2000cwa.goo glegroups.com.. .[color=blue]
          > Please post DDL, so that people do not have to guess what the keys,
          > constraints, Declarative Referential Integrity, data types, etc. in
          > your schema are. Sample data is also a good idea, along with clear
          > specifications. It is very hard to debug code when you do not let us
          > see it.
          >
          > You did not so much as tell us the **name* of this table! There are
          > not indexes?
          >
          > Does your boss -- who is paying you, unlike us who are doing your work
          > for you for free -- give sooooooo little to work with?
          >[/color]


          Comment

          • Dan Guzman

            #6
            Re: Very slow distinct select

            > REMOTE_ADDR has duplicate values. So I added a numeric identity column

            A non-unique index will optimize your COUNT(DISTINCT REMOTE_ADDR). No need
            to add a unique-ifier identity column.

            How many distinct values? If it is fairly high, a scan of many index leaf
            nodes is needed and this might not perform much better than a table scan.
            In that case, you can create a view with the aggregation and then create an
            index on that view. For Example

            CREATE VIEW Requests_Remote Addr
            WITH SCHEMABINDING AS
            SELECT REMOTE_ADDR, COUNT_BIG(*) AS CoungBig
            FROM dbo.Requests
            GROUP BY REMOTE_ADDR
            GO

            CREATE UNIQUE CLUSTERED INDEX Requests_Remote Addr_Index ON
            WebLog(REMOTE_A DDR)
            GO

            Your query can automatically use the view index when accessing the base
            table if you are using SQL 2000 Enterprise Edition. If you are using
            another edition, you'll need to access the view and add the NOEXPAND hint:

            SELECT COUNT(DISTINCT REMOTE_ADDR)
            FROM dbo.Results_Vie w WITH (NOEXPAND)

            --
            Hope this helps.

            Dan Guzman
            SQL Server MVP

            "Rich" <no@spam.invali d> wrote in message
            news:IBgFe.4880 0$4o.29531@fed1 read06...[color=blue]
            > REMOTE_ADDR has duplicate values. So I added a numeric identity column and
            > did this:
            >
            > CREATE UNIQUE NONCLUSTERED INDEX Requests_Remote Addr ON Requests
            > (REMOTE_ADDR, ID)
            >
            > But that didn't seem to help the speed any.
            >
            > Any other ideas?
            >
            > Thanks.
            >
            >
            > "Dan Guzman" <guzmanda@nospa m-online.sbcgloba l.net> wrote in message
            > news:09gFe.2034 $PC7.1092@newss vr29.news.prodi gy.net...[color=green]
            >> You can speed up this query with a non-clustered index on REMOTE_ADDR.
            >>
            >> --
            >> Hope this helps.
            >>
            >> Dan Guzman
            >> SQL Server MVP
            >>
            >> "Rich" <no@spam.invali d> wrote in message
            >> news:DbfFe.4879 2$4o.35562@fed1 read06...[color=darkred]
            >> > My table looks like this:
            >> > char(150) HTTP_REF,
            >> > char(250) HTTP_USER,
            >> > char(150) REMOTE_ADDR,
            >> > char(150) REMOTE_HOST,
            >> > char(150) URL,
            >> > smalldatetime TIME_STAMP
            >> >
            >> > There are no indexes on this table and there are only 293,658 records
            >> > total.
            >> >
            >> > When I do a select like this it takes forever:
            >> >
            >> > SELECT COUNT(DISTINCT REMOTE_ADDR)
            >> >
            >> > Takes 2 minutes. Is there anyway to speed that up?
            >> >
            >> > Thanks
            >> >
            >> >[/color]
            >>
            >>[/color]
            >
            >[/color]


            Comment

            • Erland Sommarskog

              #7
              Re: Very slow distinct select

              Rich (no@spam.invali d) writes:[color=blue]
              > What the heck is DDL???
              > Didn't I give enough info in my two posts?
              > And what does the name matter?[/color]

              DDL = Data Definition Laguage. That is CREATE TABLE etc. Which it appears
              that you did include.

              Too many SQL savvies just throw this funny TLAs(*) around them, and
              don't give a damn if people they pretend to help understand them or not.


              (*) TLA = Three-letter abbriviation.

              --
              Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

              Books Online for SQL Server SP3 at
              Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

              Comment

              • Erland Sommarskog

                #8
                Re: Very slow distinct select

                Rich (no@spam.invali d) writes:[color=blue]
                > My table looks like this:
                > char(150) HTTP_REF,
                > char(250) HTTP_USER,
                > char(150) REMOTE_ADDR,
                > char(150) REMOTE_HOST,
                > char(150) URL,
                > smalldatetime TIME_STAMP
                >
                > There are no indexes on this table and there are only 293,658 records
                > total.
                >
                > When I do a select like this it takes forever:
                >
                > SELECT COUNT(DISTINCT REMOTE_ADDR)
                >
                > Takes 2 minutes. Is there anyway to speed that up?[/color]

                Assuming that these columns are not really of fixed length, there is
                quite something to win by changing char to varchar. char is fixed
                length. Your row size is 854 bytes, which means that you can get at
                most 9 rows per page. Thus your table requires 32628 pages @ 8192
                bytes for a total of 267 MB. Which is a small table.

                Say that the average length of all columns is 40 bytes. Then your average
                row size is 5*(40+2) + 4 =214, giving 37 rows per page. This reduces the
                size of the table to 65 MB.

                The smaller the pages, the fewer pages to read, and thus the shorter the
                response time.

                Adding a non-clustered index as Dan suggested is of course even better
                because then the table you traverse is only the index. And if you
                use varchar instead that index is even smaller.

                Note that I and Dan assume that your query really is

                SELECT COUNT(DISTINCT REMOTE_ADDR) FROM tbl

                and nothing else. If there is a WHERE clause involved that refers to
                other columns, then the index is not that useful.


                --
                Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                Books Online for SQL Server SP3 at
                Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                Comment

                • Rich

                  #9
                  Re: Very slow distinct select

                  Thanks Dan and Erland... it has gone from 2 minutes all the way down to 1
                  second!
                  But in the process, I accidentally truncated a lot of URLs to 50. OH WELL.
                  At least its faster :) Thanks.


                  "Rich" <no@spam.invali d> wrote in message
                  news:DbfFe.4879 2$4o.35562@fed1 read06...[color=blue]
                  > My table looks like this:
                  > char(150) HTTP_REF,
                  > char(250) HTTP_USER,
                  > char(150) REMOTE_ADDR,
                  > char(150) REMOTE_HOST,
                  > char(150) URL,
                  > smalldatetime TIME_STAMP
                  >
                  > There are no indexes on this table and there are only 293,658 records[/color]
                  total.[color=blue]
                  >
                  > When I do a select like this it takes forever:
                  >
                  > SELECT COUNT(DISTINCT REMOTE_ADDR)
                  >
                  > Takes 2 minutes. Is there anyway to speed that up?
                  >
                  > Thanks
                  >
                  >[/color]


                  Comment

                  Working...