why does count take so long?

  • Joseph Shraibman

    why does count take so long?

    On a 7.3.4 database:

    explain analyse select count(*) from elog;


    Aggregate (cost=223764.05..223764.05 rows=1 width=0) (actual
    time=81372.11..81372.11 rows=1 loops=1)
    -> Seq Scan on elog (cost=0.00..203012.24 rows=8300724 width=0)
    (actual time=3.91..71542.53 rows=8313762 loops=1)
    Total runtime: 81378.42 msec
    (3 rows)


    It looks like the aggregate took 10 secs all by itself. What's taking
    so long?


  • Bruno Wolff III

    #2
    Re: why does count take so long?

    On Sun, Sep 07, 2003 at 21:06:26 -0400,
    Joseph Shraibman <jks@selectacast.net> wrote:
    > On a 7.3.4 database:
    >
    > explain analyse select count(*) from elog;
    >
    >
    > Aggregate (cost=223764.05..223764.05 rows=1 width=0) (actual
    > time=81372.11..81372.11 rows=1 loops=1)
    > -> Seq Scan on elog (cost=0.00..203012.24 rows=8300724 width=0)
    > (actual time=3.91..71542.53 rows=8313762 loops=1)
    > Total runtime: 81378.42 msec
    > (3 rows)
    >
    >
    > It looks like the aggregate took 10 secs all by itself. What's taking
    > so long?

    It looks like there are 8 million log records that need to be counted.

    There have been some discussions over the last few days about tricks to
    get better performance if you need to use count on large tables.

    • Tom Lane

      #3
      Re: why does count take so long?

      Bruno Wolff III <bruno@wolff.to> writes:
      > Joseph Shraibman <jks@selectacast.net> wrote:
      >> It looks like the aggregate took 10 secs all by itself. What's taking
      >> so long?

      > It looks like there are 8 million log records that need to be counted.

      Yeah, but I think he's complaining about the 10sec delta for the
      aggregate on top of the 71sec to read the 8 million rows. That
      seems high to me too. On a 10-mil-row test table, I get

      regression=# explain analyze select count(*) from foo;
      QUERY PLAN
      ------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=22.50..22.50 rows=1 width=0) (actual time=189865.81..189865.81 rows=1 loops=1)
      -> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=0) (actual time=18.88..163833.61 rows=10240000 loops=1)
      Total runtime: 189865.91 msec
      (3 rows)

      in other words 26sec to do the aggregate on top of 163sec to read the
      rows.

      Unless Joseph's machine has a way better IO-to-CPU ratio than my little
      development machine, there's something odd about his numbers.

      regards, tom lane

      • Greg Stark

        #4
        Re: why does count take so long?


        Tom Lane <tgl@sss.pgh.pa.us> writes:

        > Yeah, but I think he's complaining about the 10sec delta for the
        > aggregate on top of the 71sec to read the 8 million rows. That
        > seems high to me too. On a 10-mil-row test table, I get
        ....
        > in other words 26sec to do the aggregate on top of 163sec to read the
        > rows.
        >
        > Unless Joseph's machine has a way better IO-to-CPU ratio than my little
        > development machine, there's something odd about his numbers.

        Why is 10s (a 14% delta) for 8M records suspicious but 26s (16% delta) for 10M
        not suspicious? These results seem fairly consistent actually.

        I think the original question was really "what work does this 10s represent".
        I'm curious too. Is it really just 10 million times the cpu cycles necessary
        to dispatch a call to the count() aggregate state function?



        PS:

        > regression=# explain analyze select count(*) from foo;
        > QUERY PLAN
        > ------------------------------------------------------------------------------------------------------------------
        > Aggregate (cost=22.50..22.50 rows=1 width=0) (actual time=189865.81..189865.81 rows=1 loops=1)
        > -> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=0) (actual time=18.88..163833.61 rows=10240000 loops=1)
        > Total runtime: 189865.91 msec
        > (3 rows)

        Hey, you haven't analyzed your table! :)

        --
        greg


        • Tom Lane

          #5
          Re: why does count take so long?

          Greg Stark <gsstark@mit.edu> writes:
          > Tom Lane <tgl@sss.pgh.pa.us> writes:
          > Why is 10s (a 14% delta) for 8M records suspicious but 26s (16% delta) for 10M
          > not suspicious? These results seem fairly consistent actually.

          Duh --- must have gotten the decimal point off in my mental arithmetic.
          > I think the original question was really "what work does this 10s represent".
          > I'm curious too. Is it really just 10 million times the cpu cycles necessary
          > to dispatch a call to the count() aggregate state function?

          Well, it's 10 mil cycles of the aggregate plan node, which is going to
          involve rather more work than just the aggregate function call. But the
          last time I profiled COUNT(), it seemed that the pallocs/pfrees needed
          for the int8 counter were the largest single chunk of CPU time.

          Something I've wanted to do for awhile is to look at making int8 and
          float8 be pass-by-value datatypes on machines where Datum is naturally
          8 bytes (ie, any 8-byte-pointer architecture). I doubt it would be a
          win to widen Datum on 32-bit machines, though; the distributed costs
          would swamp the advantage from making these datatypes more efficient.

          regards, tom lane

          • Greg Stark

            #6
            Re: why does count take so long?


            Tom Lane <tgl@sss.pgh.pa.us> writes:

            > Something I've wanted to do for awhile is to look at making int8 and
            > float8 be pass-by-value datatypes on machines where Datum is naturally
            > 8 bytes (ie, any 8-byte-pointer architecture). I doubt it would be a
            > win to widen Datum on 32-bit machines, though; the distributed costs
            > would swamp the advantage from making these datatypes more efficient.

            Things like count(*) could use int4 until it overflows though.
            Is int4 a pass-by-value datatype on 32-bit machines?

            --
            greg
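
            For what it's worth, a rough sketch of that idea as a user-defined aggregate
            with an int4 state (all names here are made up, and it counts non-null values
            of a column rather than replacing the built-in count(*)):

            create function int4_inc(int4, text) returns int4 as
              'select $1 + 1' language sql immutable strict;

            create aggregate count4 (
              basetype = text,    -- counts non-null text values
              sfunc    = int4_inc,
              stype    = int4,
              initcond = '0'
            );

            -- usage (hypothetical column name):
            select count4(some_text_column) from elog;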


            • Tom Lane

              #7
              Re: why does count take so long?

              Greg Stark <gsstark@mit.edu> writes:
              > Things like count(*) could use int4 until it overflows though.

              I don't see a reasonable way for an aggregate to change state datatype
              on the fly; otherwise this would be a great solution.
              > Is int4 a pass-by-value datatype on 32-bit machines?

              Yeah. Not too long ago, count() used int4, but we got some complaints
              about it overflowing on big tables.

              regards, tom lane

              • Tom Lane

                #8
                Re: why does count take so long?

                I said:
                > Greg Stark <gsstark@mit.edu> writes:
                >> Things like count(*) could use int4 until it overflows though.

                > I don't see a reasonable way for an aggregate to change state datatype
                > on the fly; otherwise this would be a great solution.

                On the other hand, the cost is imposed by the generic aggregate
                definition that says the aggregate state transition function is an
                ordinary function. This is fine for user-defined aggregates, but there
                is no law that says that all the built-in aggregates must use that same
                API. We could probably invent some API that allows COUNT(*) to keep its
                running count someplace where it needn't be re-palloc'd on every cycle.
                Something to think about for 7.5 (too late for 7.4 I fear).

                regards, tom lane

                • Joseph Shraibman

                  #9
                  Re: why does count take so long?

                  Tom Lane wrote:
                  > Unless Joseph's machine has a way better IO-to-CPU ratio than my little
                  > development machine, there's something odd about his numbers.
                  Doing some math:

                  81372.11 / 71542.53 = 1.1373949174008804
                  189865.81 / 163833.61 = 1.1588941365572059

                  So I have count() adding 14% on top of the seqscan, and you have 16%.


                  • Jean-Luc Lachance

                    #10
                    Re: why does count take so long?

                    How about keeping counts of inserts, deletes and updates per table per
                    transaction as part of the live statistics?



                    Tom Lane wrote:
                    > We could probably invent some API that allows COUNT(*) to keep its
                    > running count someplace where it needn't be re-palloc'd on every cycle.
                    > Something to think about for 7.5 (too late for 7.4 I fear).

                    • Dennis Gearon

                      #11
                      Re: why does count take so long?

                      Is improving count(*) (a lot, if possible) on one of the TODO lists?

                      Jean-Luc Lachance wrote:
                      > How about keeping counts of inserts, deletes and updates per table per
                      > transaction as part of the live statistics?


                      • scott.marlowe

                        #12
                        Re: why does count take so long?

                        The problem with that approach is that while there may only be one or two
                        tables that you want to do max(field) on, it would require that all tables
                        be kept track of, increasing overhead. Keep in mind, select id from table
                        order by id desc limit 1 is a snap, hits the indexes, and runs fast.

                        Also, I don't think postgresql keeps "live" statistics, only "dead" ones,
                        i.e. they are only gathered by running analyze. If you just need an
                        approximate count, you can already get that from the statistics table.
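
                        For the archives, one common form of that trick is just a query against
                        pg_class; reltuples is only as fresh as the last VACUUM or ANALYZE, so
                        treat it as an estimate:

                        -- approximate row count, refreshed by VACUUM / ANALYZE
                        select reltuples from pg_class where relname = 'elog';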

                        Now, if there were a simple way of assigning such behaviour to a table
                        (i.e. with count or something like that) via a trigger, that might be
                        worth looking into, but keep in mind, it creates a serious bottleneck
                        during inserts/updates/deletes.
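
                        As a concrete (and deliberately naive) sketch of that trigger approach --
                        all object names here are invented, plpgsql must be installed, and the
                        single counter row is exactly the serialization point mentioned above:

                        create table elog_rowcount (n bigint not null);
                        insert into elog_rowcount values (0);

                        create function elog_count_trig() returns trigger as '
                        begin
                          if TG_OP = ''INSERT'' then
                            update elog_rowcount set n = n + 1;
                          elsif TG_OP = ''DELETE'' then
                            update elog_rowcount set n = n - 1;
                          end if;
                          return null;  -- return value is ignored for AFTER triggers
                        end' language plpgsql;

                        create trigger elog_maintain_count after insert or delete on elog
                          for each row execute procedure elog_count_trig();

                        -- an exact count is then a single-row read:
                        select n from elog_rowcount;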

                        On Tue, 9 Sep 2003, Jean-Luc Lachance wrote:
                        > How about keeping counts of inserts, deletes and updates per table per
                        > transaction as part of the live statistics?



                        • Christopher Browne

                          #13
                          Re: why does count take so long?

                          In the last exciting episode, jllachan@nsd.ca (Jean-Luc Lachance) wrote:
                          > How about keeping counts of inserts, deletes and updates per table per
                          > transaction as part of the live statistics?

                          Aye, there's the rub.

                          That's exactly what _won't_ work, and that's exactly the case that is
                          somewhat pathological under MVCC.

                          With MVCC, data "appears as if by magic" when its transaction COMMITs,
                          thereby revealing the rows to the world.

                          Unfortunately, there's no simple way of making updates to counts
                          "simply appear" when they take effect, not without turning the updates
                          into a concurrency bottleneck.

                          Here's a bit of a wild thought...

                          Assume a table with schema as follows:
                          create table pg_count (
                            xid integer,  --- Actually, it's not an integer, right, Andrew? :-(
                            reltype oid,
                            count integer
                          );

                          Every transaction, "xid," affects some number of tuples. So for a
                          transaction #2134151 that adds 5 rows to the table with oid 12345 and
                          deletes 4 rows from the table with oid 45678, part of the transaction
                          would include inserting these rows:

                          insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
                          insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);

                          In order to get the count for table 12345, you could then go to
                          pg_count and request:
                          select sum(count) from pg_count where reltype = 12345;

                          The size of this table would increase every time a transaction gets
                          committed, so presumably part of VACUUM TABLE would be a
                          collection/summarization, thus...

                          -- Collect existing stats into 1 row
                          insert into pg_count (xid, reltype, count)
                            values (currxid, currtable,
                                    (select sum(count) from pg_count where reltype = currtable));

                          -- Delete the old stats
                          delete from pg_count where reltype = currtable and xid <> currxid;

                          This will cope with concurrency reasonably well (modulo having to make
                          sure that the "collect/delete" transaction is a serialized one).

                          Unfortunately, if a table is updated frequently, the summary
                          select sum(count) from pg_count where reltype = 12345;
                          will have to collect together quite a large number of entries, which
                          makes this "less cheap."

                          That suggests an optimization; any time the COUNT is selected, the old
                          stats can and should be collected into 1 row and the old data deleted.
                          --
                          wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').

                          Sturgeon's Law: 90% of *EVERYTHING* is crud.

                          • Joseph Shraibman

                            #14
                            Re: why does count take so long?

                            Tom Lane wrote:
                            > Something to think about for 7.5 (too late for 7.4 I fear).

                            What about 7.4.1? This wouldn't affect how data is stored on disk,
                            which is what would keep an upgrade out of a minor release, right?


                            • Doug McNaught

                              #15
                              Re: why does count take so long?

                              Joseph Shraibman <jks@selectacast.net> writes:

                              > Tom Lane wrote:
                              >
                              > > Something to think about for 7.5 (too late for 7.4 I fear).
                              >
                              > What about 7.4.1? This wouldn't affect how data is stored on disk,
                              > which is what would keep an upgrade out of a minor release, right?

                              Minor releases are basically bugfix-only for PG.

                              -Doug
