why does count take so long?

  • Joseph Shraibman

    why does count take so long?

    On a 7.3.4 database:

    explain analyse select count(*) from elog;


    Aggregate (cost=223764.05..223764.05 rows=1 width=0) (actual
    time=81372.11..81372.11 rows=1 loops=1)
    -> Seq Scan on elog (cost=0.00..203012.24 rows=8300724 width=0)
    (actual time=3.91..71542.53 rows=8313762 loops=1)
    Total runtime: 81378.42 msec
    (3 rows)


    It looks like the aggregate took 10 secs all by itself. What's taking
    so long?


  • Bruno Wolff III

    #2
    Re: why does count take so long?

    On Sun, Sep 07, 2003 at 21:06:26 -0400,
    Joseph Shraibman <jks@selectacast.net> wrote:
    > On a 7.3.4 database:
    >
    > explain analyse select count(*) from elog;
    >
    >
    > Aggregate (cost=223764.05..223764.05 rows=1 width=0) (actual
    > time=81372.11..81372.11 rows=1 loops=1)
    > -> Seq Scan on elog (cost=0.00..203012.24 rows=8300724 width=0)
    > (actual time=3.91..71542.53 rows=8313762 loops=1)
    > Total runtime: 81378.42 msec
    > (3 rows)
    >
    >
    > It looks like the aggregate took 10 secs all by itself. What's taking
    > so long?

    It looks like there are 8 million log records that need to be counted.

    There have been some discussions over the last few days about tricks to
    get better performance if you need to use count on large tables.

    • Tom Lane

      #3
      Re: why does count take so long?

      Bruno Wolff III <bruno@wolff.to> writes:
      > Joseph Shraibman <jks@selectacast.net> wrote:
      >> It looks like the aggregate took 10 secs all by itself. What's taking
      >> so long?

      > It looks like there are 8 million log records that need to be counted.

      Yeah, but I think he's complaining about the 10sec delta for the
      aggregate on top of the 71sec to read the 8 million rows. That
      seems high to me too. On a 10-mil-row test table, I get

      regression=# explain analyze select count(*) from foo;
      QUERY PLAN
      ------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=22.50..22.50 rows=1 width=0) (actual time=189865.81..189865.81 rows=1 loops=1)
      -> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=0) (actual time=18.88..163833.61 rows=10240000 loops=1)
      Total runtime: 189865.91 msec
      (3 rows)

      in other words 26sec to do the aggregate on top of 163sec to read the
      rows.

      Unless Joseph's machine has a way better IO-to-CPU ratio than my little
      development machine, there's something odd about his numbers.

      regards, tom lane

      • Greg Stark

        #4
        Re: why does count take so long?


        Tom Lane <tgl@sss.pgh.pa.us> writes:

        > Yeah, but I think he's complaining about the 10sec delta for the
        > aggregate on top of the 71sec to read the 8 million rows. That
        > seems high to me too. On a 10-mil-row test table, I get
        ....
        > in other words 26sec to do the aggregate on top of 163sec to read the
        > rows.
        >
        > Unless Joseph's machine has a way better IO-to-CPU ratio than my little
        > development machine, there's something odd about his numbers.

        Why is 10s (a 14% delta) for 8M records suspicious but 26s (16% delta) for 10M
        not suspicious? These results seem fairly consistent actually.

        I think the original question was really "what work does this 10s represent".
        I'm curious too. Is it really just 10 million times the cpu cycles necessary
        to dispatch a call to the count() aggregate state function?



        PS:

        > regression=# explain analyze select count(*) from foo;
        > QUERY PLAN
        > ------------------------------------------------------------------------------------------------------------------
        > Aggregate (cost=22.50..22.50 rows=1 width=0) (actual time=189865.81..189865.81 rows=1 loops=1)
        > -> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=0) (actual time=18.88..163833.61 rows=10240000 loops=1)
        > Total runtime: 189865.91 msec
        > (3 rows)

        Hey, you haven't analyzed your table! :)

        --
        greg


        • Tom Lane

          #5
          Re: why does count take so long?

          Greg Stark <gsstark@mit.edu> writes:
          > Tom Lane <tgl@sss.pgh.pa.us> writes:
          > Why is 10s (a 14% delta) for 8M records suspicious but 26s (16% delta) for 10M
          > not suspicious? These results seem fairly consistent actually.

          Duh --- must have gotten the decimal point off in my mental arithmetic.
          > I think the original question was really "what work does this 10s represent".
          > I'm curious too. Is it really just 10 million times the cpu cycles necessary
          > to dispatch a call to the count() aggregate state function?

          Well, it's 10 mil cycles of the aggregate plan node, which is going to
          involve rather more work than just the aggregate function call. But the
          last time I profiled COUNT(), it seemed that the pallocs/pfrees needed
          for the int8 counter were the largest single chunk of CPU time.

          Something I've wanted to do for awhile is to look at making int8 and
          float8 be pass-by-value datatypes on machines where Datum is naturally
          8 bytes (ie, any 8-byte-pointer architecture). I doubt it would be a
          win to widen Datum on 32-bit machines, though; the distributed costs
          would swamp the advantage from making these datatypes more efficient.

          regards, tom lane

          • Greg Stark

            #6
            Re: why does count take so long?


            Tom Lane <tgl@sss.pgh.pa.us> writes:

            > Something I've wanted to do for awhile is to look at making int8 and
            > float8 be pass-by-value datatypes on machines where Datum is naturally
            > 8 bytes (ie, any 8-byte-pointer architecture). I doubt it would be a
            > win to widen Datum on 32-bit machines, though; the distributed costs
            > would swamp the advantage from making these datatypes more efficient.

            Things like count(*) could use int4 until it overflows though.
            Is int4 a pass-by-value datatype on 32-bit machines?

            --
            greg
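
            For what it's worth, a rough sketch of that idea as a user-defined aggregate
            with an int4 state (all names here are made up, and it counts non-null values
            of a column rather than replacing the built-in count(*)):

            create function int4_inc(int4, text) returns int4 as
              'select $1 + 1' language sql immutable strict;

            create aggregate count4 (
              basetype = text,    -- counts non-null text values
              sfunc    = int4_inc,
              stype    = int4,
              initcond = '0'
            );

            -- usage (hypothetical column name):
            select count4(some_text_column) from elog;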


            • Tom Lane

              #7
              Re: why does count take so long?

              Greg Stark <gsstark@mit.edu> writes:
              > Things like count(*) could use int4 until it overflows though.

              I don't see a reasonable way for an aggregate to change state datatype
              on the fly; otherwise this would be a great solution.
              > Is int4 a pass-by-value datatype on 32-bit machines?

              Yeah. Not too long ago, count() used int4, but we got some complaints
              about it overflowing on big tables.

              regards, tom lane

              • Tom Lane

                #8
                Re: why does count take so long?

                I said:
                > Greg Stark <gsstark@mit.edu> writes:
                >> Things like count(*) could use int4 until it overflows though.

                > I don't see a reasonable way for an aggregate to change state datatype
                > on the fly; otherwise this would be a great solution.

                On the other hand, the cost is imposed by the generic aggregate
                definition that says the aggregate state transition function is an
                ordinary function. This is fine for user-defined aggregates, but there
                is no law that says that all the built-in aggregates must use that same
                API. We could probably invent some API that allows COUNT(*) to keep its
                running count someplace where it needn't be re-palloc'd on every cycle.
                Something to think about for 7.5 (too late for 7.4 I fear).

                regards, tom lane

                • Joseph Shraibman

                  #9
                  Re: why does count take so long?

                  Tom Lane wrote:
                  > Unless Joseph's machine has a way better IO-to-CPU ratio than my little
                  > development machine, there's something odd about his numbers.
                  Doing some math:

                  81372.11 / 71542.53 = 1.1373949174008804
                  189865.81 / 163833.61 = 1.1588941365572059

                  So I have count() adding 14% on top of the seqscan, and you have 16%.


                  • Jean-Luc Lachance

                    #10
                    Re: why does count take so long?

                    How about keeping counts of inserts, deletes and updates per table per
                    transaction as part of the live statistics?



                    Tom Lane wrote:
                    > We could probably invent some API that allows COUNT(*) to keep its
                    > running count someplace where it needn't be re-palloc'd on every cycle.
                    > Something to think about for 7.5 (too late for 7.4 I fear).

                    • Dennis Gearon

                      #11
                      Re: why does count take so long?

                      Is improving count(*) (a lot, if possible) on one of the TODO lists?

                      Jean-Luc Lachance wrote:
                      > How about keeping counts of inserts, deletes and updates per table per
                      > transaction as part of the live statistics?


                      • scott.marlowe

                        #12
                        Re: why does count take so long?

                        The problem with that approach is that while there may only be one or two
                        tables that you want to do max(field) on, it would require that all tables
                        be kept track of, increasing overhead. Keep in mind, select id from table
                        order by id desc limit 1 is a snap, hits the indexes, and runs fast.

                        Also, I don't think postgresql keeps "live" statistics, only "dead" ones,
                        i.e. they are only gathered by running analyze. If you just need an
                        approximate count, you can already get that from the statistics table.
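
                        For the archives, one common form of that trick is just a query against
                        pg_class; reltuples is only as fresh as the last VACUUM or ANALYZE, so
                        treat it as an estimate:

                        -- approximate row count, refreshed by VACUUM / ANALYZE
                        select reltuples from pg_class where relname = 'elog';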

                        Now, if there were a simple way of assigning such behaviour to a table
                        (i.e. with count or something like that) via a trigger, that might be
                        worth looking into, but keep in mind, it creates a serious bottleneck
                        during inserts/updates/deletes.
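
                        As a concrete (and deliberately naive) sketch of that trigger approach --
                        all object names here are invented, plpgsql must be installed, and the
                        single counter row is exactly the serialization point mentioned above:

                        create table elog_rowcount (n bigint not null);
                        insert into elog_rowcount values (0);

                        create function elog_count_trig() returns trigger as '
                        begin
                          if TG_OP = ''INSERT'' then
                            update elog_rowcount set n = n + 1;
                          elsif TG_OP = ''DELETE'' then
                            update elog_rowcount set n = n - 1;
                          end if;
                          return null;  -- return value is ignored for AFTER triggers
                        end' language plpgsql;

                        create trigger elog_maintain_count after insert or delete on elog
                          for each row execute procedure elog_count_trig();

                        -- an exact count is then a single-row read:
                        select n from elog_rowcount;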

                        On Tue, 9 Sep 2003, Jean-Luc Lachance wrote:
                        > How about keeping counts of inserts, deletes and updates per table per
                        > transaction as part of the live statistics?



                        • Christopher Browne

                          #13
                          Re: why does count take so long?

                          In the last exciting episode, jllachan@nsd.ca (Jean-Luc Lachance) wrote:
                          > How about keeping counts of inserts, deletes and updates per table per
                          > transaction as part of the live statistics?

                          Aye, there's the rub.

                          That's exactly what _won't_ work, and that's exactly the case that is
                          somewhat pathological under MVCC.

                          With MVCC, data "appears as if by magic" when its transaction COMMITs,
                          thereby revealing the rows to the world.

                          Unfortunately, there's no simple way of making updates to counts
                          "simply appear" when they take effect, not without turning the updates
                          into a concurrency bottleneck.

                          Here's a bit of a wild thought...

                          Assume a table with schema as follows:
                          create table pg_count (
                            xid integer,  --- Actually, it's not an integer, right, Andrew? :-(
                            reltype oid,
                            count integer
                          );

                          Every transaction, "xid," affects some number of tuples. So for a
                          transaction #2134151 that adds 5 rows to the table with oid 12345 and
                          deletes 4 rows from the table with oid 45678, part of the transaction
                          would include inserting these rows:

                          insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
                          insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);

                          In order to get the count for table 12345, you could then go to
                          pg_count and request:
                          select sum(count) from pg_count where reltype = 12345;

                          The size of this table would increase every time a transaction gets
                          committed, so presumably part of VACUUM TABLE would be a
                          collection/summarization, thus...

                          -- Collect existing stats into 1 row
                          insert into pg_count (xid, reltype, count)
                            values (currxid, currtable,
                                    (select sum(count) from pg_count where reltype = currtable));

                          -- Delete the old stats
                          delete from pg_count where reltype = currtable and xid <> currxid;

                          This will cope with concurrency reasonably well (modulo having to make
                          sure that the "collect/delete" transaction is a serialized one).

                          Unfortunately, if a table is updated frequently, the summary
                          select sum(count) from pg_count where reltype = 12345;
                          will have to collect together quite a large number of entries, which
                          makes this "less cheap."

                          That suggests an optimization; any time the COUNT is selected, the old
                          stats can and should be collected into 1 row and the old data deleted.
                          --
                          wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').

                          Sturgeon's Law: 90% of *EVERYTHING* is crud.

                          • Joseph Shraibman

                            #14
                            Re: why does count take so long?

                            Tom Lane wrote:
                            > Something to think about for 7.5 (too late for 7.4 I fear).

                            What about 7.4.1? This wouldn't affect how data is stored on disk,
                            which is what would keep an upgrade out of a minor release, right?


                            • Doug McNaught

                              #15
                              Re: why does count take so long?

                              Joseph Shraibman <jks@selectacast.net> writes:

                              > Tom Lane wrote:
                              >
                              > > Something to think about for 7.5 (too late for 7.4 I fear).
                              >
                              > What about 7.4.1? This wouldn't affect how data is stored on disk,
                              > which is what would keep an upgrade out of a minor release, right?

                              Minor releases are basically bugfix-only for PG.

                              -Doug
