Huge Data

  • Sezai YILMAZ

    Huge Data

    Hi,

    I use PostgreSQL 7.4 to store a huge amount of data, for example 7
    million rows. But when I run the query "select count(*) from table;", it
    returns after about 120 seconds. Is this normal for such a huge table?
    Are there any methods to speed up the query time? The huge table has an
    integer primary key and some other indexes on other columns.

    The hardware is: PIII 800 MHz processor, 512 MB RAM, and IDE hard disk
    drive.

    -sezai

    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain analyze is your friend

  • Richard Huxton

    #2
    Re: Huge Data

    On Wednesday 14 January 2004 11:11, Sezai YILMAZ wrote:
    > I use PostgreSQL 7.4 to store a huge amount of data, for example 7
    > million rows. But when I run the query "select count(*) from table;", it
    > returns after about 120 seconds. Is this normal for such a huge table?
    > Are there any methods to speed up the query time? The huge table has an
    > integer primary key and some other indexes on other columns.

    PG uses MVCC to manage concurrency. A downside of this is that to verify the
    exact number of rows in a table you have to visit them all.

    There's plenty on this in the archives, and probably the FAQ too.

    What are you using the count() for?

    --
    Richard Huxton
    Archonet Ltd





    • Sezai YILMAZ

      #3
      Re: Huge Data

      Richard Huxton wrote:
      >On Wednesday 14 January 2004 11:11, Sezai YILMAZ wrote:
      >>I use PostgreSQL 7.4 to store a huge amount of data, for example 7
      >>million rows. But when I run the query "select count(*) from table;", it
      >>returns after about 120 seconds. Is this normal for such a huge table?
      >
      >PG uses MVCC to manage concurrency. A downside of this is that to verify the
      >exact number of rows in a table you have to visit them all.
      >
      >There's plenty on this in the archives, and probably the FAQ too.
      >
      >What are you using the count() for?
      I use count() for some statistics, just to show how many records have
      been collected so far.

      -sezai



      • Sezai YILMAZ

        #4
        Re: Huge Data

        Richard Huxton wrote:
        >PG uses MVCC to manage concurrency. A downside of this is that to verify the
        >exact number of rows in a table you have to visit them all.
        >
        >There's plenty on this in the archives, and probably the FAQ too.
        >
        >What are you using the count() for?

        select logid, agentid, logbody from log where logid=3000000;

        This query also returns after about 120 seconds. The log table has about
        7 million records, and logid is its primary key. What about that? Why is
        it so slow?

        -sezai




        • Shridhar Daithankar

          #5
          Re: Huge Data

          On Wednesday 14 January 2004 17:57, Sezai YILMAZ wrote:
          > Richard Huxton wrote:
          > >What are you using the count() for?
          >
          > I use count() for some statistics, just to show how many records have
          > been collected so far.

          Rather than doing count(*), you should either cache the count in
          application memory, or analyze often and use the following:

          select reltuples from pg_class where relname = 'foo';

          This will give you an approximate count. I believe it should suffice
          for your needs.

          HTH

          Shridhar




          • Matthew Lunnon

            #6
            Re: Huge Data

            Have you run 'vacuum analyze log;'? Also I believe that in Oracle count(1) used to be quicker than count(*).
            Matthew
            ----- Original Message -----
            From: Sezai YILMAZ
            To: Richard Huxton
            Cc: pgsql-general@postgresql.org
            Sent: Wednesday, January 14, 2004 12:39 PM
            Subject: Re: [GENERAL] Huge Data


            Richard Huxton wrote:
            >PG uses MVCC to manage concurrency. A downside of this is that to verify the
            >exact number of rows in a table you have to visit them all.
            >
            >There's plenty on this in the archives, and probably the FAQ too.
            >
            >What are you using the count() for?

            select logid, agentid, logbody from log where logid=3000000;

            This query also returns after about 120 seconds. The log table has
            about 7 million records, and logid is its primary key. What about
            that? Why is it so slow?

            -sezai





            • Shridhar Daithankar

              #7
              Re: Huge Data

              On Wednesday 14 January 2004 18:22, Matthew Lunnon wrote:
              > select logid, agentid, logbody from log where logid=3000000;
              >
              > This query also returns after about 120 seconds. The log table has
              > about 7 million records, and logid is its primary key. Why is it
              > so slow?

              How about

              select logid, agentid, logbody from log where logid='3000000';

              or

              select logid, agentid, logbody from log where logid=3000000::int4;

              Basically you need to typecast the constant. Then it will use the
              index.

              I am not sure about the first form, though. I recommend you use the
              latter form.

              Shridhar




              • Richard Huxton

                #8
                Re: Huge Data

                On Wednesday 14 January 2004 12:39, Sezai YILMAZ wrote:
                >
                > select logid, agentid, logbody from log where logid=3000000;

                At a guess, because logid is bigint, whereas 3000000 is taken to
                be integer. Try ... where logid = 3000000::bigint;

                This is in the FAQ too I think, and is certainly in the archives.

                Other things you might come across:
                SELECT max() involves a sequential scan just like count(), you can rewrite it
                as SELECT target_column FROM my_table ORDER BY target_column DESC LIMIT 1

                The config values are very conservative. You will definitely want to tune them
                for performance. See the articles here for a good introduction:


                The VACUUM command is used to reclaim unused space, and the ANALYZE command to
                regenerate statistics. It's worth reading up on both.

                You can use EXPLAIN ANALYSE <query here> to see the plan that PG uses. I think
                there's a discussion of it at http://techdocs.postgresql.org/
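                For example, comparing the two plans shows the effect of the cast
                (a sketch only; the actual plan output depends on your setup):

                ```sql
                -- Without the cast, the integer constant does not match the
                -- bigint index on 7.4, so the planner falls back to a
                -- sequential scan over all 7 million rows:
                EXPLAIN ANALYZE SELECT logid, agentid, logbody
                    FROM log WHERE logid = 3000000;

                -- With the constant cast to the column's type, the primary key
                -- index can be used:
                EXPLAIN ANALYZE SELECT logid, agentid, logbody
                    FROM log WHERE logid = 3000000::bigint;
                ```

                The first should report a Seq Scan in the plan and the second an
                Index Scan, with runtimes to match.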

                --
                Richard Huxton
                Archonet Ltd



                • Richard Huxton

                  #9
                  Re: Huge Data

                  On Wednesday 14 January 2004 12:27, Sezai YILMAZ wrote:
                  > Richard Huxton wrote:
                  > >There's plenty on this in the archives, and probably the FAQ
                  > >too.
                  > >
                  > >What are you using the count() for?
                  >
                  > I use count() for some statistics, just to show how many
                  > records have been collected so far.

                  If you want an accurate number without scanning the table, you'll need to use
                  a trigger to keep a count up to date.
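                  A minimal sketch of that approach (the side-table and function
                  names here are made up, using the quoted-body syntax that 7.4
                  expects):

                  ```sql
                  -- One-row side table holding the running count:
                  CREATE TABLE log_rowcount (n bigint NOT NULL);
                  INSERT INTO log_rowcount SELECT count(*) FROM log;

                  -- Trigger function: bump the counter on insert/delete.
                  CREATE FUNCTION log_count_trig() RETURNS trigger AS '
                  BEGIN
                      IF TG_OP = ''INSERT'' THEN
                          UPDATE log_rowcount SET n = n + 1;
                      ELSE
                          UPDATE log_rowcount SET n = n - 1;
                      END IF;
                      RETURN NULL;
                  END;
                  ' LANGUAGE plpgsql;

                  CREATE TRIGGER log_count AFTER INSERT OR DELETE ON log
                      FOR EACH ROW EXECUTE PROCEDURE log_count_trig();

                  -- "SELECT n FROM log_rowcount" is then instant, and exact.
                  ```

                  Note the single counter row serializes concurrent writers on
                  the log table; that trade-off is the price of an exact count.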

                  --
                  Richard Huxton
                  Archonet Ltd



                  • Sezai YILMAZ

                    #10
                    Re: Huge Data

                    Shridhar Daithankar wrote:
                    >Rather than doing count(*), you should either cache the count
                    >in application memory, or analyze often and use the following:
                    >
                    >select reltuples from pg_class where relname = 'foo';
                    Thank you very much, Shridhar. This one responds immediately.
                    I think I will use this method to get the row count. But my
                    complaint is that it breaks the SQL standard; the code will
                    become unportable.

                    -sezai





                    • Franco Bruno Borghesi

                      #11
                      Re: Huge Data

                      If the mentioned solution fits your needs, you could create
                      a stored procedure that wraps it. The PostgreSQL
                      implementation could select from pg_class, while the same
                      function in another database could execute the select
                      count() on the table.
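                      A sketch of the PostgreSQL side of such a wrapper (the
                      function name is hypothetical):

                      ```sql
                      -- Portable wrapper: on PostgreSQL, read the planner's
                      -- estimate from pg_class; on another database, a function
                      -- of the same name would run the real SELECT count(*).
                      CREATE FUNCTION approx_count(text) RETURNS bigint AS '
                          SELECT reltuples::bigint FROM pg_class
                              WHERE relname = $1;
                      ' LANGUAGE sql;

                      -- Application code then stays identical everywhere:
                      -- SELECT approx_count('log');
                      ```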

                      On Wed, 2004-01-14 at 10:25, Sezai YILMAZ wrote:
                      > Shridhar Daithankar wrote:
                      > >Rather than doing count(*), you should either cache the
                      > >count in application memory, or analyze often and use the
                      > >following:
                      > >
                      > >select reltuples from pg_class where relname = 'foo';
                      >
                      > Thank you very much, Shridhar. This one responds
                      > immediately. I think I will use this method to get the row
                      > count. But my complaint is that it breaks the SQL
                      > standard; the code will become unportable.
                      >
                      > -sezai



                      • Sezai YILMAZ

                        #12
                        Re: Huge Data

                        Shridhar Daithankar wrote:
                        >On Wednesday 14 January 2004 18:22, Matthew Lunnon wrote:
                        >
                        >> select logid, agentid, logbody from log where logid=3000000;
                        >>
                        >> This query also returns after about 120 seconds. The
                        >> log table has about 7 million records, and logid is its
                        >> primary key. Why is it so slow?
                        >
                        >How about
                        >
                        >select logid, agentid, logbody from log where logid='3000000';
                        Oh my god. It is unbelievable. The result is great. Thanks
                        to everyone who helped me.

                        -sezai



                        • Shridhar Daithankar

                          #13
                          Re: Huge Data

                          On Wednesday 14 January 2004 18:55, Sezai YILMAZ wrote:
                          > Shridhar Daithankar wrote:
                          > >Rather than doing count(*), you should either cache
                          > >the count in application memory, or analyze often and
                          > >use the following:
                          > >
                          > >select reltuples from pg_class where relname = 'foo';
                          >
                          > Thank you very much, Shridhar. This one responds
                          > immediately. I think I will use this method to get the
                          > row count. But my complaint is that it breaks the SQL
                          > standard; the code will become unportable.

                          Well, you could document it somewhere for your
                          reference. It is not that hard... :-)

                          And remember, the value you get is just an estimate. You
                          need to analyze the table often, in line with its
                          update/insert/delete activity, to keep the estimate
                          reasonably accurate. Vacuuming will also update the
                          estimate.

                          Shridhar



