Buglist

  • Bruno Wolff III

    #31
    Re: Buglist

    On Wed, Aug 20, 2003 at 10:31:25 -0400,
    Vivek Khera <khera@kcilink.com> wrote:
    >
    > I care for the performance. And how are you so sure that it was
    > faster the way it is now? Are you sure it was not done this way
    > because of ease of implementation?
    >
    > Seriously, how much slower can it be if the backend were to do the
    > checking for external references upon updating/deleting a row? The
    > cost would be distributed across time as opposed to concentrated at
    > once within a vacuum process. I am fairly certain it would reduce
    > disk bandwidth requirements since at least one necessary page will
    > already be in memory.

    It would probably be a lot slower. Any transaction that has started
    but not yet finished would need to lock all rows that exist during
    the transaction (for serialized transaction isolation you would only
    need to worry about rows that existed at the start of the transaction
    or that were modified by the transaction). Immediately that is a big
    deal since a typical query may need to lock a bunch of rows that it
    will never actually touch (but you don't know that when the transaction
    starts). Managing all those locks would take up a lot of semiconductor
    memory or a lot of disk writes and be a major source of contention.
    The current system just has to mark rows when they are created and
    deleted (an update does both operations). The normal vacuum clean up
    actually isn't going to be much worse than what you would need to do
    at both the start and end of each transaction. The overhead of letting
    dead rows hang around for a while after they aren't needed isn't that high.
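
    To make the mechanism concrete, here is a minimal psql sketch (table and
    values are made up) of the row versioning described above; xmin and xmax
    are PostgreSQL's hidden system columns holding the creating and deleting
    transaction ids, and ctid is the physical location of a row version:

    CREATE TABLE demo (id integer PRIMARY KEY, val text);
    INSERT INTO demo VALUES (1, 'first');
    SELECT ctid, xmin, xmax, * FROM demo;   -- xmin is the creating transaction id

    UPDATE demo SET val = 'second' WHERE id = 1;
    SELECT ctid, xmin, xmax, * FROM demo;   -- a new ctid: the update inserted a new
                                             -- version and marked the old one deleted

    VACUUM demo;                             -- the dead version is reclaimed later,
                                             -- in the background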

    Also, since at least 7.3, normal vacuums aren't normally going to affect the
    performance of your database server that much.

    The main issue against the current vacuum system is that it requires the
    DBA to know what vacuum does and to figure out how it should be used in
    their situation to get reasonable performance. This makes it a bit harder
    for non-DBAs to jump right into Postgres without running into problems.

    However, the work on autovacuum seems to be providing a reasonable solution
    to that problem.


    • scott.marlowe

      #32
      Re: Buglist

      On Wed, 20 Aug 2003, Vivek Khera wrote:
      > >>>>> "BW" == Bruno Wolff, <Bruno> writes:
      >
      > >> to see it incremental. This would result in pretty much near zero
      > >> internal fragmentation, I think.
      >
      > BW> Why do you care about the details of the implementation (rather than
      > BW> the performance)? If it were faster to do it that way, that's how it would
      > BW> have been done in the first place. The cost of doing the above is almost
      > BW> certainly going to be an overall performance loser.
      >
      > I care for the performance. And how are you so sure that it was
      > faster the way it is now? Are you sure it was not done this way
      > because of ease of implementation?
      >
      > Seriously, how much slower can it be if the backend were to do the
      > checking for external references upon updating/deleting a row? The
      > cost would be distributed across time as opposed to concentrated at
      > once within a vacuum process. I am fairly certain it would reduce
      > disk bandwidth requirements since at least one necessary page will
      > already be in memory.

      Time for a mental exercise.

      Our server has 2 users. Each backend has to check with all the other
      backends when it deletes a tuple (every update is also a delete; remember,
      every change in an MVCC database is a create / delete cycle). Let's name
      the time it takes to do the update / mark deleted versus the time it takes
      to contact each of those other backends: Tw is the time to do the work
      here, and Tc is the time to do the cleanup (i.e. vacuum the tuple). Note
      that we'd also need a Ta for answering the requests of all the other
      backends, but we can assume that on average, for each request a child
      process makes, it will receive exactly that many from each other backend
      running. Let x represent the number of backends, so the answer time Ta is
      equal to x*Tc:

      Time = Tw + Tc + Ta

      Time = Tw + Tc + (x * Tc)

      Time = Tw + ((x+1) * Tc)

      and our cleanup time starts to grow at an ugly rate as the number of
      backends increases. Lazy vacuuming (the newer, non-full vacuum) instead
      lets the database reclaim that lost space in the background.
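
      To put hypothetical numbers on that: with 100 backends, every single
      delete would pay roughly 101 * Tc of coordination cost up front, whereas
      the vacuum approach defers about one Tc per dead tuple to a background
      pass.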

      Many folks mistake this vacuum process for its older, slower cousin, full
      vacuum, which does eat a lot more disk bandwidth and slows the machine
      down.
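
      For anyone who hasn't seen the distinction, a quick sketch (the table name
      is hypothetical) of the two commands being contrasted here:

      VACUUM VERBOSE mytable;       -- lazy vacuum: marks dead row versions as
                                    -- reusable space; runs alongside normal
                                    -- reads and writes
      VACUUM FULL VERBOSE mytable;  -- full vacuum: compacts the table, which
                                    -- takes an exclusive lock and much more
                                    -- disk I/O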

      On a Dual CPU X86 box a lazy vacuum running in a continuous loop will eat
      about 5% of one CPU and drop pgbench scores by 10 to 15%. The important
      thing here is that the machine will still run quite snappily when you
      throw several hundred clients at it, since the lazy vacuum just sits in
      the background using the spare cycles and not much more.

      That means your storage usage may balloon somewhat under intense usage, but
      you won't have an IPC storm kill the performance of the PostgreSQL server.

      Knowing the postgresql development team, I'm sure the reasons they chose
      this design are clearly stated in the hackers mailing list somewhere, so
      I'm gonna go look. But trust me on one thing: the folks programming this
      database rarely do something just because it's easier / faster to
      implement without putting an entry in the TODO list about making it
      better some day.



      • Stephan Szabo

        #33
        Re: Buglist

        On Wed, 20 Aug 2003, Vivek Khera wrote:
        > >>>>> "BW" == Bruno Wolff, <Bruno> writes:
        >
        > >> to see it incremental. This would result in pretty much near zero
        > >> internal fragmentation, I think.
        >
        > BW> Why do you care about the details of the implementation (rather than
        > BW> the performance)? If it were faster to do it that way, that's how it would
        > BW> have been done in the first place. The cost of doing the above is almost
        > BW> certainly going to be an overall performance loser.
        >
        > I care for the performance. And how are you so sure that it was
        > faster the way it is now? Are you sure it was not done this way
        > because of ease of implementation?
        >
        > Seriously, how much slower can it be if the backend were to do the
        > checking for external references upon updating/deleting a row? The
        > cost would be distributed across time as opposed to concentrated at
        > once within a vacuum process. I am fairly certain it would reduce
        > disk bandwidth requirements since at least one necessary page will
        > already be in memory.

        In general, excepting a case with only one session running, I would expect
        there to exist one or more transactions that can see the old state of the
        row.

        I would think it'd be something like:

        T1: begin;
        T1: update foo set bar=cheese where id=3;
        T2: begin;
        T2: select * from foo;
        -- this should see the old state of the row, so that update better
        -- not have removed the old one.
        T1: end;
        -- at this point read committed transactions may see the new
        -- state, unless it's currently running a query which is accessing
        -- that table, and in practice it's probably more complicated than
        -- that (since you might run functions that access the table).
        -- You'd presumably need to wait until say the end of that full
        -- statement at the very least. And any serializable transactions
        -- would probably mean you can't do it either.

        Under some circumstances it might be better, yes. And there are ways to
        do things that don't involve a non-overwriting storage manager (like
        Oracle's system), but they all have downsides. This might not be the best
        solution, but you're going to have to put a lot more work into showing
        that your solution is better. ;)




        • Jan Wieck

          #34
          Re: Buglist

          Alvaro Herrera wrote:
          > On Wed, Aug 20, 2003 at 10:31:25AM -0400, Vivek Khera wrote:
          >
          >> Seriously, how much slower can it be if the backend were to do the
          >> checking for external references upon updating/deleting a row? The
          >> cost would be distributed across time as opposed to concentrated at
          >> once within a vacuum process. I am fairly certain it would reduce
          >> disk bandwidth requirements since at least one necessary page will
          >> already be in memory.
          >
          > There's no way to check for "external references", because said
          > references are actually the running transactions. So you can't drop a
          > row until all the transactions that were active during your transaction
          > are finished. Certainly your own backend can't do the checking, because
          > there's no way to even assert that it will be live when those other
          > transactions finish. Who will? The last of said transactions? It
          > certainly will be expensive for a backend to keep track of the
          > deleted/updated tuples by all _other_ backends, just in case...
          > Long running transactions can't be neglected, so you can't keep it in
          > memory.

          This isn't so. It's regular backends that turn on the flag in a tuple's
          header saying it's committed deleted. And they do so during a normal
          scan, so even during a SELECT. It's not the backend that actually
          deleted a row that knows; it's a later one checking the tuple's
          visibility that can tell "this one's dead for all of us and will not
          come alive via rollback any more".

          But, that doesn't help for the index removal explained earlier.


          Jan

          --
          #==========================================================================#
          # It's easier to get forgiveness for being wrong than for being right.    #
          # Let's break this rule - forgive me.                                      #
          #===================================================== JanWieck@Yahoo.com #



          • Vivek Khera

            #35
            Re: Buglist

            >>>>> "BW" == Bruno Wolff, <Bruno> writes:

            BW> It would probably be a lot slower. Any transaction that has started
            BW> but not yet finished would need to lock all rows that exist during
            BW> the transaction (for serialized transaction isolation you would only

            Why would you need to lock rows? Does the current vacuum need it? I
            don't think it does. Why can't the functionality of vacuum be made to
            operate incrementally per row delete/update? I don't know if it is
            possible.

            BW> Also, since at least 7.3, normal vacuums aren't normally going to
            BW> affect the performance of your database server that much.

            I disagree. Triggering a vacuum on a db that is nearly saturating the
            disk bandwidth has a significant impact.

            BW> The main issue against the current vacuum system is that it requires the
            BW> DBA knowing what vacuum does and figuring out how it should be used in
            BW> their situation to get reasonable performance. This makes it a bit harder
            BW> for non-DBAs to jump right in to Postgres without running into problems.

            BW> However, the work on autovacuum seems to be providing a reasonable solution
            BW> to that problem.

            Yes, this is a good thing.


            • Bruno Wolff III

              #36
              Re: Buglist

              On Wed, Aug 20, 2003 at 12:40:03 -0400,
              Vivek Khera <khera@kcilink.com> wrote:
              > >>>>> "BW" == Bruno Wolff, <Bruno> writes:
              >
              > BW> It would probably be a lot slower. Any transaction that has started
              > BW> but not yet finished would need to lock all rows that exist during
              > BW> the transaction (for serialized transaction isolation you would only
              >
              > Why would you need to lock rows? Does the current vacuum need it? I
              > don't think it does. Why can't the functionality of vacuum be made to
              > operate incrementally per row delete/update? I don't know if it is
              > possible.

              How do you plan on knowing when no one is using a row any more?
              The approach vacuum uses is not suitable for deleting the row after the
              last transaction that can see the row completes. When rows are created
              they are marked with the transaction id of the transaction that created
              the row. When they are deleted they are marked with the transaction id
              of the transaction that deleted the row. Any transaction with an id
              between those two ids can see the row. So it isn't the transaction that
              deletes a row that needs to worry about marking its space as available,
              but the last transaction that the row is visible to that would have to
              do it. Naively, this entails scanning the whole database, just like a vacuum
              would, after each completed transaction.
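
              A rough two-session sketch of that visibility window (table and
              values are invented); as long as an older snapshot can still see
              the deleted row version, its space cannot be reclaimed:

              -- Session B: open a snapshot that can still see the row
              BEGIN;
              SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
              SELECT * FROM foo WHERE id = 3;

              -- Session A: delete the row and commit
              BEGIN;
              DELETE FROM foo WHERE id = 3;  -- the row version is stamped with A's id
              COMMIT;

              -- Session C: a vacuum at this point must keep the dead version,
              -- because B's snapshot can still see it
              VACUUM VERBOSE foo;

              -- Session B: only after this commit does the dead version become
              -- reclaimable by a later vacuum
              COMMIT;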

              > BW> Also, since at least 7.3, normal vacuums aren't normally going to
              > BW> affect the performance of your database server that much.
              >
              > I disagree. Triggering a vacuum on a db that is nearly saturating the
              > disk bandwidth has a significant impact.

              If you are running that close to the edge you have potential problems
              of which running vacuum is just one.


              • Vivek Khera

                #37
                Re: Buglist

                >>>>> "JW" == Jan Wieck <JanWieck@Yahoo .com> writes:

                JW> remove all the index entries pointing to these ctid's. Your idea is (so
                JW> far) lacking a place where to remember all the single removed rows and I
                JW> assume you're not planning to pay the cost of a full scan over all
                JW> indexes of a table to reclaim the space of one data row, are you?

                Well, that pretty much kills my idea... back to autovacuum ;-)



                • Dennis Gearon

                  #38
                  Re: Buglist

                  I don't think the problem will be deleting the tuple itself, if I've been following this correctly. If every transaction left its signature in some common, global array, where each transaction finishing its work could find out 'who's out there', killing unneeded transactions could be part of the exit routine.

                  But getting the indexes cleaned up, that's another story, from what I've been listening to.

                  Bruno Wolff III wrote:
                  > On Wed, Aug 20, 2003 at 12:40:03 -0400,
                  > Vivek Khera <khera@kcilink.com> wrote:
                  >
                  >>>>>>>"BW" == Bruno Wolff, <Bruno> writes:
                  >>
                  >>BW> It would probably be a lot slower. Any transaction that has started
                  >>BW> but not yet finished would need to lock all rows that exist during
                  >>BW> the transaction (for serialized transaction isolation you would only
                  >>
                  >>Why would you need to lock rows? Does the current vacuum need it? I
                  >>don't think it does. Why can't the functionality of vacuum be made to
                  >>operate incrementally per row delete/update? I don't know if it is
                  >>possible.
                  >
                  >
                  > How do you plan on knowing when no one is using a row any more?
                  > The approach vacuum uses is not suitable for deleting the row after the
                  > last transaction that can see the row completes. When rows are created
                  > they are marked with the transaction id of the transaction that created
                  > the row. When they are deleted they are marked with the transaction id
                  > of the transaction that deleted the row. Any transaction with an id
                  > between those two ids can see the row. So it isn't the transaction that
                  > deletes a row that needs to worry about marking its space as available,
                  > but the last transaction that the row is visible to that would have to
                  > do it. Naively, this entails scanning the whole database, just like a vacuum
                  > would, after each completed transaction.
                  >
                  >
                  >>BW> Also, since at least 7.3, normal vacuums aren't normally going to
                  >>BW> affect the performance of your database server that much.
                  >>
                  >>I disagree. Triggering a vacuum on a db that is nearly saturating the
                  >>disk bandwidth has a significant impact.
                  >
                  >
                  > If you are running that close to the edge you have potential problems
                  > of which running vacuum is just one.



                  • Jan Wieck

                    #39
                    Re: Buglist

                    Vivek Khera wrote:
                    >>>>>> "JW" == Jan Wieck <JanWieck@Yahoo.com> writes:
                    >
                    > JW> remove all the index entries pointing to these ctid's. Your idea is (so
                    > JW> far) lacking a place where to remember all the single removed rows and I
                    > JW> assume you're not planning to pay the cost of a full scan over all
                    > JW> indexes of a table to reclaim the space of one data row, are you?
                    >
                    > Well, that pretty much kills my idea... back to autovacuum ;-)

                    Sorry about that.

                    While at the "reclaim" and [non-] overwriting topic, the other day I had
                    some brilliant idea about index entries, so here comes your chance for
                    revenge ...

                    Assume we would have multiple ctid slots per index entry. An UPDATE on a
                    row that doesn't change indexed fields always inserts the same index
                    value with a new ctid, so it'll come across this index tuple anyway. If
                    said index tuple has an empty slot, just put the ctid there and you're
                    done. If it does not, check whether the ctids that are there point to
                    "known dead and ready to vacuum" tuples and make space. If that fails
                    too, well, then insert another index tuple.

                    This has some advantages. Updating rows without changing key fields does
                    not result in constant index growth. An index scan will be fast even for
                    a value with lots of dead heap tuples. For non-unique indexes, multiple
                    data rows can "share" one index entry.

                    Index cleanup for vacuum remains pretty easy. Clear the ctid slots
                    pointing to removed tuples. Remove index tuples that have all slots cleared.



                    Jan

                    --
                    #==========================================================================#
                    # It's easier to get forgiveness for being wrong than for being right.    #
                    # Let's break this rule - forgive me.                                      #
                    #===================================================== JanWieck@Yahoo.com #



                    • Tom Lane

                      #40
                      Re: Buglist

                      Vivek Khera <khera@kcilink.com> writes:
                      > "JW" == Jan Wieck <JanWieck@Yahoo.com> writes:
                      > JW> remove all the index entries pointing to these ctid's. Your idea is (so
                      > JW> far) lacking a place where to remember all the single removed rows and I
                      > JW> assume you're not planning to pay the cost of a full scan over all
                      > JW> indexes of a table to reclaim the space of one data row, are you?

                      > Well, that pretty much kills my idea... back to autovacuum ;-)

                      In addition to the index-cleanup issue that Jan explained, there are
                      locking problems. The tuple-is-dead hint bit mechanism is very
                      carefully designed so that a backend can set the hint bits while holding
                      only a shared lock on the page containing the tuple. Physically
                      removing a tuple requires a far stronger lock (see the source code for
                      details). Thus, having ordinary interactive backends remove tuples
                      would have bad consequences for concurrent performance.

                      But I think the real point here is that there's no reason to think that
                      doing tuple deletion on-the-fly in foreground transactions is superior
                      to doing it in background with a vacuum process. You're taking what
                      should be noncritical maintenance work and moving it into the critical
                      paths of your foreground applications. Not only that, but you're
                      probably doing more total work per tuple --- VACUUM "batches" its work
                      in more ways than just the index cleanup aspect, IIRC.

                      regards, tom lane


                      • Andrew Sullivan

                        #41
                        Re: Buglist

                        On Wed, Aug 20, 2003 at 12:40:03PM -0400, Vivek Khera wrote:
                        > >>>>> "BW" == Bruno Wolff, <Bruno> writes:
                        > BW> Also, since at least 7.3, normal vacuums aren't normally going to
                        > BW> affect the performance of your database server that much.
                        >
                        > I disagree. Triggering a vacuum on a db that is nearly saturating the
                        > disk bandwidth has a significant impact.

                        Vivek is right about this. If your system is already very busy, then
                        a vacuum on a largish table is painful.

                        I don't actually think having the process done in real time will
                        help, though -- it seems to me what would be more useful is an even
                        lazier vacuum: something that could be told "clean up as cycles are
                        available, but make sure you stay out of the way." Of course, that's
                        easy to say glibly, and mighty hard to do, I expect.

                        A

                        --
                        ----
                        Andrew Sullivan 204-4141 Yonge Street
                        Liberty RMS Toronto, Ontario Canada
                        <andrew@libertyrms.info> M2P 2A8
                        +1 416 646 3304 x110



                        • Jan Wieck

                          #42
                          Re: Buglist

                          Andrew Sullivan wrote:
                          > On Wed, Aug 20, 2003 at 12:40:03PM -0400, Vivek Khera wrote:
                          >> >>>>> "BW" == Bruno Wolff, <Bruno> writes:
                          >> BW> Also, since at least 7.3, normal vacuums aren't normally going to
                          >> BW> affect the performance of your database server that much.
                          >>
                          >> I disagree. Triggering a vacuum on a db that is nearly saturating the
                          >> disk bandwidth has a significant impact.
                          >
                          > Vivek is right about this. If your system is already very busy, then
                          > a vacuum on a largish table is painful.
                          >
                          > I don't actually think having the process done in real time will
                          > help, though -- it seems to me what would be more useful is an even
                          > lazier vacuum: something that could be told "clean up as cycles are
                          > available, but make sure you stay out of the way." Of course, that's
                          > easy to say glibly, and mighty hard to do, I expect.

                          What about a little hint to the buffer management that if it has to
                          evict another buffer to physically read this one (meaning the buffer
                          pool was full already) then it will not put this buffer at the top of
                          the LRU chain but rather at its end? This way a vacuum on a large table
                          will not cause a complete cache eviction.

                          Might be a useful hint for sequential scans too.


                          Jan

                          --
                          #==========================================================================#
                          # It's easier to get forgiveness for being wrong than for being right.    #
                          # Let's break this rule - forgive me.                                      #
                          #===================================================== JanWieck@Yahoo.com #



                          • Karsten Hilbert

                            #43
                            Re: Buglist

                            > Just nice'ing the VACUUM process is likely to be counterproductive
                            > because of locking issues (priority inversion).
                            OK, getting out the brown paper bag |-(

                            Is there any concept of db engine idleness obtainable from
                            states of PG internal variables that might be leveraged?

                            Karsten
                            --
                            GPG key ID E4071346 @ wwwkeys.pgp.net
                            E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


                            • Tom Lane

                              #44
                              Re: Buglist

                              Jan Wieck <JanWieck@Yahoo.com> writes:
                              > What about a little hint to the buffer management that if it has to
                              > evict another buffer to physically read this one (meaning the buffer
                              > pool was full already) then it will not put this buffer at the top of
                              > the LRU chain but rather at its end? This way a vacuum on a large table
                              > will not cause a complete cache eviction.

                              I think what we really need is a way to schedule VACUUM's I/O at a lower
                              priority than normal I/Os. Wouldn't be very portable :-( ... but if the
                              OS offers a facility for requesting this, it'd be worth experimenting
                              with.

                              regards, tom lane


                              • Alvaro Herrera

                                #45
                                Re: Buglist

                                On Wed, Aug 20, 2003 at 05:21:35PM -0400, Jan Wieck wrote:
                                > What about a little hint to the buffer management that if it has to
                                > evict another buffer to physically read this one (meaning the buffer
                                > pool was full already) then it will not put this buffer at the top of
                                > the LRU chain but rather at its end? This way a vacuum on a large table
                                > will not cause a complete cache eviction.
                                >
                                > Might be a useful hint for sequential scans too.

                                Somebody was playing with using LRU-2 or ARC or some other algorithm for
                                page replacement instead of the current LRU... wouldn't it help with
                                this too?

                                --
                                Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
                                "La vida es para el que se aventura"
