Database Query Contains Old Data

  • giraffeboy@gmail.com

    Database Query Contains Old Data

    Hi there,

    I'm having a problem with the Python db api, using MySQL.

    I've written a program with a GUI using wxPython, the GUI is contained
    in main.py which imports another module - reports.py. There are
    several reports that are selected using the gui, and each report is a
    class in the file reports.py. These classes contain a method which is
    passed data as arguments and produces a report, writing HTML to a file
    and using matplotlib to create a graph file.

    The report class methods are called from the GUI using the following
    code:
    agentlist = self.getselected()
    detail = self.cb1.GetValue()
    if self.getdates() != False:
        fromdate, todate = self.getdates()
        app.mainframe.SetStatusText("Generating Report...")
        if self.reportchoice.GetSelection() == 0:
            thereport = reports.VehicleRunningCost()
            thereport.producereport(agentlist, fromdate=fromdate,
                                    todate=todate, detail=detail)
        app.mainframe.SetStatusText("Report Complete")
        viewerframe = ViewerFrame(None, -1, "Mileage and Fuel Report Viewer")
        viewerframe.Show(True)

    The first time you run a report, everything works as expected but if
    you run it a second time, after modifying data, it seems that the data
    from before the modification is selected on the second report run.

    It would be much appreciated if anyone could give me any hints as to
    why the old data is selected. I haven't posted the full code because
    it's quite long but will post more if anyone wants to see specific
    parts.

    Thanks!
    Andrew
  • Jerry Hill

    #2
    Re: Database Query Contains Old Data

    On Wed, May 21, 2008 at 6:30 AM, <giraffeboy@gmail.com> wrote:
    > The first time you run a report, everything works as expected but if
    > you run it a second time, after modifying data, it seems that the data
    > from before the modification is selected on the second report run.
    Did you remember to commit your changes before re-running the report?
    Python's DB API requires that any auto-commit feature of the
    underlying database be turned off by default, so you are required to
    commit changes yourself. If you're used to having auto-commit turned
    on, this can be confusing.

    See http://www.python.org/dev/peps/pep-0249/ for more details of
    python's DB API.
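
    A minimal runnable sketch of the point above, using the stdlib sqlite3
    module to stand in for MySQLdb (both follow the DB-API 2.0 pattern;
    the table and column names are made up for illustration):

```python
import sqlite3  # stands in for MySQLdb here; both follow the DB-API 2.0 pattern

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE mileage (km INTEGER)")
cur.execute("INSERT INTO mileage (km) VALUES (100)")
conn.commit()  # the DB-API disables auto-commit, so this step is required
cur.execute("SELECT km FROM mileage")
row = cur.fetchone()
```

    Without the commit(), other connections may never observe the inserted row.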

    --
    Jerry


    • Paul Boddie

      #3
      Re: Database Query Contains Old Data

      On 21 May, 15:22, giraffe...@gmail.com wrote:
      > I did and I confirmed this by modifying the data, selecting it from
      > the mysql command line client to verify the changes, then running the
      > report again. If I exit the application and then start it again,
      > everything works as expected until the second instance of the report
      > is run.
      Note that if you have a connection open in your program, especially if
      that connection has already been used to select data, it may be the
      case that you then have to perform a rollback or commit before
      attempting to access newly added data. The reason for this behaviour
      is that the DB-API modules will have begun a transaction on your
      behalf, and while that transaction is open, changes committed in other
      transactions may be unavailable to your own transaction, depending on
      the transaction isolation level.

      MySQL appears to use "repeatable read" by default [1] as its
      transaction isolation level, whereas PostgreSQL (for example) uses
      "read committed" by default [2]. I would guess that if you were using
      PostgreSQL, this particular problem would not have occurred, but there
      are other reasons to be aware of the effects of long duration
      transactions in PostgreSQL, and the practice of periodically
      performing a rollback would still be worth considering with that
      database system.
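
      The rollback-before-requery pattern described above can be sketched
      like this (sqlite3 stands in for MySQLdb; the helper and table names
      are illustrative, not from the original program):

```python
import sqlite3  # any DB-API 2.0 module behaves the same way; names illustrative

def run_report_query(conn, sql):
    # Discard whatever transaction the DB-API module opened on our behalf,
    # so the next SELECT starts a fresh one and sees newly committed data.
    conn.rollback()
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE odometer (km INTEGER)")
conn.execute("INSERT INTO odometer (km) VALUES (42)")
conn.commit()
rows = run_report_query(conn, "SELECT km FROM odometer")
```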

      Paul

      [1] http://dev.mysql.com/doc/refman/5.1/...isolation.html
      [2] http://www.postgresql.org/docs/8.1/i...ction-iso.html


      • giraffeboy@gmail.com

        #4
        Re: Database Query Contains Old Data

        On May 21, 3:23 pm, Paul Boddie <p...@boddie.org.uk> wrote:
        > Note that if you have a connection open in your program, especially if
        > that connection has already been used to select data, it may be the
        > case that you then have to perform a rollback or commit before
        > attempting to access newly added data. The reason for this behaviour
        > is that the DB-API modules will have begun a transaction on your
        > behalf, and while that transaction is open, changes committed in other
        > transactions may be unavailable to your own transaction, depending on
        > the transaction isolation level.
        Thanks for that, Paul; it seems to have solved the problem perfectly. I
        had always assumed that querying a database would give you the most
        current data. I guess it just goes to show that things are never as
        simple as they first appear!


        • John Nagle

          #5
          Re: Database Query Contains Old Data

          Paul Boddie wrote:
          > On 21 May, 15:22, giraffe...@gmail.com wrote:
          >> I did and I confirmed this by modifying the data, selecting it from
          >> the mysql command line client to verify the changes, then running the
          >> report again. If I exit the application and then start it again,
          >> everything works as expected until the second instance of the report
          >> is run.
          >
          > Note that if you have a connection open in your program, especially if
          > that connection has already been used to select data, it may be the
          > case that you then have to perform a rollback or commit before
          > attempting to access newly added data.
          Exactly. Although it seems counterintuitive, it's not enough
          to do a COMMIT after UPDATE and INSERT operations. You also have to
          do a COMMIT after a SELECT if you're going to reuse the database handle
          and do another SELECT. Otherwise, you reread the same data forever.

          COMMIT, by the way, is per database handle, so if you have
          multiple database handles, each needs to handle its own COMMIT needs.
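
          The per-handle point can be demonstrated with two stdlib sqlite3
          connections to the same file (sqlite3 stands in for MySQLdb; the
          file and table names are made up):

```python
import os, sqlite3, tempfile

# Two handles to the same database file: each manages its own transactions.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path)
b = sqlite3.connect(path)
a.execute("CREATE TABLE t (x INTEGER)")
a.execute("INSERT INTO t (x) VALUES (1)")
a.commit()  # until handle a commits, handle b cannot see the new row
visible = b.execute("SELECT x FROM t").fetchone()
```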

          John Nagle


          • Paul Boddie

            #6
            Re: Database Query Contains Old Data

            On 23 May, 17:18, John Nagle <na...@animats.com> wrote:
            > Exactly. Although it seems counterintuitive, it's not enough
            > to do a COMMIT after UPDATE and INSERT operations. You also have to
            > do a COMMIT after a SELECT if you're going to reuse the database handle
            > and do another SELECT. Otherwise, you reread the same data forever.
            You can also do a rollback, as I noted, since the aim is merely to
            obtain a new transaction by discarding the current one. Upon
            performing a new select using the DB-API such a new transaction will
            then be obtained, since the rollback or commit will have ensured that
            no transaction is currently open.

            If the DB-API exposed the MySQL/PostgreSQL semantics with explicit
            transactions (see [1] for pertinent material), then the technique
            discussed above would either be superfluous (you might not be using
            transactions at all) or more obvious (you would have issued an
            explicit "start transaction" or "begin" command), but there are
            obviously good arguments for exposing the standard semantics through
            the API instead.
            > COMMIT, by the way, is per database handle, so if you have
            > multiple database handles, each needs to handle its own COMMIT needs.
            That's worth remembering, yes.

            Paul

            [1] http://www.postgresql.org/docs/8.1/s...ansaction.html


            • James A. Donald

              #7
              Re: Database Query Contains Old Data

              On Wed, 21 May 2008 07:23:04 -0700 (PDT), Paul Boddie wrote:
              > MySQL appears to use "repeatable read" by default [1] as its
              > transaction isolation level, whereas PostgreSQL (for example) uses
              > "read committed" by default [2]. I would guess that if you were using
              > PostgreSQL, this particular problem would not have occurred, but there
              > are other reasons to be aware of the effects of long duration
              > transactions in PostgreSQL, and the practice of periodically
              > performing a rollback would still be worth considering with that
              > database system.
              If one has transactions open for a long time, or transactions that
              involve a great deal of data, this will result in poor performance or
              poor scalability. But one may have such large transactions without
              being aware of it. Is there any way to make transaction size salient
              to the developer? Any way to make sure one is committing as early and
              often as possible?

              --
              ----------------------
              We have the right to defend ourselves and our property, because
              of the kind of animals that we are. True law derives from this
              right, not from the arbitrary power of the omnipotent state.

              http://www.jim.com/ James A. Donald


              • Paul Boddie

                #8
                Re: Database Query Contains Old Data

                On 3 Jun, 00:17, James A. Donald <jam...@echeque.com> wrote:
                > On Wed, 21 May 2008 07:23:04 -0700 (PDT), Paul Boddie wrote:
                >> MySQL appears to use "repeatable read" by default [1] as its
                >> transaction isolation level, whereas PostgreSQL (for example) uses
                >> "read committed" by default [2]. I would guess that if you were using
                >> PostgreSQL, this particular problem would not have occurred, but there
                >> are other reasons to be aware of the effects of long duration
                >> transactions in PostgreSQL, and the practice of periodically
                >> performing a rollback would still be worth considering with that
                >> database system.
                >
                > If one has transactions open for a long time, or transactions that
                > involve a great deal of data, this will result in poor performance or
                > poor scalability.
                I think you need to explain this to me. If there's a long-running
                transaction happening in the background and my own transactions get
                created and rolled back periodically, how would the long-running
                transaction be affected? If, on the other hand, my own transaction is
                long-running, I can see that rolling it back would incur a cost, but
                what choice do I have other than to perform a rollback more often (or
                to disable transactions, which might incur other costs)? I don't want
                to perform a commit instead merely for performance reasons, especially
                if it impacts correctness.

                I was actually thinking of lock acquisition in PostgreSQL when I made
                the remark. With lots of tables in a database, it's possible to
                acquire a large number of locks, and retaining locks can also prevent
                other operations from being carried out.
                > But one may have such large transactions without
                > being aware of it. Is there any way to make transaction size salient
                > to the developer? Any way to make sure one is committing as early and
                > often as possible?
                I'm not aware of anything which will tell you how big your transaction
                is, but there may be some kind of table or function which provides
                some details about such things. However, it is possible to see how
                many locks your transaction has, and on which tables.

                Paul


                • M.-A. Lemburg

                  #9
                  Re: Database Query Contains Old Data

                  On 2008-06-03 00:17, James A. Donald wrote:
                  > On Wed, 21 May 2008 07:23:04 -0700 (PDT), Paul Boddie wrote:
                  >> MySQL appears to use "repeatable read" by default [1] as its
                  >> transaction isolation level, whereas PostgreSQL (for example) uses
                  >> "read committed" by default [2]. I would guess that if you were using
                  >> PostgreSQL, this particular problem would not have occurred, but there
                  >> are other reasons to be aware of the effects of long duration
                  >> transactions in PostgreSQL, and the practice of periodically
                  >> performing a rollback would still be worth considering with that
                  >> database system.
                  >
                  > If one has transactions open for a long time, or transactions that
                  > involve a great deal of data, this will result in poor performance or
                  > poor scalability.
                  Poor performance is usually not an issue since databases are
                  optimized to work with transactions.

                  What's more important is that an open transaction will cause locks
                  on the tables you are writing to. Depending on the database
                  backend these locks may lock the entire table or just a few rows.

                  In any case, such locks prevent accessing the tables or rows
                  in question from other connections and that will quickly turn
                  into a major problem if you have more than just one connection
                  to the database.
                  > But one may have such large transactions without
                  > being aware of it. Is there any way to make transaction size salient
                  > to the developer? Any way to make sure one is committing as early and
                  > often as possible?
                  This depends on the database in question. By accessing system tables
                  directly you can usually find out a lot about the database and your
                  current transaction.

                  That said, it's not a good idea to commit a logical transaction (i.e.
                  a user entering data) in multiple chunks. You'd lose the most important
                  feature of transactions: that of being able to rollback to the start
                  of the transaction.

                  As others have mentioned, in systems that have long running logical
                  transactions, it's usually best to collect the data until the very
                  end and then apply all changes in one go (and one database
                  transaction).

                  Another problem with long running transactions is that the data
                  in the tables may change after the start of the transaction. This
                  can result in invalid data being committed (e.g. one part of a calculation
                  uses the data at time t1 and another at time t2). This can be
                  avoided by using snapshots, versioning and timestamps in the tables,
                  so that all queries use the same data.

                  And if this were not enough, you often run into conflicts during
                  the commit phase due to changes made by others to the same tables.

                  These can usually only be avoided by implementing merge strategies
                  in your application, unless you want to lock out all other users
                  during the transaction ... which would bring you back to the
                  original problem.

                  In summary: long running transactions are not easy to get right :-)
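
                  The "collect, then apply in one go" approach can be sketched
                  like this (sqlite3 as a stand-in DB-API module; the table and
                  the buffered edits are invented for illustration):

```python
import sqlite3  # sketch: buffer the user's edits, then one short transaction

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fuel_log (litres REAL)")

# The long-running "logical" transaction: edits held in application memory.
pending = [(40.5,), (38.2,)]

# Apply everything in one short database transaction at the very end.
conn.executemany("INSERT INTO fuel_log (litres) VALUES (?)", pending)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM fuel_log").fetchone()[0]
```

                  The database transaction is then only open for the final
                  executemany/commit, keeping lock duration minimal.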

                  --
                  Marc-Andre Lemburg
                  eGenix.com

                  Professional Python Services directly from the Source (#1, Jun 03 2008)
                  >> Python/Zope Consulting and Support ... http://www.egenix.com/
                  >> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
                  >> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
                  ________________________________________________________________________
                  2008-07-07: EuroPython 2008, Vilnius, Lithuania 33 days to go

                  :::: Try mxODBC.Zope.DA for Windows, Linux, Solaris, MacOSX for free! ::::


                  eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
                  D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
                  Registered at Amtsgericht Duesseldorf: HRB 46611


                  • James A. Donald

                    #10
                    Re: Database Query Contains Old Data

                    On Tue, 03 Jun 2008 12:07:07 +0200, "M.-A. Lemburg" <mal@egenix.com>
                    wrote:
                    > As others have mentioned, in systems that have long running logical
                    > transactions, it's usually best to collect the data until the very
                    > end and then apply all changes in one go (and one database
                    > transaction).
                    I understand you to mean that one should arrange matters so that what
                    is a lengthy transaction from the point of view of the user is a short
                    transaction from the point of view of the database.



                    • M.-A. Lemburg

                      #11
                      Re: Database Query Contains Old Data

                      On 2008-06-03 14:29, James A. Donald wrote:
                      > On Tue, 03 Jun 2008 12:07:07 +0200, "M.-A. Lemburg" <mal@egenix.com>
                      > wrote:
                      >> As others have mentioned, in systems that have long running logical
                      >> transactions, it's usually best to collect the data until the very
                      >> end and then apply all changes in one go (and one database
                      >> transaction).
                      >
                      > I understand you to mean that one should arrange matters so that what
                      > is a lengthy transaction from the point of view of the user is a short
                      > transaction from the point of view of the database.
                      Yes, mainly to avoid database locks.

                      In a multi-user environment, you will need to add some merge logic
                      in your application, to prevent conflict errors in the database.

                      --
                      Marc-Andre Lemburg
                      eGenix.com

