Re: Temporal Databases (Database design questions)

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • David

    Re: Temporal Databases (Database design questions)

    On Wed, Jun 18, 2008 at 11:16 AM, M.-A. Lemburg <mal@egenix.com wrote:
    On 2008-06-18 09:41, David wrote:
    >>
    >Question 3: Temporal databases
    >>
    >http://en.wikipedia.org/wiki/Temporal_database
    >>
    >I haven't used them before, but I like the idea of never
    >deleting/updating records so you have a complete history (a bit like
    >source code version control).
    >>
    >How well do temporal databases work? Do RDBMS (like Postgresql) need
    >add-ons to make it effective, or can you just add extra temporal
    >columns to all your tables and add them to your app queries? Does this
    >increase app complexity and increase server load a lot?
    >>
    >Are there Python libraries which simplify this? (eg: add-ons for
    >Elixir or SQLAlchemy).
    >>
    >Or should apps all implement their own 'temporal data access' module,
    >which transparently uses the current date & time until queried for
    >historical data?
    >
    You can have complete history in a database schema by:
    >
    * adding a version column
    * adding a modification timestamp (and modification username,
    if that's relevant for you)
    * updating the version upon INSERT and UPDATE
    * have a history table for each "live" table that gets
    filled using a trigger on the version column which moves
    the inserted/updated/deleted rows into the history table
    * the history table will have to have an additional column
    for storing the method of how the row got into the table
    (ie. insert/update/delete)
    >
    The main app will only use the "live" tables with the current
    data. An audit tool would then provide access to the history
    tables.
    >
    This works for all databases that have triggers, even SQLite.
    >
    Thanks for your reply.

    How do you maintain foreign key references with this approach?

    eg, you have these 4 tables:

    table1
    - id
    - field1
    - field2
    - field3
    - version
    - modified

    table1_history
    - id
    - field1
    - field2
    - field3
    - version
    - modified
    - updatemethod

    table2
    - id
    - table1_id
    - field1
    - field2
    - field3
    - version
    - modified

    table2_history
    - id
    - table1_id
    - field1
    - field2
    - field3
    - version
    - modified
    - updatemethod

    Should table2_history. table1_id point to table1.id, or table1_history. id?

    If table2_history. table1_id points to table1.id, then you will have
    problems with when you remove records from table1.

    If table2_history. table1_id points to table1_history. id, then you need
    to make a table1_history entry for the new values.

    Also, when you start updating records in table1, then table2 and/or
    table2_history will still be pointing to old table1 records, instead
    of the new value.

    What this probably means, is that whenever you make any changes to
    records, then:

    1) Make a history entry for the record (as you described)

    2) Also make new history entries for all records that depend on the
    record that was updated, and for their sub-dependencies too,
    recursively (even if those dependent records weren't themselves
    updated). The new history record foreign keys should always point to
    other historical records.

    There may be cases where you can skip adding redundant records to the
    history tables. But if your primary keys (in history tables) are
    auto-incrementing integers, then the foreign keys (in the dependant
    history tables) will all need to update in a 'cascading' way (so they
    all point to records which are correct for that point in time).

    This would probably also be a problem for regular temporal databases,
    unless they have some built-in 'snapshot all foreign dependencies'
    function.

    How is this normally handled?

    One method (stealing idea from git) would be for historical tables to
    use 'hash' values for primary and foreign keys. The 'hash' (for
    primary key) would be calculated from the other values in the record.
    Foreign keys for dependent then also become hash strings, pointing to
    the correct parent record.

    David.
Working...