How do I compare these 2 tables?

  • jeddiki
    Contributor
    • Jan 2009
    • 290

    Hi,

    I have two comparisons that I need to do, and they are a bit
    beyond my knowledge, so I would appreciate some help. :)

    I have two tables with an identical structure:
    one is the incoming daily update, tableA, which I want to
    compare to the history, tableB.

    Both structures look like this:
    id, title, desc, data1, data2, ... data18

    (1)
    The first comparison is to check for new rows. I think I can
    do this by using the LEFT JOIN command.

    Is this how I do it?

    Code:
    $query = "SELECT tableA.*, tableB.* FROM tableA LEFT JOIN tableB ON tableA.id = tableB.id"; 
    	 
    $result = mysql_query($query) or die(mysql_error());
    (2)
    My second comparison is to compare all the rows that have same ids
    but have differences in the other fields i.e. not NEW records but CHANGED records.
  • Atli
    Recognized Expert Expert
    • Nov 2006
    • 5062

    #2
    Hey.

    Would I be right in guessing that the goal of these comparisons is to insert the data from the "input" table into the "history" table, adding new rows as new rows but updating old rows with the new data? (New and old rows being defined by whether their IDs preexist in the history table.)

    If so, you don't have to do that manually. You can use the ON DUPLICATE KEY UPDATE clause of the INSERT statement to do this automatically.

    For example, say I have "input_table" and "storage_table" tables that both share the exact same structure:
    Code:
    (
        `id` Serial Primary Key, 
        `title` VarChar(255) Not Null Default 'Untitled', 
        `desc` VarChar(255) Not Null Default 'No description'
    )
    Given that they have the following data:
    Code:
    INSERT INTO `storage_table`
    (`id`, `title`, `desc`)
    VALUES
    (1, 'First in storage', 'This is the first row defined in the storage table'),
    (2, 'Second in storage', 'This is the second row defined in the storage table'),
    (3, 'Third in storage', 'This is the third row defined in the storage table');

    INSERT INTO `input_table`
    (`id`, `title`, `desc`)
    VALUES
    (1, 'First from input', 'This is the first row, updated fromt he input table.'),
    (3, 'Third from input', 'This is the third row, updated fromt he input table.'),
    (4, 'Fourth from input', 'This is the fourth row, new from the input table.');

    I could issue this command:
    Code:
    INSERT INTO `storage_table`
    (`id`, `title`, `desc`)
    SELECT
    `id`, `title`, `desc`
    FROM `input_table`
    ON DUPLICATE KEY UPDATE
    `id` = VALUES(`id`),
    `title` = VALUES(`title`),
    `desc` = VALUES(`desc`);

    After which the data in the "storage_table" would become:
    Code:
    +----+-------------------+------------------------------------------------------+
    | id | title             | desc                                                 |
    +----+-------------------+------------------------------------------------------+
    |  1 | First from input  | This is the first row, updated fromt he input table. | 
    |  2 | Second in storage | This is the second row defined in the storage table  | 
    |  3 | Third from input  | This is the third row, updated fromt he input table. | 
    |  4 | Fourth from input | This is the fourth row, new from the input table.    | 
    +----+-------------------+------------------------------------------------------+
    See what I mean?

    • dgreenhouse
      Recognized Expert Contributor
      • May 2008
      • 250

      #3
      UPDATE:
      Looking at Atli's post, I realized that (his?) recommendation works in one fell swoop.
      I'd go with that, but you will need to specify all the column names.
      You can leave the column names out of the insert and the sub-select, but you WILL need all of them after the 'on duplicate key update' clause.
      i.e.
      Code:
      insert into tableB (select * from tableA)
      on duplicate key update
      id = values(id),
      title = values(title),
      `desc` = values(`desc`),
      data1 = values(data1),
      ...
      data18 = values(data18);
      
      Note the backticks (`) enclosing `desc`.
      desc is a reserved word in MySQL,
      ergo the need for the backticks.
      As Atli showed, it's probably best to always use backticks
      to avoid query failures when using reserved words as
      column and/or table names.
      But it's best to avoid reserved words altogether.
      END-UPDATE:

      For the first criterion (pulling new records in), this should work:
      insert into tableB (select * from tableA where id not in (select id from tableB));
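
      If you only want to look at the new rows before inserting them, the LEFT JOIN from the original question can do that too, as long as it keeps only the rows with no match on the tableB side. A minimal sketch, assuming id is the primary key (or at least unique) in both tables, which the thread never states outright:

      ```sql
      -- Rows in tableA (daily update) with no matching id in tableB (history).
      -- Assumption: id is unique in both tables.
      SELECT tableA.*
      FROM tableA
      LEFT JOIN tableB ON tableA.id = tableB.id
      WHERE tableB.id IS NULL;  -- unmatched rows carry NULL in every tableB column
      ```

      Without the WHERE clause, the join returns every row of tableA, matched or not, so the filter on tableB.id is what isolates the new records.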

      For the second criterion, you could use Atli's suggestion by using the "on duplicate key update" clause of an insert statement.

      The one problem I see with this arises when you get a lot of records in your history table (tableB). The other way requires a fairly complex select/insert statement.
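
      To give an idea of what that more complex route looks like, here is a rough sketch of a select that lists only the CHANGED records, i.e. ids present in both tables where some other column differs. It assumes id is unique in both tables, and it spells out only the first few columns; the rest of the data columns would need the same comparison added by hand.

      ```sql
      -- Rows whose id exists in both tables but where at least one other column differs.
      -- <=> is MySQL's NULL-safe equality operator, so NULL vs. non-NULL counts as a change.
      -- Only some columns are shown; the remaining data columns follow the same pattern.
      SELECT tableA.*
      FROM tableA
      INNER JOIN tableB ON tableA.id = tableB.id
      WHERE NOT (tableA.title <=> tableB.title)
         OR NOT (tableA.`desc` <=> tableB.`desc`)
         OR NOT (tableA.data1 <=> tableB.data1)
         OR NOT (tableA.data2 <=> tableB.data2);
      ```

      A plain <> comparison would also work if none of the columns can be NULL, but it silently skips rows where either side is NULL.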

      But it's best to K.I.S.S.

      Hope that helps...

      By the way, unless there's a compelling reason to have the tables structured the way you have them, I can foresee problems later when you might want to expand the application.

      (i.e. having that many fields named data1, data2, ..., data18)

      But there's nothing inherently bad about the structure if that's indeed what you need.

      (note: if indeed your structure is like data1,data2..., it will make the more complex select update command(s) easier.)

      If you haven't already, look into "normalization", but as is often the case, it's just too hard to go back and redesign an application that works for the most part. But DO look into normalization for future projects.

      • Atli
        Recognized Expert Expert
        • Nov 2006
        • 5062

        #4
        Originally posted by dgreenhouse
        If you haven't already, look into "normalization", but as is often the case, it's just too hard to go back and redesign an application that works for the most part. But DO look into normalization for future projects.
        I agree, it is very important to keep the normalization rules in mind when designing your databases. Normalized databases are generally easier to maintain and upgrade.

        You can check out Database Normalization and Table Structures in our MS Access forums. It describes these rules very nicely.

        Originally posted by dgreenhouse
        (his?)
        Indeed :)
