What do you think about the checksum function ?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Kurt

    What do you think about the checksum function ?

    Hi,

    I'd like advices about an idea I add to resolve a problem. thanks to
    you in advance for yours answers.
    I have a database with tables that I load with flat file. The size of
    each table is 600 Mb. The flat file are the image of an application
    and there is no updated date or created date on any table. So my
    tables are just a copy of the data from the flat file.

    Now I'd like to create an History Table. So I have to determine which
    lines changed and which one did'nt.
    As I don't have any date on my row the only answer I had unil know was
    to check each column on each row to see if any data changed. If the
    data changed I add a new line in my history date.

    My idea is to add a checksum column in both table on all columns. To
    know if any data change I just have to check my PK + my checksum
    column.
    Do you think that is a good idea ? Is checksum a quick function or
    not ?.

    Thanks.

    --
    K

  • Erland Sommarskog

    #2
    Re: What do you think about the checksum function ?

    Kurt (nicolas.agrapa rt@gmail.com) writes:
    I'd like advices about an idea I add to resolve a problem. thanks to
    you in advance for yours answers.
    I have a database with tables that I load with flat file. The size of
    each table is 600 Mb. The flat file are the image of an application
    and there is no updated date or created date on any table. So my
    tables are just a copy of the data from the flat file.
    >
    Now I'd like to create an History Table. So I have to determine which
    lines changed and which one did'nt.
    As I don't have any date on my row the only answer I had unil know was
    to check each column on each row to see if any data changed. If the
    data changed I add a new line in my history date.
    >
    My idea is to add a checksum column in both table on all columns. To
    know if any data change I just have to check my PK + my checksum
    column.
    Do you think that is a good idea ? Is checksum a quick function or
    not ?.
    Neither checksum() nor binary_checksum () are very useful. I think they
    based on XOR, and they would too often say a row is unchanged when it
    has not. It would be a lot safer to compare all columns?

    Exactly how do update your tables? To blow all existing data away and
    reload, or do you INSERT new, update existing ones etc? In such case a
    timestamp column could work for you. (Timestamp here has nothing to
    do with date and time.)


    --
    Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

    Books Online for SQL Server 2005 at

    Books Online for SQL Server 2000 at

    Comment

    • Kurt

      #3
      Re: What do you think about the checksum function ?


      On 29 mai, 23:31, Erland Sommarskog <esq...@sommars kog.sewrote:
      Kurt (nicolas.agrap. ..@gmail.com) writes:
      I'd like advices about an idea I add to resolve a problem. thanks to
      you in advance for yours answers.
      I have a database with tables that I load with flat file. The size of
      each table is 600 Mb. The flat file are the image of an application
      and there is no updated date or created date on any table. So my
      tables are just a copy of the data from the flat file.
      >
      Now I'd like to create an History Table. So I have to determine which
      lines changed and which one did'nt.
      As I don't have any date on my row the only answer I had unil know was
      to check each column on each row to see if any data changed. If the
      data changed I add a new line in my history date.
      >
      My idea is to add a checksum column in both table on all columns. To
      know if any data change I just have to check my PK + my checksum
      column.
      Do you think that is a good idea ? Is checksum a quick function or
      not ?.
      >
      Neither checksum() nor binary_checksum () are very useful. I think they
      based on XOR, and they would too often say a row is unchanged when it
      has not. It would be a lot safer to compare all columns?
      >
      Exactly how do update your tables? To blow all existing data away and
      reload, or do you INSERT new, update existing ones etc? In such case a
      timestamp column could work for you. (Timestamp here has nothing to
      do with date and time.)
      >
      --
      Erland Sommarskog, SQL Server MVP, esq...@sommarsk og.se
      >
      Books Online for SQL Server 2005 athttp://www.microsoft.c om/technet/prodtechnol/sql/2005/downloads/books...
      Books Online for SQL Server 2000 athttp://www.microsoft.c om/sql/prodinfo/previousversion s/books.mspx- Masquer le texte des messages précédents -
      >
      - Afficher le texte des messages précédents -
      Hello,

      I read on the msdn website that BINARY_CHECKSUM can be used to detect
      changes to a row of a table.
      To update my current tables, I truncate and after I insert the new
      datas.
      What do you call a timestamp ? How timestamp could work for me ?

      Thanks,

      --
      K

      Comment

      • Erland Sommarskog

        #4
        Re: What do you think about the checksum function ?

        Kurt (nicolas.agrapa rt@gmail.com) writes:
        I read on the msdn website that BINARY_CHECKSUM can be used to detect
        changes to a row of a table.
        To update my current tables, I truncate and after I insert the new
        datas.
        What do you call a timestamp ? How timestamp could work for me ?
        If you truncate and insert, timestamp is not going help you. A timestamp
        column is automatically updated when a column is updated with a database-
        unique monotonically increasing value. So if you had updated only rows that
        had changed, and inserted new ones, you could have saved the current
        timestamp value when you started, and then all rows with a higher timestamp
        value would have been new.

        But since you truncate and re-insert, all timestamp values will be higher
        when you started.

        As I recall binary_checksum () returns a 32-bit integer. If your tables
        are wide, this means that collisions will not at all be unlikely. Note
        also that binary_checksum ignores some data types entirely.

        While more boring to code, I would recommend that you make a comparison
        column by column.


        --
        Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

        Books Online for SQL Server 2005 at

        Books Online for SQL Server 2000 at

        Comment

        Working...