BINARY_CHECKSUM algorithm

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Orly Junior

    BINARY_CHECKSUM algorithm

    Hello,

    Do you know if the algorithm for the BINARY_CHECKSUM function in documented
    somewhere?
    I would like to use it to avoid returning some string fields from the
    server.
    By returning only the checksum I could lookup the string in a hashtable and
    I think this could make the code more efficient on slow connections.

    Thanks in advanced and kind regards,

    Orly Junior



  • David Portas

    #2
    Re: BINARY_CHECKSUM algorithm

    I don't know where the algorithm is documented but I don't think it will
    help you. There are many more possible strings than checksums so there isn't
    a one-to-one correspondence between them. Unless your set of possible
    strings is constrained to a small set you couldn't guarantee to translate a
    checksum back into a string and you'd still have to maintain a lookup table
    of strings on the client side. So if your set of strings is small and
    constrained then you may as well invent your own codes. Alternatively, take
    a look at Huffman Coding or one of the zip compression algorithms.

    --
    David Portas
    SQL Server MVP
    --


    Comment

    • Erland Sommarskog

      #3
      Re: BINARY_CHECKSUM algorithm

      Orly Junior (nomail@nomail. com) writes:[color=blue]
      > Do you know if the algorithm for the BINARY_CHECKSUM function in
      > documented somewhere? I would like to use it to avoid returning some
      > string fields from the server. By returning only the checksum I could
      > lookup the string in a hashtable and I think this could make the code
      > more efficient on slow connections.[/color]

      Risky business. The checksum algorithm is fairly simple-minded. I beleive
      it uses some xor mechanism. I don't have the references around right now,
      but I recall that SQL Server MVP Steve Kass demonstrated how some quite
      small changes could result in the same checksum.

      Better in such case, to augment the table with a timestamp column. Such
      a column is automatically updated every time SQL Server updates the
      row. So you could store the timestamp client side, and pass that value,
      if the table has the same value, there is no need for a refresh.

      --
      Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

      Books Online for SQL Server SP3 at
      Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

      Comment

      • Steve Kass

        #4
        Re: BINARY_CHECKSUM algorithm

        Orly,

        Here is what I believe it does for a single varchar
        column. It's not a particularly good hash function at
        all.

        create function binary_checksum _varchar (
        @t varchar(1000)
        ) returns int as begin
        declare @b bigint set @b = 0
        declare @c tinyint
        declare @s bit set @s = 0
        declare @i int set @i = 1


        while @i <= len(@t) begin
        set @c = ascii(substring (@t,@i,1))
        set @b = @b / 16 * 16 + @b % 16 ^ @c / 16
        set @b = @b * 16 + @c % 16
        if @c >= 128 begin
        set @b = @b ^ 0xFF
        set @s = 1 - @s
        end
        set @b = @b % 0x0100000000 ^ @b / 0x0100000000
        set @i = @i + 1
        end


        if @s = 1 set @b = @b ^ 0xFFFFFFFF
        if @b >= 0x80000000 set @b = @b | 0xFFFFFFFF00000 000
        return @b
        end
        go


        You'll find more information if some of the threads
        here:



        Steve Kass
        Drew University

        Orly Junior wrote:
        [color=blue]
        > Hello,
        >
        > Do you know if the algorithm for the BINARY_CHECKSUM function in documented
        > somewhere?
        > I would like to use it to avoid returning some string fields from the
        > server.
        > By returning only the checksum I could lookup the string in a hashtable and
        > I think this could make the code more efficient on slow connections.
        >
        > Thanks in advanced and kind regards,
        >
        > Orly Junior
        >
        >
        >[/color]

        Comment

        Working...