Generating a Hash

  • Stan

    Generating a Hash

    Is it possible to hash a 100-byte string to an integer? I found a few .NET
    classes for that, such as SHA1Managed.ComputeHash, but they return bytes. I
    am just not sure about the idea of converting 100 bytes to four or eight
    without losing uniqueness.

    The issue has come up because I am storing bills with customers in a
    database, and I would like to reuse customers so that not every bill has
    its own customer. To do that I need to make some sort of unique code for
    each customer based on name, address, city, state, and zip. I want to use
    the whole customer name, because very often there are customers in the same
    city whose names differ only in the last few characters.

    Thanks,

    -Stan


  • Jon Skeet

    #2
    Re: Generating a Hash

    Stan <nospam@yahoo.com> wrote:
    > Is it possible to hash a 100-byte string to an integer? I found a few .NET
    > classes for that, such as SHA1Managed.ComputeHash, but they return bytes.

    Sure, but you can convert 4 bytes into an integer or 8 bytes into a
    long pretty easily.
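
A minimal sketch of that conversion, written in Python rather than .NET: `hashlib.sha1` stands in for `SHA1Managed`, and the helper name `hash_to_int` and the sample customer string are made up for illustration. It reads the first 4 or 8 bytes of the 20-byte SHA-1 digest as an unsigned integer.

```python
import hashlib


def hash_to_int(text: str, num_bytes: int = 4) -> int:
    """Return the first num_bytes of the SHA-1 digest as an unsigned integer."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()  # always 20 bytes
    if not 1 <= num_bytes <= len(digest):
        raise ValueError("num_bytes must be between 1 and 20")
    return int.from_bytes(digest[:num_bytes], byteorder="big")


key = "Stan|1 Main St|Springfield|IL|62701"  # hypothetical customer fields
print(hash_to_int(key, 4))  # always below 2**32
print(hash_to_int(key, 8))  # always below 2**64
```

In .NET the same step would presumably be `BitConverter.ToInt32` or `BitConverter.ToInt64` applied to the array that `ComputeHash` returns; those produce signed values, which does not change the argument.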

    > I am just not sure about the idea of converting 100 bytes to four or
    > eight without losing uniqueness.

    Well obviously you can't do that - there are far more sequences of 100
    bytes than there are of 4 or 8.
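
The counting argument can be made concrete with a deliberately tiny hash. This is only an illustrative sketch: `tiny_hash` and `first_collision` are hypothetical names, and the 8-bit hash is chosen so the collision appears quickly. With 256 possible outputs, any 257 distinct inputs must contain two with the same hash; the same reasoning applies to 4-byte or 8-byte outputs, just with far more inputs needed.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Illustrative 8-bit hash: first byte of SHA-1, so only 256 possible values."""
    return hashlib.sha1(data).digest()[0]


def first_collision(count: int):
    """Hash `count` distinct inputs; return the first pair sharing a hash value."""
    seen = {}
    for i in range(count):
        data = f"customer-{i}".encode()
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data
        seen[h] = data
    return None


# 257 distinct inputs into 256 possible outputs: a collision is unavoidable.
pair = first_collision(257)
print(pair)

# There are vastly more 100-byte strings than 8-byte values.
print(256 ** 100 > 2 ** 64)
```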

    > The issue has come up because I am storing bills with customers in a
    > database, and I would like to reuse customers so that not every bill has
    > its own customer. To do that I need to make some sort of unique code for
    > each customer based on name, address, city, state, and zip. I want to use
    > the whole customer name, because very often there are customers in the same
    > city whose names differ only in the last few characters.

    Rather than assume the hash code itself is identical, just assign each
    customer a unique ID and look it up based on name, address, city, state
    and zip when you need to retrieve it.
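
A rough sketch of that approach, using Python's built-in `sqlite3` purely for illustration. The table layout, column names, and the helper `get_or_create_customer` are assumptions, not anything from the original post. The database assigns the ID, and a composite unique constraint lets the lookup by all five fields use an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id      INTEGER PRIMARY KEY,   -- surrogate key, assigned by the database
        name    TEXT NOT NULL,
        address TEXT NOT NULL,
        city    TEXT NOT NULL,
        state   TEXT NOT NULL,
        zip     TEXT NOT NULL,
        UNIQUE (name, address, city, state, zip)  -- also creates a composite index
    )
    """
)


def get_or_create_customer(conn, name, address, city, state, zip_code):
    """Return the id of the matching customer, inserting a row if none exists."""
    fields = (name, address, city, state, zip_code)
    row = conn.execute(
        "SELECT id FROM customer WHERE name = ? AND address = ? "
        "AND city = ? AND state = ? AND zip = ?",
        fields,
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO customer (name, address, city, state, zip) "
        "VALUES (?, ?, ?, ?, ?)",
        fields,
    )
    return cur.lastrowid


first = get_or_create_customer(conn, "Stan", "1 Main St", "Springfield", "IL", "62701")
again = get_or_create_customer(conn, "Stan", "1 Main St", "Springfield", "IL", "62701")
print(first == again)  # the second bill reuses the existing customer row
```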

    --
    Jon Skeet - <skeet@pobox.com>
    http://www.pobox.com/~skeet
    If replying to the group, please do not mail me too

    • Stan

      #3
      Re: Generating a Hash

      > Rather than assume the hash code itself is identical, just assign each
      > customer a unique ID and look it up based on name, address, city, state
      > and zip when you need to retrieve it.

      Then I will have this query:

      select * ..... from ... where name = @name and address = @address
      and city = @city and state = @state and zip = @zip

      It is by far more efficient to have

      select * ..... from ... where code = @code

      • Guinness Mann

        #4
        Re: Generating a Hash

        In article <uG4mtz1gDHA.3616@TK2MSFTNGP11.phx.gbl>, nospam@yahoo.com
        says...
        > select * ..... from ... where name = @name and address = @address
        > and city = @city and state = @state and zip = @zip
        >
        > It is by far more efficient to have
        >
        > select * ..... from ... where code = @code

        I don't think you quite understand what a hash is, Stan. Hashes are not
        guaranteed to be unique. They're just a way of localizing sparse data.
        You *always* have to check for collisions with a hash.

        As Jon mentioned, how could you possibly generate unique 8 (or 4) byte
        values for each possible value of a 100-byte string? Think about it.

        Why not look up "hashing with linear probing" to see a possible solution
        for your problem.
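
For reference, a bare-bones sketch of hashing with linear probing. The class name `LinearProbingTable` is made up, and the table has fixed capacity with no resizing or deletion, which a real implementation would need. The point to notice is that lookups compare the full key, because several keys can share a home slot.

```python
class LinearProbingTable:
    """Fixed-capacity open-addressing hash table (no resizing or deletion)."""

    def __init__(self, capacity: int = 8):
        self._slots = [None] * capacity  # each slot is None or a (key, value) pair

    def _probe(self, key):
        """Yield slot indexes starting at the key's home slot, wrapping around."""
        start = hash(key) % len(self._slots)
        for offset in range(len(self._slots)):
            yield (start + offset) % len(self._slots)

    def put(self, key, value):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None or slot[0] == key:  # empty slot, or same key: overwrite
                self._slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:  # reached a gap: the key was never stored
                return default
            if slot[0] == key:  # compare the full key, not just the hash
                return slot[1]
        return default


table = LinearProbingTable(capacity=8)
table.put(0, "first")
table.put(8, "second")  # in CPython hash(8) == 8, so 8 % 8 == 0: same home slot as key 0
print(table.get(0), table.get(8))
```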

        -- Rick


        • Jon Skeet

          #5
          Re: Generating a Hash

          Stan <nospam@yahoo.com> wrote:
          > > Rather than assume the hash code itself is identical, just assign each
          > > customer a unique ID and look it up based on name, address, city, state
          > > and zip when you need to retrieve it.
          >
          > Then I will have this query:
          >
          > select * ..... from ... where name = @name and address = @address
          > and city = @city and state = @state and zip = @zip
          >
          > It is by far more efficient to have
          >
          > select * ..... from ... where code = @code

          Sure - if you don't mind the fact that your code won't necessarily be
          unique...

          Of course, it's *unlikely* that you'll get a hash collision, if you
          only have a few thousand entries - but that may not be good enough.
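
To put a rough number on "unlikely", here is a sketch using the standard birthday approximation. It assumes the hash values are uniformly distributed, and the function name `collision_probability` is hypothetical. For 5000 entries it gives roughly a 0.3% chance of at least one collision with a 32-bit hash, and a negligible chance with a 64-bit hash.

```python
import math


def collision_probability(num_entries: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_entries hashes coincide."""
    buckets = 2 ** hash_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N))
    return -math.expm1(-num_entries * (num_entries - 1) / (2 * buckets))


for bits in (32, 64):
    print(bits, collision_probability(5000, bits))
```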

          (What you could do is search by hash and then verify each field
          separately, of course.)

          --
          Jon Skeet - <skeet@pobox.com>
          http://www.pobox.com/~skeet
          If replying to the group, please do not mail me too

          • Stan

            #6
            Re: Generating a Hash

            > I don't think you quite understand what a hash is, Stan. Hashes are not
            > guaranteed to be unique. They're just a way of localizing sparse data.
            > You *always* have to check for collisions with a hash.

            Yes, I thought a hash was guaranteed to be unique, similar to how NT
            encrypts users' passwords and stores them as hashes...

            What I probably need is not hashing but compressing or compacting
            name+address+city+state+zip. Even without spaces I end up with 100-150
            characters... There have got to be some algorithms that do that (similar
            to ZIP, ARJ, etc.)...



            • Jon Skeet

              #7
              Re: Generating a Hash

              Stan <nospam@yahoo.com> wrote:
              > > I don't think you quite understand what a hash is, Stan. Hashes are not
              > > guaranteed to be unique. They're just a way of localizing sparse data.
              > > You *always* have to check for collisions with a hash.
              >
              > Yes, I thought a hash was guaranteed to be unique, similar to how NT
              > encrypts users' passwords and stores them as hashes...

              That doesn't guarantee it to be unique, I rather suspect. One-way
              hashes like that are basically used so that an attacker has a *very,
              very small* chance of getting access without having the right password,
              and the password itself doesn't need to be stored in plain text.

              > What I probably need is not hashing but compressing or compacting
              > name+address+city+state+zip. Even without spaces I end up with 100-150
              > characters... There have got to be some algorithms that do that (similar
              > to ZIP, ARJ, etc.)...

              Well, hashing would be a good start, if you wanted something small to
              search on: write a hash into your database (and make sure it's up to
              date!) but having retrieved results by hashcode, check that you get the
              right record (by the individual fields) before doing anything else.
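
A sketch of that search-then-verify pattern, again with Python's `sqlite3` standing in for the real database. The schema and the helpers `customer_hash`, `add_customer`, and `find_customer` are all assumptions for illustration. The hash column is indexed but deliberately not unique, and a row is only accepted once every field matches.

```python
import hashlib
import sqlite3


def customer_hash(name, address, city, state, zip_code) -> int:
    """64-bit lookup code: first 8 bytes of SHA-1 over the joined fields."""
    joined = "\x1f".join((name, address, city, state, zip_code))  # unit separator
    digest = hashlib.sha1(joined.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)  # fits a signed 64-bit column


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, code INTEGER NOT NULL, "
    "name TEXT, address TEXT, city TEXT, state TEXT, zip TEXT)"
)
conn.execute("CREATE INDEX customer_code ON customer (code)")  # indexed, not UNIQUE


def add_customer(conn, *fields):
    """Insert a customer together with the hash code computed from its fields."""
    cur = conn.execute(
        "INSERT INTO customer (code, name, address, city, state, zip) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (customer_hash(*fields), *fields),
    )
    return cur.lastrowid


def find_customer(conn, *fields):
    """Narrow by hash code, then confirm every field before trusting the row."""
    rows = conn.execute(
        "SELECT id, name, address, city, state, zip FROM customer WHERE code = ?",
        (customer_hash(*fields),),
    )
    for row in rows:
        if tuple(row[1:]) == fields:
            return row[0]
    return None


stan = ("Stan", "1 Main St", "Springfield", "IL", "62701")  # hypothetical data
stan_id = add_customer(conn, *stan)
print(find_customer(conn, *stan) == stan_id)
```

Keeping the hash up to date means recomputing `code` whenever any of the five fields changes; otherwise the lookup silently misses the row.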

              Note that although compression algorithms like zip etc will *usually*
              save space, there's no guarantee that they will - and there *can't* be,
              for exactly the same reason you can't get a unique hash when you're
              going from x bytes to y bytes and y is smaller than x.
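
A quick way to see this with a real compressor. The sketch uses Python's `zlib` (DEFLATE, the method ZIP files commonly use) as a stand-in; the sample inputs are made up. Redundant data shrinks a lot, while a very short input comes out larger than it went in because of the format's fixed header and checksum.

```python
import zlib

short = b"Stan"           # 4 bytes
repetitive = b"a" * 1000  # 1000 bytes, highly redundant

for label, data in (("short", short), ("repetitive", repetitive)):
    packed = zlib.compress(data)
    assert zlib.decompress(packed) == data  # lossless round trip
    print(label, len(data), "->", len(packed))
```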

              --
              Jon Skeet - <skeet@pobox.com>
              http://www.pobox.com/~skeet
              If replying to the group, please do not mail me too