hashing an array - howto

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Helmut Jarausch

    hashing an array - howto

    Hi,

    I need to hash arrays of integers (from the hash module).

    So, I have to derive from array and supply a __hash__ method.
    But how to hash an array (of fixed length, say 25)?
    What I need is a function which maps a tuple of 25 integers
    into 1 integer with good hashing properties.

    Does anybody know such a thing?

    Many thanks for a hint,

    Helmut Jarausch

    Lehrstuhl fuer Numerische Mathematik
    RWTH - Aachen University
    D 52056 Aachen, Germany
  • Peter Otten

    #2
    Re: hashing an array - howto

    Helmut Jarausch wrote:
    I need to hash arrays of integers (from the hash module).
    >
    So, I have to derive from array and supply a __hash__ method.
    But how to hash an array (of fixed length, say 25)?
    What I need is a function which maps a tuple of 25 integers
    into 1 integer with good hashing properties.
    >
    Does anybody know such a thing?
    Have you tried this already?

    def __hash__(self):
    return hash(self.tostr ing())

    Peter


    Comment

    • bearophileHUGS@lycos.com

      #3
      Re: hashing an array - howto

      Helmut Jarausch:
      I need to hash arrays of integers (from the hash module).
      One of the possible solutions is to hash the equivalent tuple, but it
      requires some memory (your sequence must not be tuples already):

      assert not isinstance(some list, tuple)
      hash(tuple(some list))

      This is an alternative solution, it doesn't use much memory, but I am
      not sure it works correctly:

      from operator import xor
      hash(reduce(xor , somelist))

      Bye,
      bearophile

      Comment

      • Michael Palmer

        #4
        Re: hashing an array - howto

        On Sep 5, 11:18 am, bearophileH...@ lycos.com wrote:
        Helmut Jarausch:
        >
        I need to hash arrays of integers (from the hash module).
        >
        One of the possible solutions is to hash the equivalent tuple, but it
        requires some memory (your sequence must not be tuples already):
        why can't it be tuple already? Doesn't matter:
        >>from numpy import arange
        >>a=arange(5)
        >>a
        array([0, 1, 2, 3, 4])
        >>hash(a)
        Traceback (most recent call last):
        File "<stdin>", line 1, in ?
        TypeError: unhashable type
        >>b=tuple(a)
        >>b
        (0, 1, 2, 3, 4)
        >>c=tuple(b)
        >>c
        (0, 1, 2, 3, 4)
        >>hash(c)
        1286958229

        you can discard the tuple, so the memory requirement is transient.

        Comment

        • bearophileHUGS@lycos.com

          #5
          Re: hashing an array - howto

          Michael Palmer:
          why can't it be tuple already?
          Because if the input list L has tuples and lists, they end having the
          same hash value:
          >>L = [[1,2,3], (1,2,3)]
          >>hash(tuple( L[0])), hash(tuple(L[1]))
          (-378539185, -378539185)

          But it's a not much common situation, and few hash collision pairs
          can't damage much, so I agree with you that my assert was useless.
          This may solve that problem anyway:

          hash(type(L)) ^ hash(tuple(L))

          Generally a good hashing functions uses all the input information. If
          you use tuple() you ignore part of the input information, that is the
          type of L. So xor-ing hash(type(L)) you use that information too.

          you can discard the tuple, so the memory requirement is transient.
          Right, but there's lot of GC action, it may slow down the code. So you
          can start using hash(tuple(L)), but if later the code performance
          comes out bad, you may try a different version that creates less
          intermediate garbage.

          Bye,
          bearophile

          Comment

          • bearophileHUGS@lycos.com

            #6
            Re: hashing an array - howto

            John Machin:
            Consider this:>>hash(123 ) == hash(123.0) == hash(123L)
            True
            Right... Can you explain me why Python designers have chosen to build
            a hash() like that?

            Try "uses all the information that is relevant to the task".
            My knowledge of hash data structures seems not enough to understand
            why.

            Your alternative solution using reduce and xor may have suboptimal
            characteristics ...
            Right, sorry.

            Bye,
            bearophile

            Comment

            • John Machin

              #7
              Re: hashing an array - howto

              On Sep 6, 7:49 am, bearophileH...@ lycos.com wrote:
              John Machin:
              >
              Consider this:>>hash(123 ) == hash(123.0) == hash(123L)
              True
              >
              Right... Can you explain me why Python designers have chosen to build
              a hash() like that?
              I can't channel them; my rationalisation is this:

              Following the Law of Least Astonishment,
              >123 == 123.0 == 123L
              True

              Consequently if x == y, then adict[x] and adict[y] should give the
              same result.

              Cheers,
              John

              Comment

              • John Machin

                #8
                Re: hashing an array - howto

                On Sep 6, 9:30 am, John Machin <sjmac...@lexic on.netwrote:
                On Sep 6, 7:49 am, bearophileH...@ lycos.com wrote:
                >
                John Machin:
                >
                Consider this:>>hash(123 ) == hash(123.0) == hash(123L)
                True
                >
                Right... Can you explain me why Python designers have chosen to build
                a hash() like that?
                >
                I can't channel them; my rationalisation is this:
                >
                Following the Law of Least Astonishment,>1 23 == 123.0 == 123L
                >
                True
                >
                Consequently if x == y, then adict[x] and adict[y] should give the
                same result.
                >
                Another reason for not folding in the type of the object is this:
                >>type([])
                <type 'list'>
                >>hash(type([]))
                505252536
                >>id(type([]))
                505252536

                IOW hash(T) == id(T) where T is a type. id(obj) is just a memory
                address which can vary between executions of the same Python binary on
                the same machine ... not very reproducible. There is no guarantee in
                the docs for hash about under what circumstances hash(x) != hash(x) of
                course; I'm just relying on the least astonishment law again :-)

                And, again, we don't know what the OP's full requirements are ...

                Comment

                • sudhi.herle@gmail.com

                  #9
                  Re: hashing an array - howto

                  On Sep 5, 9:38 am, Helmut Jarausch <jarau...@skyne t.bewrote:
                  Hi,
                  >
                  I need to hash arrays of integers (from the hash module).
                  >
                  So, I have to derive from array and supply a __hash__ method.
                  I don't believe you need to derive from an array.

                  Here are two simple and well known hash functions you can use readily:

                  def djbhash(a):
                  """Hash function from D J Bernstein"""

                  h = 5381L
                  for i in a:
                  t = (h * 33) & 0xffffffffL
                  h = t ^ i

                  return h


                  def fnvhash(a):
                  """Fowler, Noll, Vo Hash function"""
                  h = 2166136261
                  for i in a:
                  t = (h * 16777619) & 0xffffffffL
                  h = t ^ i

                  return h

                  if __name__ == '__main__':
                  arr = [1001, 3001, 5001, 9001, 10011, 10013, 10015, 10017, 10019,
                  20011, 23001]
                  print djbhash(arr)
                  print fnvhash(arr)


                  And finally, here is an excellent page that explains hash functions:
                  PGSLOT มีธีมเกมที่หลากหลาย ทั้งแบบธีมเกมที่แปลกใหม่และเป็นธีมเกมที่ผู้เล่นชาวไทยคุ้นเคย ผู้ให้บริการเกมสล็อตออนไลน์จากค่าย PG SOFT


                  Here is Noll's page where he explains the FNV Hash:


                  Hope this helps,
                  --
                  Sudhi

                  Comment

                  Working...