Perfect hashing for Py

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • bearophileHUGS@lycos.com

    Perfect hashing for Py

    Following links from this thread:


    I have found this perfect hash (minimal too) implementation:
    minimal perfect hash, perfect hash, hash, function, hashing, C, algorithm, code, unique, collision, minimal, switch, optimization


    I have already translated part of it to D, and it seems to work well
    enough. As discussed in the PyConDue, I think this may be used in
    frozenset (and frozendict) to build a (minimal too?) perfect hash on
    the fly, to allow (hopefully) faster retrieval of items that don't
    change.
    That code is C and I think it's public domain, so if experiments show
    it gives enough speed up, it may be added to CPython 2.6/3.

    Bye,
    bearophile
  • Casey

    #2
    Re: Perfect hashing for Py

    On Jul 11, 8:01 am, bearophileH...@ lycos.com wrote:
    Following links from this thread:http://groups.google.com/group/comp....thread/thread/...
    >
    I have found this perfect hash (minimal too) implementation:http://burtleburtle.net/bob/hash/perfect.html
    >
    I have already translated part of it to D, and it seems to work well
    enough. As discussed in the PyConDue, I think this may be used in
    frozenset (and frozendict) to build a (minimal too?) perfect hash on
    the fly, to allow (hopefully) faster retrieval of items that don't
    change.
    That code is C and I think it's public domain, so if experiments show
    it gives enough speed up, it may be added to CPython 2.6/3.
    >
    Bye,
    bearophile
    It would be interesting to see if such an algorithm could actually
    provide any significant performance improvements for the size of sets
    that I suspect are most often used in practice. The chance of a hash
    collision for a good 32-bit general is fairly low even for a set of
    1,000,000 unique elements, which seems to me to be a pretty large
    memory-based set. Compare that with the cost of determining a perfect
    hash (O(n**2)?).

    From my perspective, a perfect hash would certainly be a welcome
    addition to the Python library or even as an optional algorithm
    supporting
    hash-based collections.

    Comment

    • Raymond Hettinger

      #3
      Re: Perfect hashing for Py

      On Jul 12, 10:13 am, Raymond Hettinger <pyt...@rcn.com wrote:
      On Jul 11, 3:01 pm, bearophileH...@ lycos.com wrote:
      >
      I have found this perfect hash (minimal too) implementation:http://burtleburtle.net/bob/hash/perfect.html
      >
      I have already translated part of it to D, and it seems to work well
      enough. As discussed in the PyConDue, I think this may be used in
      frozenset (and frozendict) to build a (minimal too?) perfect hash on
      the fly, to allow (hopefully) faster retrieval of items that don't
      change.
      I played around with the idea and have a rough implementation sketch:

      class PerfectSet(coll ections.Set):
      __slots__ = ['len', 'hashvalue', 'hashfunc', 'hashtable']
      DUMMY = object()
      def __init__(self, keys, sparseness=0.5) :
      keys = frozenset(keys)
      self.len = len(keys)
      self.hashvalue = hash(keys)
      n = (len(keys) / sparseness) + 1
      assert n self.len
      self.hashfunc = make_perfect_ha sh_function(key s, n)
      self.hashtable = [self.DUMMY] * n
      for k in keys:
      self.hashtable[self.hashfunc(k )] = k
      del __len__(self, k):
      return self.len
      def __iter__(self, k)
      return (k for k in self.hashtable if k is not self.DUMMY)
      def __contains__(se lf, k):
      return self.table[self.hashfunc(k )] is not self.DUMMY

      The perfect hash function will need to use the regular hash(obj) as a
      starting point. This helps with non-string types and it preserves
      existing relationships between __hash__ and __eq__.

      The construction is more expensive than with regular frozensets so it
      is only useful when lookups are very common.

      Playing around with the idea suggests that a sparseness variable for
      frozensets would largely accomplish the same goal without the overhead
      of creating and applying a perfect hash function. Sparse hash tables
      rarely collide.


      Raymond

      Comment

      Working...