Hamming Distance

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • godavemon

    Hamming Distance

    I need to calculate the Hamming Distance of two integers. The hamming
    distance is the number of bits in two integers that don't match. I
    thought there'd be a function in math or scipy but i haven't been able
    to find one. This is my function but it seems like there should be a
    faster way. I do this computation many times and speed up is
    important.

    def hamdist( a, b , bits = 32):
    def _hamdist( x, bits):
    if bits:
    return (x & 1) + _hamdist(x >1, bits-1)
    return x & 1
    return _hamdist( a ^ b, bits)


    Another alternative would be to convert the XOR to a binary string and
    count the # of 1's.

    Which would be fastest? Are there better alternatives?

    Thanks!
  • Raymond Hettinger

    #2
    Re: Hamming Distance

    On Jun 19, 4:27 pm, godavemon <davefow...@gma il.comwrote:
    I need to calculate the Hamming Distance of two integers.  The hamming
    distance is the number of bits in two integers that don't match.  I
    thought there'd be a function in math or scipy but i haven't been able
    to find one.  This is my function but it seems like there should be a
    faster way.  I do this computation many times and speed up is
    important.
    The simplest speed-up is to break the inputs into n-length blocks and
    then look them up in an n-by-n table.

    Also, re-write your function to use iteration instead of recursion
    (the latter is *very* expensive in Python).

    The fastest way is to write a C function or use Pyrex.


    Raymond

    Comment

    • John Machin

      #3
      Re: Hamming Distance

      On Jun 20, 9:27 am, godavemon <davefow...@gma il.comwrote:
      I need to calculate the Hamming Distance of two integers. The hamming
      distance is the number of bits in two integers that don't match. I
      thought there'd be a function in math or scipy but i haven't been able
      to find one. This is my function but it seems like there should be a
      faster way. I do this computation many times and speed up is
      important.
      >
      def hamdist( a, b , bits = 32):
      def _hamdist( x, bits):
      if bits:
      return (x & 1) + _hamdist(x >1, bits-1)
      return x & 1
      return _hamdist( a ^ b, bits)
      >
      Another alternative would be to convert the XOR to a binary string and
      Isn't it a "binary string" already?
      count the # of 1's.
      My guess is that counting the 1-bits in (a ^ b) would be hard to beat,
      and that a recursive function is just not in the race.
      >
      Which would be fastest?
      Implement both and time them.
      Are there better alternatives?
      I doubt it.

      BTW one presumes that your integers are non-negative ...

      HTH

      Cheers,
      John

      Comment

      • Mensanator

        #4
        Re: Hamming Distance

        On Jun 19, 6:27 pm, godavemon <davefow...@gma il.comwrote:
        I need to calculate the Hamming Distance of two integers.  The hamming
        distance is the number of bits in two integers that don't match.  I
        thought there'd be a function in math or scipy but i haven't been able
        to find one.  This is my function but it seems like there should be a
        faster way.  I do this computation many times and speed up is
        important.
        >
        def hamdist( a, b , bits = 32):
            def _hamdist( x, bits):
                if bits:
                    return (x & 1) + _hamdist(x >1, bits-1)
                return x & 1
            return _hamdist( a ^ b, bits)
        >
        Another alternative would be to convert the XOR to a binary string and
        count the # of 1's.
        >
        Which would be fastest?  Are there better alternatives?
        Yep, use the gmpy module.
        >>import gmpy
        >>help(gmpy.ham dist)
        Help on built-in function hamdist in module gmpy:
        hamdist(...)
        hamdist(x,y): returns the Hamming distance (number of bit-
        positions
        where the bits differ) between x and y. x and y must be mpz, or
        else
        get coerced to mpz.
        >>gmpy.hamdist( 15,6)
        2
        >>gmpy.hamdist( 2**177149,10**5 3330)
        61877
        >>gmpy.hamdist( 2**177149-1,10**53330)
        115285

        >
        Thanks!

        Comment

        • Robert Kern

          #5
          Re: Hamming Distance

          godavemon wrote:
          I need to calculate the Hamming Distance of two integers. The hamming
          distance is the number of bits in two integers that don't match. I
          thought there'd be a function in math or scipy but i haven't been able
          to find one. This is my function but it seems like there should be a
          faster way. I do this computation many times and speed up is
          important.
          >
          def hamdist( a, b , bits = 32):
          def _hamdist( x, bits):
          if bits:
          return (x & 1) + _hamdist(x >1, bits-1)
          return x & 1
          return _hamdist( a ^ b, bits)
          >
          >
          Another alternative would be to convert the XOR to a binary string and
          count the # of 1's.
          >
          Which would be fastest? Are there better alternatives?
          I think this works:

          In [26]: hexbits = {'0': 0,
          ....: '1': 1,
          ....: '2': 1,
          ....: '3': 2,
          ....: '4': 1,
          ....: '5': 2,
          ....: '6': 2,
          ....: '7': 3,
          ....: '8': 1,
          ....: '9': 2,
          ....: 'A': 2,
          ....: 'B': 3,
          ....: 'C': 2,
          ....: 'D': 3,
          ....: 'E': 3,
          ....: 'F': 4}

          In [28]: def hamming(a, b):
          ....: return sum(hexbits[h] for h in hex(a^b)[2:])
          ....:

          In [29]: hamming(1,1)
          Out[29]: 0

          In [30]: hamming(1,2)
          Out[30]: 2

          In [31]: hamming(2,2)
          Out[31]: 0

          In [32]: hamming(2,3)
          Out[32]: 1


          This might be a wee faster, I haven't timed it:

          sum(map(h.get, hex(a^b)[2:]))

          --
          Robert Kern

          "I have come to believe that the whole world is an enigma, a harmless enigma
          that is made terrible by our own mad attempt to interpret it as though it had
          an underlying truth."
          -- Umberto Eco

          Comment

          • Matimus

            #6
            Re: Hamming Distance

            On Jun 19, 4:27 pm, godavemon <davefow...@gma il.comwrote:
            I need to calculate the Hamming Distance of two integers.  The hamming
            distance is the number of bits in two integers that don't match.  I
            thought there'd be a function in math or scipy but i haven't been able
            to find one.  This is my function but it seems like there should be a
            faster way.  I do this computation many times and speed up is
            important.
            >
            def hamdist( a, b , bits = 32):
                def _hamdist( x, bits):
                    if bits:
                        return (x & 1) + _hamdist(x >1, bits-1)
                    return x & 1
                return _hamdist( a ^ b, bits)
            >
            Another alternative would be to convert the XOR to a binary string and
            count the # of 1's.
            >
            Which would be fastest?  Are there better alternatives?
            >
            Thanks!
            I see no good reason to use recursion for this type of thing. Here are
            some of my attempts:

            Code:
            from math import log
            
            def yours(a, b , bits = 32):
            def _hamdist( x, bits):
            if bits:
            return (x & 1) + _hamdist(x >1, bits-1)
            return x & 1
            return _hamdist(a ^ b, bits)
            
            
            def simple(a, b, bits=32):
            x = a ^ b
            return sum((x >i & 1) for i in xrange(bits))
            
            def lazy(a, b, bits=32):
            x = (a ^ b) & ((1 << bits) - 1)
            tot = 0
            while x:
            tot += x & 1
            x >>= 1
            return tot
            
            def fancy(a, b, bits=32):
            x = (a ^ b) & ((1 << bits) - 1)
            tot = 0
            while x:
            tot += 1
            x ^= 1 << int(log(x, 2))
            return tot
            
            test_vals = (
            ((0xffffffff, 0), 32),
            ((0,0), 0),
            ((1,0), 1),
            ((0x80000000, 0), 1),
            ((0x55555555, 0), 16)
            )
            
            def test(f):
            test_vals = (
            ((0xffffffff, 0), 32), # ALL
            ((0,0), 0), # None
            ((1,0), 1), # First
            ((0x80000000, 0), 1), # Last
            ((0x55555555, 0), 16), # Every Other
            ((0xffff, 0), 16), # First Half
            ((0xffff0000, 0), 16), # Last Half
            )
            for i, (args, exp) in enumerate(test_vals):
            if f(*args) != exp:
            return 0
            return 1
            
            if __name__ == "__main__":
            for f in (yours, simple, lazy, fancy):
            if not test(f):
            print "%s failed"%f.__name__
            The python module `timeit` is handy for testing speed:

            python -mtimeit -s"from hamdist import *" "test(yours )"
            10000 loops, best of 3: 95.1 usec per loop

            python -mtimeit -s"from hamdist import *" "test(simpl e)"
            10000 loops, best of 3: 65.3 usec per loop

            python -mtimeit -s"from hamdist import *" "test(lazy) "
            10000 loops, best of 3: 59.8 usec per loop

            python -mtimeit -s"from hamdist import *" "test(fancy )"
            10000 loops, best of 3: 77.2 usec per loop

            Even the ridiculous `fancy` version beat the recursive version.

            Matt

            Comment

            • Raymond Hettinger

              #7
              Re: Hamming Distance

              Non-recursive, 8-bit block table lookup version:

              def ham(a, b, ht=[hamdist(a,0) for a in range(256)]):
              x = a ^ b
              dist = 0
              while x:
              dist += ht[x & 255]
              x >>= 8
              return dist

              Raymond


              Comment

              • godavemon

                #8
                Re: Hamming Distance

                Awesome! Thanks a lot.

                On Jun 19, 5:00 pm, Mensanator <mensana...@aol .comwrote:
                On Jun 19, 6:27 pm, godavemon <davefow...@gma il.comwrote:
                >
                >
                >
                I need to calculate the Hamming Distance of two integers. The hamming
                distance is the number of bits in two integers that don't match. I
                thought there'd be a function in math or scipy but i haven't been able
                to find one. This is my function but it seems like there should be a
                faster way. I do this computation many times and speed up is
                important.
                >
                def hamdist( a, b , bits = 32):
                def _hamdist( x, bits):
                if bits:
                return (x & 1) + _hamdist(x >1, bits-1)
                return x & 1
                return _hamdist( a ^ b, bits)
                >
                Another alternative would be to convert the XOR to a binary string and
                count the # of 1's.
                >
                Which would be fastest? Are there better alternatives?
                >
                Yep, use the gmpy module.
                >
                >import gmpy
                >help(gmpy.hamd ist)
                >
                Help on built-in function hamdist in module gmpy:
                hamdist(...)
                hamdist(x,y): returns the Hamming distance (number of bit-
                positions
                where the bits differ) between x and y. x and y must be mpz, or
                else
                get coerced to mpz.
                >
                >gmpy.hamdist(1 5,6)
                2
                >gmpy.hamdist(2 **177149,10**53 330)
                61877
                >gmpy.hamdist(2 **177149-1,10**53330)
                >
                115285
                >
                >
                >
                Thanks!


                Comment

                • godavemon

                  #9
                  Re: Hamming Distance

                  Great thanks!

                  On Jun 19, 5:37 pm, Matimus <mccre...@gmail .comwrote:
                  On Jun 19, 4:27 pm, godavemon <davefow...@gma il.comwrote:
                  >
                  >
                  >
                  I need to calculate the Hamming Distance of two integers. The hamming
                  distance is the number of bits in two integers that don't match. I
                  thought there'd be a function in math or scipy but i haven't been able
                  to find one. This is my function but it seems like there should be a
                  faster way. I do this computation many times and speed up is
                  important.
                  >
                  def hamdist( a, b , bits = 32):
                  def _hamdist( x, bits):
                  if bits:
                  return (x & 1) + _hamdist(x >1, bits-1)
                  return x & 1
                  return _hamdist( a ^ b, bits)
                  >
                  Another alternative would be to convert the XOR to a binary string and
                  count the # of 1's.
                  >
                  Which would be fastest? Are there better alternatives?
                  >
                  Thanks!
                  >
                  I see no good reason to use recursion for this type of thing. Here are
                  some of my attempts:
                  >
                  Code:
                  from math import log
                  >
                  def yours(a, b , bits = 32):
                       def _hamdist( x, bits):
                           if bits:
                               return (x & 1) + _hamdist(x >1, bits-1)
                           return x & 1
                       return _hamdist(a ^ b, bits)
                  >
                  def simple(a, b, bits=32):
                      x = a ^ b
                      return sum((x >i & 1) for i in xrange(bits))
                  >
                  def lazy(a, b, bits=32):
                      x = (a ^ b) & ((1 << bits) - 1)
                      tot = 0
                      while x:
                          tot += x & 1
                          x >>= 1
                      return tot
                  >
                  def fancy(a, b, bits=32):
                      x = (a ^ b) & ((1 << bits) - 1)
                      tot = 0
                      while x:
                          tot += 1
                          x ^= 1 << int(log(x, 2))
                      return tot
                  >
                  test_vals = (
                          ((0xffffffff, 0), 32),
                          ((0,0), 0),
                          ((1,0), 1),
                          ((0x80000000, 0), 1),
                          ((0x55555555, 0), 16)
                          )
                  >
                  def test(f):
                      test_vals = (
                              ((0xffffffff, 0), 32), # ALL
                              ((0,0), 0), # None
                              ((1,0), 1), # First
                              ((0x80000000, 0), 1), # Last
                              ((0x55555555, 0), 16), # Every Other
                              ((0xffff, 0), 16), # First Half
                              ((0xffff0000, 0), 16), # Last Half
                              )
                      for i, (args, exp) in enumerate(test_vals):
                          if f(*args) != exp:
                              return 0
                      return 1
                  >
                  if __name__ == "__main__":
                      for f in (yours, simple, lazy, fancy):
                          if not test(f):
                              print "%s failed"%f.__name__
                  >
                  The python module `timeit` is handy for testing speed:
                  >
                  python -mtimeit -s"from hamdist import *" "test(yours )"
                  10000 loops, best of 3: 95.1 usec per loop
                  >
                  python -mtimeit -s"from hamdist import *" "test(simpl e)"
                  10000 loops, best of 3: 65.3 usec per loop
                  >
                  python -mtimeit -s"from hamdist import *" "test(lazy) "
                  10000 loops, best of 3: 59.8 usec per loop
                  >
                  python -mtimeit -s"from hamdist import *" "test(fancy )"
                  10000 loops, best of 3: 77.2 usec per loop
                  >
                  Even the ridiculous `fancy` version beat the recursive version.
                  >
                  Matt


                  Comment

                  Working...