A note on heapq module

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • bearophileHUGS@lycos.com

    A note on heapq module

    In few minutes I have just written this quite raw class, it lacks
    doctring (the same of the functions of the heapq module), it may
    contain bugs still, I haven't tested it much. It's just a simple
    wrapper around some of the functions of the heapq module (nsmallest and
    nlargest are missing). Usually I use classes only when I need then, I
    like functional programming enough, and this class seems to just slow
    down the functions of the heapq module. But I think an improved class
    similar to this one may be useful (added, replacing isn't necessary)
    into the heapq module, because it can avoid cetain errors: I know what
    Heaps are and how they work, but some time ago I have written a bug
    into small program that uses the functions of the heapq module, because
    I have added an item to the head of the heap using a normal list
    method, breaking the heap invariant. With a simple class like this one
    all the methods of your object keep the heap invariant, avoiding that
    kind of bugs. (If you are interested I can improve this class some,
    it't not difficult.)


    from heapq import heapify, heappush, heappop, heapreplace

    class Heap(object):
    def __init__(self, init=None):
    if init is None:
    self.h = []
    elif isinstance(init , Heap):
    self.h = init.h[:]
    else:
    self.h = list(init)
    heapify(self.h)
    def smallest(self):
    return self.h[0]
    def sort(self, cmp=None, key=None):
    self.h.sort(cmp =cmp, key=key)
    def heappush(self, item):
    heappush(self.h , item)
    def heappop(self):
    return heappop(self.h)
    def heapreplace(sel f, item):
    return heapreplace(sel f.h, item)
    def clear(self):
    del self.h[:]
    def __contains__(se lf, item):
    return item in self.h
    def __hash__(self):
    raise TypeError("Heap objects are unhashable.")
    def __iter__(self):
    return self.h.__iter__ ()
    def __eq__(self, other):
    return isinstance(othe r, Heap) and self.h == other.h
    def __ne__(self, other):
    return not isinstance(othe r, Heap) or self.h != other.h
    def __len__(self):
    return len(self.h)
    def __nonzero__(sel f):
    return bool(self.h)
    def __str__(self):
    return str(self.h)
    def __repr__(self):
    return "Heap(%s)" % self.h if self.h else "Heap()"

    Bye,
    bearophile

  • Steven Bethard

    #2
    Re: A note on heapq module

    bearophileHUGS@ lycos.com wrote:
    some time ago I have written a bug
    into small program that uses the functions of the heapq module, because
    I have added an item to the head of the heap using a normal list
    method, breaking the heap invariant.
    I know I've had similar bugs in my code before.
    from heapq import heapify, heappush, heappop, heapreplace
    >
    class Heap(object):
    [snip implementation]

    +1 for adding a Heap object like this to the collections module. If you
    can make a little time to add some tests (you could probably steal a lot
    of what you need from test_heapq.py) you should definitely submit it as
    a patch for 2.6.

    STeVe

    Comment

    • bearophileHUGS@lycos.com

      #3
      Re: A note on heapq module

      I think the sort has to be simplified, otherwise it can't keep the heap
      invariant...

      def sort(self):
      self.h.sort()

      Bye,
      bearophile

      Comment

      • Jussi Salmela

        #4
        Re: A note on heapq module

        bearophileHUGS@ lycos.com kirjoitti:
        I think the sort has to be simplified, otherwise it can't keep the heap
        invariant...
        >
        def sort(self):
        self.h.sort()
        >
        Bye,
        bearophile
        >
        And __repr__ should be something like this:
        =========
        def __repr__(self):
        if self.h:
        return "Heap(%s)" % self.h
        else:
        return "Heap()"
        ==========
        to make it compatible with versions earlier than 2.5

        Cheers,
        Jussi

        Comment

        • Scott David Daniels

          #5
          Re: A note on heapq module

          bearophileHUGS@ lycos.com wrote:
          In few minutes I have just written this quite raw class, ....
          I'd suggest some changes. It is nice to have Heaps with equal
          contents equal no matter what order the inserts have been done.
          Consider how you want Heap([1, 2, 3]) and Heap([3, 1, 2]) to behave.
          Similarly, it is nice to have str and repr produce canonical
          representations (I would skip the __str__ code, myself, though).
          Also, subclasses should get their identities tweaked as so:
          class Heap(object):
          ...
          def sort(self, cmp=None, key=None):
          self.h.sort(cmp =cmp, key=key)
          I'd remove this method.
          ...
          def __eq__(self, other):
          return isinstance(othe r, Heap) and self.h == other.h
          return isinstance(othe r, self.__class__) and sorted(
          self.h) == sorted(other.h)
          def __ne__(self, other):
          return not isinstance(othe r, Heap) or self.h != other.h
          return not isinstance(othe r, self.__class__) and sorted(
          self.h) != sorted(other.h)
          ...
          def __str__(self):
          return str(self.h)
          return str(sorted(self .h))
          def __repr__(self):
          return "Heap(%s)" % self.h if self.h else "Heap()"
          return "%s(%s)" % (self.__class__ .__name__, sorted(self.h))
          And I'd add:
          def __lt__(self, other):
          raise TypeError('no ordering relation is defined for %s'
          % self.__class__. __name__)
          __gt__ = __le__ = __ge__ = __lt__

          --Scott David Daniels
          scott.daniels@a cm.org

          Comment

          • Scott David Daniels

            #6
            Re: A note on heapq module

            Scott David Daniels wrote:

            Sorry, I blew the __ne__:
            > def __ne__(self, other):
            > return not isinstance(othe r, Heap) or self.h != other.h
            return not isinstance(othe r, self.__class__) and sorted(
            self.h) != sorted(other.h)
            Should be:
            return not isinstance(othe r, self.__class__) or sorted(
            self.h) != sorted(other.h)

            --Scott David Daniels
            scott.daniels@a cm.org

            Comment

            • bearophileHUGS@lycos.com

              #7
              Re: A note on heapq module

              Scott David Daniels:
              I'd suggest some changes. It is nice to have Heaps with equal
              contents equal no matter what order the inserts have been done.
              Consider how you want Heap([1, 2, 3]) and Heap([3, 1, 2]) to behave.
              Similarly, it is nice to have str and repr produce canonical
              representations (I would skip the __str__ code, myself, though).
              Also, subclasses should get their identities tweaked as so:
              My version was a *raw* one, just an idea, this is a bit better:

              I like your changes. In the beginning I didn't want to put __eq__
              __ne__ methods at all, because they are too much slow, but being them
              O(n ln n) I think your solution is acceptable.

              Some methods may have a name different from the heap functions:
              def smallest(self):
              def push(self, item):
              def pop(self):
              def replace(self, item):

              Two things left I can see: I think the __init__ may have a boolean
              inplace parameter to avoid copying the given list, so this class is
              about as fast as the original heapify function (but maybe such thing is
              too much dirty for a Python stdlib):

              def __init__(self, sequence=None, inplace=False):
              if sequence is None:
              self._heap = []
              elif isinstance(sequ ence, self.__class__) :
              self._heap = sequence._heap[:]
              else:
              if inplace and isinstance(sequ ence, list):
              self._heap = sequence
              else:
              self._heap = list(sequence)
              heapify(self._h eap)

              The second thing, I don't like much the __iter__ because it yields
              unsorted items:

              def __iter__(self):
              return self._heap.__it er__()

              Is this good? I think this can be accepted.

              Thank you,
              bye,
              bearophile

              Comment

              • Antoon Pardon

                #8
                Re: A note on heapq module

                On 2007-01-16, bearophileHUGS@ lycos.com <bearophileHUGS @lycos.comwrote :
                In few minutes I have just written this quite raw class, it lacks
                doctring (the same of the functions of the heapq module), it may
                contain bugs still, I haven't tested it much. It's just a simple
                wrapper around some of the functions of the heapq module (nsmallest and
                nlargest are missing). Usually I use classes only when I need then, I
                like functional programming enough, and this class seems to just slow
                down the functions of the heapq module. But I think an improved class
                similar to this one may be useful (added, replacing isn't necessary)
                into the heapq module, because it can avoid cetain errors: I know what
                Heaps are and how they work, but some time ago I have written a bug
                into small program that uses the functions of the heapq module, because
                I have added an item to the head of the heap using a normal list
                method, breaking the heap invariant. With a simple class like this one
                all the methods of your object keep the heap invariant, avoiding that
                kind of bugs. (If you are interested I can improve this class some,
                it't not difficult.)
                >
                [ Heap class based on heapq ]
                For me, your class has the same drawback as the heappush, heappop
                procedurers: no way to specify a comparision function. Somewhere
                in my experimental code I work with line segments in two dimensions.
                Now in one place I want from the available segments the one in which the
                first point is farthest to the left. In a second place I want from the
                available segments the one in which the second point is farthest to the
                left. Both can be done with a heap, but currently I can't get both
                behaviours while using the same class and using the heapq module or
                your Heap class.

                --
                Antoon Pardon

                Comment

                • Neil Cerutti

                  #9
                  Re: A note on heapq module

                  On 2007-01-16, bearophileHUGS@ lycos.com
                  <bearophileHUGS @lycos.comwrote :
                  Scott David Daniels:
                  >I'd suggest some changes. It is nice to have Heaps with equal
                  >contents equal no matter what order the inserts have been
                  >done. Consider how you want Heap([1, 2, 3]) and Heap([3, 1,
                  >2]) to behave. Similarly, it is nice to have str and repr
                  >produce canonical representations (I would skip the __str__
                  >code, myself, though). Also, subclasses should get their
                  >identities tweaked as so:
                  >
                  My version was a *raw* one, just an idea, this is a bit better:

                  I like your changes. In the beginning I didn't want to put
                  __eq__ __ne__ methods at all, because they are too much slow,
                  but being them O(n ln n) I think your solution is acceptable.
                  >
                  Some methods may have a name different from the heap functions:
                  def smallest(self):
                  def push(self, item):
                  def pop(self):
                  def replace(self, item):
                  >
                  Two things left I can see: I think the __init__ may have a
                  boolean inplace parameter to avoid copying the given list, so
                  this class is about as fast as the original heapify function
                  (but maybe such thing is too much dirty for a Python stdlib):
                  One more idea, cribbed from the linked list thread elsewhere: it
                  might be nice if your Heap could optionally use an underlying
                  collections.deq ue instead of a list.

                  I don't know how excellent Python's deque is, but it's possible a
                  deque would provide a faster heap than a contiguous array. C++'s
                  std::deque is the default implementation of C++'s
                  std::priority_q ueue for that reason (unless I'm confused again).

                  --
                  Neil Cerutti
                  We will sell gasoline to anyone in a glass container. --sign at Santa Fe gas
                  station

                  Comment

                  • Steven Bethard

                    #10
                    Re: A note on heapq module

                    Antoon Pardon wrote:
                    For me, your class has the same drawback as the heappush, heappop
                    procedurers: no way to specify a comparision function.
                    Agreed. I'd love to see something like ``Heap(key=my_k ey_func)``.

                    STeVe

                    Comment

                    • bearophileHUGS@lycos.com

                      #11
                      Re: A note on heapq module

                      Steven Bethard:
                      Antoon Pardon:
                      For me, your class has the same drawback as the heappush, heappop
                      procedurers: no way to specify a comparision function.
                      >
                      Agreed. I'd love to see something like ``Heap(key=my_k ey_func)``.
                      It can be done, but the code becomes more complex and hairy:


                      (If someone spots a problem please tell me, thank you.)

                      Bye,
                      bearophile

                      Comment

                      • bearophileHUGS@lycos.com

                        #12
                        Re: A note on heapq module

                        Neil Cerutti:
                        One more idea, cribbed from the linked list thread elsewhere: it
                        might be nice if your Heap could optionally use an underlying
                        collections.deq ue instead of a list.
                        I don't know how excellent Python's deque is, but it's possible a
                        deque would provide a faster heap than a contiguous array. C++'s
                        std::deque is the default implementation of C++'s
                        std::priority_q ueue for that reason (unless I'm confused again).
                        If you have some minutes you can do few speed tests and show us the
                        code and the timing results...

                        Bye,
                        bearophile

                        Comment

                        • Neil Cerutti

                          #13
                          Re: A note on heapq module

                          On 2007-01-18, bearophileHUGS@ lycos.com <bearophileHUGS @lycos.comwrote :
                          Neil Cerutti:
                          >One more idea, cribbed from the linked list thread elsewhere:
                          >it might be nice if your Heap could optionally use an
                          >underlying collections.deq ue instead of a list. I don't know
                          >how excellent Python's deque is, but it's possible a deque
                          >would provide a faster heap than a contiguous array. C++'s
                          >std::deque is the default implementation of C++'s
                          >std::priority_ queue for that reason (unless I'm confused
                          >again).
                          >
                          If you have some minutes you can do few speed tests and show us
                          the code and the timing results...
                          I'll see what I can come up with. I'll have to test using the
                          native Python implementation, which should work with deque,
                          rather than the optimized C implementation, which doesn't.

                          Unfortunately the inability to take advantage of the C
                          implementation of heapq might swamp any possible advantage of
                          using deque instead of list in a Heap class.

                          --
                          Neil Cerutti

                          --
                          Posted via a free Usenet account from http://www.teranews.com

                          Comment

                          • Neil Cerutti

                            #14
                            Re: A note on heapq module

                            On 2007-01-18, bearophileHUGS@ lycos.com <bearophileHUGS @lycos.comwrote :
                            Neil Cerutti:
                            >One more idea, cribbed from the linked list thread elsewhere:
                            >it might be nice if your Heap could optionally use an
                            >underlying collections.deq ue instead of a list. I don't know
                            >how excellent Python's deque is, but it's possible a deque
                            >would provide a faster heap than a contiguous array. C++'s
                            >std::deque is the default implementation of C++'s
                            >std::priority_ queue for that reason (unless I'm confused
                            >again).
                            >
                            If you have some minutes you can do few speed tests and show us
                            the code and the timing results...
                            collections.deq ue is the loser.

                            Here's the output of the program from my last run:

                            list: 5.81679827554
                            deque: 6.40347742423
                            C heapq: 2.24028186815

                            Here's the program. You can customize it somewhat to attempt to
                            model a real program. It builds up a heap of random integers to
                            some size, performs random pushes and pops for a while, and then
                            pops the heap down until it's empty.

                            import random
                            import timeit

                            OPCOUNT = 5000
                            HEAP_SIZE = 100

                            # Create a random sequence of pushes and pops.
                            pushes = 0
                            ops = []
                            for i in xrange(OPCOUNT) :
                            x = random.randint( 0, 1)
                            if x == 0 or pushes < HEAP_SIZE:
                            pushes += 1
                            ops.append(0)
                            else:
                            pushes -= 1
                            ops.append(1)
                            # Pop off the rest
                            for i in xrange(pushes):
                            ops.append(1)

                            def test(mod, cont):
                            for op in ops:
                            if op:
                            mod.heappop(con t)
                            else:
                            mod.heappush(co nt, random.randint( 0, 150))

                            # heapqd is the pure Python heapq module without the C implementation.
                            t1 = timeit.Timer("t est(heapqd, list())",
                            "from __main__ import test; import heapqd")
                            t2 = timeit.Timer("t est(heapqd, deque())",
                            "from __main__ import test; "\
                            "from collections import deque; "\
                            "import heapqd")
                            t3 = timeit.Timer("t est(heapq, list())",
                            "from __main__ import test; import heapq")
                            print "list:", t1.timeit(100)
                            print "deque:", t2.timeit(100)
                            print "C heapq:", t3.timeit(100)

                            --
                            Neil Cerutti
                            The Pastor would appreciate it if the ladies of the congregation would lend
                            him their electric girdles for the pancake breakfast next Sunday morning.
                            --Church Bulletin Blooper

                            --
                            Posted via a free Usenet account from http://www.teranews.com

                            Comment

                            • Steven Bethard

                              #15
                              Re: A note on heapq module

                              bearophileHUGS@ lycos.com wrote:
                              Steven Bethard:
                              >Antoon Pardon:
                              >>For me, your class has the same drawback as the heappush, heappop
                              >>procedurers : no way to specify a comparision function.
                              >Agreed. I'd love to see something like ``Heap(key=my_k ey_func)``.
                              >
                              It can be done, but the code becomes more complex and hairy:
                              http://rafb.net/p/iCCmDz16.html
                              Cool!

                              The current code fails when using unbound methods however::
                              >>heap = Heap()
                              >>Heap.push(hea p, 1)
                              Traceback (most recent call last):
                              File "<interacti ve input>", line 1, in <module>
                              AttributeError: type object 'Heap' has no attribute 'push'

                              I would have written the code like::

                              def smallest(self):
                              result = self._heap[0]
                              if self._key is not None:
                              result = result[2]
                              return result

                              def push(self, item):
                              if self._key is not None:
                              item = self._key(item) , self._itemid, item
                              self._itemid += 1
                              heappush(self._ heap, item)

                              def pop(self):
                              result = heappop(self._h eap)
                              if self._key is not None:
                              result = result[2]
                              return result

                              def popn(self, n):
                              result = [heappop(self._h eap) for _ in xrange(n)]
                              if self._key is not None:
                              result = [item[2] for item in result]
                              return result

                              def replace(self, item):
                              if self._key is not None:
                              item = self._key(item) , self._itemid, item
                              self._itemid += 1
                              result = heapreplace(sel f._heap, item)
                              if self._key is not None:
                              result = result[2]
                              return result

                              This allows unbound methods to work, and substantially lowers the code
                              duplication. It does add a few extra if statements, but since they're
                              ``is`` tests, they should be pretty fast.

                              STeVe

                              Comment

                              Working...