Python Memory Manager

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Pie Squared

    Python Memory Manager

    I've been looking at the Python source code recently, more
    specifically trying to figure out how it's garbage collector works.

    I've gathered that it uses refcounting as well as some cycle-detection
    algorithms, but I haven't been able to figure out some other things.

    Does Python actually have a single 'heap' where all the data is
    stored? Because PyObject_HEAD seemed to imply to me it was just a
    linked list of objects, although perhaps I didnt understand this
    correctly.

    Also, if it does, how does it deal with memory segmentation? This
    question bothers me because I've been trying to implement a moving
    garbage collector, and am not sure how to deal with updating all
    program pointers to objects on the heap, and thought perhaps an answer
    to this question would give me some ideas. (Also, if you know any
    resources for things like this, I'd be grateful for links/names)

    Thanks in advance,

    Pie Squared
  • Paul Rubin

    #2
    Re: Python Memory Manager

    Pie Squared <PieSquared@gma il.comwrites:
    Also, if it does, how does it deal with memory segmentation? This
    question bothers me because I've been trying to implement a moving
    garbage collector, and am not sure how to deal with updating all
    program pointers to objects on the heap, and thought perhaps an answer
    to this question would give me some ideas.
    As I understand it, Python primarily uses reference counting, with a
    mark and sweep scheme for cycle breaking tacked on as an afterthought.
    It doesn't move objects in memory during GC so you can get
    fragmentation. It's probably not feasible to change this in CPython
    without extensive rewriting of CPython and maybe a lot of external C
    modules.
    (Also, if you know any
    resources for things like this, I'd be grateful for links/names)
    If you mean about GC in general, the old book by Jones and Lins is
    still standard, I think.

    Comment

    • Pie Squared

      #3
      Re: Python Memory Manager

      On Feb 17, 1:57 pm, Paul Rubin <http://phr...@NOSPAM.i nvalidwrote:
      Pie Squared <PieSqua...@gma il.comwrites:
      Also, if it does, how does it deal with memory segmentation? This
      question bothers me because I've been trying to implement a moving
      garbage collector, and am not sure how to deal with updating all
      program pointers to objects on the heap, and thought perhaps an answer
      to this question would give me some ideas.
      >
      As I understand it, Python primarily uses reference counting, with a
      mark and sweep scheme for cycle breaking tacked on as an afterthought.
      It doesn't move objects in memory during GC so you can get
      fragmentation. It's probably not feasible to change this in CPython
      without extensive rewriting of CPython and maybe a lot of external C
      modules.
      >
      (Also, if you know any
      resources for things like this, I'd be grateful for links/names)
      >
      If you mean about GC in general, the old book by Jones and Lins is
      still standard, I think.
      Thanks for the quick reply!

      That answered my question, and I'll check out the book you're
      referring to - it's exactly what I need, I think. Again, thanks!

      Comment

      • Christian Heimes

        #4
        Re: Python Memory Manager

        Pie Squared wrote:
        I've been looking at the Python source code recently, more
        specifically trying to figure out how it's garbage collector works.
        >
        I've gathered that it uses refcounting as well as some cycle-detection
        algorithms, but I haven't been able to figure out some other things.
        Python uses ref counting for all objects and an additional GC for
        container objects like lists and dicts. The cyclic GC depends on ref
        counting.
        Does Python actually have a single 'heap' where all the data is
        stored? Because PyObject_HEAD seemed to imply to me it was just a
        linked list of objects, although perhaps I didnt understand this
        correctly.
        In release builds PyObject_HEAD only contains the ref count and a link
        to the object type. In Py_DEBUG builds it also contains a double linked
        list of all allocated objects to debug reference counting bugs.

        Python uses its own optimized memory allocator. Be sure you have read
        Object/obmalloc.c!
        Also, if it does, how does it deal with memory segmentation? This
        question bothers me because I've been trying to implement a moving
        garbage collector, and am not sure how to deal with updating all
        program pointers to objects on the heap, and thought perhaps an answer
        to this question would give me some ideas. (Also, if you know any
        resources for things like this, I'd be grateful for links/names)
        I don't think it's possible to implement a moving GC. You'd have to
        chance some fundamental parts of Python.

        Christian

        Comment

        • thebjorn

          #5
          Re: Python Memory Manager

          On Feb 17, 10:01 pm, Pie Squared <PieSqua...@gma il.comwrote:
          [...]
          It seems to me that another, perhaps better strategy, would be to
          allocate a large heap space, then store a pointer to the base of the
          heap, the current heap size, and the beginning of the free memory.
          When you need to 'allocate' more room, just return a pointer to some
          location in the heap and increment the start-of-free-memory pointer.
          That way, allocation really IS free, more or less. Wouldn't that be
          more efficient? Perhaps I'm missing something.
          Deallocation?
          As a side note, I'm new to Usenet, so I'm not exactly sure... are
          'tangents' like this - since this IS a Python newsgroup, after all -
          okay?
          It varies depending on the group, c.l.py is pretty tolerant as long as
          it's interesting ;-)

          To bring it back to Python, I was under the impression that the GC was
          a generational collector and not a simple mark-sweep, but it's been a
          while since I read about it...

          -- bjorn

          Comment

          • =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

            #6
            Re: Python Memory Manager

            >Also, if it does, how does it deal with memory segmentation? This
            >question bothers me because I've been trying to implement a moving
            >garbage collector, and am not sure how to deal with updating all
            >program pointers to objects on the heap, and thought perhaps an answer
            >to this question would give me some ideas.
            >
            As I understand it, Python primarily uses reference counting, with a
            mark and sweep scheme for cycle breaking tacked on as an afterthought.
            That's not exactly true, i.e. it isn't mark-and-sweep, but some similar
            scheme that allows incremental collection without write barriers. This
            particular scheme heavily relies on refcounting itself (specifically,
            an object is garbage in a certain generation when all references to
            it come from the same generation).

            As for the consequences of the scheme (i.e. no compaction), you are
            right.

            Regards,
            Martin

            Comment

            • Paul Rubin

              #7
              Re: Python Memory Manager

              "Martin v. Löwis" <martin@v.loewi s.dewrites:
              That's not exactly true, i.e. it isn't mark-and-sweep, but some similar
              scheme that allows incremental collection without write barriers. This
              particular scheme heavily relies on refcounting itself (specifically,
              an object is garbage in a certain generation when all references to
              it come from the same generation).
              Ah, thanks. I made another post which I guess is also somewhat wrong.

              Comment

              • greg

                #8
                Re: Python Memory Manager

                Christian Heimes wrote:
                In release builds PyObject_HEAD only contains the ref count and a link
                to the object type. In Py_DEBUG builds it also contains a double linked
                list of all allocated objects to debug reference counting bugs.
                There's also a doubly-linked list used by the cycle detector,
                but it doesn't hold all objects, only those that need to
                participate in cyclic GC. Objects such as numbers and strings,
                which don't contain references to other objects, don't need
                to be on this list, since they can never be part of a cycle.

                Some other immutable types such as tuples also don't need to
                be on it, even though they contain references, because it's
                impossible to create a cycle consisting entirely of such
                objects. There has to be at least one mutable object in the
                cycle, and the GC will be able to find the cycle via that
                object.

                --
                Greg

                Comment

                • MartinRinehart@gmail.com

                  #9
                  Re: Python Memory Manager



                  Paul Rubin wrote:
                  The problem here is with a high allocation rate, you have to GC a lot
                  more often, which typically involves copying live data.
                  This is last century's issue. Copying data, RAM to RAM, is nearly free
                  using the Intel architecture.

                  This short article, http://www.martinrinehart.com/articles/repz.html
                  explains why.

                  I'd use one int per clock as a rule of thumb for the current copy rate
                  in a single-core CPU.

                  Comment

                  • rbossy@jouy.inra.fr

                    #10
                    Re: Python Memory Manager

                    Quoting Steve Holden <steve@holdenwe b.com>:
                    [...]
                    Not only that, but all pointers to an object have to be updated when it
                    is relocated.
                    "Any problem in computer science can be solved by another level of indirection"
                    -- David John Wheeler

                    ;-)

                    RB

                    Comment

                    • Jeff Schwab

                      #11
                      Re: Python Memory Manager

                      MartinRinehart@ gmail.com wrote:
                      >
                      Paul Rubin wrote:
                      >The problem here is with a high allocation rate, you have to GC a lot
                      >more often, which typically involves copying live data.
                      >
                      This is last century's issue. Copying data, RAM to RAM, is nearly free
                      using the Intel architecture.
                      What's "the Intel architecture?" Do you mean the x86_64 architecture
                      that was actually developed by AMD, or x86 for x some number, or do
                      you actually mean IA64?
                      >
                      This short article, http://www.martinrinehart.com/articles/repz.html
                      explains why.
                      >
                      I'd use one int per clock as a rule of thumb for the current copy rate
                      in a single-core CPU.

                      Comment

                      • MartinRinehart@gmail.com

                        #12
                        Re: Python Memory Manager



                        Jeff Schwab wrote:
                        What's "the Intel architecture?" Do you mean the x86_64 architecture
                        that was actually developed by AMD, or x86 for x some number, or do
                        you actually mean IA64?
                        I mean chips of the family that goes back to the 8086 and 8088 chips,
                        chips that support the REPZ prefix to the MOVSW instruction.

                        Comment

                        • Paul Rubin

                          #13
                          Re: Python Memory Manager

                          MartinRinehart@ gmail.com writes:
                          I mean chips of the family that goes back to the 8086 and 8088 chips,
                          chips that support the REPZ prefix to the MOVSW instruction.
                          repz movsw is a pretty lame way to copy data on current x86's.
                          Use XMM instead.

                          Comment

                          • MartinRinehart@gmail.com

                            #14
                            Re: Python Memory Manager



                            Steve Holden wrote:
                            You have a strange idea of "nearly free" ...
                            >
                            Extending an integer array from 100 to 150 items is a pretty puny
                            operation when you compare it with the amount of data that might need to
                            be moved during a compactifying garbage collection of a 20MB Python
                            program image.
                            20 MBs = 5 M 32-bit words = 1.25 millis to move half of them on a
                            2GHz
                            machine. Don't know how much a milli costs where you live.


                            Comment

                            • MartinRinehart@gmail.com

                              #15
                              Re: Python Memory Manager



                              Paul Rubin wrote:
                              repz movsw is a pretty lame way to copy data on current x86's.
                              Use XMM instead.
                              Thank you, Paul. I'm pretty sure you meant MMX, Multi-Media
                              eXtensions.

                              Paul's just told me to upgrade my 32-bit thinking to use newer 64-bit
                              registers, even on a 32-bit cpu. Please divide my prior timings by
                              two.
                              Pentium or later required.

                              Comment

                              Working...