Python reliability

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • jepler@unpythonic.net

    #31
    Re: Python's garbage collection was Re: Python reliability

    On Mon, Oct 10, 2005 at 08:37:03PM +0100, Tom Anderson wrote:[color=blue]
    > So python doesn't use the old SmallTalk 80 SmallInteger hack, or similar?
    > Fair enough - the performance gain is nice, but the extra complexity would
    > be a huge pain, i imagine.[/color]

    I tried to implement this once. There was not a performance gain for general
    code, and I also made the interpreter buggy in the process.

    I wrote in 2002:
    | Many Lisp interpreters use 'tagged types' to, among other things, let
    | small ints reside directly in the machine registers.
    |
    | Python might wish to take advantage of this by designating pointers to odd
    | addresses stand for integers according to the following relationship:
    | p = (i<<1) | 1
    | i = (p>>1)
    | (due to alignment requirements on all common machines, all valid
    | pointers-to-struct have 0 in their low bit) This means that all integers
    | which fit in 31 bits can be stored without actually allocating or deallocating
    | anything.
    |
    | I modified a Python interpreter to the point where it could run simple
    | programs. The changes are unfortunately very invasive, because they
    | make any C code which simply executes
    | o->ob_type
    | or otherwise dereferences a PyObject* invalid when presented with a
    | small int. This would obviously affect a huge amount of existing code in
    | extensions, and is probably enough to stop this from being implemented
    | before Python 3000.
    |
    | This also introduces another conditional branch in many pieces of code, such
    | as any call to PyObject_TypeCh eck().
    |
    | Performance results are mixed. A small program designed to test the
    | speed of all-integer arithmetic comes out faster by 14% (3.38 vs 2.90
    | "user" time on my machine) but pystone comes out 5% slower (14124 vs 13358
    | "pystones/second").
    |
    | I don't know if anybody's barked up this tree before, but I think
    | these results show that it's almost certainly not worth the effort to
    | incorporate this "performanc e" hack in Python. I'll keep my tree around
    | for awhile, in case anybody else wants to see it, but beware that it
    | still has serious issues even in the core:
    | >>> 0+0j
    | Traceback (most recent call last):
    | File "<stdin>", line 1, in ?
    | TypeError: unsupported operand types for +: 'int' and 'complex'
    | >>> (0).__class__
    | Segmentation fault
    |
    |


    Note that the tree where I worked on this is long since lost.

    Jeff

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.1 (GNU/Linux)

    iD8DBQFDTaFYJd0 1MZaTXX0RAsOkAJ 41KG4qEmy2GeH8V DR9r7nyZaYAFgCg iBPO
    XljYbktL1wdmt0O 3892AwJA=
    =Hmpn
    -----END PGP SIGNATURE-----

    Comment

    • Tom Anderson

      #32
      Re: Python's garbage collection was Re: Python reliability

      On Tue, 11 Oct 2005, Alex Martelli wrote:
      [color=blue]
      > Tom Anderson <twic@urchin.ea rth.li> wrote:
      > ...[color=green]
      >> Has anyone looked into using a real GC for python? I realise it would be a[/color]
      >
      > If you mean mark-and-sweep, with generational twists,[/color]

      Yes, more or less.
      [color=blue]
      > that's what gc uses for cyclic garbage.[/color]

      Do you mean what python uses for cyclic garbage? If so, i hadn't realised
      that. There are algorithms for extending refcounting to cyclic structures
      (i forget the details, but you sort of go round and experimentally
      decrement an object's count and see it ends up with a negative count or
      something), so i assumed python used one of those. Mind you, those are
      probably more complex than mark-and-sweep!
      [color=blue][color=green]
      >> lot more complexity in the interpreter itself, but it would be faster,
      >> more reliable, and would reduce the complexity of extensions.[/color]
      >
      > ??? It adds no complexity (it's already there), it's slower,[/color]

      Ah. That would be why all those java, .net, LISP, smalltalk and assorted
      other VMs out there, with decades of development, hojillions of dollars
      and the serried ranks of some of the greatest figures in computer science
      behind them all use reference counting rather than garbage collection,
      then.

      No, wait ...
      [color=blue]
      > it is, if anything, LESS reliable than reference counting (which is way
      > simpler!),[/color]

      Reliability is a red herring - in the absence of ill-behaved native
      extensions, and with correct implementations , both refcounting and GC are
      perfectly reliable. And you can rely on the implementation being correct,
      since any incorrectness will be detected very quickly!
      [color=blue]
      > and (if generalized to deal with ALL garbage) it might make it almost
      > impossible to write some kinds of extensions (ones which need to
      > interface existing C libraries that don't cooperate with whatever GC
      > collection you choose).[/color]

      Lucky those existing C libraries were written to use python's refcounting!

      Oh, you have to write a wrapper round the library to interface with the
      automatic memory management? Well, as it happens, the stuff you need to do
      is more or less identical for refcounting and GC - the extension has to
      tell the VM which of the VM's objects it holds references to, so that the
      VM knows that they aren't garbage.
      [color=blue]
      > Are we talking about the same thing?![/color]

      Doesn't look like it, does it?
      [color=blue][color=green]
      >> So python doesn't use the old SmallTalk 80 SmallInteger hack, or similar?
      >> Fair enough - the performance gain is nice, but the extra complexity would
      >> be a huge pain, i imagine.[/color]
      >
      > CPython currently is implemented on a strict "minimize all tricks"
      > strategy.[/color]

      A very, very sound principle. If you have the aforementioned decades,
      hojillions and serried ranks, an all-tricks-turned-up-to-eleven strategy
      can be made to work. If you're a relatively small non-profit outfit like
      the python dev team, minimising tricks buys you reliability and agility,
      which is, really, what we all want.

      tom

      --
      That's no moon!

      Comment

      • Fredrik Lundh

        #33
        Re: Python's garbage collection was Re: Python reliability

        Tom Anderson wrote:
        [color=blue]
        > In both smalltalk and python, every single variable contains a reference
        > to an object - there isn't the object/primitive distinction you find in
        > less advanced languages like java.
        >
        > Except that in smalltalk, this isn't true: in ST, every variable *appears*
        > to contain a reference to an object, but implementations may not actually
        > work like that.[/color]

        Python implementations don't have to work that way either. Please don't
        confuse "the Python language" with "the CPython implementation" and with
        other implementations (existing as well as hypothetical).

        (fwiw, switching to tagging in CPython would break most about everything.
        might as well start over, and nobody's likely to do that to speed up integer-
        dominated programs a little...)

        </F>



        Comment

        • Paul Rubin

          #34
          Re: Python's garbage collection was Re: Python reliability

          "Fredrik Lundh" <fredrik@python ware.com> writes:[color=blue]
          > (fwiw, switching to tagging in CPython would break most about
          > everything. might as well start over, and nobody's likely to do
          > that to speed up integer- dominated programs a little...)[/color]

          Yeah, a change of that magnitude in CPython would be madness, but
          the question is well worth visiting for PyPy.

          Comment

          • Alex Martelli

            #35
            Re: Python's garbage collection was Re: Python reliability

            Tom Anderson <twic@urchin.ea rth.li> wrote:
            [color=blue]
            > On Tue, 11 Oct 2005, Alex Martelli wrote:
            >[color=green]
            > > Tom Anderson <twic@urchin.ea rth.li> wrote:
            > > ...[color=darkred]
            > >> Has anyone looked into using a real GC for python? I realise it would be a[/color]
            > >
            > > If you mean mark-and-sweep, with generational twists,[/color]
            >
            > Yes, more or less.
            >[color=green]
            > > that's what gc uses for cyclic garbage.[/color]
            >
            > Do you mean what python uses for cyclic garbage? If so, i hadn't realised[/color]

            Yes, gc (a standard library module) gives you access to the mechanism
            (to some reasonable extent).
            [color=blue]
            > that. There are algorithms for extending refcounting to cyclic structures
            > (i forget the details, but you sort of go round and experimentally
            > decrement an object's count and see it ends up with a negative count or
            > something), so i assumed python used one of those. Mind you, those are
            > probably more complex than mark-and-sweep![/color]

            Not sure about that, when you consider the "generation al twists", but
            maybe.

            [color=blue][color=green][color=darkred]
            > >> lot more complexity in the interpreter itself, but it would be faster,
            > >> more reliable, and would reduce the complexity of extensions.[/color]
            > >
            > > ??? It adds no complexity (it's already there), it's slower,[/color]
            >
            > Ah. That would be why all those java, .net, LISP, smalltalk and assorted
            > other VMs out there, with decades of development, hojillions of dollars
            > and the serried ranks of some of the greatest figures in computer science
            > behind them all use reference counting rather than garbage collection,
            > then.
            >
            > No, wait ...[/color]

            Not everybody agrees that "practicali ty beats purity", which is one of
            Python's principles. A strategy based on PURE reference counting just
            cannot deal with cyclic garbage -- you'd also need the kind of kludges
            you refer to above, or a twin-barreled system like Python's. A strategy
            based on PURE mark-and-sweep *CAN* be complete and correct... at the
            cost of horrid delays, of course, but what's such a practical
            consideration to a real purist?-)

            In practice, more has probably been written about garbage collection
            implementations than about almost every issue in CS (apart from sorting
            and searching;-). Good techniques need to be "incrementa l" -- the need
            to "stop the world" for unbounded amounts of time (particularly in a
            paged virtual memory world...), typical of pure m&s (even with
            generational twists), is simply unacceptable in all but the most "batch"
            type of computations, which occupy a steadily narrowing niche.
            Reference counting is intrinsically "reasonably incremental"; the
            worst-case of very long singly-linked lists (such that a dec-to-0 at the
            head causes a cascade of N dec-to-0's all along) is as rare in Python as
            it is frequent in LISP (and other languages that go crazy with such
            lists -- Haskell, which defines *strings* as single linked lists of
            characters, being a particularly egregious example) [[admittedly, the
            techniques for amortizing the cost of such worst-cases are well known in
            any case, though CPython has not implemented them]].

            In any case, if you like Python (which is a LANGUAGE, after all) and
            don't like one implementation of it, why not use a different
            implementation, which uses a different virtual machine? Jython, for the
            JVM, and IronPython, for MSCLR (presumably what you call ".net"), are
            quite usable; project pypy is producing others (an implementation based
            on Common LISP was one of the first practical results, over a year ago);
            not to count Parrot, and other projects yet...

            [color=blue][color=green]
            > > it is, if anything, LESS reliable than reference counting (which is way
            > > simpler!),[/color]
            >
            > Reliability is a red herring - in the absence of ill-behaved native
            > extensions, and with correct implementations , both refcounting and GC are
            > perfectly reliable. And you can rely on the implementation being correct,
            > since any incorrectness will be detected very quickly![/color]

            Not necessarily: tiny memory leaks in supposedly "stable" versions of
            the JVM, for example, which get magnified in servers operating for
            extremely long times and on very large scales, keep turning up. So, you
            can't count on subtle and complicated implementations of garbage
            collection algorithms being correct, any more than you can count on that
            for (for example) subtle and complicated optimizations -- corner cases
            can be hidden everywhere.

            There are two ways to try to make a software system reliable: make it so
            simple that it obviously has no bugs, or make it so complicated that it
            has no obvious bugs. RC is definitely tilted towards the first of the
            two options (and so would be mark-and-sweep in the pure form, the one
            where you may need to stop everything for a LONG time once in a while),
            while more sophisticated GC schemes get more and more complicated.

            BTW, RC _IS_ a form of GC, just like, say, MS is.

            [color=blue][color=green]
            > > and (if generalized to deal with ALL garbage) it might make it almost
            > > impossible to write some kinds of extensions (ones which need to
            > > interface existing C libraries that don't cooperate with whatever GC
            > > collection you choose).[/color]
            >
            > Lucky those existing C libraries were written to use python's refcounting!
            >
            > Oh, you have to write a wrapper round the library to interface with the
            > automatic memory management? Well, as it happens, the stuff you need to do
            > is more or less identical for refcounting and GC - the extension has to
            > tell the VM which of the VM's objects it holds references to, so that the
            > VM knows that they aren't garbage.[/color]

            Ah, but there is an obvious difference, when we're comparing reference
            counting with mark and sweep in similarly simple incarnations: reference
            counting has no "sweep" phase! M&S relies on any memory area not
            otherwise accounted for being collectable during the "sweep" part, while
            RC will intrinsically and happily leave alone any memory area it does
            not know about. Adding sophistication to M&S often makes things even
            more ticklish, if there are "random" pieces of memory which must be
            hands-off -- the existing C library you're interfacing may give you no
            idea, on an API-accessible level, where such internal "random" pieces
            might be at any time. E.g., said existing libraries might be returning
            to you "opaque handles" -- say you know they're pointers (already you're
            having to breach the encapsulation and abstraction of the library you're
            interfacing...) , but pointers to WHAT? To structures which may
            internally hold other pointers yet -- and what do THOSE point to...?

            By ``handwaving'' about "the VM's objects" you imply a distinction
            between such "objects" and other generic "areas of memory" that may not
            be easy to maintain. In RC, no problem: the reference-counting
            operations intrinsically discriminate (you don't addref or decref to
            anything but such an "object"). In MS, the problem is definitely there;
            your VM's allocator needs to be able to control the "sweep", which may
            require quite a lot of extra overhead if it can't just assume it "owns"
            all of the memory.

            [color=blue][color=green]
            > > Are we talking about the same thing?![/color]
            >
            > Doesn't look like it, does it?[/color]

            Apparently not. Most of my "production-level" implementations of
            garbage collection schemes hark back to the late '70s (as part of my
            thesis) and early '80s (working at Texas Instruments to architect a
            general purpose CPU with some kind of GC support in hardware); after
            leaving the field, when I got back to it, years later, I essentially
            found out that the universality of paged virtual memory had changed
            every single parameter in the game. I did some work on
            pointer-swizzling "incrementa l sort-of-compacting" collectors, and
            conservative M&S a la Boehm, but by that time I was more interested in
            real-world applications and none of those efforts ever yielded anything
            practical and production-quality -- so, I either used existing libraries
            and VMs (and more often than not cursed at them -- and, generally, the
            FFIs whose gyrations I had to go through to work with them), or, when I
            was implementing GC in my applications, relied on simple and solid
            techniques such as variants on reference-counting, arenas, etc etc.

            The crusher was a prototype application based on Microsoft's CLR (or
            ".net", as you call it) which needed to use MSCLR's ``advanced, modern,
            sophisticated'' GC for many new parts, and "unmanaged" mode for a lot of
            existing "legacy" libraries and subsystems. I won't say that it wasted
            a year of my life, because I was leading that effort at about
            half-time... so, it only wasted HALF a year of my life!-) Of course,
            that was in the bad dark ages of about 5 years ago -- I'm sure that by
            now everything is perfect and flawless and the experience of the
            previous 25 years is thereby nullified, right?-)

            From your tone I assume that your experience in implementing and
            cooperating with modern, "advanced" GC techniques is much fresher and
            more successful than mine. Personally, I'm just happy that other Python
            developers must clearly have scars similar to mine in these respects, so
            that Python's implementation is solid and conservative, one whose
            correctness you CAN essentially count on.

            [color=blue][color=green][color=darkred]
            > >> So python doesn't use the old SmallTalk 80 SmallInteger hack, or similar?
            > >> Fair enough - the performance gain is nice, but the extra complexity would
            > >> be a huge pain, i imagine.[/color]
            > >
            > > CPython currently is implemented on a strict "minimize all tricks"
            > > strategy.[/color]
            >
            > A very, very sound principle. If you have the aforementioned decades,
            > hojillions and serried ranks, an all-tricks-turned-up-to-eleven strategy
            > can be made to work.[/color]

            78% of software projects fail -- and I believe the rate of failures in
            large IT departments and software houses is higher than average.
            [color=blue]
            > If you're a relatively small non-profit outfit like
            > the python dev team, minimising tricks buys you reliability and agility,
            > which is, really, what we all want.[/color]

            And if you're a relatively large (and tumultuously growing),
            pretty-good-profit outfit, minimizing tricks builds you scalability and
            solidity. Funny enough, I've found that my attitude that "clarity,
            solidity and prudence are THE prime criteria of good implementations ",
            born of thirty years' worth of scars and "arrows in my back", made me an
            "instant cultural fit" for Google (which I joined as Uber Technical Lead
            just over six months ago, and where I'm currently happily prospering)...


            Alex

            Comment

            Working...