2.6, 3.0, and truly independent interpreters

  • Jesse Noller

    #31
    Re: 2.6, 3.0, and truly independent interpreters

    On Fri, Oct 24, 2008 at 3:17 PM, Andy O'Meara <andy55@gmail.com> wrote:
    > I'm a lousy writer sometimes, but I feel bad if you took the time to
    > describe threads vs processes. The only reason I raised IPC with my
    > "messaging isn't very attractive" comment was to respond to Glenn
    > Linderman's points regarding tradeoffs of shared memory vs no.

    I actually took the time to bring anyone listening in up to speed, and
    to clarify so I could better understand your use case. Don't feel bad,
    things in the thread are moving fast and I just wanted to clear it up.

    Ideally, we all want to improve the language and the interpreter.
    However, trying to push it towards a particular use case is dangerous,
    given the idea of "general use".

    -jesse


    • Rhamphoryncus

      #32
      Re: 2.6, 3.0, and truly independent interpreters

      On Oct 24, 1:02 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
      > On approximately 10/24/2008 8:42 AM, came the following characters from
      > the keyboard of Andy O'Meara:
      >> Glenn, great post and points!
      >
      > Thanks. I need to admit here that while I've got a fair bit of
      > professional programming experience, I'm quite new to Python -- I've not
      > learned its internals, nor even the full extent of its rich library. So
      > I have some questions that are partly about the goals of the
      > applications being discussed, partly about how Python is constructed,
      > and partly about how the library is constructed. I'm hoping to get a
      > better understanding of all of these; perhaps once a better
      > understanding is achieved, limitations will be understood, and maybe
      > solutions will be achievable.
      >
      > Let me define some speculative Python interpreters; I think the first is
      > today's Python:
      >
      > PyA: Has a GIL. PyA threads can run within a process, but are
      > effectively serialized to the places where the GIL is obtained/released.
      > Needs the GIL because that solves lots of problems with non-reentrant
      > code (an example of non-reentrant code is code that uses global (C
      > global, or C static) variables -- note that I'm not talking about Python
      > vars declared global... they are only module global). In this model,
      > non-reentrant code could include pieces of the interpreter, and/or
      > extension modules.
      >
      > PyB: No GIL. PyB threads acquire/release a lock around each reference to
      > a global variable (like the "with" feature). Requires massive recoding of
      > all code that contains global variables. Reduces performance
      > significantly through the increased cost of obtaining and releasing locks.
      >
      > PyC: No locks. Instead, recoding is done to eliminate global variables
      > (the interpreter requires a state structure to be passed in). Extension
      > modules that use globals are prohibited... this eliminates large
      > portions of the library, or requires massive recoding. PyC threads do
      > not share data between threads except by explicit interfaces.
      >
      > PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate
      > global variables, and each interpreter instance is provided a state
      > structure. There is still a GIL, however, because globals are
      > potentially still used by some modules. Code is added to detect use of
      > global variables by a module, or some contract is written whereby a
      > module can be declared to be reentrant and global-free. PyA threads will
      > obtain the GIL as they would today. PyC threads would be available to be
      > created. PyC instances refuse to call non-reentrant modules, but also
      > need not obtain the GIL... PyC threads would have limited module support
      > initially, but over time most modules can be migrated to be reentrant
      > and global-free, so they can be used by PyC instances. Most 3rd-party
      > libraries today are starting to care about reentrancy anyway, because of
      > the popularity of threads.

      PyE: objects are reclassified as shareable or non-shareable; many
      types are now only allowed to be shareable. A module and its classes
      become shareable with the use of a __future__ import, and their
      shareddict uses a read-write lock for scalability. Most other
      shareable objects are immutable. Each thread is run in its own
      private monitor, and thus protected from the normal threading memory
      model nasties. Alas, this gives you all the semantics, but you still
      need scalable garbage collection... and CPython's refcounting needs the
      GIL.

      >> Our software runs in real time (so performance is paramount),
      >> interacts with other static libraries, depends on worker threads to
      >> perform real-time image manipulation, and leverages Windows and Mac OS
      >> API concepts and features.  Python's performance hits have generally
      >> been a huge challenge with our animators because they often have to go
      >> back and massage their python code to improve execution performance.
      >> So, in short, there are many reasons why we use python as a part
      >> rather than a whole.
      >> [...]
      >> As a python language fan and enthusiast, don't let lua win!  (I say
      >> this endearingly of course--I have the utmost respect for both
      >> communities and I only want to see CPython be an attractive pick when
      >> a company is looking to embed a language that won't intrude upon their
      >> app's design).

      I agree with the problem, and desire to make python fill all niches,
      but let's just say I'm more ambitious with my solution. ;)


      • Andy O'Meara

        #33
        Re: 2.6, 3.0, and truly independent interpreters


        Another great post, Glenn!! Very well laid-out and posed!! Thanks for
        taking the time to lay all that out.

        > Questions for Andy: is the type of work you want to do in independent
        > threads mostly pure Python? Or with libraries that you can control to
        > some extent? Are those libraries reentrant? Could they be made
        > reentrant? How much of the Python standard library would need to be
        > available in reentrant mode to provide useful functionality for those
        > threads? I think you want PyC

        I think you've defined everything perfectly, and you're of course
        correct about my love for the PyC model. :^)

        Like any software that's meant to be used without restrictions, our
        code and frameworks always use a context object pattern, so that
        there's never any non-const global/shared data. I would go as far as to
        say that this is the case with more performance-oriented software than
        you may think, since it's usually a given for us to have to be parallel
        friendly in as many ways as possible. Perhaps Patrick can back me up
        there.
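
        (To make that concrete: a minimal sketch of the context-object
        pattern in Python -- the class and function names here are
        hypothetical, purely for illustration. All mutable state lives in
        an explicitly passed context rather than in module-level globals,
        so any number of instances can run side by side.)

        class RenderContext:
            """Owns all state for one independent rendering session."""
            def __init__(self, width, height):
                self.width = width
                self.height = height
                self.frame_count = 0      # per-context, never shared

        def render_frame(ctx, job):
            # Touches only the context passed in -- reentrant by construction.
            ctx.frame_count += 1
            return "frame %d of %s at %dx%d" % (
                ctx.frame_count, job, ctx.width, ctx.height)

        # Two contexts coexist with no shared mutable data between them:
        a = RenderContext(640, 480)
        b = RenderContext(1920, 1080)
        print(render_frame(a, "intro"))
        print(render_frame(b, "intro"))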

        As to what modules are "essential"... As you point out, once
        reentrant module implementations caught on in a PyC or hybrid world, I
        think we'd start to see real effort to whip them into compliance--
        there's just so much to be gained imho. But to answer the question,
        there's the obvious ones (operator, math, etc), string/buffer
        processing (string, re), C bridge stuff (struct, array), and OS basics
        (time, file system, etc). Nice-to-haves would be buffer and image
        decompression (zlib, libpng, etc), crypto modules, and xml. As far as
        I can imagine, I have to believe all of these modules already contain
        little, if any, global data, so I have to believe they'd be super easy
        to make "PyC happy". Patrick, what would you see you guys using?

        >> That's the rub...  In our case, we're doing image and video
        >> manipulation--stuff not good to be messaging from address space to
        >> address space.  The same argument holds for numerical processing with
        >> large data sets.  The workers handing back huge data sets via
        >> messaging isn't very attractive.
        >
        > In the module multiprocessing environment could you not use shared
        > memory, then, for the large shared data items?

        As I understand things, multiprocessing puts stuff in a child
        process (i.e. a separate address space), so the only way to get stuff
        to/from it is via IPC, which can include a shared/mapped memory region.
        Unfortunately, a shared address region doesn't work when you have
        large and opaque objects (e.g. a rendered CoreVideo movie in the
        QuickTime API or 300 megs of audio data that just went through a
        DSP). Then you've got the hit of serialization if you've got
        intricate data structures (that would normally need to be
        serialized, such as a hashtable or something). Also, if I may speak
        for commercial developers out there who are just looking to get the
        job done without new code, it's almost always preferable to use just a
        single high-level sync object (for when the job is complete) than to
        start a child process and use IPC. The former is just WAY less
        code, plain and simple.


        Andy



        • Jesse Noller

          #34
          Re: 2.6, 3.0, and truly independent interpreters

          On Fri, Oct 24, 2008 at 4:51 PM, Andy O'Meara <andy55@gmail.com> wrote:
          >> In the module multiprocessing environment could you not use shared
          >> memory, then, for the large shared data items?
          >
          > As I understand things, multiprocessing puts stuff in a child
          > process (i.e. a separate address space), so the only way to get stuff
          > to/from it is via IPC, which can include a shared/mapped memory region.
          > Unfortunately, a shared address region doesn't work when you have
          > large and opaque objects (e.g. a rendered CoreVideo movie in the
          > QuickTime API or 300 megs of audio data that just went through a
          > DSP). Then you've got the hit of serialization if you've got
          > intricate data structures (that would normally need to be
          > serialized, such as a hashtable or something). Also, if I may speak
          > for commercial developers out there who are just looking to get the
          > job done without new code, it's almost always preferable to use just a
          > single high-level sync object (for when the job is complete) than to
          > start a child process and use IPC. The former is just WAY less
          > code, plain and simple.

          Are you familiar with the API at all? Multiprocessing was designed to
          mimic threading in about every way possible; the only restriction on
          shared data is that it must be serializable, but even then you can
          override or customize the behavior.

          Also, inter-process communication is done via pipes. It can also be
          done with messages if you want to tweak the manager(s).

          -jesse
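
          (For readers who haven't seen the API Jesse is referring to, here
          is a minimal sketch of the shared-memory side of multiprocessing.
          The worker function and sizes are made up for illustration; the
          point is that the big array never crosses the pipe -- only the
          join does the synchronization.)

          import multiprocessing

          def worker(shared, start, stop):
              # Writes land directly in the shared block -- no serialization
              # of the bulk data.
              for i in range(start, stop):
                  shared[i] = i * 2.0

          if __name__ == '__main__':
              # One contiguous block of C doubles in shared memory.
              data = multiprocessing.Array('d', 1000, lock=False)
              p = multiprocessing.Process(target=worker, args=(data, 0, 1000))
              p.start()
              p.join()                   # the single high-level sync object
              print(data[0], data[999])  # 0.0 1998.0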


          • Rhamphoryncus

            #35
            Re: 2.6, 3.0, and truly independent interpreters

            On Oct 24, 2:59 pm, Glenn Linderman <gl...@nevcal.com> wrote:
            > On approximately 10/24/2008 1:09 PM, came the following characters from
            > the keyboard of Rhamphoryncus:
            >> PyE: objects are reclassified as shareable or non-shareable; many
            >> types are now only allowed to be shareable.  A module and its classes
            >> become shareable with the use of a __future__ import, and their
            >> shareddict uses a read-write lock for scalability.  Most other
            >> shareable objects are immutable.  Each thread is run in its own
            >> private monitor, and thus protected from the normal threading memory
            >> model nasties.  Alas, this gives you all the semantics, but you still
            >> need scalable garbage collection... and CPython's refcounting needs the
            >> GIL.
            >
            > Hmm.  So I think your PyE is an attempt to be more
            > explicit about what I said above in PyC: PyC threads do not share data
            > between threads except by explicit interfaces.  I consider your
            > definitions of shared data types somewhat orthogonal to the types of
            > threads, in that both PyA and PyC threads could use these new shared
            > data items.

            Unlike PyC, there's a *lot* shared by default (classes, modules,
            functions), but it requires only minimal recoding. It's as close to
            "have your cake and eat it too" as you're gonna get.

            > I think/hope that you meant that "many types are now only allowed to be
            > non-shareable"?  At least, I think that should be the default; they
            > should be within the context of a single, independent interpreter
            > instance, so other interpreters don't even know they exist, much less
            > how to share them.  If so, then I understand most of the rest of your
            > paragraph, and it could be a way of providing shared objects, perhaps.

            There aren't multiple interpreters under my model. You only need
            one. Instead, you create a monitor, and run a thread on it. A list
            is not shareable, so it can only be used within the monitor it's
            created within, but the list type object is shareable.

            I've no interest in *requiring* a C/C++ extension to communicate
            between isolated interpreters. Without that they're really no better
            than processes.
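
            (The monitor confinement idea might be easier to picture with a
            sketch. This is *not* safethread's actual API -- just a
            hypothetical Python-level analogue for illustration: an object
            created inside a monitor is only ever touched while that
            monitor's lock is held.)

            import threading

            class Monitor:
                # Hypothetical illustration, not safethread's real interface.
                def __init__(self):
                    self._lock = threading.Lock()

                def run(self, func, *args):
                    # All access to monitor-confined objects happens under
                    # the lock, so two threads never mutate the list at once.
                    with self._lock:
                        return func(*args)

            mon = Monitor()
            items = mon.run(list)    # the list is "born" inside the monitor

            threads = [threading.Thread(target=mon.run, args=(items.append, i))
                       for i in range(4)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            print(sorted(items))     # [0, 1, 2, 3]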


            • Rhamphoryncus

              #36
              Re: 2.6, 3.0, and truly independent interpreters

              On Oct 24, 3:02 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
              > On approximately 10/23/2008 2:24 PM, came the following characters from the
              > keyboard of Rhamphoryncus:
              >> On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
              >>> On approximately 10/23/2008 12:24 AM, came the following characters from
              >>> the keyboard of Christian Heimes:
              >>>> Andy wrote:
              >>>> I'm very - not absolute, but very - sure that Guido and the initial
              >>>> designers of Python would have added the GIL anyway. The GIL makes
              >>>> Python faster on single core machines and more stable on multi core
              >>>> machines.
              >
              > Actually, the GIL doesn't make Python faster; it is a design decision that
              > reduces the overhead of lock acquisition, while still allowing use of global
              > variables.
              >
              > Using finer-grained locks has higher run-time cost; eliminating the use of
              > global variables has a higher programmer-time cost, but would actually run
              > faster and more concurrently than using a GIL. Especially on a
              > multi-core/multi-CPU machine.

              Those "globals" include classes, modules, and functions. You can't
              have *any* objects shared. Your interpreters are entirely isolated,
              much like processes (and we all start wondering why you don't use
              processes in the first place.)

              Or use safethread. It imposes safe semantics on shared objects, so
              you can keep your global classes, modules, and functions. Still need
              garbage collection though, and on CPython that means refcounting and
              the GIL.

              >> Another peeve I have is his characterization of the observer pattern.
              >> The generalized form of the problem exists in both single-threaded
              >> sequential programs, in the form of unexpected reentrancy, and message
              >> passing, with infinite CPU usage or infinite number of pending
              >> messages.
              >
              > So how do you get reentrancy in a single-threaded sequential program? I
              > think only via recursion? Which isn't a serious issue for the observer
              > pattern. If you add interrupts, then your program is no longer sequential.

              Sorry, I meant recursion. Why isn't it a serious issue for
              single-threaded programs? Just the fact that it's much easier to
              handle when it does happen?

              >> Try looking at it on another level: when your CPU wants to read from a
              >> bit of memory controlled by another CPU it sends them a message
              >> requesting they get it for us. They send back a message containing
              >> that memory. They also note we have it, in case they want to modify
              >> it later. We also note where we got it, in case we want to modify it
              >> (and not wait for them to do modifications for us).
              >
              > I understand that level... one of my degrees is in EE, and I started college
              > wanting to design computers (at about the time the first microprocessor chip
              > came along, and they, of course, have now taken over). But I was side-lined
              > by the malleability of software, and have mostly practiced software during
              > my career.
              >
              > Anyway, that is the level that Herb Sutter was describing in the Dr Dobbs
              > articles I mentioned. And the overhead of doing that at the level of a cache
              > line is high, if there is lots of contention for particular memory locations
              > between threads running on different cores/CPUs. So to achieve concurrency,
              > you must not only limit explicit software locks, but must also avoid memory
              > layouts where data needed by different cores/CPUs are in the same cache
              > line.

              I suspect they'll end up redesigning the caching to use a size and
              alignment of 64 bits (or smaller). Same cache line size, but with
              masking.

              You still need to minimize contention of course, but that should at
              least be more predictable. Having two unrelated mallocs contend could
              suck.

              >> Message passing vs shared memory isn't really a yes/no question. It's
              >> about ratios, usage patterns, and tradeoffs. *All* programs will
              >> share data, but in what way? If it's just the code itself you can
              >> move the cache validation into software and simplify the CPU, making
              >> it faster. If the shared data is a lot more than that, and you use it
              >> to coordinate accesses, then it'll be faster to have it in hardware.
              >
              > I agree there are tradeoffs... unfortunately, the hardware architectures
              > vary, and the languages don't generally understand the hardware. So then it
              > becomes an OS API, which adds the overhead of an OS API call to the cost of
              > the synchronization... It could instead be (and in clever applications is) a
              > non-portable assembly-level function that wraps an OS locking or waiting
              > API.

              In practice I highly doubt we'll see anything that doesn't extend
              traditional threading (posix threads, whatever MS has, etc).

              > Nonetheless, while putting the shared data accesses in hardware might be
              > more efficient per unit operation, there are still tradeoffs: A software
              > solution can group multiple accesses under a single lock acquisition; the
              > hardware probably doesn't have enough smarts to do that. So it may well
              > require many more hardware unit operations for the same overall concurrently
              > executed function, and the resulting performance may not be any better.

              Speculative ll/sc? ;)

              > Sidestepping the whole issue, by minimizing shared data in the application
              > design, avoiding not only software lock calls but also hardware cache
              > contention, is going to provide the best performance... it isn't the things
              > you do efficiently that make software fast -- it is the things you don't do
              > at all.

              Minimizing contention, certainly. Minimizing the shared data itself
              is iffier though.


              • Adam Olsen

                #37
                Re: 2.6, 3.0, and truly independent interpreters

                On Fri, Oct 24, 2008 at 4:48 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
                > On approximately 10/24/2008 2:15 PM, came the following characters from the
                > keyboard of Rhamphoryncus:
                >> On Oct 24, 2:59 pm, Glenn Linderman <gl...@nevcal.com> wrote:
                >>> On approximately 10/24/2008 1:09 PM, came the following characters from
                >>> the keyboard of Rhamphoryncus:
                >>>> PyE: objects are reclassified as shareable or non-shareable; many
                >>>> types are now only allowed to be shareable. A module and its classes
                >>>> become shareable with the use of a __future__ import, and their
                >>>> shareddict uses a read-write lock for scalability. Most other
                >>>> shareable objects are immutable. Each thread is run in its own
                >>>> private monitor, and thus protected from the normal threading memory
                >>>> model nasties. Alas, this gives you all the semantics, but you still
                >>>> need scalable garbage collection... and CPython's refcounting needs the
                >>>> GIL.
                >>>
                >>> Hmm. So I think your PyE is an attempt to be more
                >>> explicit about what I said above in PyC: PyC threads do not share data
                >>> between threads except by explicit interfaces. I consider your
                >>> definitions of shared data types somewhat orthogonal to the types of
                >>> threads, in that both PyA and PyC threads could use these new shared
                >>> data items.
                >>
                >> Unlike PyC, there's a *lot* shared by default (classes, modules,
                >> functions), but it requires only minimal recoding. It's as close to
                >> "have your cake and eat it too" as you're gonna get.
                >
                > Yes, but I like my cake frosted with performance; Guido's non-acceptance of
                > granular locks in the blog entry someone referenced was due to the slowdown
                > acquired with granular locking and shared objects. Your PyE model, with
                > highly granular sharing, will likely suffer the same fate.

                No, my approach includes scalable performance. Typical paths will
                involve *no* contention (i.e. no locking). Classes and modules use
                shareddict, which is based on a read-write lock built into the
                interpreter, so it's uncontended for read-only usage patterns. Pretty
                much everything else is immutable.

                Of course that doesn't include the cost of garbage collection.
                CPython's refcounting can't scale.

                > The independent threads model, with only slight locking for a few explicitly
                > shared objects, has a much better chance of getting better performance
                > overall. With one thread running, it would be the same as today; with
                > multiple threads, it should scale at the same rate as the system... minus
                > any locking done at the higher level.

                So use processes with a little IPC for these expensive-yet-"shared"
                objects. multiprocessing does it already.

                >>> I think/hope that you meant that "many types are now only allowed to be
                >>> non-shareable"? At least, I think that should be the default; they
                >>> should be within the context of a single, independent interpreter
                >>> instance, so other interpreters don't even know they exist, much less
                >>> how to share them. If so, then I understand most of the rest of your
                >>> paragraph, and it could be a way of providing shared objects, perhaps.
                >>
                >> There aren't multiple interpreters under my model. You only need
                >> one. Instead, you create a monitor, and run a thread on it. A list
                >> is not shareable, so it can only be used within the monitor it's
                >> created within, but the list type object is shareable.
                >
                > The python interpreter code should be sharable, having been written in C,
                > and being/becoming reentrant. So in that sense, there is only one
                > interpreter. Similarly, any other reentrant C extensions would be that way.
                > On the other hand, each thread of execution requires its own interpreter
                > context, so that would have to be independent for the threads to be
                > independent. It is the combination of code+context that I call an
                > interpreter, and there would be one per thread for PyC threads. Bytecode
                > for loaded modules could potentially be shared, if it is also immutable.
                > However, that could be in my mental "phase 2", as it would require an extra
                > level of complexity in the interpreter as it creates shared bytecode...
                > there would be a memory savings from avoiding multiple copies of shared
                > bytecode, likely, and maybe also a compilation performance savings. So it
                > sounds like a win, but it is a win that can be deferred for initial
                > simplicity, to prove the concept is or is not workable.
                >
                > A monitor allows a single thread to run at a time; that is the same
                > situation as the present GIL. I guess I don't fully understand your model.

                To use your terminology, each monitor is a context. Each thread
                operates in a different monitor. As you say, most C functions are
                already thread-safe (reentrant). All I need to do is avoid letting
                multiple threads modify a single mutable object (such as a list) at a
                time, which I do by containing it within a single monitor (context).


                --
                Adam Olsen, aka Rhamphoryncus
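
                (Since the shareddict read-write lock keeps coming up: the
                real one lives in C inside the modified interpreter, but the
                idea is generic. Below is a minimal read-write lock sketch in
                Python, purely to illustrate why read-mostly access scales --
                readers exclude only writers, never each other. This simple
                version can starve writers; a production one would not.)

                import threading

                class RWLock:
                    def __init__(self):
                        self._cond = threading.Condition()
                        self._readers = 0

                    def acquire_read(self):
                        with self._cond:
                            self._readers += 1   # readers never block readers

                    def release_read(self):
                        with self._cond:
                            self._readers -= 1
                            if self._readers == 0:
                                self._cond.notify_all()

                    def acquire_write(self):
                        self._cond.acquire()     # excludes readers and writers
                        while self._readers:
                            self._cond.wait()    # wait for readers to drain

                    def release_write(self):
                        self._cond.release()

                lock = RWLock()
                lock.acquire_read()   # any number of these run concurrently
                lock.release_read()
                lock.acquire_write()  # exclusive
                lock.release_write()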


                • Adam Olsen

                  #38
                  Re: 2.6, 3.0, and truly independent interpreters

                  On Fri, Oct 24, 2008 at 5:38 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
                  > On approximately 10/24/2008 2:16 PM, came the following characters from the
                  > keyboard of Rhamphoryncus:
                  >> On Oct 24, 3:02 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
                  >>> On approximately 10/23/2008 2:24 PM, came the following characters from
                  >>> the keyboard of Rhamphoryncus:
                  >>>> On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
                  >>>>> On approximately 10/23/2008 12:24 AM, came the following characters
                  >>>>> from the keyboard of Christian Heimes:
                  >>>>>> Andy wrote:
                  >>>>>> I'm very - not absolute, but very - sure that Guido and the initial
                  >>>>>> designers of Python would have added the GIL anyway. The GIL makes
                  >>>>>> Python faster on single core machines and more stable on multi core
                  >>>>>> machines.
                  >>>
                  >>> Actually, the GIL doesn't make Python faster; it is a design decision
                  >>> that reduces the overhead of lock acquisition, while still allowing use
                  >>> of global variables.
                  >>>
                  >>> Using finer-grained locks has higher run-time cost; eliminating the use
                  >>> of global variables has a higher programmer-time cost, but would actually
                  >>> run faster and more concurrently than using a GIL. Especially on a
                  >>> multi-core/multi-CPU machine.
                  >>
                  >> Those "globals" include classes, modules, and functions. You can't
                  >> have *any* objects shared. Your interpreters are entirely isolated,
                  >> much like processes (and we all start wondering why you don't use
                  >> processes in the first place.)
                  >
                  > Indeed; isolated, independent interpreters are one of the goals. It is,
                  > indeed, much like processes, but in a single address space. It allows the
                  > master process (Python or C for the embedded case) to be coded using memory
                  > references and copies and pointer swaps instead of using semaphores, and
                  > potentially multi-megabyte message transfers.
                  >
                  > It is not clear to me that with the use of shared memory between processes,
                  > the application couldn't use processes, and achieve many of the same
                  > goals. On the other hand, the code to create and manipulate processes and
                  > shared memory blocks is harder to write and has more overhead than the code
                  > to create and manipulate threads, which can, when told, access any memory
                  > block in the process. This allows the shared memory to be resized more
                  > easily, or more blocks of shared memory created more easily. On the other
                  > hand, the creation of shared memory blocks shouldn't be a high-use operation
                  > in a program that has sufficient number crunching to do to be able to
                  > consume multiple cores/CPUs.
                  >
                  >> Or use safethread. It imposes safe semantics on shared objects, so
                  >> you can keep your global classes, modules, and functions. Still need
                  >> garbage collection though, and on CPython that means refcounting and
                  >> the GIL.
                  >
                  > Sounds like safethread has 35-40% overhead. Sounds like too much, to me.

                  The specific implementation of safethread, which attempts to remove
                  the GIL from CPython, has significant overhead and had very limited
                  success at being scalable.

                  The monitor design proposed by safethread has no inherent overhead and
                  is completely scalable.


                  --
                  Adam Olsen, aka Rhamphoryncus


                  • "Martin v. Löwis"

                    #39
                    Re: 2.6, 3.0, and truly independent interpreters

                    >> A c-level module, on the other hand, can sidestep/release
                    >> the GIL at will, and go on its merry way and process away.
                    >
                    > ...Unless part of the C module execution involves the need to do CPU-
                    > bound work on another thread through a different python interpreter,
                    > right?

                    Wrong.

                    > (even if the interpreter is 100% independent, yikes).

                    Again, wrong.

                    > For example, have a python C module designed to programmatically generate
                    > images (and video frames) in RAM for immediate and subsequent use in
                    > animation. Meanwhile, we'd like to have a pthread with its own
                    > interpreter with an instance of this module and have it dequeue jobs
                    > as they come in (in fact, there'd be one of these threads for each
                    > excess core present on the machine).

                    I don't understand how this example involves multiple threads. You
                    mention a single thread (running the module), and you mention designing
                    a module. Where is the second thread?

                    Let's assume there is another thread producing jobs, and then
                    a thread that generates the images. The structure would be this:

                    while 1:
                        job = queue.get()
                        processing_module.process(job)

                    and in process:

                    PyArg_ParseTuple(args, "s", &job_data);
                    result = PyString_New(bufsize);
                    buf = PyString_AsString(result);
                    Py_BEGIN_ALLOW_THREADS
                    compute_frame(job_data, buf);
                    Py_END_ALLOW_THREADS
                    return PyString_FromString(buf);

                    All these compute_frames could happily run in parallel.

                    > As far as I can tell, it seems
                    > CPython's current state can't do CPU-bound parallelization in the same
                    > address space.

                    That's not true.

                    Regards,
                    Martin


                    • "Martin v. Löwis"

                      #40
                      Re: 2.6, 3.0, and truly independent interpreters

                      > It seems to me that the very simplest move would be to remove global
                      > static data so the app could provide all thread-related data, which
                      > Andy suggests through references to the QuickTime API. This would
                      > suggest compiling python without thread support so as to leave it up
                      > to the application.

                      I'm not sure whether you realize that this is not simple at all.
                      Consider this fragment:

                      if (string == Py_None || index >= state->lastmark ||
                              !state->mark[index] || !state->mark[index+1]) {
                          if (empty)
                              /* want empty string */
                              i = j = 0;
                          else {
                              Py_INCREF(Py_None);
                              return Py_None;
                          }
                      }

                      Py_None here is a global variable. How would you replace it?
                      It's used in thousands of places.

                      For another example, consider

                      PyErr_SetString(PyExc_ValueError,
                                      "Empty module name");

                      or

                      dp = PyObject_New(dbmobject, &Dbmtype);

                      There are tons of different variables denoting exceptions and
                      other types which all somehow need to be rewritten (likely with
                      undesirable effects on readability).

                      So I don't think that this is a simple solution. It's the right
                      one, but it will take five or ten years to implement.

                      Regards,
                      Martin


                      • Terry Reedy

                        #41
                        Re: 2.6, 3.0, and truly independent interpreters

                        Glenn Linderman wrote:
                        > For example, Python presently has a rather stupid algorithm for string
                        > concatenation.

                        Python the language has syntax and semantics. Python implementations
                        have algorithms that fulfill the defined semantics.

                        > It allocates only the exactly necessary space for the
                        > concatenated string. This is a brilliant move, when you realize that
                        > strings are immutable, and once allocated can never change, but the
                        > operation
                        >
                        > for line in mylistofstrings:
                        >     string = string + line
                        >
                        > is basically O(N-squared) as a result. The better algorithm would
                        > double the size of memory allocated for string each time there is not
                        > enough room to add the next line, and that reduces the cost of the
                        > algorithm to O(N).

                        If there is more than one reference to a guaranteed immutable object,
                        such as a string, the 'stupid' algorithm seems necessary to me. In-place
                        modification of a shared immutable would violate semantics.

                        However, if you do

                        string = ''
                        for line in strings:
                            string += line

                        so that there is only one reference and you tell the interpreter that
                        you don't mind the old value being updated, then I believe in 2.6, if
                        not before, CPython does overallocation and in-place extension. (I am
                        not sure about s = s + l.) But this is just ref-counted CPython.

                        Terry Jan Reedy
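
                        (As an aside for anyone hitting this in practice: the
                        idiomatic way to sidestep the quadratic behaviour on
                        any implementation is str.join, which sizes the result
                        once. A quick sketch, with a made-up list of lines:)

                        lines = ['spam\n'] * 1000

                        # Quadratic in the worst case: each + may copy
                        # everything accumulated so far (unless the CPython
                        # in-place optimization Terry describes applies).
                        s = ''
                        for line in lines:
                            s = s + line

                        # Linear and implementation-independent: one allocation.
                        s = ''.join(lines)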


                        • greg

                          #42
                          Re: 2.6, 3.0, and truly independent interpreters

                          Andy O'Meara wrote:
                          > I would definitely agree if there was a context (i.e. environment)
                          > object passed around then perhaps we'd have the best of all worlds.
                          Moreover, I think this is probably the *only* way that
                          totally independent interpreters could be realized.

                          Converting the whole C API to use this strategy would be
                          a very big project. Also, on the face of it, it seems like
                          it would render all existing C extension code obsolete,
                          although it might be possible to do something clever with
                          macros to create a compatibility layer.

                          Another thing to consider is that passing all these extra
                          pointers around everywhere is bound to have some effect
                          on performance. The idea mightn't go down too well if it
                          slows things significantly in the case where you're only
                          using one interpreter.

                          --
                          Greg


                          • greg

                            #43
                            Re: 2.6, 3.0, and truly independent interpreters

                            Andy O'Meara wrote:
                            > - each worker thread makes its own interpreter, pops scripts off a
                            > work queue, and manages exporting (and then importing) result data to
                            > other parts of the app.
                            I hope you realize that starting up one of these interpreters
                            is going to be fairly expensive. It will have to create its
                            own versions of all the builtin constants and type objects,
                            and import its own copy of all the modules it uses.

                            One wonders if it wouldn't be cheaper just to fork the
                            process. Shared memory can be used to transfer large lumps
                            of data if needed.

                            --
                            Greg


                            • greg

                              #44
                              Re: 2.6, 3.0, and truly independent interpreters

                              Glenn Linderman wrote:
                              > If Py_None corresponds to None in Python syntax ... then
                              > it is a fixed constant and could be left global, probably.
                              No, it couldn't, because it's a reference-counted object
                              like any other Python object, and therefore needs to be
                              protected against simultaneous refcount manipulation by
                              different threads. So each interpreter would need its own
                              instance of Py_None.

                              The same goes for all the other built-in constants and
                              type objects -- there are dozens of these.
                              > The cost is one more push on every function call,
                              Which sounds like it could be a rather high cost! If
                              (just a wild guess) each function has an average of 2
                              parameters, then this is increasing the amount of
                              argument pushing going on by 50%...
                              > On many platforms, there is the concept of TLS, or thread-local storage.
                              That's another possibility, although doing it that
                              way would require you to have a separate thread for
                              each interpreter, which you mightn't always want.

                              --
                              Greg


                              • greg

                                #45
                                Re: 2.6, 3.0, and truly independent interpreters

                                Andy O'Meara wrote:
                                > In our case, we're doing image and video
                                > manipulation--stuff not good to be messaging from address space to
                                > address space.
                                Have you considered using shared memory?

                                Using mmap or equivalent, you can arrange for a block of
                                memory to be shared between processes. Then you can dump
                                the big lump of data to be transferred in there, and send
                                a short message through a pipe to the other process to
                                let it know it's there.

                                --
                                Greg
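
                                (A minimal sketch of what greg describes, for
                                POSIX systems where multiprocessing forks, so
                                the child inherits the anonymous mapping. The
                                size and message are made up for illustration.)

                                import mmap
                                import multiprocessing

                                SIZE = 1024 * 1024      # the "big lump" of data

                                def producer(buf, done):
                                    buf[:5] = b'hello'  # write into shared memory
                                    done.put('ready')   # short message via a pipe

                                if __name__ == '__main__':
                                    # Anonymous shared mapping, inherited on fork.
                                    buf = mmap.mmap(-1, SIZE)
                                    done = multiprocessing.Queue()
                                    p = multiprocessing.Process(target=producer,
                                                               args=(buf, done))
                                    p.start()
                                    done.get()          # wait for the notification
                                    print(buf[:5])      # b'hello'
                                    p.join()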

