2.6, 3.0, and truly independent interpreters

This topic is closed.
  • Martin v. Löwis

    #61
    Re: 2.6, 3.0, and truly independent interpreters

    >>> As far as I can tell, it seems CPython's current state can't do
    >>> CPU-bound parallelization in the same address space.
    >> That's not true.
    > Um... So let's say you have an opaque object ref from the OS that
    > represents hundreds of megs of data (e.g. memory-resident video). How
    > do you get that back to the parent process without serialization and
    > IPC?

    What parent process? I thought you were talking about multi-threading?

    > What should really happen is just use the same address space so
    > just a pointer changes hands. THAT's why I'm saying that a separate
    > address space is generally a deal breaker when you have large or
    > intricate data sets (ie. when performance matters).

    Right. So use a single address space, multiple threads, and perform the
    heavy computations in C code. I don't see how Python is in the way at
    all. Many people do that, and it works just fine. That's what
    Jesse (probably) meant with his remark:

    > A c-level module, on the other hand, can sidestep/release
    > the GIL at will, and go on its merry way and process away.

    Please reconsider this; it might be a solution to your problem.

    Regards,
    Martin
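Martin's suggestion can be seen in miniature with a stdlib C module. This is a minimal sketch (names are illustrative): `zlib.compress` runs in C and releases the GIL while it works, so ordinary threads in one shared address space can compress independent buffers on multiple cores, with no fork and no IPC.

```python
import threading
import zlib

# Four independent "large" buffers; compressing each is pure C work,
# and zlib releases the GIL while compressing, so the threads overlap.
blobs = [bytes([i]) * 1_000_000 for i in range(4)]
results = [None] * len(blobs)

def work(i):
    results[i] = zlib.compress(blobs[i])

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every buffer round-trips correctly even though all threads shared
# one address space and one interpreter.
ok = all(zlib.decompress(results[i]) == blobs[i] for i in range(4))
```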


    • Andy O'Meara

      #62
      Re: 2.6, 3.0, and truly independent interpreters


      Grrr... I posted a ton of lengthy replies to you and other recent
      posts here using Google and none of them made it, argh. Poof. There's
      nothing that fires me up more than lost work, so I'll have to
      revert to short and simple answers for the time being. Argh, damn.


      On Oct 25, 1:26 am, greg <g...@cosc.canterbury.ac.nz> wrote:
      > Andy O'Meara wrote:
      >> I would definitely agree if there was a context (i.e. environment)
      >> object passed around then perhaps we'd have the best of all worlds.
      >
      > Moreover, I think this is probably the *only* way that
      > totally independent interpreters could be realized.
      >
      > Converting the whole C API to use this strategy would be
      > a very big project. Also, on the face of it, it seems like
      > it would render all existing C extension code obsolete,
      > although it might be possible to do something clever with
      > macros to create a compatibility layer.
      >
      > Another thing to consider is that passing all these extra
      > pointers around everywhere is bound to have some effect
      > on performance.

      I'm with you on all counts, so no disagreement there. On the "passing
      a ptr everywhere" issue, perhaps one idea is that all objects could
      have an additional field that would point back to their parent context
      (ie. their interpreter). So the only prototypes that would have to be
      modified to contain the context ptr would be the ones that don't
      inherently operate on objects (e.g. importing a module).


      On Oct 25, 1:54 am, greg <g...@cosc.canterbury.ac.nz> wrote:
      > Andy O'Meara wrote:
      >> - each worker thread makes its own interpreter, pops scripts off a
      >> work queue, and manages exporting (and then importing) result data to
      >> other parts of the app.
      >
      > I hope you realize that starting up one of these interpreters
      > is going to be fairly expensive. It will have to create its
      > own versions of all the builtin constants and type objects,
      > and import its own copy of all the modules it uses.

      Yeah, for sure. And I'd say that's a pretty well established
      convention already out there for any industry package. The pattern
      I'd expect to see is where the app starts worker threads, starts
      interpreters in one or more of each, and throws jobs to different ones
      (and the interpreter would persist to move on to subsequent jobs).

      > One wonders if it wouldn't be cheaper just to fork the
      > process. Shared memory can be used to transfer large lumps
      > of data if needed.

      As I mentioned, when you're talking about intricate data structures, OS
      opaque objects (ie. that have their own internal allocators), or huge
      data sets, even a shared memory region unfortunately can't fit the
      bill.


      Andy


      • Andy O'Meara

        #63
        Re: 2.6, 3.0, and truly independent interpreters

        On Oct 24, 9:52 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
        >>> A c-level module, on the other hand, can sidestep/release
        >>> the GIL at will, and go on its merry way and process away.
        >> ...Unless part of the C module execution involves the need to do
        >> CPU-bound work on another thread through a different python
        >> interpreter, right?
        > Wrong.

        Let's take a step back and remind ourselves of the big picture. The
        goal is to have independent interpreters running in pthreads that the
        app starts and controls. No interpreter at any point does any
        thread-related stuff in any way. For example, each script job
        just does meat and potatoes CPU work, using callbacks that, say,
        programmatically use OS APIs to edit and transform frame data.

        So I think the disconnect here is that maybe you're envisioning
        threads being created *in* python. To be clear, we're talking about
        making threads at the app level and making it a given for the app to
        take its safety into its own hands.


        >> As far as I can tell, it seems
        >> CPython's current state can't do CPU-bound parallelization in the
        >> same address space.
        > That's not true.
        Well, when you're talking about large, intricate data structures
        (which include opaque OS object refs that use process-associated
        allocators), even a shared memory region between the child process and
        the parent can't do the job. Otherwise, please describe in detail how
        I'd get an opaque OS object (e.g. an OS ref that refers to memory-
        resident video) from the child process back to the parent process.

        Again, the big picture that I'm trying to plant here is that there
        really is a serious need for truly independent interpreters/contexts
        in a shared address space. Consider stuff like libpng, zlib, ipgjpg,
        or whatever: the use pattern is always the same -- make a context
        object, do your work in the context, and take it down. For most
        industry-caliber packages, the expectation and convention (unless
        documented otherwise) is that the app can make as many contexts as it
        wants in whatever threads it wants, because the convention is that the
        app must (a) never use one context's objects in another context,
        and (b) never use a context at the same time from more than one
        thread. That's all I'm really trying to look at here.


        Andy





        • Andy O'Meara

          #64
          Re: 2.6, 3.0, and truly independent interpreters


          >> And in the case of hundreds of megs of data
          >
          > ... and I would be surprised at someone that would embed hundreds of
          > megs of data into an object such that it had to be serialized... seems
          > like the proper design is to point at the data, or a subset of it, in a
          > big buffer.  Then data transfers would just transfer the offset/length
          > and the reference to the buffer.
          >
          >> and/or thousands of data structure instances,
          >
          > ... and this is another surprise!  You have thousands of objects (data
          > structure instances) to move from one thread to another?

          I think we miscommunicated there--I'm actually agreeing with you. I
          was trying to make the same point you were: that intricate and/or
          large structures are meant to be passed around by a top-level pointer,
          not using serialization/messaging. This is what I've been trying
          to explain to others here; that IPC and shared memory unfortunately
          aren't viable options, leaving app threads (rather than child
          processes) as the solution.

          > Of course, I know that data get large, but typical multimedia streams
          > are large, binary blobs.  I was under the impression that processing
          > them usually proceeds along the lines of keeping offsets into the
          > blobs, and interpreting, etc.  Editing is usually done by making a
          > copy of a blob, transforming it or a subset in some manner during the
          > copy process, resulting in a new, possibly different-sized blob.

          Your instincts are right. I'd only add that when you're talking
          about data structures associated with an intricate video format, the
          complexity and depth of the data structures is insane -- the LAST
          thing you want to burn cycles on is serializing and unserializing that
          stuff (so IPC is out)--again, we're already on the same page here.

          I think at one point you made the comment that shared memory is a
          solution to handle large data sets between a child process and the
          parent. Although this is certainly true in principle, it doesn't hold
          up in practice since complex data structures often contain 3rd party
          and OS API objects that have their own allocators. For example, in
          video encoding, there are TONS of objects that comprise memory-resident
          video from all kinds of APIs, so the idea of having them allocated
          from a shared/mapped memory block isn't even possible. Again, I only
          raise this to offer evidence that doing real-world work in a child
          process is a deal breaker--a shared address space is just way too much
          to give up.


          Andy


          • James Mills

            #65
            Re: 2.6, 3.0, and truly independent interpreters

            On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara <andy55@gmail.com> wrote:
            > I think we miscommunicated there--I'm actually agreeing with you. I
            > was trying to make the same point you were: that intricate and/or
            > large structures are meant to be passed around by a top-level pointer,
            > not using serialization/messaging. This is what I've been trying
            > to explain to others here; that IPC and shared memory unfortunately
            > aren't viable options, leaving app threads (rather than child
            > processes) as the solution.
            Andy,

            Why don't you just use a temporary file
            system (ram disk) to store the data that
            your app is manipulating? All you need to
            pass around then is a file descriptor.

            --JamesMills

            --
            "Problems are solved by method"
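James's file-descriptor idea in sketch form (POSIX-only; `os.fork` stands in for the second worker process, and all names are illustrative): the blob is written once to a temp file — on a ram disk it never even hits a platter — and only the inherited descriptor changes hands, not the bytes.

```python
import os
import tempfile

# Write the "large" blob once; pretend it is a decoded video frame.
tmp = tempfile.TemporaryFile()
tmp.write(b"frame-data" * 100_000)
tmp.flush()

pid = os.fork()
if pid == 0:
    # Child: inherits the open fd, seeks back, and reads in place --
    # nothing was serialized or copied through a pipe.
    os.lseek(tmp.fileno(), 0, os.SEEK_SET)
    first = os.read(tmp.fileno(), 10)
    os._exit(0 if first == b"frame-data" else 1)

_, status = os.waitpid(pid, 0)
child_ok = os.WEXITSTATUS(status) == 0
```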


            • Martin v. Löwis

              #66
              Re: 2.6, 3.0, and truly independent interpreters

              Andy O'Meara wrote:
              On Oct 24, 9:52 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
              >>> A c-level module, on the other hand, can sidestep/release
              >>> the GIL at will, and go on its merry way and process away.
              >> ...Unless part of the C module execution involves the need to do
              >> CPU-bound work on another thread through a different python
              >> interpreter, right?
              > Wrong.
              [...]
              > So I think the disconnect here is that maybe you're envisioning
              > threads being created *in* python. To be clear, we're talking about
              > making threads at the app level and making it a given for the app
              > to take its safety into its own hands.
              No. Whether or not threads are created by Python or the application
              does not matter for my "Wrong" evaluation: in either case, C module
              execution can easily side-step/release the GIL.
              >>> As far as I can tell, it seems
              >>> CPython's current state can't do CPU-bound parallelization in
              >>> the same address space.
              >> That's not true.
              > Well, when you're talking about large, intricate data structures
              > (which include opaque OS object refs that use process-associated
              > allocators), even a shared memory region between the child process
              > and the parent can't do the job. Otherwise, please describe in
              > detail how I'd get an opaque OS object (e.g. an OS ref that refers
              > to memory-resident video) from the child process back to the
              > parent process.
              WHAT PARENT PROCESS? "In the same address space", to me, means
              "a single process only, not multiple processes, and no parent process
              anywhere". If you have just multiple threads, the notion of passing
              data from a "child process" back to the "parent process" is
              meaningless.
              > Again, the big picture that I'm trying to plant here is that there
              > really is a serious need for truly independent
              > interpreters/contexts in a shared address space.
              I understand that this is your mission in this thread. However, why
              is that your problem? Why can't you just use the existing (limited)
              multiple-interpreters machinery, and solve your problems with that?
              > For most
              > industry-caliber packages, the expectation and convention (unless
              > documented otherwise) is that the app can make as many contexts as
              > it wants in whatever threads it wants because the convention is
              > that the app must (a) never use one context's objects in another
              > context, and (b) never use a context at the same time from more
              > than one thread. That's all I'm really trying to look at here.
              And that's indeed the case for Python, too. The app can make as many
              subinterpreters as it wants to, and it must not pass objects from one
              subinterpreter to another one, nor should it use a single interpreter
              from more than one thread (although that is actually supported by
              Python - but it surely won't hurt if you restrict yourself to a single
              thread per interpreter).

              Regards,
              Martin


              • Rhamphoryncus

                #67
                Re: 2.6, 3.0, and truly independent interpreters

                On Oct 26, 6:57 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                > Grrr... I posted a ton of lengthy replies to you and other recent
                > posts here using Google and none of them made it, argh. Poof.
                > There's nothing that fires me up more than lost work, so I'll
                > have to revert to short and simple answers for the time being.
                > Argh, damn.
                >
                > On Oct 25, 1:26 am, greg <g...@cosc.canterbury.ac.nz> wrote:
                >> Andy O'Meara wrote:
                >>> I would definitely agree if there was a context (i.e.
                >>> environment) object passed around then perhaps we'd have the
                >>> best of all worlds.
                >>
                >> Moreover, I think this is probably the *only* way that
                >> totally independent interpreters could be realized.
                >>
                >> Converting the whole C API to use this strategy would be
                >> a very big project. Also, on the face of it, it seems like
                >> it would render all existing C extension code obsolete,
                >> although it might be possible to do something clever with
                >> macros to create a compatibility layer.
                >>
                >> Another thing to consider is that passing all these extra
                >> pointers around everywhere is bound to have some effect
                >> on performance.
                >
                > I'm with you on all counts, so no disagreement there. On the
                > "passing a ptr everywhere" issue, perhaps one idea is that all
                > objects could have an additional field that would point back to
                > their parent context (ie. their interpreter). So the only
                > prototypes that would have to be modified to contain the context
                > ptr would be the ones that don't inherently operate on objects
                > (e.g. importing a module).
                Trying to directly share objects like this is going to create
                contention. The refcounting becomes the sequential portion of
                Amdahl's Law. This is why safethread doesn't scale very well: I share
                a massive amount of objects.

                An alternative, actually simpler, is to create proxies to your real
                object. The proxy object has a pointer to the real object and the
                context containing it. When you call a method it serializes the
                arguments, acquires the target context's GIL (while releasing yours),
                and deserializes in the target context. Once the method returns it
                reverses the process.
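The proxy idea can be sketched in plain Python (all class and method names here are hypothetical, and a `threading.Lock` stands in for the target context's GIL): the proxy pickles arguments "across" the context boundary, runs the method under the target's lock, and pickles the result back, so no live object is ever shared between contexts.

```python
import pickle
import threading

class Proxy:
    """Stand-in for a real object living in another context."""
    def __init__(self, target, target_lock):
        self._target = target          # the real object
        self._lock = target_lock       # stands in for the target's GIL

    def call(self, method, *args):
        wire_args = pickle.dumps(args)            # serialize into the target
        with self._lock:                          # "acquire the target's GIL"
            raw = getattr(self._target, method)(*pickle.loads(wire_args))
            wire_result = pickle.dumps(raw)       # serialize the result out
        return pickle.loads(wire_result)

class FrameStore:                                 # the "real object"
    def __init__(self):
        self.frames = []
    def add(self, frame):
        self.frames.append(frame)
        return len(self.frames)

proxy = Proxy(FrameStore(), threading.Lock())
count = proxy.call("add", b"\x00" * 16)           # one frame stored
```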

                There's two reasons why this may perform well for you: First,
                operations done purely in C may cheat (if so designed). A copy from
                one memory buffer to another memory buffer may be given two proxies as
                arguments, but then operate directly on the target objects (ie without
                serialization).

                Second, if a target context is idle you can enter it (acquiring its
                GIL) without any context switch.

                Of course that scenario is full of "maybes", which is why I have
                little interest in it.

                An even better scenario is if your memory buffer's methods are in pure
                C and it's a simple object (no pointers). You can stick the memory
                buffer in shared memory and have multiple processes manipulate it from
                C. More "maybes".

                An evil trick if you need pointers, but control the allocation, is to
                take advantage of the fork model. Have a master process create a
                bunch of blank files (temp files if linux doesn't allow /dev/zero),
                mmap them all using MAP_SHARED, then fork and utilize. The addresses
                will be inherited from the master process, so any pointers within them
                will be usable across all processes. If you ever want to return
                memory to the system you can close that file, then have all processes
                use MAP_SHARED|MAP_FIXED to overwrite it. Evil, but should be
                disturbingly effective, and still doesn't require modifying CPython.
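The trick translates to a tame Python sketch (POSIX-only, stdlib only): back a region with a file, map it MAP_SHARED *before* forking, and every process inherits the same mapping, so raw offsets (and, in C, raw pointers) into it stay valid everywhere with no serialization.

```python
import mmap
import os
import tempfile

# One of the "blank files" from the master process, mapped MAP_SHARED.
backing = tempfile.TemporaryFile()
backing.truncate(4096)
region = mmap.mmap(backing.fileno(), 4096, flags=mmap.MAP_SHARED)

pid = os.fork()
if pid == 0:
    # Child: writes straight into the inherited shared mapping.
    region[:4] = b"ping"
    os._exit(0)

os.waitpid(pid, 0)
seen = bytes(region[:4])   # parent reads the child's bytes in place
```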


                • Michael Sparks

                  #68
                  Re: 2.6, 3.0, and truly independent interpreters

                  Glenn Linderman wrote:
                  > so a 3rd party library might be called to decompress the stream
                  > into a set of independently allocated chunks, each containing
                  > one frame (each possibly consisting of several allocations of
                  > memory for associated metadata) that is independent of other
                  > frames
                  We use a combination of a dictionary + RGB data for this purpose. Using a
                  dictionary works out pretty nicely for the metadata, and obviously one
                  attribute holds the frame data as a binary blob.

                  http://www.kamaelia.org/Components/p...Codec.YUV4MPEG gives some
                  idea of structure and usage. The example given there is this:

                  Pipeline( RateControlledFileReader("video.dirac", readmode="bytes", ...),
                            DiracDecoder(),
                            FrameToYUV4MPEG(),
                            SimpleFileWriter("output.yuv4mpeg")
                  ).run()

                  Now all of those components are generator components.

                  That's useful since:
                  a) we can structure the code to show what it does more clearly, and
                  it still runs efficiently inside a single process
                  b) We can change this over to using multiple processes trivially:

                  ProcessPipeline(
                      RateControlledFileReader("video.dirac", readmode="bytes", ...),
                      DiracDecoder(),
                      FrameToYUV4MPEG(),
                      SimpleFileWriter("output.yuv4mpeg")
                  ).run()

                  This version uses multiple processes (under the hood using Paul
                  Boddie's pprocess library, since this support predates the
                  multiprocessing module support in python).

                  The big issue with *this* version however is that due to pprocess (and
                  friends) pickling data to be sent across OS pipes, the data throughput on
                  this would be lousy. Specifically in this example, if we could change it
                  such that the high level API was this:

                  ProcessPipeline(
                      RateControlledFileReader("video.dirac", readmode="bytes", ...),
                      DiracDecoder(),
                      FrameToYUV4MPEG(),
                      SimpleFileWriter("output.yuv4mpeg"),
                      use_shared_memory_IPC = True,
                  ).run()

                  That would be pretty useful, for some hopefully obvious reasons. I
                  suppose ideally we'd just use shared_memory_IPC for everything and
                  just go back to this:

                  ProcessPipeline(
                      RateControlledFileReader("video.dirac", readmode="bytes", ...),
                      DiracDecoder(),
                      FrameToYUV4MPEG(),
                      SimpleFileWriter("output.yuv4mpeg")
                  ).run()

                  But essentially for us, this is an optimisation problem, not a "how do I
                  even begin to use this" problem. Since it is an optimisation problem, it
                  also strikes me as reasonable to consider it OK to special purpose and
                  specialise such links until you get an approach that's reasonable for
                  general purpose data.

                  In theory, poshmodule.sourceforge.net, with a bit of TLC, would be
                  a good candidate, or a good starting point, for that optimisation
                  work (since it does work in Linux, contrary to a reply in the
                  thread - I've not tested it under windows :).

                  If someone's interested in building that, then someone redoing our MiniAxon
                  tutorial using processes & shared memory IPC rather than generators would
                  be a relatively gentle/structured approach to dealing with this:

                  * http://www.kamaelia.org/MiniAxon/

                  The reason I suggest that is because any time we think about fiddling and
                  creating a new optimisation approach or concurrency approach, we tend to
                  build a MiniAxon prototype to flesh out the various issues involved.


                  Michael
                  --



                  • Michael Sparks

                    #69
                    Re: 2.6, 3.0, and truly independent interpreters

                    Philip Semanchuk wrote:
                    > On Oct 25, 2008, at 7:53 AM, Michael Sparks wrote:
                    >> Glenn Linderman wrote:
                    >>> In the module multiprocessing environment could you not use
                    >>> shared memory, then, for the large shared data items?
                    >>
                    >> If the poshmodule had a bit of TLC, it would be extremely
                    >> useful for this,... http://poshmodule.sourceforge.net/
                    >
                    > Last time I checked that was Windows-only. Has that changed?

                    I've only tested it under Linux, where it worked, but it does
                    clearly need a bit of work :)

                    > The only IPC modules for Unix that I'm aware of are one which I
                    > adopted (for System V semaphores & shared memory) and one which
                    > I wrote (for POSIX semaphores & shared memory).
                    >
                    > http://semanchuk.com/philip/posix_ipc/

                    I'll take a look at those - poshmodule does need a bit of TLC and
                    doesn't appear to be maintained.

                    > If anyone wants to wrap POSH cleverness around them, go for it!
                    > If not, maybe I'll make the time someday.

                    I personally don't have the time to do this, but I'd be very
                    interested in hearing of someone building an up-to-date version.
                    (Indeed, something like this would be extremely useful for everyone
                    to have in the standard library now that the multiprocessing
                    library is in the standard library.)


                    Michael.
                    --
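An aside for readers of this archive: the standard-library facility wished for above did eventually arrive — since Python 3.8, `multiprocessing.shared_memory` provides named shared-memory segments built on the same POSIX primitives as posix_ipc. A minimal sketch:

```python
from multiprocessing import shared_memory

# Create a named segment and write into it.
shm = shared_memory.SharedMemory(create=True, size=1024)
shm.buf[:5] = b"hello"

# Attach a second handle by name, exactly as another process would,
# and read the bytes back without any copy or pickle step.
other = shared_memory.SharedMemory(name=shm.name)
seen = bytes(other.buf[:5])

other.close()
shm.close()
shm.unlink()
```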



                    • Andy O'Meara

                      #70
                      Re: 2.6, 3.0, and truly independent interpreters

                      On Oct 26, 10:11 pm, "James Mills" <prolo...@shortcircuit.net.au>
                      wrote:
                      > On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara
                      > <and...@gmail.com> wrote:
                      >> I think we miscommunicated there--I'm actually agreeing with
                      >> you.  I was trying to make the same point you were: that
                      >> intricate and/or large structures are meant to be passed
                      >> around by a top-level pointer, not using
                      >> serialization/messaging.  This is what I've been trying to
                      >> explain to others here; that IPC and shared memory
                      >> unfortunately aren't viable options, leaving app threads
                      >> (rather than child processes) as the solution.
                      >
                      > Andy,
                      >
                      > Why don't you just use a temporary file
                      > system (ram disk) to store the data that
                      > your app is manipulating? All you need to
                      > pass around then is a file descriptor.
                      >
                      > --JamesMills
                      Unfortunately, it's the penalty of serialization and
                      unserialization. When you're talking about stuff like
                      memory-resident images and video (complete with their intricate
                      and complex codecs), then the only option is to be passing around
                      a couple of pointers rather than take the hit of serialization
                      (which is huge for video, for example). I've gone into more
                      detail in some other posts but I could have missed something.


                      Andy




                      • Andy O'Meara

                        #71
                        Re: 2.6, 3.0, and truly independent interpreters

                        On Oct 27, 4:05 am, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
                        > Andy O'Meara wrote:
                        >> Well, when you're talking about large, intricate data
                        >> structures (which include opaque OS object refs that use
                        >> process-associated allocators), even a shared memory region
                        >> between the child process and the parent can't do the job.
                        >> Otherwise, please describe in detail how I'd get an opaque OS
                        >> object (e.g. an OS ref that refers to memory-resident video)
                        >> from the child process back to the parent process.
                        >
                        > WHAT PARENT PROCESS? "In the same address space", to me, means
                        > "a single process only, not multiple processes, and no parent
                        > process anywhere". If you have just multiple threads, the
                        > notion of passing data from a "child process" back to the
                        > "parent process" is meaningless.
                        I know... I was just responding to you and others here who keep
                        beating the "fork" drum. I was just trying to make it clear that
                        a shared address space is the only way to go. Ok, good, so we're
                        in agreement that threads are the only way to deal with the
                        "intricate and complex" data set issue in a performance-centric
                        application.
                        >> Again, the big picture that I'm trying to plant here is that
                        >> there really is a serious need for truly independent
                        >> interpreters/contexts in a shared address space.
                        >
                        > I understand that this is your mission in this thread. However,
                        > why is that your problem? Why can't you just use the existing
                        > (limited) multiple-interpreters machinery, and solve your
                        > problems with that?
                        Because then we're back to the GIL not permitting threads
                        efficient core use on CPU-bound scripts running on other threads
                        (when they otherwise could). Just so we're on the same page,
                        "when they otherwise could" is relevant here because that's the
                        important given: that each interpreter ("context") truly never
                        has any contact with the others.

                         An example would be python scripts that generate video
                         programmatically using an initial set of params and an in-house C
                         module to construct frames (which in turn make and modify python C
                         objects that wrap intricate codec-related data structures). Suppose
                         you wanted to render 3 of these at the same time, one on each thread
                         (3 threads). With the GIL in place, these threads can't run anywhere
                         close to their potential. Your response thus far is that the C module
                         should release the GIL before it commences its heavy lifting. Well,
                         the problem arises if, during its heavy lifting, it needs to call back
                         into its interpreter. It turns out that this isn't an exotic case
                         at all: there's a *ton* of utility gained by making calls back into
                         the interpreter. The best example is that since code is more easily
                         maintained in python than in C, a lot of the module "utility" code is
                         likely to be in python. Unsurprisingly, this is the situation myself
                         and many others are in: we want to subsequently use the
                         interpreter within the C module (so, as I understand it, the proposal
                         to have the C module release the GIL unfortunately doesn't work as a
                         general solution).
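To make the cost concrete, here is a minimal sketch (hypothetical params and a stand-in render function, not the actual C module being described) of what the standard workaround looks like today: since the GIL blocks the three-thread version, you fall back to multiprocessing, and every input and result crosses a process boundary via pickling -- fine for small params, a deal-breaker for hundreds of megs of memory-resident video.

```python
# Hedged sketch: "render_frame" is a hypothetical stand-in for the
# in-house C module's CPU-bound frame construction.
from multiprocessing import Pool

def render_frame(params):
    # Every argument and return value here is pickled across a process
    # boundary -- the serialization/IPC cost objected to above.
    width, height, seed = params
    return sum((seed * x) % 255 for x in range(width * height))

if __name__ == "__main__":
    pool = Pool(processes=3)                  # one worker per "render thread"
    jobs = [(64, 48, s) for s in (1, 2, 3)]
    results = pool.map(render_frame, jobs)    # params/results serialized
    pool.close()
    pool.join()
    print(results)
```

With truly independent interpreters, the three workers could instead be threads in one address space, handing around pointers instead of pickled copies.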
                        >
                         For most
                         industry-caliber packages, the expectation and convention (unless
                         documented otherwise) is that the app can make as many contexts as it
                         wants in whatever threads it wants, because the convention is that the
                         app must (a) never use one context's objects in another context,
                         and (b) never use a context at the same time from more than one
                         thread.  That's all I'm really trying to look at here.
                        >
                        And that's indeed the case for Python, too. The app can make as many
                        subinterpreters as it wants to, and it must not pass objects from one
                        subinterpreter to another one, nor should it use a single interpreter
                        from more than one thread (although that is actually supported by
                        Python - but it surely won't hurt if you restrict yourself to a single
                        thread per interpreter).
                        >
                         I'm not following you there... I thought we were all in agreement
                         that the existing C modules are FAR from being reentrant, regularly
                         making use of static/global objects. The point I made before is that
                         other industry-caliber packages specifically don't have restrictions
                         in *any* way.

                         I appreciate your argument that a PyC concept is a lot of work and
                         requires some careful design, but let's not kill the discussion just
                         because of that. The fact remains that the video-encoding scenario
                         described above is a pretty reasonable situation, and as more people
                         are commenting in this thread, there's an increasing need to offer
                         apps more flexibility when it comes to multi-threaded use.


                        Andy





                        • Greg Ewing

                          #72
                          Re: 2.6, 3.0, and truly independent intepreters

                          Glenn Linderman wrote:
                          So your 50% number is just a scare tactic, it would seem, based on wild
                          guesses. Was there really any benefit to the comment?
                          All I was really trying to say is that it would be a
                          mistake to assume that the overhead will be negligible,
                          as that would be just as much a wild guess as 50%.

                          --
                          Greg


                          • Andy O'Meara

                            #73
                            Re: 2.6, 3.0, and truly independent intepreters

                             On Oct 25, 9:46 am, "M.-A. Lemburg" <m...@egenix.com> wrote:
                             These discussions pop up every year or so and I think that most of them
                             are not really all that necessary, since the GIL isn't all that bad.
                            >
                             Thing is, if the topic keeps coming up, then that may be an indicator
                             that change is truly needed. Someone much wiser than me once shared
                             that a measure of the usefulness and quality of a package (or API) is
                             how easily it can be added to an application--of any flavor--without
                             the application needing to change.

                            So in the rising world of idle cores and worker threads, I do see an
                            increasing concern over the GIL. Although I recognize that the debate
                            is lengthy, heated, and has strong arguments on both sides, my reading
                            on the issue makes me feel like there's a bias for the pro-GIL side
                            because of the volume of design and coding work associated with
                            considering various alternatives (such as Glenn's "Py*" concepts).
                            And I DO respect and appreciate where the pro-GIL people come from:
                            who the heck wants to do all that work and recoding so that a tiny
                            percent of developers can benefit? And my best response is that as
                            unfortunate as it is, python needs to be more multi-threaded app-
                            friendly if we hope to attract the next generation of app developers
                            that want to just drop python into their app (and not have to change
                            their app around python). For example, Lua has that property, as
                            evidenced by its rapidly growing presence in commercial software
                            (Blizzard uses it heavily, for example).
                            >
                            Furthermore, there are lots of ways to tune the CPython VM to make
                            it more or less responsive to thread switches via the various sys.set*()
                            functions in the sys module.
                            >
                            Most computing or I/O intense C extensions, built-in modules and object
                            implementations already release the GIL for you, so it usually doesn't
                            get in the way all that often.
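As a small illustration of the quoted point (hedged: the degree of real overlap depends on the CPython version and on buffer sizes), extensions such as zlib release the GIL around their C-level loops, so several threads can compress concurrently even though pure-Python bytecode cannot run in parallel:

```python
# Three threads compressing large buffers; zlib drops the GIL during
# the deflate loop, so the threads can genuinely overlap on cores.
import threading
import zlib

payloads = [bytes(bytearray((i * j) % 256 for j in range(200000)))
            for i in (1, 2, 3)]
results = [None] * len(payloads)

def worker(idx, data):
    results[idx] = zlib.compress(data)

threads = [threading.Thread(target=worker, args=(i, p))
           for i, p in enumerate(payloads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Verify each thread's round trip survived.
ok = all(zlib.decompress(c) == p for c, p in zip(results, payloads))
print(ok)
```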

                             The main issue I take with that is that it's often highly useful for C
                             modules to make subsequent calls back into the interpreter. I suppose
                             the response to that is to re-acquire the GIL before reentry, but it just
                             seems to be more code and responsibility in scenarios where it's not
                             necessary.  Although that code and protocol may come easily to veteran
                             CPython developers, let's not forget that an important goal is to
                             attract new developers and companies to the scene, where they get
                             their thread-independent code up and running using python without any
                             unexpected reengineering.  Again, why are companies choosing Lua over
                             Python when it comes to an easy and flexible drop-in interpreter?  And
                             please take my points here to be exploratory, and not hostile or
                             accusatory, in nature.
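For readers unfamiliar with the protocol under discussion: in C code the spelling is Py_BEGIN_ALLOW_THREADS around the heavy lifting and PyGILState_Ensure()/PyGILState_Release() around each callback into the interpreter. A rough pure-Python analogue of that bookkeeping (the lock here is only a stand-in for the GIL, and the functions are hypothetical) looks like:

```python
# Analogy only: "gil" stands in for CPython's real global interpreter
# lock, which C extensions manipulate via the C API, not from Python.
import threading

gil = threading.Lock()

def python_utility(x):
    # "Utility code kept in Python" that the C module wants to call.
    return x * 2

def c_module_heavy_lifting(data):
    gil.release()               # Py_BEGIN_ALLOW_THREADS: let others run
    try:
        partial = sum(data)     # heavy C-side work, no interpreter access
    finally:
        gil.acquire()           # Py_END_ALLOW_THREADS
    # Callback into the interpreter: the lock must be held again
    # (PyGILState_Ensure / PyGILState_Release in real C code).
    return python_utility(partial)

gil.acquire()                   # the caller holds the GIL on entry
result = c_module_heavy_lifting([1, 2, 3, 4])
gil.release()
print(result)
```

The point being made above is that every re-entry into the interpreter carries this bookkeeping, which adds up when callbacks are frequent.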


                            Andy



                            • Andy O'Meara

                              #74
                              Re: 2.6, 3.0, and truly independent intepreters

                               On Oct 27, 10:55 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:

                               And I think we still are miscommunicating!  Or maybe communicating anyway!
                              >
                              So when you said "object", I actually don't know whether you meant
                              Python object or something else.  I assumed Python object, which may not
                              have been correct... but read on, I think the stuff below clears it up.
                              >
                              >
                              Then when you mentioned thousands of objects, I imagined thousands of
                              Python objects, and somehow transforming the blob into same... and back
                              again.  
                               My apologies to you and others here on my use of "objects" -- I use
                               the term generically and mean it *not* to refer to python objects (for
                               all the reasons discussed here). Python only makes up a small
                               part of our app, hence my habit of using "objects" to refer to other APIs'
                               allocated and opaque objects (including our own and OS APIs). For all
                               the reasons we've discussed, in our world, python objects don't travel
                               around outside of our python C modules -- when python objects need to
                               be passed to other parts of the app, they're converted into their non-
                               python (portable) equivalents (ints, floats, buffers, etc--but most of
                               the time, the objects are PyCObjects, so they can enter and leave a
                               python context with negligible overhead). I venture to say this is
                               pretty standard when any industry app uses a package (such as python),
                               for various reasons:
                               - Portability/Future (e.g. if we do decide to drop Python and go
                               with Lua, the changes are limited to only one region of code).
                               - Sanity (having any API's objects show up in places "far away"
                               goes against easy-to-follow code).
                               - MT flexibility (because we never use static/global
                               storage, we have all kinds of options when it comes to
                               multithreading). For example, recall that by throwing python into
                               multiple dynamic libs, we were able to achieve the GIL-less
                               interpreter independence that we want (albeit ghetto and a pain).
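As a sketch of that boundary conversion (the frame-header layout here is purely hypothetical, not anything from the actual app): Python values are flattened into a portable byte blob on the way out, and rebuilt only once back inside a Python context:

```python
# Hypothetical example of converting Python objects to their
# "non-python (portable) equivalents" at a module boundary.
import struct

# width, height, timestamp, duration -- little-endian, fixed layout
FRAME_HEADER = "<IIdd"

def to_portable(width, height, timestamp, duration):
    # Python ints/floats -> opaque byte blob; safe to hand to
    # non-Python code in any thread without touching the interpreter.
    return struct.pack(FRAME_HEADER, width, height, timestamp, duration)

def from_portable(blob):
    # Byte blob -> Python values, done back inside a Python context.
    return struct.unpack(FRAME_HEADER, blob)

blob = to_portable(1920, 1080, 0.0, 1.0 / 30.0)
print(from_portable(blob))
```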



                              Andy




                              • Rhamphoryncus

                                #75
                                Re: 2.6, 3.0, and truly independent intepreters

                                 On Oct 28, 9:30 am, "Andy O'Meara" <and...@gmail.com> wrote:
                                 On Oct 25, 9:46 am, "M.-A. Lemburg" <m...@egenix.com> wrote:
                                >
                                 These discussions pop up every year or so and I think that most of them
                                 are not really all that necessary, since the GIL isn't all that bad.
                                >
                                Thing is, if the topic keeps coming up, then that may be an indicator
                                that change is truly needed.  Someone much wiser than me once shared
                                that a measure of the usefulness and quality of a package (or API) is
                                 how easily it can be added to an application--of any flavor--without
                                the application needing to change.
                                >
                                So in the rising world of idle cores and worker threads, I do see an
                                increasing concern over the GIL.  Although I recognize that the debate
                                is lengthy, heated, and has strong arguments on both sides, my reading
                                on the issue makes me feel like there's a bias for the pro-GIL side
                                because of the volume of design and coding work associated with
                                considering various alternatives (such as Glenn's "Py*" concepts).
                                And I DO respect and appreciate where the pro-GIL people come from:
                                who the heck wants to do all that work and recoding so that a tiny
                                percent of developers can benefit?  And my best response is that as
                                unfortunate as it is, python needs to be more multi-threaded app-
                                friendly if we hope to attract the next generation of app developers
                                that want to just drop python into their app (and not have to change
                                their app around python).  For example, Lua has that property, as
                                evidenced by its rapidly growing presence in commercial software
                                (Blizzard uses it heavily, for example).
                                >
                                >
                                >
                                Furthermore, there are lots of ways to tune the CPython VM to make
                                it more or less responsive to thread switches via the various sys.set*()
                                functions in the sys module.
                                >
                                Most computing or I/O intense C extensions, built-in modules and object
                                implementations already release the GIL for you, so it usually doesn't
                                get in the way all that often.
                                >
                                 The main issue I take with that is that it's often highly useful for C
                                 modules to make subsequent calls back into the interpreter. I suppose
                                 the response to that is to re-acquire the GIL before reentry, but it just
                                 seems to be more code and responsibility in scenarios where it's not
                                 necessary.  Although that code and protocol may come easily to veteran
                                 CPython developers, let's not forget that an important goal is to
                                 attract new developers and companies to the scene, where they get
                                 their thread-independent code up and running using python without any
                                 unexpected reengineering.  Again, why are companies choosing Lua over
                                 Python when it comes to an easy and flexible drop-in interpreter?  And
                                 please take my points here to be exploratory, and not hostile or
                                 accusatory, in nature.
                                >
                                Andy
                                Okay, here's the bottom line:
                                * This is not about the GIL. This is about *completely* isolated
                                interpreters; most of the time when we want to remove the GIL we want
                                a single interpreter with lots of shared data.
                                * Your use case, although not common, is not extraordinarily rare
                                either. It'd be nice to support.
                                * If CPython had supported it all along we would continue to maintain
                                it.
                                 * However, since it's not supported today, it's not worth the time
                                 invested, API incompatibility, and general breakage it would imply.
                                * Although it's far more work than just solving your problem, if I
                                were to remove the GIL I'd go all the way and allow shared objects.

                                 So there are really only two options here:
                                * get a short-term bodge that works, like hacking the 3rd party
                                library to use your shared-memory allocator. Should be far less work
                                than hacking all of CPython.
                                * invest yourself in solving the *entire* problem (GIL removal with
                                shared python objects).
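For what it's worth, later Python versions (3.8+) ship exactly this kind of shared-memory bodge off the shelf: multiprocessing.shared_memory lets two processes (or, as in this compressed single-process sketch, two handles) alias the same pages, so a large buffer changes hands by name rather than by serialization:

```python
# Named shared memory: the consumer attaches by name -- no copy, no
# pickling; both views alias the same underlying memory. (In real use
# the consumer would be a separate process.)
from multiprocessing import shared_memory

producer = shared_memory.SharedMemory(create=True, size=1024)
try:
    producer.buf[:5] = b"video"           # write through one mapping

    consumer = shared_memory.SharedMemory(name=producer.name)
    seen = bytes(consumer.buf[:5])        # read through the other
    consumer.close()
finally:
    producer.close()
    producer.unlink()                     # free the segment

print(seen)
```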

