2.6, 3.0, and truly independent interpreters

  • Martin v. Löwis

    #91
    Re: 2.6, 3.0, and truly independent interpreters

    >> Why do you think so? For C code that is carefully written, the GIL
    >> allows *very well* to write CPU bound scripts running on other threads.
    >> (please do get back to Jesse's original remark in case you have lost
    >> the thread :-)
    > I don't follow you there. If you're referring to multiprocessing
    No, I'm not. I refer to regular, plain, multi-threading.
    >>> It turns out that this isn't an exotic case
    >>> at all: there's a *ton* of utility gained by making calls back into
    >>> the interpreter. The best example is that since code is more easily
    >>> maintained in python than in C, a lot of the module "utility" code is
    >>> likely to be in python.
    >> You should really reconsider writing performance-critical code in
    >> Python.
    > I don't follow you there... Performance-critical code in Python??
    I probably expressed myself incorrectly (being not a native speaker
    of English): if you were writing performance-critical code in Python,
    you should reconsider (i.e. you should rewrite it in C).

    It's not clear whether this calling back into Python is in the
    performance-critical path. If it is, then reconsider.
    > I tried to list some abbreviated examples in other posts, but here's
    > some elaboration:
    > - Pixel-level effects and filters, where some filters may use C procs
    > while others may call back into the interpreter to execute logic --
    > while some do both, multiple times.
    Ok. For a plain C proc, release the GIL before the proc, and reacquire
    it afterwards. For a proc that calls into the interpreter:
    a) if it is performance-critical, reconsider writing it in C, or
    reformulate it so that it stops being performance critical (e.g.
    through caching)
    b) else, reacquire the GIL before calling back into Python, then
    release the GIL before continuing the proc
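[Editor's note] Martin's recipe can be seen from the Python side with the stdlib `hashlib`, whose C code already follows pattern (a): current CPython drops the GIL while hashing buffers beyond a couple of kilobytes, so CPU-bound threads can genuinely overlap. A minimal sketch (mine, not Martin's code):

```python
import hashlib
import threading

def hash_chunk(data, results, i):
    # hashlib's C implementation releases the GIL for large buffers,
    # so several of these threads can run on different cores at once.
    results[i] = hashlib.sha256(data).hexdigest()

def hash_in_threads(chunks):
    results = [None] * len(chunks)
    threads = [threading.Thread(target=hash_chunk, args=(c, results, i))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    chunks = [bytes([i]) * 1_000_000 for i in range(4)]
    # Same digests as a serial loop, but the hashing ran outside the GIL.
    assert hash_in_threads(chunks) == [hashlib.sha256(c).hexdigest()
                                       for c in chunks]
```

A pure-Python callback in the middle of such a proc is exactly where pattern (b) applies: the C code must re-acquire the GIL before touching any Python object.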
    > - Image and video analysis/recognition where there's TONS of intricate
    > data structures and logic. Those data structures and logic are
    > easiest to develop and maintain in python, but you'll often want to
    > call back to C procs which will, in turn, want to access Python (as
    > well as C-level) data structures.
    I'm not sure what processing you need to do.
    The data structures themselves are surely not performance critical
    (not being algorithms). If you really run Python algorithms on these
    structures, then my approach won't help you (except for the general
    recommendation to find some expensive sub-algorithm and rewrite that
    in C, so that it both becomes faster and can release the GIL).
    > It's just not practical to be
    > locking and unlocking the GIL when you want to operate on python data
    > structures or call back into python.
    This I don't understand. I find that fairly easy to do.
    > You seem to have placed the burden of proof on my shoulders for an app
    > to deserve the ability to free-thread when using 3rd party packages,
    > so how about we just agree it's not an unreasonable desire for a
    > package (such as python) to support it and move on with the
    > discussion.
    Not at all - I don't want a proof. I just want agreement on Jesse
    Noller's claim

    # A c-level module, on the other hand, can sidestep/release
    # the GIL at will, and go on its merry way and process away.
    >> If neither is likely to result, killing the discussion is the most
    >> productive thing we can do.
    > Well, most others here seem to have a very different definition of what
    > qualifies as a "futile" discussion, so how about you allow the rest of
    > us to continue to discuss these issues and possible solutions. And, for
    > the record, I've said multiple times I'm ready to contribute
    > monetarily, professionally, and personally, so if that doesn't qualify
    > as the precursor to "code contributions from one of the participants"
    > then I don't know WHAT does.
    Ok, I apologize for having misunderstood you here.

    Regards,
    Martin


    • Patrick Stinson

      #92
      Re: 2.6, 3.0, and truly independent interpreters

      On Wed, Oct 29, 2008 at 4:05 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
      > On approximately 10/29/2008 3:45 PM, came the following characters from the
      > keyboard of Patrick Stinson:
      >> If you are dealing with "lots" of data like in video or sound editing,
      >> you would just keep the data in shared memory and send the reference
      >> over IPC to the worker process. Otherwise, if you marshal and send you
      >> are looking at a temporary doubling of the memory footprint of your
      >> app because the data will be copied, and marshaling overhead.
      > Right. Sounds, and is, easy, if the data is all directly allocated by the
      > application. But when pieces are allocated by 3rd party libraries, that use
      > the C-runtime allocator directly, then it becomes more difficult to keep
      > everything in shared memory.
      good point.
      > One _could_ replace the C-runtime allocator, I suppose, but that could have
      > some adverse effects on other code, that doesn't need its data to be in
      > shared memory. So it is somewhat between a rock and a hard place.
      ewww scary. mousetraps for sale?
      > By avoiding shared memory, such problems are sidestepped... until you run
      > smack into the GIL.
      > --
      > Glenn -- http://nevcal.com/
      > ===========================
      > A protocol is complete when there is nothing left to remove.
      > -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
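[Editor's note] Patrick's keep-the-data-in-shared-memory approach, sketched with the stdlib `multiprocessing.shared_memory` module (an anachronism relative to this 2008 thread: the module only appeared in Python 3.8; back then you would have reached for `mmap` or a third-party wrapper). Only the segment's *name* crosses the IPC boundary, never the data:

```python
from multiprocessing import Process, shared_memory

def worker(segment_name, size):
    # Attach to the existing segment by name; the payload itself is
    # never pickled or copied across the process boundary.
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        shm.buf[:size] = bytes(b ^ 0xFF for b in bytes(shm.buf[:size]))
    finally:
        shm.close()

def process_in_place(data):
    # The parent allocates the segment and hands only its name to the child.
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    try:
        shm.buf[:len(data)] = data
        p = Process(target=worker, args=(shm.name, len(data)))
        p.start()
        p.join()
        return bytes(shm.buf[:len(data)])
    finally:
        shm.close()
        shm.unlink()

if __name__ == "__main__":
    assert process_in_place(b"\x00\x0f\xff") == b"\xff\xf0\x00"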


      • Glenn Linderman

        #93
        Re: 2.6, 3.0, and truly independent interpreters

        On approximately 10/30/2008 6:26 AM, came the following characters from
        the keyboard of Jesse Noller:
        > On Wed, Oct 29, 2008 at 8:05 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
        >> On approximately 10/29/2008 3:45 PM, came the following characters from the
        >> keyboard of Patrick Stinson:
        >>> If you are dealing with "lots" of data like in video or sound editing,
        >>> you would just keep the data in shared memory and send the reference
        >>> over IPC to the worker process. Otherwise, if you marshal and send you
        >>> are looking at a temporary doubling of the memory footprint of your
        >>> app because the data will be copied, and marshaling overhead.
        >> Right. Sounds, and is, easy, if the data is all directly allocated by the
        >> application. But when pieces are allocated by 3rd party libraries, that use
        >> the C-runtime allocator directly, then it becomes more difficult to keep
        >> everything in shared memory.
        >> One _could_ replace the C-runtime allocator, I suppose, but that could have
        >> some adverse effects on other code, that doesn't need its data to be in
        >> shared memory. So it is somewhat between a rock and a hard place.
        >> By avoiding shared memory, such problems are sidestepped... until you run
        >> smack into the GIL.
        > If you do not have shared memory: You don't need threads, ergo: You
        > don't get penalized by the GIL. Threads are only useful when you need
        > to have that requirement of large in-memory data structures shared and
        > modified by a pool of workers.
        The whole point of this thread is to talk about large in-memory data
        structures that are shared and modified by a pool of workers.

        My reference to shared memory was specifically referring to the concept
        of sharing memory between processes... a particular OS feature that is
        called shared memory.

        The need for sharing memory among a pool of workers is still the
        premise. Threads do that automatically, without the need for the OS
        shared memory feature, which brings with it the need for a special
        allocator to allocate memory in the shared memory area vs the rest of
        the address space.
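[Editor's note] Glenn's point in miniature: a pool of worker threads mutating one ordinary in-memory structure, with no OS shared-memory segment or special allocator involved. A sketch; a real app would use finer-grained locking than one global lock:

```python
import threading

def run_workers(shared, n_workers=4, updates=1000):
    # Every thread sees the same dict object automatically, because
    # threads share the whole address space.
    lock = threading.Lock()

    def worker():
        for _ in range(updates):
            with lock:
                shared["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    state = {"count": 0}
    run_workers(state)
    assert state["count"] == 4 * 1000
```

The GIL is the price of this convenience in CPython: the structure is shared for free, but pure-Python mutation of it is serialized.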

        Not to pick on you, particularly, Jesse, but this particular response
        made me finally understand why there has been so much repetition of the
        same issues and positions over and over and over in this thread: instead
        of comprehending the whole issue, people are responding to small
        fragments of it, with opinions that may be perfectly reasonable for that
        fragment, but missing the big picture, or the explanation made when the
        same issue was raised in a different sub-thread.

        --
        Glenn -- http://nevcal.com/
        ===========================
        A protocol is complete when there is nothing left to remove.
        -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking


        • Patrick Stinson

          #94
          Re: 2.6, 3.0, and truly independent interpreters

          Speaking of the big picture, is this how it normally works when
          someone says "Here's some code and a problem and I'm willing to pay
          for a solution?" I've never really walked that path with a project of
          this complexity (I guess it's the backwards-compatibility that makes
          it confusing), but is this problem just too complex so we have to keep
          talking and talking on forum after forum? Afraid to fork? I know I am.
          How many people are qualified to tackle Andy's problem? Are all of
          them busy or uninterested? Is the current code in a tight spot where
          it just can't be fixed without really jabbing that FORK in so deep
          that the patch will die when your project does?

          Personally I think this problem is super-awesome on the hobbyist's fun
          scale. I'd totally take the time to let my patch do the talking but I
          haven't read enough of the (2.5) code. So, I resort to simply reading
          the newsgroups and python code to better understand the mechanics of
          the problem :(



          • Rhamphoryncus

            #95
            Re: 2.6, 3.0, and truly independent interpreters

            On Oct 30, 8:23 pm, "Patrick Stinson" <patrickstinson.li...@gmail.com>
            wrote:
            > Speaking of the big picture, is this how it normally works when
            > someone says "Here's some code and a problem and I'm willing to pay
            > for a solution?" I've never really walked that path with a project of
            > this complexity (I guess it's the backwards-compatibility that makes
            > it confusing), but is this problem just too complex so we have to keep
            > talking and talking on forum after forum? Afraid to fork? I know I am.
            > How many people are qualified to tackle Andy's problem? Are all of
            > them busy or uninterested? Is the current code in a tight spot where
            > it just can't be fixed without really jabbing that FORK in so deep
            > that the patch will die when your project does?
            > Personally I think this problem is super-awesome on the hobbyist's fun
            > scale. I'd totally take the time to let my patch do the talking but I
            > haven't read enough of the (2.5) code. So, I resort to simply reading
            > the newsgroups and python code to better understand the mechanics of
            > the problem :(
            The scale of this issue is why so little progress gets made, yes. I
            intend to solve it regardless of getting paid (and have been working
            on various aspects for quite a while now), but as you can see from
            this thread it's very difficult to convince anybody that my approach
            is the *right* approach.


            • alex23

              #96
              Re: 2.6, 3.0, and truly independent interpreters

              On Oct 31, 2:05 am, "Andy O'Meara" <and...@gmail.com> wrote:
              > I don't follow you there.  If you're referring to multiprocessing, our
              > concerns are:
              > - Maturity (am I willing to tell my partners and employees that I'm
              > betting our future on a brand-new module that imposes significant
              > restrictions as to how our app operates?)
              > - Liability (am I ready to invest our resources into lots of new
              > python module-specific code to find out that a platform that we want
              > to target isn't supported or has problems?).  Like it or not, we're a
              > company and we have to show sensitivity about new or fringe packages
              > that make our codebase less agile -- C/C++ continues to win the day in
              > that department.
              I don't follow this...wouldn't both of these concerns be even more
              true for modifying the CPython interpreter to provide the
              functionality you want?


              • greg

                #97
                Re: 2.6, 3.0, and truly independent interpreters

                Patrick Stinson wrote:
                > Speaking of the big picture, is this how it normally works when
                > someone says "Here's some code and a problem and I'm willing to pay
                > for a solution?"
                In an open-source volunteer context, time is generally more
                valuable than money. Most people can't just drop part of
                their regular employment temporarily, so unless there's
                quite a *lot* of money being offered (enough to offer someone
                full-time employment, for example) it doesn't necessarily
                make any more man-hours available.

                --
                Greg


                • lkcl

                  #98
                  Re: 2.6, 3.0, and truly independent interpreters

                  On Oct 30, 6:39 pm, Terry Reedy <tjre...@udel.edu> wrote:
                  > Their professor is Lars Bak, the lead architect of the Google V8
                  > Javascript engine. They spent some time working on V8 in the last
                  > couple of months.
                  then they will be at home with pyv8 - which is a combination of the
                  pyjamas python-to-javascript compiler and google's v8 engine.

                  in pyv8, thanks to v8 (and the judicious application of boost) it's
                  possible to call out to external c-based modules.

                  so not only do you get the benefits of the (much) faster execution
                  speed of v8, along with its garbage collection, but also you still get
                  access to external modules.

                  so... their project's done, already!

                  l.


                  • sturlamolden

                    #99
                    Re: 2.6, 3.0, and truly independent interpreters


                    If you are serious about multicore programming, take a look at:



                    Now if we could make Python do something like that, people would
                    perhaps start to think about writing Python programs for more than one
                    processor.


                    • Andy O'Meara

                      Re: 2.6, 3.0, and truly independent interpreters

                      On Nov 4, 9:38 am, sturlamolden <sturlamol...@yahoo.no> wrote:

                      > First let me say that there are several solutions to the "multicore"
                      > problem. Multiple independent interpreters embedded in a process is
                      > one possibility, but not the only one.
                      No one disagrees there. However, the motivation of this thread has
                      been to make people here consider that it's much more preferable for
                      CPython to have as few restrictions as possible with how it's used. I
                      think many people here assume that python is the showcase item in
                      industrial and commercial use, but it's generally just one of many
                      pieces of machinery that serve the app's function (so "the tail can't
                      wag the dog" when it comes to app design). Some people in this thread
                      have made comments such as "make your app run in python" or "change
                      your app requirements" but in the world of production schedules and
                      making sure payroll is met, those options just can't happen. People
                      in the scientific and academic communities have to understand that the
                      dynamics of commercial software can be *very* different, and they have
                      to show some open-mindedness there.

                      > The multiprocessing package has almost the same API as you would get
                      > from your suggestion, the only difference being that multiple
                      > processes are involved.
                      As other posts have gone into extensive detail, multiprocessing
                      unfortunately doesn't handle the massive/complex data structures
                      situation (see my posts regarding real-time video processing). I'm
                      not sure if you've followed all the discussion, but multiple processes
                      are off the table (this is discussed at length, so just flip back into
                      the thread history).


                      Andy



                      • sturlamolden

                        Re: 2.6, 3.0, and truly independent interpreters

                        On Nov 4, 4:27 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                        > People
                        > in the scientific and academic communities have to understand that the
                        > dynamics of commercial software can be *very* different, and they have
                        > to show some open-mindedness there.
                        You are aware that the BDFL's employer is a company called Google?
                        Python is not just used in academic settings.

                        Furthermore, I gave you a link to cilk++. This is a simple tool that
                        allows you to parallelize existing C or C++ software using three small
                        keywords. This is the kind of tool I believe would be useful. That is
                        not an academic judgement. It makes it easy to take existing software
                        and make it run efficiently on multicore processors.


                        > As other posts have gone into extensive detail, multiprocessing
                        > unfortunately doesn't handle the massive/complex data structures
                        > situation (see my posts regarding real-time video processing).
                        That is something I don't believe. Why can't multiprocessing handle
                        that? Is using a proxy object out of the question? Is putting the
                        complex object in shared memory out of the question? Is having
                        multiple copies of the object out of the question (did you see my kd-
                        tree example)? Using multiple independent interpreters inside a
                        process does not make this any easier. For Christ sake, researchers
                        write global climate models using MPI. And you think a toy problem
                        like 'real-time video processing' is a show stopper for using multiple
                        processes.





                        • Paul Boddie

                          Re: 2.6, 3.0, and truly independent interpreters

                          On 4 Nov, 16:00, sturlamolden <sturlamol...@yahoo.no> wrote:
                          > If you are serious about multicore programming, take a look at:
                          >
                          > Now if we could make Python do something like that, people would
                          > perhaps start to think about writing Python programs for more than one
                          > processor.
                          The language features look a lot like what others have already been
                          offering for a while: keywords for parallelised constructs (cilk_for)
                          which are employed by solutions for various languages (C# and various
                          C++ libraries spring immediately to mind); spawning and synchronisation
                          are typically supported in existing Python solutions, although
                          obviously not using language keywords. The more interesting aspects of
                          the referenced technology seem to be hyperobjects which, as far as I
                          can tell, are shared global objects, along with the way the work
                          actually gets distributed and scheduled - something which would
                          require slashing through the white paper aspects of the referenced
                          site and actually reading the academic papers associated with the
                          work.
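[Editor's note] The effect of a `cilk_for`-style construct is roughly what existing Python solutions obtain from a process pool. A sketch with the stdlib `multiprocessing.Pool`, where `apply_async`/`get` play the roles of spawn and sync (the naive `fib` is just stand-in CPU-bound work):

```python
from multiprocessing import Pool

def fib(n):
    # Deliberately naive and CPU-bound: a stand-in for a loop body
    # that a cilk_for would distribute across cores.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def parallel_map(func, items, workers=4):
    with Pool(processes=workers) as pool:
        # Spawn: fan the calls out to worker processes...
        handles = [pool.apply_async(func, (x,)) for x in items]
        # ...sync: wait for every result, in order.
        return [h.get() for h in handles]

if __name__ == "__main__":
    assert parallel_map(fib, [10, 11, 12]) == [55, 89, 144]
```

What this does not give you, and what the hyperobject discussion is about, is cheap shared mutable state across the workers.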

                          I've considered doing something like hyperobjects for a while, and
                          this does fit in somewhat with recent discussions about shared memory
                          and managing contention for that resource using the communications
                          channels found in, amongst other solutions, the pprocess module. I
                          currently have no real motivation to implement this myself, however.

                          Paul


                          • Andy O'Meara

                            Re: 2.6, 3.0, and truly independent interpreters

                            On Nov 4, 10:59 am, sturlamolden <sturlamol...@yahoo.no> wrote:
                            > On Nov 4, 4:27 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                            >> People
                            >> in the scientific and academic communities have to understand that the
                            >> dynamics of commercial software can be *very* different, and they have
                            >> to show some open-mindedness there.
                            > You are aware that the BDFL's employer is a company called Google?
                            > Python is not just used in academic settings.
                            Turns out I have heard of Google (and how about you be a little more
                            courteous). If you've read the posts in this thread, you'll note that
                            the needs outlined in this thread are quite different than the needs
                            and interests of Google. Note that my point was that python *could*
                            and *should* be used more in end-user/desktop applications, but it
                            can't "wag the dog" to use my earlier statement.
                            > Furthermore, I gave you a link to cilk++. This is a simple tool that
                            > allows you to parallelize existing C or C++ software using three small
                            > keywords.
                            Sorry if it wasn't clear, but we need the features associated with an
                            embedded interpreter. I checked out cilk++ when you linked it and
                            although it seems pretty cool, it's not a good fit for us for a number
                            of reasons. Also, we like the idea of helping support a FOSS project
                            rather than license a proprietary product (again, to be clear, using
                            cilk isn't even appropriate for our situation).

                            >> As other posts have gone into extensive detail, multiprocessing
                            >> unfortunately doesn't handle the massive/complex data structures
                            >> situation (see my posts regarding real-time video processing).
                            > That is something I don't believe. Why can't multiprocessing handle
                            > that?
                            In a few earlier posts, I went into detail about what's meant there:




                            > For Christ sake, researchers
                            > write global climate models using MPI. And you think a toy problem
                            > like 'real-time video processing' is a show stopper for using multiple
                            > processes.
                            I'm not sure why you're posting this sort of stuff when it seems like
                            you haven't checked out earlier posts in this thread. Also, you
                            do yourself and the people here a disservice in the way that you're
                            speaking to me here. You never know who you're really talking to or
                            who's reading.


                            Andy




                            • Paul Boddie

                              Re: 2.6, 3.0, and truly independent interpreters

                              On 5 Nov, 20:44, "Andy O'Meara" <and...@gmail.com> wrote:
                              > On Nov 4, 10:59 am, sturlamolden <sturlamol...@yahoo.no> wrote:
                              >> For Christ sake, researchers
                              >> write global climate models using MPI. And you think a toy problem
                              >> like 'real-time video processing' is a show stopper for using multiple
                              >> processes.
                              > I'm not sure why you're posting this sort of stuff when it seems like
                              > you haven't checked out earlier posts in this thread. Also, you
                              > do yourself and the people here a disservice in the way that you're
                              > speaking to me here. You never know who you're really talking to or
                              > who's reading.
                              I think your remarks about "people in the scientific and academic
                              communities" went down the wrong way, giving (or perhaps reinforcing)
                              the impression that such people live carefree lives and write software
                              unconstrained by external factors.

                              Anyway, to keep things constructive, I should ask (again) whether you
                              looked at tinypy [1] and whether that might possibly satisfy your
                              embedded requirements. As I noted before, the developers might share
                              your outlook on a number of matters. Otherwise, you might peruse the
                              list of Python implementations:



                              Paul

                              [1] http://www.tinypy.org/


                              • sturlamolden

                                Re: 2.6, 3.0, and truly independent interpreters

                                On Nov 5, 8:44 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                                All this says is:

                                1. The cost of serialization and deserialization is too large.
                                2. Complex data structures cannot be placed in shared memory.

                                The first claim is unsubstantiated. It depends on how much and what
                                you serialize. If you use something like NumPy arrays, the cost of
                                pickling is tiny. Erlang is a language specifically designed for
                                concurrent programming, yet it does not allow anything to be shared.

                                The second claim is plain wrong. You can put anything you want in
                                shared memory. The mapping address of the shared memory segment may
                                vary, but it can be dealt with (basically use integers instead of
                                pointers, and use the base address as offset.) Pyro is a Python
                                project that has investigated this. With Pyro you can put any Python
                                object in a shared memory region. You can also use NumPy record arrays
                                to put very complex data structures in shared memory.
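[Editor's note] The integers-instead-of-pointers point can also be shown without NumPy or Pyro, using the stdlib's `multiprocessing.sharedctypes` with a nested, pointer-free ctypes layout. A sketch; the `Frame`/`Point` names are invented for illustration:

```python
import ctypes
from multiprocessing import Process
from multiprocessing import sharedctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]

class Frame(ctypes.Structure):
    # Nested but pointer-free: every field lives inline at a fixed
    # offset, so the mapping address of the segment doesn't matter.
    _fields_ = [("frame_no", ctypes.c_int), ("corner", Point)]

def worker(frame):
    # Mutations made here are visible to the parent, because the
    # structure lives in shared memory, not the child's private heap.
    frame.frame_no += 1
    frame.corner.x = 3.5

def bump_in_child(frame):
    p = Process(target=worker, args=(frame,))
    p.start()
    p.join()

if __name__ == "__main__":
    shared = sharedctypes.Value(Frame, lock=False)
    shared.frame_no = 41
    bump_in_child(shared)
    assert shared.frame_no == 42 and shared.corner.x == 3.5
```

The caveat is the same one sturlamolden states: this works because nothing in the layout is a dynamically allocated pointer.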

                                What do you gain by placing multiple interpreters in the same process?
                                You will avoid the complication that the mapping address of the shared
                                memory region may be different. But this is a problem that has been
                                worked out and solved. Instead you get a lot of issues dealing with
                                DLL loading and unloading (Python extension objects).

                                The multiprocessing module has something called proxy objects, which
                                also deals with this issue. An object is hosted in a server process,
                                and client processes may access it through synchronized IPC calls.
                                Inside the client process the remote object looks like any other
                                Python object. The synchronized IPC is hidden away in an abstraction
                                layer. In Windows, you can also construct outproc ActiveX objects,
                                which are not that different from multiprocessing's proxy objects.
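[Editor's note] The proxy-object pattern in a few lines, with the stdlib `multiprocessing.Manager`: the list lives in the manager's server process, and each worker reaches it through a proxy whose method calls are synchronized IPC underneath. A sketch, not production code:

```python
from multiprocessing import Manager, Process

def worker(shared_list, value):
    # Looks like an ordinary list, but each call is an IPC round-trip
    # to the manager process that actually holds the data.
    shared_list.append(value * value)

def collect_squares(values):
    with Manager() as manager:
        shared = manager.list()
        procs = [Process(target=worker, args=(shared, v)) for v in values]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return sorted(shared)

if __name__ == "__main__":
    assert collect_squares([1, 2, 3]) == [1, 4, 9]
```

The convenience costs a round-trip per method call, which is why it suits coarse-grained access better than per-pixel work.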

                                If you need to place a complex object in shared memory:

                                1. Check if a NumPy record array may suffice (dtypes may be nested).
                                It will if you don't have dynamically allocated pointers inside the
                                data structure.

                                2. Consider using multiprocessing's proxy objects or outproc ActiveX
                                objects.

                                3. Go to http://pyro.sourceforge.net, download the code and read the
                                documentation.

                                Saying that "it can't be done" is silly before you have tried.
                                Programmers are not that good at guessing where the bottlenecks
                                reside, even if we think we do.





