2.6, 3.0, and truly independent interpreters

  • Christian Heimes

    #16
    Re: 2.6, 3.0, and truly independent interpreters

    Andy wrote:
    2) Barriers to "free threading". As Jesse describes, this is simply
    just the GIL being in place, but of course it's there for a reason.
    It's there because (1) doesn't hold and there was never any specs/
    guidance put forward about what should and shouldn't be done in multi-
    threaded apps (see my QuickTime API example). Perhaps if we could go
    back in time, we would not put the GIL in place, strict guidelines
    regarding multithreaded use would have been established, and PEP 3121
    would have been mandatory for C modules. Then again--screw that, if I
    could go back in time, I'd just go for the lottery tickets!! :^)
    I'm very - not absolute, but very - sure that Guido and the initial
    designers of Python would have added the GIL anyway. The GIL makes
    Python faster on single core machines and more stable on multi core
    machines. Other language designers think the same way. Ruby recently got
    a GIL. The article
    http://www.infoq.com/news/2007/05/ru...eading-futures explains the
    rationale for a GIL in Ruby. The article also includes a quote from Guido
    about threading in general.

    Several people inside and outside the Python community think that
    threads are dangerous and don't scale. The paper
    http://www.eecs.berkeley.edu/Pubs/Te...ECS-2006-1.pdf sums it
    up nicely. It explains why modern processors are going to cause more and
    more trouble with the Java approach to threads, too.

    Python *must* gain means of concurrent execution of CPU bound code
    eventually to survive on the market. But it must get the right means or
    we are going to suffer the consequences.

    Christian



    • Rhamphoryncus

      #17
      Re: 2.6, 3.0, and truly independent interpreters

      On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
      On approximately 10/23/2008 12:24 AM, came the following characters from
      the keyboard of Christian Heimes:
      >
      Andy wrote:
      2) Barriers to "free threading".  As Jesse describes, this is simply
      just the GIL being in place, but of course it's there for a reason.
      It's there because (1) doesn't hold and there was never any specs/
      guidance put forward about what should and shouldn't be done in multi-
      threaded apps (see my QuickTime API example).  Perhaps if we could go
      back in time, we would not put the GIL in place, strict guidelines
      regarding multithreaded use would have been established, and PEP 3121
      would have been mandatory for C modules.  Then again--screw that, if I
      could go back in time, I'd just go for the lottery tickets!! :^)
      >
      I've been following this discussion with interest, as it certainly seems
      that multi-core/multi-CPU machines are the coming thing, and many
      applications will need to figure out how to use them effectively.
      >
      I'm very - not absolute, but very - sure that Guido and the initial
      designers of Python would have added the GIL anyway. The GIL makes
      Python faster on single core machines and more stable on multi core
      machines. Other language designers think the same way. Ruby recently
      got a GIL. The article
      http://www.infoq.com/news/2007/05/ru...utures explains the
      rationales for a GIL in Ruby. The article also holds a quote from
      Guido about threading in general.
      >
      Several people inside and outside the Python community think that
      threads are dangerous and don't scale. The paper
      http://www.eecs.berkeley.edu/Pubs/Te...ECS-2006-1.pdf sums
      it up nicely. It explains why modern processors are going to cause
      more and more trouble with the Java approach to threads, too.
      >
      Reading this PDF paper is extremely interesting (albeit somewhat
      dependent on understanding abstract theories of computation; I have
      enough math background to follow it, sort of, and most of the text can
      be read even without fully understanding the theoretical abstractions).
      >
      I have already heard people talking about "Java applications are
      buggy".  I don't believe that general sequential programs written in
      Java are any buggier than programs written in other languages... so I
      had interpreted that to mean (based on some inquiry) that complex,
      multi-threaded Java applications are buggy.  And while I also don't
      believe that complex, multi-threaded programs written in Java are any
      buggier than complex, multi-threaded programs written in other
      languages, it does seem to be true that Java is one of the currently
      popular languages in which to write complex, multi-threaded programs,
      because of its language support for threads and concurrency primitives.  
      These reports were from people that are not programmers, but are field
      IT people, that have bought and/or support software and/or hardware with
      drivers, that are written in Java, and seem to have non-ideal behavior,
      (apparently only) curable by stopping/restarting the application or
      driver, or sometimes requiring a reboot.
      >
      The paper explains many traps that lead to complex, multi-threaded
      programs being buggy, and being hard to test.  I have worked with
      parallel machines, applications, and databases for 25 years, and can
      appreciate the succinct expression of the problems explained within the
      paper, and can, from experience, agree with its premises and
      conclusions.  Parallel applications have only been commercial successes
      when the parallelism is tightly constrained to well-controlled patterns
      that could be easily understood.  Threads, especially in "cooperation"
      with languages that use memory pointers, have the potential to get out
      of control, in inexplicable ways.
      Although the paper is correct in many ways, I find it fails to
      distinguish the core of the problem from the chaff surrounding it, and
      thus is used to justify poor language designs.

      For example, the amount of interaction may be seen as a spectrum: at
      one end are C or Java threads, with complicated memory models, and a
      tendency to just barely control things using locks. At the other end
      would be completely isolated processes with no form of IPC. The former
      is considered the worst possible, while the latter is the best
      possible (purely sequential).

      However, the latter is too weak for many uses. At a minimum we'd like
      some pipes to communicate. That helps, but it's still too weak. What if
      you have a large amount of data to share, created at startup but
      otherwise not modified? So we add some read-only types and ways to
      define your own read-only types. A couple of those types need a
      process associated with them, so we make sure process handles are
      proper objects too.
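
      To make that middle point concrete, here's a minimal sketch using
      Python 2.6's multiprocessing module (my illustration only; the
      worker/shared names are hypothetical): isolated processes, a pipe
      for messages, and data created at startup that is treated as
      read-only by convention.

          import multiprocessing

          def worker(conn, shared):
              # 'shared' was created before the child started and is
              # treated as read-only; results go back over the pipe.
              conn.send(sum(shared))
              conn.close()

          if __name__ == '__main__':
              # Shared data, created at startup, never modified after.
              shared = multiprocessing.Array('i', range(1000), lock=False)
              parent_end, child_end = multiprocessing.Pipe()
              p = multiprocessing.Process(target=worker,
                                          args=(child_end, shared))
              p.start()
              print parent_end.recv()          # 499500
              p.join()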

      What have we got now? It's more on the thread end of the spectrum
      than the process end, but it's definitely not a C or Java thread, and
      it's definitely not an OS process. What is it? Does it have the
      problems in the paper? Only some? Which?

      Another peeve I have is his characterization of the observer pattern.
      The generalized form of the problem exists both in single-threaded
      sequential programs, in the form of unexpected reentrancy, and in
      message passing, in the form of infinite CPU usage or an unbounded
      number of pending messages.
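
      The single-threaded form is easy to demonstrate; a small sketch
      (mine, with hypothetical names), no threads required:

          class Subject(object):
              def __init__(self):
                  self.observers = []
                  self._notifying = False

              def publish(self, event):
                  # Without this guard, an observer that publishes from
                  # its callback re-enters this loop and recurses
                  # without bound.
                  if self._notifying:
                      raise RuntimeError("reentrant publish: %r" % (event,))
                  self._notifying = True
                  try:
                      for callback in list(self.observers):
                          callback(event)
                  finally:
                      self._notifying = False

          subject = Subject()
          subject.observers.append(lambda e: subject.publish("echo"))
          subject.publish("hello")   # raises instead of recursing forever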

      Perhaps threading makes it much worse; I've heard many anecdotes that
      would support that. Or perhaps it's the lack of automatic deadlock
      detection, which would give a clear and diagnosable error for you to
      fix. Certainly, the mystery and extremeness of a deadlock could explain
      how much it scares people. Either way the paper says nothing.

      Python *must* gain means of concurrent execution of CPU bound code
      eventually to survive on the market. But it must get the right means
      or we are going to suffer the consequences.
      >
      This statement, after reading the paper, seems somewhat in line with the
      author's premise that language acceptability requires that a language be
      self-contained/monolithic, and potentially sufficient to implement
      itself.  That seems to also be one of the reasons that Java is used
      today for threaded applications.  It does seem to be true, given current
      hardware trends, that _some mechanism_ must be provided to obtain the
      benefit of multiple cores/CPUs to a single application, and that Python
      must either implement or interface to that mechanism to continue to be a
      viable language for large scale application development.
      >
      Andy seems to want an implementation of independent Python processes
      implemented as threads within a single address space, that can be
      coordinated by an outer application.  This actually corresponds to the
      model promulgated in the paper as being most likely to succeed.  Of
      course, it maps nicely into a model using separate processes,
      coordinated by an outer process, also.  The differences seem to be:
      >
      1) Most applications are historically perceived as corresponding to
      single processes.  Language features for multi-processing are rare, and
      such languages are not in common use.
      >
      2) A single address space can be convenient for the coordinating outer
      application.  It does seem simpler and more efficient to simply "copy"
      data from one memory location to another, rather than send it in a
      message, especially if the data are large.  On the other hand,
      coordination of memory access between multiple cores/CPUs effectively
      causes memory copies from one cache to the other, and if memory is
      accessed from multiple cores/CPUs regularly, the underlying hardware
      implements additional synchronization and copying of data, potentially
      each time the memory is accessed.  Being forced to do message passing of
      data between processes can actually be more efficient than access to
      shared memory at times.  I should note that in my 25 years of parallel
      development, all the systems created used a message passing paradigm,
      partly because the multiple CPUs often didn't share the same memory
      chips, much less the same address space, and that a key feature of all
      the successful systems of that nature was an efficient inter-CPU message
      passing mechanism.  I should also note that Herb Sutter has a recent
      series of columns in Dr Dobbs regarding multi-core/multi-CPU parallelism
      and a variety of implementation pitfalls, that I found to be very
      interesting reading.
      Try looking at it on another level: when your CPU wants to read from a
      bit of memory controlled by another CPU, it sends that CPU a message
      requesting the contents. The other CPU sends back a message containing
      that memory, and notes that we have it, in case it wants to modify it
      later. We also note where we got it, in case we want to modify it
      (and not wait for the other CPU to do modifications for us).

      Message passing vs shared memory isn't really a yes/no question. It's
      about ratios, usage patterns, and tradeoffs. *All* programs will
      share data, but in what way? If it's just the code itself you can
      move the cache validation into software and simplify the CPU, making
      it faster. If the shared data is a lot more than that, and you use it
      to coordinate accesses, then it'll be faster to have it in hardware.

      It's quite possible they'll come up with something that seems quite
      different, but in reality is the same sort of rearrangement. Add
      hardware support for transactions, move the caching partly into
      software, etc.
      >
      I have noted the multiprocessing module that is new to Python 2.6/3.0
      being feverishly backported to Python 2.5, 2.4, etc... indicating that
      people truly find the model/module useful... seems that this is one way,
      in Python rather than outside of it, to implement the model Andy is
      looking for, although I haven't delved into the details of that module
      yet, myself.  I suspect that a non-Python application could load one
      embedded Python interpreter, and then indirectly use the multiprocessing
      module to control other Python interpreters in other processes.  I
      don't know that multithreading primitives such as described in the paper
      are available in the multiprocessing module, but perhaps they can be
      implemented in some manner using the tools that are provided; in any
      case, some interprocess communication primitives are provided via this
      new Python module.
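
      A rough sketch of that suspicion (mine, untested in an actual
      embedding host; render_frame is hypothetical): the one embedded
      interpreter runs something like the following, and the CPU-bound
      work lands in separate processes, each with its own interpreter
      and its own GIL.

          import multiprocessing

          def render_frame(n):
              # Hypothetical stand-in for CPU-bound work.
              return n * n

          if __name__ == '__main__':
              # Caveat: on Windows, multiprocessing re-launches
              # sys.executable to create workers, which an embedding
              # host would have to account for.
              pool = multiprocessing.Pool(processes=4)
              print pool.map(render_frame, range(16))
              pool.close()
              pool.join()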
      >
      There could be opportunity to enhance Python with process creation and
      process coordination operations, rather than have it depend on
      easy-to-implement-incorrectly coordination patterns or
      easy-to-use-improperly libraries/modules of multiprocessing primitives
      (this is not a slam of the new multiprocessing module, which appears to
      be filling a present need in rather conventional ways, but just to point
      out that ideas promulgated by the paper, which I suspect 2 years later
      are still research topics, may be a better abstraction than the
      conventional mechanisms).
      >
      One thing Andy hasn't yet explained (or I missed) is why any of his
      application is coded in a language other than Python.  I can think of a
      number of possibilities:
      >
      A) (Historical) It existed, then the desire for extensions was seen, and
      Python was seen as a good extension language.
      >
      B) Python is inappropriate (performance?) for some of the algorithms
      (but should they be coded instead as Python extensions, with the core
      application being in Python?)
      >
      C) Unavailability of Python wrappers for particularly useful 3rd-party
      libraries
      >
      D) Other?
      "It already existed" is definitely the original reason, but now it
      includes single-threaded performance and multi-threaded scalability.
      Although the idea of "just write an extension that releases the GIL"
      is a common suggestion, it needs to be fairly coarse to be effective,
      and ensure little of the CPU time is left in python. If the app
      spreads its CPU time around, it is likely impossible to use python
      effectively.


      • greg

        #18
        Re: 2.6, 3.0, and truly independent interpreters

        Andy wrote:
        1) Independent interpreters (this is the easier one--and solved, in
        principle anyway, by PEP 3121, by Martin v. Löwis)
        Something like that is necessary for independent interpreters,
        but not sufficient. There are also all the built-in constants
        and type objects to consider. Most of these are statically
        allocated at the moment.
        2) Barriers to "free threading". As Jesse describes, this is simply
        just the GIL being in place, but of course it's there for a reason.
        It's there because (1) doesn't hold and there was never any specs/
        guidance put forward about what should and shouldn't be done in multi-
        threaded apps
        No, it's there because it's necessary for acceptable performance
        when multiple threads are running in one interpreter. Independent
        interpreters wouldn't mean the absence of a GIL; it would only
        mean each interpreter having its own GIL.

        --
        Greg


        • Martin v. Löwis

          #19
          Re: 2.6, 3.0, and truly independent interpreters

          You seem confused. PEP 3121 is for isolated interpreters (i.e. emulated
          processes), not threading.
          Just a small remark: this wasn't the primary objective of the PEP.
          The primary objective was to support module cleanup in a reliable
          manner, eventually allowing modules to be garbage-collected properly.
          However, I also kept the isolated interpreters feature in mind there.

          Regards,
          Martin


          • sturlamolden

            #20
            Re: 2.6, 3.0, and truly independent interpreters


            Instead of "appdomains" (one interpreter per thread), or free
            threading, you could use multiple processes. Take a look at the new
            multiprocessing module in Python 2.6. It has roughly the same
            interface as Python's threading and queue modules, but uses processes
            instead of threads. Processes are scheduled independently by the
            operating system. The objects in the multiprocessing module also tend
            to have much better performance than their threading and queue
            counterparts. If you have a problem with threads due to the GIL, the
            multiprocessing module will most likely take care of it.
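
            The interface parity means a threaded producer/consumer can
            often be ported by swapping module names; a minimal sketch
            (mine, names hypothetical):

                import multiprocessing   # near drop-in for threading + Queue

                def consumer(queue):
                    for item in iter(queue.get, None):   # None = sentinel
                        print 'got', item

                if __name__ == '__main__':
                    queue = multiprocessing.Queue()      # vs. Queue.Queue()
                    proc = multiprocessing.Process(      # vs. threading.Thread()
                        target=consumer, args=(queue,))
                    proc.start()
                    for i in range(5):
                        queue.put(i)
                    queue.put(None)
                    proc.join()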

            There is a fundamental problem with using homebrew loading of multiple
            (but renamed) copies of PythonXX.dll that is easily overlooked. That
            is, extension modules (.pyd) are DLLs as well. Even if required by two
            interpreters, they will only be loaded into the process image once.
            Thus you have to rename all of them as well, or you will get havoc
            with refcounts. Not to speak of what will happen if a Windows HANDLE
            is closed by one interpreter while still needed by another. It is
            almost guaranteed to bite you, sooner or later.

            There are other options as well:

            - Use IronPython. It does not have a GIL.

            - Use Jython. It does not have a GIL.

            - Use pywin32 to create isolated outproc COM servers in Python. (I'm
            not sure what the effect of inproc servers would be.)

            - Use os.fork() if your platform supports it (Linux, Unix, Apple,
            Cygwin, Windows Vista SUA). This is the standard POSIX way of doing
            multiprocessing. It is almost unbeatable if you have a fast copy-on-
            write implementation of fork (that is, all platforms except Cygwin).
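
            A minimal sketch of the fork route (mine; POSIX only, and the
            work function is hypothetical):

                import os

                def cpu_bound_work():
                    # Keep the result in 0-255: it becomes an exit code.
                    return sum(i * i for i in xrange(10 ** 6)) % 256

                pid = os.fork()   # copy-on-write duplicate of the interpreter
                if pid == 0:
                    # Child: its own process, its own GIL, nothing shared.
                    os._exit(cpu_bound_work())
                else:
                    # Parent: keep running, then reap the child's status.
                    _, status = os.waitpid(pid, 0)
                    print 'child returned', os.WEXITSTATUS(status)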


            • Andy O'Meara

              #21
              Re: 2.6, 3.0, and truly independent interpreters

              On Oct 24, 9:35 am, sturlamolden <sturlamol...@yahoo.no> wrote:
              Instead of "appdomains" (one interpreter per thread), or free
              threading, you could use multiple processes. Take a look at the new
              multiprocessing module in Python 2.6.
              That's mentioned earlier in the thread.
              >
              There is a fundamental problem with using homebrew loading of multiple
              (but renamed) copies of PythonXX.dll that is easily overlooked. That
              is, extension modules (.pyd) are DLLs as well.
              Tell me about it--there's all kinds of problems and maintenance
              liabilities with our approach. That's why I'm here talking about this
              stuff.
              There are other options as well:
              >
              - Use IronPython. It does not have a GIL.
              >
              - Use Jython. It does not have a GIL.
              >
              - Use pywin32 to create isolated outproc COM servers in Python. (I'm
              not sure what the effect of inproc servers would be.)
              >
              - Use os.fork() if your platform supports it (Linux, Unix, Apple,
              Cygwin, Windows Vista SUA). This is the standard POSIX way of doing
              multiprocessing. It is almost unbeatable if you have a fast copy-on-
              write implementation of fork (that is, all platforms except Cygwin).
              This is discussed earlier in the thread--they're unfortunately all
              out.


              • Stefan Behnel

                #22
                Re: 2.6, 3.0, and truly independent interpreters

                Terry Reedy wrote:
                Everything in DLLs is compiled C extensions. I see about 15 for Windows
                3.0.
                Ah, weren't those wonderful times back in the days of Win3.0, when DLL-hell was
                inhabited by only 15 libraries? *sigh*

                .... although ... wait, didn't Win3.0 have more than that already? Maybe you
                meant Windows 1.0?

                SCNR-ly,

                Stefan


                • sturlamolden

                  #23
                  Re: 2.6, 3.0, and truly independent interpreters

                  On Oct 24, 3:58 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                  This is discussed earlier in the thread--they're unfortunately all
                  out.
                  It occurs to me that tcl is doing what you want. Have you ever thought
                  of not using Python?

                  That aside, the fundamental problem is what I perceive as a
                  fundamental design flaw in Python's C API. In Java JNI, each
                  function takes a JNIEnv* pointer as its first argument. There is
                  nothing that prevents you from embedding several JVMs in a process.
                  Python can create embedded subinterpreters, but it works
                  differently. It swaps subinterpreters like a finite state machine:
                  only one is concurrently active, and the GIL is shared. The
                  approach is fine, except it kills free threading of
                  subinterpreters. The argument seems to be that Apache's mod_python
                  somehow depends on it (for reasons I don't understand).


                  • Andy O'Meara

                    #24
                    Re: 2.6, 3.0, and truly independent interpreters

                    On Oct 24, 2:12 am, greg <g...@cosc.canterbury.ac.nz> wrote:
                    Andy wrote:
                    1) Independent interpreters (this is the easier one--and solved, in
                    principle anyway, by PEP 3121, by Martin v. Löwis)
                    >
                    Something like that is necessary for independent interpreters,
                    but not sufficient. There are also all the built-in constants
                    and type objects to consider. Most of these are statically
                    allocated at the moment.
                    >
                    Agreed--I was just trying to speak generally. Or, put another way,
                    there's no hope for independent interpreters without the likes of PEP
                    3121. Also, as Martin pointed out, there's the issue of module
                    cleanup, which some guys here may underestimate (and I'm glad Martin pointed
                    out the importance of it). Without the module cleanup, every time a
                    dynamic library using python loads and unloads you've got leaks. This
                    issue is a real problem for us since our software is loaded and
                    unloaded many many times in a host app (iTunes, WMP, etc). I hadn't
                    raised it here yet (and I don't want to turn the discussion to this),
                    but lack of multiple load and unload support has been another painful
                    issue that we didn't expect to encounter when we went with python.

                    2) Barriers to "free threading".  As Jesse describes, this is simply
                    just the GIL being in place, but of course it's there for a reason.
                    It's there because (1) doesn't hold and there was never any specs/
                    guidance put forward about what should and shouldn't be done in multi-
                    threaded apps
                    >
                    No, it's there because it's necessary for acceptable performance
                    when multiple threads are running in one interpreter. Independent
                    interpreters wouldn't mean the absence of a GIL; it would only
                    mean each interpreter having its own GIL.
                    >
                    I see what you're saying, but let's note that what you're talking
                    about at this point is an interpreter that protects against
                    client-level code violating the (supposed) direction put forth in
                    python's multithreading guidelines. Glenn Linderman's post really gets at
                    what's at hand here. It's really important to consider that it's not
                    a given that python (or any framework) has to be designed against
                    hazardous use. Again, I refer you to the diagrams and guidelines in
                    the QuickTime API:



                    They tell you point-blank what you can and can't do, and it's that
                    simple. Their engineers can then simply create the implementation
                    around those specs and not weigh any of the implementation down with
                    sync mechanisms. I'm in the camp that simplicity and convention wins
                    the day when it comes to an API. It's safe to say that software
                    engineers expect and assume that a thread that doesn't have contact
                    with other threads (except for explicit, controlled message/object
                    passing) will run unhindered and safely, so I raise an eyebrow at the
                    GIL (or any internal "helper" sync stuff) holding up a thread's
                    performance when the app is designed to not need lower-level global
                    locks.

                    Anyway, let's talk about solutions. My company is looking to support
                    a python dev community endeavor that allows the following:

                    - an app makes N worker threads (using the OS)

                    - each worker thread makes its own interpreter, pops scripts off a
                    work queue, and manages exporting (and then importing) result data to
                    other parts of the app. Generally, we're talking about CPU-bound work
                    here.

                    - each interpreter has the essentials (e.g. math support, string
                    support, re support, and so on -- I realize this is open-ended, but
                    work with me here).
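
                    Since stock CPython can't do this with threads today, the
                    closest runnable approximation of the model above uses
                    processes (my sketch; the work-queue protocol is invented
                    for illustration):

                        import multiprocessing

                        def interpreter_worker(work_queue, result_queue):
                            # Stand-in for "worker thread with its own
                            # interpreter": here, a separate process with
                            # its own interpreter and GIL.
                            env = {}
                            for script in iter(work_queue.get, None):
                                exec script in env      # run popped script
                                result_queue.put(env.get('result'))

                        if __name__ == '__main__':
                            work = multiprocessing.Queue()
                            results = multiprocessing.Queue()
                            workers = [multiprocessing.Process(
                                           target=interpreter_worker,
                                           args=(work, results))
                                       for _ in range(4)]
                            for w in workers:
                                w.start()
                            for n in range(8):
                                work.put("import math\nresult = math.factorial(%d)" % n)
                            for _ in range(8):
                                print results.get()     # order not guaranteed
                            for _ in workers:
                                work.put(None)          # sentinel: shut down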

                    Let's guesstimate about what kind of work we're talking about here and
                    if this is even in the realm of possibility. If we find that it *is*
                    possible, let's figure out what level of work we're talking about.
                    From there, I can get serious about writing up a PEP/spec, paid
                    support, and so on.

                    Regards,
                    Andy


                    • Patrick Stinson

                      #25
                      Re: 2.6, 3.0, and truly independent interpreters

                      I'm not finished reading the whole thread yet, but I've got some
                      things below to respond to this post with.

                      On Thu, Oct 23, 2008 at 9:30 AM, Glenn Linderman <v+python@g.nevcal.com> wrote:
                      On approximately 10/23/2008 12:24 AM, came the following characters from the
                      keyboard of Christian Heimes:
                      >>
                      >Andy wrote:
                      >>>
                      >>2) Barriers to "free threading". As Jesse describes, this is simply
                      >>just the GIL being in place, but of course it's there for a reason.
                      >>It's there because (1) doesn't hold and there was never any specs/
                      >>guidance put forward about what should and shouldn't be done in multi-
                      >>threaded apps (see my QuickTime API example). Perhaps if we could go
                      >>back in time, we would not put the GIL in place, strict guidelines
                      >>regarding multithreaded use would have been established, and PEP 3121
                      >>would have been mandatory for C modules. Then again--screw that, if I
                      >>could go back in time, I'd just go for the lottery tickets!! :^)
                      >
                      >
                      I've been following this discussion with interest, as it certainly seems
                      that multi-core/multi-CPU machines are the coming thing, and many
                      applications will need to figure out how to use them effectively.
                      >
                      >I'm very - not absolute, but very - sure that Guido and the initial
                      >designers of Python would have added the GIL anyway. The GIL makes Python
                      >faster on single core machines and more stable on multi core machines. Other
                      >language designers think the same way. Ruby recently got a GIL. The article
                      >http://www.infoq.com/news/2007/05/ru...eading-futures explains the
                      >rationales for a GIL in Ruby. The article also holds a quote from Guido
                      >about threading in general.
                      >>
                      >Several people inside and outside the Python community think that threads
                      >are dangerous and don't scale. The paper
                      >http://www.eecs.berkeley.edu/Pubs/Te...ECS-2006-1.pdf sums it up
                      >nicely. It explains why modern processors are going to cause more and more
                      >trouble with the Java approach to threads, too.
                      >
                      Reading this PDF paper is extremely interesting (albeit somewhat dependent
                      on understanding abstract theories of computation; I have enough math
                      background to follow it, sort of, and most of the text can be read even
                      without fully understanding the theoretical abstractions).
                      >
                      I have already heard people talking about "Java applications are buggy". I
                      don't believe that general sequential programs written in Java are any
                      buggier than programs written in other languages... so I had interpreted
                      that to mean (based on some inquiry) that complex, multi-threaded Java
                      applications are buggy. And while I also don't believe that complex,
                      multi-threaded programs written in Java are any buggier than complex,
                      multi-threaded programs written in other languages, it does seem to be true
                      that Java is one of the currently popular languages in which to write
                      complex, multi-threaded programs, because of its language support for
                      threads and concurrency primitives. These reports were from people that are
                      not programmers, but are field IT people, that have bought and/or support
                      software and/or hardware with drivers, that are written in Java, and seem to
                      have non-ideal behavior, (apparently only) curable by stopping/restarting
                      the application or driver, or sometimes requiring a reboot.
                      >
                      The paper explains many traps that lead to complex, multi-threaded programs
                      being buggy, and being hard to test. I have worked with parallel machines,
                      applications, and databases for 25 years, and can appreciate the succinct
                      expression of the problems explained within the paper, and can, from
                      experience, agree with its premises and conclusions. Parallel applications
                      have only been commercial successes when the parallelism is tightly
                      constrained to well-controlled patterns that could be easily understood.
                      Threads, especially in "cooperation" with languages that use memory
                      pointers, have the potential to get out of control, in inexplicable ways.
                      >
                      >
                      >Python *must* gain means of concurrent execution of CPU bound code
                      >eventually to survive on the market. But it must get the right means or we
                      >are going to suffer the consequences.
                      >
                      This statement, after reading the paper, seems somewhat in line with the
                      author's premise that language acceptability requires that a language be
                      self-contained/monolithic, and potentially sufficient to implement itself.
                      That seems to also be one of the reasons that Java is used today for
                      threaded applications. It does seem to be true, given current hardware
                      trends, that _some mechanism_ must be provided to obtain the benefit of
                      multiple cores/CPUs to a single application, and that Python must either
                      implement or interface to that mechanism to continue to be a viable language
                      for large scale application development.
                      >
                      Andy seems to want an implementation of independent Python processes
                      implemented as threads within a single address space, that can be
                      coordinated by an outer application. This actually corresponds to the model
                      promulgated in the paper as being most likely to succeed. Of course, it
                      maps nicely into a model using separate processes, coordinated by an outer
                      process, also. The differences seem to be:
                      >
                      1) Most applications are historically perceived as corresponding to single
                      processes. Language features for multi-processing are rare, and such
                      languages are not in common use.
                      >
                      2) A single address space can be convenient for the coordinating outer
                      application. It does seem simpler and more efficient to simply "copy" data
                      from one memory location to another, rather than send it in a message,
                      especially if the data are large. On the other hand, coordination of memory
                      access between multiple cores/CPUs effectively causes memory copies from one
                      cache to the other, and if memory is accessed from multiple cores/CPUs
                      regularly, the underlying hardware implements additional synchronization and
                      copying of data, potentially each time the memory is accessed. Being forced
                      to do message passing of data between processes can actually be more
                      efficient than access to shared memory at times. I should note that in my
                      25 years of parallel development, all the systems created used a message
                      passing paradigm, partly because the multiple CPUs often didn't share the
                      same memory chips, much less the same address space, and that a key feature
                      of all the successful systems of that nature was an efficient inter-CPU
                      message passing mechanism. I should also note that Herb Sutter has a recent
                      series of columns in Dr Dobbs regarding multi-core/multi-CPU parallelism and
                      a variety of implementation pitfalls, that I found to be very interesting
                      reading.
                      >
                      I have noted the multiprocessing module that is new to Python 2.6/3.0 being
                      feverishly backported to Python 2.5, 2.4, etc... indicating that people
                      truly find the model/module useful... seems that this is one way, in Python
                      rather than outside of it, to implement the model Andy is looking for,
                      although I haven't delved into the details of that module yet, myself. I
                      suspect that a non-Python application could load one embedded Python
                      interpreter, and then indirectly use the multiprocessing module to control
                      other Python interpreters in other processes. I don't know that
                      multithreading primitives such as described in the paper are available in
                      the multiprocessing module, but perhaps they can be implemented in some
                      manner using the tools that are provided; in any case, some interprocess
                      communication primitives are provided via this new Python module.
                      >
                      There could be opportunity to enhance Python with process creation and
                      process coordination operations, rather than have it depend on
                      easy-to-implement-incorrectly coordination patterns or
                      easy-to-use-improperly libraries/modules of multiprocessing primitives (this
                      is not a slam of the new multiprocessing module, which appears to be filling
                      a present need in rather conventional ways, but just to point out that ideas
                      promulgated by the paper, which I suspect 2 years later are still research
                      topics, may be a better abstraction than the conventional mechanisms).
                      >
                      One thing Andy hasn't yet explained (or I missed) is why any of his
                      application is coded in a language other than Python. I can think of a
                      number of possibilities:
                      >
                      A) (Historical) It existed, then the desire for extensions was seen, and
                      Python was seen as a good extension language.
                      >
                      B) Python is inappropriate (performance?) for some of the algorithms (but
                      should they be coded instead as Python extensions, with the core application
                      being in Python?)
                      >
                      C) Unavailability of Python wrappers for particularly useful 3rd-party
                      libraries
                      >
                      D) Other?
                      We develop virtual instrument plugins for music production using
                      AudioUnit, VST, and RTAS on Windows and OS X. While our dsp engine's
                      code has to be written in C/C++ for performance reasons, the gui could
                      have been written in python. But we didn't, because:

                      1) Our project lead didn't know python, and the project began with
                      little time for him to learn it.
                      2) All of our third-party libs (for dsp, plugin-wrappers, etc) are
                      written in C++, so it would be far easier to write and debug our app if
                      written in the same language. Could I do it now? Yes. Could we do it
                      then? No.

                      ** Additionally **, we would have run into this problem, which is very
                      appropriate to this thread:

                      3) Adding python as an audio scripting language in the audio thread
                      would have caused concurrency issues if our GUI had been written in
                      python, since audio threads are not allowed to make blocking calls
                      (f.ex. acquiring the GIL).

                      OK, I'll continue reading the thread now :)
                      >
                      --
                      Glenn -- http://nevcal.com/
                      ===========================
                      A protocol is complete when there is nothing left to remove.
                      -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

                      • Andy O'Meara

                        #26
                        Re: 2.6, 3.0, and truly independent interpreters


                        Glenn, great post and points!
                        >
                        Andy seems to want an implementation of independent Python processes
                        implemented as threads within a single address space, that can be
                        coordinated by an outer application.  This actually corresponds to the
                        model promulgated in the paper as being most likely to succeed.
                        Yeah, that's the idea--let the highest levels run and coordinate the
                        show.
                        >
                        It does seem simpler and more efficient to simply "copy"
                        data from one memory location to another, rather than send it in a
                        message, especially if the data are large.
                        That's the rub... In our case, we're doing image and video
                        manipulation--stuff that's not good to be messaging from address
                        space to address space. The same argument holds for numerical
                        processing with large data sets. The workers handing back huge
                        data sets via messaging isn't very attractive.
                        One thing Andy hasn't yet explained (or I missed) is why any of his
                        application is coded in a language other than Python.  
                        Our software runs in real time (so performance is paramount),
                        interacts with other static libraries, depends on worker threads to
                        perform real-time image manipulation, and leverages Windows and Mac OS
                        API concepts and features. Python's performance hits have generally
                        been a huge challenge with our animators because they often have to go
                        back and massage their python code to improve execution performance.
                        So, in short, there are many reasons why we use python as a part
                        rather than a whole.

                        The other area of pain that I mentioned in one of my other posts is
                        that what we ship, above all, can't be flaky. The lack of module
                        cleanup (intended to be addressed by PEP 3121), using a duplicate copy
                        of the python dynamic lib, and namespace black magic to achieve
                        independent interpreters are all examples that have made using python
                        for us much more challenging and time-consuming than we ever
                        anticipated.

                        Again, if it turns out nothing can be done about our needs (which
                        appears to be more and more like the case), I think it's important for
                        everyone here to consider the points raised here in the last week.
                        Moreover, realize that the python dev community really stands to gain
                        from making python usable as a tool (rather than a monolith). This
                        fact alone has caused lua to *rapidly* rise in popularity with
                        software companies looking to embed a powerful, lightweight
                        interpreter in their software.

                        As a python language fan and enthusiast, don't let lua win! (I say
                        this endearingly of course--I have the utmost respect for both
                        communities and I only want to see CPython be an attractive pick when
                        a company is looking to embed a language that won't intrude upon their
                        app's design).


                        Andy


                        • Patrick Stinson

                          #27
                          Re: 2.6, 3.0, and truly independent interpreters

                          We are in the same position as Andy here.

                          I think that something that would help people like us produce
                          something in code form is a collection of information outlining the
                          problem and suggested solutions, appropriate parts of the CPython's
                          current threading API, and pros and cons of the many various proposed
                          solutions to the different levels of the problem. The most valuable
                          information I've found is contained in the many (lengthy!) discussions
                          like this one, a few related PEPs, and the CPython docs, but has
                          anyone condensed the state of the problem into a wiki or something
                          similar? Maybe we should start one?

                          For example, Guido's post here
                          http://www.artima.com/weblogs/viewpo...14235 describes some
                          possible solutions to the problem, like interpreter-specific locks, or
                          fine-grained object locks, and he also mentions the primary
                          requirement of not harming the performance of single-threaded
                          apps. As I understand it, that requirement does not rule out new build
                          configurations that provide some level of concurrency, as long as you
                          can still compile python so as to perform as well on single-threaded
                          apps.

                          To add to the heap of use cases, the most important thing to us is to
                          simply have the python language and the sip/PyQt modules available to
                          us. All we wanted to do was embed the interpreter and language core as
                          a local scripting engine, so had we patched python to provide
                          concurrent execution, we wouldn't have cared about all of the other
                          unsupported extension modules since our scripts are quite
                          application-specific.

                          It seems to me that the very simplest move would be to remove global
                          static data so the app could provide all thread-related data, which
                          Andy suggests through references to the QuickTime API. This would
                          suggest compiling python without thread support so as to leave it up
                          to the application.

                          Anyway, I'm having fun reading all of these papers and news postings,
                          but it's true that code talks, and it could be a little easier if the
                          state of the problem was condensed. This could be an intense and fun
                          project, but frankly it's a little tough to keep it all in my head. Is
                          there a wiki or something out there or should we start one, or do I
                          just need to read more code?

                          On Fri, Oct 24, 2008 at 6:40 AM, Andy O'Meara <andy55@gmail.com> wrote:
                          On Oct 24, 2:12 am, greg <g...@cosc.canterbury.ac.nz> wrote:
                          >Andy wrote:
                          1) Independent interpreters (this is the easier one--and solved, in
                          principle anyway, by PEP 3121, by Martin v. Löwis)
                          >>
                          >Something like that is necessary for independent interpreters,
                          >but not sufficient. There are also all the built-in constants
                          >and type objects to consider. Most of these are statically
                          >allocated at the moment.
                          >>
                          >
                          Agreed--I was just trying to speak generally. Or, put another way,
                          there's no hope for independent interpreters without the likes of PEP
                          3121. Also, as Martin pointed out, there's the issue of module
                          cleanup, which some guys here may underestimate (and I'm glad Martin pointed
                          out the importance of it). Without the module cleanup, every time a
                          dynamic library using python loads and unloads you've got leaks. This
                          issue is a real problem for us since our software is loaded and
                          unloaded many many times in a host app (iTunes, WMP, etc). I hadn't
                          raised it here yet (and I don't want to turn the discussion to this),
                          but lack of multiple load and unload support has been another painful
                          issue that we didn't expect to encounter when we went with python.
                          >
                          >
                          2) Barriers to "free threading". As Jesse describes, this is simply
                          just the GIL being in place, but of course it's there for a reason.
                          It's there because (1) doesn't hold and there was never any specs/
                          guidance put forward about what should and shouldn't be done in multi-
                          threaded apps
                          >>
                          >No, it's there because it's necessary for acceptable performance
                          >when multiple threads are running in one interpreter. Independent
                          >interpreters wouldn't mean the absence of a GIL; it would only
                          >mean each interpreter having its own GIL.
                          >>
                          >
                          I see what you're saying, but let's note that what you're talking
                          about at this point is an interpreter that protects against
                          client-level code violating the (supposed) direction put forth in
                          python's multithreading guidelines. Glenn Linderman's post really gets at
                          what's at hand here. It's really important to consider that it's not
                          a given that python (or any framework) has to be designed against
                          hazardous use. Again, I refer you to the diagrams and guidelines in
                          the QuickTime API:
                          >

                          >
                          They tell you point-blank what you can and can't do, and it's that
                          simple. Their engineers can then simply create the implementation
                          around those specs and not weigh any of the implementation down with
                          sync mechanisms. I'm in the camp that simplicity and convention wins
                          the day when it comes to an API. It's safe to say that software
                          engineers expect and assume that a thread that doesn't have contact
                          with other threads (except for explicit, controlled message/object
                          passing) will run unhindered and safely, so I raise an eyebrow at the
                          GIL (or any internal "helper" sync stuff) holding up a thread's
                          performance when the app is designed to not need lower-level global
                          locks.
                          >
                          Anyway, let's talk about solutions. My company is looking to support
                          a python dev community endeavor that allows the following:
                          >
                          - an app makes N worker threads (using the OS)
                          >
                          - each worker thread makes its own interpreter, pops scripts off a
                          work queue, and manages exporting (and then importing) result data to
                          other parts of the app. Generally, we're talking about CPU-bound work
                          here.
                          >
                          - each interpreter has the essentials (e.g. math support, string
                          support, re support, and so on -- I realize this is open-ended, but
                          work with me here).
                          >
                          Let's guesstimate about what kind of work we're talking about here and
                          if this is even in the realm of possibility. If we find that it *is*
                          possible, let's figure out what level of work we're talking about.
                          From there, I can get serious about writing up a PEP/spec, paid
                          support, and so on.
                          >
                          Regards,
                          Andy

                          • Patrick Stinson

                            #28
                            Re: 2.6, 3.0, and truly independent interpreters

                            As a side note to the performance question, we are executing python
                            code in an audio thread that is used in all of the top-end music
                            production environments. We have found the language to perform
                            extremely well when executed at control-rate frequency, meaning we
                            aren't doing DSP computations, just responding to less-frequent events
                            like user input and MIDI messages.

                            So we are sitting on this music platform with unimaginable
                            possibilities in the music world (in which python does not play a
                            role), but those little CPU spikes caused by the GIL at low latencies
                            won't let us have it. AFAIK, there is no music scripting language out
                            there that would come close, and yet we are sooooo close! This is a
                            big deal.

                            On Fri, Oct 24, 2008 at 7:42 AM, Andy O'Meara <andy55@gmail.com> wrote:
                            >
                            Glenn, great post and points!
                            >
                            >>
                            >Andy seems to want an implementation of independent Python processes
                            >implemented as threads within a single address space, that can be
                            >coordinated by an outer application. This actually corresponds to the
                            >model promulgated in the paper as being most likely to succeed.
                            >
                            Yeah, that's the idea--let the highest levels run and coordinate the
                            show.
                            >
                            >>
                            >It does seem simpler and more efficient to simply "copy"
                            >data from one memory location to another, rather than send it in a
                            >message, especially if the data are large.
                            >
                             That's the rub... In our case, we're doing image and video
                             manipulation--stuff that's poorly suited to being messaged from
                             address space to address space. The same argument holds for
                             numerical processing with large data sets. Having the workers hand
                             back huge data sets via messaging isn't very attractive.
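                             >
                             (A minimal sketch of the asymmetry, assuming a hypothetical Frame
                             type: in one address space a frame changes hands as a pointer,
                             while between processes the entire buffer must be copied, e.g.
                             through a pipe:)
                             >
                                 #include <stddef.h>
                                 #include <unistd.h>

                                 typedef struct { unsigned char *pixels; size_t len; } Frame;  /* hypothetical */

                                 /* same address space: hand-off is one pointer write, O(1) */
                                 static Frame *g_inbox;             /* guarded by app-level locking */
                                 static void submit_shared(Frame *f) { g_inbox = f; }

                                 /* separate processes: the whole frame is copied through the
                                    pipe, O(len) -- for video-sized frames this copy can
                                    dominate the work itself */
                                 static void submit_ipc(int pipe_fd, const Frame *f)
                                 {
                                     write(pipe_fd, f->pixels, f->len);
                                 }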
                            >
                            >One thing Andy hasn't yet explained (or I missed) is why any of his
                            >application is coded in a language other than Python.
                            >
                            Our software runs in real time (so performance is paramount),
                            interacts with other static libraries, depends on worker threads to
                            perform real-time image manipulation, and leverages Windows and Mac OS
                            API concepts and features. Python's performance hits have generally
                            been a huge challenge with our animators because they often have to go
                            back and massage their python code to improve execution performance.
                            So, in short, there are many reasons why we use python as a part
                            rather than a whole.
                            >
                             The other area of pain that I mentioned in one of my other posts is
                             that what we ship, above all, can't be flaky. The lack of module
                             cleanup (intended to be addressed by PEP 3121), using a duplicate
                             copy of the python dynamic lib, and namespace black magic to
                             achieve independent interpreters are all examples of what has made
                             using python much more challenging and time-consuming for us than
                             we ever anticipated.
                            >
                             Again, if it turns out nothing can be done about our needs (which
                             appears to be more and more the case), I think it's important for
                             everyone here to consider the points raised over the last week.
                             Moreover, realize that the python dev community really stands to
                             gain from making python usable as a tool (rather than a monolith).
                             That appeal alone has caused lua to *rapidly* rise in popularity
                             with software companies looking to embed a powerful, lightweight
                             interpreter in their software.
                            >
                             As a python language fan and enthusiast, don't let lua win! (I say
                             this endearingly of course--I have the utmost respect for both
                             communities and I only want to see CPython be an attractive pick
                             when a company is looking to embed a language that won't intrude
                             upon their app's design.)
                            >
                            >
                            Andy

                            Comment

                            • Terry Reedy

                              #29
                              Re: 2.6, 3.0, and truly independent intepreters

                              Stefan Behnel wrote:
                              Terry Reedy wrote:
                              >Everything in DLLs is compiled C extensions. I see about 15 for Windows
                              >3.0.
                              >
                               Ah, weren't those wonderful times back in the days of Win3.0, when DLL-hell was
                              inhabited by only 15 libraries? *sigh*
                              >
                              ... although ... wait, didn't Win3.0 have more than that already? Maybe you
                              meant Windows 1.0?
                              >
                              SCNR-ly,
                               Is that the equivalent of a smiley? Or did you really not
                               understand what I wrote?

                              Comment

                              • Andy O'Meara

                                #30
                                Re: 2.6, 3.0, and truly independent intepreters


                                >
                                 The Global Interpreter Lock is fundamentally designed to make the
                                 interpreter easier to maintain and safer: developers do not need
                                 to worry about other code stepping on their namespace. This makes
                                 things thread-safe, inasmuch as having multiple PThreads within
                                 the same interpreter space modifying global state and variables
                                 at once is, well, bad. A c-level module, on the other hand, can
                                 sidestep/release the GIL at will, and go on its merry way and
                                 process away.
                                 ....Unless part of the C module execution involves the need to do
                                 CPU-bound work on another thread through a different python
                                 interpreter, right? (even if the interpreter is 100% independent,
                                 yikes). For example, have a python C module designed to
                                 programmatically generate images (and video frames) in RAM for
                                 immediate and subsequent use in animation. Meanwhile, we'd like
                                 to have a pthread with its own interpreter with an instance of
                                 this module and have it dequeue jobs as they come in (in fact,
                                 there'd be one of these threads for each excess core present on
                                 the machine). As far as I can tell, it seems CPython's current
                                 state can't do CPU-bound parallelization in the same address
                                 space (basically, it seems that we're talking about the
                                 "embarrassingly parallel" scenario raised in that paper). Why
                                 does it have to be in the same address space? Convenience and
                                 simplicity--the same reasons that most APIs let you hang yourself
                                 if the app does dumb things with threads. Also, when the data
                                 sets that you need to send to and from each process are large,
                                 using the same address space makes more and more sense.
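
                                 (For reference, the C-level "release the GIL and process away"
                                 pattern quoted above looks roughly like this -- crunch() is a
                                 stand-in for any pure-C, CPU-bound work; the macros are real
                                 CPython API:)

                                     #include <Python.h>

                                     static void crunch(void) { /* heavy pure-C work, no python calls */ }

                                     /* extension function releasing the GIL around CPU-bound work */
                                     static PyObject *process(PyObject *self, PyObject *args)
                                     {
                                         Py_BEGIN_ALLOW_THREADS   /* other python threads may now run */
                                         crunch();                /* must not touch python objects here */
                                         Py_END_ALLOW_THREADS     /* re-acquire the GIL before returning */
                                         Py_RETURN_NONE;
                                     }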

                                 So, just to clarify - Andy, do you want one interpreter, $N threads
                                 (e.g. PThreads) or the ability to fork multiple "heavyweight"
                                 processes?
                                 Sorry if I haven't been clear, but we're talking about the app
                                 starting a pthread, making a fresh/clean/independent interpreter,
                                 and then being responsible for its safety at the highest level
                                 (with the payoff of each of these threads executing without
                                 hindrance). No different than if you used most APIs out there
                                 where step 1 is always to make and init a context object and the
                                 final step is always to destroy/take-down that context object.
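
                                 (Incidentally, the closest thing CPython offers today is the
                                 sub-interpreter API, sketched below with real calls -- but all
                                 sub-interpreters still share the one GIL, which is exactly the
                                 limitation under discussion; the job_loop module name is
                                 hypothetical:)

                                     #include <Python.h>
                                     #include <pthread.h>

                                     /* Per-thread sub-interpreter with the context-object shape
                                        described above. Assumes the host already called
                                        Py_Initialize() and PyEval_InitThreads(). Caveat: all
                                        sub-interpreters share one GIL, so this does NOT deliver
                                        free-running independence. */
                                     static void *worker(void *arg)
                                     {
                                         PyEval_AcquireLock();                     /* take the global lock */
                                         PyThreadState *ts = Py_NewInterpreter();  /* step 1: make a context */
                                         PyRun_SimpleString("import job_loop");    /* hypothetical workload */
                                         Py_EndInterpreter(ts);                    /* final step: tear down */
                                         PyEval_ReleaseLock();
                                         return NULL;
                                     }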

                                 I'm a lousy writer sometimes, but I feel bad if you took the time
                                 to describe threads vs processes. The only reason I raised IPC
                                 with my "messaging isn't very attractive" comment was to respond
                                 to Glenn Linderman's points regarding the tradeoffs of shared
                                 memory vs none.


                                Andy



                                Comment
