2.6, 3.0, and truly independent interpreters

This topic is closed.
  • sturlamolden

    Re: 2.6, 3.0, and truly independent interpreters

    On Nov 4, 6:51 pm, Paul Boddie <p...@boddie.org.uk> wrote:
    The language features look a lot like what others have already been
    offering for a while: keywords for parallelised constructs (cilk_for)
    which are employed by solutions for various languages (C# and various
    C++ libraries spring immediately to mind); spawning and synchronisation
    are typically supported in existing Python solutions, although
    obviously not using language keywords.
    Yes, but there is not a 'concurrency platform' that takes care of
    things like load balancing and testing for race conditions. If you
    spawn with cilk++, the result is not that a new process or thread is
    spawned. The task is put in a queue (scheduled using work stealing),
    and executed by a pool of threads/processes. Multiprocessing makes
    it easy to write concurrent algorithms (as opposed to subprocess or
    popen), but automatic load balancing is something it does not do. It
    also does not identify and warn the programmer about race conditions.
    It does not have a barrier synchronization primitive, but one can be
    constructed.

    java.util.concurrent.forkjoin is actually based on cilk.

    Something like cilk can easily be built on top of the multiprocessing
    module. Extra keywords can and should be avoided. But it is easier in
    Python than C. Keywords are used in cilk++ because they can be defined
    out by the preprocessor, thus restoring the original sequential code.
    In Python we can e.g. use a decorator instead.
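
    A minimal sketch of what such a decorator could look like, built on
    multiprocessing.Pool; the `spawn` name and the example function are
    hypothetical, not an existing API:

```python
from multiprocessing import Pool

def spawn(pool):
    """Hypothetical cilk-style decorator factory: a call to the
    wrapped function is queued on the pool (which load-balances
    across its workers) and returns an AsyncResult handle instead
    of running inline."""
    def decorate(func):
        def wrapper(*args, **kwargs):
            return pool.apply_async(func, args, kwargs)
        return wrapper
    return decorate

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(4)
    square_async = spawn(pool)(square)
    handles = [square_async(i) for i in range(10)]   # "spawn"
    results = [h.get() for h in handles]             # "sync"
    print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

    Dropping the decorator restores the plain sequential call, which
    mirrors how the cilk++ keywords can be preprocessed away.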



    • Walter Overby

      Re: 2.6, 3.0, and truly independent interpreters

      Hi,

      I've been following this discussion, and although I'm not nearly the
      Python expert that others on this thread are, I think I understand
      Andy's point of view. His premises seem to include at least:

      1. His Python code does not control the creation of the threads. That
      is done "at the app level".
      2. Perhaps more importantly, his Python code does not control the
      allocation of the data he needs to operate on. He's got, for example,
      "an opaque OS object" that is manipulated by CPU-intensive OS
      functions.

      sturlamolden suggests a few approaches:
      1. Check if a NumPy record array may suffice (dtypes may be nested).
      It will if you don't have dynamically allocated pointers inside the
      data structure.
      I suspect that the OS is very likely to have dynamically allocated
      pointers inside their opaque structures.
      2. Consider using multiprocessing's proxy objects or outproc ActiveX
      objects.
      I don't understand how this would help. If these large data
      structures reside only in one remote process, then the overhead of
      proxying the data into another process for manipulation requires too
      much IPC, or at least so Andy stipulates.
      3. Go to http://pyro.sourceforge.net, download the code and read the
      documentation.
      I don't see how this solves the problem with 2. I admit I have only
      cursory knowledge, but I understand "remoting" approaches to have the
      same weakness.
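
      For reference, suggestion 1's nested dtypes look roughly like this
      (a sketch with made-up field names; whether it applies depends on
      the actual data):

```python
import numpy as np

# A nested record dtype: each element holds a sub-struct and a
# scalar.  There are no Python object pointers inside, so the
# whole array is one flat buffer that could sit in shared memory.
point = np.dtype([('x', np.float64), ('y', np.float64)])
particle = np.dtype([('pos', point), ('weight', np.float64)])

a = np.zeros(3, dtype=particle)
a['pos']['x'] = [1.0, 2.0, 3.0]   # write through the nested field
a['weight'][0] = 0.5
print(a[0])
```

      As soon as the records hold pointers to separately allocated
      memory, as opaque OS objects typically do, this flat layout no
      longer works, which is the objection above.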

      I understand Andy's problem to be that he needs to operate on a large
      amount of in-process data from several threads, and each thread mixes
      CPU-intensive C functions with callbacks to Python utility functions.
      He contends that, even though he releases the GIL in the CPU-bound C
      functions, the reacquisition of the GIL for the utility functions
      causes unacceptable contention slowdowns in the current implementation
      of CPython.

      After reading Martin's posts, I think I also understand his point of
      view. Is the time spent in these Python callbacks so large compared
      to the C functions that you really have to wait? If so, then Andy has
      crossed over into writing performance-critical code in Python. Andy
      proposes that the Python community could work on making that possible,
      but Martin cautions that it may be very hard to do so.

      If I understand them correctly, none of these concerns are silly.

      Walter.


      • sturlamolden

        Re: 2.6, 3.0, and truly independent interpreters

        On Nov 6, 6:05 pm, Walter Overby <walter.ove...@gmail.com> wrote:
        I don't understand how this would help. If these large data
        structures reside only in one remote process, then the overhead of
        proxying the data into another process for manipulation requires too
        much IPC, or at least so Andy stipulates.
        Perhaps it will, or perhaps not. Reading or writing to a pipe has
        slightly more overhead than a memcpy. There are things that Python
        needs to do that are slower than the IPC. In this case, the real
        constraint would probably be contention for the object in the server,
        not the IPC. (And don't blame it on the GIL, because putting a lock
        around the object would not be any better.)

        3. Go to http://pyro.sourceforge.net, download the code and read the
        documentation.
        >
        I don't see how this solves the problem with 2.
        It puts Python objects in shared memory. Shared memory is the fastest
        form of IPC there is. The overhead is basically zero. The only
        constraint will be contention for the object.

        I understand Andy's problem to be that he needs to operate on a large
        amount of in-process data from several threads, and each thread mixes
        CPU-intensive C functions with callbacks to Python utility functions.
        He contends that, even though he releases the GIL in the CPU-bound C
        functions, the reacquisition of the GIL for the utility functions
        causes unacceptable contention slowdowns in the current implementation
        of CPython.
        Yes, callbacks to Python are expensive. But is the problem the GIL?
        Instead of contention for the GIL, he seems to prefer contention for a
        complex object. Is that any better? It too has to be protected by a
        lock.

        If I understand them correctly, none of these concerns are silly.
        No, they are not. But I think he underestimates what multiple processes
        can do. The objects in 'multiprocessing' are already a lot faster than
        their 'threading' and 'Queue' counterparts.




        • Walter Overby

          Re: 2.6, 3.0, and truly independent interpreters

          On Nov 6, 2:03 pm, sturlamolden <sturlamol...@yahoo.no> wrote:
          On Nov 6, 6:05 pm, Walter Overby <walter.ove...@gmail.com> wrote:
          >
          I don't understand how this would help.  If these large data
          structures reside only in one remote process, then the overhead of
          proxying the data into another process for manipulation requires too
          much IPC, or at least so Andy stipulates.
          >
          Perhaps it will, or perhaps not. Reading or writing to a pipe has
          slightly more overhead than a memcpy. There are things that Python
          needs to do that are slower than the IPC. In this case, the real
          constraint would probably be contention for the object in the server,
          not the IPC. (And don't blame it on the GIL, because putting a lock
          around the object would not be any better.)
          (I'm not blaming anything on the GIL.)

          I read Andy to stipulate that the pipe needs to transmit "hundreds of
          megs of data and/or thousands of data structure instances." I doubt
          he'd be happy with memcpy either. My instinct is that contention for
          a lock could be the quicker option.

          And don't forget, he says he's got an "opaque OS object." He asked
          the group to explain how to send that via IPC to another process. I
          surely don't know how.
          3. Go to http://pyro.sourceforge.net, download the code and read the
          documentation.
          >
          I don't see how this solves the problem with 2.
          >
          It puts Python objects in shared memory. Shared memory is the fastest
          form of IPC there is. The overhead is basically zero. The only
          constraint will be contention for the object.
          I don't think he has Python objects to work with. I'm persuaded when
          he says: "when you're talking about large, intricate data structures
          (which include opaque OS object refs that use process-associated
          allocators), even a shared memory region between the child process and
          the parent can't do the job."

          Why aren't you persuaded?

          <snip>
          Yes, callbacks to Python are expensive. But is the problem the GIL?
          Instead of contention for the GIL, he seems to prefer contention for a
          complex object. Is that any better? It too has to be protected by a
          lock.
          At a couple points, Andy has expressed his preference for a "single
          high level sync object" to synchronize access to the data, at least
          that's my reading. What he doesn't seem to prefer is the slowdown
          arising from the Python callbacks acquiring the GIL. I think that
          would be an additional lock, and that's near the heart of Andy's
          concern, as I read him.
          If I understand them correctly, none of these concerns are silly.
          >
          No, they are not. But I think he underestimates what multiple processes
          can do. The objects in 'multiprocessing' are already a lot faster than
          their 'threading' and 'Queue' counterparts.
          Andy has complimented 'multiprocessing' as a "huge huge step." He
          just offers a scenario where multiprocessing might not be the best
          solution, and so far, I see no evidence he is wrong. That's not
          underestimation, in my estimation!

          Walter.


          • sturlamolden

            Re: 2.6, 3.0, and truly independent interpreters

            On Nov 7, 12:22 am, Walter Overby <walter.ove...@gmail.com> wrote:
            I read Andy to stipulate that the pipe needs to transmit "hundreds of
            megs of data and/or thousands of data structure instances."  I doubt
            he'd be happy with memcpy either.  My instinct is that contention for
            a lock could be the quicker option.
            If he needs to communicate that amount of data very often, he has a
            serious design problem.

            A pipe can transmit hundreds of megs in a split second by the way.
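
            That claim is easy to check with a rough sketch; timings vary
            by platform, so no figure is promised here:

```python
import time
from multiprocessing import Process, Pipe

def sender(conn, nbytes):
    # Build the payload in the child so that only the pipe
    # transfer is measured on the receiving side.
    conn.send_bytes(b'x' * nbytes)
    conn.close()

if __name__ == '__main__':
    parent, child = Pipe()
    p = Process(target=sender, args=(child, 100 * 1024 * 1024))
    p.start()
    t0 = time.time()
    data = parent.recv_bytes()    # 100 MB through the pipe
    elapsed = time.time() - t0
    p.join()
    print('%d MB in %.3f s' % (len(data) // 2**20, elapsed))
```

            Note that raw pipe bandwidth says nothing about the cost of
            serializing the opaque objects in the first place, which is
            Walter's objection.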

            And don't forget, he says he's got an "opaque OS object."  He asked
            the group to explain how to send that via IPC to another process.  I
            surely don't know how.
            This is a typical situation where one could use a proxy object. Let
            one server process own the opaque OS object, and multiple client
            processes access it via IPC calls to the server.
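
            A sketch of that server-process pattern using multiprocessing's
            BaseManager; the OpaqueWrapper class is a hypothetical stand-in
            for whatever wraps the real OS handle:

```python
from multiprocessing.managers import BaseManager

class OpaqueWrapper(object):
    """Hypothetical stand-in for a wrapper around an OS handle
    that must stay in the process that created it."""
    def __init__(self):
        self._state = 0
    def poke(self, n):
        self._state += n
        return self._state

class OpaqueManager(BaseManager):
    pass

# Each call to get_opaque() creates an instance inside the server.
OpaqueManager.register('get_opaque', callable=OpaqueWrapper)

if __name__ == '__main__':
    mgr = OpaqueManager()
    mgr.start()                # server process creates and owns it
    proxy = mgr.get_opaque()   # clients only ever see a proxy
    print(proxy.poke(3))       # 3 -- the method runs in the server
    print(proxy.poke(4))       # 7
    mgr.shutdown()
```

            Each proxy call is one IPC round trip, so this only pays off
            when the calls are coarse-grained.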

            I don't think he has Python objects to work with.  I'm persuaded when
            he says: "when you're talking about large, intricate data structures
            (which include opaque OS object refs that use process-associated
            allocators), even a shared memory region between the child process and
            the parent can't do the job."
            >
            Why aren't you persuaded?
            I am persuaded that shared memory may be difficult in that particular
            case. I am not persuaded that multiple processes cannot be used,
            because one can let one server process own the object.




            • Paul Boddie

              Re: 2.6, 3.0, and truly independent interpreters

              On 7 Nov, 03:02, sturlamolden <sturlamol...@yahoo.no> wrote:
              On Nov 7, 12:22 am, Walter Overby <walter.ove...@gmail.com> wrote:
              >
              I read Andy to stipulate that the pipe needs to transmit "hundreds of
              megs of data and/or thousands of data structure instances."  I doubt
              he'd be happy with memcpy either.  My instinct is that contention for
              a lock could be the quicker option.
              >
              If he needs to communicate that amount of data very often, he has a
              serious design problem.
              As far as I can tell, he wants to keep the data in one place and just
              pass a pointer around between execution contexts. The apparent issue
              with using shared memory segments for this is that he relies on
              existing components which have their own allocation preferences. So
              although you or I might choose shared memory if writing this stuff
              from scratch, he doesn't appear to have this option.

              The inquirer hasn't acknowledged my remarks about tinypy, but I know
              that if I were considering dropping $40000 and/or 2-3 man-months, I'd
              at least have a look at what those people have done and whether
              there's any mileage in using it before starting a new, embeddable
              implementation of Python from scratch.

              Paul


              • sturlamolden

                Re: 2.6, 3.0, and truly independent interpreters

                On Nov 7, 11:46 am, Paul Boddie <p...@boddie.org.uk> wrote:

                As far as I can tell, he wants to keep the data in one place and just
                pass a pointer around between execution contexts.
                This would be the easiest solution if Python were designed to do this
                from the beginning. I have previously stated that I believe the lack
                of a context pointer in Python's C API is a design flaw, albeit one
                that is difficult to change.

                If the alternative is to rewrite the whole CPython interpreter, I
                would say it is easier to try a proxy object design instead (either
                using multiprocessing or an outproc ActiveX object).






                • Andy O'Meara

                  Re: 2.6, 3.0, and truly independent interpreters

                  On Nov 5, 5:09 pm, Paul Boddie <p...@boddie.org.uk> wrote:

                  Anyway, to keep things constructive, I should ask (again) whether you
                  looked at tinypy [1] and whether that might possibly satisfy your
                  embedded requirements.
                  Actually, I'm starting to get into the tinypy codebase and have been
                  talking in detail with the leads for that project (I just branched it,
                  in fact). TP indeed has all the right ingredients for a CPython "ES"
                  API, so I'm currently working on a first draft. Interestingly, the TP
                  VM is largely based on Lua's implementation and stresses compactness.
                  One challenge is that its design may be overly compact, making it a
                  little tricky to extend and maintain (but I anticipate things will
                  improve as we rev it).

                  When I have a draft of this "CPythonES" API, I plan to post here for
                  everyone to look at and give feedback on. The only thing that sucks
                  is that I have a lot of other commitments right now, so I can't spend
                  the time on this that I'd like to. Once we have that API finalized,
                  I'll be able to start offering some bounties for filling in some of
                  its implementation. In any case, I look forward to updating folks
                  here on our progress!

                  Andy


                  • Andy O'Meara

                    Re: 2.6, 3.0, and truly independent interpreters

                    On Nov 6, 8:25 am, sturlamolden <sturlamol...@yahoo.no> wrote:
                    On Nov 5, 8:44 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                    >
                    In a few earlier posts, I went into detail about what's meant there:
                    >>
                    All this says is:
                    >
                    1. The cost of serialization and deserialization is too large.
                    2. Complex data structures cannot be placed in shared memory.
                    >
                    The first claim is unsubstantiated. It depends on how much and what
                    you serialize.
                    Right, but I'm telling you that it *is* substantial... Unfortunately,
                    you can't serialize thousands of opaque OS objects (which undoubtedly
                    contain sub-allocations and pointers) in a frame-based,
                    performance-centric app. Please consider that others (such as myself)
                    are not trying to be difficult here--turns out that we're actually
                    professionals. Again, I'm not the type to compare credentials, but it
                    would be nice if you considered that you aren't the final authority on
                    real-time professional software development.
                    >
                    The second claim is plain wrong. You can put anything you want in
                    shared memory. The mapping address of the shared memory segment may
                    vary, but it can be dealt with (basically use integers instead of
                    pointers, and use the base address as offset.)
                    I explained this in other posts: OS objects are opaque and their
                    serialization has to be done via their APIs, which is never marketed
                    as being fast *OR* cheap. I've gone into this many times and in many
                    posts.
                    Saying that "it can't be done" is silly before you have tried.
                    Your attitude and unwillingness to look at the use cases listed by
                    myself and others in this thread shows that this discussion may not
                    be a good use of your time. In any case, you haven't even
                    acknowledged that a package can't "wag the dog" when it comes to app
                    development--and that's the bottom line and root liability.


                    Andy





                    • Andy O'Meara

                      Re: 2.6, 3.0, and truly independent interpreters

                      On Nov 6, 9:02 pm, sturlamolden <sturlamol...@yahoo.no> wrote:
                      On Nov 7, 12:22 am, Walter Overby <walter.ove...@gmail.com> wrote:
                      >
                      I read Andy to stipulate that the pipe needs to transmit "hundreds of
                      megs of data and/or thousands of data structure instances."  I doubt
                      he'd be happy with memcpy either.  My instinct is that contention for
                      a lock could be the quicker option.
                      >
                      If he needs to communicate that amount of data very often, he has a
                      serious design problem.
                      >
                      Hmmm... Your comment there seems to be an indicator that you don't
                      have a lot of experience with real-time, performance-centric apps.
                      Consider my previously listed examples of video rendering and
                      programmatic effects in real-time. You need to have a lot of stuff in
                      threads being worked on, and as Walter described, using a signal
                      rather than serialization is the clear choice. Or, consider Patrick's
                      case where you have massive amounts of audio being run through a DSP--
                      it just doesn't make sense to serialize an intricate, high-level
                      object when you could otherwise just hand it off via a single sync step.
                      Walter and Paul really get what's being said here, so that should be
                      an indicator to take a step back for a moment and ease up a bit...
                      C'mon, man--we're all on the same side here! :^)


                      Andy



