2.6, 3.0, and truly independent interpreters

This topic is closed.
  • Martin v. Löwis

    #61
    Re: 2.6, 3.0, and truly independent interpreters

    >>> As far as I can tell, it seems CPython's current state can't do
    >>> CPU-bound parallelization in the same address space.
    >> That's not true.
    > Um... So let's say you have an opaque object ref from the OS that
    > represents hundreds of megs of data (e.g. memory-resident video). How
    > do you get that back to the parent process without serialization and
    > IPC?

    What parent process? I thought you were talking about multi-threading?

    > What should really happen is just use the same address space so
    > just a pointer changes hands. THAT's why I'm saying that a separate
    > address space is generally a deal breaker when you have large or
    > intricate data sets (ie. when performance matters).

    Right. So use a single address space, multiple threads, and perform the
    heavy computations in C code. I don't see how Python is in the way at
    all. Many people do that, and it works just fine. That's what
    Jesse (probably) meant with his remark:

    > A c-level module, on the other hand, can sidestep/release
    > the GIL at will, and go on its merry way and process away.

    Please reconsider this; it might be a solution to your problem.

    Regards,
    Martin
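Martin's suggestion can be seen in miniature with a stdlib C module. This is a minimal sketch (names are illustrative): `zlib.compress` runs in C and releases the GIL while it works, so ordinary threads in one shared address space can compress independent buffers on multiple cores, with no fork and no IPC.

```python
import threading
import zlib

# Four independent "large" buffers; compressing each is pure C work,
# and zlib releases the GIL while compressing, so the threads overlap.
blobs = [bytes([i]) * 1_000_000 for i in range(4)]
results = [None] * len(blobs)

def work(i):
    results[i] = zlib.compress(blobs[i])

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every buffer round-trips correctly even though all threads shared
# one address space and one interpreter.
ok = all(zlib.decompress(results[i]) == blobs[i] for i in range(4))
```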


    • Andy O'Meara

      #62
      Re: 2.6, 3.0, and truly independent interpreters


      Grrr... I posted a ton of lengthy replies to you and other recent
      posts here using Google and none of them made it, argh. Poof. There's
      nothing that fires me up more than lost work, so I'll have to
      revert to short and simple answers for the time being. Argh, damn.


      On Oct 25, 1:26 am, greg <g...@cosc.canterbury.ac.nz> wrote:
      > Andy O'Meara wrote:
      >> I would definitely agree if there was a context (i.e. environment)
      >> object passed around then perhaps we'd have the best of all worlds.
      >
      > Moreover, I think this is probably the *only* way that
      > totally independent interpreters could be realized.
      >
      > Converting the whole C API to use this strategy would be
      > a very big project. Also, on the face of it, it seems like
      > it would render all existing C extension code obsolete,
      > although it might be possible to do something clever with
      > macros to create a compatibility layer.
      >
      > Another thing to consider is that passing all these extra
      > pointers around everywhere is bound to have some effect
      > on performance.

      I'm with you on all counts, so no disagreement there. On the "passing
      a ptr everywhere" issue, perhaps one idea is that all objects could
      have an additional field that would point back to their parent context
      (ie. their interpreter). So the only prototypes that would have to be
      modified to contain the context ptr would be the ones that don't
      inherently operate on objects (e.g. importing a module).


      On Oct 25, 1:54 am, greg <g...@cosc.canterbury.ac.nz> wrote:
      > Andy O'Meara wrote:
      >> - each worker thread makes its own interpreter, pops scripts off a
      >> work queue, and manages exporting (and then importing) result data to
      >> other parts of the app.
      >
      > I hope you realize that starting up one of these interpreters
      > is going to be fairly expensive. It will have to create its
      > own versions of all the builtin constants and type objects,
      > and import its own copy of all the modules it uses.

      Yeah, for sure. And I'd say that's a pretty well established
      convention already out there for any industry package. The pattern
      I'd expect to see is where the app starts worker threads, starts
      interpreters in one or more of each, and throws jobs to different ones
      (and the interpreter would persist to move on to subsequent jobs).

      > One wonders if it wouldn't be cheaper just to fork the
      > process. Shared memory can be used to transfer large lumps
      > of data if needed.

      As I mentioned, when you're talking about intricate data structures, OS
      opaque objects (ie. that have their own internal allocators), or huge
      data sets, even a shared memory region unfortunately can't fit the
      bill.


      Andy


      • Andy O'Meara

        #63
        Re: 2.6, 3.0, and truly independent interpreters

        On Oct 24, 9:52 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
        >>> A c-level module, on the other hand, can sidestep/release
        >>> the GIL at will, and go on its merry way and process away.
        >> ...Unless part of the C module execution involves the need to do
        >> CPU-bound work on another thread through a different python
        >> interpreter, right?
        > Wrong.

        Let's take a step back and remind ourselves of the big picture. The
        goal is to have independent interpreters running in pthreads that the
        app starts and controls. No interpreter at any point does any
        thread-related stuff in any way. For example, each script job
        just does meat and potatoes CPU work, using callbacks that, say,
        programmatically use OS APIs to edit and transform frame data.

        So I think the disconnect here is that maybe you're envisioning
        threads being created *in* python. To be clear, we're talking about
        making threads at the app level and making it a given for the app to
        take its safety into its own hands.


        >> As far as I can tell, it seems
        >> CPython's current state can't do CPU-bound parallelization in the
        >> same address space.
        > That's not true.
        Well, when you're talking about large, intricate data structures
        (which include opaque OS object refs that use process-associated
        allocators), even a shared memory region between the child process and
        the parent can't do the job. Otherwise, please describe in detail how
        I'd get an opaque OS object (e.g. an OS ref that refers to memory-
        resident video) from the child process back to the parent process.

        Again, the big picture that I'm trying to plant here is that there
        really is a serious need for truly independent interpreters/contexts
        in a shared address space. Consider stuff like libpng, zlib, ipgjpg,
        or whatever: the use pattern is always the same -- make a context
        object, do your work in the context, and take it down. For most
        industry-caliber packages, the expectation and convention (unless
        documented otherwise) is that the app can make as many contexts as it
        wants in whatever threads it wants, because the convention is that the
        app must (a) never use one context's objects in another context,
        and (b) never use a context at the same time from more than one
        thread. That's all I'm really trying to look at here.


        Andy





        • Andy O'Meara

          #64
          Re: 2.6, 3.0, and truly independent interpreters


          >> And in the case of hundreds of megs of data
          >
          > ... and I would be surprised at someone that would embed hundreds of
          > megs of data into an object such that it had to be serialized... seems
          > like the proper design is to point at the data, or a subset of it, in a
          > big buffer.  Then data transfers would just transfer the offset/length
          > and the reference to the buffer.
          >
          >> and/or thousands of data structure instances,
          >
          > ... and this is another surprise!  You have thousands of objects (data
          > structure instances) to move from one thread to another?

          I think we miscommunicated there--I'm actually agreeing with you. I
          was trying to make the same point you were: that intricate and/or
          large structures are meant to be passed around by a top-level pointer,
          not using serialization/messaging. This is what I've been trying
          to explain to others here; that IPC and shared memory unfortunately
          aren't viable options, leaving app threads (rather than child
          processes) as the solution.

          > Of course, I know that data get large, but typical multimedia streams
          > are large, binary blobs.  I was under the impression that processing
          > them usually proceeds along the lines of keeping offsets into the
          > blobs, and interpreting, etc.  Editing is usually done by making a
          > copy of a blob, transforming it or a subset in some manner during the
          > copy process, resulting in a new, possibly different-sized blob.

          Your instincts are right. I'd only add that when you're talking
          about data structures associated with an intricate video format, the
          complexity and depth of the data structures is insane -- the LAST
          thing you want to burn cycles on is serializing and unserializing that
          stuff (so IPC is out)--again, we're already on the same page here.

          I think at one point you made the comment that shared memory is a
          solution to handle large data sets between a child process and the
          parent. Although this is certainly true in principle, it doesn't hold
          up in practice since complex data structures often contain 3rd party
          and OS API objects that have their own allocators. For example, in
          video encoding, there are TONS of objects that comprise memory-resident
          video from all kinds of APIs, so the idea of having them allocated
          from a shared/mapped memory block isn't even possible. Again, I only
          raise this to offer evidence that doing real-world work in a child
          process is a deal breaker--a shared address space is just way too much
          to give up.


          Andy


          • James Mills

            #65
            Re: 2.6, 3.0, and truly independent interpreters

            On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara <andy55@gmail.com> wrote:
            > I think we miscommunicated there--I'm actually agreeing with you. I
            > was trying to make the same point you were: that intricate and/or
            > large structures are meant to be passed around by a top-level pointer,
            > not using serialization/messaging. This is what I've been trying
            > to explain to others here; that IPC and shared memory unfortunately
            > aren't viable options, leaving app threads (rather than child
            > processes) as the solution.
            Andy,

            Why don't you just use a temporary file
            system (ram disk) to store the data that
            your app is manipulating? All you need to
            pass around then is a file descriptor.

            --JamesMills

            --
            "Problems are solved by method"
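James's file-descriptor idea in sketch form (POSIX-only; `os.fork` stands in for the second worker process, and all names are illustrative): the blob is written once to a temp file — on a ram disk it never even hits a platter — and only the inherited descriptor changes hands, not the bytes.

```python
import os
import tempfile

# Write the "large" blob once; pretend it is a decoded video frame.
tmp = tempfile.TemporaryFile()
tmp.write(b"frame-data" * 100_000)
tmp.flush()

pid = os.fork()
if pid == 0:
    # Child: inherits the open fd, seeks back, and reads in place --
    # nothing was serialized or copied through a pipe.
    os.lseek(tmp.fileno(), 0, os.SEEK_SET)
    first = os.read(tmp.fileno(), 10)
    os._exit(0 if first == b"frame-data" else 1)

_, status = os.waitpid(pid, 0)
child_ok = os.WEXITSTATUS(status) == 0
```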


            • Martin v. Löwis

              #66
              Re: 2.6, 3.0, and truly independent interpreters

              Andy O'Meara wrote:
              On Oct 24, 9:52 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
              >>> A c-level module, on the other hand, can sidestep/release
              >>> the GIL at will, and go on its merry way and process away.
              >> ...Unless part of the C module execution involves the need to do
              >> CPU-bound work on another thread through a different python
              >> interpreter, right?
              > Wrong.
              [...]
              > So I think the disconnect here is that maybe you're envisioning
              > threads being created *in* python. To be clear, we're talking about
              > making threads at the app level and making it a given for the app
              > to take its safety into its own hands.
              No. Whether or not threads are created by Python or the application
              does not matter for my "Wrong" evaluation: in either case, C module
              execution can easily side-step/release the GIL.
              >>> As far as I can tell, it seems
              >>> CPython's current state can't do CPU-bound parallelization in
              >>> the same address space.
              >> That's not true.
              > Well, when you're talking about large, intricate data structures
              > (which include opaque OS object refs that use process-associated
              > allocators), even a shared memory region between the child process
              > and the parent can't do the job. Otherwise, please describe in
              > detail how I'd get an opaque OS object (e.g. an OS ref that refers
              > to memory-resident video) from the child process back to the
              > parent process.
              WHAT PARENT PROCESS? "In the same address space", to me, means
              "a single process only, not multiple processes, and no parent process
              anywhere". If you have just multiple threads, the notion of passing
              data from a "child process" back to the "parent process" is
              meaningless.
              > Again, the big picture that I'm trying to plant here is that there
              > really is a serious need for truly independent
              > interpreters/contexts in a shared address space.
              I understand that this is your mission in this thread. However, why
              is that your problem? Why can't you just use the existing (limited)
              multiple-interpreters machinery, and solve your problems with that?
              > For most
              > industry-caliber packages, the expectation and convention (unless
              > documented otherwise) is that the app can make as many contexts as
              > it wants in whatever threads it wants because the convention is
              > that the app must (a) never use one context's objects in another
              > context, and (b) never use a context at the same time from more
              > than one thread. That's all I'm really trying to look at here.
              And that's indeed the case for Python, too. The app can make as many
              subinterpreters as it wants to, and it must not pass objects from one
              subinterpreter to another one, nor should it use a single interpreter
              from more than one thread (although that is actually supported by
              Python - but it surely won't hurt if you restrict yourself to a single
              thread per interpreter).

              Regards,
              Martin


              • Rhamphoryncus

                #67
                Re: 2.6, 3.0, and truly independent interpreters

                On Oct 26, 6:57 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                > Grrr... I posted a ton of lengthy replies to you and other recent
                > posts here using Google and none of them made it, argh. Poof.
                > There's nothing that fires me up more than lost work, so I'll
                > have to revert to short and simple answers for the time being.
                > Argh, damn.
                >
                > On Oct 25, 1:26 am, greg <g...@cosc.canterbury.ac.nz> wrote:
                >> Andy O'Meara wrote:
                >>> I would definitely agree if there was a context (i.e.
                >>> environment) object passed around then perhaps we'd have the
                >>> best of all worlds.
                >>
                >> Moreover, I think this is probably the *only* way that
                >> totally independent interpreters could be realized.
                >>
                >> Converting the whole C API to use this strategy would be
                >> a very big project. Also, on the face of it, it seems like
                >> it would render all existing C extension code obsolete,
                >> although it might be possible to do something clever with
                >> macros to create a compatibility layer.
                >>
                >> Another thing to consider is that passing all these extra
                >> pointers around everywhere is bound to have some effect
                >> on performance.
                >
                > I'm with you on all counts, so no disagreement there. On the
                > "passing a ptr everywhere" issue, perhaps one idea is that all
                > objects could have an additional field that would point back to
                > their parent context (ie. their interpreter). So the only
                > prototypes that would have to be modified to contain the context
                > ptr would be the ones that don't inherently operate on objects
                > (e.g. importing a module).
                Trying to directly share objects like this is going to create
                contention. The refcounting becomes the sequential portion of
                Amdahl's Law. This is why safethread doesn't scale very well: I share
                a massive amount of objects.

                An alternative, actually simpler, is to create proxies to your real
                object. The proxy object has a pointer to the real object and the
                context containing it. When you call a method it serializes the
                arguments, acquires the target context's GIL (while releasing yours),
                and deserializes in the target context. Once the method returns it
                reverses the process.
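The proxy idea can be sketched in plain Python (all class and method names here are hypothetical, and a `threading.Lock` stands in for the target context's GIL): the proxy pickles arguments "across" the context boundary, runs the method under the target's lock, and pickles the result back, so no live object is ever shared between contexts.

```python
import pickle
import threading

class Proxy:
    """Stand-in for a real object living in another context."""
    def __init__(self, target, target_lock):
        self._target = target          # the real object
        self._lock = target_lock       # stands in for the target's GIL

    def call(self, method, *args):
        wire_args = pickle.dumps(args)            # serialize into the target
        with self._lock:                          # "acquire the target's GIL"
            raw = getattr(self._target, method)(*pickle.loads(wire_args))
            wire_result = pickle.dumps(raw)       # serialize the result out
        return pickle.loads(wire_result)

class FrameStore:                                 # the "real object"
    def __init__(self):
        self.frames = []
    def add(self, frame):
        self.frames.append(frame)
        return len(self.frames)

proxy = Proxy(FrameStore(), threading.Lock())
count = proxy.call("add", b"\x00" * 16)           # one frame stored
```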

                There's two reasons why this may perform well for you: First,
                operations done purely in C may cheat (if so designed). A copy from
                one memory buffer to another memory buffer may be given two proxies as
                arguments, but then operate directly on the target objects (ie without
                serialization).

                Second, if a target context is idle you can enter it (acquiring its
                GIL) without any context switch.

                Of course that scenario is full of "maybes", which is why I have
                little interest in it.

                An even better scenario is if your memory buffer's methods are in pure
                C and it's a simple object (no pointers). You can stick the memory
                buffer in shared memory and have multiple processes manipulate it from
                C. More "maybes".

                An evil trick if you need pointers, but control the allocation, is to
                take advantage of the fork model. Have a master process create a
                bunch of blank files (temp files if linux doesn't allow /dev/zero),
                mmap them all using MAP_SHARED, then fork and utilize. The addresses
                will be inherited from the master process, so any pointers within them
                will be usable across all processes. If you ever want to return
                memory to the system you can close that file, then have all processes
                use MAP_SHARED|MAP_FIXED to overwrite it. Evil, but should be
                disturbingly effective, and still doesn't require modifying CPython.
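The trick translates to a tame Python sketch (POSIX-only, stdlib only): back a region with a file, map it MAP_SHARED *before* forking, and every process inherits the same mapping, so raw offsets (and, in C, raw pointers) into it stay valid everywhere with no serialization.

```python
import mmap
import os
import tempfile

# One of the "blank files" from the master process, mapped MAP_SHARED.
backing = tempfile.TemporaryFile()
backing.truncate(4096)
region = mmap.mmap(backing.fileno(), 4096, flags=mmap.MAP_SHARED)

pid = os.fork()
if pid == 0:
    # Child: writes straight into the inherited shared mapping.
    region[:4] = b"ping"
    os._exit(0)

os.waitpid(pid, 0)
seen = bytes(region[:4])   # parent reads the child's bytes in place
```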


                • Michael Sparks

                  #68
                  Re: 2.6, 3.0, and truly independent interpreters

                  Glenn Linderman wrote:
                  > so a 3rd party library might be called to decompress the stream
                  > into a set of independently allocated chunks, each containing
                  > one frame (each possibly consisting of several allocations of
                  > memory for associated metadata) that is independent of other
                  > frames
                  We use a combination of a dictionary + RGB data for this purpose. Using a
                  dictionary works out pretty nicely for the metadata, and obviously one
                  attribute holds the frame data as a binary blob.

                  http://www.kamaelia.org/Components/p...Codec.YUV4MPEG gives some
                  idea of structure and usage. The example given there is this:

                  Pipeline( RateControlledFileReader("video.dirac", readmode="bytes", ...),
                            DiracDecoder(),
                            FrameToYUV4MPEG(),
                            SimpleFileWriter("output.yuv4mpeg")
                  ).run()

                  Now all of those components are generator components.

                  That's useful since:
                  a) we can structure the code to show what it does more clearly, and
                  it still runs efficiently inside a single process
                  b) We can change this over to using multiple processes trivially:

                  ProcessPipeline(
                      RateControlledFileReader("video.dirac", readmode="bytes", ...),
                      DiracDecoder(),
                      FrameToYUV4MPEG(),
                      SimpleFileWriter("output.yuv4mpeg")
                  ).run()

                  This version uses multiple processes (under the hood using Paul
                  Boddie's pprocess library, since this support predates the
                  multiprocessing module support in python).

                  The big issue with *this* version however is that due to pprocess (and
                  friends) pickling data to be sent across OS pipes, the data throughput on
                  this would be lousy. Specifically in this example, if we could change it
                  such that the high level API was this:

                  ProcessPipeline(
                      RateControlledFileReader("video.dirac", readmode="bytes", ...),
                      DiracDecoder(),
                      FrameToYUV4MPEG(),
                      SimpleFileWriter("output.yuv4mpeg"),
                      use_shared_memory_IPC = True,
                  ).run()

                  That would be pretty useful, for some hopefully obvious reasons. I
                  suppose ideally we'd just use shared_memory_IPC for everything and
                  just go back to this:

                  ProcessPipeline(
                      RateControlledFileReader("video.dirac", readmode="bytes", ...),
                      DiracDecoder(),
                      FrameToYUV4MPEG(),
                      SimpleFileWriter("output.yuv4mpeg")
                  ).run()

                  But essentially for us, this is an optimisation problem, not a "how do I
                  even begin to use this" problem. Since it is an optimisation problem, it
                  also strikes me as reasonable to consider it OK to special purpose and
                  specialise such links until you get an approach that's reasonable for
                  general purpose data.

                  In theory, poshmodule.sourceforge.net, with a bit of TLC, would be
                  a good candidate, or a good starting point, for that optimisation
                  work (since it does work in Linux, contrary to a reply in the
                  thread - I've not tested it under windows :).

                  If someone's interested in building that, then someone redoing our MiniAxon
                  tutorial using processes & shared memory IPC rather than generators would
                  be a relatively gentle/structured approach to dealing with this:

                  * http://www.kamaelia.org/MiniAxon/

                  The reason I suggest that is because any time we think about fiddling and
                  creating a new optimisation approach or concurrency approach, we tend to
                  build a MiniAxon prototype to flesh out the various issues involved.


                  Michael
                  --



                  • Michael Sparks

                    #69
                    Re: 2.6, 3.0, and truly independent interpreters

                    Philip Semanchuk wrote:
                    > On Oct 25, 2008, at 7:53 AM, Michael Sparks wrote:
                    >> Glenn Linderman wrote:
                    >>> In the module multiprocessing environment could you not use
                    >>> shared memory, then, for the large shared data items?
                    >>
                    >> If the poshmodule had a bit of TLC, it would be extremely
                    >> useful for this,... http://poshmodule.sourceforge.net/
                    >
                    > Last time I checked that was Windows-only. Has that changed?

                    I've only tested it under Linux, where it worked, but it does
                    clearly need a bit of work :)

                    > The only IPC modules for Unix that I'm aware of are one which I
                    > adopted (for System V semaphores & shared memory) and one which
                    > I wrote (for POSIX semaphores & shared memory).
                    >
                    > http://semanchuk.com/philip/posix_ipc/

                    I'll take a look at those - poshmodule does need a bit of TLC and
                    doesn't appear to be maintained.

                    > If anyone wants to wrap POSH cleverness around them, go for it!
                    > If not, maybe I'll make the time someday.

                    I personally don't have the time to do this, but I'd be very
                    interested in hearing of someone building an up-to-date version.
                    (Indeed, something like this would be extremely useful for everyone
                    to have in the standard library now that the multiprocessing
                    library is in the standard library.)


                    Michael.
                    --
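An aside for readers of this archive: the standard-library facility wished for above did eventually arrive — since Python 3.8, `multiprocessing.shared_memory` provides named shared-memory segments built on the same POSIX primitives as posix_ipc. A minimal sketch:

```python
from multiprocessing import shared_memory

# Create a named segment and write into it.
shm = shared_memory.SharedMemory(create=True, size=1024)
shm.buf[:5] = b"hello"

# Attach a second handle by name, exactly as another process would,
# and read the bytes back without any copy or pickle step.
other = shared_memory.SharedMemory(name=shm.name)
seen = bytes(other.buf[:5])

other.close()
shm.close()
shm.unlink()
```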



                    • Andy O'Meara

                      #70
                      Re: 2.6, 3.0, and truly independent interpreters

                      On Oct 26, 10:11 pm, "James Mills" <prolo...@shortcircuit.net.au>
                      wrote:
                      > On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara
                      > <and...@gmail.com> wrote:
                      >> I think we miscommunicated there--I'm actually agreeing with
                      >> you.  I was trying to make the same point you were: that
                      >> intricate and/or large structures are meant to be passed
                      >> around by a top-level pointer, not using
                      >> serialization/messaging.  This is what I've been trying to
                      >> explain to others here; that IPC and shared memory
                      >> unfortunately aren't viable options, leaving app threads
                      >> (rather than child processes) as the solution.
                      >
                      > Andy,
                      >
                      > Why don't you just use a temporary file
                      > system (ram disk) to store the data that
                      > your app is manipulating? All you need to
                      > pass around then is a file descriptor.
                      >
                      > --JamesMills
                      Unfortunately, it's the penalty of serialization and
                      unserialization. When you're talking about stuff like
                      memory-resident images and video (complete with their intricate
                      and complex codecs), then the only option is to be passing around
                      a couple of pointers rather than take the hit of serialization
                      (which is huge for video, for example). I've gone into more
                      detail in some other posts but I could have missed something.


                      Andy




                      • Andy O'Meara

                        #71
                        Re: 2.6, 3.0, and truly independent interpreters

                        On Oct 27, 4:05 am, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
                        > Andy O'Meara wrote:
                        >> Well, when you're talking about large, intricate data
                        >> structures (which include opaque OS object refs that use
                        >> process-associated allocators), even a shared memory region
                        >> between the child process and the parent can't do the job.
                        >> Otherwise, please describe in detail how I'd get an opaque OS
                        >> object (e.g. an OS ref that refers to memory-resident video)
                        >> from the child process back to the parent process.
                        >
                        > WHAT PARENT PROCESS? "In the same address space", to me, means
                        > "a single process only, not multiple processes, and no parent
                        > process anywhere". If you have just multiple threads, the
                        > notion of passing data from a "child process" back to the
                        > "parent process" is meaningless.
                        I know... I was just responding to you and others here who keep
                        beating the "fork" drum. I was just trying to make it clear that
                        a shared address space is the only way to go. Ok, good, so we're
                        in agreement that threads are the only way to deal with the
                        "intricate and complex" data set issue in a performance-centric
                        application.
                        >> Again, the big picture that I'm trying to plant here is that
                        >> there really is a serious need for truly independent
                        >> interpreters/contexts in a shared address space.
                        >
                        > I understand that this is your mission in this thread. However,
                        > why is that your problem? Why can't you just use the existing
                        > (limited) multiple-interpreters machinery, and solve your
                        > problems with that?
                        Because then we're back to the GIL not permitting threads
                        efficient core use on CPU-bound scripts running on other threads
                        (when they otherwise could). Just so we're on the same page,
                        "when they otherwise could" is relevant here because that's the
                        important given: that each interpreter ("context") truly never
                        has any contact with the others.

                         An example would be python scripts that generate video
                         programmatically using an initial set of params and an in-house C
                         module to construct frames (which in turn make and modify python C
                         objects that wrap intricate codec-related data structures). Suppose
                         you wanted to render 3 of these at the same time, one on each thread
                         (3 threads). With the GIL in place, these threads can't run anywhere
                         close to their potential. Your response thus far is that the C module
                         should release the GIL before it commences its heavy lifting. Well,
                         the problem arises if, during its heavy lifting, it needs to call back
                         into its interpreter. It turns out that this isn't an exotic case
                         at all: there's a *ton* of utility gained by making calls back into
                         the interpreter. The best example is that since code is more easily
                         maintained in python than in C, a lot of the module "utility" code is
                         likely to be in python. Unsurprisingly, this is the situation myself
                         and many others are in: we want to subsequently use the
                         interpreter within the C module (so, as I understand it, the proposal
                         to have the C module release the GIL unfortunately doesn't work as a
                         general solution).
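To make the cost concrete, here is a minimal sketch (hypothetical params and a stand-in render function, not the actual C module being described) of what the standard workaround looks like today: since the GIL blocks the three-thread version, you fall back to multiprocessing, and every input and result crosses a process boundary via pickling -- fine for small params, a deal-breaker for hundreds of megs of memory-resident video.

```python
# Hedged sketch: "render_frame" is a hypothetical stand-in for the
# in-house C module's CPU-bound frame construction.
from multiprocessing import Pool

def render_frame(params):
    # Every argument and return value here is pickled across a process
    # boundary -- the serialization/IPC cost objected to above.
    width, height, seed = params
    return sum((seed * x) % 255 for x in range(width * height))

if __name__ == "__main__":
    pool = Pool(processes=3)                  # one worker per "render thread"
    jobs = [(64, 48, s) for s in (1, 2, 3)]
    results = pool.map(render_frame, jobs)    # params/results serialized
    pool.close()
    pool.join()
    print(results)
```

With truly independent interpreters, the three workers could instead be threads in one address space, handing around pointers instead of pickled copies.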
                        >
                         For most
                         industry-caliber packages, the expectation and convention (unless
                         documented otherwise) is that the app can make as many contexts as it
                         wants in whatever threads it wants, because the convention is that the
                         app must (a) never use one context's objects in another context,
                         and (b) never use a context at the same time from more than one
                         thread.  That's all I'm really trying to look at here.
                        >
                        And that's indeed the case for Python, too. The app can make as many
                        subinterpreters as it wants to, and it must not pass objects from one
                        subinterpreter to another one, nor should it use a single interpreter
                        from more than one thread (although that is actually supported by
                        Python - but it surely won't hurt if you restrict yourself to a single
                        thread per interpreter).
                        >
                         I'm not following you there... I thought we were all in agreement
                         that the existing C modules are FAR from being reentrant, regularly
                         making use of static/global objects. The point I made before is that
                         other industry-caliber packages specifically don't have restrictions
                         in *any* way.

                         I appreciate your argument that a PyC concept is a lot of work and
                         requires some careful design, but let's not kill the discussion just
                         because of that. The fact remains that the video-encoding scenario
                         described above is a pretty reasonable situation, and as more people
                         are commenting in this thread, there's an increasing need to offer
                         apps more flexibility when it comes to multi-threaded use.


                        Andy





                        • Greg Ewing

                          #72
                          Re: 2.6, 3.0, and truly independent intepreters

                          Glenn Linderman wrote:
                          So your 50% number is just a scare tactic, it would seem, based on wild
                          guesses. Was there really any benefit to the comment?
                          All I was really trying to say is that it would be a
                          mistake to assume that the overhead will be negligible,
                          as that would be just as much a wild guess as 50%.

                          --
                          Greg


                          • Andy O'Meara

                            #73
                            Re: 2.6, 3.0, and truly independent intepreters

                             On Oct 25, 9:46 am, "M.-A. Lemburg" <m...@egenix.com> wrote:
                             These discussions pop up every year or so and I think that most of them
                             are not really all that necessary, since the GIL isn't all that bad.
                            >
                             Thing is, if the topic keeps coming up, then that may be an indicator
                             that change is truly needed. Someone much wiser than me once shared
                             that a measure of the usefulness and quality of a package (or API) is
                             how easily it can be added to an application--of any flavor--without
                             the application needing to change.

                            So in the rising world of idle cores and worker threads, I do see an
                            increasing concern over the GIL. Although I recognize that the debate
                            is lengthy, heated, and has strong arguments on both sides, my reading
                            on the issue makes me feel like there's a bias for the pro-GIL side
                            because of the volume of design and coding work associated with
                            considering various alternatives (such as Glenn's "Py*" concepts).
                            And I DO respect and appreciate where the pro-GIL people come from:
                            who the heck wants to do all that work and recoding so that a tiny
                            percent of developers can benefit? And my best response is that as
                            unfortunate as it is, python needs to be more multi-threaded app-
                            friendly if we hope to attract the next generation of app developers
                            that want to just drop python into their app (and not have to change
                            their app around python). For example, Lua has that property, as
                            evidenced by its rapidly growing presence in commercial software
                            (Blizzard uses it heavily, for example).
                            >
                            Furthermore, there are lots of ways to tune the CPython VM to make
                            it more or less responsive to thread switches via the various sys.set*()
                            functions in the sys module.
                            >
                            Most computing or I/O intense C extensions, built-in modules and object
                            implementations already release the GIL for you, so it usually doesn't
                            get in the way all that often.
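As a small illustration of the quoted point (hedged: the degree of real overlap depends on the CPython version and on buffer sizes), extensions such as zlib release the GIL around their C-level loops, so several threads can compress concurrently even though pure-Python bytecode cannot run in parallel:

```python
# Three threads compressing large buffers; zlib drops the GIL during
# the deflate loop, so the threads can genuinely overlap on cores.
import threading
import zlib

payloads = [bytes(bytearray((i * j) % 256 for j in range(200000)))
            for i in (1, 2, 3)]
results = [None] * len(payloads)

def worker(idx, data):
    results[idx] = zlib.compress(data)

threads = [threading.Thread(target=worker, args=(i, p))
           for i, p in enumerate(payloads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Verify each thread's round trip survived.
ok = all(zlib.decompress(c) == p for c, p in zip(results, payloads))
print(ok)
```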

                             The main issue I take with that is that it's often highly useful for C
                             modules to make subsequent calls back into the interpreter. I suppose
                             the response to that is to re-acquire the GIL before reentry, but it just
                             seems to be more code and responsibility in scenarios where it's not
                             necessary.  Although that code and protocol may come easily to veteran
                             CPython developers, let's not forget that an important goal is to
                             attract new developers and companies to the scene, where they get
                             their thread-independent code up and running using python without any
                             unexpected reengineering.  Again, why are companies choosing Lua over
                             Python when it comes to an easy and flexible drop-in interpreter?  And
                             please take my points here to be exploratory, and not hostile or
                             accusatory, in nature.
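For readers unfamiliar with the protocol under discussion: in C code the spelling is Py_BEGIN_ALLOW_THREADS around the heavy lifting and PyGILState_Ensure()/PyGILState_Release() around each callback into the interpreter. A rough pure-Python analogue of that bookkeeping (the lock here is only a stand-in for the GIL, and the functions are hypothetical) looks like:

```python
# Analogy only: "gil" stands in for CPython's real global interpreter
# lock, which C extensions manipulate via the C API, not from Python.
import threading

gil = threading.Lock()

def python_utility(x):
    # "Utility code kept in Python" that the C module wants to call.
    return x * 2

def c_module_heavy_lifting(data):
    gil.release()               # Py_BEGIN_ALLOW_THREADS: let others run
    try:
        partial = sum(data)     # heavy C-side work, no interpreter access
    finally:
        gil.acquire()           # Py_END_ALLOW_THREADS
    # Callback into the interpreter: the lock must be held again
    # (PyGILState_Ensure / PyGILState_Release in real C code).
    return python_utility(partial)

gil.acquire()                   # the caller holds the GIL on entry
result = c_module_heavy_lifting([1, 2, 3, 4])
gil.release()
print(result)
```

The point being made above is that every re-entry into the interpreter carries this bookkeeping, which adds up when callbacks are frequent.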


                            Andy



                            • Andy O'Meara

                              #74
                              Re: 2.6, 3.0, and truly independent intepreters

                               On Oct 27, 10:55 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:

                               And I think we still are miscommunicating!  Or maybe communicating anyway!
                              >
                              So when you said "object", I actually don't know whether you meant
                              Python object or something else.  I assumed Python object, which may not
                              have been correct... but read on, I think the stuff below clears it up.
                              >
                              >
                              Then when you mentioned thousands of objects, I imagined thousands of
                              Python objects, and somehow transforming the blob into same... and back
                              again.  
                               My apologies to you and others here on my use of "objects" -- I use
                               the term generically and mean it *not* to refer to python objects (for
                               all the reasons discussed here). Python only makes up a small
                               part of our app, hence my habit of using "objects" to refer to other APIs'
                               allocated and opaque objects (including our own and OS APIs). For all
                               the reasons we've discussed, in our world, python objects don't travel
                               around outside of our python C modules -- when python objects need to
                               be passed to other parts of the app, they're converted into their non-
                               python (portable) equivalents (ints, floats, buffers, etc--but most of
                               the time, the objects are PyCObjects, so they can enter and leave a
                               python context with negligible overhead). I venture to say this is
                               pretty standard when any industry app uses a package (such as python),
                               for various reasons:
                               - Portability/Future (e.g. if we do decide to drop Python and go
                               with Lua, the changes are limited to only one region of code).
                               - Sanity (having any API's objects show up in places "far away"
                               goes against easy-to-follow code).
                               - MT flexibility (because we never use static/global
                               storage, we have all kinds of options when it comes to
                               multithreading). For example, recall that by throwing python into
                               multiple dynamic libs, we were able to achieve the GIL-less
                               interpreter independence that we want (albeit ghetto and a pain).
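As a sketch of that boundary conversion (the frame-header layout here is purely hypothetical, not anything from the actual app): Python values are flattened into a portable byte blob on the way out, and rebuilt only once back inside a Python context:

```python
# Hypothetical example of converting Python objects to their
# "non-python (portable) equivalents" at a module boundary.
import struct

# width, height, timestamp, duration -- little-endian, fixed layout
FRAME_HEADER = "<IIdd"

def to_portable(width, height, timestamp, duration):
    # Python ints/floats -> opaque byte blob; safe to hand to
    # non-Python code in any thread without touching the interpreter.
    return struct.pack(FRAME_HEADER, width, height, timestamp, duration)

def from_portable(blob):
    # Byte blob -> Python values, done back inside a Python context.
    return struct.unpack(FRAME_HEADER, blob)

blob = to_portable(1920, 1080, 0.0, 1.0 / 30.0)
print(from_portable(blob))
```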



                              Andy




                              • Rhamphoryncus

                                #75
                                Re: 2.6, 3.0, and truly independent intepreters

                                 On Oct 28, 9:30 am, "Andy O'Meara" <and...@gmail.com> wrote:
                                 On Oct 25, 9:46 am, "M.-A. Lemburg" <m...@egenix.com> wrote:
                                >
                                 These discussions pop up every year or so and I think that most of them
                                 are not really all that necessary, since the GIL isn't all that bad.
                                >
                                Thing is, if the topic keeps coming up, then that may be an indicator
                                that change is truly needed.  Someone much wiser than me once shared
                                that a measure of the usefulness and quality of a package (or API) is
                                 how easily it can be added to an application--of any flavor--without
                                the application needing to change.
                                >
                                So in the rising world of idle cores and worker threads, I do see an
                                increasing concern over the GIL.  Although I recognize that the debate
                                is lengthy, heated, and has strong arguments on both sides, my reading
                                on the issue makes me feel like there's a bias for the pro-GIL side
                                because of the volume of design and coding work associated with
                                considering various alternatives (such as Glenn's "Py*" concepts).
                                And I DO respect and appreciate where the pro-GIL people come from:
                                who the heck wants to do all that work and recoding so that a tiny
                                percent of developers can benefit?  And my best response is that as
                                unfortunate as it is, python needs to be more multi-threaded app-
                                friendly if we hope to attract the next generation of app developers
                                that want to just drop python into their app (and not have to change
                                their app around python).  For example, Lua has that property, as
                                evidenced by its rapidly growing presence in commercial software
                                (Blizzard uses it heavily, for example).
                                >
                                >
                                >
                                Furthermore, there are lots of ways to tune the CPython VM to make
                                it more or less responsive to thread switches via the various sys.set*()
                                functions in the sys module.
                                >
                                Most computing or I/O intense C extensions, built-in modules and object
                                implementations already release the GIL for you, so it usually doesn't
                                get in the way all that often.
                                >
                                 The main issue I take with that is that it's often highly useful for C
                                 modules to make subsequent calls back into the interpreter. I suppose
                                 the response to that is to re-acquire the GIL before reentry, but it just
                                 seems to be more code and responsibility in scenarios where it's not
                                 necessary.  Although that code and protocol may come easily to veteran
                                 CPython developers, let's not forget that an important goal is to
                                 attract new developers and companies to the scene, where they get
                                 their thread-independent code up and running using python without any
                                 unexpected reengineering.  Again, why are companies choosing Lua over
                                 Python when it comes to an easy and flexible drop-in interpreter?  And
                                 please take my points here to be exploratory, and not hostile or
                                 accusatory, in nature.
                                >
                                Andy
                                Okay, here's the bottom line:
                                * This is not about the GIL. This is about *completely* isolated
                                interpreters; most of the time when we want to remove the GIL we want
                                a single interpreter with lots of shared data.
                                * Your use case, although not common, is not extraordinarily rare
                                either. It'd be nice to support.
                                * If CPython had supported it all along we would continue to maintain
                                it.
                                 * However, since it's not supported today, it's not worth the time
                                 invested, API incompatibility, and general breakage it would imply.
                                * Although it's far more work than just solving your problem, if I
                                were to remove the GIL I'd go all the way and allow shared objects.

                                 So there are really only two options here:
                                * get a short-term bodge that works, like hacking the 3rd party
                                library to use your shared-memory allocator. Should be far less work
                                than hacking all of CPython.
                                * invest yourself in solving the *entire* problem (GIL removal with
                                shared python objects).
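For what it's worth, later Python versions (3.8+) ship exactly this kind of shared-memory bodge off the shelf: multiprocessing.shared_memory lets two processes (or, as in this compressed single-process sketch, two handles) alias the same pages, so a large buffer changes hands by name rather than by serialization:

```python
# Named shared memory: the consumer attaches by name -- no copy, no
# pickling; both views alias the same underlying memory. (In real use
# the consumer would be a separate process.)
from multiprocessing import shared_memory

producer = shared_memory.SharedMemory(create=True, size=1024)
try:
    producer.buf[:5] = b"video"           # write through one mapping

    consumer = shared_memory.SharedMemory(name=producer.name)
    seen = bytes(consumer.buf[:5])        # read through the other
    consumer.close()
finally:
    producer.close()
    producer.unlink()                     # free the segment

print(seen)
```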

