2.6, 3.0, and truly independent interpreters

  • Martin v. Löwis

    #91
    Re: 2.6, 3.0, and truly independent interpreters

    >> Why do you think so? For C code that is carefully written, the GIL
    >> allows *very well* to write CPU bound scripts running on other threads.
    >> (please do get back to Jesse's original remark in case you have lost
    >> the thread :-)
    > I don't follow you there. If you're referring to multiprocessing
    No, I'm not. I refer to regular, plain, multi-threading.
    >>> It turns out that this isn't an exotic case
    >>> at all: there's a *ton* of utility gained by making calls back into
    >>> the interpreter. The best example is that since code is more easily
    >>> maintained in python than in C, a lot of the module "utility" code is
    >>> likely to be in python.
    >> You should really reconsider writing performance-critical code in
    >> Python.
    > I don't follow you there... Performance-critical code in Python??
    I probably expressed myself incorrectly (being not a native speaker
    of English): if you were writing performance-critical code in Python,
    you should reconsider (i.e. you should rewrite it in C).

    It's not clear whether this calling back into Python is in the
    performance-critical path. If it is, then reconsider.
    > I tried to list some abbreviated examples in other posts, but here's
    > some elaboration:
    > - Pixel-level effects and filters, where some filters may use C procs
    > while others may call back into the interpreter to execute logic --
    > while some do both, multiple times.
    Ok. For a plain C proc, release the GIL before the proc, and reacquire
    it afterwards. For a proc that calls into the interpreter:
    a) if it is performance-critical, reconsider writing it in C, or
    reformulate it so that it stops being performance critical (e.g.
    through caching)
    b) else, reacquire the GIL before calling back into Python, then
    release the GIL before continuing the proc
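[Editor's note] Martin's recipe can be seen from the Python side with the stdlib `hashlib`, whose C code already follows pattern (a): current CPython drops the GIL while hashing buffers beyond a couple of kilobytes, so CPU-bound threads can genuinely overlap. A minimal sketch (mine, not Martin's code):

```python
import hashlib
import threading

def hash_chunk(data, results, i):
    # hashlib's C implementation releases the GIL for large buffers,
    # so several of these threads can run on different cores at once.
    results[i] = hashlib.sha256(data).hexdigest()

def hash_in_threads(chunks):
    results = [None] * len(chunks)
    threads = [threading.Thread(target=hash_chunk, args=(c, results, i))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    chunks = [bytes([i]) * 1_000_000 for i in range(4)]
    # Same digests as a serial loop, but the hashing ran outside the GIL.
    assert hash_in_threads(chunks) == [hashlib.sha256(c).hexdigest()
                                       for c in chunks]
```

A pure-Python callback in the middle of such a proc is exactly where pattern (b) applies: the C code must re-acquire the GIL before touching any Python object.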
    > - Image and video analysis/recognition where there's TONS of intricate
    > data structures and logic. Those data structures and logic are
    > easiest to develop and maintain in python, but you'll often want to
    > call back to C procs which will, in turn, want to access Python (as
    > well as C-level) data structures.
    I'm not sure what processing you need to do.
    The data structures themselves are surely not performance critical
    (not being algorithms). If you really run Python algorithms on these
    structures, then my approach won't help you (except for the general
    recommendation to find some expensive sub-algorithm and rewrite that
    in C, so that it both becomes faster and can release the GIL).
    > It's just not practical to be
    > locking and unlocking the GIL when you want to operate on python data
    > structures or call back into python.
    This I don't understand. I find that fairly easy to do.
    > You seem to have placed the burden of proof on my shoulders for an app
    > to deserve the ability to free-thread when using 3rd party packages,
    > so how about we just agree it's not an unreasonable desire for a
    > package (such as python) to support it and move on with the
    > discussion.
    Not at all - I don't want a proof. I just want agreement on Jesse
    Noller's claim

    # A c-level module, on the other hand, can sidestep/release
    # the GIL at will, and go on its merry way and process away.
    >> If neither is likely to result, killing the discussion is the most
    >> productive thing we can do.
    > Well, most others here seem to have a very different definition of what
    > qualifies as a "futile" discussion, so how about you allow the rest of
    > us to continue to discuss these issues and possible solutions. And, for
    > the record, I've said multiple times I'm ready to contribute
    > monetarily, professionally, and personally, so if that doesn't qualify
    > as the precursor to "code contributions from one of the participants"
    > then I don't know WHAT does.
    Ok, I apologize for having misunderstood you here.

    Regards,
    Martin


    • Patrick Stinson

      #92
      Re: 2.6, 3.0, and truly independent interpreters

      On Wed, Oct 29, 2008 at 4:05 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
      > On approximately 10/29/2008 3:45 PM, came the following characters from the
      > keyboard of Patrick Stinson:
      >> If you are dealing with "lots" of data like in video or sound editing,
      >> you would just keep the data in shared memory and send the reference
      >> over IPC to the worker process. Otherwise, if you marshal and send you
      >> are looking at a temporary doubling of the memory footprint of your
      >> app because the data will be copied, and marshaling overhead.
      > Right. Sounds, and is, easy, if the data is all directly allocated by the
      > application. But when pieces are allocated by 3rd party libraries, that use
      > the C-runtime allocator directly, then it becomes more difficult to keep
      > everything in shared memory.
      good point.
      > One _could_ replace the C-runtime allocator, I suppose, but that could have
      > some adverse effects on other code, that doesn't need its data to be in
      > shared memory. So it is somewhat between a rock and a hard place.
      ewww scary. mousetraps for sale?
      > By avoiding shared memory, such problems are sidestepped... until you run
      > smack into the GIL.
      > --
      > Glenn -- http://nevcal.com/
      > ===========================
      > A protocol is complete when there is nothing left to remove.
      > -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
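[Editor's note] Patrick's keep-the-data-in-shared-memory approach, sketched with the stdlib `multiprocessing.shared_memory` module (an anachronism relative to this 2008 thread: the module only appeared in Python 3.8; back then you would have reached for `mmap` or a third-party wrapper). Only the segment's *name* crosses the IPC boundary, never the data:

```python
from multiprocessing import Process, shared_memory

def worker(segment_name, size):
    # Attach to the existing segment by name; the payload itself is
    # never pickled or copied across the process boundary.
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        shm.buf[:size] = bytes(b ^ 0xFF for b in bytes(shm.buf[:size]))
    finally:
        shm.close()

def process_in_place(data):
    # The parent allocates the segment and hands only its name to the child.
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    try:
        shm.buf[:len(data)] = data
        p = Process(target=worker, args=(shm.name, len(data)))
        p.start()
        p.join()
        return bytes(shm.buf[:len(data)])
    finally:
        shm.close()
        shm.unlink()

if __name__ == "__main__":
    assert process_in_place(b"\x00\x0f\xff") == b"\xff\xf0\x00"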


      • Glenn Linderman

        #93
        Re: 2.6, 3.0, and truly independent interpreters

        On approximately 10/30/2008 6:26 AM, came the following characters from
        the keyboard of Jesse Noller:
        > On Wed, Oct 29, 2008 at 8:05 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
        >> On approximately 10/29/2008 3:45 PM, came the following characters from the
        >> keyboard of Patrick Stinson:
        >>> If you are dealing with "lots" of data like in video or sound editing,
        >>> you would just keep the data in shared memory and send the reference
        >>> over IPC to the worker process. Otherwise, if you marshal and send you
        >>> are looking at a temporary doubling of the memory footprint of your
        >>> app because the data will be copied, and marshaling overhead.
        >> Right. Sounds, and is, easy, if the data is all directly allocated by the
        >> application. But when pieces are allocated by 3rd party libraries, that use
        >> the C-runtime allocator directly, then it becomes more difficult to keep
        >> everything in shared memory.
        >> One _could_ replace the C-runtime allocator, I suppose, but that could have
        >> some adverse effects on other code, that doesn't need its data to be in
        >> shared memory. So it is somewhat between a rock and a hard place.
        >> By avoiding shared memory, such problems are sidestepped... until you run
        >> smack into the GIL.
        > If you do not have shared memory: You don't need threads, ergo: You
        > don't get penalized by the GIL. Threads are only useful when you need
        > to have that requirement of large in-memory data structures shared and
        > modified by a pool of workers.
        The whole point of this thread is to talk about large in-memory data
        structures that are shared and modified by a pool of workers.

        My reference to shared memory was specifically referring to the concept
        of sharing memory between processes... a particular OS feature that is
        called shared memory.

        The need for sharing memory among a pool of workers is still the
        premise. Threads do that automatically, without the need for the OS
        shared memory feature, which brings with it the need for a special
        allocator to allocate memory in the shared memory area vs the rest of
        the address space.
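[Editor's note] Glenn's point in miniature: a pool of worker threads mutating one ordinary in-memory structure, with no OS shared-memory segment or special allocator involved. A sketch; a real app would use finer-grained locking than one global lock:

```python
import threading

def run_workers(shared, n_workers=4, updates=1000):
    # Every thread sees the same dict object automatically, because
    # threads share the whole address space.
    lock = threading.Lock()

    def worker():
        for _ in range(updates):
            with lock:
                shared["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    state = {"count": 0}
    run_workers(state)
    assert state["count"] == 4 * 1000
```

The GIL is the price of this convenience in CPython: the structure is shared for free, but pure-Python mutation of it is serialized.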

        Not to pick on you, particularly, Jesse, but this particular response
        made me finally understand why there has been so much repetition of the
        same issues and positions over and over and over in this thread: instead
        of comprehending the whole issue, people are responding to small
        fragments of it, with opinions that may be perfectly reasonable for that
        fragment, but missing the big picture, or the explanation made when the
        same issue was raised in a different sub-thread.

        --
        Glenn -- http://nevcal.com/
        ===========================
        A protocol is complete when there is nothing left to remove.
        -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking


        • Patrick Stinson

          #94
          Re: 2.6, 3.0, and truly independent interpreters

          Speaking of the big picture, is this how it normally works when
          someone says "Here's some code and a problem and I'm willing to pay
          for a solution?" I've never really walked that path with a project of
          this complexity (I guess it's the backwards-compatibility that makes
          it confusing), but is this problem just too complex so we have to keep
          talking and talking on forum after forum? Afraid to fork? I know I am.
          How many people are qualified to tackle Andy's problem? Are all of
          them busy or uninterested? Is the current code in a tight spot where
          it just can't be fixed without really jabbing that FORK in so deep
          that the patch will die when your project does?

          Personally I think this problem is super-awesome on the hobbyist's fun
          scale. I'd totally take the time to let my patch do the talking but I
          haven't read enough of the (2.5) code. So, I resort to simply reading
          the newsgroups and python code to better understand the mechanics of
          the problem :(



          • Rhamphoryncus

            #95
            Re: 2.6, 3.0, and truly independent interpreters

            On Oct 30, 8:23 pm, "Patrick Stinson" <patrickstinson.li...@gmail.com>
            wrote:
            > Speaking of the big picture, is this how it normally works when
            > someone says "Here's some code and a problem and I'm willing to pay
            > for a solution?" I've never really walked that path with a project of
            > this complexity (I guess it's the backwards-compatibility that makes
            > it confusing), but is this problem just too complex so we have to keep
            > talking and talking on forum after forum? Afraid to fork? I know I am.
            > How many people are qualified to tackle Andy's problem? Are all of
            > them busy or uninterested? Is the current code in a tight spot where
            > it just can't be fixed without really jabbing that FORK in so deep
            > that the patch will die when your project does?
            > Personally I think this problem is super-awesome on the hobbyist's fun
            > scale. I'd totally take the time to let my patch do the talking but I
            > haven't read enough of the (2.5) code. So, I resort to simply reading
            > the newsgroups and python code to better understand the mechanics of
            > the problem :(
            The scale of this issue is why so little progress gets made, yes. I
            intend to solve it regardless of getting paid (and have been working
            on various aspects for quite a while now), but as you can see from
            this thread it's very difficult to convince anybody that my approach
            is the *right* approach.


            • alex23

              #96
              Re: 2.6, 3.0, and truly independent interpreters

              On Oct 31, 2:05 am, "Andy O'Meara" <and...@gmail.com> wrote:
              > I don't follow you there.  If you're referring to multiprocessing, our
              > concerns are:
              > - Maturity (am I willing to tell my partners and employees that I'm
              > betting our future on a brand-new module that imposes significant
              > restrictions as to how our app operates?)
              > - Liability (am I ready to invest our resources into lots of new
              > python module-specific code to find out that a platform that we want
              > to target isn't supported or has problems?).  Like it or not, we're a
              > company and we have to show sensitivity about new or fringe packages
              > that make our codebase less agile -- C/C++ continues to win the day in
              > that department.
              I don't follow this...wouldn't both of these concerns be even more
              true for modifying the CPython interpreter to provide the
              functionality you want?


              • greg

                #97
                Re: 2.6, 3.0, and truly independent interpreters

                Patrick Stinson wrote:
                > Speaking of the big picture, is this how it normally works when
                > someone says "Here's some code and a problem and I'm willing to pay
                > for a solution?"
                In an open-source volunteer context, time is generally more
                valuable than money. Most people can't just drop part of
                their regular employment temporarily, so unless there's
                quite a *lot* of money being offered (enough to offer someone
                full-time employment, for example) it doesn't necessarily
                make any more man-hours available.

                --
                Greg


                • lkcl

                  #98
                  Re: 2.6, 3.0, and truly independent interpreters

                  On Oct 30, 6:39 pm, Terry Reedy <tjre...@udel.edu> wrote:
                  > Their professor is Lars Bak, the lead architect of the Google V8
                  > Javascript engine. They spent some time working on V8 in the last
                  > couple of months.
                  then they will be at home with pyv8 - which is a combination of the
                  pyjamas python-to-javascript compiler and google's v8 engine.

                  in pyv8, thanks to v8 (and the judicious application of boost) it's
                  possible to call out to external c-based modules.

                  so not only do you get the benefits of the (much) faster execution
                  speed of v8, along with its garbage collection, but also you still get
                  access to external modules.

                  so... their project's done, already!

                  l.


                  • sturlamolden

                    #99
                    Re: 2.6, 3.0, and truly independent interpreters


                    If you are serious about multicore programming, take a look at:



                    Now if we could make Python do something like that, people would
                    perhaps start to think about writing Python programs for more than one
                    processor.


                    • Andy O'Meara

                      Re: 2.6, 3.0, and truly independent interpreters

                      On Nov 4, 9:38 am, sturlamolden <sturlamol...@yahoo.no> wrote:

                      > First let me say that there are several solutions to the "multicore"
                      > problem. Multiple independent interpreters embedded in a process is
                      > one possibility, but not the only one.
                      No one disagrees there. However, the motivation of this thread has
                      been to make people here consider that it's much more preferable for
                      CPython to have as few restrictions as possible with how it's used. I
                      think many people here assume that python is the showcase item in
                      industrial and commercial use, but it's generally just one of many
                      pieces of machinery that serve the app's function (so "the tail can't
                      wag the dog" when it comes to app design). Some people in this thread
                      have made comments such as "make your app run in python" or "change
                      your app requirements" but in the world of production schedules and
                      making sure payroll is met, those options just can't happen. People
                      in the scientific and academic communities have to understand that the
                      dynamics of commercial software can be *very* different, and they have
                      to show some open-mindedness there.

                      > The multiprocessing package has almost the same API as you would get
                      > from your suggestion, the only difference being that multiple
                      > processes are involved.
                      As other posts have gone into extensive detail, multiprocessing
                      unfortunately doesn't handle the massive/complex data structures
                      situation (see my posts regarding real-time video processing). I'm
                      not sure if you've followed all the discussion, but multiple processes
                      are off the table (this is discussed at length, so just flip back into
                      the thread history).


                      Andy



                      • sturlamolden

                        Re: 2.6, 3.0, and truly independent interpreters

                        On Nov 4, 4:27 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                        > People
                        > in the scientific and academic communities have to understand that the
                        > dynamics of commercial software can be *very* different, and they have
                        > to show some open-mindedness there.
                        You are aware that the BDFL's employer is a company called Google?
                        Python is not just used in academic settings.

                        Furthermore, I gave you a link to cilk++. This is a simple tool that
                        allows you to parallelize existing C or C++ software using three small
                        keywords. This is the kind of tool I believe would be useful. That is
                        not an academic judgement. It makes it easy to take existing software
                        and make it run efficiently on multicore processors.


                        > As other posts have gone into extensive detail, multiprocessing
                        > unfortunately doesn't handle the massive/complex data structures
                        > situation (see my posts regarding real-time video processing).
                        That is something I don't believe. Why can't multiprocessing handle
                        that? Is using a proxy object out of the question? Is putting the
                        complex object in shared memory out of the question? Is having
                        multiple copies of the object out of the question (did you see my kd-
                        tree example)? Using multiple independent interpreters inside a
                        process does not make this any easier. For Christ sake, researchers
                        write global climate models using MPI. And you think a toy problem
                        like 'real-time video processing' is a show stopper for using multiple
                        processes.





                        • Paul Boddie

                          Re: 2.6, 3.0, and truly independent interpreters

                          On 4 Nov, 16:00, sturlamolden <sturlamol...@yahoo.no> wrote:
                          > If you are serious about multicore programming, take a look at:
                          >
                          > Now if we could make Python do something like that, people would
                          > perhaps start to think about writing Python programs for more than one
                          > processor.
                          The language features look a lot like what others have already been
                          offering for a while: keywords for parallelised constructs (cilk_for)
                          which are employed by solutions for various languages (C# and various
                          C++ libraries spring immediately to mind); spawning and synchronisation
                          are typically supported in existing Python solutions, although
                          obviously not using language keywords. The more interesting aspects of
                          the referenced technology seem to be hyperobjects which, as far as I
                          can tell, are shared global objects, along with the way the work
                          actually gets distributed and scheduled - something which would
                          require slashing through the white paper aspects of the referenced
                          site and actually reading the academic papers associated with the
                          work.
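[Editor's note] The effect of a `cilk_for`-style construct is roughly what existing Python solutions obtain from a process pool. A sketch with the stdlib `multiprocessing.Pool`, where `apply_async`/`get` play the roles of spawn and sync (the naive `fib` is just stand-in CPU-bound work):

```python
from multiprocessing import Pool

def fib(n):
    # Deliberately naive and CPU-bound: a stand-in for a loop body
    # that a cilk_for would distribute across cores.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def parallel_map(func, items, workers=4):
    with Pool(processes=workers) as pool:
        # Spawn: fan the calls out to worker processes...
        handles = [pool.apply_async(func, (x,)) for x in items]
        # ...sync: wait for every result, in order.
        return [h.get() for h in handles]

if __name__ == "__main__":
    assert parallel_map(fib, [10, 11, 12]) == [55, 89, 144]
```

What this does not give you, and what the hyperobject discussion is about, is cheap shared mutable state across the workers.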

                          I've considered doing something like hyperobjects for a while, and
                          this does fit in somewhat with recent discussions about shared memory
                          and managing contention for that resource using the communications
                          channels found in, amongst other solutions, the pprocess module. I
                          currently have no real motivation to implement this myself, however.

                          Paul


                          • Andy O'Meara

                            Re: 2.6, 3.0, and truly independent interpreters

                            On Nov 4, 10:59 am, sturlamolden <sturlamol...@yahoo.no> wrote:
                            > On Nov 4, 4:27 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                            >> People
                            >> in the scientific and academic communities have to understand that the
                            >> dynamics of commercial software can be *very* different, and they have
                            >> to show some open-mindedness there.
                            > You are aware that the BDFL's employer is a company called Google?
                            > Python is not just used in academic settings.
                            Turns out I have heard of Google (and how about you be a little more
                            courteous). If you've read the posts in this thread, you'll note that
                            the needs outlined in this thread are quite different than the needs
                            and interests of Google. Note that my point was that python *could*
                            and *should* be used more in end-user/desktop applications, but it
                            can't "wag the dog" to use my earlier statement.
                            > Furthermore, I gave you a link to cilk++. This is a simple tool that
                            > allows you to parallelize existing C or C++ software using three small
                            > keywords.
                            Sorry if it wasn't clear, but we need the features associated with an
                            embedded interpreter. I checked out cilk++ when you linked it and
                            although it seems pretty cool, it's not a good fit for us for a number
                            of reasons. Also, we like the idea of helping support a FOSS project
                            rather than license a proprietary product (again, to be clear, using
                            cilk isn't even appropriate for our situation).

                            >> As other posts have gone into extensive detail, multiprocessing
                            >> unfortunately doesn't handle the massive/complex data structures
                            >> situation (see my posts regarding real-time video processing).
                            > That is something I don't believe. Why can't multiprocessing handle
                            > that?
                            In a few earlier posts, I went into detail about what's meant there:




                            > For Christ sake, researchers
                            > write global climate models using MPI. And you think a toy problem
                            > like 'real-time video processing' is a show stopper for using multiple
                            > processes.
                            I'm not sure why you're posting this sort of stuff when it seems like
                            you haven't checked out earlier posts in this thread. Also, you
                            do yourself and the people here a disservice in the way that you're
                            speaking to me here. You never know who you're really talking to or
                            who's reading.


                            Andy




                            • Paul Boddie

                              Re: 2.6, 3.0, and truly independent interpreters

                              On 5 Nov, 20:44, "Andy O'Meara" <and...@gmail.com> wrote:
                              > On Nov 4, 10:59 am, sturlamolden <sturlamol...@yahoo.no> wrote:
                              >> For Christ sake, researchers
                              >> write global climate models using MPI. And you think a toy problem
                              >> like 'real-time video processing' is a show stopper for using multiple
                              >> processes.
                              > I'm not sure why you're posting this sort of stuff when it seems like
                              > you haven't checked out earlier posts in this thread. Also, you
                              > do yourself and the people here a disservice in the way that you're
                              > speaking to me here. You never know who you're really talking to or
                              > who's reading.
                              I think your remarks about "people in the scientific and academic
                              communities" went down the wrong way, giving (or perhaps reinforcing)
                              the impression that such people live carefree lives and write software
                              unconstrained by external factors.

                              Anyway, to keep things constructive, I should ask (again) whether you
                              looked at tinypy [1] and whether that might possibly satisfy your
                              embedded requirements. As I noted before, the developers might share
                              your outlook on a number of matters. Otherwise, you might peruse the
                              list of Python implementations:



                              Paul

                              [1] http://www.tinypy.org/


                              • sturlamolden

                                Re: 2.6, 3.0, and truly independent interpreters

                                On Nov 5, 8:44 pm, "Andy O'Meara" <and...@gmail.com> wrote:
                                All this says is:

                                1. The cost of serialization and deserialization is too large.
                                2. Complex data structures cannot be placed in shared memory.

                                The first claim is unsubstantiated. It depends on how much and what
                                you serialize. If you use something like NumPy arrays, the cost of
                                pickling is tiny. Erlang is a language specifically designed for
                                concurrent programming, yet it does not allow anything to be shared.

                                The second claim is plain wrong. You can put anything you want in
                                shared memory. The mapping address of the shared memory segment may
                                vary, but it can be dealt with (basically use integers instead of
                                pointers, and use the base address as offset.) Pyro is a Python
                                project that has investigated this. With Pyro you can put any Python
                                object in a shared memory region. You can also use NumPy record arrays
                                to put very complex data structures in shared memory.
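[Editor's note] The integers-instead-of-pointers point can also be shown without NumPy or Pyro, using the stdlib's `multiprocessing.sharedctypes` with a nested, pointer-free ctypes layout. A sketch; the `Frame`/`Point` names are invented for illustration:

```python
import ctypes
from multiprocessing import Process
from multiprocessing import sharedctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]

class Frame(ctypes.Structure):
    # Nested but pointer-free: every field lives inline at a fixed
    # offset, so the mapping address of the segment doesn't matter.
    _fields_ = [("frame_no", ctypes.c_int), ("corner", Point)]

def worker(frame):
    # Mutations made here are visible to the parent, because the
    # structure lives in shared memory, not the child's private heap.
    frame.frame_no += 1
    frame.corner.x = 3.5

def bump_in_child(frame):
    p = Process(target=worker, args=(frame,))
    p.start()
    p.join()

if __name__ == "__main__":
    shared = sharedctypes.Value(Frame, lock=False)
    shared.frame_no = 41
    bump_in_child(shared)
    assert shared.frame_no == 42 and shared.corner.x == 3.5
```

The caveat is the same one sturlamolden states: this works because nothing in the layout is a dynamically allocated pointer.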

                                What do you gain by placing multiple interpreters in the same process?
                                You will avoid the complication that the mapping address of the shared
                                memory region may be different. But this is a problem that has been
                                worked out and solved. Instead you get a lot of issues dealing with
                                DLL loading and unloading (Python extension objects).

                                The multiprocessing module has something called proxy objects, which
                                also deals with this issue. An object is hosted in a server process,
                                and client processes may access it through synchronized IPC calls.
                                Inside the client process the remote object looks like any other
                                Python object. The synchronized IPC is hidden away in an abstraction
                                layer. In Windows, you can also construct outproc ActiveX objects,
                                which are not that different from multiprocessing's proxy objects.
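[Editor's note] The proxy-object pattern in a few lines, with the stdlib `multiprocessing.Manager`: the list lives in the manager's server process, and each worker reaches it through a proxy whose method calls are synchronized IPC underneath. A sketch, not production code:

```python
from multiprocessing import Manager, Process

def worker(shared_list, value):
    # Looks like an ordinary list, but each call is an IPC round-trip
    # to the manager process that actually holds the data.
    shared_list.append(value * value)

def collect_squares(values):
    with Manager() as manager:
        shared = manager.list()
        procs = [Process(target=worker, args=(shared, v)) for v in values]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return sorted(shared)

if __name__ == "__main__":
    assert collect_squares([1, 2, 3]) == [1, 4, 9]
```

The convenience costs a round-trip per method call, which is why it suits coarse-grained access better than per-pixel work.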

                                If you need to place a complex object in shared memory:

                                1. Check if a NumPy record array may suffice (dtypes may be nested).
                                It will if you don't have dynamically allocated pointers inside the
                                data structure.

                                2. Consider using multiprocessing's proxy objects or outproc ActiveX
                                objects.

                                3. Go to http://pyro.sourceforge.net, download the code and read the
                                documentation.

                                Saying that "it can't be done" is silly before you have tried.
                                Programmers are not that good at guessing where the bottlenecks
                                reside, even if we think we do.





