2.6, 3.0, and truly independent interpreters

  • Jesse Noller

    #31
    Re: 2.6, 3.0, and truly independent interpreters

    On Fri, Oct 24, 2008 at 3:17 PM, Andy O'Meara <andy55@gmail.com> wrote:
    > I'm a lousy writer sometimes, but I feel bad if you took the time to
    > describe threads vs processes. The only reason I raised IPC with my
    > "messaging isn't very attractive" comment was to respond to Glenn
    > Linderman's points regarding tradeoffs of shared memory vs no.

    I actually took the time to bring anyone listening in up to speed, and
    to clarify so I could better understand your use case. Don't feel bad,
    things in the thread are moving fast and I just wanted to clear it up.

    Ideally, we all want to improve the language and the interpreter.
    However, trying to push it towards a particular use case is dangerous,
    given the idea of "general use".

    -jesse


    • Rhamphoryncus

      #32
      Re: 2.6, 3.0, and truly independent interpreters

      On Oct 24, 1:02 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
      > On approximately 10/24/2008 8:42 AM, came the following characters from
      > the keyboard of Andy O'Meara:
      >> Glenn, great post and points!
      >
      > Thanks. I need to admit here that while I've got a fair bit of
      > professional programming experience, I'm quite new to Python -- I've not
      > learned its internals, nor even the full extent of its rich library. So
      > I have some questions that are partly about the goals of the
      > applications being discussed, partly about how Python is constructed,
      > and partly about how the library is constructed. I'm hoping to get a
      > better understanding of all of these; perhaps once a better
      > understanding is achieved, limitations will be understood, and maybe
      > solutions will be achievable.
      >
      > Let me define some speculative Python interpreters; I think the first is
      > today's Python:
      >
      > PyA: Has a GIL. PyA threads can run within a process, but are
      > effectively serialized to the places where the GIL is obtained/released.
      > Needs the GIL because that solves lots of problems with non-reentrant
      > code (an example of non-reentrant code is code that uses global (C
      > global, or C static) variables -- note that I'm not talking about Python
      > vars declared global... they are only module global). In this model,
      > non-reentrant code could include pieces of the interpreter, and/or
      > extension modules.
      >
      > PyB: No GIL. PyB threads acquire/release a lock around each reference to
      > a global variable (like the "with" feature). Requires massive recoding of
      > all code that contains global variables. Reduces performance
      > significantly through the increased cost of obtaining and releasing locks.
      >
      > PyC: No locks. Instead, recoding is done to eliminate global variables
      > (the interpreter requires a state structure to be passed in). Extension
      > modules that use globals are prohibited... this eliminates large
      > portions of the library, or requires massive recoding. PyC threads do
      > not share data between threads except by explicit interfaces.
      >
      > PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate
      > global variables, and each interpreter instance is provided a state
      > structure. There is still a GIL, however, because globals are
      > potentially still used by some modules. Code is added to detect use of
      > global variables by a module, or some contract is written whereby a
      > module can be declared to be reentrant and global-free. PyA threads will
      > obtain the GIL as they would today. PyC threads would be available to be
      > created. PyC instances refuse to call non-reentrant modules, but also
      > need not obtain the GIL... PyC threads would have limited module support
      > initially, but over time most modules can be migrated to be reentrant
      > and global-free, so they can be used by PyC instances. Most 3rd-party
      > libraries today are starting to care about reentrancy anyway, because of
      > the popularity of threads.

      PyE: objects are reclassified as shareable or non-shareable; many
      types are now only allowed to be shareable. A module and its classes
      become shareable with the use of a __future__ import, and their
      shareddict uses a read-write lock for scalability. Most other
      shareable objects are immutable. Each thread is run in its own
      private monitor, and thus protected from the normal threading memory
      model nasties. Alas, this gives you all the semantics, but you still
      need scalable garbage collection... and CPython's refcounting needs the
      GIL.

      >> Our software runs in real time (so performance is paramount),
      >> interacts with other static libraries, depends on worker threads to
      >> perform real-time image manipulation, and leverages Windows and Mac OS
      >> API concepts and features.  Python's performance hits have generally
      >> been a huge challenge with our animators because they often have to go
      >> back and massage their python code to improve execution performance.
      >> So, in short, there are many reasons why we use python as a part
      >> rather than a whole.
      >> [...]
      >> As a python language fan and enthusiast, don't let lua win!  (I say
      >> this endearingly of course--I have the utmost respect for both
      >> communities and I only want to see CPython be an attractive pick when
      >> a company is looking to embed a language that won't intrude upon their
      >> app's design).

      I agree with the problem, and desire to make python fill all niches,
      but let's just say I'm more ambitious with my solution. ;)


      • Andy O'Meara

        #33
        Re: 2.6, 3.0, and truly independent interpreters


        Another great post, Glenn!! Very well laid-out and posed!! Thanks for
        taking the time to lay all that out.

        > Questions for Andy: is the type of work you want to do in independent
        > threads mostly pure Python? Or with libraries that you can control to
        > some extent? Are those libraries reentrant? Could they be made
        > reentrant? How much of the Python standard library would need to be
        > available in reentrant mode to provide useful functionality for those
        > threads? I think you want PyC

        I think you've defined everything perfectly, and you're of course
        correct about my love for the PyC model. :^)

        Like any software that's meant to be used without restrictions, our
        code and frameworks always use a context object pattern, so that
        there's never any non-const global/shared data. I would go as far as to
        say that this is the case with more performance-oriented software than
        you may think, since it's usually a given for us to have to be parallel
        friendly in as many ways as possible. Perhaps Patrick can back me up
        there.
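
        (To make that concrete: a minimal sketch of the context-object
        pattern in Python -- the class and function names here are
        hypothetical, purely for illustration. All mutable state lives in
        an explicitly passed context rather than in module-level globals,
        so any number of instances can run side by side.)

        class RenderContext:
            """Owns all state for one independent rendering session."""
            def __init__(self, width, height):
                self.width = width
                self.height = height
                self.frame_count = 0      # per-context, never shared

        def render_frame(ctx, job):
            # Touches only the context passed in -- reentrant by construction.
            ctx.frame_count += 1
            return "frame %d of %s at %dx%d" % (
                ctx.frame_count, job, ctx.width, ctx.height)

        # Two contexts coexist with no shared mutable data between them:
        a = RenderContext(640, 480)
        b = RenderContext(1920, 1080)
        print(render_frame(a, "intro"))
        print(render_frame(b, "intro"))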

        As to what modules are "essential"... As you point out, once
        reentrant module implementations caught on in a PyC or hybrid world, I
        think we'd start to see real effort to whip them into compliance--
        there's just so much to be gained imho. But to answer the question,
        there's the obvious ones (operator, math, etc), string/buffer
        processing (string, re), C bridge stuff (struct, array), and OS basics
        (time, file system, etc). Nice-to-haves would be buffer and image
        decompression (zlib, libpng, etc), crypto modules, and xml. As far as
        I can imagine, I have to believe all of these modules already contain
        little, if any, global data, so I have to believe they'd be super easy
        to make "PyC happy". Patrick, what would you see you guys using?

        >> That's the rub...  In our case, we're doing image and video
        >> manipulation--stuff not good to be messaging from address space to
        >> address space.  The same argument holds for numerical processing with
        >> large data sets.  The workers handing back huge data sets via
        >> messaging isn't very attractive.
        >
        > In the module multiprocessing environment could you not use shared
        > memory, then, for the large shared data items?

        As I understand things, multiprocessing puts stuff in a child
        process (i.e. a separate address space), so the only way to get stuff
        to/from it is via IPC, which can include a shared/mapped memory region.
        Unfortunately, a shared address region doesn't work when you have
        large and opaque objects (e.g. a rendered CoreVideo movie in the
        QuickTime API or 300 megs of audio data that just went through a
        DSP). Then you've got the hit of serialization if you've got
        intricate data structures (that would normally need to be
        serialized, such as a hashtable or something). Also, if I may speak
        for commercial developers out there who are just looking to get the
        job done without new code, it's almost always preferable to use just a
        single high-level sync object (for when the job is complete) than to
        start a child process and use IPC. The former is just WAY less
        code, plain and simple.


        Andy



        • Jesse Noller

          #34
          Re: 2.6, 3.0, and truly independent interpreters

          On Fri, Oct 24, 2008 at 4:51 PM, Andy O'Meara <andy55@gmail.com> wrote:
          >> In the module multiprocessing environment could you not use shared
          >> memory, then, for the large shared data items?
          >
          > As I understand things, multiprocessing puts stuff in a child
          > process (i.e. a separate address space), so the only way to get stuff
          > to/from it is via IPC, which can include a shared/mapped memory region.
          > Unfortunately, a shared address region doesn't work when you have
          > large and opaque objects (e.g. a rendered CoreVideo movie in the
          > QuickTime API or 300 megs of audio data that just went through a
          > DSP). Then you've got the hit of serialization if you've got
          > intricate data structures (that would normally need to be
          > serialized, such as a hashtable or something). Also, if I may speak
          > for commercial developers out there who are just looking to get the
          > job done without new code, it's almost always preferable to use just a
          > single high-level sync object (for when the job is complete) than to
          > start a child process and use IPC. The former is just WAY less
          > code, plain and simple.

          Are you familiar with the API at all? Multiprocessing was designed to
          mimic threading in about every way possible; the only restriction on
          shared data is that it must be serializable, but even then you can
          override or customize the behavior.

          Also, inter-process communication is done via pipes. It can also be
          done with messages if you want to tweak the manager(s).

          -jesse
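
          (For readers who haven't seen the API Jesse is referring to, here
          is a minimal sketch of the shared-memory side of multiprocessing.
          The worker function and sizes are made up for illustration; the
          point is that the big array never crosses the pipe -- only the
          join does the synchronization.)

          import multiprocessing

          def worker(shared, start, stop):
              # Writes land directly in the shared block -- no serialization
              # of the bulk data.
              for i in range(start, stop):
                  shared[i] = i * 2.0

          if __name__ == '__main__':
              # One contiguous block of C doubles in shared memory.
              data = multiprocessing.Array('d', 1000, lock=False)
              p = multiprocessing.Process(target=worker, args=(data, 0, 1000))
              p.start()
              p.join()                   # the single high-level sync object
              print(data[0], data[999])  # 0.0 1998.0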


          • Rhamphoryncus

            #35
            Re: 2.6, 3.0, and truly independent interpreters

            On Oct 24, 2:59 pm, Glenn Linderman <gl...@nevcal.com> wrote:
            > On approximately 10/24/2008 1:09 PM, came the following characters from
            > the keyboard of Rhamphoryncus:
            >> PyE: objects are reclassified as shareable or non-shareable; many
            >> types are now only allowed to be shareable.  A module and its classes
            >> become shareable with the use of a __future__ import, and their
            >> shareddict uses a read-write lock for scalability.  Most other
            >> shareable objects are immutable.  Each thread is run in its own
            >> private monitor, and thus protected from the normal threading memory
            >> model nasties.  Alas, this gives you all the semantics, but you still
            >> need scalable garbage collection... and CPython's refcounting needs the
            >> GIL.
            >
            > Hmm.  So I think your PyE is an attempt to be more
            > explicit about what I said above in PyC: PyC threads do not share data
            > between threads except by explicit interfaces.  I consider your
            > definitions of shared data types somewhat orthogonal to the types of
            > threads, in that both PyA and PyC threads could use these new shared
            > data items.

            Unlike PyC, there's a *lot* shared by default (classes, modules,
            functions), but it requires only minimal recoding. It's as close to
            "have your cake and eat it too" as you're gonna get.

            > I think/hope that you meant that "many types are now only allowed to be
            > non-shareable"?  At least, I think that should be the default; they
            > should be within the context of a single, independent interpreter
            > instance, so other interpreters don't even know they exist, much less
            > how to share them.  If so, then I understand most of the rest of your
            > paragraph, and it could be a way of providing shared objects, perhaps.

            There aren't multiple interpreters under my model. You only need
            one. Instead, you create a monitor, and run a thread on it. A list
            is not shareable, so it can only be used within the monitor it's
            created within, but the list type object is shareable.

            I've no interest in *requiring* a C/C++ extension to communicate
            between isolated interpreters. Without that they're really no better
            than processes.
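
            (The monitor confinement idea might be easier to picture with a
            sketch. This is *not* safethread's actual API -- just a
            hypothetical Python-level analogue for illustration: an object
            created inside a monitor is only ever touched while that
            monitor's lock is held.)

            import threading

            class Monitor:
                # Hypothetical illustration, not safethread's real interface.
                def __init__(self):
                    self._lock = threading.Lock()

                def run(self, func, *args):
                    # All access to monitor-confined objects happens under
                    # the lock, so two threads never mutate the list at once.
                    with self._lock:
                        return func(*args)

            mon = Monitor()
            items = mon.run(list)    # the list is "born" inside the monitor

            threads = [threading.Thread(target=mon.run, args=(items.append, i))
                       for i in range(4)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            print(sorted(items))     # [0, 1, 2, 3]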


            • Rhamphoryncus

              #36
              Re: 2.6, 3.0, and truly independent interpreters

              On Oct 24, 3:02 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
              > On approximately 10/23/2008 2:24 PM, came the following characters from the
              > keyboard of Rhamphoryncus:
              >> On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
              >>> On approximately 10/23/2008 12:24 AM, came the following characters from
              >>> the keyboard of Christian Heimes:
              >>>> Andy wrote:
              >>>> I'm very - not absolute, but very - sure that Guido and the initial
              >>>> designers of Python would have added the GIL anyway. The GIL makes
              >>>> Python faster on single core machines and more stable on multi core
              >>>> machines.
              >
              > Actually, the GIL doesn't make Python faster; it is a design decision that
              > reduces the overhead of lock acquisition, while still allowing use of global
              > variables.
              >
              > Using finer-grained locks has higher run-time cost; eliminating the use of
              > global variables has a higher programmer-time cost, but would actually run
              > faster and more concurrently than using a GIL. Especially on a
              > multi-core/multi-CPU machine.

              Those "globals" include classes, modules, and functions. You can't
              have *any* objects shared. Your interpreters are entirely isolated,
              much like processes (and we all start wondering why you don't use
              processes in the first place.)

              Or use safethread. It imposes safe semantics on shared objects, so
              you can keep your global classes, modules, and functions. Still need
              garbage collection though, and on CPython that means refcounting and
              the GIL.

              >> Another peeve I have is his characterization of the observer pattern.
              >> The generalized form of the problem exists in both single-threaded
              >> sequential programs, in the form of unexpected reentrancy, and message
              >> passing, with infinite CPU usage or infinite number of pending
              >> messages.
              >
              > So how do you get reentrancy in a single-threaded sequential program? I
              > think only via recursion? Which isn't a serious issue for the observer
              > pattern. If you add interrupts, then your program is no longer sequential.

              Sorry, I meant recursion. Why isn't it a serious issue for
              single-threaded programs? Just the fact that it's much easier to
              handle when it does happen?

              >> Try looking at it on another level: when your CPU wants to read from a
              >> bit of memory controlled by another CPU it sends them a message
              >> requesting they get it for us. They send back a message containing
              >> that memory. They also note we have it, in case they want to modify
              >> it later. We also note where we got it, in case we want to modify it
              >> (and not wait for them to do modifications for us).
              >
              > I understand that level... one of my degrees is in EE, and I started college
              > wanting to design computers (at about the time the first microprocessor chip
              > came along, and they, of course, have now taken over). But I was side-lined
              > by the malleability of software, and have mostly practiced software during
              > my career.
              >
              > Anyway, that is the level that Herb Sutter was describing in the Dr Dobbs
              > articles I mentioned. And the overhead of doing that at the level of a cache
              > line is high, if there is lots of contention for particular memory locations
              > between threads running on different cores/CPUs. So to achieve concurrency,
              > you must not only limit explicit software locks, but must also avoid memory
              > layouts where data needed by different cores/CPUs are in the same cache
              > line.

              I suspect they'll end up redesigning the caching to use a size and
              alignment of 64 bits (or smaller). Same cache line size, but with
              masking.

              You still need to minimize contention of course, but that should at
              least be more predictable. Having two unrelated mallocs contend could
              suck.

              >> Message passing vs shared memory isn't really a yes/no question. It's
              >> about ratios, usage patterns, and tradeoffs. *All* programs will
              >> share data, but in what way? If it's just the code itself you can
              >> move the cache validation into software and simplify the CPU, making
              >> it faster. If the shared data is a lot more than that, and you use it
              >> to coordinate accesses, then it'll be faster to have it in hardware.
              >
              > I agree there are tradeoffs... unfortunately, the hardware architectures
              > vary, and the languages don't generally understand the hardware. So then it
              > becomes an OS API, which adds the overhead of an OS API call to the cost of
              > the synchronization... It could instead be (and in clever applications is) a
              > non-portable assembly-level function that wraps an OS locking or waiting
              > API.

              In practice I highly doubt we'll see anything that doesn't extend
              traditional threading (posix threads, whatever MS has, etc).

              > Nonetheless, while putting the shared data accesses in hardware might be
              > more efficient per unit operation, there are still tradeoffs: A software
              > solution can group multiple accesses under a single lock acquisition; the
              > hardware probably doesn't have enough smarts to do that. So it may well
              > require many more hardware unit operations for the same overall concurrently
              > executed function, and the resulting performance may not be any better.

              Speculative ll/sc? ;)

              > Sidestepping the whole issue, by minimizing shared data in the application
              > design, avoiding not only software lock calls but also hardware cache
              > contention, is going to provide the best performance... it isn't the things
              > you do efficiently that make software fast -- it is the things you don't do
              > at all.

              Minimizing contention, certainly. Minimizing the shared data itself
              is iffier though.


              • Adam Olsen

                #37
                Re: 2.6, 3.0, and truly independent interpreters

                On Fri, Oct 24, 2008 at 4:48 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
                > On approximately 10/24/2008 2:15 PM, came the following characters from the
                > keyboard of Rhamphoryncus:
                >> On Oct 24, 2:59 pm, Glenn Linderman <gl...@nevcal.com> wrote:
                >>> On approximately 10/24/2008 1:09 PM, came the following characters from
                >>> the keyboard of Rhamphoryncus:
                >>>> PyE: objects are reclassified as shareable or non-shareable; many
                >>>> types are now only allowed to be shareable. A module and its classes
                >>>> become shareable with the use of a __future__ import, and their
                >>>> shareddict uses a read-write lock for scalability. Most other
                >>>> shareable objects are immutable. Each thread is run in its own
                >>>> private monitor, and thus protected from the normal threading memory
                >>>> model nasties. Alas, this gives you all the semantics, but you still
                >>>> need scalable garbage collection... and CPython's refcounting needs the
                >>>> GIL.
                >>>
                >>> Hmm. So I think your PyE is an attempt to be more
                >>> explicit about what I said above in PyC: PyC threads do not share data
                >>> between threads except by explicit interfaces. I consider your
                >>> definitions of shared data types somewhat orthogonal to the types of
                >>> threads, in that both PyA and PyC threads could use these new shared
                >>> data items.
                >>
                >> Unlike PyC, there's a *lot* shared by default (classes, modules,
                >> functions), but it requires only minimal recoding. It's as close to
                >> "have your cake and eat it too" as you're gonna get.
                >
                > Yes, but I like my cake frosted with performance; Guido's non-acceptance of
                > granular locks in the blog entry someone referenced was due to the slowdown
                > acquired with granular locking and shared objects. Your PyE model, with
                > highly granular sharing, will likely suffer the same fate.

                No, my approach includes scalable performance. Typical paths will
                involve *no* contention (i.e. no locking). Classes and modules use
                shareddict, which is based on a read-write lock built into the
                interpreter, so it's uncontended for read-only usage patterns. Pretty
                much everything else is immutable.

                Of course that doesn't include the cost of garbage collection.
                CPython's refcounting can't scale.

                > The independent threads model, with only slight locking for a few explicitly
                > shared objects, has a much better chance of getting better performance
                > overall. With one thread running, it would be the same as today; with
                > multiple threads, it should scale at the same rate as the system... minus
                > any locking done at the higher level.

                So use processes with a little IPC for these expensive-yet-"shared"
                objects. multiprocessing does it already.

                >>> I think/hope that you meant that "many types are now only allowed to be
                >>> non-shareable"? At least, I think that should be the default; they
                >>> should be within the context of a single, independent interpreter
                >>> instance, so other interpreters don't even know they exist, much less
                >>> how to share them. If so, then I understand most of the rest of your
                >>> paragraph, and it could be a way of providing shared objects, perhaps.
                >>
                >> There aren't multiple interpreters under my model. You only need
                >> one. Instead, you create a monitor, and run a thread on it. A list
                >> is not shareable, so it can only be used within the monitor it's
                >> created within, but the list type object is shareable.
                >
                > The python interpreter code should be sharable, having been written in C,
                > and being/becoming reentrant. So in that sense, there is only one
                > interpreter. Similarly, any other reentrant C extensions would be that way.
                > On the other hand, each thread of execution requires its own interpreter
                > context, so that would have to be independent for the threads to be
                > independent. It is the combination of code+context that I call an
                > interpreter, and there would be one per thread for PyC threads. Bytecode
                > for loaded modules could potentially be shared, if it is also immutable.
                > However, that could be in my mental "phase 2", as it would require an extra
                > level of complexity in the interpreter as it creates shared bytecode...
                > there would be a memory savings from avoiding multiple copies of shared
                > bytecode, likely, and maybe also a compilation performance savings. So it
                > sounds like a win, but it is a win that can be deferred for initial
                > simplicity, to prove the concept is or is not workable.
                >
                > A monitor allows a single thread to run at a time; that is the same
                > situation as the present GIL. I guess I don't fully understand your model.

                To use your terminology, each monitor is a context. Each thread
                operates in a different monitor. As you say, most C functions are
                already thread-safe (reentrant). All I need to do is avoid letting
                multiple threads modify a single mutable object (such as a list) at a
                time, which I do by containing it within a single monitor (context).


                --
                Adam Olsen, aka Rhamphoryncus
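
                (Since the shareddict read-write lock keeps coming up: the
                real one lives in C inside the modified interpreter, but the
                idea is generic. Below is a minimal read-write lock sketch in
                Python, purely to illustrate why read-mostly access scales --
                readers exclude only writers, never each other. This simple
                version can starve writers; a production one would not.)

                import threading

                class RWLock:
                    def __init__(self):
                        self._cond = threading.Condition()
                        self._readers = 0

                    def acquire_read(self):
                        with self._cond:
                            self._readers += 1   # readers never block readers

                    def release_read(self):
                        with self._cond:
                            self._readers -= 1
                            if self._readers == 0:
                                self._cond.notify_all()

                    def acquire_write(self):
                        self._cond.acquire()     # excludes readers and writers
                        while self._readers:
                            self._cond.wait()    # wait for readers to drain

                    def release_write(self):
                        self._cond.release()

                lock = RWLock()
                lock.acquire_read()   # any number of these run concurrently
                lock.release_read()
                lock.acquire_write()  # exclusive
                lock.release_write()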


                • Adam Olsen

                  #38
                  Re: 2.6, 3.0, and truly independent interpreters

                  On Fri, Oct 24, 2008 at 5:38 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
                  > On approximately 10/24/2008 2:16 PM, came the following characters from the
                  > keyboard of Rhamphoryncus:
                  >> On Oct 24, 3:02 pm, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
                  >>> On approximately 10/23/2008 2:24 PM, came the following characters from
                  >>> the keyboard of Rhamphoryncus:
                  >>>> On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:
                  >>>>> On approximately 10/23/2008 12:24 AM, came the following characters
                  >>>>> from the keyboard of Christian Heimes:
                  >>>>>> Andy wrote:
                  >>>>>> I'm very - not absolute, but very - sure that Guido and the initial
                  >>>>>> designers of Python would have added the GIL anyway. The GIL makes
                  >>>>>> Python faster on single core machines and more stable on multi core
                  >>>>>> machines.
                  >>>
                  >>> Actually, the GIL doesn't make Python faster; it is a design decision
                  >>> that reduces the overhead of lock acquisition, while still allowing use
                  >>> of global variables.
                  >>>
                  >>> Using finer-grained locks has higher run-time cost; eliminating the use
                  >>> of global variables has a higher programmer-time cost, but would actually
                  >>> run faster and more concurrently than using a GIL. Especially on a
                  >>> multi-core/multi-CPU machine.
                  >>
                  >> Those "globals" include classes, modules, and functions. You can't
                  >> have *any* objects shared. Your interpreters are entirely isolated,
                  >> much like processes (and we all start wondering why you don't use
                  >> processes in the first place.)
                  >
                  > Indeed; isolated, independent interpreters are one of the goals. It is,
                  > indeed, much like processes, but in a single address space. It allows the
                  > master process (Python or C for the embedded case) to be coded using memory
                  > references and copies and pointer swaps instead of using semaphores, and
                  > potentially multi-megabyte message transfers.
                  >
                  > It is not clear to me that with the use of shared memory between processes,
                  > the application couldn't use processes, and achieve many of the same
                  > goals. On the other hand, the code to create and manipulate processes and
                  > shared memory blocks is harder to write and has more overhead than the code
                  > to create and manipulate threads, which can, when told, access any memory
                  > block in the process. This allows the shared memory to be resized more
                  > easily, or more blocks of shared memory created more easily. On the other
                  > hand, the creation of shared memory blocks shouldn't be a high-use operation
                  > in a program that has sufficient number crunching to do to be able to
                  > consume multiple cores/CPUs.
                  >
                  >> Or use safethread. It imposes safe semantics on shared objects, so
                  >> you can keep your global classes, modules, and functions. Still need
                  >> garbage collection though, and on CPython that means refcounting and
                  >> the GIL.
                  >
                  > Sounds like safethread has 35-40% overhead. Sounds like too much, to me.

                  The specific implementation of safethread, which attempts to remove
                  the GIL from CPython, has significant overhead and had very limited
                  success at being scalable.

                  The monitor design proposed by safethread has no inherent overhead and
                  is completely scalable.


                  --
                  Adam Olsen, aka Rhamphoryncus


                  • "Martin v. Löwis"

                    #39
                    Re: 2.6, 3.0, and truly independent interpreters

                    >> A c-level module, on the other hand, can sidestep/release
                    >> the GIL at will, and go on its merry way and process away.
                    >
                    > ...Unless part of the C module execution involves the need to do CPU-
                    > bound work on another thread through a different python interpreter,
                    > right?

                    Wrong.

                    > (even if the interpreter is 100% independent, yikes).

                    Again, wrong.

                    > For example, have a python C module designed to programmatically generate
                    > images (and video frames) in RAM for immediate and subsequent use in
                    > animation. Meanwhile, we'd like to have a pthread with its own
                    > interpreter with an instance of this module and have it dequeue jobs
                    > as they come in (in fact, there'd be one of these threads for each
                    > excess core present on the machine).

                    I don't understand how this example involves multiple threads. You
                    mention a single thread (running the module), and you mention designing
                    a module. Where is the second thread?

                    Let's assume there is another thread producing jobs, and then
                    a thread that generates the images. The structure would be this:

                    while 1:
                        job = queue.get()
                        processing_module.process(job)

                    and in process:

                    PyArg_ParseTuple(args, "s", &job_data);
                    result = PyString_New(bufsize);
                    buf = PyString_AsString(result);
                    Py_BEGIN_ALLOW_THREADS
                    compute_frame(job_data, buf);
                    Py_END_ALLOW_THREADS
                    return PyString_FromString(buf);

                    All these compute_frames could happily run in parallel.

                    > As far as I can tell, it seems
                    > CPython's current state can't do CPU-bound parallelization in the same
                    > address space.

                    That's not true.

                    Regards,
                    Martin


                    • "Martin v. Löwis"

                      #40
                      Re: 2.6, 3.0, and truly independent interpreters

                      > It seems to me that the very simplest move would be to remove global
                      > static data so the app could provide all thread-related data, which
                      > Andy suggests through references to the QuickTime API. This would
                      > suggest compiling python without thread support so as to leave it up
                      > to the application.

                      I'm not sure whether you realize that this is not simple at all.
                      Consider this fragment:

                      if (string == Py_None || index >= state->lastmark ||
                              !state->mark[index] || !state->mark[index+1]) {
                          if (empty)
                              /* want empty string */
                              i = j = 0;
                          else {
                              Py_INCREF(Py_None);
                              return Py_None;
                          }
                      }

                      Py_None here is a global variable. How would you replace it?
                      It's used in thousands of places.

                      For another example, consider

                      PyErr_SetString(PyExc_ValueError,
                                      "Empty module name");

                      or

                      dp = PyObject_New(dbmobject, &Dbmtype);

                      There are tons of different variables denoting exceptions and
                      other types which all somehow need to be rewritten (likely with
                      undesirable effects on readability).

                      So I don't think that this is a simple solution. It's the right
                      one, but it will take five or ten years to implement.

                      Regards,
                      Martin


                      • Terry Reedy

                        #41
                        Re: 2.6, 3.0, and truly independent interpreters

                        Glenn Linderman wrote:
                        > For example, Python presently has a rather stupid algorithm for string
                        > concatenation.

                        Python the language has syntax and semantics. Python implementations
                        have algorithms that fulfill the defined semantics.

                        > It allocates only the exactly necessary space for the
                        > concatenated string. This is a brilliant move, when you realize that
                        > strings are immutable, and once allocated can never change, but the
                        > operation
                        >
                        > for line in mylistofstrings:
                        >     string = string + line
                        >
                        > is basically O(N-squared) as a result. The better algorithm would
                        > double the size of memory allocated for string each time there is not
                        > enough room to add the next line, and that reduces the cost of the
                        > algorithm to O(N).

                        If there is more than one reference to a guaranteed immutable object,
                        such as a string, the 'stupid' algorithm seems necessary to me. In-place
                        modification of a shared immutable would violate semantics.

                        However, if you do

                        string = ''
                        for line in strings:
                            string += line

                        so that there is only one reference and you tell the interpreter that
                        you don't mind the old value being updated, then I believe in 2.6, if
                        not before, CPython does overallocation and in-place extension. (I am
                        not sure about s = s + l.) But this is just ref-counted CPython.

                        Terry Jan Reedy
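
                        (As an aside for anyone hitting this in practice: the
                        idiomatic way to sidestep the quadratic behaviour on
                        any implementation is str.join, which sizes the result
                        once. A quick sketch, with a made-up list of lines:)

                        lines = ['spam\n'] * 1000

                        # Quadratic in the worst case: each + may copy
                        # everything accumulated so far (unless the CPython
                        # in-place optimization Terry describes applies).
                        s = ''
                        for line in lines:
                            s = s + line

                        # Linear and implementation-independent: one allocation.
                        s = ''.join(lines)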


                        • greg

                          #42
                          Re: 2.6, 3.0, and truly independent interpreters

                          Andy O'Meara wrote:
                          > I would definitely agree if there was a context (i.e. environment)
                          > object passed around then perhaps we'd have the best of all worlds.
                          Moreover, I think this is probably the *only* way that
                          totally independent interpreters could be realized.

                          Converting the whole C API to use this strategy would be
                          a very big project. Also, on the face of it, it seems like
                          it would render all existing C extension code obsolete,
                          although it might be possible to do something clever with
                          macros to create a compatibility layer.

                          Another thing to consider is that passing all these extra
                          pointers around everywhere is bound to have some effect
                          on performance. The idea mightn't go down too well if it
                          slows things significantly in the case where you're only
                          using one interpreter.

                          --
                          Greg


                          • greg

                            #43
                            Re: 2.6, 3.0, and truly independent interpreters

                            Andy O'Meara wrote:
                            > - each worker thread makes its own interpreter, pops scripts off a
                            > work queue, and manages exporting (and then importing) result data to
                            > other parts of the app.
                            I hope you realize that starting up one of these interpreters
                            is going to be fairly expensive. It will have to create its
                            own versions of all the builtin constants and type objects,
                            and import its own copy of all the modules it uses.

                            One wonders if it wouldn't be cheaper just to fork the
                            process. Shared memory can be used to transfer large lumps
                            of data if needed.

                            --
                            Greg


                            • greg

                              #44
                              Re: 2.6, 3.0, and truly independent interpreters

                              Glenn Linderman wrote:
                              > If Py_None corresponds to None in Python syntax ... then
                              > it is a fixed constant and could be left global, probably.
                              No, it couldn't, because it's a reference-counted object
                              like any other Python object, and therefore needs to be
                              protected against simultaneous refcount manipulation by
                              different threads. So each interpreter would need its own
                              instance of Py_None.

                              The same goes for all the other built-in constants and
                              type objects -- there are dozens of these.
                              > The cost is one more push on every function call,
                              Which sounds like it could be a rather high cost! If
                              (just a wild guess) each function has an average of 2
                              parameters, then this is increasing the amount of
                              argument pushing going on by 50%...
                              > On many platforms, there is the concept of TLS, or thread-local storage.
                              That's another possibility, although doing it that
                              way would require you to have a separate thread for
                              each interpreter, which you mightn't always want.

                              --
                              Greg


                              • greg

                                #45
                                Re: 2.6, 3.0, and truly independent interpreters

                                Andy O'Meara wrote:
                                > In our case, we're doing image and video
                                > manipulation--stuff not good to be messaging from address space to
                                > address space.
                                Have you considered using shared memory?

                                Using mmap or equivalent, you can arrange for a block of
                                memory to be shared between processes. Then you can dump
                                the big lump of data to be transferred in there, and send
                                a short message through a pipe to the other process to
                                let it know it's there.

                                --
                                Greg
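
                                (A minimal sketch of what greg describes, for
                                POSIX systems where multiprocessing forks, so
                                the child inherits the anonymous mapping. The
                                size and message are made up for illustration.)

                                import mmap
                                import multiprocessing

                                SIZE = 1024 * 1024      # the "big lump" of data

                                def producer(buf, done):
                                    buf[:5] = b'hello'  # write into shared memory
                                    done.put('ready')   # short message via a pipe

                                if __name__ == '__main__':
                                    # Anonymous shared mapping, inherited on fork.
                                    buf = mmap.mmap(-1, SIZE)
                                    done = multiprocessing.Queue()
                                    p = multiprocessing.Process(target=producer,
                                                               args=(buf, done))
                                    p.start()
                                    done.get()          # wait for the notification
                                    print(buf[:5])      # b'hello'
                                    p.join()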

