interpreter vs. compiled

  • alex23

    #16
    Re: interpreter vs. compiled

    On Jul 29, 2:21 pm, castironpi <castiro...@gmail.com> wrote:
    On Jul 28, 5:58 pm, Fuzzyman <fuzzy...@gmail.com> wrote:
    Well - in IronPython user code gets compiled to in memory assemblies
    which can be JIT'ed.
    >
    I don't believe so.
    Uh, you're questioning someone who is not only co-author of a book on
    IronPython, but also a developer on one of the first IronPython-based
    commercial applications.

    I know authorship isn't always a guarantee of correctness, but what
    experience do you have with IronPython that makes you so unwilling to
    accept the opinion of someone with substantial knowledge of the
    subject?


    • castironpi

      #17
      Re: interpreter vs. compiled

      On Jul 29, 7:39 am, alex23 <wuwe...@gmail.com> wrote:
      On Jul 29, 2:21 pm, castironpi <castiro...@gmail.com> wrote:
      >
      On Jul 28, 5:58 pm, Fuzzyman <fuzzy...@gmail.com> wrote:
      Well - in IronPython user code gets compiled to in memory assemblies
      which can be JIT'ed.
      >
      I don't believe so.
      >
      Uh, you're questioning someone who is not only co-author of a book on
      IronPython, but also a developer on one of the first IronPython-based
      commercial applications.
      >
      I know authorship isn't always a guarantee of correctness, but what
      experience do you have with IronPython that makes you so unwilling to
      accept the opinion of someone with substantial knowledge of the
      subject?
      None, no experience, no authority, only the stated premises &
      classifications, which I am generally tending to misinterpret. I'm
      overstepping my bounds and trying to do it politely. (Some might call
      it learning, which yes, though uncustomary, *requires questioning
      authorities*, or reinventing.)

      Evidently, I have a "fundamental misunderstanding of the compilation
      process", which I'm trying to correct by stating what I believe. I'm
      trying to elaborate, and I'm being met with ever more detail.
      So, perhaps I'll learn something out of this. Until then...

      What I know I have is two conflicting, contradictory, inconsistent
      beliefs. Maybe I've spent too much time in Python to imagine how a
      dynamic language can compile.

      This is from 7/22/08, same author:
      I wouldn't say "can't". The current CPython VM does not compile
      code. It COULD. The C#/.NET VM does.
      Three big claims here that I breezed right over and didn't believe.
      It COULD.
      I'm evidently assuming that if it could, it would.
      The current CPython VM does not compile code.
      Therefore it couldn't, or the assumption is wrong. Tim says it is.
      And the glaring one--

      WHY NOT? Why doesn't CPython do it?

      From 7/18/08, own author:
      >>
      #define TOP() (stack_pointer[-1])
      #define BASIC_POP() (*--stack_pointer)

      ...(line 1159)...
      w = POP();
      v = TOP();
      if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
      /* INLINE: int + int */
      register long a, b, i;
      a = PyInt_AS_LONG(v);
      b = PyInt_AS_LONG(w);
      i = a + b;
      <<

      I am imagining that every Python implementation has something like
      it. If IronPython, in particular, does not have the 'POP();
      TOP();' sequence, then it isn't running on a stack machine. Is the
      IronPython code open source, and can someone link to it? I'm not
      wading through it from scratch. What does it have instead? Does
      dynamic typing still work?
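
      (A toy sketch of the kind of loop I mean, in Python -- illustrative
      only, not CPython's actual ceval.c:)

          def run(bytecode, stack):
              # Minimal stack machine in the shape of the quoted fragment.
              for op, arg in bytecode:
                  if op == 'PUSH':
                      stack.append(arg)
                  elif op == 'BINARY_ADD':
                      w = stack.pop()     # w = POP();
                      v = stack[-1]       # v = TOP();
                      stack[-1] = v + w   # roughly SET_TOP(v + w)
              return stack

          print(run([('PUSH', 10), ('PUSH', 1), ('BINARY_ADD', None)], []))  # [11]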

      <closing hostile remark>
      If you're bluffing, bluff harder; I call. If you're not, I apologize;
      teach me something. If you can ask better, teach me that too.
      </hostile>


      • castironpi

        #18
        Re: interpreter vs. compiled

        On Jul 29, 1:46 am, Tim Roberts <t...@probo.com> wrote:
        castironpi <castiro...@gmail.com> wrote:
        >
        In CPython yes.  In IronPython yes:  the parts that are compiled into
        machine code are the interpreter, *not user's code*.
        >
        WRONG!  You are WRONG.  At "compile" time, the Python code is compiled to
        an intermediate language.  At "run" time, the intermediate language (which
        is still the user's code, just in another representation) is compiled into
        machine language.  It is the user's program, not the interpreter.
        >
        It's the exact same process that occurs in a C compiler.  Most C compilers
        translate the C program to an intermediate form before finally converting
        it to machine language.  The only difference is that, in a C compiler, both
        steps occur within the compiler.  In IronPython, the two steps are
        separated in time.  There is no other difference.
        >
        Without that
        step, the interpreter would be running on an interpreter, but that
        doesn't get the user's statement 'a= b+ 1' into registers-- it gets
        'push, push, add, pop' into registers.
        >
        You have a fundamental misunderstanding of the compilation process.
        --
        Tim Roberts, t...@probo.com
        Providenza & Boekelheide, Inc.
        In C, we have:

        int x, y;
        x= 10;
        y= x+ 1;

        It translates as, roughly:


        8000 .data
        7996 ffffffff #x
        7992 ffffffff #y
        7988 .end data
        7984 loadi reg0 7996
        7980 loadi reg1 7992
        7976 loadi reg2 10
        7972 loadi reg3 1
        7968 storv reg2 reg0
        7964 add reg0 reg1 reg2
        7960 storv reg3 reg1


        You are telling me that the same thing happens in IronPython. By the
        time the instruction pointer gets to 'x= 10', the next 7 instructions
        are the ones shown here compiled from C.

        CMIIW, but the CPython implementation -does- -not-. Instead, it has,

        push 10
        stor x
        push 1
        add
        stor y

        each of which amounts to, to give a rough figure, 5-10 machine
        instructions. push 10, for example, with instruction_pointer in reg0:

        loadi reg1 4 #add 4 to stack pointer (one word)
        add reg0 reg1 reg2
        load reg0 reg2
        loadi reg2 10 #load ten
        stor reg0 reg2 #store at top of stack

        And this is all not to mention (i) the extra comparisons in
        intobject.h,

        #define PyInt_CheckExac t(op) ((op)->ob_type == &PyInt_Type)

        (ii) the huge case statement just to evaluate add, OR (iii)

        a = PyInt_AS_LONG(v);
        b = PyInt_AS_LONG(w);

        because CPython -does- -not- -know- ahead of time which op it will be
        executing, or what addresses (remember __coerce__), it will be
        performing the op on. Does not know EVER, not until it gets there.

        My point is, CPython takes more than seven steps. My question is,
        does IronPython?
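
        (For what it's worth, CPython will show you its own translation; a
        quick check with the dis module -- the output sketched in the
        comments is approximate and from a 2.x-era interpreter:)

            import dis

            # Prints CPython's bytecode for the snippet above -- roughly:
            #   LOAD_CONST 10, STORE_NAME x, LOAD_NAME x, LOAD_CONST 1,
            #   BINARY_ADD, STORE_NAME y, ...
            dis.dis(compile("x = 10\ny = x + 1", "<example>", "exec"))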


        • Dino Viehland

          #19
          RE: interpreter vs. compiled

          IronPython doesn't have an interpreter loop and therefore has no POP / TOP / etc... Instead what IronPython has is a method called Int32Ops.Add which looks like:

          public static object Add(Int32 x, Int32 y) {
              long result = (long) x + y;
              if (Int32.MinValue <= result && result <= Int32.MaxValue) {
                  return Microsoft.Scripting.Runtime.RuntimeHelpers.Int32ToObject((Int32)(result));
              }
              return BigIntegerOps.Add((BigInteger)x, (BigInteger)y);
          }

          This is the implementation of int.__add__. Note that calling int.__add__ can actually return NotImplemented and that's handled by the method binder looking at the strong typing defined on Add's signature here - and then automatically generating the NotImplemented result when the arguments aren't ints. So that's why you don't see that here even though it's the full implementation of int.__add__.
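
          (CPython's own int.__add__ happens to behave the same way, which
          makes the NotImplemented signal easy to see from Python:)

              # int.__add__ only knows ints; anything else comes back as
              # NotImplemented so the runtime can try the other operand.
              print((1).__add__(2))    # 3
              print((1).__add__(2.5))  # NotImplemented -> runtime tries (2.5).__radd__(1)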

          Ok, next if you define a function like:

          def adder(a, b):
              return a + b

          this turns into a .NET method, which will get JITed, and which in C# would look something like:

          static object adder(object a, object b) {
              return $addSite.Invoke(a, b);
          }

          where $addSite is a dynamically updated call site.

          $addSite knows that it's performing addition and knows how to do nothing other than update the call site the 1st time it's invoked. $addSite is local to the function so if you define another function doing addition it'll have its own site instance.

          So the 1st thing the call site does is a call back into the IronPython runtime which starts looking at a & b to figure out what to do. Python defines that as try __add__, maybe try __radd__, handle coercion, etc... So we go looking through finding the __add__ method - if that can return NotImplemented then we find the __radd__ method, etc... In this case we're just adding two integers and we know that the implementation of Add() won't return NotImplemented - so there's no need to call __radd__. We know we don't have to worry about NotImplemented because the Add method doesn't have the .NET attribute indicating it can return NotImplemented.
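
          (That search can be sketched in Python itself -- simplified, since
          the real rules also give a subclass's reflected method priority:)

              def binary_add(a, b):
                  # Simplified sketch of the __add__/__radd__ protocol.
                  result = NotImplemented
                  if hasattr(type(a), '__add__'):
                      result = type(a).__add__(a, b)
                  if result is NotImplemented and hasattr(type(b), '__radd__'):
                      result = type(b).__radd__(b, a)
                  if result is NotImplemented:
                      raise TypeError("unsupported operand type(s) for +")
                  return result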

          At this point we need to do two things. We need to generate the test which is going to see if future arguments are applicable to what we just figured out, and then we need to generate the code which is actually going to handle this. That gets combined together into the new call site delegate and it'll look something like:

          static object CallSiteStub(CallSite site, object a, object b) {
              if (a != null && a.GetType() == typeof(int) && b != null && b.GetType() == typeof(int)) {
                  return IntOps.Add((int)a, (int)b);
              }
              return site.UpdateBindingAndInvoke(a, b);
          }

          That gets compiled down as a lightweight dynamic method which also gets JITed. The next time through, the call site's Invoke body will be this method and things will go really fast if we have ints again. Also notice this is looking an awful lot like the inlined/fast-path(?) code dealing with ints that you quoted. If everything was awesome (currently it's not for a couple of reasons) the JIT would even inline the IntOps.Add call and it'd probably be near identical. And everything would be running native on the CPU.
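
          (If it helps, here is a toy Python model of such a self-updating
          call site -- names and structure are illustrative, not the DLR's
          actual API:)

              class ToyCallSite(object):
                  def __init__(self):
                      self.invoke = self._update_and_invoke  # 1st call hits the binder

                  def _update_and_invoke(self, a, b):
                      ta, tb = type(a), type(b)
                      fast_path = lambda x, y: x + y  # stands in for the bound Add()
                      def stub(x, y):
                          if type(x) is ta and type(y) is tb:  # the generated type test
                              return fast_path(x, y)           # the cached fast path
                          return self._update_and_invoke(x, y) # fall back, respecialize
                      self.invoke = stub  # install the generated rule
                      return fast_path(a, b)

              site = ToyCallSite()
              print(site.invoke(2, 2))  # first call binds, then answers 4
              print(site.invoke(3, 4))  # later calls hit the int/int stub: 7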

          So that's how 2 + 2 works... Finally if it's a user type then we'd generate a more complicated test like (and getting more and more pseudo code to keep things simple):

          if (PythonOps.CheckTypeVersion(a, 42) && PythonOps.CheckTypeVersion(b, 42)) {
              return $callSite.Invoke(__cachedAddSlot__.__get__(a), b);
          }

          Here $callSite is another stub which will handle doing optimal dispatch to whatever __add__.__get__ will return. It could be a Python type, it could be a user defined function, it could be the Python built-in sum function, etc... so that's the reason for the extra dynamic dispatch.

          So in summary: everything is compiled to IL. At runtime we have lots of stubs all over the place which do the work to figure out the dynamic operation and then cache the result of that calculation.

          Also what I've just described is how IronPython 2.0 works. IronPython 1.0 is basically the same but mostly w/o the stubs and where we use stub methods they're much less sophisticated.

          Also, IronPython is open source - www.codeplex.com/IronPython


          • castironpi

            #20
            Re: interpreter vs. compiled

            I note that IronPython and Python's pickle.dumps do not return the
            same value. Perhaps this relates to the absence of interpreter loop.
            >>> p.dumps( { 'a': True, 'b': set( ) } )
            IPy: '(dp0\nVb\np1\nc__builtin__\nset\np3\n((lp4\ntp5\nRp2\nsVa\np6\nI01\ns.'
            CPy: "(dp0\nS'a'\np1\nI01\nsS'b'\np2\nc__builtin__\nset\np3\n((lp4\ntp5\nRp6\ns."

            You make me think of a more elaborate example.

            for k in range( 100 ):
                i= j()
                g= h+ i
                e= f+ g
                c= d+ e
                a= b+ c

            Here, j creates a new class dynamically, and returns an instance of
            it. Addition is defined on it but the return type from it varies.

            If I read you correctly, IPy can leave hundreds of different addition
            stubs lying around at the end of the for-loop, each of which only
            gets executed once or twice, each of which was compiled for the exact
            combination of types it was called for.

            I might construe this to be a degenerate case, and the majority of
            times, you'll reexecute stubs enough to outweigh the length of time
            the compilation step takes. If you still do the bounds checking, it
            takes extra instructions (C doesn't), but operation switch-case
            BINARY_ADD, (PyInt_CheckExact(v) && PyInt_CheckExact(w)), and POP and
            TOP, are all handled by the selection of stubs from $addSite.

            I read this from last April:
            >>The most interesting cases to me are the 5 tests where CPython is more than 3x faster than IronPython and the other 5 tests where IronPython is more than 3x faster than CPython. CPython's strongest performance is in dictionaries with integer and string keys, list slicing, small tuples and code that actually throws and catches exceptions. IronPython's strongest performance is in calling builtin functions, if/then/else blocks, calling python functions, deep recursion, and try/except blocks that don't actually catch an exception.
            <<< http://lists.ironpython.com/pipermai...il/004773.html

            It's interesting that CPython can make those gains still by using a
            stack implementation.

            I'll observe that IronPython has the additional dependency of the
            full .NET runtime. (It was my point 7/18 about incorporating the GNU
            libs, that to compile to machine-native, as a JIT does, you need the
            instruction set of the machine.) Whereas CPython can disregard
            them, having already been compiled for it.

            I think what I was looking for is that IronPython employs the .NET to
            compile to machine instructions, once it's known what the values of
            the variables are that are the operands. The trade-off is compilation
            time + type checks + stub look-up.

            What I want to know is, if __add__ performs an attribute look-up, is
            that optimized in any way, after the IP is already in compiled code?

            After all that, I don't feel so guilty about stepping on Tim's toes.


            • Dino Viehland

              #21
              RE: interpreter vs. compiled

              It looks like the pickle differences are due to two issues. First, IronPython doesn't have ASCII strings, so it serializes strings as Unicode. Second, there are dictionary ordering differences. If you just do:

              { 'a': True, 'b': set( ) }

              CPy prints: {'a': True, 'b': set([])}
              IPy prints: {'b': set([]), 'a': True}

              The important thing is that we interop - and indeed you can send either pickle string to either implementation and the correct results are deserialized (modulo getting Unicode strings).

              For your more elaborate example you're right that there could be a problem here. But the DLR actually recognizes this sort of pattern and optimizes for it. All of the additions in your code are what I've been calling serially monomorphic call sites. That is, they see the same types for a while, maybe even just once as in your example, and then they switch to a new type - never to return to the old one. When IronPython gives the DLR the code for the call site, the DLR can detect when the code only differs by constants - in this case type version checks. It will then re-write the code, turning the changing constants into variables. The next time through, when it sees the same code again, it'll re-use the existing compiled code with the new sets of constants.

              That's still slower than we were in 1.x so we'll need to push on this more in the future - for example producing a general rule instead of a type-specific rule. But for the time being, having the DLR automatically handle this has been working well enough for these situations.
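
              (A rough Python analogue of that rewrite -- the names here,
              like _version, are made up, and this only models the idea:
              closures produced by one factory share a single compiled code
              object, so only the closed-over "constant" differs:)

                  def make_rule(expected_version):
                      def rule(a, b):
                          # the type-version constant became a closed-over variable
                          return getattr(type(a), '_version', None) == expected_version
                      return rule

                  r42, r43 = make_rule(42), make_rule(43)
                  print(r42.__code__ is r43.__code__)  # True: one body, two constants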


              • Terry Reedy

                #22
                Re: interpreter vs. compiled



                castironpi wrote:
                >The current CPython VM does not compile code.
                CPython compiles Python code to bytecode for its CPython *hardware
                independent* VM using standard compiler methods and tools (lexer,
                parser, code generator, and optimizer). That VM (interpreter) is
                written in portable-as-possible C, with machine/OS #ifdefs added as
                needed.
                WHY NOT? Why doesn't CPython do it?
                1. Portability: The Microsoft C# JIT compiler runs under Windows .NET on
                x86/amd64 and maybe IA64 and what else? Just porting .NET to run on
                Linux on the same processors was/is a big task. Does MONO have a JIT also?

                There is a JIT for Python: Psyco. It originally only worked on x86. I
                am not sure what else. It originated as a PhD project, working with
                CPython, and was developed further as part of PyPy, but I do not know if
                there is any current progress.
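
                (For reference, Psyco's usual entry point was about this
                small, on supported x86 systems:)

                    import psyco   # x86-only specializing JIT for CPython
                    psyco.full()   # compile functions aggressively as they run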

                The Python VM runs on numerous platforms.

                2. Money: C#, its JIT, and IronPython were and are funded by MS.
                Getting JIT right is hard and tedious.

                CPython is mostly a volunteer project. It is also the Python development
                platform. So it has to be simple enough for volunteers to pick up on
                its innards and for experimentation to be possible. Give the PSF more
                resources and perhaps that could change.

                tjr



                • Tim Roberts

                  #23
                  Re: interpreter vs. compiled

                  castironpi <castironpi@gmail.com> wrote:
                  >
                  >In C, we have:
                  >
                  >int x, y;
                  >x= 10;
                  >y= x+ 1;
                  >
                  >It translates as, roughly:
                  >>
                  >8000 .data
                  >7996 ffffffff #x
                  >7992 ffffffff #y
                  >7988 .end data
                  >7984 loadi reg0 7996
                  >7980 loadi reg1 7992
                  >7976 loadi reg2 10
                  >7972 loadi reg3 1
                  >7968 storv reg2 reg0
                  >7964 add reg0 reg1 reg2
                  >7960 storv reg3 reg1
                  I don't recognize that assembly language. Is that another intermediate
                  language?
                  >You are telling me that the same thing happens in IronPython.
                  Yes, the same process happens.
                  >By the
                  >time the instruction pointer gets to 'x= 10', the next 7 instructions
                  >are the ones shown here compiled from C.
                  I most certainly did NOT say that, as you well know. Different C compilers
                  produce different instruction sequences for a given chunk of code. Indeed,
                  a single C compiler will produce different instruction sequences based on
                  the different command-line options. It's unreasonable to expect a Python
                  compiler to produce exactly the same code as a C compiler.

                  However, that does not disqualify the Python processor as a "compiler".
                  >CMIIW, but the CPython implementation -does- -not-.
                  And again, I never said that it did. CPython is an interpreter. The
                  user's code is never translated into machine language.
                  >My point is, CPython takes more than seven steps. My question is,
                  >does IronPython?
                  So, if compiler B isn't as good at optimization as compiler A, does that
                  mean in your mind that compiler B is not a "compiler"?
                  --
                  Tim Roberts, timr@probo.com
                  Providenza & Boekelheide, Inc.


                  • Bob Martin

                    #24
                    Re: interpreter vs. compiled

                    in 76135 20080731 090911 Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:
                    >On Thu, 31 Jul 2008 06:17:59 GMT, Tim Roberts <timr@probo.com> declaimed
                    >the following in comp.lang.python:
                    >
                    >
                    >And again, I never said that it did. CPython is an interpreter. the
                    >user's code is never translated into machine language.
                    >>
                    >Using that definition, the UCSD P-code Pascal and Java are also not
                    >"compilers" -- all three create files containing instructions for a
                    >non-hardware virtual machine.
                    >
                    >The only difference between Python, UCSD Pascal, and Java is that
                    >Python foregoes the explicit "compiler" pass.
                    >
                    >BASIC (classical microcomputer implementations -- like the one M$
                    >supplied for TRS-80s) is an interpreter -- the pre-scan of the source
                    >merely translated BASIC keywords into a byte index, not into opcodes for
                    >any virtual machine.
                    You are confusing languages with implementations, as I pointed out earlier.
                    Java is a language.
                    I have used at least 2 Java compilers, i.e. they compiled Java source to native
                    machine language.


                    • Duncan Booth

                      #25
                      Re: interpreter vs. compiled

                      Terry Reedy <tjreedy@udel.edu> wrote:
                      1. Portability: The Microsoft C# JIT compiler runs under Windows .NET
                      on x86/amd64 and maybe IA64 and what else? Just porting .NET to run
                      on Linux on the same processors was/is a big task. Does MONO have a
                      JIT also?
                      Technically there is no such thing as a Microsoft C# JIT compiler: the C#
                      compiler targets IL and the JIT compilers convert IL to the native machine,
                      but C# is just one of the frontend compilers you could use.

                      Microsoft do JIT compilers for .Net Compact Framework that target ARM,
                      MIPS, SHx and x86. The Mono JIT supports:

                      s390, s390x (32 and 64 bits) Linux
                      SPARC(32) Solaris, Linux
                      PowerPC Linux, Mac OSX
                      x86 Linux, FreeBSD, OpenBSD, NetBSD, Microsoft Windows, Solaris, OS X
                      x86-64: AMD64 and EM64T (64 bit) Linux, Solaris
                      IA64 Itanium2 (64 bit) Linux
                      ARM: little and big endian Linux (both the old and the new ABI)
                      Alpha Linux
                      MIPS Linux
                      HPPA Linux

                      (from http://www.mono-project.com/Supported_Platforms)


                      So I'd say .Net scores pretty highly on the portability stakes. (Although
                      of course code written for .Net might not do so well).

                      --
                      Duncan Booth http://kupuguy.blogspot.com


                      • Terry Reedy

                        #26
                        Re: interpreter vs. compiled



                        Duncan Booth wrote:
                        Terry Reedy <tjreedy@udel.edu> wrote:
                        >
                        >1. Portability: The Microsoft C# JIT compiler runs under Windows .NET
                        >on x86/amd64 and maybe IA64 and what else? Just porting .NET to run
                        >on Linux on the same processors was/is a big task. Does MONO have a
                        >JIT also?
                        >
                        Technically there is no such thing as a Microsoft C# JIT compiler: the C#
                        compiler targets IL and the JIT compilers convert IL to the native machine,
                        but C# is just one of the frontend compilers you could use.
                        >
                        Microsoft do JIT compilers for .Net Compact Framework that target ARM,
                        MIPS, SHx and x86. The Mono JIT supports: [platform list snipped]
                        >
                        So I'd say .Net scores pretty highly on the portability stakes. (Although
                        of course code written for .Net might not do so well).
                        Did you mean IL scores highly? In any case, scratch 1. portability as a
                        reason why CPython lacks a JIT. That leaves 2. $ in the multimillions.

                        More:
                        3. Design difference 1: I suspect that IL was designed with JIT
                        compilation in mind, whereas PyCode was certainly not.

                        4. Design difference 2: The first 'killer app' for Python was driving
                        compiled Fortran/C functions (early Numeric). If a 'Python' program
                        spends 95% of its time in compiled-to-machine-code extensions, reducing
                        the other 5% to nothing gains little. CPython *was* and has been
                        designed for this.

                        The process continues. The relatively new itertools module was designed
                        and tested in Python (see itertools in the Library reference). But the
                        delivered module is compiled C.

                        tjr


                        • castironpi

                          #27
                          Re: interpreter vs. compiled

                          On Jul 31, 1:17 am, Tim Roberts <t...@probo.com> wrote:
                          castironpi <castiro...@gmail.com> wrote:
                          >
                          In C, we have:
                          >
                          int x, y;
                          x= 10;
                          y= x+ 1;
                          >
                          It translates as, roughly:
                          >
                          8000 .data
                          7996 ffffffff #x
                          7992 ffffffff #y
                          7988 .end data
                          7984 loadi reg0 7996
                          7980 loadi reg1 7992
                          7976 loadi reg2 10
                          7972 loadi reg3 1
                          7968 storv reg2 reg0
                          7964 add reg0 reg1 reg2
                          7960 storv reg3 reg1
                          >
                          I don't recognize that assembly language.  Is that another intermediate
                          language?
                          I'm looking at a system word of 1's and 0's that gets executed on a
                          per-cycle basis in the processor. Could easily be that the designs
                          are tuned to JIT's these days and I'm out of date, what with
                          pipelining and lookahead branching and all, but no, it's what I
                          remember from system architecture class.
                          You are telling me that the same thing happens in IronPython.
                          >
                          Yes, the same process happens.
                          >
                          By the
                          time the instruction pointer gets to 'x= 10', the next 7 instructions
                          are the ones shown here compiled from C.
                          >
                          I most certainly did NOT say that, as you well know.  Different C compilers
                          produce different instruction sequences for a given chunk of code.  Indeed,
                          a single C compiler will produce different instruction sequences based on
                          the different command-line options.  It's unreasonable to expect a Python
                          compiler to produce exactly the same code as a C compiler.
                          >
                          However, that does not disqualify the Python processor as a "compiler".
                          >
                          CMIIW, but the CPython implementation -does- -not-.
                          >
                          And again, I never said that it did.  CPython is an interpreter.  The
                          user's code is never translated into machine language.
                          >
                          My point is, CPython takes more than seven steps.  My question is,
                          does IronPython?
                          >
                          So, if compiler B isn't as good at optimization as compiler A, does that
                          mean in your mind that compiler B is not a "compiler"?
                          --
                          Tim Roberts, t...@probo.com
                          Providenza & Boekelheide, Inc.
                          You can translate C code to machine code, for any given C code, for
                          any given machine.

                          You can translate Python code to machine code, for some Python code,
                          for any given machine.

                          Given the restrictions (or rather, freedoms) of Python, does there
                          exist code that necessarily cannot translate to machine code? In
                          other words, can you translate all Python code to machine code?
                          Similarly, I take it that the decision to make CPython a stack machine
                          + VM was a design decision, not a necessity, favoring internal
                          simplicity over the extra 5%.
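
                          (One concrete obstacle to translating *everything* ahead of
                          time, sketched as a hypothetical: source that does not exist
                          until run time. Any full translation of this has to ship
                          something that can still compile Python:)

                              src = "def f(n): return n + 1"
                              exec(src)     # the function is compiled at run time
                              print(f(41))  # 42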

                          The output of the program is determined from the input by the Python
                          specification, regardless of implementation, but the answer still
                          isn't necessarily yes. But I think the only counterexample that comes
                          to me is the case of a dynamic grammar, not merely dynamic data type,
                          so for Python maybe it is. And furthermore, I think I'm getting
                          confused about what exactly constitutes an interpreter: is it whether
                          there is a process that runs product instructions, or whether the product
                          instructions can run standalone? I would take 'compiler' to mean
                          something that outputs an .EXE executable binary file, and I don't
                          just mean bundling up the python.exe executable with a file. Python
                          needs to be present on the machine you run target code on, not so with
                          C binaries. Can .NET bring its targets to such a state, or are those
                          run-times required? Are they just DLL's, or are they, for lack of a
                          better word, driving?

                          (Of course, for futuristic abstract hardware designs, CPython may be
                          mostly native instructions, and the output of its processor on
                          function definition, can be stored and run directly, such as for a
                          stack machine architecture.)

                          But I don't want to restrict my question to a question of
                          optimization. Your compiler could output something like this:

                          read variables into registers
                          reorganize variable dictionaries
                          perform addition
                          do laundry
                          write variables into system memory
                          clean sink

                          and run when you write 'C:\>silly.exe', and so on. And still be
                          compiled, even if the live IronPython session, which you invoke with
                          'C:\>ironpy.exe silly.py', outputs the same 7 MIPS instructions.


                          • Paul Boddie

                            #28
                            Re: interpreter vs. compiled

                            On 1 Aug, 07:11, castironpi <castiro...@gmail.com> wrote:
                            >
                            Given the restrictions (or rather, freedoms) of Python, does there
                            exist code that necessarily cannot translate to machine code?  In
                            other words, can you translate all Python code to machine code?
                            Given that all valid Python code can be executed somehow and that
                            execution takes place as the processor performs instructions which "it
                            gets from somewhere", meaning that those instructions can belong
                            either to a general interpreter or to specific code generated for a
                            given user program (or a combination of these things), I think that
                            you have to refine your question. What you seem to be asking is this:
                            can you translate Python code to machine code which encodes the
                            behaviour of the user program in a way nearing the efficiency of code
                            generated from other programming languages? Rephrased, the question is
                            this: can Python code be efficiently represented using low-level
                            machine instructions?

                            I think you've already touched upon this when thinking about integer
                            operations. The apparently simple case of integer addition in Python
                            is not completely encoded by a few machine instructions. In other
                            words...

                            a + b # in Python

                            ...is not sufficiently represented by...

                            ldr r1, a
                            ldr r2, b
                            add r3, r1, r2

                            ...in some assembly language (and the resulting machine code), mostly
                            because the semantics of Python addition are more complicated. Of
                            course, you can generate code for those semantics, which would lead to
                            quite a few more machine instructions than those suggested above, but
                            then it might be interesting to bundle those instructions in some kind
                            of subroutine, and we could call this subroutine BINARY_ADD. At this
                            point, you'd almost be back at the stage where you're writing a
                            bytecode interpreter again.
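
                            (A two-line Python demonstration of why: the same BINARY_ADD
                            selects different behaviour from the operand types at run
                            time:)

                                for a, b in [(1, 2), ('x', 'y'), ([1], [2])]:
                                    print(a + b)  # int add, str concat, list concat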

                            Of course, it's worth considering something in between these
                            situations (the verbose expansion of the user program vs. a bytecode
                            interpreter which examines virtual instructions and jumps to
                            subroutines), and there are apparently a few techniques which make
                            virtual machines more efficient (so that the processor isn't jumping
                            around too much in the interpreter code, for example), and there are
                            also going to be techniques which permit the simplification of any
                            verbose machine code representation (most likely by not generating
                            code which is never going to be executed, due to various properties of
                            the program).
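
One such technique, sketched in Python with the same toy opcodes:
resolve each virtual instruction to its handler once, ahead of
execution, so the inner loop no longer examines opcodes at all (real
implementations do the equivalent in C, with threaded dispatch or
function pointers):

def do_add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def do_pop(stack):
    stack.pop()

HANDLERS = {"BINARY_ADD": do_add, "POP_TOP": do_pop}

def predecode(instructions):
    # one-time translation: opcode names -> bound handlers
    return [HANDLERS[op] for op in instructions]

stack = [1, 2]
for step in predecode(["BINARY_ADD", "POP_TOP"]):
    step(stack)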

                            Obviously, CPython isn't oriented towards investigating these matters
                            in great depth, but that doesn't mean that other implementations can't
                            pursue other approaches.
                            Similarly, I take it that the decision to make CPython a stack machine
                            + VM was a design decision, not a necessity, favoring internal
                            simplicity over the extra 5%.
                            Probably: it simplifies code generation somewhat.

                            Paul


                            • Terry Reedy

                              #29
                              Re: interpreter vs. compiled

                              castironpi wrote:
                              Similarly, I take it that the decision to make CPython a stack machine
                              + VM was a design decision, not a necessity, favoring internal
                              simplicity over the extra 5%.
                              Years ago, someone once started a project to write a register-based
                              virtual machine for (C)Python. I suspect it was abandoned for some
                              combination of lack of time and preliminary results showing little
                              speedup for the increased complication. But I never saw any 'final
                              report'.
And furthermore, I think I'm getting
confused about what exactly constitutes an interpreter: is it whether
there is a process that runs the product instructions, or whether the
product instructions can run standalone? I would take 'compiler' to
mean something that outputs an .EXE executable binary file,
                              This is way too restrictive. Does *nix have no compilers? In any case,
the CPython compiler uses standard compiler components: lexer, parser,
                              syntax tree, code generator, and peephole optimizer. The result is a
                              binary file (.pyc for Python compiled) executable on a PyCode machine.
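
You can watch those components run from the prompt: compile() takes
source through the lexer, parser, and code generator in one call, and
dis shows the generated PyCode. (Session from CPython 2.x; the exact
output varies slightly by version.)

>>> import dis
>>> code = compile("a + b", "<example>", "eval")
>>> dis.dis(code)
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BINARY_ADD
              7 RETURN_VALUE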


                              • castironpi

                                #30
                                Re: interpreter vs. compiled

On Aug 1, 5:24 am, Paul Boddie <p...@boddie.org.uk> wrote:
On 1 Aug, 07:11, castironpi <castiro...@gmail.com> wrote:
                                >
                                >
                                >
                                Given the restrictions (or rather, freedoms) of Python, does there
                                exist code that necessarily cannot translate to machine code?  In
                                other words, can you translate all Python code to machine code?
                                >
                                Given that all valid Python code can be executed somehow and that
                                execution takes place as the processor performs instructions which "it
                                gets from somewhere", meaning that those instructions can belong
                                either to a general interpreter or to specific code generated for a
                                given user program (or a combination of these things), I think that
                                you have to refine your question. What you seem to be asking is this:
                                can you translate Python code to machine code which encodes the
                                behaviour of the user program in a way nearing the efficiency of code
                                generated from other programming languages? Rephrased, the question is
                                this: can Python code be efficiently represented using low-level
                                machine instructions?
                                >
                                I think you've already touched upon this when thinking about integer
                                operations. The apparently simple case of integer addition in Python
                                is not completely encoded by a few machine instructions. In other
                                words...
                                >
                                  a + b # in Python
                                >
                                ...is not sufficiently represented by...
                                >
                                  ldr r1, a
                                  ldr r2, b
                                  add r3, r1, r2
                                >
                                ...in some assembly language (and the resulting machine code), mostly
                                because the semantics of Python addition are more complicated.
No, it is not sufficiently represented: Python runs type checks before
the operation and an overflow check after it.

                                test safeinteger a
                                test safeinteger b
                                ldr r1, a
                                ldr r2, b
                                add r3, r1, r2
                                test not overflow

                                However, no implementation of Python can do better, given Python's
                                specification.
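
You can watch that overflow test fire in a stock CPython 2.x session
(values from a 32-bit build, where sys.maxint is 2147483647): instead
of wrapping, the result is quietly promoted to a long.

>>> import sys
>>> sys.maxint + 1
2147483648L
>>> type(sys.maxint + 1)
<type 'long'>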
                                Of
                                course, you can generate code for those semantics, which would lead to
                                quite a few more machine instructions than those suggested above, but
                                then it might be interesting to bundle those instructions in some kind
                                of subroutine, and we could call this subroutine BINARY_ADD. At this
                                point, you'd almost be back at the stage where you're writing a
                                bytecode interpreter again.
This isn't the bytecode interpreter returning; it's bounds checking,
which I hold is part and parcel of Python.

Another factor: in a given C context, a and b are known to be, and
always remain, integers.

                                int a, b;
                                ...
                                a + b

                                The C compilation process outputs:

                                ldr r1, a
                                ldr r2, b
                                add r3, r1, r2

                                and you are correct. However, for:

std::string a, b;   /* C++, say; C itself has no built-in string type */
a + b

performs a concatenation, which is not that simple. The point is, after
C compilation you actually have -ldr, ldr, add- lying around in a file
somewhere, where it can run as three consecutive instructions on the
processor. In Python the corresponding sequence is already baked into
the interpreter: the -test, test, ldr, ldr, add, test- sequence sits
somewhere inside Python.exe, specifically wherever the object code for
ceval.c ends up.

Incidentally, I find a 2-byte difference in a 16K binary between a
simple C program that merely executes a + b and one that executes
a - b. In this practical case the output is not a bare three-word
sequence (12 bytes, ldr-ldr-add vs. ldr-ldr-sub), though it's not
clear what entry points the OS requires.
                                Of course, it's worth considering something in between these
                                situations (the verbose expansion of the user program vs. a bytecode
                                interpreter which examines virtual instructions and jumps to
                                subroutines), and there are apparently a few techniques which make
                                virtual machines more efficient (so that the processor isn't jumping
                                around too much in the interpreter code, for example), and there are
                                also going to be techniques which permit the simplification of any
                                verbose machine code representation (most likely by not generating
                                code which is never going to be executed, due to various properties of
                                the program).
                                I think it's relevant and fair to consider -consecutive- language
                                instructions at this point. Working example:

                                int a, b, c, d;
                                ...
                                a + b
                                c + d

                                The C compilation process outputs:

                                ldr r1, a
                                ldr r2, b
                                add r3, r1, r2
                                ldr r1, c
                                ldr r2, d
                                add r3, r1, r2

                                Whereas there is no equivalent in CPython. The actual code that runs
                                on the processor (summary) is:

                                :loop
                                ...
                                :addition_sign
                                test safeinteger a
                                test safeinteger b
                                ldr r1, a
                                ldr r2, b
                                add r3, r1, r2
                                test not overflow
                                :goto loop

as opposed to two duplicate sections being output back-to-back, even
though they effectively -run- back-to-back. For a different type, say
list concatenation, the disassembly looks like:
>>> import dis
>>> def f():
...     [] + []
...
>>> f()
>>> dis.dis(f)
  2           0 BUILD_LIST               0
              3 BUILD_LIST               0
              6 BINARY_ADD
              7 POP_TOP
              8 LOAD_CONST               0 (None)
             11 RETURN_VALUE

                                the meaning of which I am not finding in ceval.c. Anyone?
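
My best reading of the 2.x sources so far: BINARY_ADD fast-paths exact
ints and exact strings, and everything else, lists included, falls
through to PyNumber_Add, which dispatches on the operand types.
Roughly, in Python (my names, not the real C):

import operator

def binary_add(v, w):
    # sketch of ceval.c's BINARY_ADD case
    if isinstance(v, int) and isinstance(w, int):
        return v + w              # INLINE int + int fast path
    if isinstance(v, str) and isinstance(w, str):
        return v + w              # string concatenation fast path
    return operator.add(v, w)     # PyNumber_Add: full type dispatch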

                                Regardless, the JIT compilation process allocates a new executable
                                block of memory:

                                :loop
                                ...
                                :addition_sign
output( 'ldr r1, a; ldr r2, b; add r3, r1, r2' )
                                :goto loop

                                which in this stage executes twice, yielding

                                ldr r1, a
                                ldr r2, b
                                add r3, r1, r2
                                ldr r1, c
                                ldr r2, d
                                add r3, r1, r2

somewhere in memory, same as C. Then it runs the block. It also has
to have already ascertained that 'a' and 'b' are necessarily integers
by the time it makes the output() call. I remain unclear on why JIT
compilation is not called compiling, and on why it doesn't output an
executable binary, unless the practical reasons (saving developer
time, keeping object files cross-platform), or mere terminology, are
the only difference.
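
To pin down "allocates a new executable block," here is a toy version
in Python, with generated Python source standing in for generated
machine code (nothing IronPython-specific; the names are mine):

def emit_specialized_add():
    # stand-in for reserving an executable block and emitting
    # ldr/ldr/add into it, once the operand types are known
    src = "def specialized(a, b):\n    return a + b\n"
    namespace = {}
    exec src in namespace        # build the new code object in memory
    return namespace["specialized"]

add = emit_specialized_add()      # generation happens once...
print add(2, 3)                   # ...then the emitted block just runs: 5
print add(5, 7)                   # 12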
