make faster Richards benchmark

**Peter Maas** · Jul 18 '05, 10:51 AM

Re: make faster Richards benchmark

Duncan Lissett wrote:[color=blue]
> I'd appreciate any suggestions on how to make faster Python
> implementations of Richards benchmark. Perhaps there are obvious
> problems that can be corrected?[/color]

The most obvious problem: how does the Richards benchmark work?
Care to post the source code?

Mit freundlichen Gruessen,

Peter Maas

--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
Tel +49-241-93878-0 Fax +49-241-93878-20 eMail peter.maas@mplu sr.de
-------------------------------------------------------------------

**Michel Claveau/Hamster** · Jul 18 '05, 10:51 AM

Re: make faster Richards benchmark

Hi !

On the site, click on the word "Python #92"

**Josef Meile** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

Duncan Lissett wrote:[color=blue]
> I'd appreciate any suggestions on how to make faster Python
> implementations of Richards benchmark. Perhaps there are obvious
> problems that can be corrected?
>
> http://www.lissett.com/ben/bench1.htm[/color]
What's about including a second python implementation of the Richards
benchmark using psyco? You don't have to modify your code, you only have
to add two lines. It would be also interesting to see the differences
between both source codes.

Regards,
Josef

**Wilk** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

dlissett0@yahoo .com (Duncan Lissett) writes:
[color=blue]
> I'd appreciate any suggestions on how to make faster Python
> implementations of Richards benchmark. Perhaps there are obvious
> problems that can be corrected?
>
> http://www.lissett.com/ben/bench1.htm[/color]

import psyco
psyco.full()

2 times faster :-)

--
Wilk - http://flibuste.net

**Duncan Booth** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

dlissett0@yahoo .com (Duncan Lissett) wrote in
news:6748553f.0 405122211.5be5a 150@posting.goo gle.com:
[color=blue]
> I'd appreciate any suggestions on how to make faster Python
> implementations of Richards benchmark. Perhaps there are obvious
> problems that can be corrected?[/color]

That should say "the Richards benchmark", named for Martin Richards.
[color=blue]
>
> http://www.lissett.com/ben/bench1.htm[/color]

Well, if I was going to do a Python implementation of this I would want to
look at what the benchmark is trying to achieve, and then program it fresh
from the ground up instead of trying to ape some other language. In
particular I expect that the classes with 'run' methods would probably
become generators with most or all of their state held as local variables.

For example, imagining that all the 'run' methods have first been renamed
'next', IdleTask becomes (untested):

def IdleTask(schedu ler, v1, v2):
for i in range(v2):
if v1 & 1:
v1 = (v1 >> 1) ^ 0xD008
yield scheduler.relea se(DEVICEB)
else:
v1 = v1 >> 1
yield scheduler.relea se(DEVICEA)
yield scheduler.holdC urrent()

Next most obvious thing is to junk the linked lists and replace them with
ordinary Python builtin lists. Pass the relevant list around instead of the
id constants 'DEVICEA', 'DEVICEB' and you can get rid of all that lookup
overhead. This involves quite a major restructuring of the scheduler
though: hence my comment about understanding what the code is trying to
achieve and starting from the ground up.

Packet.addTo should look more like:

def addTo(self, queue):
queue.append(se lf)

and then vapourise entirely.

Minor points:

Remove the pointless use of the 'global' keyword in various places. Replace
the traceOn variable with __debug__ so you get the same benefits as
compiled languages by optimising out the test for the trace statements.

Remove the pointless set/get methods and just access the members directly.
If you are writing a benchmark they will cripple performance.

Avoiding multiple accesses to the same instance variable, or assigning to
instance variables until you are about to return from a method: use a local
during the execution of the method.

**Peter Hansen** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

Josef Meile wrote:
[color=blue]
> Duncan Lissett wrote:
>[color=green]
>> I'd appreciate any suggestions on how to make faster Python
>> implementations of Richards benchmark. Perhaps there are obvious
>> problems that can be corrected?
>>
>> http://www.lissett.com/ben/bench1.htm[/color]
>
> What's about including a second python implementation of the Richards
> benchmark using psyco? You don't have to modify your code, you only have
> to add two lines. It would be also interesting to see the differences
> between both source codes.[/color]

I get an immediate 38% speedup by doing "import psyco; psyco.full()"
at the start of the benchmark.

-Peter

**Peter Maas** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

Michel Claveau/Hamster wrote:[color=blue]
> On the site, click on the word "Python #92"[/color]

Thanks. Oh, how silly of me. I didn't recognize these as links,
still tuned to underscore links.

Mit freundlichen Gruessen,

Peter Maas

--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Hubert-Wienen-Str. 24
Tel +49-241-93878-0 Fax +49-241-93878-20 eMail peter.maas@mplu sr.de
-------------------------------------------------------------------

**Duncan Lissett** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

Duncan Booth <me@privacy.net > wrote in message news:<Xns94E868 3D9EEF2duncanrc pcouk@127.0.0.1 >...

-snip-[color=blue]
> Well, if I was going to do a Python implementation of this I would want to
> look at what the benchmark is trying to achieve, and then program it fresh
> from the ground up instead of trying to ape some other language. In
> particular I expect that the classes with 'run' methods would probably
> become generators with most or all of their state held as local variables.
>
> For example, imagining that all the 'run' methods have first been renamed
> 'next', IdleTask becomes (untested):
>
> def IdleTask(schedu ler, v1, v2):
> for i in range(v2):
> if v1 & 1:
> v1 = (v1 >> 1) ^ 0xD008
> yield scheduler.relea se(DEVICEB)
> else:
> v1 = v1 >> 1
> yield scheduler.relea se(DEVICEA)
> yield scheduler.holdC urrent()
>
> Next most obvious thing is to junk the linked lists and replace them with
> ordinary Python builtin lists. Pass the relevant list around instead of the
> id constants 'DEVICEA', 'DEVICEB' and you can get rid of all that lookup
> overhead. This involves quite a major restructuring of the scheduler
> though: hence my comment about understanding what the code is trying to
> achieve and starting from the ground up.
>
> Packet.addTo should look more like:
>
> def addTo(self, queue):
> queue.append(se lf)
>
> and then vapourise entirely.
>
>
> Minor points:
>
> Remove the pointless use of the 'global' keyword in various places. Replace
> the traceOn variable with __debug__ so you get the same benefits as
> compiled languages by optimising out the test for the trace statements.[/color]

Thanks, these are all interesting suggestions.

[color=blue]
> Remove the pointless set/get methods and just access the members directly.
> If you are writing a benchmark they will cripple performance.
>
> Avoiding multiple accesses to the same instance variable, or assigning to
> instance variables until you are about to return from a method: use a local
> during the execution of the method.[/color]

The pointless set/get methods are pointless in the other language
implementations as well - that's the point. (And I'll cripple that
Oberon-2 implementation real soon by enforcing privacy with modules
and set/get procedures.)

OTOH there's definitely a place for a truly Pythonic implementation
here:

http://www.lissett.com/ben/bench3.htm

best wishes, Duncan

**Michel Claveau/Hamster** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

>>> Mit freundlichen Gruessen,

Désolé, je ne comprend pas l'allemand. Et pourtant, en octobre, je vais
aller là :

Object not found!

http://www.babstsoft.com/Paradox/Convention04/Info_1.htm

Je présenterai PONX (Python for Paradox) (cf http://ponx.org)

@-salutations
--
Michel Claveau
mél : http://cerbermail.com/?6J1TthIa8B
site : http://mclaveau.com

**Duncan Lissett** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

Wilk <wilkSPAM@OUTfl ibuste.net> wrote in message news:<873c64zpu 7.fsf@blakie.ri ol>...[color=blue]
> dlissett0@yahoo .com (Duncan Lissett) writes:
>[color=green]
> > I'd appreciate any suggestions on how to make faster Python
> > implementations of Richards benchmark. Perhaps there are obvious
> > problems that can be corrected?
> >
> > http://www.lissett.com/ben/bench1.htm[/color]
>
> import psyco
> psyco.full()
>
> 2 times faster :-)[/color]

And Simon: "I just tried the benchmark with Psyco, and cut the run
time for input=10000 from 8.1 seconds to 3.4"

Joseph, Simon, Wilk: thanks for the Psyco suggestion, and special
thanks for going ahead and trying it - we get slightly better than x2.

Given how little I know about Python, my assumption is that there's
scope for x10 improvement by writing better code... a more Pythonic
version for http://www.lissett.com/ben/bench3.htm

Suggestions appreciated.

best wishes, Duncan

**Peter Hansen** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

Duncan Lissett wrote:
[color=blue]
> Wilk <wilkSPAM@OUTfl ibuste.net> wrote in message news:<873c64zpu 7.fsf@blakie.ri ol>...
>[color=green]
>>dlissett0@yah oo.com (Duncan Lissett) writes:
>>
>>[color=darkred]
>>>I'd appreciate any suggestions on how to make faster Python
>>>implementati ons of Richards benchmark. Perhaps there are obvious
>>>problems that can be corrected?
>>>
>>>http://www.lissett.com/ben/bench1.htm[/color]
>>
>>import psyco
>>psyco.full( )
>>
>>2 times faster :-)[/color]
>
> And Simon: "I just tried the benchmark with Psyco, and cut the run
> time for input=10000 from 8.1 seconds to 3.4"
>
> Joseph, Simon, Wilk: thanks for the Psyco suggestion, and special
> thanks for going ahead and trying it - we get slightly better than x2.[/color]

What platform is this on? On WinXP, with an AMD 2500+ chip, and lots
of memory etc., I'm still getting only 38% speedup, nowhere near 100%...

-Peter

**Jack Diederich** · Jul 18 '05, 10:52 AM

Re: make faster Richards benchmark

On Thu, May 13, 2004 at 09:15:58AM -0700, Duncan Lissett wrote:[color=blue]
> Duncan Booth <me@privacy.net > wrote in message news:<Xns94E868 3D9EEF2duncanrc pcouk@127.0.0.1 >...[color=green]
> > Remove the pointless set/get methods and just access the members directly.
> > If you are writing a benchmark they will cripple performance.
> >
> > Avoiding multiple accesses to the same instance variable, or assigning to
> > instance variables until you are about to return from a method: use a local
> > during the execution of the method.[/color]
>
> The pointless set/get methods are pointless in the other language
> implementations as well - that's the point. (And I'll cripple that
> Oberon-2 implementation real soon by enforcing privacy with modules
> and set/get procedures.)[/color]

I would leave python and some other languages out of the comparison.
You can do near line-for-line translations for languages in the same class
- say C++ and Java, or Assembly and C. Requiring the python/perl/ruby
versions to look like a C++ program just measures how badly C++ maps to
Python, and not much else.

I've even seen some line-for-line translations in production use that
use python list-of-strings where the orignal version used char arrarys.
You can imagine what that does for performance *shudder*.

Writing benchmarks is just hard, if you allow people to solve the problem
in whatever way they like you end up measuring how good a coder the
language A guy is compared to the language B submitter.

-jackdied

**Peter Hansen** · Jul 18 '05, 10:53 AM

Re: make faster Richards benchmark

Jack Diederich wrote:
[color=blue]
> On Thu, May 13, 2004 at 09:15:58AM -0700, Duncan Lissett wrote:[color=green]
>>The pointless set/get methods are pointless in the other language
>>implementatio ns as well - that's the point.[/color]
>
> I would leave python and some other languages out of the comparison.
> You can do near line-for-line translations for languages in the same class
> - say C++ and Java, or Assembly and C. Requiring the python/perl/ruby
> versions to look like a C++ program just measures how badly C++ maps to
> Python, and not much else.
>
> I've even seen some line-for-line translations in production use that
> use python list-of-strings where the orignal version used char arrarys.
> You can imagine what that does for performance *shudder*.
>
> Writing benchmarks is just hard, if you allow people to solve the problem
> in whatever way they like you end up measuring how good a coder the
> language A guy is compared to the language B submitter.[/color]

In the case of the Richards benchmark, it's better to say "specifying
benchmarks is hard". The problem that I saw with it is that it
describes the problem in terms that are inappropriate for many of the
languages it has been translated to, effectively constraining the
implementation in ways that are more suitable to certain languages
than others.

One key example is the requirement for "global registers" which act
similarly to how the low-level registers in the CPU have to be handled,
specifically by saving and restoring them in a task-local context
whenever a task switch occurs. This is clearly what Richards really
wanted, but I'd argue that the approach is ill-conceived, especially
in this century...

On the other hand, if the specification had been more thoroughly
thought out, or perhaps modernized is a more appropriate way of looking
at it, then it would describe the expected *use and behaviour* of the
tasks in terms of black box testability... in other words, it shouldn't
matter that "global registers" are used, but merely that the
benchmark produces certain results.

Without those tests, it's fairly open to interpretation just how
much deviation from the original BCPL implementation is appropriate.
Duncan L's ideas for a generator-based approach are interesting,
though there are probably those who would argue that it wouldn't
follow the benchmark specs exactly. (Especially if it showed
performance much closer to C. <wink>)

-Peter

**Duncan Lissett** · Jul 18 '05, 10:55 AM

Re: make faster Richards benchmark

Duncan Booth <me@privacy.net > wrote in message news:<Xns94E868 3D9EEF2duncanrc pcouk@127.0.0.1 >...

-snip-[color=blue]
> Next most obvious thing is to junk the linked lists and replace them with
> ordinary Python builtin lists.[/color]

After the Packet links were replace with a Python list the code was
~1% slower...
[color=blue]
> Replace the traceOn variable with __debug__ so you get the same benefits as
> compiled languages by optimising out the test for the trace statements.[/color]

Using __debug__ made the code ~1% faster
[color=blue]
> Remove the pointless set/get methods and just access the members directly.[/color]

As a special dispensation to Python, directly accessed Packet and Tcb
fields, ~10% faster

[color=blue]
> Avoiding multiple accesses to the same instance variable, or assigning to
> instance variables until you are about to return from a method: use a local
> during the execution of the method.[/color]

Seems we're pushing into code optimization rather than not coding
badly.

Being careful about the tests in conditionals, and reordering some
conditionals sped things up some more. (I should roll the same changes
into the other langauge implementations .)

The OO style implementation took 104.4s and now takes 87.1s

I took the OO style implementation, made the methods of the scheduler
class into functions, made the fields into globals, and made the Task
class run methods into top-level functions.

The C style implementation took 114.9s and the OO conversion takes
80.3s

http://www.lissett.com/ben/bench3.htm

Further suggestions are welcome (yes, I will try pysco later)

make faster Richards benchmark

make faster Richards benchmark

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment