Speed: bytecode vs C API calls

  • Bengt Richter

    #31
Re: Speed: bytecode vs C API calls

On Tue, 9 Dec 2003 23:50:06 -0500, "Francis Avila" <francisgavila@yahoo.com> wrote:

>Stuart Bishop wrote in message ...
>>About the only thing that could be done then is:
>>
>>def memoize(callable):
>>    cache = {}
>>    def proxy(*args, _cache=cache):
>
>Not legal. *args must always come after default args. There's no way to do
    Never say never ;-)
    A function has the __get__ method, which is what getattr magic uses to make a bound method.
You can use this to put cache in the "self" position of a bound method, like
>>> def sq(x): return x*x
...
>>> def memoize(func): # callable is a builtin ;-)
...     def proxy(cache, *args):
...         try: return cache[args]
...         except KeyError: return cache.setdefault(args, func(*args))
...     return proxy.__get__({}, proxy.__class__)
...
>>> msq = memoize(sq)
>>> msq
<bound method function.proxy of {}>
>>> msq(3)
9
>>> msq
<bound method function.proxy of {(3,): 9}>
>>> msq(3)
9
>>> msq(2)
4
>>> msq
<bound method function.proxy of {(3,): 9, (2,): 4}>
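
To spell out what __get__ is doing here: a function is a descriptor, and its
__get__ method is exactly what attribute lookup uses to manufacture a bound
method, so calling it by hand freezes an arbitrary object into the first
argument slot. A minimal sketch of just that mechanism (the names are
illustrative, not part of the code above):

def f(obj, x):
    return (obj, x)

bound = f.__get__({'k': 1}, dict)   # dict stands in for proxy.__class__
assert bound(42) == ({'k': 1}, 42)  # the dict rides along as "self"

The class argument mostly just tags the resulting bound method; it is the
bound first argument that does the work.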

    Looking at the code, it might be worth timing, if you have 90%+ cache hits.
    I'd be curious to see what you get
>>> import dis
>>> dis.dis(msq)
  3           0 SETUP_EXCEPT            12 (to 15)
              3 LOAD_FAST                0 (cache)
              6 LOAD_FAST                1 (args)
              9 BINARY_SUBSCR
             10 RETURN_VALUE
             11 POP_BLOCK
             12 JUMP_FORWARD            41 (to 56)

  4     >>   15 DUP_TOP
             16 LOAD_GLOBAL              2 (KeyError)
             19 COMPARE_OP              10 (exception match)
             22 JUMP_IF_FALSE           29 (to 54)
             25 POP_TOP
             26 POP_TOP
             27 POP_TOP
             28 POP_TOP
             29 LOAD_FAST                0 (cache)
             32 LOAD_ATTR                3 (setdefault)
             35 LOAD_FAST                1 (args)
             38 LOAD_DEREF               0 (func)
             41 LOAD_FAST                1 (args)
             44 CALL_FUNCTION_VAR        0
             47 CALL_FUNCTION            2
             50 RETURN_VALUE
             51 JUMP_FORWARD             2 (to 56)
        >>   54 POP_TOP
             55 END_FINALLY
        >>   56 LOAD_CONST               0 (None)
             59 RETURN_VALUE

>this without changing the calling characteristics of the function, which the
>OP has stated is vital.
    Should be ok, but I wonder about non-hashable args. Never expect them?
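
If unhashable arguments can turn up, one hedge (a sketch, not from the
original post) is to fall back to an uncached call when the key won't hash:

def memoize_tolerant(func):
    def proxy(cache, *args):
        try:
            return cache[args]
        except KeyError:
            return cache.setdefault(args, func(*args))
        except TypeError:
            # e.g. a list among the args: skip the cache entirely
            return func(*args)
    return proxy.__get__({}, proxy.__class__)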
>
>It is probably possible to custom-build a code object which makes _cache a
>local reference to the outer cache, and play with the bytecode to make the
>LOAD_DEREF into a LOAD_FAST, but it's not worth the effort. I don't know
>how fast/slow a LOAD_DEREF is compared to a LOAD_FAST, though. In any case,
>it won't help more than a few percent.
The above seems to do it for the cache-hit case ;-)
>
>Jacek, I think this is as fast as pure Python will get you with this
>approach. If the speed is simply not acceptable, there are some things you
>need to consider: is this the *real* bottleneck?
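
As for the LOAD_DEREF vs LOAD_FAST question above, the two opcodes are easy
to see side by side with dis; a small CPython-specific sketch (hypothetical
names):

import dis

def make_closure():
    cache = {}
    def proxy():
        return cache          # free variable: compiled to LOAD_DEREF
    return proxy

def make_default():
    cache = {}
    def proxy(_cache=cache):
        return _cache         # local variable: compiled to LOAD_FAST
    return proxy

dis.dis(make_closure())
dis.dis(make_default())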

    Regards,
    Bengt Richter


    • Jacek Generowicz

      #32
Re: Speed: bytecode vs C API calls

Peter Otten <__peter__@web.de> writes:

> See below for my ideas towards a generalized memoizing framework.

      [I haven't the time to have a look at it in any detail right now, so
      forgive me for not responding/commenting in full, just yet.]
> > def foo(some_type):
> >     ...
> >     return something
> >
> >
> > foo = memoize(foo)
> >
> > data = [int, float, my_class, his_class, str, other_class] * 10**6
> >
> > map(foo, data) [+]
>
> Do you discard the result for real?

      No, sorry, of course not.
> > [+] Can you see why I don't want to mess with the call site ?
>
> No.

      It would mean taking it out of the map and that would mean losing the
      C-coded loop that map provides.
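
Concretely, the two shapes of the call site come down to this (a sketch
restating the example quoted above, with foo and data as given there):

# Call site untouched: the loop over data runs in C inside map, and
# foo is simply rebound to its memoized wrapper.
results = map(foo, data)

# Touching the call site means a Python-level loop, executing
# bytecode for every element: exactly what map avoids.
results = []
for x in data:
    results.append(foo(x))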
> Now to my code. The idea is to make memoized functions available to
> different modules from a central registry, and to expose the cached results
> for hand-optimized hotspots. Otherwise it was all mentioned in this thread,
> if not the original post.

      I'll make sure to find the time to study this.
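
Roughly, I imagine the registry part of it looks something like this sketch
(hypothetical names, certainly not your actual code):

_registry = {}

def memoize(func):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, func(*args))
    _registry[func.__name__] = cache   # expose the cache itself
    return proxy

def exposed_cache(name):
    # Hand-optimized hot spots can index this dict directly,
    # bypassing even the proxy call.
    return _registry[name]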


      • Jacek Generowicz

        #33
Re: Speed: bytecode vs C API calls

bokr@oz.net (Bengt Richter) writes:

> On Tue, 9 Dec 2003 23:50:06 -0500, "Francis Avila" <francisgavila@yahoo.com> wrote:
>
> >Stuart Bishop wrote in message ...
> >>About the only thing that could be done then is:
> >>
> >>def memoize(callable):
> >>    cache = {}
> >>    def proxy(*args, _cache=cache):

        To be honest, I have trouble believing that a local variable lookup
        would be significantly faster than a closure lookup ... or is there
        some other point to this ?
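
For what it's worth, the standard timeit module can put a number on it; a
quick sketch (illustrative names; the absolute numbers will of course depend
on the machine):

import timeit

setup = '''
def make_closure():
    cache = {(3,): 9}
    def proxy():
        return cache[(3,)]    # closure lookup: LOAD_DEREF
    return proxy

def make_default():
    cache = {(3,): 9}
    def proxy(_cache=cache):
        return _cache[(3,)]   # local lookup: LOAD_FAST
    return proxy

c = make_closure()
d = make_default()
'''

print(min(timeit.Timer("c()", setup=setup).repeat(repeat=5, number=1000000)))
print(min(timeit.Timer("d()", setup=setup).repeat(repeat=5, number=1000000)))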
> >Not legal. *args must always come after default args. There's no way to do
> Never say never ;-)
> A function has the __get__ method, which is what getattr magic uses
> to make a bound method. You can use this to put cache in the "self"
> position of a bound method, like
>
> >>> def sq(x): return x*x
> ...
> >>> def memoize(func): # callable is a builtin ;-)
> ...     def proxy(cache, *args):
> ...         try: return cache[args]
> ...         except KeyError: return cache.setdefault(args, func(*args))
> ...     return proxy.__get__({}, proxy.__class__)

> Looking at the code, it might be worth timing, if you have 90%+ cache hits.
> I'd be curious to see what you get

        I get your version being almost imperceptibly slower.

        I used this:

import timing
import sys

def Amemoize(func):
    def proxy(cache, *args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, func(*args))
    return proxy.__get__({}, proxy.__class__)


def Bmemoize(func):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, func(*args))
    return proxy


N = int(sys.argv[1])

def foo(n):
    if n<2: return 1
    return foo(n-1) + foo(n-2)

Afoo = Amemoize(foo)
Bfoo = Bmemoize(foo)


Acode = """
def Acall():
    """ + "Afoo(10);" * N + "\n"

Bcode = """
def Bcall():
    """ + "Bfoo(10);" * N + "\n"

exec Acode
exec Bcode

timing.start()
Acall()
timing.finish()
print timing.micro()

timing.start()
Bcall()
timing.finish()
print timing.micro()


        • Bengt Richter

          #34
Re: Speed: bytecode vs C API calls

On 12 Dec 2003 11:17:16 +0100, Jacek Generowicz <jacek.generowicz@cern.ch> wrote:

>bokr@oz.net (Bengt Richter) writes:
>
>> On Tue, 9 Dec 2003 23:50:06 -0500, "Francis Avila" <francisgavila@yahoo.com> wrote:
>>
>> >Stuart Bishop wrote in message ...
>> >>About the only thing that could be done then is:
>> >>
>> >>def memoize(callable):
>> >>    cache = {}
>> >>    def proxy(*args, _cache=cache):
>
>To be honest, I have trouble believing that a local variable lookup
>would be significantly faster than a closure lookup ... or is there
>some other point to this ?
          No, I think you are right. I was mainly reacting to the "there's no way" statement ;-)
>
>> >Not legal. *args must always come after default args. There's no way to do
>> Never say never ;-)
>> A function has the __get__ method, which is what getattr magic uses
>> to make a bound method. You can use this to put cache in the "self"
>> position of a bound method, like
>>
>> >>> def sq(x): return x*x
>> ...
>> >>> def memoize(func): # callable is a builtin ;-)
>> ...     def proxy(cache, *args):
>> ...         try: return cache[args]
>> ...         except KeyError: return cache.setdefault(args, func(*args))
>> ...     return proxy.__get__({}, proxy.__class__)
>
>> Looking at the code, it might be worth timing, if you have 90%+ cache hits.
>> I'd be curious to see what you get
>
>I get your version being almost imperceptibly slower.
>
>I used this:
When things are that close, it's very hard to draw conclusions except that they are
probably close ;-) But to split hairs you probably have to run the two tests separately,
in quiescent system situations, so that the first is not improving or worsening the
environment for the second (i.e., in terms of interpreter state re memory, or even CPU
cache in some cases, though that usually only applies to the first time through a loop).
And also keep the fastest of 5-10 trials, to eliminate glitches due to spurious irrelevant
system activity. I don't have your "timing" module, so I don't know what it does. If it
randomly reordered multiloop trials of A vs B and took minimums, then maybe interactions
could be washed out, but otherwise A & B in the same test source leaves some ambiguity
about the real ranking (which of course will not matter, practically speaking ;-)
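
For what it's worth, the standard timeit module (new in 2.3) makes the
fastest-of-N-trials approach a one-liner per candidate; a sketch, with Afoo
and Bfoo as in your script, ideally run in separate processes:

import timeit

# Keep the minimum of several trials to suppress spurious system activity.
timer_A = timeit.Timer("Afoo(10)", setup="from __main__ import Afoo")
timer_B = timeit.Timer("Bfoo(10)", setup="from __main__ import Bfoo")
print(min(timer_A.repeat(repeat=10, number=100000)))
print(min(timer_B.repeat(repeat=10, number=100000)))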
>
>import timing
>import sys
          [...]

          Regards,
          Bengt Richter


          • Jacek Generowicz

            #35
Re: Speed: bytecode vs C API calls

bokr@oz.net (Bengt Richter) writes:

> On 12 Dec 2003 11:17:16 +0100, Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
>
> >bokr@oz.net (Bengt Richter) writes:
> >
> >To be honest, I have trouble believing that a local variable lookup
> >would be significantly faster than a closure lookup ... or is there
> >some other point to this ?
>
> No, I think you are right. I was mainly reacting to the "there's no
> way" statement ;-)

            Sure ... I was sort of answering to a few posts upthread, as I hadn't
            done that yet.
> >I get your version being almost imperceptibly slower.
>
> When things are that close, it's very hard to draw conclusions
> except that they are probably close ;-) But to split hairs you
> probably have to run the two tests separately, in quiescent system
> situations,

... when it's that close, I just don't care :-)

            When it's that close, the Python version, CPU, temperature, phase of
            the moon etc. are probably more significant than the algorithm
... and, most importantly, my time is better spent on finding more
            effective ways of speeding up my code.
> I don't have your "timing" module,

Are you sure? It's on every Python installation I currently have
access to, so I assumed it was in the standard distribution. Hmm
... OK, the library reference lists it under "Obsolete".
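
For the record, timeit (new in 2.3) is its supported replacement; a rough
equivalent of the start/finish/micro dance from my test script (a sketch,
assuming micro() reports elapsed microseconds):

import time

start = time.time()
Acall()
print(int((time.time() - start) * 1e6))   # microseconds, like timing.micro()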
