Speed: bytecode vz C API calls

  • Jacek Generowicz

    #16
    Re: Speed: bytecode vz C API calls

    Peter Otten <__peter__@web.de> writes:

    > Jacek Generowicz wrote:
    >
    > > aahz@pythoncraft.com (Aahz) writes:
    > >
    > >> I think you're going to need to write a solution that doesn't call
    > >> functions (function unrolling, so to speak). Instead of returning a
    > >> memoizer function, just use a dict of dicts.
    > >
    > > Sounds interesting. But how do I get the dictionary to use the
    > > function which it is memoizing, to calculate values which haven't been
    > > cached yet?
    >
    > Guessing into the blue:
    >
    > try:
    >     result = cache[fun, args]
    > except KeyError:
    >     result = cache[fun, args] = fun(args)

    Allow me to remind you of the memoizer I posted at the top of the thread:

    def memoize(callable):
        cache = {}
        def proxy(*args):
            try: return cache[args]
            except KeyError: return cache.setdefault(args, callable(*args))
        return proxy

    One key feature of this is that I can use it to replace any[*]
    function with a functionally equivalent but faster one, without having
    to change any of the call sites of the original.

    IIUC, Aahz suggested replacing the proxy function with a
    dictionary. This sounds reasonable, as, after all, the proxy maps its
    arguments to its return values, while a dictionary maps keys to
    values. The whole point of this is that function call overhead is
    large compared to dictionary lookup, which is (almost) all that the
    proxy does. However, it's the "almost" that creates the problem. If
    I'm prepared to mess around with the calls to the original functions
    and replace them with your suggestion, then, of course the problem is
    solved ... however, I _really_ do not want to replace the calls to the
    original ...
    > Maybe it's time to show some inner loop lookalike to the wider
    > public for more to the point suggestions...


    Sure.


    def foo(some_type):
        ...
        return something


    foo = memoize(foo)

    data = [int, float, my_class, his_class, str, other_class] * 10**6

    map(foo, data) [+]


    [*] ... with hashable arguments, etc. etc.

    [+] Can you see why I don't want to mess with the call site?
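
    (A sketch of the sticking point, reusing foo and data from the example
    above: a bare dict cannot simply take the proxy's place at the call
    site, because a dict is not callable.

    cache = {int: 'i', float: 'f'}     # pre-filled argument -> value map
    map(foo, data)                     # works: foo (the proxy) is callable
    # map(cache, data)                 # TypeError: 'dict' object is not callable
    map(cache.__getitem__, data)       # works, but raises KeyError on any
                                       # argument that was never cached

    That KeyError is exactly the "almost" discussed above.)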


    • Jacek Generowicz

      #17
      Re: Function unrolling (was Re: Speed: bytecode vz C API calls)

      aahz@pythoncraft.com (Aahz) writes:

      > In article <tyfd6ayylhy.fsf@pcepsft001.cern.ch>,
      > Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
      > > aahz@pythoncraft.com (Aahz) writes:
      > >>
      > >> I think you're going to need to write a solution that doesn't call
      > >> functions (function unrolling, so to speak). Instead of returning a
      > >> memoizer function, just use a dict of dicts.
           ^^^^^^^^

      (Maybe this is where we are misunderstanding each other. On first
      reading, I thought that "memoizer" was just a typo. I don't return a
      memoizer function, I return a memoized function.)
      > > Sounds interesting. But how do I get the dictionary to use the
      > > function which it is memoizing, to calculate values which haven't been
      > > cached yet?
      >
      > Same way you decide which function to call currently.

      The way I decide which function to call currently is to type its name
      directly into the source at the call site. Deciding which function to
      call is not the problem: calling any function at all, once I've
      reduced myself to having _just_ a dictionary, is the problem.

      Maybe I didn't make the use of "memoize" clear enough.

      You use it thus:

      def foo():
          ...

      foo = memoize(foo)

      And from now on foo is a lot faster than it was before[*].
      > Essentially, you do something like this:
      >
      > def foo():
      >     pass
      >
      > def bar():
      >     pass
      >
      > lookup = {
      >     'foo': {'func': foo, 'cache': {}},
      >     'bar': {'func': bar, 'cache': {}},
      > }
      >
      > for operation, args in WorkQueue:
      >     curr = lookup[operation]
      >     try:
      >         return curr['cache'][args]
      >     except KeyError:
      >         return curr['cache'].setdefault(args, curr['func'](*args))

      Nononononooooo! :-)

      In any given loop, I know _which_ function I want to call, so the
      dictionary of dictionaries is pointless. Let's recap the original
      memoizer:

      def memoize(callable):
          cache = {}
          def proxy(*args):
              try: return cache[args]
              except KeyError: return cache.setdefault(args, callable(*args))
          return proxy

      I thought that you were suggesting to replace "proxy" with just the
      dictionary "cache", in order to save the function call overhead, as
      proxy is (almost) just a wrapper around the dictionary lookup. That's
      great, but what do I do in the (very rare) cases where I'm presented
      with an argument/key I haven't seen yet? Somehow, I need to recognize
      that the key is not present, call the original function, and add the
      key-value pair to the cache. I can't see how to do that without
      re-introducing the function call. I could do it by inlining the try
      block, but then I wouldn't be able to use the memoized function inside
      map ... and turning a map into a for-loop would lose me more than I'd
      gained.


      [*] With a whole bunch of exceptions which are beside the point here.
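
      (To spell out the inlining option as a sketch: at a call site that is
      an explicit loop, the try block can stand in for the proxy call,
      assuming each element of data is an argument tuple:

      results = []
      for args in data:
          try: results.append(cache[args])
          except KeyError: results.append(cache.setdefault(args, foo(*args)))

      But there is no equivalent move inside map(foo, data), which is the
      form most of my call sites take.)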


      • Stuart Bishop

        #18
        Re: Speed: bytecode vz C API calls

        On 10/12/2003, at 8:55 AM, Jacek Generowicz wrote:
        > Peter Otten <__peter__@web.de> writes:
        >
        >> Jacek Generowicz wrote:
        >>
        >>> aahz@pythoncraft.com (Aahz) writes:
        >>>
        >>>> I think you're going to need to write a solution that doesn't call
        >>>> functions (function unrolling, so to speak). Instead of returning a
        >>>> memoizer function, just use a dict of dicts.
        >>>
        >>> Sounds interesting. But how do I get the dictionary to use the
        >>> function which it is memoizing, to calculate values which haven't
        >>> been cached yet?
        >>
        >> Guessing into the blue:
        >>
        >> try:
        >>     result = cache[fun, args]
        >> except KeyError:
        >>     result = cache[fun, args] = fun(args)
        >
        > Allow me to remind you of the memoizer I posted at the top of the
        > thread:
        >
        > def memoize(callable):
        >     cache = {}
        >     def proxy(*args):
        >         try: return cache[args]
        >         except KeyError: return cache.setdefault(args, callable(*args))
        >     return proxy
        >
        > One key feature of this is that I can use it to replace any[*]
        > function with a functionally equivalent but faster one, without having
        > to change any of the call sites of the original.

        > proxy does. However, it's the "almost" that creates the problem. If
        > I'm prepared to mess around with the calls to the original functions
        > and replace them with your suggestion, then, of course the problem is
        > solved ... however, I _really_ do not want to replace the calls to the
        > original ...

        About the only thing that could be done then is:

        def memoize(callable):
            cache = {}
            def proxy(*args, _cache=cache):
                try: return _cache[args]
                except KeyError: return cache.setdefault(args, callable(*args))
            return proxy

        Might help a little bit, although probably not enough.

        --
        Stuart Bishop <stuart@stuartbishop.net>




        • Francis Avila

          #19
          Re: Speed: bytecode vz C API calls

          Stuart Bishop wrote in message ...
          > About the only thing that could be done then is:
          >
          > def memoize(callable):
          >     cache = {}
          >     def proxy(*args, _cache=cache):

          Not legal: *args must always come after arguments with defaults. There's
          no way to do this without changing the calling characteristics of the
          function, which the OP has stated he must preserve.

          It is probably possible to custom-build a code object which makes _cache a
          local reference to the outer cache, and play with the bytecode to make the
          LOAD_DEREF into a LOAD_FAST, but it's not worth the effort. I don't know
          how fast/slow a LOAD_DEREF is compared to a LOAD_FAST, though. In any case,
          it won't help more than a few percent.
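
          (For the curious, the two opcodes are easy to see with the standard
          dis module; this sketch reaches the cache as a closure cell in one
          proxy and as a pre-bound local in the other:

          import dis

          def make_closure():
              cache = {}
              def proxy(args):
                  return cache[args]        # cache: LOAD_DEREF
              return proxy

          def make_default():
              cache = {}
              def proxy(args, _cache=cache):
                  return _cache[args]       # _cache: LOAD_FAST
              return proxy

          dis.dis(make_closure())
          dis.dis(make_default())

          Both versions are legal here only because proxy takes a fixed
          argument list rather than *args.)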

          Jacek, I think this is as fast as pure Python will get you with this
          approach. If the speed is simply not acceptable, there are some things you
          need to consider. Is this the *real* bottleneck?

          Consider this before you delve into C solutions, which greatly increase
          headaches. An easy way to check is to cut out the map and memoization and
          just use a dictionary preloaded with known argument:value pairs for a given
          set of test data.

          e.g.,

          def func(...):
              ...

          data = [foo, bar, ....]
          cache = dict([(i, func(i)) for i in data])

          def timetrial(_data=data, _cache=cache):
              return map(_cache.__getitem__, _data)
              # [operator.getitem(_cache, i) for i in _data] *might* be faster.

          then timeit.py -s"import ttrial" "ttrial.timetrial()"

          If it's still too slow, then either Python simply isn't for this task, or
          you need to optimize your functions.

          If this isn't the real bottleneck:
          - Perhaps the functions I'm memoizing need optimization themselves?
          - Are there any modules out there which implement these in C already?
          - How does the competing C++ code do it? Does it get its speed from some
          wicked data structure for the memoizing, or are the individual functions
          faster, or what?

          If this *is* the only significant bottleneck, but you want to try harder to
          speed it up, you can pursue the C closure path you were mentioning. With
          the caveat that I don't do any CPython hacking, I know that closures are
          implemented using co_cellvars and the LOAD_DEREF bytecode. The Python C API
          does describe an API for cell objects, so that will be the place to look, if
          it's possible to do a closure in C. It may not be faster than your
          straightforward approach, of course.

          However, comparing your Python to a C++ version on the basis of speed is
          most certainly the wrong way to go. If the C++ is well designed with good
          algorithms, it's going to be faster than Python. Period. The advantage of
          Python is that it's much faster and easier to *write* (and less likely to
          introduce bugs), easier to *read* and maintain, and can model your problem
          more intuitively and flexibly. But if you need maximum speed at all costs,
          the best Python can do for you is act as a glue language to bits and pieces
          of hand-crafted C.

          Do you really *need* the speed, or are you just bothered that it's not as
          fast as the C++ version?
          --
          Francis Avila

          Comment

          • Jacek Generowicz

            #20
            Re: Speed: bytecode vz C API calls

            "Francis Avila" <francisgavila@ yahoo.com> writes:
            [color=blue]
            > It is probably possible to custom-build a code object which makes _cache a
            > local reference to the outer cache, and play with the bytecode to make the
            > LOAD_DEREF into a LOAD_FAST, but it's not worth the effort.[/color]

            I'm tempted to agree.
            > Jacek, I think this is as fast as pure Python will get you with this
            > approach.

            That's pretty much what I thought ... which is why I was (reluctantly)
            investigating the C approach.
            > If the speed is simply not acceptable, there are some things you
            > need to consider. Is this the *real* bottleneck?

            Well, when I profile the code, proxy comes out at the top in terms of
            total time. Yes, there are other functions that are hotspots too. Yes,
            maybe I'm being completely stupid, and 99% of those calls are unnecessary.

            However, I simply thought that if I can increase the speed of "proxy"
            by a factor of 3 or 7 or something, without too much pain, by recoding
            it in C, then I'd get a noticeable speed gain overall. If it involves
            re-hacking the interpreter, then I won't bother.
            > Consider this before you delve into C solutions, which greatly increase
            > headaches.

            You don't need to tell me that :-)
            > An easy way to check is to cut out the map and memoization and
            > just use a dictionary preloaded with known argument:value pairs for a given
            > set of test data.
            >
            > e.g.,
            >
            > def func(...):
            >     ...
            >
            > data = [foo, bar, ....]
            > cache = dict([(i, func(i)) for i in data])
            >
            > def timetrial(_data=data, _cache=cache):
            >     return map(_cache.__getitem__, _data)
            >     # [operator.getitem(_cache, i) for i in _data] *might* be faster.
            >
            > then timeit.py -s"import ttrial" "ttrial.timetrial()"

            I tried something similar, namely inlining the

                try: return cache[args]
                except KeyError: return cache.setdefault(args, callable(*args))

            rather than using proxy. This gives a little over a factor of 3
            speedup, but it's not something I can use everywhere, as often the
            proxy is called inside map(...).
            > If it's still too slow, then either Python simply isn't for this task, or
            > you need to optimize your functions.
            >
            > If this isn't the real bottleneck:
            > - Perhaps the functions I'm memoizing need optimization themselves?

            But, to a pretty good approximation, they _never_ get called (the
            value is (almost) always looked up). How would optimizing a function
            that doesn't get called help?
            > - Are there any modules out there which implement these in C
            > already?

            Other than the competing C++ implementation, no.
            > - How does the competing C++ code do it? Does it get its speed from some
            > wicked data structure for the memoizing, or are the individual functions
            > faster, or what?

            I have about 10 functions, written in Python, which the profiler
            highlights as taking up most of the time, between them (of which
            "proxy" is the main culprit). The C++ version doesn't have _any_
            Python in its inner loops. I think it's as simple as that.

            (Admittedly, a group of 4 of my functions was initially written with
            development ease in mind, and should be replaced by a single mean and
            lean C(++) function. That's probably where the real bottleneck is.)
            > If this *is* the only significant bottleneck,

            No, it's not the _only_ bottleneck.

            Yes, I am pursuing other paths too. But I feel more confident that I
            know what I am doing in the other cases, so I won't bother the list
            with those. If you remember, my original question was about what I
            could expect in terms of speedup when recoding a Python function in C,
            which will make many calls to the Python C API. It's not that I'm
            determined to fix my whole speed problem just in this one area, but I
            would like to understand a bit more about what to expect when going to
            C.
            > but you want to try harder to speed it up, you can pursue the C
            > closure path you were mentioning.

            Yes, I think that this could be edifying. It's also likely to be a
            huge pain, so I'm not that determined to pursue it, as I do have more
            productive things to be getting on with.
            > However, comparing your Python to a C++ version on the basis of speed is
            > most certainly the wrong way to go. If the C++ is well designed with good
            > algorithms, it's going to be faster than Python. Period.

            Not if my inner loops are C.

            (BTW, I don't believe that the C++ version is all that wonderfully
            designed, but enough about that :-)
            > But if you need maximum speed at all costs, the best Python can do
            > for you is act as a glue language to bits and pieces of hand-crafted
            > C.

            I think that it's the other way around, in this case. I build up a
            whole structure in Python ... and just need C(++) to call a very small
            subset of the functionality I provide, very efficiently.
            > Do you really *need* the speed, or are you just bothered that it's not as
            > fast as the C++ version?

            Politically speaking, yes, I need the speed. If a silly benchmark (one
            which completely misrepresents reality, BTW), shows that the Python
            version is slower than the C++ one, then the Python one is terminated.

            However, as you point out:
            > The advantage of Python is that it's much faster and easier to
            > *write* (and less likely to introduce bugs), easier to *read* and
            > maintain, and can model your problem more intuitively and flexibly.

            .... and for these reasons I am very keen to ensure that the Python
            version survives ... which, ironically enough, means that it must be
            fast, even though I personally don't care that much about the speed.

            Comment

            • Terry Reedy

              #21
              Re: Speed: bytecode vz C API calls

              You presented an example function as taking a type (including new classes)
              as parameter. If your type domain is boundedly finite and preknowable*, you
              could precompute the function for *all* inputs and replace the call with a
              dict lookup (or a bound method, for map).

              (*) If 'preknowable', you could either create the classes with a factory
              function that adds them to the cache, or give them a metaclass that does
              the same.
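
              A sketch of the precomputation, with hypothetical names (foo and
              the type list stand in for the real function and domain):

              known = [int, float, str]                   # the whole, finite domain
              cache = dict([(t, foo(t)) for t in known])  # precompute *all* results
              results = map(cache.__getitem__, data)      # no Python-level call

              Any type missing from 'known' surfaces as a KeyError, which is the
              price of dropping the fallback call entirely.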

              Terry



              • Aahz

                #22
                Re: Speed: bytecode vz C API calls

                In article <tyf8yllyq51.fsf@pcepsft001.cern.ch>,
                Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                >
                > I tried something similar, namely inlining the
                >
                >     try: return cache[args]
                >     except KeyError: return cache.setdefault(args, callable(*args))
                >
                > rather than using proxy. This gives a little over a factor of 3
                > speedup, but it's not something I can use everywhere, as often the
                > proxy is called inside map(...).

                Why are you using map()? Generally speaking, map() won't be any faster
                than a for loop.
                --
                Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

                Weinberg's Second Law: If builders built buildings the way programmers wrote
                programs, then the first woodpecker that came along would destroy civilization.


                • Aahz

                  #23
                  Re: Function unrolling (was Re: Speed: bytecode vz C API calls)

                  [Too busy to snip, sorry]

                  In article <tyfvfopzkpr.fsf@lxplus025.cern.ch>,
                  Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                  > aahz@pythoncraft.com (Aahz) writes:
                  >> In article <tyfd6ayylhy.fsf@pcepsft001.cern.ch>,
                  >> Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                  >>> aahz@pythoncraft.com (Aahz) writes:
                  >>>>
                  >>>> I think you're going to need to write a solution that doesn't call
                  >>>> functions (function unrolling, so to speak). Instead of returning a
                  >>>> memoizer function, just use a dict of dicts.
                  >    ^^^^^^^^
                  >
                  > (Maybe this is where we are misunderstanding each other. On first
                  > reading, I thought that "memoizer" was just a typo. I don't return a
                  > memoizer function, I return a memoized function.)
                  >
                  >>> Sounds interesting. But how do I get the dictionary to use the
                  >>> function which it is memoizing, to calculate values which haven't been
                  >>> cached yet?
                  >>
                  >> Same way you decide which function to call currently.
                  >
                  > The way I decide which function to call currently is to type its name
                  > directly into the source at the call site. Deciding which function to
                  > call is not the problem: calling any function at all, once I've
                  > reduced myself to having _just_ a dictionary, is the problem.
                  >
                  > Maybe I didn't make the use of "memoize" clear enough.
                  >
                  > You use it thus:
                  >
                  > def foo():
                  >     ...
                  >
                  > foo = memoize(foo)
                  >
                  > And from now on foo is a lot faster than it was before[*].

                  That's why I changed the Subject: to "Function unrolling"; you're going
                  to need to change your code to avoid function calls in inner loops. One
                  technique you might try is to create a generalized inner loop function:

                  def innerloop(iterable, func, cache):
                      l = []
                      for args in iterable:
                          try:
                              l.append(cache[args])
                          except KeyError:
                              l.append(cache.setdefault(args, func(args)))
                      return l

                  Yes, it's a performance hack, but I bet it's a lot clearer than C++
                  code.
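
                  At a call site the change is then (sketch):

                  # before: results = map(func, data)
                  results = innerloop(data, func, cache)  # cache: a dict kept alongside func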
                  --
                  Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

                  Weinberg's Second Law: If builders built buildings the way programmers wrote
                  programs, then the first woodpecker that came along would destroy civilization.


                  • Jacek Generowicz

                    #24
                    Re: Speed: bytecode vz C API calls

                    aahz@pythoncraft.com (Aahz) writes:

                    > Generally speaking, map() won't be any faster than a for loop.

                    This is at odds with my own experience (including a lot of timing runs
                    in the last two days), countless posts on this newsgroup over the
                    years, and, for example, this article

                        http://www.python.org/doc/essays/list2str.html

                    by Guido, in which he concludes that

                    "An implied loop in map() is faster than an explicit for loop"


                    • Aahz

                      #25
                      Re: Speed: bytecode vz C API calls

                      In article <tyfu148y6ye.fsf@pcepsft001.cern.ch>,
                      Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                      > aahz@pythoncraft.com (Aahz) writes:
                      >>
                      >> Generally speaking, map() won't be any faster than a for loop.
                      >
                      > This is at odds with my own experience (including a lot of timing runs
                      > in the last two days), countless posts on this newsgroup over the
                      > years, and, for example, this article
                      >
                      >     http://www.python.org/doc/essays/list2str.html

                      All right, what I should have said is that map() isn't enough faster
                      than a for loop in the cases where it is faster to make much difference,
                      and there are definitely cases where it's slower than the equivalent for
                      loop, and it's usually less readable than some other way of getting
                      speed.
                      --
                      Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

                      Weinberg's Second Law: If builders built buildings the way programmers wrote
                      programs, then the first woodpecker that came along would destroy civilization.


                      • Jacek Generowicz

                        #26
                        Re: Function unrolling (was Re: Speed: bytecode vz C API calls)

                        aahz@pythoncraft.com (Aahz) writes:

                        > That's why I changed the Subject: to "Function unrolling"; you're going
                        > to need to change your code to avoid function calls in inner loops. One
                        > technique you might try is to create a generalized inner loop function:
                        >
                        > def innerloop(iterable, func, cache):
                        >     l = []
                        >     for args in iterable:
                        >         try:
                        >             l.append(cache[args])
                        >         except KeyError:
                        >             l.append(cache.setdefault(args, func(args)))
                        >     return l

                        OK, I see what you mean.

                        I agree, I can do this, in some places, and toy tests suggest that
                        getting rid of the function call could give me a factor of 3
                        improvement. I'll have to change the code in many places, but that's
                        simpler than going to C.

                        (However, we disagree on the specific formula you give above: I think
                        that map is faster: see other post in the thread. [A quick test
                        suggests that a map is faster than the loop by a factor of 2.])
                        > Yes, it's a performance hack, but I bet it's a lot clearer than C++
                        > code.

                        No doubt about that :-)


                        Cheers,


                        • Jacek Generowicz

                          #27
                          Re: Speed: bytecode vz C API calls

                          "Terry Reedy" <tjreedy@udel.e du> writes:
                          [color=blue]
                          > You presented an example function as taking a type (including new classes)
                          > as parameter. If your type domain is boundedly finite[/color]

                          It's unboundedly finite :-)
                          [color=blue]
                          > and preknowable*, you could precompute the function for *all* inputs
                          > and replace the call with a dict lookup (or bound method for map).
                          >
                          > (*) If 'preknowable', could either create classes with factory function that
                          > adds them to cache or give metaclass that does same.[/color]

                          Hmmm ... all the non-builtin types in my domain are actually
                          dynamically created, and are themselves instances of a metaclass,
                          which could be instructed to add information about the new classes
                          to the caches as the classes are being created.

                          However, there are a number of problems I foresee (which I don't think
                          are particularly interesting to discuss here). Still, it could be put
                          to good use in some of the cases. I'll bear it in mind, thanks.
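
                          A sketch of how that might look (metaclass and cache names
                          hypothetical, Python 2.3 style); the metaclass primes the
                          memoizer's cache as each class is created:

                          class Registering(type):
                              def __init__(cls, name, bases, ns):
                                  super(Registering, cls).__init__(name, bases, ns)
                                  # foo_cache and original_foo are assumed to exist;
                                  # the key matches proxy's cache[args] with args == (cls,)
                                  foo_cache[(cls,)] = original_foo(cls)

                          class Base(object):
                              __metaclass__ = Registering

                          Every class subsequently derived from Base would then be cached
                          before its first lookup.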

                          But first, I think that I'll try inlining the exception handler in
                          those cases where it can be done.


                          Cheers,


                          • Francis Avila

                            #28
                            Re: Speed: bytecode vz C API calls

                            Aahz wrote in message ...
                            > In article <tyfu148y6ye.fsf@pcepsft001.cern.ch>,
                            > Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                            >> aahz@pythoncraft.com (Aahz) writes:
                            >>     http://www.python.org/doc/essays/list2str.html
                            >
                            > All right, what I should have said is that map() isn't enough faster
                            > than a for loop in the cases where it is faster to make much difference,
                            > and there are definitely cases where it's slower than the equivalent for
                            > loop, and it's usually less readable than some other way of getting
                            > speed.

                            That essay is a bit old, so not everything it says may still be true.

                            But in 2.3 at least, one case in which map is almost always much faster is
                            when the function used is a builtin. In this case, map (which is also
                            builtin) calls the C function from C, and I'm pretty sure it gets this speed
                            by staying outside of the Python eval loop. This can be wickedly fast (in
                            Python terms), and faster than list comps or for-loops with explicit list
                            appending, as a simple timeit will show.

                            For *Python* functions and lambdas (non-builtins), map can very often be
                            slower than a for-loop or list comprehension, for reasons not quite clear to
                            me. And of course, where a for-loop or list comp can avoid calls
                            altogether, it will always be faster than the equivalent function-using map
                            (unless map uses None as the function).
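
                            For instance (a sketch; measure with timeit on your own build):

                            data = map(str, range(1000))
                            ints = map(int, data)               # builtin func: loop stays in C
                            ints = [int(x) for x in data]       # eval loop runs per item
                            ints = map(lambda x: int(x), data)  # Python-level call per item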

                            I think in the OP's case, any little drop of speed he can get more than
                            justifies the very, very slightly un-idiomatic use of map instead of a
                            for-loop. Be warned that map() may go away in future Pythons, though....
                            --
                            Francis Avila


                            • Jacek Generowicz

                              #29
                              Re: Speed: bytecode vz C API calls

                              "Francis Avila" <francisgavila@ yahoo.com> writes:
                              [color=blue]
                              > Aahz wrote in message ...[/color]

                              [about map]
                              [color=blue][color=green]
                              > > and it's usually less readable than some other way of getting
                              > > speed.[/color][/color]

                              Actually, I use map in my code because I find it very elegant and very
                              readable[*]. Usually, the speed has nothing to do with it. But if I
                              already have a map, and I'm interested in speed, then I'm loath to
                              replace it with something slower.
                              > I think in the OP's case, any little drop of speed he can get more than
                              > justifies the very, very slightly un-idiomatic use of map instead of a
                              > for-loop.

                              Why do you consider my use of map to be un-idiomatic? I want to
                              transform a sequence into another one, by applying a function to each
                              element of the original. This is exactly the point of map. What am I missing?
                              > Be warned that map() may go away in future Pythons, though....

                              Yes, I've heard idle rumours to this effect. I think it would be a
                              great shame.

                              Yes, I realize that there are list comprehensions. I appreciate both,
                              and find that sometimes one looks more natural than the other, and
                              sometimes it's the other way around. I find this ability to pick the
                              right-looking idiom worth more than a One Way To Do it principle.

                              [*] There seems to be a faction in the Python community which
                              considers map, reduce, filter and lambda to be evil infiltrators
                              from those perfidious inventions that are functional languages,
                              and therefore the claim is made that they have no place in
                              Python. Some newbies believe this **** and the myth spreads. Just
                              my 2 cents.


                              • Peter Otten

                                #30
                                Re: Speed: bytecode vz C API calls

                                Jacek Generowicz wrote:

                                > One key feature of this is that I can use it to replace any[*]
                                > function with a functionally equivalent but faster one, without having
                                > to change any of the call sites of the original.

                                Understood, but not until this post.
                                Strictly speaking, all optimizations that fulfill the above criterion
                                should be built into the compiler, while memoizing will not be, because
                                of its memory tradeoffs and dependency on the input data.
                                > IIUC, Aahz suggested replacing the proxy function with a
                                > dictionary. This sounds reasonable, as, after all, the proxy maps its
                                > arguments to its return values, while a dictionary maps keys to
                                > values. The whole point of this is that function call overhead is
                                > large compared to dictionary lookup, which is (almost) all that the
                                > proxy does. However, it's the "almost" that creates the problem. If
                                > I'm prepared to mess around with the calls to the original functions
                                > and replace them with your suggestion, then, of course the problem is
                                > solved ... however, I _really_ do not want to replace the calls to the
                                > original ...

                                See below for my ideas towards a generalized memoizing framework. I've
                                addressed the problem by writing a map() variant that knows about result
                                caches (essentially Aahz' innerloop(), I see). You could use it to shadow
                                the builtin.
                                > def foo(some_type):
                                >     ...
                                >     return something
                                >
                                > foo = memoize(foo)
                                >
                                > data = [int, float, my_class, his_class, str, other_class] * 10**6
                                >
                                > map(foo, data) [+]

                                Do you discard the result for real?
                                > [+] Can you see why I don't want to mess with the call site?

                                No.

                                Now to my code. The idea is to make memoized functions available to
                                different modules from a central registry, and to expose the cached results
                                for hand-optimized hotspots. Otherwise it was all mentioned in this thread,
                                if not the original post.

                                <memoize.py>
                                import __builtin__

                                class RegistryItem:
                                    """ provide the necessary infrastructure to transparently speed up
                                        one function
                                    """
                                    def __init__(self, func, argc):
                                        cache = self.cache = {}
                                        def memoized(*args):
                                            try:
                                                return cache[args]
                                            except KeyError:
                                                result = cache[args] = func(*args)
                                                return result

                                        assert argc > 0
                                        if argc == 1:
                                            def callseq(seq):
                                                print seq[:3]
                                                result = []
                                                append = result.append
                                                for arg in seq:
                                                    try:
                                                        append(cache[arg])
                                                    except KeyError:
                                                        value = cache[arg] = func(arg)
                                                        append(value)
                                                return result
                                        else:
                                            def callseq(*seqs):
                                                result = []
                                                append = result.append
                                                if len(seqs) == 1:
                                                    # multiple args as one tuple in the sequence
                                                    for args in seqs[0]:
                                                        try:
                                                            append(cache[args])
                                                        except KeyError:
                                                            value = cache[args] = func(*args)
                                                            append(value)
                                                else:
                                                    # multiple args in multiple sequences
                                                    # XXX special case (too lazy for now)
                                                    return map(memoized, *seqs)
                                                return result
                                        self.wrapped = memoized
                                        self.callseq = callseq
                                        self.original = func

                                class Registry:
                                    """ Registry for function variants using a result cache
                                        fast = register(slow, <number of args>)
                                    """
                                    def __init__(self):
                                        self.data = {}

                                    def __call__(self, func, argc):
                                        assert argc > 0
                                        try:
                                            # return the memoized variant on repeat registration
                                            return self.data[func].wrapped
                                        except KeyError:
                                            ri = RegistryItem(func, argc)
                                            self.data[func] = ri
                                            self.data[ri.wrapped] = ri
                                            return ri.wrapped

                                    def __getitem__(self, func):
                                        return self.data[func]

                                    def getmap(self):
                                        """ provide a cache-enhanced version of map
                                        """
                                        data = self.data
                                        def memomap(func, *seqs):
                                            try:
                                                callseq = data[func].callseq
                                            except KeyError:
                                                return __builtin__.map(func, *seqs)
                                            return callseq(*seqs)
                                        return memomap

                                registry = register = Registry()
                                </memoize.py>

                                <example.py>
                                # for improved clarity of this example I have omitted
                                # rebinding the original names and instead used fastXXX
                                import memoize, time

                                def slow(arg):
                                    print "no cache or cache miss with %r" % arg
                                    time.sleep(0.1)
                                    return "<%s>" % arg

                                sample = range(10) * 100

                                # preparation code

                                fast = memoize.register(slow, 1)
                                # typically used like so:
                                # slow = memoize.register(slow, 1)

                                fastmap = memoize.registry.getmap()
                                # could also be used to shadow the builtin:
                                # map = memoize.registry.getmap()


                                # some tests
                                assert fast is memoize.registry[fast].wrapped
                                assert fast is memoize.registry[slow].wrapped
                                assert slow is memoize.registry[slow].original


                                # stray function call
                                print slow(10)

                                # stray function call using cache
                                print fast(20)

                                # built-in loop support
                                fastmap(slow, sample)
                                fastmap(fast, sample)

                                # hand-crafted code using the cache
                                cache = memoize.registry[slow].cache
                                for arg in sample:
                                    cache[arg]  # pure lookup; assumes every arg is already cached
                                </example.py>

                                Feel free to play with the above. Be warned that there will be bugs.
                                However, before working on the code I will have to profile, to at least
                                make sure it does not slow things down... maybe this weekend.

                                Peter
