Speed: bytecode vs C API calls

  • Jacek Generowicz

    Speed: bytecode vs C API calls

    I have a program in which I make very good use of a memoizer:

    def memoize(callable):
        cache = {}
        def proxy(*args):
            try: return cache[args]
            except KeyError: return cache.setdefault(args, callable(*args))
        return proxy

    which is functionally equivalent to

    class memoize:

        def __init__(self, callable):
            self.cache = {}
            self.callable = callable

        def __call__(self, *args):
            try: return self.cache[args]
            except KeyError:
                return self.cache.setdefault(args, self.callable(*args))

    though the latter is about twice as slow.

    I've got to the stage where my program is still not fast enough, and
    calls to the memoizer proxy are topping the profiler output table. So I
    thought I'd try to see whether I can speed it up by recoding it in C.

    The closure-based version seems impossible to recode in C (anyone know
    a way?), so I decided to write an extension type equivalent to "class
    memoize" above. This seems to run about 10% faster than the pure
    Python version ... which is still a lot slower than the pure Python
    closure-based version.

    I was expecting C extensions which make lots of calls to Python C API
    functions not to be spectacularly fast, but I'm still a little
    disappointed with what I've got. Does this sort of speedup (or rather,
    lack of it) seem right, to those of you experienced with this sort of
    thing? Or does it look like I'm doing it wrong?

    Could anyone suggest how I could squeeze more speed out of the
    memoizer? (I'll include the core of my memoize extension type below.)

    [What's the current state of the art wrt profiling Python, and its
    extension modules? I've tried using hotshot (though not extensively
    yet), but it seems to show even less information than profile, at
    first blush.]


    The memoize extension type is based around the following:

    typedef struct {
        PyObject_HEAD
        PyObject* x_attr;
        PyObject* cache;
        PyObject* fn;
    } memoizeObject;


    static int
    memoize_init(memoizeObject* self, PyObject* args, PyObject* kwds) {
        PyArg_ParseTuple(args, "O", &(self->fn));
        Py_INCREF(self->fn);
        self->cache = PyDict_New();
        return 0;
    }

    static PyObject*
    memoize_call(memoizeObject* self, PyObject* args) {
        PyObject* value = PyDict_GetItem(self->fn, args);
        if (! value) {
            PyDict_SetItem(self->cache, args,
                           PyObject_CallObject(self->fn, args));
            value = PyDict_GetItem(self->cache, args);
        }
        // Py_INCREF(value);
        return value;
    }



    Thanks,

  • Michael Hudson

    #2
    Re: Speed: bytecode vs C API calls

    Jacek Generowicz <jacek.generowicz@cern.ch> writes:
    > Could anyone suggest how I could squeeze more speed out of the
    > memoizer? (I'll include the core of my memoize extension type
    > below.)

    You *might* get a speed bump by making your memoized callable a method
    of the memoizer object rather than implementing tp_call.
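    (A rough Python-level sketch of that idea -- expose an ordinary method
    instead of __call__, so the C type would grow a method slot rather
    than tp_call; the names Memoizer and lookup are illustrative, not from
    the original post.)

    class Memoizer:
        def __init__(self, fn):
            self.fn = fn
            self.cache = {}
        def lookup(self, *args):
            # plain method in place of __call__/tp_call
            try:
                return self.cache[args]
            except KeyError:
                return self.cache.setdefault(args, self.fn(*args))

    # call sites then say m.lookup(x) rather than m(x)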

    Cheers,
    mwh

    --
    M-x psych[TAB][RETURN]
    -- try it


    • Jacek Generowicz

      #3
      Re: Speed: bytecode vs C API calls

      Jacek Generowicz <jacek.generowicz@cern.ch> writes:
      > I was expecting C extensions which make lots of calls to Python C API
      > functions not to be spectacularly fast, but I'm still a little
      > disappointed with what I've got. Does this sort of speedup (or rather,
      > lack of it) seem right, to those of you experienced with this sort of
      > thing? Or does it look like I'm doing it wrong?

      > static int
      > memoize_init(memoizeObject* self, PyObject* args, PyObject* kwds) {
      >     PyArg_ParseTuple(args, "O", &(self->fn));

      Obvious bug: &(self->cache) !!
      > Py_INCREF(self->fn);
      > self->cache = PyDict_New();
      > return 0;
      > }

      Fixing the bug gives me a factor of 3 speedup over the class-based
      version.

      That's a bit more like what I was expecting.


      • Terry Reedy

        #4
        Re: Speed: bytecode vs C API calls


        "Jacek Generowicz" <jacek.generowi cz@cern.ch> wrote in message
        news:tyf7k171jb q.fsf@pcepsft00 1.cern.ch...[color=blue]
        > I have a program in which I make very good use of a memoizer:
        >
        > def memoize(callabl e):
        > cache = {}
        > def proxy(*args):
        > try: return cache[args]
        > except KeyError: return cache.setdefaul t(args, callable(*args) )
        > return proxy[/color]
        ....[color=blue]
        > I've got to the stage where my program is still not fast enough, and
        > calls to the memoizer proxy are topping profiler output table. So I
        > thought I'd try to see whether I can speed it up by recoding it in C.[/color]

        Have you tried psyco on the above? Or rather, on the proxies?
        Can any of your callables be memoized by list rather than dict?
        (ie, any count or restricted int args?)

        If you have just a few wrapped functions, faster, special-cased proxies
        (with exact arg lists) would be feasible and possibly easier for psyco.
        Special-casing would include inlining the callable when possible
        -- or rather, inlining the memoization within the function.
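        (A minimal sketch of such a special-cased proxy, assuming a
        two-argument callable; the names are hypothetical:)

        def memoize_two_args(fn):
            cache = {}
            def proxy(a, b):
                # exact signature, so psyco could specialise on the
                # concrete argument types
                try:
                    return cache[a, b]
                except KeyError:
                    return cache.setdefault((a, b), fn(a, b))
            return proxy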

        TJR



        • Aahz

          #5
          Re: Speed: bytecode vs C API calls

          In article <tyf7k171jbq.fsf@pcepsft001.cern.ch>,
          Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
          >
          >I have a program in which I make very good use of a memoizer:
          >
          > def memoize(callable):
          >     cache = {}
          >     def proxy(*args):
          >         try: return cache[args]
          >         except KeyError: return cache.setdefault(args, callable(*args))
          >     return proxy
          >
          >I've got to the stage where my program is still not fast enough, and
          >calls to the memoizer proxy are topping the profiler output table. So I
          >thought I'd try to see whether I can speed it up by recoding it in C.

          I'm wondering whether you're tackling something the wrong way or perhaps
          you need to throw hardware at the problem. Dicts are one of the most
          highly optimized parts of Python, and if your cache hit rate is higher
          than 90%, you'll be dealing with exceptions only rarely, so the speed of
          cache.setdefault() isn't where your time is going. I'm wondering whether
          you're hitting some kind of memory issue or something, or possibly you're
          having cache effects (in CPU/memory), or maybe even your key is poorly
          constructed so that you're getting hash duplicates.

          OTOH, if the cache hit rate is smaller than that, you'll be paying the
          penalty for raised exceptions *and* the callable() call, in which case
          redesigning that part will pay the highest dividends.
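          (If misses were common, one way to avoid paying for raised
          exceptions is a sentinel plus dict.get(); this sketch is
          illustrative, not from the original post:)

          _missing = object()

          def memoize_get(fn):
              cache = {}
              def proxy(*args):
                  value = cache.get(args, _missing)
                  if value is _missing:
                      value = cache[args] = fn(*args)
                  return value
              return proxy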
          --
          Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

          Weinberg's Second Law: If builders built buildings the way programmers wrote
          programs, then the first woodpecker that came along would destroy civilization.


          • Jacek Generowicz

            #6
            Re: Speed: bytecode vs C API calls

            "Terry Reedy" <tjreedy@udel.e du> writes:
            > "Jacek Generowicz" <jacek.generowicz@cern.ch> wrote in message
            > news:tyf7k171jbq.fsf@pcepsft001.cern.ch...
            > > I have a program in which I make very good use of a memoizer:
            > >
            > > def memoize(callable):
            > >     cache = {}
            > >     def proxy(*args):
            > >         try: return cache[args]
            > >         except KeyError: return cache.setdefault(args, callable(*args))
            > >     return proxy
            > ...
            > > I've got to the stage where my program is still not fast enough, and
            > > calls to the memoizer proxy are topping the profiler output table. So I
            > > thought I'd try to see whether I can speed it up by recoding it in C.
            >
            > Have you tried psyco on the above?

            Nope, but psyco would be politically unacceptable for my users, in
            this case. OTOH, I've been wanting to play with psyco for ages ...
            > Can any of your callables be memoized by list rather than dict?
            > (ie, any count or restricted int args?)

            You've lost me there.


            • Jacek Generowicz

              #7
              Re: Speed: bytecode vs C API calls

              aahz@pythoncraft.com (Aahz) writes:
              > In article <tyf7k171jbq.fsf@pcepsft001.cern.ch>,
              > Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
              > >
              > >I have a program in which I make very good use of a memoizer:
              > >
              > > def memoize(callable):
              > >     cache = {}
              > >     def proxy(*args):
              > >         try: return cache[args]
              > >         except KeyError: return cache.setdefault(args, callable(*args))
              > >     return proxy
              > >
              > >I've got to the stage where my program is still not fast enough, and
              > >calls to the memoizer proxy are topping the profiler output table. So I
              > >thought I'd try to see whether I can speed it up by recoding it in C.
              >
              > I'm wondering whether you're tackling something the wrong way

              There _is_ always that possibility, but I don't think that's what my
              problem is here (it may well be elsewhere in my program).
              > or perhaps you need to throw hardware at the problem.

              The definition of "not fast enough" in this case, is "much slower than
              a competing C++ implementation", so throwing hardware at it advances
              me not one bit, I'm afraid.
              > Dicts are one of the most highly optimized parts of Python,

              Which is exactly why I'm using them extensively.
              > and if your cache hit rate is higher than 90%, you'll be dealing
              > with exceptions only rarely,

              My cache hit rate is well over 90%.
              > so the speed of cache.setdefault() isn't where your time is going.
              > I'm wondering whether you're hitting some kind of memory issue or
              > something, or possibly you're having cache effects (in CPU/memory),
              > or maybe even your key is poorly constructed so that you're getting
              > hash duplicates.

              No, I think that my memoizer works just fine. The "problem" is that
              I've memoized a lot of functions, and quite a few of them are being
              called in my inner loops. If I want to compete on speed with C++, I'm
              simply going to have to eliminate Python bytecode from my inner loops[*].

              [*] Unless my overall algorithm is completely wrong, of course.


              • Jacek Generowicz

                #8
                Re: Speed: bytecode vs C API calls

                Jacek Generowicz <jacek.generowicz@cern.ch> writes:

                > Jacek Generowicz <jacek.generowicz@cern.ch> writes:
                >
                > > static int
                > > memoize_init(memoizeObject* self, PyObject* args, PyObject* kwds) {
                > >     PyArg_ParseTuple(args, "O", &(self->fn));
                >
                > Obvious bug: &(self->cache) !!
                >
                > > Py_INCREF(self->fn);
                > > self->cache = PyDict_New();
                > > return 0;

                Apologies, I misreported the bug. self->fn is appropriate here
                ... using self->fn (instead of self->cache) as the dictionary in the
                other function is what was wrong, of course.
                > Fixing the bug gives me a factor of 3 speedup over the class-based
                > version.

                This is still true.


                • Aahz

                  #9
                  Re: Speed: bytecode vs C API calls

                  In article <tyfu14az6so.fsf@pcepsft001.cern.ch>,
                  Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                  >aahz@pythoncraft.com (Aahz) writes:
                  >> In article <tyf7k171jbq.fsf@pcepsft001.cern.ch>,
                  >> Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                  >>>
                  >>>I have a program in which I make very good use of a memoizer:
                  >>>
                  >>> def memoize(callable):
                  >>>     cache = {}
                  >>>     def proxy(*args):
                  >>>         try: return cache[args]
                  >>>         except KeyError: return cache.setdefault(args, callable(*args))
                  >>>     return proxy
                  >>>
                  >>>I've got to the stage where my program is still not fast enough, and
                  >>>calls to the memoizer proxy are topping the profiler output table. So I
                  >>>thought I'd try to see whether I can speed it up by recoding it in C.
                  >>
                  >> I'm wondering whether you're tackling something the wrong way
                  >
                  >There _is_ always that possibility, but I don't think that's what my
                  >problem is here (it may well be elsewhere in my program).
                  >
                  >> or perhaps you need to throw hardware at the problem.
                  >
                  >The definition of "not fast enough" in this case, is "much slower than
                  >a competing C++ implementation", so throwing hardware at it advances
                  >me not one bit, I'm afraid.
                  >
                  >> Dicts are one of the most highly optimized parts of Python,
                  >
                  >Which is exactly why I'm using them extensively.
                  >
                  >> and if your cache hit rate is higher than 90%, you'll be dealing
                  >> with exceptions only rarely,
                  >
                  >My cache hit rate is well over 90%.

                  All your comments taken as a set make no sense. There's just no way
                  that a bunch of dict accesses in Python can be much slower than a C++
                  program. (C program, *maybe* -- not C++)
                  >> so the speed of cache.setdefault() isn't where your time is going.
                  >> I'm wondering whether you're hitting some kind of memory issue or
                  >> something, or possibly you're having cache effects (in CPU/memory),
                  >> or maybe even your key is poorly constructed so that you're getting
                  >> hash duplicates.
                  >
                  >No, I think that my memoizer works just fine. The "problem" is that
                  >I've memoized a lot of functions, and quite a few of them are being
                  >called in my inner loops. If I want to compete on speed with C++, I'm
                  >simply going to have to eliminate Python bytecode from my inner loops[*].
                  >
                  >[*] Unless my overall algorithm is completely wrong, of course.

                  Waitasec .... okay, I see what your problem is, assuming you're
                  describing it correctly. The problem is that you're calling Python
                  functions in an inner loop. I think you're going to need to write a
                  solution that doesn't call functions (function unrolling, so to speak).
                  Instead of returning a memoizer function, just use a dict of dicts.
                  --
                  Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

                  Weinberg's Second Law: If builders built buildings the way programmers wrote
                  programs, then the first woodpecker that came along would destroy civilization.


                  • Michael Hudson

                    #10
                    Re: Speed: bytecode vs C API calls

                    "Terry Reedy" <tjreedy@udel.e du> writes:
                    > "Jacek Generowicz" <jacek.generowicz@cern.ch> wrote in message
                    > news:tyf7k171jbq.fsf@pcepsft001.cern.ch...
                    > > I have a program in which I make very good use of a memoizer:
                    > >
                    > > def memoize(callable):
                    > >     cache = {}
                    > >     def proxy(*args):
                    > >         try: return cache[args]
                    > >         except KeyError: return cache.setdefault(args, callable(*args))
                    > >     return proxy
                    > ...
                    > > I've got to the stage where my program is still not fast enough, and
                    > > calls to the memoizer proxy are topping the profiler output table. So I
                    > > thought I'd try to see whether I can speed it up by recoding it in C.
                    >
                    > Have you tried psyco on the above? Or rather, on the proxies?

                    I doubt that would help -- the largest overhead is likely the function
                    call, and I think calling across the psyco/Python boundary is
                    expensive.

                    Cheers,
                    mwh

                    --
                    Screaming 14-year-old boys attempting to prove to each other that
                    they are more 3133t than j00.
                    -- Reason #8 for quitting slashdot today, from


                    • Jacek Generowicz

                      #11
                      Re: Speed: bytecode vs C API calls

                      aahz@pythoncraft.com (Aahz) writes:
                      > Waitasec .... okay, I see what your problem is, assuming you're
                      > describing it correctly. The problem is that you're calling Python
                      > functions in an inner loop.

                      ... indeedy ...
                      > I think you're going to need to write a solution that doesn't call
                      > functions (function unrolling, so to speak). Instead of returning a
                      > memoizer function, just use a dict of dicts.

                      Sounds interesting. But how do I get the dictionary to use the
                      function which it is memoizing, to calculate values which haven't been
                      cached yet?

                      Hmm ... maybe by subclassing dict, I can add the necessary
                      functionality ... but even then I don't see how to avoid wrapping the
                      dictionary access in a function call ... and then we're back to square
                      one.


                      • Peter Otten

                        #12
                        Re: Speed: bytecode vs C API calls

                        Jacek Generowicz wrote:
                        > aahz@pythoncraft.com (Aahz) writes:
                        >
                        >> Waitasec .... okay, I see what your problem is, assuming you're
                        >> describing it correctly. The problem is that you're calling Python
                        >> functions in an inner loop.
                        >
                        > ... indeedy ...
                        >
                        >> I think you're going to need to write a solution that doesn't call
                        >> functions (function unrolling, so to speak). Instead of returning a
                        >> memoizer function, just use a dict of dicts.
                        >
                        > Sounds interesting. But how do I get the dictionary to use the
                        > function which it is memoizing, to calculate values which haven't been
                        > cached yet?

                        Guessing into the blue:

                        try:
                            result = cache[fun, args]
                        except KeyError:
                            result = cache[fun, args] = fun(*args)

                        should make a fast inner loop if cache misses are rare.
                        > Hmm ... maybe by subclassing dict, I can add the necessary
                        > functionality ... but even then I don't see how to avoid wrapping the
                        > dictionary access in a function call ... and then we're back to square
                        > one.

                        Subclassing dict will not make it faster. Maybe it's time to show some
                        inner-loop lookalike to the wider public for more to-the-point
                        suggestions...

                        Peter


                        • Terry Reedy

                          #13
                          Re: Speed: bytecode vs C API calls


                          "Jacek Generowicz" <jacek.generowi cz@cern.ch> wrote in message
                          news:tyfy8tmz7h q.fsf@pcepsft00 1.cern.ch...[color=blue]
                          > "Terry Reedy" <tjreedy@udel.e du> writes:[color=green]
                          > > Can any of your callables be memoized by list rather than dict?
                          > > (ie, any count or restricted int args?)[/color]
                          >
                          > You've lost me there.[/color]

                          Lookup tables. Dicts can memoize any function. Lists can memoize or
                          partially memoize functions with one or more periodic domains. Perhaps I am
                          being too obvious or simple for *you*, but I have seen people memoize fib()
                          or fac() with a generic dict-based wrapper, which is overkill compared to a
                          simple list.
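                          (A minimal sketch of the list idea for fib(), whose domain is the
                          non-negative integers; the example is illustrative:)

                          def fib(n, _cache=[0, 1]):
                              # the list index itself is the key; grow the table on demand
                              while len(_cache) <= n:
                                  _cache.append(_cache[-2] + _cache[-1])
                              return _cache[n]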

                          In another post, you said

                          > If I want to compete on speed with C++, I'm
                          > simply going to have to eliminate Python bytecode from my inner loops[*].
                          > [*] Unless my overall algorithm is completely wrong, of course.

                          For speed, you need to eliminate *executed* and *expensive* bytecodes, not
                          static, possibly skipped-over, bytecodes. I'll take that as what you meant.
                          It is certainly what our comments are aimed at. Hence my other suggestion
                          to eliminate a layer of (expensive) function calls by combining value
                          calculation and storage in one function (more space, less time). More
                          specific suggestions will require more specific examples.
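                          (A sketch of that combination -- cache storage inlined in the
                          function itself, so a hit costs one dict lookup and no extra call
                          layer; the function norm and its body are hypothetical:)

                          def norm(v, _cache={}):
                              # v is assumed hashable, e.g. a tuple of numbers
                              try:
                                  return _cache[v]
                              except KeyError:
                                  result = _cache[v] = sum([x * x for x in v]) ** 0.5
                                  return result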

                          Terry J. Reedy



                          • Aahz

                            #14
                            Function unrolling (was Re: Speed: bytecode vs C API calls)

                            In article <tyfd6ayylhy.fsf@pcepsft001.cern.ch>,
                            Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                            >aahz@pythoncraft.com (Aahz) writes:
                            >>
                            >> I think you're going to need to write a solution that doesn't call
                            >> functions (function unrolling, so to speak). Instead of returning a
                            >> memoizer function, just use a dict of dicts.
                            >
                            >Sounds interesting. But how do I get the dictionary to use the
                            >function which it is memoizing, to calculate values which haven't been
                            >cached yet?

                            Same way you decide which function to call currently. Essentially, you
                            do something like this:

                            def foo():
                                pass

                            def bar():
                                pass

                            lookup = {
                                'foo': {'func': foo, 'cache': {}},
                                'bar': {'func': bar, 'cache': {}},
                            }

                            for operation, args in WorkQueue:
                                curr = lookup[operation]
                                try:
                                    result = curr['cache'][args]
                                except KeyError:
                                    result = curr['cache'].setdefault(args, curr['func'](*args))

                            (Note: untested; you'll have to fill in the blanks)
                            --
                            Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

                            Weinberg's Second Law: If builders built buildings the way programmers wrote
                            programs, then the first woodpecker that came along would destroy civilization.


                            • Jacek Generowicz

                              #15
                              Re: Speed: bytecode vs C API calls

                              "Terry Reedy" <tjreedy@udel.e du> writes:
                              > "Jacek Generowicz" <jacek.generowicz@cern.ch> wrote in message
                              > news:tyfy8tmz7hq.fsf@pcepsft001.cern.ch...
                              > > "Terry Reedy" <tjreedy@udel.edu> writes:
                              > > > Can any of your callables be memoized by list rather than dict?
                              > > > (ie, any count or restricted int args?)
                              > >
                              > > You've lost me there.
                              >
                              > Lookup tables. Dicts can memoize any function. Lists can memoize or
                              > partially memoize functions with one or more periodic domains. Perhaps I am
                              > being too obvious or simple for *you*, but I have seen people memoize fib()
                              > or fac() with a generic dict-based wrapper, which is overkill compared to a
                              > simple list.

                              The arguments passed to the functions I have memoized are usually
                              types, so, no, I cannot memoize them by list.
                              > In another post, you said
                              >
                              > > If I want to compete on speed with C++, I'm
                              > > simply going to have to eliminate Python bytecode from my inner loops[*].
                              > > [*] Unless my overall algorithm is completely wrong, of course.
                              >
                              > For speed, you need to eliminate *executed* and *expensive* bytecodes, not
                              > static, possibly skipped-over, bytecodes. I'll take that as what you meant.
                              > It is certainly what our comments are aimed at. Hence my other suggestion
                              > to eliminate a layer of (expensive) function calls by combining value
                              > calculation and storage in one function (more space, less time).

                              You are suggesting that I hand-memoize my functions, rather than using
                              the generic memoizer I posted at the top of the thread, in order to
                              save, in the case of cache misses, the function call overhead to the
                              memoized function? If I had a high proportion of cache misses, then I
                              suspect that would be worthwhile. In my case, however, I doubt it
                              would have any appreciable effect. Almost all the calls to the
                              memoized functions are cache hits, so effectively there is no value
                              calculation ... it's pretty much all lookup, so my "expensive"
                              function is *effectively*:


                              def proxy(*args):
                                  try:
                                      return cache[args]
                                  except KeyError: pass

                              (where proxy is a closure over cache). Yes, there is something other
                              than "pass" in the handler, but we (essentially) never go there.

                              There's really not much left to play with (though I'm open to
                              suggestions), which is why I wanted to try recoding in C.
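                              (One way to go further without C is to hoist the cache out and
                              index it directly in the inner loop, so that a hit involves no
                              Python-level call at all; compute() and data here are
                              hypothetical:)

                              def compute(x, y):
                                  return x * y        # stand-in for the real callable

                              data = [(1, 2), (3, 4), (1, 2)]
                              cache = {}
                              for args in data:
                                  try:
                                      value = cache[args]
                                  except KeyError:
                                      value = cache[args] = compute(*args)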

