Speed: bytecode vs C API calls

  • Bengt Richter

    #31
Re: Speed: bytecode vs C API calls

On Tue, 9 Dec 2003 23:50:06 -0500, "Francis Avila" <francisgavila@yahoo.com> wrote:

>Stuart Bishop wrote in message ...
>>About the only thing that could be done then is:
>>
>>def memoize(callable):
>>    cache = {}
>>    def proxy(*args, _cache=cache):
>
>Not legal. *args must always come after default args. There's no way to do
    Never say never ;-)
    A function has the __get__ method, which is what getattr magic uses to make a bound method.
You can use this to put cache in the "self" position of a bound method, like
>>> def sq(x): return x*x
...
>>> def memoize(func): # callable is a builtin ;-)
...     def proxy(cache, *args):
...         try: return cache[args]
...         except KeyError: return cache.setdefault(args, func(*args))
...     return proxy.__get__({}, proxy.__class__)
...
>>> msq = memoize(sq)
>>> msq
<bound method function.proxy of {}>
>>> msq(3)
9
>>> msq
<bound method function.proxy of {(3,): 9}>
>>> msq(3)
9
>>> msq(2)
4
>>> msq
<bound method function.proxy of {(3,): 9, (2,): 4}>
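
To spell out what __get__ is doing here: a function is a descriptor, and its
__get__ method is exactly what attribute lookup uses to manufacture a bound
method, so calling it by hand freezes an arbitrary object into the first
argument slot. A minimal sketch of just that mechanism (the names are
illustrative, not part of the code above):

def f(obj, x):
    return (obj, x)

bound = f.__get__({'k': 1}, dict)   # dict stands in for proxy.__class__
assert bound(42) == ({'k': 1}, 42)  # the dict rides along as "self"

The class argument mostly just tags the resulting bound method; it is the
bound first argument that does the work.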

    Looking at the code, it might be worth timing, if you have 90%+ cache hits.
    I'd be curious to see what you get
>>> import dis
>>> dis.dis(msq)
  3           0 SETUP_EXCEPT            12 (to 15)
              3 LOAD_FAST                0 (cache)
              6 LOAD_FAST                1 (args)
              9 BINARY_SUBSCR
             10 RETURN_VALUE
             11 POP_BLOCK
             12 JUMP_FORWARD            41 (to 56)

  4     >>   15 DUP_TOP
             16 LOAD_GLOBAL              2 (KeyError)
             19 COMPARE_OP              10 (exception match)
             22 JUMP_IF_FALSE           29 (to 54)
             25 POP_TOP
             26 POP_TOP
             27 POP_TOP
             28 POP_TOP
             29 LOAD_FAST                0 (cache)
             32 LOAD_ATTR                3 (setdefault)
             35 LOAD_FAST                1 (args)
             38 LOAD_DEREF               0 (func)
             41 LOAD_FAST                1 (args)
             44 CALL_FUNCTION_VAR        0
             47 CALL_FUNCTION            2
             50 RETURN_VALUE
             51 JUMP_FORWARD             2 (to 56)
        >>   54 POP_TOP
             55 END_FINALLY
        >>   56 LOAD_CONST               0 (None)
             59 RETURN_VALUE

>this without changing the calling characteristics of the function, which the
>OP has stated is vital.
    Should be ok, but I wonder about non-hashable args. Never expect them?
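
If unhashable arguments can turn up, one hedge (a sketch, not from the
original post) is to fall back to an uncached call when the key won't hash:

def memoize_tolerant(func):
    def proxy(cache, *args):
        try:
            return cache[args]
        except KeyError:
            return cache.setdefault(args, func(*args))
        except TypeError:
            # e.g. a list among the args: skip the cache entirely
            return func(*args)
    return proxy.__get__({}, proxy.__class__)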
>
>It is probably possible to custom-build a code object which makes _cache a
>local reference to the outer cache, and play with the bytecode to make the
>LOAD_DEREF into a LOAD_FAST, but it's not worth the effort. I don't know
>how fast/slow a LOAD_DEREF is compared to a LOAD_FAST, though. In any case,
>it won't help more than a few percent.
The above seems to do it for the cache-hit case ;-)
>
>Jacek, I think this is as fast as pure Python will get you with this
>approach. If the speed is simply not acceptable, there are some things you
>need to consider: is this the *real* bottleneck?
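
As for the LOAD_DEREF vs LOAD_FAST question above, the two opcodes are easy
to see side by side with dis; a small CPython-specific sketch (hypothetical
names):

import dis

def make_closure():
    cache = {}
    def proxy():
        return cache          # free variable: compiled to LOAD_DEREF
    return proxy

def make_default():
    cache = {}
    def proxy(_cache=cache):
        return _cache         # local variable: compiled to LOAD_FAST
    return proxy

dis.dis(make_closure())
dis.dis(make_default())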

    Regards,
    Bengt Richter


    • Jacek Generowicz

      #32
Re: Speed: bytecode vs C API calls

Peter Otten <__peter__@web.de> writes:

> See below for my ideas towards a generalized memoizing framework.

      [I haven't the time to have a look at it in any detail right now, so
      forgive me for not responding/commenting in full, just yet.]
> > def foo(some_type):
> >     ...
> >     return something
> >
> >
> > foo = memoize(foo)
> >
> > data = [int, float, my_class, his_class, str, other_class] * 10**6
> >
> > map(foo, data) [+]
>
> Do you discard the result for real?

      No, sorry, of course not.
> > [+] Can you see why I don't want to mess with the call site ?
>
> No.

      It would mean taking it out of the map and that would mean losing the
      C-coded loop that map provides.
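
Concretely, the two shapes of the call site come down to this (a sketch
restating the example quoted above, with foo and data as given there):

# Call site untouched: the loop over data runs in C inside map, and
# foo is simply rebound to its memoized wrapper.
results = map(foo, data)

# Touching the call site means a Python-level loop, executing
# bytecode for every element: exactly what map avoids.
results = []
for x in data:
    results.append(foo(x))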
> Now to my code. The idea is to make memoized functions available to
> different modules from a central registry, and to expose the cached results
> for hand-optimized hotspots. Otherwise it was all mentioned in this thread,
> if not the original post.

      I'll make sure to find the time to study this.
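
Roughly, I imagine the registry part of it looks something like this sketch
(hypothetical names, certainly not your actual code):

_registry = {}

def memoize(func):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, func(*args))
    _registry[func.__name__] = cache   # expose the cache itself
    return proxy

def exposed_cache(name):
    # Hand-optimized hot spots can index this dict directly,
    # bypassing even the proxy call.
    return _registry[name]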


      • Jacek Generowicz

        #33
Re: Speed: bytecode vs C API calls

bokr@oz.net (Bengt Richter) writes:

> On Tue, 9 Dec 2003 23:50:06 -0500, "Francis Avila" <francisgavila@yahoo.com> wrote:
>
> >Stuart Bishop wrote in message ...
> >>About the only thing that could be done then is:
> >>
> >>def memoize(callable):
> >>    cache = {}
> >>    def proxy(*args, _cache=cache):

        To be honest, I have trouble believing that a local variable lookup
        would be significantly faster than a closure lookup ... or is there
        some other point to this ?
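
For what it's worth, the standard timeit module can put a number on it; a
quick sketch (illustrative names; the absolute numbers will of course depend
on the machine):

import timeit

setup = '''
def make_closure():
    cache = {(3,): 9}
    def proxy():
        return cache[(3,)]    # closure lookup: LOAD_DEREF
    return proxy

def make_default():
    cache = {(3,): 9}
    def proxy(_cache=cache):
        return _cache[(3,)]   # local lookup: LOAD_FAST
    return proxy

c = make_closure()
d = make_default()
'''

print(min(timeit.Timer("c()", setup=setup).repeat(repeat=5, number=1000000)))
print(min(timeit.Timer("d()", setup=setup).repeat(repeat=5, number=1000000)))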
> >Not legal. *args must always come after default args. There's no way to do
> Never say never ;-)
> A function has the __get__ method, which is what getattr magic uses
> to make a bound method. You can use this to put cache in the "self"
> position of a bound method, like
>
> >>> def sq(x): return x*x
> ...
> >>> def memoize(func): # callable is a builtin ;-)
> ...     def proxy(cache, *args):
> ...         try: return cache[args]
> ...         except KeyError: return cache.setdefault(args, func(*args))
> ...     return proxy.__get__({}, proxy.__class__)

> Looking at the code, it might be worth timing, if you have 90%+ cache hits.
> I'd be curious to see what you get

        I get your version being almost imperceptibly slower.

        I used this:

import timing
import sys

def Amemoize(func):
    def proxy(cache, *args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, func(*args))
    return proxy.__get__({}, proxy.__class__)


def Bmemoize(func):
    cache = {}
    def proxy(*args):
        try: return cache[args]
        except KeyError: return cache.setdefault(args, func(*args))
    return proxy


N = int(sys.argv[1])

def foo(n):
    if n<2: return 1
    return foo(n-1) + foo(n-2)

Afoo = Amemoize(foo)
Bfoo = Bmemoize(foo)


Acode = """
def Acall():
    """ + "Afoo(10);" * N + "\n"

Bcode = """
def Bcall():
    """ + "Bfoo(10);" * N + "\n"

exec Acode
exec Bcode

timing.start()
Acall()
timing.finish()
print timing.micro()

timing.start()
Bcall()
timing.finish()
print timing.micro()


        • Bengt Richter

          #34
Re: Speed: bytecode vs C API calls

On 12 Dec 2003 11:17:16 +0100, Jacek Generowicz <jacek.generowicz@cern.ch> wrote:

>bokr@oz.net (Bengt Richter) writes:
>
>> On Tue, 9 Dec 2003 23:50:06 -0500, "Francis Avila" <francisgavila@yahoo.com> wrote:
>>
>> >Stuart Bishop wrote in message ...
>> >>About the only thing that could be done then is:
>> >>
>> >>def memoize(callable):
>> >>    cache = {}
>> >>    def proxy(*args, _cache=cache):
>
>To be honest, I have trouble believing that a local variable lookup
>would be significantly faster than a closure lookup ... or is there
>some other point to this ?
          No, I think you are right. I was mainly reacting to the "there's no way" statement ;-)
>
>> >Not legal. *args must always come after default args. There's no way to do
>> Never say never ;-)
>> A function has the __get__ method, which is what getattr magic uses
>> to make a bound method. You can use this to put cache in the "self"
>> position of a bound method, like
>>
>> >>> def sq(x): return x*x
>> ...
>> >>> def memoize(func): # callable is a builtin ;-)
>> ...     def proxy(cache, *args):
>> ...         try: return cache[args]
>> ...         except KeyError: return cache.setdefault(args, func(*args))
>> ...     return proxy.__get__({}, proxy.__class__)
>
>> Looking at the code, it might be worth timing, if you have 90%+ cache hits.
>> I'd be curious to see what you get
>
>I get your version being almost imperceptibly slower.
>
>I used this:
When things are that close, it's very hard to draw conclusions except that they are
probably close ;-) But to split hairs you probably have to run the two tests separately,
in quiescent system situations, so that the first is not improving or worsening the
environment for the second (i.e., in terms of interpreter state re memory, or even CPU
cache in some cases, though that usually only applies to the first time through a loop).
And also keep the fastest of 5-10 trials, to eliminate glitches due to spurious irrelevant
system activity. I don't have your "timing" module, so I don't know what it does. If it
randomly reordered multiloop trials of A vs B and took minimums, then maybe interactions
could be washed out, but otherwise A & B in the same test source leaves some ambiguity
about the real ranking (which of course will not matter, practically speaking ;-)
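
For what it's worth, the standard timeit module (new in 2.3) makes the
fastest-of-N-trials approach a one-liner per candidate; a sketch, with Afoo
and Bfoo as in your script, ideally run in separate processes:

import timeit

# Keep the minimum of several trials to suppress spurious system activity.
timer_A = timeit.Timer("Afoo(10)", setup="from __main__ import Afoo")
timer_B = timeit.Timer("Bfoo(10)", setup="from __main__ import Bfoo")
print(min(timer_A.repeat(repeat=10, number=100000)))
print(min(timer_B.repeat(repeat=10, number=100000)))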
>
>import timing
>import sys
          [...]

          Regards,
          Bengt Richter


          • Jacek Generowicz

            #35
Re: Speed: bytecode vs C API calls

bokr@oz.net (Bengt Richter) writes:

> On 12 Dec 2003 11:17:16 +0100, Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
>
> >bokr@oz.net (Bengt Richter) writes:
> >
> >To be honest, I have trouble believing that a local variable lookup
> >would be significantly faster than a closure lookup ... or is there
> >some other point to this ?
>
> No, I think you are right. I was mainly reacting to the "there's no
> way" statement ;-)

            Sure ... I was sort of answering to a few posts upthread, as I hadn't
            done that yet.
> >I get your version being almost imperceptibly slower.
>
> When things are that close, it's very hard to draw conclusions
> except that they are probably close ;-) But to split hairs you
> probably have to run the two tests separately, in quiescent system
> situations,

... when it's that close, I just don't care :-)

            When it's that close, the Python version, CPU, temperature, phase of
            the moon etc. are probably more significant than the algorithm
... and, most importantly, my time is better spent on finding more
            effective ways of speeding up my code.
> I don't have your "timing" module,

Are you sure? It's on every Python installation I currently have
access to, so I assumed it was in the standard distribution. Hmm
... OK, the library reference lists it under "Obsolete".
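
For the record, timeit (new in 2.3) is its supported replacement; a rough
equivalent of the start/finish/micro dance from my test script (a sketch,
assuming micro() reports elapsed microseconds):

import time

start = time.time()
Acall()
print(int((time.time() - start) * 1e6))   # microseconds, like timing.micro()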
