Speed: bytecode vz C API calls

  • Jacek Generowicz

    #16
    Re: Speed: bytecode vz C API calls

    Peter Otten <__peter__@web.de> writes:

    > Jacek Generowicz wrote:
    >
    > > aahz@pythoncraft.com (Aahz) writes:
    > >
    > >> I think you're going to need to write a solution that doesn't call
    > >> functions (function unrolling, so to speak). Instead of returning a
    > >> memoizer function, just use a dict of dicts.
    > >
    > > Sounds interesting. But how do I get the dictionary to use the
    > > function which it is memoizing, to calculate values which haven't been
    > > cached yet?
    >
    > Guessing into the blue:
    >
    > try:
    >     result = cache[fun, args]
    > except KeyError:
    >     result = cache[fun, args] = fun(args)

    Allow me to remind you of the memoizer I posted at the top of the thread:

    def memoize(callable):
        cache = {}
        def proxy(*args):
            try: return cache[args]
            except KeyError: return cache.setdefault(args, callable(*args))
        return proxy

    One key feature of this is that I can use it to replace any[*]
    function with a functionally equivalent but faster one, without having
    to change any of the call sites of the original.

    IIUC, Aahz suggested replacing the proxy function with a
    dictionary. This sounds reasonable, as, after all, the proxy maps its
    arguments to its return values, while a dictionary maps keys to
    values. The whole point of this is that function call overhead is
    large compared to dictionary lookup, which is (almost) all that the
    proxy does. However, it's the "almost" that creates the problem. If
    I'm prepared to mess around with the calls to the original functions
    and replace them with your suggestion, then, of course the problem is
    solved ... however, I _really_ do not want to replace the calls to the
    original ...
    > Maybe it's time to show some inner loop lookalike to the wider
    > public for more to the point suggestions...


    Sure.


    def foo(some_type):
        ...
        return something


    foo = memoize(foo)

    data = [int, float, my_class, his_class, str, other_class] * 10**6

    map(foo, data) [+]


    [*] ... with hashable arguments, etc. etc.

    [+] Can you see why I don't want to mess with the call site?
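
    (A sketch of the sticking point, reusing foo and data from the example
    above: a bare dict cannot simply take the proxy's place at the call
    site, because a dict is not callable.

    cache = {int: 'i', float: 'f'}     # pre-filled argument -> value map
    map(foo, data)                     # works: foo (the proxy) is callable
    # map(cache, data)                 # TypeError: 'dict' object is not callable
    map(cache.__getitem__, data)       # works, but raises KeyError on any
                                       # argument that was never cached

    That KeyError is exactly the "almost" discussed above.)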


    • Jacek Generowicz

      #17
      Re: Function unrolling (was Re: Speed: bytecode vz C API calls)

      aahz@pythoncraft.com (Aahz) writes:

      > In article <tyfd6ayylhy.fsf@pcepsft001.cern.ch>,
      > Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
      > > aahz@pythoncraft.com (Aahz) writes:
      > >>
      > >> I think you're going to need to write a solution that doesn't call
      > >> functions (function unrolling, so to speak). Instead of returning a
      > >> memoizer function, just use a dict of dicts.
           ^^^^^^^^

      (Maybe this is where we are misunderstanding each other. On first
      reading, I thought that "memoizer" was just a typo. I don't return a
      memoizer function, I return a memoized function.)
      > > Sounds interesting. But how do I get the dictionary to use the
      > > function which it is memoizing, to calculate values which haven't been
      > > cached yet?
      >
      > Same way you decide which function to call currently.

      The way I decide which function to call currently is to type its name
      directly into the source at the call site. Deciding which function to
      call is not the problem: calling any function at all, once I've
      reduced myself to having _just_ a dictionary, is the problem.

      Maybe I didn't make the use of "memoize" clear enough.

      You use it thus:

      def foo():
          ...

      foo = memoize(foo)

      And from now on foo is a lot faster than it was before[*].
      > Essentially, you do something like this:
      >
      > def foo():
      >     pass
      >
      > def bar():
      >     pass
      >
      > lookup = {
      >     'foo': {'func': foo, 'cache': {}},
      >     'bar': {'func': bar, 'cache': {}},
      > }
      >
      > for operation, args in WorkQueue:
      >     curr = lookup[operation]
      >     try:
      >         return curr['cache'][args]
      >     except KeyError:
      >         return curr['cache'].setdefault(args, curr['func'](*args))

      Nononononooooo! :-)

      In any given loop, I know _which_ function I want to call, so the
      dictionary of dictionaries is pointless. Let's recap the original
      memoizer:

      def memoize(callable):
          cache = {}
          def proxy(*args):
              try: return cache[args]
              except KeyError: return cache.setdefault(args, callable(*args))
          return proxy

      I thought that you were suggesting to replace "proxy" with just the
      dictionary "cache", in order to save the function call overhead, as
      proxy is (almost) just a wrapper around the dictionary lookup. That's
      great, but what do I do in the (very rare) cases where I'm presented
      with an argument/key I haven't seen yet? Somehow, I need to recognize
      that the key is not present, call the original function, and add the
      key-value pair to the cache. I can't see how to do that without
      re-introducing the function call. I could do it by inlining the try
      block, but then I wouldn't be able to use the memoized function inside
      map ... and turning a map into a for-loop would lose me more than I'd
      gained.


      [*] With a whole bunch of exceptions which are beside the point here.
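
      (To spell out the inlining option as a sketch: at a call site that is
      an explicit loop, the try block can stand in for the proxy call,
      assuming each element of data is an argument tuple:

      results = []
      for args in data:
          try: results.append(cache[args])
          except KeyError: results.append(cache.setdefault(args, foo(*args)))

      But there is no equivalent move inside map(foo, data), which is the
      form most of my call sites take.)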


      • Stuart Bishop

        #18
        Re: Speed: bytecode vz C API calls

        On 10/12/2003, at 8:55 AM, Jacek Generowicz wrote:
        > Peter Otten <__peter__@web.de> writes:
        >
        >> Jacek Generowicz wrote:
        >>
        >>> aahz@pythoncraft.com (Aahz) writes:
        >>>
        >>>> I think you're going to need to write a solution that doesn't call
        >>>> functions (function unrolling, so to speak). Instead of returning a
        >>>> memoizer function, just use a dict of dicts.
        >>>
        >>> Sounds interesting. But how do I get the dictionary to use the
        >>> function which it is memoizing, to calculate values which haven't
        >>> been cached yet?
        >>
        >> Guessing into the blue:
        >>
        >> try:
        >>     result = cache[fun, args]
        >> except KeyError:
        >>     result = cache[fun, args] = fun(args)
        >
        > Allow me to remind you of the memoizer I posted at the top of the
        > thread:
        >
        > def memoize(callable):
        >     cache = {}
        >     def proxy(*args):
        >         try: return cache[args]
        >         except KeyError: return cache.setdefault(args, callable(*args))
        >     return proxy
        >
        > One key feature of this is that I can use it to replace any[*]
        > function with a functionally equivalent but faster one, without having
        > to change any of the call sites of the original.

        > proxy does. However, it's the "almost" that creates the problem. If
        > I'm prepared to mess around with the calls to the original functions
        > and replace them with your suggestion, then, of course the problem is
        > solved ... however, I _really_ do not want to replace the calls to the
        > original ...

        About the only thing that could be done then is:

        def memoize(callable):
            cache = {}
            def proxy(*args, _cache=cache):
                try: return _cache[args]
                except KeyError: return cache.setdefault(args, callable(*args))
            return proxy

        Might help a little bit, although probably not enough.

        --
        Stuart Bishop <stuart@stuartbishop.net>




        • Francis Avila

          #19
          Re: Speed: bytecode vz C API calls

          Stuart Bishop wrote in message ...
          > About the only thing that could be done then is:
          >
          > def memoize(callable):
          >     cache = {}
          >     def proxy(*args, _cache=cache):

          Not legal: *args must always come after arguments with defaults. There's
          no way to do this without changing the calling characteristics of the
          function, which the OP has stated he must preserve.

          It is probably possible to custom-build a code object which makes _cache a
          local reference to the outer cache, and play with the bytecode to make the
          LOAD_DEREF into a LOAD_FAST, but it's not worth the effort. I don't know
          how fast/slow a LOAD_DEREF is compared to a LOAD_FAST, though. In any case,
          it won't help more than a few percent.
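
          (For the curious, the two opcodes are easy to see with the standard
          dis module; this sketch reaches the cache as a closure cell in one
          proxy and as a pre-bound local in the other:

          import dis

          def make_closure():
              cache = {}
              def proxy(args):
                  return cache[args]        # cache: LOAD_DEREF
              return proxy

          def make_default():
              cache = {}
              def proxy(args, _cache=cache):
                  return _cache[args]       # _cache: LOAD_FAST
              return proxy

          dis.dis(make_closure())
          dis.dis(make_default())

          Both versions are legal here only because proxy takes a fixed
          argument list rather than *args.)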

          Jacek, I think this is as fast as pure Python will get you with this
          approach. If the speed is simply not acceptable, there are some things you
          need to consider. Is this the *real* bottleneck?

          Consider this before you delve into C solutions, which greatly increase
          headaches. An easy way to check is to cut out the map and memoization and
          just use a dictionary preloaded with known argument:value pairs for a given
          set of test data.

          e.g.,

          def func(...):
              ...

          data = [foo, bar, ....]
          cache = dict([(i, func(i)) for i in data])

          def timetrial(_data=data, _cache=cache):
              return map(_cache.__getitem__, _data)
              # [operator.getitem(_cache, i) for i in _data] *might* be faster.

          then timeit.py -s"import ttrial" "ttrial.timetrial()"

          If it's still too slow, then either Python simply isn't for this task, or
          you need to optimize your functions.

          If this isn't the real bottleneck:
          - Perhaps the functions I'm memoizing need optimization themselves?
          - Are there any modules out there which implement these in C already?
          - How does the competing C++ code do it? Does it get its speed from some
          wicked data structure for the memoizing, or are the individual functions
          faster, or what?

          If this *is* the only significant bottleneck, but you want to try harder to
          speed it up, you can pursue the C closure path you were mentioning. With
          the caveat that I don't do any CPython hacking, I know that closures are
          implemented using co_cellvars and the LOAD_DEREF bytecode. The Python C API
          does describe an API for cell objects, so that will be the place to look, if
          it's possible to do a closure in C. It may not be faster than your
          straightforward approach, of course.

          However, comparing your Python to a C++ version on the basis of speed is
          most certainly the wrong way to go. If the C++ is well designed with good
          algorithms, it's going to be faster than Python. Period. The advantage of
          Python is that it's much faster and easier to *write* (and less likely to
          introduce bugs), easier to *read* and maintain, and can model your problem
          more intuitively and flexibly. But if you need maximum speed at all costs,
          the best Python can do for you is act as a glue language to bits and pieces
          of hand-crafted C.

          Do you really *need* the speed, or are you just bothered that it's not as
          fast as the C++ version?
          --
          Francis Avila

          Comment

          • Jacek Generowicz

            #20
            Re: Speed: bytecode vz C API calls

            "Francis Avila" <francisgavila@ yahoo.com> writes:
            [color=blue]
            > It is probably possible to custom-build a code object which makes _cache a
            > local reference to the outer cache, and play with the bytecode to make the
            > LOAD_DEREF into a LOAD_FAST, but it's not worth the effort.[/color]

            I'm tempted to agree.
            > Jacek, I think this is as fast as pure Python will get you with this
            > approach.

            That's pretty much what I thought ... which is why I was (reluctantly)
            investigating the C approach.
            > If the speed is simply not acceptable, there are some things you
            > need to consider. Is this the *real* bottleneck?

            Well, when I profile the code, proxy comes out at the top in terms of
            total time. Yes, there are other functions that are hotspots too. Yes,
            maybe I'm being completely stupid, and 99% of those calls are unnecessary.

            However, I simply thought that if I can increase the speed of "proxy"
            by a factor of 3 or 7 or something, without too much pain, by recoding
            it in C, then I'd get a noticeable speed gain overall. If it involves
            re-hacking the interpreter, then I won't bother.
            > Consider this before you delve into C solutions, which greatly increase
            > headaches.

            You don't need to tell me that :-)
            > An easy way to check is to cut out the map and memoization and
            > just use a dictionary preloaded with known argument:value pairs for a given
            > set of test data.
            >
            > e.g.,
            >
            > def func(...):
            >     ...
            >
            > data = [foo, bar, ....]
            > cache = dict([(i, func(i)) for i in data])
            >
            > def timetrial(_data=data, _cache=cache):
            >     return map(_cache.__getitem__, _data)
            >     # [operator.getitem(_cache, i) for i in _data] *might* be faster.
            >
            > then timeit.py -s"import ttrial" "ttrial.timetrial()"

            I tried something similar, namely inlining the

                try: return cache[args]
                except KeyError: return cache.setdefault(args, callable(*args))

            rather than using proxy. This gives a little over a factor of 3
            speedup, but it's not something I can use everywhere, as often the
            proxy is called inside map(...).
            > If it's still too slow, then either Python simply isn't for this task, or
            > you need to optimize your functions.
            >
            > If this isn't the real bottleneck:
            > - Perhaps the functions I'm memoizing need optimization themselves?

            But, to a pretty good approximation, they _never_ get called (the
            value is (almost) always looked up). How would optimizing a function
            that doesn't get called help?
            > - Are there any modules out there which implement these in C
            > already?

            Other than the competing C++ implementation, no.
            > - How does the competing C++ code do it? Does it get its speed from some
            > wicked data structure for the memoizing, or are the individual functions
            > faster, or what?

            I have about 10 functions, written in Python, which the profiler
            highlights as taking up most of the time, between them (of which
            "proxy" is the main culprit). The C++ version doesn't have _any_
            Python in its inner loops. I think it's as simple as that.

            (Admittedly, a group of 4 of my functions was initially written with
            development ease in mind, and should be replaced by a single mean and
            lean C(++) function. That's probably where the real bottleneck is.)
            > If this *is* the only significant bottleneck,

            No, it's not the _only_ bottleneck.

            Yes, I am pursuing other paths too. But I feel more confident that I
            know what I am doing in the other cases, so I won't bother the list
            with those. If you remember, my original question was about what I
            could expect in terms of speedup when recoding a Python function in C,
            which will make many calls to the Python C API. It's not that I'm
            determined to fix my whole speed problem just in this one area, but I
            would like to understand a bit more about what to expect when going to
            C.
            > but you want to try harder to speed it up, you can pursue the C
            > closure path you were mentioning.

            Yes, I think that this could be edifying. It's also likely to be a
            huge pain, so I'm not that determined to pursue it, as I do have more
            productive things to be getting on with.
            > However, comparing your Python to a C++ version on the basis of speed is
            > most certainly the wrong way to go. If the C++ is well designed with good
            > algorithms, it's going to be faster than Python. Period.

            Not if my inner loops are C.

            (BTW, I don't believe that the C++ version is all that wonderfully
            designed, but enough about that :-)
            > But if you need maximum speed at all costs, the best Python can do
            > for you is act as a glue language to bits and pieces of hand-crafted
            > C.

            I think that it's the other way around, in this case. I build up a
            whole structure in Python ... and just need C(++) to call a very small
            subset of the functionality I provide, very efficiently.
            > Do you really *need* the speed, or are you just bothered that it's not as
            > fast as the C++ version?

            Politically speaking, yes, I need the speed. If a silly benchmark (one
            which completely misrepresents reality, BTW), shows that the Python
            version is slower than the C++ one, then the Python one is terminated.

            However, as you point out:
            > The advantage of Python is that it's much faster and easier to
            > *write* (and less likely to introduce bugs), easier to *read* and
            > maintain, and can model your problem more intuitively and flexibly.

            .... and for these reasons I am very keen to ensure that the Python
            version survives ... which, ironically enough, means that it must be
            fast, even though I personally don't care that much about the speed.

            Comment

            • Terry Reedy

              #21
              Re: Speed: bytecode vz C API calls

              You presented an example function as taking a type (including new classes)
              as parameter. If your type domain is boundedly finite and preknowable*, you
              could precompute the function for *all* inputs and replace the call with a
              dict lookup (or a bound method, for map).

              (*) If 'preknowable', you could either create the classes with a factory
              function that adds them to the cache, or give them a metaclass that does
              the same.
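
              A sketch of the precomputation, with hypothetical names (foo and
              the type list stand in for the real function and domain):

              known = [int, float, str]                   # the whole, finite domain
              cache = dict([(t, foo(t)) for t in known])  # precompute *all* results
              results = map(cache.__getitem__, data)      # no Python-level call

              Any type missing from 'known' surfaces as a KeyError, which is the
              price of dropping the fallback call entirely.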

              Terry



              • Aahz

                #22
                Re: Speed: bytecode vz C API calls

                In article <tyf8yllyq51.fsf@pcepsft001.cern.ch>,
                Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                >
                > I tried something similar, namely inlining the
                >
                >     try: return cache[args]
                >     except KeyError: return cache.setdefault(args, callable(*args))
                >
                > rather than using proxy. This gives a little over a factor of 3
                > speedup, but it's not something I can use everywhere, as often the
                > proxy is called inside map(...).

                Why are you using map()? Generally speaking, map() won't be any faster
                than a for loop.
                --
                Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

                Weinberg's Second Law: If builders built buildings the way programmers wrote
                programs, then the first woodpecker that came along would destroy civilization.


                • Aahz

                  #23
                  Re: Function unrolling (was Re: Speed: bytecode vz C API calls)

                  [Too busy to snip, sorry]

                  In article <tyfvfopzkpr.fsf@lxplus025.cern.ch>,
                  Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                  > aahz@pythoncraft.com (Aahz) writes:
                  >> In article <tyfd6ayylhy.fsf@pcepsft001.cern.ch>,
                  >> Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                  >>> aahz@pythoncraft.com (Aahz) writes:
                  >>>>
                  >>>> I think you're going to need to write a solution that doesn't call
                  >>>> functions (function unrolling, so to speak). Instead of returning a
                  >>>> memoizer function, just use a dict of dicts.
                  >    ^^^^^^^^
                  >
                  > (Maybe this is where we are misunderstanding each other. On first
                  > reading, I thought that "memoizer" was just a typo. I don't return a
                  > memoizer function, I return a memoized function.)
                  >
                  >>> Sounds interesting. But how do I get the dictionary to use the
                  >>> function which it is memoizing, to calculate values which haven't been
                  >>> cached yet?
                  >>
                  >> Same way you decide which function to call currently.
                  >
                  > The way I decide which function to call currently is to type its name
                  > directly into the source at the call site. Deciding which function to
                  > call is not the problem: calling any function at all, once I've
                  > reduced myself to having _just_ a dictionary, is the problem.
                  >
                  > Maybe I didn't make the use of "memoize" clear enough.
                  >
                  > You use it thus:
                  >
                  > def foo():
                  >     ...
                  >
                  > foo = memoize(foo)
                  >
                  > And from now on foo is a lot faster than it was before[*].

                  That's why I changed the Subject: to "Function unrolling"; you're going
                  to need to change your code to avoid function calls in inner loops. One
                  technique you might try is to create a generalized inner loop function:

                  def innerloop(iterable, func, cache):
                      l = []
                      for args in iterable:
                          try:
                              l.append(cache[args])
                          except KeyError:
                              l.append(cache.setdefault(args, func(args)))
                      return l

                  Yes, it's a performance hack, but I bet it's a lot clearer than C++
                  code.
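
                  At a call site the change is then (sketch):

                  # before: results = map(func, data)
                  results = innerloop(data, func, cache)  # cache: a dict kept alongside func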
                  --
                  Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

                  Weinberg's Second Law: If builders built buildings the way programmers wrote
                  programs, then the first woodpecker that came along would destroy civilization.


                  • Jacek Generowicz

                    #24
                    Re: Speed: bytecode vz C API calls

                    aahz@pythoncraft.com (Aahz) writes:

                    > Generally speaking, map() won't be any faster than a for loop.

                    This is at odds with my own experience (including a lot of timing runs
                    in the last two days), countless posts on this newsgroup over the
                    years, and, for example, this article

                        http://www.python.org/doc/essays/list2str.html

                    by Guido, in which he concludes that

                    "An implied loop in map() is faster than an explicit for loop"


                    • Aahz

                      #25
                      Re: Speed: bytecode vz C API calls

                      In article <tyfu148y6ye.fsf@pcepsft001.cern.ch>,
                      Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                      > aahz@pythoncraft.com (Aahz) writes:
                      >>
                      >> Generally speaking, map() won't be any faster than a for loop.
                      >
                      > This is at odds with my own experience (including a lot of timing runs
                      > in the last two days), countless posts on this newsgroup over the
                      > years, and, for example, this article
                      >
                      >     http://www.python.org/doc/essays/list2str.html

                      All right, what I should have said is that map() isn't enough faster
                      than a for loop in the cases where it is faster to make much difference,
                      and there are definitely cases where it's slower than the equivalent for
                      loop, and it's usually less readable than some other way of getting
                      speed.
                      --
                      Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

                      Weinberg's Second Law: If builders built buildings the way programmers wrote
                      programs, then the first woodpecker that came along would destroy civilization.


                      • Jacek Generowicz

                        #26
                        Re: Function unrolling (was Re: Speed: bytecode vz C API calls)

                        aahz@pythoncraft.com (Aahz) writes:

                        > That's why I changed the Subject: to "Function unrolling"; you're going
                        > to need to change your code to avoid function calls in inner loops. One
                        > technique you might try is to create a generalized inner loop function:
                        >
                        > def innerloop(iterable, func, cache):
                        >     l = []
                        >     for args in iterable:
                        >         try:
                        >             l.append(cache[args])
                        >         except KeyError:
                        >             l.append(cache.setdefault(args, func(args)))
                        >     return l

                        OK, I see what you mean.

                        I agree, I can do this, in some places, and toy tests suggest that
                        getting rid of the function call could give me a factor of 3
                        improvement. I'll have to change the code in many places, but that's
                        simpler than going to C.

                        (However, we disagree on the specific formula you give above: I think
                        that map is faster: see other post in the thread. [A quick test
                        suggests that a map is faster than the loop by a factor of 2.])
                        > Yes, it's a performance hack, but I bet it's a lot clearer than C++
                        > code.

                        No doubt about that :-)


                        Cheers,


                        • Jacek Generowicz

                          #27
                          Re: Speed: bytecode vz C API calls

                          "Terry Reedy" <tjreedy@udel.e du> writes:
                          [color=blue]
                          > You presented an example function as taking a type (including new classes)
                          > as parameter. If your type domain is boundedly finite[/color]

                          It's unboundedly finite :-)
                          [color=blue]
                          > and preknowable*, you could precompute the function for *all* inputs
                          > and replace the call with a dict lookup (or bound method for map).
                          >
                          > (*) If 'preknowable', could either create classes with factory function that
                          > adds them to cache or give metaclass that does same.[/color]

                          Hmmm ... all the non-builtin types in my domain are actually
                          dynamically created, and are themselves instances of a metaclass,
                          which could be instructed to add information about the new classes
                          to the caches as the classes are being created.

                          However, there are a number of problems I foresee (which I don't think
                          are particularly interesting to discuss here). Still, it could be put
                          to good use in some of the cases. I'll bear it in mind, thanks.
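
                          A sketch of how that might look (metaclass and cache names
                          hypothetical, Python 2.3 style); the metaclass primes the
                          memoizer's cache as each class is created:

                          class Registering(type):
                              def __init__(cls, name, bases, ns):
                                  super(Registering, cls).__init__(name, bases, ns)
                                  # foo_cache and original_foo are assumed to exist;
                                  # the key matches proxy's cache[args] with args == (cls,)
                                  foo_cache[(cls,)] = original_foo(cls)

                          class Base(object):
                              __metaclass__ = Registering

                          Every class subsequently derived from Base would then be cached
                          before its first lookup.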

                          But first, I think that I'll try inlining the exception handler in
                          those cases where it can be done.


                          Cheers,


                          • Francis Avila

                            #28
                            Re: Speed: bytecode vz C API calls

                            Aahz wrote in message ...
                            > In article <tyfu148y6ye.fsf@pcepsft001.cern.ch>,
                            > Jacek Generowicz <jacek.generowicz@cern.ch> wrote:
                            >> aahz@pythoncraft.com (Aahz) writes:
                            >>     http://www.python.org/doc/essays/list2str.html
                            >
                            > All right, what I should have said is that map() isn't enough faster
                            > than a for loop in the cases where it is faster to make much difference,
                            > and there are definitely cases where it's slower than the equivalent for
                            > loop, and it's usually less readable than some other way of getting
                            > speed.

                            That essay is a bit old, so not everything it says may still be true.

                            But in 2.3 at least, one case in which map is almost always much faster is
                            when the function used is a builtin. In this case, map (which is also
                            builtin) calls the C function from C, and I'm pretty sure it gets this speed
                            by staying outside of the Python eval loop. This can be wickedly fast (in
                            Python terms), and faster than list comps or for-loops with explicit list
                            appending, as a simple timeit will show.

                            For *Python* functions and lambdas (non-builtins), map can very often be
                            slower than a for-loop or list comprehension, for reasons not quite clear to
                            me. And of course, where a for-loop or list comp can avoid calls
                            altogether, it will always be faster than the equivalent function-using map
                            (unless map uses None as the function).
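
                            For instance (a sketch; measure with timeit on your own build):

                            data = map(str, range(1000))
                            ints = map(int, data)               # builtin func: loop stays in C
                            ints = [int(x) for x in data]       # eval loop runs per item
                            ints = map(lambda x: int(x), data)  # Python-level call per item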

                            I think in the OP's case, any little drop of speed he can get more than
                            justifies the very, very slightly un-idiomatic use of map instead of a
                            for-loop. Be warned that map() may go away in future Pythons, though....
                            --
                            Francis Avila


                            • Jacek Generowicz

                              #29
                              Re: Speed: bytecode vz C API calls

                              "Francis Avila" <francisgavila@ yahoo.com> writes:
                              [color=blue]
                              > Aahz wrote in message ...[/color]

                              [about map]
                              [color=blue][color=green]
                              > > and it's usually less readable than some other way of getting
                              > > speed.[/color][/color]

                              Actually, I use map in my code because I find it very elegant and very
                              readable[*]. Usually, the speed has nothing to do with it. But if I
                              already have a map, and I'm interested in speed, then I'm loath to
                              replace it with something slower.
                              > I think in the OP's case, any little drop of speed he can get more than
                              > justifies the very, very slightly un-idiomatic use of map instead of a
                              > for-loop.

                              Why do you consider my use of map to be un-idiomatic? I want to
                              transform a sequence into another one, by applying a function to each
                              element of the original. This is exactly the point of map. What am I missing?
                              > Be warned that map() may go away in future Pythons, though....

                              Yes, I've heard idle rumours to this effect. I think it would be a
                              great shame.

                              Yes, I realize that there are list comprehensions. I appreciate both,
                              and find that sometimes one looks more natural than the other, and
                              sometimes it's the other way around. I find this ability to pick the
                              right-looking idiom worth more than a One Way To Do it principle.

                              [*] There seems to be a faction in the Python community which
                              considers map, reduce, filter and lambda to be evil infiltrators
                              from those perfidious inventions that are functional languages,
                              and therefore the claim is made that they have no place in
                              Python. Some newbies believe this **** and the myth spreads. Just
                              my 2 cents.


                              • Peter Otten

                                #30
                                Re: Speed: bytecode vz C API calls

                                Jacek Generowicz wrote:

                                > One key feature of this is that I can use it to replace any[*]
                                > function with a functionally equivalent but faster one, without having
                                > to change any of the call sites of the original.

                                Understood, but not until this post.
                                Strictly speaking, all optimizations that fulfill the above criterion
                                should be built into the compiler, while memoizing will not be, because
                                of its memory tradeoffs and dependency on the input data.
                                > IIUC, Aahz suggested replacing the proxy function with a
                                > dictionary. This sounds reasonable, as, after all, the proxy maps its
                                > arguments to its return values, while a dictionary maps keys to
                                > values. The whole point of this is that function call overhead is
                                > large compared to dictionary lookup, which is (almost) all that the
                                > proxy does. However, it's the "almost" that creates the problem. If
                                > I'm prepared to mess around with the calls to the original functions
                                > and replace them with your suggestion, then, of course the problem is
                                > solved ... however, I _really_ do not want to replace the calls to the
                                > original ...

                                See below for my ideas towards a generalized memoizing framework. I've
                                addressed the problem by writing a map() variant that knows about result
                                caches (essentially Aahz' innerloop(), I see). You could use it to shadow
                                the builtin.
                                > def foo(some_type):
                                >     ...
                                >     return something
                                >
                                > foo = memoize(foo)
                                >
                                > data = [int, float, my_class, his_class, str, other_class] * 10**6
                                >
                                > map(foo, data) [+]

                                Do you discard the result for real?
                                > [+] Can you see why I don't want to mess with the call site?

                                No.

                                Now to my code. The idea is to make memoized functions available to
                                different modules from a central registry, and to expose the cached results
                                for hand-optimized hotspots. Otherwise it was all mentioned in this thread,
                                if not the original post.

                                <memoize.py>
                                import __builtin__

                                class RegistryItem:
                                    """ provide the necessary infrastructure to transparently speed up
                                        one function
                                    """
                                    def __init__(self, func, argc):
                                        cache = self.cache = {}
                                        def memoized(*args):
                                            try:
                                                return cache[args]
                                            except KeyError:
                                                result = cache[args] = func(*args)
                                                return result

                                        assert argc > 0
                                        if argc == 1:
                                            def callseq(seq):
                                                print seq[:3]
                                                result = []
                                                append = result.append
                                                for arg in seq:
                                                    try:
                                                        append(cache[arg])
                                                    except KeyError:
                                                        value = cache[arg] = func(arg)
                                                        append(value)
                                                return result
                                        else:
                                            def callseq(*seqs):
                                                result = []
                                                append = result.append
                                                if len(seqs) == 1:
                                                    # multiple args as one tuple in the sequence
                                                    for args in seqs[0]:
                                                        try:
                                                            append(cache[args])
                                                        except KeyError:
                                                            value = cache[args] = func(*args)
                                                            append(value)
                                                else:
                                                    # multiple args in multiple sequences
                                                    # XXX special case (too lazy for now)
                                                    return map(memoized, *seqs)
                                                return result
                                        self.wrapped = memoized
                                        self.callseq = callseq
                                        self.original = func

                                class Registry:
                                    """ Registry for function variants using a result cache
                                        fast = register(slow, <number of args>)
                                    """
                                    def __init__(self):
                                        self.data = {}

                                    def __call__(self, func, argc):
                                        assert argc > 0
                                        try:
                                            # return the memoized variant on repeat registration
                                            return self.data[func].wrapped
                                        except KeyError:
                                            ri = RegistryItem(func, argc)
                                            self.data[func] = ri
                                            self.data[ri.wrapped] = ri
                                            return ri.wrapped

                                    def __getitem__(self, func):
                                        return self.data[func]

                                    def getmap(self):
                                        """ provide a cache-enhanced version of map
                                        """
                                        data = self.data
                                        def memomap(func, *seqs):
                                            try:
                                                callseq = data[func].callseq
                                            except KeyError:
                                                return __builtin__.map(func, *seqs)
                                            return callseq(*seqs)
                                        return memomap

                                registry = register = Registry()
                                </memoize.py>

                                <example.py>
                                # for improved clarity of this example I have omitted
                                # rebinding the original names and instead used fastXXX
                                import memoize, time

                                def slow(arg):
                                    print "no cache or cache miss with %r" % arg
                                    time.sleep(0.1)
                                    return "<%s>" % arg

                                sample = range(10) * 100

                                # preparation code

                                fast = memoize.register(slow, 1)
                                # typically used like so:
                                # slow = memoize.register(slow, 1)

                                fastmap = memoize.registry.getmap()
                                # could also be used to shadow the builtin:
                                # map = memoize.registry.getmap()


                                # some tests
                                assert fast is memoize.registry[fast].wrapped
                                assert fast is memoize.registry[slow].wrapped
                                assert slow is memoize.registry[slow].original


                                # stray function call
                                print slow(10)

                                # stray function call using cache
                                print fast(20)

                                # built-in loop support
                                fastmap(slow, sample)
                                fastmap(fast, sample)

                                # hand-crafted code using the cache
                                cache = memoize.registry[slow].cache
                                for arg in sample:
                                    cache[arg]  # pure lookup; assumes every arg is already cached
                                </example.py>

                                Feel free to play with the above. Be warned that there will be bugs.
                                However, before working on the code I will have to profile, to at least
                                make sure it does not slow things down... maybe this weekend.

                                Peter
