Trivial performance questions

  • Brian Patterson

    Trivial performance questions

    I have noticed in the book of words that hasattr works by calling getattr
    and raising an exception if no such attribute exists. If I need the value
    in any case, am I better off using getattr within a try statement myself, or
    is there some clever implementation enhancement which makes this a bad idea?

    i.e. should I prefer:
    if hasattr(self,"datum"):
        datum=getattr("datum")
    else:
        datum=None
        self.datum=None

    over:
    try:
        datum=self.getattr("datum")
    except:
        self.datum=None
        datum=None

    The concept of deliberately raising an error is still foreign to this python
    newbie, but I really like the trapping facilities. I just worry about the
    performance implications and memory usage of such things, especially since
    I'm writing for Zope.

    And while I'm here: Is there a difference in performance when checking:
    datum is None
    over:
    datum == None

    and similarly:
    if x is None or y is None:
    or:
    if None in (x,y):

    I appreciate that these are trivial in the extreme, but I seem to be writing
    dozens of them, and I may as well use the right one and squeeze what
    performance I can.

    Many thanks,
    Christopher Boomer.


  • Michael Hudson

    #2
    Re: Trivial performance questions

    "Brian Patterson" <bp@computastore.com> writes:

    > I have noticed in the book of words that hasattr works by calling
    > getattr and raising an exception if no such attribute exists. If I
    > need the value in any case, am I better off using getattr within a
    > try statement myself, or is there some clever implementation
    > enhancement which makes this a bad idea?
    >
    > i.e. should I prefer:
    > if hasattr(self,"datum"):
    >     datum=getattr("datum")
    > else:
    >     datum=None
    >     self.datum=None
    >
    > over:
    > try:
    >     datum=self.getattr("datum")
    > except:

    Don't use a bare "except:"; catch "except AttributeError:" instead.

    >     self.datum=None
    >     datum=None
    >
    > The concept of deliberately raising an error is still foreign to
    > this python newbie, but I really like the trapping facilities. I
    > just worry about the performance implications and memory usage of
    > such things, especially since I'm writing for Zope.

    Which of these runs fastest depends on how often you expect the
    attribute to exist. If it is usually missing, the first form (hasattr)
    will probably be quicker; if it is usually present, the second
    (try/except) will.

    However, there's a third form:

    datum = getattr(self, "datum", None)

    which is what you want here.
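
A minimal sketch of the three forms compared above, written in modern Python 3 syntax. The `Record` class and the function names are hypothetical and not part of the original post; the point is only that all three forms return the same value whether or not the attribute exists.

```python
class Record:
    """Hypothetical stand-in for the poster's class."""


def via_hasattr(obj):
    # Form 1: check first, then fetch.
    if hasattr(obj, "datum"):
        return getattr(obj, "datum")
    return None


def via_try(obj):
    # Form 2: just fetch, and catch only AttributeError (never a bare except).
    try:
        return obj.datum
    except AttributeError:
        return None


def via_default(obj):
    # Form 3: three-argument getattr with a default value.
    return getattr(obj, "datum", None)


empty = Record()
filled = Record()
filled.datum = 42

forms = (via_hasattr, via_try, via_default)
results_empty = [form(empty) for form in forms]
results_filled = [form(filled) for form in forms]
print(results_empty, results_filled)
```

None of the three forms stores anything back on the object; that side effect has to be added separately if it is wanted.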

    > And while I'm here: Is there a difference in performance when checking:
    > datum is None
    > over:
    > datum == None

    Yes (I think...), but you shouldn't care much about it.

    > and similarly:
    > if x is None or y is None:
    > or:
    > if None in (x,y):

    Pass. Time it if you really care (my bet's on the former being
    quicker).

    > I appreciate that these are trivial in the extreme, but I seem to be
    > writing dozens of them, and I may as well use the right one and
    > squeeze what performance I can.

    This is an unhelpful attitude. You're writing in Python after all!

    If profile shows some of this code to be a hotspot, *then* and only
    then is it an appropriate time to worry about such trivial performance
    gains.

    Cheers,
    mwh

    --
    The only problem with Microsoft is they just have no taste.
    -- Steve Jobs, (From _Triumph of the Nerds_ PBS special)
    and quoted by Aahz on comp.lang.python


    • Peter Hansen

      #3
      Re: Trivial performance questions

      Michael Hudson wrote:
      >
      > If profile shows some of this code to be a hotspot, *then* and only
      > then is it an appropriate time to worry about such trivial performance
      > gains.

      And never forget to include the second criterion for bothering to
      worry about performance: the code does not meet its performance
      requirements.

      Even if profiling shows you a hotspot (as it almost always would),
      you are still wasting your time if you don't actually *need* the
      code to be faster. Shaving a few seconds of runtime off a program
      that takes a minute to run is likely to be a waste of your time
      in the long run, especially when you consider how many times the
      program will have to be run to pay back the investment in
      optimization and the resultant increase in maintenance costs.

      And remember that use of Python in the first place includes an
      implicit acceptance that performance is not your biggest concern.

      -Peter


      • Duncan Booth

        #4
        Re: Trivial performance questions

        "Brian Patterson" <bp@computastore.com> wrote in
        news:bmopfb$sb5$1@titan.btinternet.com:

        > I have noticed in the book of words that hasattr works by calling
        > getattr and raising an exception if no such attribute exists. If I
        > need the value in any case, am I better off using getattr within a try
        > statement myself, or is there some clever implementation enhancement
        > which makes this a bad idea?

        If you were to inline the code, but you check the existence of an attribute
        in more than one place, then you would naturally extract the duplicated
        code out into a single function which you call from each location. The
        presence of 'hasattr' means the potentially duplicated code is already
        extracted into a support function, so you should use it *where appropriate*
        to make your code shorter & easier to read.

        >
        > i.e. should I prefer:
        > if hasattr(self,"datum"):
        >     datum=getattr("datum")
        > else:
        >     datum=None
        >     self.datum=None
        >
        > over:
        > try:
        >     datum=self.getattr("datum")
        > except:
        >     self.datum=None
        >     datum=None
        >

        Probably you should prefer:

        datum = getattr(self, "datum", None)

        although this doesn't have the side effect of setting self.datum if it was
        unset. Alternatively you could set self.datum every time with:

        self.datum = datum = getattr(self, "datum", None)
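
A small sketch of the difference between the two lines above, assuming a hypothetical `Record` class (not from the original post). The plain three-argument getattr leaves the object untouched, while the chained assignment also stores the result on the instance.

```python
class Record:
    """Hypothetical class used only for illustration."""


r = Record()

# Plain three-argument getattr: returns the default, leaves r unchanged.
datum = getattr(r, "datum", None)
untouched = not hasattr(r, "datum")

# Chained assignment: the same lookup, but the result is also stored on r.
r.datum = datum = getattr(r, "datum", None)
stored = hasattr(r, "datum")
print(untouched, stored)
```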

        --
        Duncan Booth duncan@rcp.co.uk
        int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
        "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?


        • Brian Patterson

          #5
          Re: Trivial performance questions

          >> I appreciate that these are trivial in the extreme, but I seem to be
          >> writing dozens of them, and I may as well use the right one and
          >> squeeze what performance I can.

          > This is an unhelpful attitude. You're writing in Python after all!

          I have never considered using the fastest available option to be an
          unhelpful attitude, especially when it does not impact on readability. It
          occurred to me that someone more knowledgeable might know whether there was a
          'right' answer to these trivial questions.

          However, it appears not. Sorry for wasting your time :(

          Thanks for the tip on the getattr default. This is much cleaner to read,
          almost certainly quicker, and will serve the purpose well. I had convinced
          myself that it was not available in 2.1.3.




          • Peter Otten

            #6
            Re: Trivial performance questions

            Brian Patterson wrote:

            > I have noticed in the book of words that hasattr works by calling getattr
            > and raising an exception if no such attribute exists. If I need the value
            > in any case, am I better off using getattr within a try statement myself,
            > or is there some clever implementation enhancement which makes this a bad
            > idea?

            In rare cases, i.e. when attribute access has side effects, there are not
            only differences in performance, but also in the result:

            class AskMeOnce(object):
                def __getattribute__(self, name):
                    result = object.__getattribute__(self, name)
                    # reading an attribute also deletes it
                    delattr(self, name)
                    return result


            t = AskMeOnce()
            t.color = "into the blue"

            #raises an exception
            #if hasattr(t, "color"):
            #    print(t.color)

            #works
            try:
                print(t.color)
            except AttributeError:
                pass

            :-)
            Peter


            • Michael Hudson

              #7
              Re: Trivial performance questions

              "Brian Patterson" <bp@computastore.com> writes:

              > >> I appreciate that these are trivial in the extreme, but I seem to be
              > >> writing dozens of them, and I may as well use the right one and
              > >> squeeze what performance I can.
              >
              > > This is an unhelpful attitude. You're writing in Python after all!
              >
              > I have never considered using the fastest available option to be an
              > unhelpful attitude, especially when it does not impact on readability.

              That's not what you said!

              > It occurred to me that someone more knowledgeable might know whether
              > there was a 'right' answer to these trivial questions.
              >
              > However, it appears not. Sorry for wasting your time :(

              Well, it was hardly a waste.

              > Thanks for the tip on the getattr default. This is much cleaner to read,
              > almost certainly quicker, and will serve the purpose well. I had convinced
              > myself that it was not available in 2.1.3.

              No, I think it's (at least) 1.5.2 vintage...

              Cheers,
              mwh

              --
              ARTHUR: Why should a rock hum?
              FORD: Maybe it feels good about being a rock.
              -- The Hitch-Hikers Guide to the Galaxy, Episode 8


              • Alex Martelli

                #8
                Re: Trivial performance questions

                Brian Patterson wrote:
                ...
                > newbie, but I really like the trapping facilities. I just worry about the
                > performance implications and memory usage of such things, especially since
                > I'm writing for Zope.
                >
                > And while I'm here: Is there a difference in performance when checking:
                > datum is None
                > over:
                > datum == None
                >
                > and similarly:
                > if x is None or y is None:
                > or:
                > if None in (x,y):
                >
                > I appreciate that these are trivial in the extreme, but I seem to be
                > writing dozens of them, and I may as well use the right one and squeeze
                > what performance I can.

                I see you've already been treated to almost all the standard "performance
                does not matter" arguments (pretty well presented). They're right (and
                I would have advanced them myself if others hadn't already done so quite
                competently), *BUT*...

                ...but, when you're wondering which of two equivalently readable and
                maintainable idioms is "the one obvious way to do it", there is
                nothing wrong with finding out the performance to help you. After
                all, which one is right is not necessarily obvious unless you're
                Dutch! To put it another way: there is nothing wrong in getting
                into the habit of always using one idiom over another when they appear
                to be equivalent; such stylistic uniformity can indeed often be
                preferable to choosing haphazardly in each case. And all other things
                being equal it IS better to choose, as one's habitual style, the
                microscopically faster one -- why not, after all?

                So, for this kind of task as well as for many others, what you
                need is timeit.py from Python 2.3. I'm not sure it's compatible
                with Python 2.1.3, which I understand you're constrained to use
                due to Zope -- I think so, but haven't tried. It's sure quite
                compatible with Python 2.2. I've copied it into my ~/bin and
                done a chmod +x, and now when I wonder about performance it's easy
                to check it (sometimes there are tricky parts, but not often); if
                I need to check for a specific release, I can explicitly say e.g.
                $ python2.2 ~/bin/timeit.py ...
                or whatever.
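
The same kind of measurement can also be made from inside a program with the standard `timeit` module. This is a rough sketch in Python 3 syntax rather than the exact commands used in this post; the helper name `best_time` and the loop counts are arbitrary choices, and the numbers will differ from machine to machine. It compares the two None checks from the original question.

```python
import timeit


def best_time(stmt, setup="pass", number=100_000, repeat=3):
    """Return the best per-loop time in microseconds (hypothetical helper)."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number * 1e6


# The setup string plays the role of the -s option of timeit.py.
eq_usec = best_time("datum == None", setup="datum = 23")
is_usec = best_time("datum is None", setup="datum = 23")
print(f"datum == None: {eq_usec:.3f} usec per loop")
print(f"datum is None: {is_usec:.3f} usec per loop")
```

Taking the minimum of several repeats, as timeit.py's "best of 3" does, reduces the influence of other activity on the machine.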

                So, for Python 2.3 on my machine:

                [alex@lancelot clean]$ timeit.py -c -s'datum=23' 'datum==None'
                1000000 loops, best of 3: 0.47 usec per loop
                [alex@lancelot clean]$ timeit.py -c -s'datum=23' 'datum is None'
                1000000 loops, best of 3: 0.29 usec per loop
                [alex@lancelot clean]$ timeit.py -c -s'datum=None' 'datum is None'
                1000000 loops, best of 3: 0.29 usec per loop
                [alex@lancelot clean]$ timeit.py -c -s'datum=None' 'datum == None'
                1000000 loops, best of 3: 0.41 usec per loop

                no doubt here, then: "datum is None" wins hands-down over
                "datum == None" whether datum is None or not. And indeed,
                it so happens that experienced Pythonistas generally prefer
                'is' for this specific test (this also has other reasons,
                such as the preference for words over punctuation, and the
                fact that if datum is an instance of a user-coded class
                there are no bounds to the complications its __eq__ or
                __cmp__ might cause, while 'is' doesn't run ANY such risk).
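
To illustrate the `__eq__` point, here is a deliberately pathological class (hypothetical, written for this example in Python 3 syntax). An equality test against None can be answered by user code, while an identity test cannot.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


datum = AlwaysEqual()

equals_none = (datum == None)  # noqa: E711 -- dispatches to AlwaysEqual.__eq__
is_none = (datum is None)      # identity check, cannot be overridden
print(equals_none, is_none)
```

Here `datum == None` is true even though `datum` is clearly not None, which is one reason `is None` is the safer habit.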

                Similarly:

                [alex@lancelot clean]$ timeit.py -c -s'x=1' -s'y='2 'None in (x,y)'
                1000000 loops, best of 3: 1 usec per loop
                [alex@lancelot clean]$ timeit.py -c -s'x=1' -s'y='2 'x is None or y is None'
                1000000 loops, best of 3: 0.48 usec per loop

                again, the form with more words and no punctuation (the more readable
                one by Pythonistas' usual tastes) is faster -- confirming it's the
                preferable style.

                These measurements also help put such things in perspective: we ARE
                after all talking about differences of 120 to 500 nanoseconds (on
                my 30-months-old box, a dinosaur by today's standards). Still, if
                they're executed in some busy inner loop, it MIGHT easily pile up to
                several milliseconds' worth, and, who knows - given that after all
                choosing a consistent style IS preferable, and that often the
                indications you get from these measurements will push you towards
                readability and Pythonicity, it doesn't seem a bad idea to me.

                Now, about the hasattr vs getattr issue...:

                [alex@lancelot clean]$ timeit.py -c 'hasattr([], "pop")'
                1000000 loops, best of 3: 0.95 usec per loop
                [alex@lancelot clean]$ timeit.py -c 'getattr([], "pop", None)'
                1000000 loops, best of 3: 1.11 usec per loop
                [alex@lancelot clean]$ timeit.py -c 'hasattr([], "pok")'
                100000 loops, best of 3: 2.4 usec per loop
                [alex@lancelot clean]$ timeit.py -c 'getattr([], "pok", None)'
                100000 loops, best of 3: 2.6 usec per loop

                you can see that three-args getattr always takes a tiny little
                bit longer than hasattr -- about 0.2 microseconds. More time for
                both when getting non-existent attributes, of course, since the
                exception is raised and handled in that case. But in any case,
                given that getattr has already done all the work you needed,
                while hasattr may be just the beginning (you still need to get
                the attribute if it's there), you also need to consider:

                [alex@lancelot clean]$ timeit.py -c '[].pop'
                1000000 loops, best of 3: 0.48 usec per loop

                and that attribute fetch takes 2-3 times as long as the
                speed-up of hasattr vs 3-args getattr. So, if the attribute
                will be present at least 30%-50% of the time, we could expect
                3-args getattr to be a winner; for rarely present
                attributes, though, hasattr may still be faster (by a tiny
                little bit).
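
The 30%-50% estimate can be checked with a little arithmetic on the timings quoted above. This sketch assumes a simple expected-cost model (my assumption, not something stated in the post): a hit with hasattr costs the check plus a plain attribute fetch, and everything else costs exactly the measured figure.

```python
# Timings in microseconds, copied from the measurements quoted above.
HASATTR_HIT, HASATTR_MISS = 0.95, 2.4
GETATTR_HIT, GETATTR_MISS = 1.11, 2.6
FETCH = 0.48  # plain attribute fetch needed after a successful hasattr


def cost_hasattr(p):
    """Expected cost when the attribute is present with probability p."""
    return p * (HASATTR_HIT + FETCH) + (1 - p) * HASATTR_MISS


def cost_getattr(p):
    """Expected cost of three-argument getattr for the same p."""
    return p * GETATTR_HIT + (1 - p) * GETATTR_MISS


# Solve cost_hasattr(p) == cost_getattr(p) for p.
saving_on_hit = (HASATTR_HIT + FETCH) - GETATTR_HIT  # getattr is cheaper on a hit
penalty_on_miss = GETATTR_MISS - HASATTR_MISS        # getattr is dearer on a miss
break_even = penalty_on_miss / (saving_on_hit + penalty_on_miss)
print(f"break-even presence rate: {break_even:.0%}")
```

Under this model the break-even point comes out at roughly 38%, inside the range given in the post; above it three-argument getattr is expected to win.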

                We can also measure the try/except approach:


                [alex@lancelot clean]$ timeit.py -c '
                try: [].pop
                except AttributeError: pass
                '
                1000000 loops, best of 3: 0.6 usec per loop

                [alex@lancelot clean]$ timeit.py -c '
                try: [].pok
                except AttributeError: pass
                '
                100000 loops, best of 3: 8.1 usec per loop

                If the exception doesn't occur try/except is quite fast,
                but, if it does, it's far slower than any of the others.
                So, if performance matters, it should only be considered
                if the attribute is _overwhelmingly_ more likely to be
                present than absent.

                We can put together these solutions in small functions,
                e.g. a.py:

                def hasattr_pop(obj=[]):
                    if hasattr(obj, 'pop'):
                        return obj.pop
                    else:
                        return None

                def getattr_pop(obj=[]):
                    return getattr(obj, 'pop', None)

                def tryexc_pop(obj=[]):
                    try: return obj.pop
                    except AttributeError: return None

                and similarly for pok instead of pop. Now:

                [alex@lancelot clean]$ timeit.py -c -s'import a' 'a.hasattr_pop()'
                100000 loops, best of 3: 2.1 usec per loop
                [alex@lancelot clean]$ timeit.py -c -s'import a' 'a.getattr_pop()'
                100000 loops, best of 3: 1.9 usec per loop
                [alex@lancelot clean]$ timeit.py -c -s'import a' 'a.tryexc_pop()'
                1000000 loops, best of 3: 1.46 usec per loop

                for an attribute that's present, small advantage to the try/except,
                getattr by a nose faster than the hasattr check. But:

                [alex@lancelot clean]$ timeit.py -c -s'import a' 'a.hasattr_pok()'
                100000 loops, best of 3: 3.4 usec per loop
                [alex@lancelot clean]$ timeit.py -c -s'import a' 'a.getattr_pok()'
                100000 loops, best of 3: 3.5 usec per loop
                [alex@lancelot clean]$ timeit.py -c -s'import a' 'a.tryexc_pok()'
                100000 loops, best of 3: 12.3 usec per loop

                here, by the tiniest of margins, hasattr beats getattr -- and
                try/except is the pits.

                So, in this case, I don't think a single approach can be
                universally recommended. getattr is most compact; but you
                should also keep in your quiver the try/except, for those
                extremely performance-sensitive cases where the attribute
                will almost always be present, AND the hasattr as the best
                compromise for attributes that are absent some reasonable
                percentage of the time AND the default value takes effort to
                construct (by using None as the default we've favoured the
                getattr approach, which always constructs the default object,
                by giving it a very cheap-to-construct one -- try with some
                default object that DOES take work to build, and you'll see).
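
A sketch of the "expensive default" point, using a hypothetical `expensive_default` function that only counts how often it is called. Because the third argument of getattr is an ordinary argument, it is evaluated before the lookup, even when the attribute is present.

```python
calls = []


def expensive_default():
    """Hypothetical default that is costly to build; here we just count calls."""
    calls.append(1)
    return {"built": True}


class Record:
    datum = "present"


r = Record()

# getattr evaluates its third argument before the lookup, even on a hit.
value = getattr(r, "datum", expensive_default())
built_by_getattr = len(calls)

# hasattr lets us build the default only when it is really needed.
calls.clear()
value2 = r.datum if hasattr(r, "datum") else expensive_default()
built_by_hasattr = len(calls)
print(built_by_getattr, built_by_hasattr)
```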

                That much being said, you'll almost always see me using
                getattr for this -- it's just too compact, handy, readable --
                I'll optimize it out only when dealing with a real bottleneck,
                or avoid using it when the effort of constructing the default
                object is "obviously" quite big.


                Alex


                • Peter Hansen

                  #9
                  Re: Trivial performance questions

                  Alex Martelli wrote:
                  >
                  > So, for this kind of task as well as for many others, what you
                  > need is timeit.py from Python 2.3. I'm not sure it's compatible
                  > with Python 2.1.3, which I understand you're constrained to use
                  > due to Zope -- I think so, but haven't tried. It's sure quite
                  > compatible with Python 2.2.

                  It is also quite compatible with Python 2.0, based on running
                  successfully several of your following examples, so one would
                  assume it will also run fine with Python 2.1.

                  A quick inspection of the code backs up the empirical evidence,
                  showing no Python 2.2+ dependencies that don't have automatic
                  fallbacks (as with the attempt to include itertools).

                  (Thanks for the tutorial on timeit.py Alex. I've finally stuck
                  it in all my older Python installations, after your repeated
                  helpful promptings!)

                  -Peter


                  • Geoff Gerrietts

                    #10
                    Re: Trivial performance questions

                    Quoting Brian Patterson (bp@computastore.com):
                    > I have noticed in the book of words that hasattr works by calling getattr
                    > and raising an exception if no such attribute exists. If I need the value
                    > in any case, am I better off using getattr within a try statement myself, or
                    > is there some clever implementation enhancement which makes this a bad idea?
                    >
                    > i.e. should I prefer:
                    > if hasattr(self,"datum"):
                    >     datum=getattr("datum")
                    > else:
                    >     datum=None
                    >     self.datum=None
                    >
                    > over:
                    > try:
                    >     datum=self.getattr("datum")
                    > except:
                    >     self.datum=None
                    >     datum=None
                    >
                    > The concept of deliberately raising an error is still foreign to
                    > this python newbie, but I really like the trapping facilities. I
                    > just worry about the performance implications and memory usage of
                    > such things, especially since I'm writing for Zope.

                    Generally prefer
                    d = getattr(self, "datum", None)
                    if d is None:
                        self.datum = None

                    This won't always result in the fastest code though. In particular,
                    there's a slight performance edge to be had if you're doing lots of
                    lookups (many tens of thousands) and you expect a low percentage
                    (single digit? is that too many?) to be None. Then try/except becomes
                    faster. If you think you're likely to be in this situation, it should
                    be pretty trivial to write some test cases to find the actual tradeoff
                    point.
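
One possible shape for such a test, as a sketch only: the classes, function names and hit percentages below are made up for illustration, and the timings depend entirely on the interpreter and machine. It times three-argument getattr against try/except over mixes of objects with and without the attribute.

```python
import timeit


class Present:
    datum = 1


class Absent:
    pass


def with_getattr(obj):
    return getattr(obj, "datum", None)


def with_try(obj):
    try:
        return obj.datum
    except AttributeError:
        return None


def time_mix(func, hit_percent, number=200):
    """Time func over 100 objects, hit_percent of which have the attribute."""
    objs = [Present() for _ in range(hit_percent)]
    objs += [Absent() for _ in range(100 - hit_percent)]
    return timeit.timeit(lambda: [func(o) for o in objs], number=number)


results = {
    pct: (time_mix(with_getattr, pct), time_mix(with_try, pct))
    for pct in (50, 90, 99, 100)
}
for pct, (t_get, t_try) in results.items():
    print(f"{pct:3d}% present: getattr {t_get:.4f}s, try/except {t_try:.4f}s")
```

Reading down the output shows roughly where, if anywhere, try/except starts to win on a given setup.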

                    > And while I'm here: Is there a difference in performance when checking:
                    > datum is None
                    > over:
                    > datum == None

                    Generally prefer the former, but the difference is likely to be masked
                    by other factors.

                    > and similarly:
                    > if x is None or y is None:
                    > or:
                    > if None in (x,y):

                    I've preferred the latter thinking it was less work on the
                    interpreter, under the general premise that the code for the "in"
                    operation was one swatch of C, while is / or / is was three different
                    swatches of C with "python internals" gluing them together. My
                    analysis is obviously pretty surface though.

                    > I appreciate that these are trivial in the extreme, but I seem to be
                    > writing dozens of them, and I may as well use the right one and
                    > squeeze what performance I can.

                    Maybe I'm a heretic, but I think this is a healthy attitude to have.
                    If you can write it optimally the first time with no significant
                    increase in effort, then nobody's going to hafta come back and rewrite
                    it later: that's a big maintenance win.

                    --G.

                    --
                    Geoff Gerrietts <geoff at gerrietts net>
                    "A man can't be too careful in the choice of his enemies." --Oscar Wilde


                    • Peter Hansen

                      #11
                      Re: Trivial performance questions

                      Geoff Gerrietts wrote:
                      >
                      > Maybe I'm a heretic, but I think this is a healthy attitude to have.
                      > If you can write it optimally the first time with no significant
                      > increase in effort, then nobody's going to hafta come back and rewrite
                      > it later: that's a big maintenance win.

                      Not unless you add Alex' constraint that the two alternatives under
                      consideration are equally readable. Otherwise the less readable one
                      is always going to cost you more at maintenance time. And I'd add
                      my own constraint that you actually have to *need* the speed. Otherwise
                      even the "insignificant" increase in effort that it will cost you will
                      not be paying for itself.

                      http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast

                      Making it right means making it readable too. Optimization should
                      always come later, and not at all if you don't actually need it.

                      My group has invested almost thirty person-years writing Python code in
                      the last few years. To the best of my ability to recall, only two of
                      the tasks we've worked on in that time were directly related to
                      performance concerns and the resulting optimization for speed. Given
                      that the combined optimization efforts consumed perhaps a few weeks
                      of our time, we spend something like 0.4% of our time focusing on
                      performance. This seems to me a healthy amount.

                      (Curiously enough, when we coded more in C, I suspect we spent a
                      substantially larger amount of time caught up in performance issues.
                      This change is due merely to greater experience, not because of
                      the change in language, though the two are related.)

                      -Peter


                      • Geoff Gerrietts

                        #12
                        Re: Trivial performance questions

                        Quoting Peter Hansen (peter@engcorp.com):
                        >
                        > Not unless you add Alex' constraint that the two alternatives under
                        > consideration are equally readable. Otherwise the less readable one
                        > is always going to cost you more at maintenance time.

                        Yes to your first sentence, not so sure to the second. The implication
                        is the code will always be touched, and my contention is that if you
                        don't pay at least trivial attention to writing something optimal --
                        includes avoiding geometric algorithms -- then you're significantly
                        increasing the amount of maintenance work necessary.

                        Example: pulling out list.sort(lambda x, y: cmp(x[0],y[0])) and
                        putting in an abstract transform_sort is /only responsible/. The
                        list.sort(callable) idiom might be more readable to a novice -- it has
                        been to the novices I've worked with -- but its performance
                        implications on nontrivial lists are astonishing.
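
A sketch of the contrast, assuming that "transform sort" here means the decorate-sort-undecorate idiom; the post does not show its `transform_sort`, so nothing below is that function. It is written for Python 3, where `list.sort(callable)` no longer exists and `functools.cmp_to_key` stands in for it; the sample data is made up.

```python
from functools import cmp_to_key

records = [(3, "c"), (1, "a"), (2, "b"), (1, "z")]


def compare_first(x, y):
    # Equivalent of the old cmp(x[0], y[0]); run in Python for every comparison.
    return (x[0] > y[0]) - (x[0] < y[0])


# Comparison-function style.
by_cmp = sorted(records, key=cmp_to_key(compare_first))

# Decorate-sort-undecorate: build (key, index, item) tuples, sort them with the
# built-in tuple comparison, then strip the decoration. The index keeps the sort
# stable and means the items themselves are never compared.
decorated = [(item[0], i, item) for i, item in enumerate(records)]
decorated.sort()
by_dsu = [item for _, _, item in decorated]

# Modern shorthand for the same transform.
by_key = sorted(records, key=lambda item: item[0])
print(by_cmp == by_dsu == by_key)
```

The comparison function is called on the order of n log n times, while the key is computed only once per element, which is where the difference on nontrivial lists comes from.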

                        > And I'd add my own constraint that you actually have to *need* the
                        > speed. Otherwise even the "insignificant" increase in effort that
                        > it will cost you will not be paying for itself.

                        Capitalism has bred a real reliance on "good enough": when you hit
                        your payoff point, you don't go any farther. It's a useful metric to
                        apply, but a dangerous premise to base all your decisions on. "Good
                        enough" needs to be critically evaluated for both the short term and
                        the long term.

                        A half-million micro-optimizations may not pay for themselves
                        individually. But in the long term, when confronted with a total
                        system rewrite because the collected work can no longer perform
                        adequately, and standard optimization techniques have met with
                        diminishing returns, you're going to regret not having paid attention
                        the first time through, when you didn't hafta re-teach yourself what
                        the code is doing. The little bits where you're just /paying
                        attention/ to the performance implications of what you're doing
                        aggregate over time to reduce the maintenance overhead.

                        > http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast

                        It's an interesting formulation but it stinks of propaganda to me.
                        When generic catchphrases are re-interpreted by almost every viewer,
                        it's a pretty fair bet they're not precise enough to be really useful.
                        The discussion on this page makes me think of Biblical scholars
                        debating the meaning of ambiguous passages.

                        > Making it right means making it readable too. Optimization should
                        > always come later, and not at all if you don't actually need it.

                        I won't disagree with that.

                        > My group has invested almost thirty person-years writing Python code in
                        > the last few years. To the best of my ability to recall, only two of
                        > the tasks we've worked on in that time were directly related to
                        > performance concerns and the resulting optimization for speed. Given
                        > that the combined optimization efforts consumed perhaps a few weeks
                        > of our time, we spend something like 0.4% of our time focusing on
                        > performance. This seems to me a healthy amount.

                        My group has invested probably something like 15 person-years writing
                        Python code in the last few years. We have probably put about one of
                        those person years into trying to account for performance bottlenecks.
                        Management is presently of the opinion that a drastic rewrite is the
                        only way to resolve the remaining issues. Perhaps the most distinct
                        difference between your group and mine is that many of our developers
                        are fairly novice, and prone to select solutions that are not
                        well-informed about performance issues and algorithm complexity. On
                        the other hand, maybe our code is just more heavily used?

                        > (Curiously enough, when we coded more in C, I suspect we spent a
                        > substantially larger amount of time caught up in performance issues.
                        > This change is due merely to greater experience, not because of
                        > the change in language, though the two are related.)

                        Yes. Younger engineers tend to emphasize performance too much, because
                        it's a huge nebulous area that they don't understand, and which may
                        well bite them in the ass HARD. Older engineers can automatically
                        navigate through the most dangerous fields of landmines, and tend to
                        underemphasize performance, because the most important
                        aspects are habit and the less important aspects can be safely
                        ignored.

                        At first blush, I thought "maybe there's an equilibrium that needs to
                        be found". But I don't think so now. I think it's important for
                        younger (intermediate?) developers to be obsessed with performance, so
                        they can learn the dangers of bad algorithms, how to recognize them,
                        how to avoid them. And it's worth building good habits where you
                        choose an optimal idiom rather than a slower one.

                        You can disagree, but I've done a lot of reading and thinking on the
                        matter, in part because my experience and my beliefs have been at odds
                        in the past. Consequently, you're going to hafta try harder than
                        invoking the divine authority of Kent Beck (or even Knuth!) to
                        persuade me. Still, I can yet be persuaded; my mind is quite
                        tractable.

                        --G.

                        --
                        Geoff Gerrietts "I don't think it's immoral to want to
                        <geoff at gerrietts net> make money." -- Guido van Rossum


                        • Peter Hansen

                          #13
                          Re: Trivial performance questions

                          Geoff Gerrietts wrote:
                          >
                          > Quoting Peter Hansen (peter@engcorp.com):
                          > >
                          > > Not unless you add Alex' constraint that the two alternatives under
                          > > consideration are equally readable. Otherwise the less readable one
                          > > is always going to cost you more at maintenance time.
                          >
                          > Yes to your first sentence, not so sure to the second. The implication
                          > is the code will always be touched, and my contention is that if you
                          > don't pay at least trivial attention to writing something optimal --
                          > includes avoiding geometric algorithms -- then you're significantly
                          > increasing the amount of maintenance work necessary.

                          I won't disagree with most of that (we're rapidly reaching near total
                          agreement here! :-) but I do think that assuming "the code will always
                          be touched" is a very healthy attitude, in the same way you think that
                          at least trivial attention to performance is a healthy attitude.

                          We certainly have code that hasn't been touched during maintenance,
                          but nobody could have predicted which areas of the code that would be.

                          > Capitalism has bred a real reliance on "good enough": when you hit
                          > your payoff point, you don't go any farther. It's a useful metric to
                          > apply, but a dangerous premise to base all your decisions on. "Good
                          > enough" needs to be critically evaluated for both the short term and
                          > the long term.

                          As an XP team, we tend to consider that critical evaluation to be
                          the domain of the customer, so we basically don't worry about it
                          until there is feedback that we're doing the wrong thing. This,
                          in cooperation with the customer, makes the best use of our
                          resources (for which the customer is paying, in effect). But,
                          yeah, that's just the XP view of things.

                          > A half-million micro-optimizations may not pay for themselves

                          Phew! I seriously hope your group hasn't examined that many
                          pieces of code with performance concerns in mind! We don't have
                          even that many lines of code, let alone areas that could be
                          micro-optimized.

                          > individually. But in the long term, when confronted with a total
                          > system rewrite because the collected work can no longer perform
                          > adequately, and standard optimization techniques have met with
                          > diminishing returns, you're going to regret not having paid attention
                          > the first time through,

                          There's some truth in that, but I can't shake the nagging feeling
                          that simply by using Python, we've moved into a realm where the
                          best way to optimize a serious problem area is to rewrite in C
                          or Pyrex, or get a faster processor. (Like you, I can be
                          persuaded, but this is what _my_ experience has taught me.)

                          > > http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast
                          >
                          > It's an interesting formulation but it stinks of propaganda to me.
                          > When generic catchphrases are re-interpreted by almost every viewer
                          > it's a pretty fair bet they're not precise enough to be really useful.
                          > The discussion on this page makes me think of Biblical scholars
                          > debating the meaning of ambiguous passages.

                          Actually, it's probably just that re-interpretation and discussion
                          which proves so very useful, not the phrase itself. Like a Zen
                          koan or something, it's too short (or ambiguous) to have direct,
                          hard meaning, but the meme it carries is a valuable one with which
                          to be infected. ;-)

                          The same probably holds true about ambiguous biblical passages,
                          I hate to admit.

                          > My group has invested probably something like 15 person-years writing
                          > Python code in the last few years. We have probably put about one of
                          > those person years into trying to account for performance bottlenecks.
                          > Management is presently of the opinion that a drastic rewrite is the
                          > only way to resolve the remaining issues. Perhaps the most distinct
                          > difference between your group and mine is that many of our developers
                          > are fairly novice, and prone to select solutions that are not
                          > well-informed about performance issues and algorithm complexity. On
                          > the other hand, maybe our code is just more heavily used?

                          I'd vote for the latter. My group has been heavily junior in flavour.
                          Perhaps another cause of the difference is our greater (?) emphasis
                          on XP and test-driven development? I doubt anyone could say, but
                          for sure your code is more heavily used. I don't even need to know
                          what it does to say that. :-)

                          Maybe one example: we used += with strings a lot in the early days.
                          Partly junior developers, a greater part due to inexperience with
                          Python. I think only one or two bits of our code have been re-written
                          to use [].append() and ''.join() instead, because only those bits
                          came to the fore when performance was an issue. The rest is still
                          merrily chewing up CPU time doing wasteful += on strings, but nobody
                          cares. We refactor those (for consistency, mainly, I think) when we
                          get to them for other reasons, and new code probably doesn't use +=
                          so much, but that's about the extent of it.
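For anyone following along, the two idioms being compared look roughly like this. It is a minimal sketch assuming Python 3; the function names are made up for illustration and are not taken from anyone's code base.

```python
def concat_with_plus(parts):
    """Build a string with repeated +=.

    Each += may copy everything accumulated so far, so the total work
    can grow quadratically with the length of the result.
    """
    result = ""
    for part in parts:
        result += part
    return result


def concat_with_join(parts):
    """Collect the pieces in a list and join once at the end."""
    pieces = []
    for part in parts:
        pieces.append(part)
    return "".join(pieces)
```

Both return the same string. Newer CPython releases optimise some += cases in place, so the size of the gap is worth measuring rather than assuming.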

                          > At first blush, I thought "maybe there's an equilibrium that needs to
                          > be found". But I don't think so now. I think it's important for
                          > younger (intermediate?) developers to be obsessed with performance, so
                          > they can learn the dangers of bad algorithms, how to recognize them,
                          > how to avoid them. And it's worth building good habits where you
                          > choose an optimal idiom rather than a slower one.

                          I would agree that new developers would benefit from that kind of
                          experience. One of the few reasons why a (good) university or
                          college education can be of value to a programmer. So can critical
                          reading of some decent books or web pages on the topic.

                          > You can disagree, but I've done a lot of reading and thinking on the
                          > matter, in part because my experience and my beliefs have been at odds
                          > in the past. Consequently, you're going to hafta try harder than
                          > invoking the divine authority of Kent Beck (or even Knuth!) to
                          > persuade me. Still, I can yet be persuaded; my mind is quite
                          > tractable.

                          I think Kent is merely on a par with the Pope, but is not Himself
                          divine. ;-) Knuth is another story, perhaps. :-)

                          -Peter


                          • Paul Rubin

                            #14
                            Re: Trivial performance questions

                            Peter Hansen <peter@engcorp.com> writes:
                            > > individually. But in the long term, when confronted with a total
                            > > system rewrite because the collected work can no longer perform
                            > > adequately, and standard optimization techniques have met with
                            > > diminishing returns, you're going to regret not having paid attention
                            > > the first time through,
                            >
                            > There's some truth in that, but I can't shake the nagging feeling
                            > that simply by using Python, we've moved into a realm where the
                            > best way to optimize a serious problem area is to rewrite in C
                            > or Pyrex, or get a faster processor. (Like you, I can be
                            > persuaded, but this is what _my_ experience has taught me.)

                            That's not always either feasible or desirable. For example, I once
                            worked on the user interface of an ATM switch. It had to display a
                            connection list in sorted order, when they were stored in memory in
                            random order. It did this by finding the smallest numbered
                            connection, then the next smallest, etc., an O(N**2) algorithm which
                            worked fine when the switch was originally designed and could handle
                            no more than 16 connections or something like that, but which ate a
                            lot of not-too-plentiful embedded cpu time when hardware enhancements
                            made hundreds of connections possible. OK, you say, rip out that
                            algorithm and put in a better one. The problem is that the "sorting"
                            code was intimately intermixed with the selection code which banged on
                            the hardware registers and dealt with all kinds of fault conditions,
                            and the display code, which was knee deep in formatting cruft, and had
                            grown like a jungle over years of maintenance as new releases of the
                            hardware kept sprouting new features. In short it was typical
                            embedded code written by electrical (i.e. hardware) engineers who,
                            while they were not stupid people, just didn't have much understanding
                            of software technology or methodology. We are not talking about some
                            three-line loop like

                            concat = ''
                            for s in stringlist:
                                concat += s

                            that can be rewritten into a ''.join call. This UI module was 5000 or
                            so lines of extremely crufty code and there was no way to fix it
                            without a total rewrite. And a total rewrite couldn't ever be
                            scheduled, because there were always too many fires to put out in the
                            product. The module therefore got worse and worse. So that's a
                            real-world example of where a little bit more up-front design caution
                            would have saved an incredible amount of headache for years to come.
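Stripped of all the hardware and formatting cruft, the algorithmic core of that display code can be sketched as below. This is a Python illustration under assumptions: the original was embedded code in another language, and the names (sorted_by_repeated_min, sorted_builtin, connection_ids) are hypothetical.

```python
def sorted_by_repeated_min(connection_ids):
    """Find the smallest remaining id, emit it, and repeat: O(N**2)."""
    remaining = list(connection_ids)  # copy so the caller's list is untouched
    ordered = []
    while remaining:
        smallest = min(remaining)   # full scan of what is left
        remaining.remove(smallest)  # another scan to delete it
        ordered.append(smallest)
    return ordered


def sorted_builtin(connection_ids):
    """The O(N log N) replacement that is trivial when sorting is isolated."""
    return sorted(connection_ids)
```

With 16 connections the two are indistinguishable in practice; with hundreds, the first version does tens of thousands of comparisons per redraw. The swap is a one-liner here only because the sort is isolated from everything else, which is exactly what the real module lacked.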

                            And sure, there are all kinds of methodological platitudes about how
                            to stop that situation from happening, but they are based on wishful
                            thinking. They just do not always fit the real-world constraints that
                            real projects find imposed on them (e.g. that a complicated hardware
                            product is staffed mostly by hardware engineers, who bang out "grunt"
                            code without too much sense of how to organize large programs). All
                            you can do is recognize that you have a little bit of programming
                            sophistication available, and try to maximize your leverage in
                            applying it where it makes the most difference. Regardless of what
                            one thinks of C++, reading Stroustrup's C++ book after being through
                            experiences like the above makes it clear Stroustrup had had similar
                            experiences. It's visible in his book, how various design choices of
                            C++ were motivated by the tensions inherent in those experiences.


                            • Geoff Gerrietts

                              #15
                              Re: Trivial performance questions

                              Quoting Peter Hansen (peter@engcorp.com):
                              >
                              > I won't disagree with most of that (we're rapidly reaching near total
                              > agreement here! :-) but I do think that assuming "the code will always
                              > be touched" is a very healthy attitude, in the same way you think that
                              > at least trivial attention to performance is a healthy attitude.

                              Yes, I think we're pretty close to in accord here.

                              > As an XP team, we tend to consider that critical evaluation to be
                              > the domain of the customer, so we basically don't worry about it
                              > until there is feedback that we're doing the wrong thing. This,
                              > in cooperation with the customer, makes the best use of our
                              > resources (for which the customer is paying, in effect). But,
                              > yeah, that's just the XP view of things.

                              And I'm working from the perspective of an internal customer. But I
                              also think that with an external customer, special care ought to be
                              paid to those pieces of software that you don't plan to live
                              exclusively inside the project.

                              > > A half-million micro-optimizations may not pay for themselves
                              >
                              > Phew! I seriously hope your group hasn't examined that many
                              > pieces of code with performance concerns in mind! We don't have
                              > even that many lines of code, let alone areas that could be
                              > micro-optimized.

                              ....well, no, we haven't. But we are approaching that many lines of
                              code. And a good deal of it is naive code, none of which we will be
                              able to reclaim the lost performance from without more profound reason
                              to refactor. Some of it we probably should, but it's a challenge to
                              effectively profile our code.

                              > There's some truth in that, but I can't shake the nagging feeling
                              > that simply by using Python, we've moved into a realm where the
                              > best way to optimize a serious problem area is to rewrite in C
                              > or Pyrex, or get a faster processor. (Like you, I can be
                              > persuaded, but this is what _my_ experience has taught me.)

                              Probably some truth in that, too.

                              > Actually, it's probably just that re-interpretation and discussion
                              > which proves so very useful, not the phrase itself. Like a Zen
                              > koan or something, it's too short (or ambiguous) to have direct,
                              > hard meaning, but the meme it carries is a valuable one with which
                              > to be infected. ;-)
                              >
                              > The same probably holds true about ambiguous biblical passages,
                              > I hate to admit.

                              There's an ambiguous koan-like meme that I like to break out now and
                              again -- I think it's due to Robert Anton Wilson but the years have
                              not been kind to my respect for authority:

                              Any proposition is true in some way, false in some way, and in some
                              way not pertinent to the matter at hand at all.

                              Spend enough time with the meme and it justifies both sides of the
                              discussion.

                              > I'd vote for the latter. My group has been heavily junior in
                              > flavour. Perhaps another cause of the difference is our greater (?)
                              > emphasis on XP and test-driven development? I doubt anyone could
                              > say, but for sure your code is more heavily used. I don't even need
                              > to know what it does to say that. :-)

                              I'll believe you. :) We've scaled up to the point where we're happy
                              but bursting at the seams.

                              > I would agree that new developers would benefit from that kind of
                              > experience. One of the few reasons why a (good) university or
                              > college education can be of value to a programmer. So can critical
                              > reading of some decent books or web pages on the topic.

                              Yes. It's something of a rite of passage, in some ways. And maybe the
                              right way to respond to optimization questions is "focus on
                              algorithms, and learn which built-in constructs use lousy ones". I'm
                              not sure, but I find "thinking about optimization before your
                              processors melt is premature" to be more than a little disingenuous.

                              > I think Kent is merely on a par with the Pope, but is not Himself
                              > divine. ;-) Knuth is another story, perhaps. :-)

                              Great minds, but human -- all too human. ;)

                              --G.

                              --
                              Geoff Gerrietts <geoff at gerrietts net> http://www.gerrietts.net/
                              "Now, now my good man, this is no time for making enemies."
                              --Voltaire, on his deathbed, when asked to renounce Satan
