Trivial performance questions

**Michael Hudson** · Jul 18 '05, 03:56 AM

Re: Trivial performance questions

"Brian Patterson" <bp@computastore.com> writes:

> I have noticed in the book of words that hasattr works by calling
> getattr and raising an exception if no such attribute exists. If I
> need the value in any case, am I better off using getattr within a
> try statement myself, or is there some clever implementation
> enhancement which makes this a bad idea?
>
> i.e. should I prefer:
>     if hasattr(self,"datum"):
>         datum=getattr("datum")
>     else:
>         datum=None
>         self.datum=None
>
> over:
>     try:
>         datum=self.getattr("datum")
>     except:

Don't do that, do "except AttributeError: " instead.

>         self.datum=None
>         datum=None
>
> The concept of deliberately raising an error is still foreign to
> this python newbie, but I really like the trapping facilities. I
> just worry about the performance implications and memory usage of
> such things, especially since I'm writing for Zope.

The answer to which of these runs the fastest depends on how often you
expect the attribute to exist. If it's rarely there, the first form
will probably be quicker; otherwise the second.

However, there's a third form:

datum = getattr(self, "datum", None)

which is what you want here.

> And while I'm here: Is there a difference in performance when checking:
> datum is None
> over:
> datum == None

Yes (I think...), but you shouldn't care much about it.

> and similarly:
> if x is None or y is None:
> or:
> if None in (x,y):

Pass. Time it if you really care (my bet's on the former being
quicker).

> I appreciate that these are trivial in the extreme, but I seem to be
> writing dozens of them, and I may as well use the right one and
> squeeze what performance I can.

This is an unhelpful attitude. You're writing in Python after all!

If profile shows some of this code to be a hotspot, *then* and only
then is it an appropriate time to worry about such trivial performance
gains.
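
For what it's worth, here is a minimal sketch of that workflow using cProfile and pstats (cProfile joined the standard library after this thread was written; the older profile module has the same interface). The Record class and load_all function are hypothetical stand-ins for real application code, not anything from the original question.

```python
import cProfile
import io
import pstats


class Record:
    """Hypothetical stand-in for an object that may or may not carry `datum`."""


def load_all(records):
    # The idiom under discussion: fetch the attribute or fall back to None.
    return [getattr(r, "datum", None) for r in records]


def profile_report(func, *args):
    """Run func under the profiler and return the top of the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return out.getvalue()


print(profile_report(load_all, [Record() for _ in range(10_000)]))
```

Only if a function like load_all dominates the cumulative column of such a report is the choice of idiom inside it worth revisiting.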

Cheers,
mwh

--
The only problem with Microsoft is they just have no taste.
-- Steve Jobs, (From _Triumph of the Nerds_ PBS special)
and quoted by Aahz on comp.lang.python

**Peter Hansen** · Jul 18 '05, 03:56 AM

Michael Hudson wrote:
>
> If profile shows some of this code to be a hotspot, *then* and only
> then is it an appropriate time to worry about such trivial performance
> gains.

And never forget to include the second criterion for bothering to
worry about performance: the code does not meet its performance
requirements.

Even if profiling shows you a hotspot (as it almost always would),
you are still wasting your time if you don't actually *need* the
code to be faster. Shaving a few seconds of runtime off a program
that takes a minute to run is likely to be a waste of your time
in the long run, especially when you consider how many times the
program will have to be run to pay back the investment in
optimization and the resultant increase in maintenance costs.

And remember that use of Python in the first place includes an
implicit acceptance that performance is not your biggest concern.

-Peter

**Duncan Booth** · Jul 18 '05, 03:56 AM

"Brian Patterson" <bp@computastore.com> wrote in
news:bmopfb$sb5$1@titan.btinternet.com:

> I have noticed in the book of words that hasattr works by calling
> getattr and raising an exception if no such attribute exists. If I
> need the value in any case, am I better off using getattr within a try
> statement myself, or is there some clever implementation enhancement
> which makes this a bad idea?

If you were to inline the code, but you check the existence of an attribute
in more than one place, then you would naturally extract the duplicated
code out into a single function which you call from each location. The
presence of 'hasattr' means the potentially duplicated code is already
extracted into a support function, so you should use it *where appropriate*
to make your code shorter & easier to read.
>
> i.e. should I prefer:
>     if hasattr(self,"datum"):
>         datum=getattr("datum")
>     else:
>         datum=None
>         self.datum=None
>
> over:
>     try:
>         datum=self.getattr("datum")
>     except:
>         self.datum=None
>         datum=None
>
Probably you should prefer:

datum = getattr(self, "datum", None)

although this doesn't have the side effect of setting self.datum if it was
unset. Alternatively you could set self.datum every time with:

self.datum = datum = getattr(self, "datum", None)
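
A small sketch of the difference between those two spellings, written for current Python 3. The Thing class and the two helper names are made up for the illustration.

```python
class Thing:
    """Hypothetical class; instances start without a `datum` attribute."""


def read_only(obj):
    # Three-argument getattr: returns the default, leaves obj untouched.
    return getattr(obj, "datum", None)


def read_and_store(obj):
    # Chained assignment: also records the default on the instance.
    obj.datum = datum = getattr(obj, "datum", None)
    return datum


a, b = Thing(), Thing()
read_only(a)
read_and_store(b)
print(hasattr(a, "datum"))  # False
print(hasattr(b, "datum"))  # True
```

Which one is wanted depends on whether later code relies on the attribute existing on the instance.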

--
Duncan Booth duncan@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

**Brian Patterson** · Jul 18 '05, 03:56 AM

>> I appreciate that these are trivial in the extreme, but I seem to be
>> writing dozens of them, and I may as well use the right one and
>> squeeze what performance I can.

> This is an unhelpful attitude. You're writing in Python after all!

I have never considered using the fastest available option to be an
unhelpful attitude, especially when it does not impact on readability. It
occurred to me that someone more knowledgeable might know whether there was a
'right' answer to these trivial questions.

However, it appears not. Sorry for wasting your time :(

Thanks for the tip on the getattr default. This is much cleaner to read,
almost certainly quicker, and will serve the purpose well. I had convinced
myself that it was not available in 2.1.3.

**Peter Otten** · Jul 18 '05, 03:56 AM

Brian Patterson wrote:

> I have noticed in the book of words that hasattr works by calling getattr
> and raising an exception if no such attribute exists. If I need the value
> in any case, am I better off using getattr within a try statement myself,
> or is there some clever implementation enhancement which makes this a bad
> idea?

In rare cases, i.e. when attribute access has side effects, there are not
only differences in performance, but also in the result:

class AskMeOnce(object):
    def __getattribute__(self, name):
        result = object.__getattribute__(self, name)
        delattr(self, name)
        return result

t = AskMeOnce()
t.color = "into the blue"

#raises an exception
#if hasattr(t, "color"):
#    print(t.color)

#works
try:
    print(t.color)
except AttributeError:
    pass

:-)
Peter

**Michael Hudson** · Jul 18 '05, 03:56 AM

"Brian Patterson" <bp@computastore.com> writes:

> >> I appreciate that these are trivial in the extreme, but I seem to be
> >> writing dozens of them, and I may as well use the right one and
> >> squeeze what performance I can.
>
> > This is an unhelpful attitude. You're writing in Python after all!
>
> I have never considered using the fastest available option to be an
> unhelpful attitude, especially when it does not impact on readability.

That's not what you said!

> It occurred to me that someone more knowledgeable might know whether
> there was a 'right' answer to these trivial questions.
>
> However, it appears not. Sorry for wasting your time :(

Well, it was hardly a waste.

> Thanks for the tip on the getattr default. This is much cleaner to read,
> almost certainly quicker, and will serve the purpose well. I had convinced
> myself that it was not available in 2.1.3.

No, I think it's (at least) 1.5.2 vintage...

Cheers,
mwh

--
ARTHUR: Why should a rock hum?
FORD: Maybe it feels good about being a rock.
-- The Hitch-Hikers Guide to the Galaxy, Episode 8

**Alex Martelli** · Jul 18 '05, 03:56 AM

Brian Patterson wrote:
...
> newbie, but I really like the trapping facilities. I just worry about the
> performance implications and memory usage of such things, especially since
> I'm writing for Zope.
>
> And while I'm here: Is there a difference in performance when checking:
> datum is None
> over:
> datum == None
>
> and similarly:
> if x is None or y is None:
> or:
> if None in (x,y):
>
> I appreciate that these are trivial in the extreme, but I seem to be
> writing dozens of them, and I may as well use the right one and squeeze
> what performance I can.

I see you've already been treated to almost all the standard "performance
does not matter" arguments (pretty well presented). They're right (and
I would have advanced them myself if others hadn't already done so quite
competently), *BUT*...

....but, when you're wondering which of two equivalently readable and
maintainable idioms is "the one obvious way to do it", there is
nothing wrong with finding out the performance to help you. After
all, which one is right is not necessarily obvious unless you're
Dutch! To put it another way: there is nothing wrong in getting
into the habit of always using one idiom over another when they appear
to be equivalent; such stylistic uniformity can indeed often be
preferable to choosing haphazardly in each case. And all other things
being equal it IS better to choose, as one's habitual style, the
microscopically faster one -- why not, after all?

So, for this kind of task as well as for many others, what you
need is timeit.py from Python 2.3. I'm not sure it's compatible
with Python 2.1.3, which I understand you're constrained to use
due to Zope -- I think so, but haven't tried. It's sure quite
compatible with Python 2.2. I've copied it into my ~/bin and
done a chmod+x, and now when I wonder about performance it's easy
to check it (sometimes there are tricky parts, but not often); if
I need to check for a specific release, I can explicitly say e.g.
$ python2.2 ~/bin/timeit.py ...
or whatever.

So, for Python 2.3 on my machine:

[alex@lancelot clean]$ timeit.py -c -s'datum=23' 'datum==None'
1000000 loops, best of 3: 0.47 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'datum=23' 'datum is None'
1000000 loops, best of 3: 0.29 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'datum=None' 'datum is None'
1000000 loops, best of 3: 0.29 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'datum=None' 'datum == None'
1000000 loops, best of 3: 0.41 usec per loop

no doubt here, then: "datum is None" wins hands-down over
"datum == None" whether datum is None or not. And indeed,
it so happens that experienced Pythonistas generally prefer
'is' for this specific test (this also has other reasons,
such as the preference for words over punctuation, and the
fact that if datum is an instance of a user-coded class
there are no bounds to the complications its __eq__ or
__cmp__ might cause, while 'is' doesn't run ANY such risk).
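
To make that last point concrete, a minimal sketch in current Python 3. The AgreesWithEverything class is invented for the illustration; real classes rarely go this far, but nothing stops them.

```python
class AgreesWithEverything:
    """Hypothetical user class whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


datum = AgreesWithEverything()

print(datum == None)  # True: the comparison is delegated to __eq__
print(datum is None)  # False: identity cannot be overridden
```

So the two tests are not merely different in speed; with an unusual enough __eq__ they give different answers.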

Similarly:

[alex@lancelot clean]$ timeit.py -c -s'x=1' -s'y='2 'None in (x,y)'
1000000 loops, best of 3: 1 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'x=1' -s'y='2 'x is None or y is None'
1000000 loops, best of 3: 0.48 usec per loop

again, the form with more words and no punctuation (the more readable
one by Pythonistas' usual tastes) is faster -- confirming it's the
preferable style.

These measurements also help put such things in perspective: we ARE
after all talking about differences of 120 to 500 nanoseconds (on
my 30-months-old box, a dinosaur by today's standards). Still, if
they're executed in some busy inner loop, it MIGHT easily pile up to
several milliseconds' worth, and, who knows - given that after all
choosing a consistent style IS preferable, and that often the
indications you get from these measurements will push you towards
readability and Pythonicity, it doesn't seem a bad idea to me.
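
The same kind of measurement can also be driven from inside Python rather than the shell. This is a sketch for current Python 3 using timeit.repeat; the helper name best_usec and its default loop counts are my own choices, and the figures it prints will differ from the ones quoted above.

```python
import timeit


def best_usec(stmt, setup="pass", number=100_000, repeat=3):
    """Return the best-of-`repeat` time per loop, in microseconds."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number * 1e6


for stmt in ("datum == None", "datum is None"):
    print(f"{stmt!r}: {best_usec(stmt, setup='datum = 23'):.3f} usec per loop")
```

Taking the minimum of several runs mirrors the "best of 3" that timeit.py reports.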

Now, about the hasattr vs getattr issue...:

[alex@lancelot clean]$ timeit.py -c 'hasattr([], "pop")'
1000000 loops, best of 3: 0.95 usec per loop
[alex@lancelot clean]$ timeit.py -c 'getattr([], "pop", None)'
1000000 loops, best of 3: 1.11 usec per loop
[alex@lancelot clean]$ timeit.py -c 'hasattr([], "pok")'
100000 loops, best of 3: 2.4 usec per loop
[alex@lancelot clean]$ timeit.py -c 'getattr([], "pok", None)'
100000 loops, best of 3: 2.6 usec per loop

you can see that three-args getattr always takes a tiny little
bit longer than hasattr -- about 0.2 microseconds. More time for
both when getting non-existent attributes, of course, since the
exception is raised and handled in that case. But in any case,
given that getattr has already done all the work you needed,
while hasattr may be just the beginning (you still need to get
the attribute if it's there), you also need to consider:

[alex@lancelot clean]$ timeit.py -c '[].pop'
1000000 loops, best of 3: 0.48 usec per loop

and that attribute fetch consumes 2-3 times longer than the
speed-up of hasattr vs 3-args getattr. So, if the attribute
will be present at least 30%-50% of the time, we could expect
3-args getattr to be a winner; for rarely present
attributes, though, hasattr may still be faster (by a tiny
little bit).
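
The break-even point can be worked out from the figures above. This sketch plugs in the timings quoted in this post (they are specific to that machine and Python 2.3) and assumes the cost of each path is simply the weighted average of its hit and miss timings.

```python
# Timings quoted above, in microseconds per call.
HASATTR_HIT, HASATTR_MISS = 0.95, 2.4
GETATTR_HIT, GETATTR_MISS = 1.11, 2.6
FETCH = 0.48  # the extra obj.attr lookup after a successful hasattr


def expected_hasattr(p):
    """Mean cost of hasattr-then-fetch when the attribute exists with probability p."""
    return p * (HASATTR_HIT + FETCH) + (1 - p) * HASATTR_MISS


def expected_getattr(p):
    """Mean cost of three-argument getattr for the same p."""
    return p * GETATTR_HIT + (1 - p) * GETATTR_MISS


# Setting the two expressions equal and solving for p.
break_even = (GETATTR_MISS - HASATTR_MISS) / (
    (GETATTR_MISS - HASATTR_MISS) + (HASATTR_HIT + FETCH - GETATTR_HIT)
)
print(f"break-even presence rate: {break_even:.0%}")
```

With these numbers the crossover lands a little under 40%, inside the 30%-50% range estimated above.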

We can also measure the try/except approach:

[alex@lancelot clean]$ timeit.py -c '
try: [].pop
except AttributeError: pass
'
1000000 loops, best of 3: 0.6 usec per loop

[alex@lancelot clean]$ timeit.py -c '
try: [].pok
except AttributeError: pass
'
100000 loops, best of 3: 8.1 usec per loop

If the exception doesn't occur try/except is quite fast,
but, if it does, it's far slower than any of the others.
So, if performance matters, it should only be considered
if the attribute is _overwhelmingly_ more likely to be
present than absent.

We can put together these solutions in small functions,
e.g. a.py:

def hasattr_pop(obj=[]):
    if hasattr(obj, 'pop'):
        return obj.pop
    else:
        return None

def getattr_pop(obj=[]):
    return getattr(obj, 'pop', None)

def tryexc_pop(obj=[]):
    try: return obj.pop
    except AttributeError: return None

and similarly for pok instead of pop. Now:

[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.hasattr_pop()'
100000 loops, best of 3: 2.1 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.getattr_pop()'
100000 loops, best of 3: 1.9 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.tryexc_pop()'
1000000 loops, best of 3: 1.46 usec per loop

for an attribute that's present, small advantage to the try/except,
getattr by a nose faster than the hasattr check. But:

[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.hasattr_pok()'
100000 loops, best of 3: 3.4 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.getattr_pok()'
100000 loops, best of 3: 3.5 usec per loop
[alex@lancelot clean]$ timeit.py -c -s'import a' 'a.tryexc_pok()'
100000 loops, best of 3: 12.3 usec per loop

here, by the tiniest of margins, hasattr beats getattr -- and
try/except is the pits.

So, in this case, I don't think a single approach can be
universally recommended. getattr is most compact; but you
should also keep in your quiver the try/except, for those
extremely performance-sensitive cases where the attribute
will almost always be present, AND the hasattr as the best
compromise for attributes that are absent some reasonable
percentage of the time AND the default value takes effort to
construct (by using None as the default we've favoured the
getattr approach, which always constructs the default object,
by giving it a very cheap-to-construct one -- try with some
default object that DOES take work to build, and you'll see).
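
A sketch of that effect in current Python 3. The expensive_default function and the Holder class are hypothetical; the point is only that the default argument is evaluated before getattr is even called.

```python
calls = []


def expensive_default():
    """Hypothetical stand-in for a default that takes real work to build."""
    calls.append(1)
    return list(range(1000))


class Holder:
    pop = "already here"  # the attribute is present in this example


obj = Holder()

# The default argument is evaluated before getattr runs, even though unused.
value = getattr(obj, "pop", expensive_default())
eager_calls = len(calls)

# Checking first builds the default only when it is actually needed.
value = obj.pop if hasattr(obj, "pop") else expensive_default()
lazy_calls = len(calls) - eager_calls

print(eager_calls, lazy_calls)  # 1 0
```

When the default is None or another constant, none of this matters and plain getattr stays the tidier choice.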

That much being said, you'll almost always see me using
getattr for this -- it's just too compact, handy, readable --
I'll optimize it out only when dealing with a real bottleneck,
or avoid using it when the effort of constructing the default
object is "obviously" quite big.

Alex

**Peter Hansen** · Jul 18 '05, 03:57 AM

Alex Martelli wrote:
>
> So, for this kind of tasks as well as for many others, what you
> need is timeit.py from Python 2.3. I'm not sure it's compatible
> with Python 2.1.3, which I understand you're constrained to use
> due to Zope -- I think so, but haven't tried. It's sure quite
> compatible with Python 2.2.

It is also quite compatible with Python 2.0, based on running
successfully several of your following examples, so one would
assume it will also run fine with Python 2.1.

A quick inspection of the code backs up the empirical evidence,
showing no Python 2.2+ dependencies that don't have automatic
fallbacks (as with the attempt to include itertools).

(Thanks for the tutorial on timeit.py Alex. I've finally stuck
it in all my older Python installations, after your repeated
helpful promptings!)

-Peter

**Geoff Gerrietts** · Jul 18 '05, 03:57 AM

Quoting Brian Patterson (bp@computastore.com):
> I have noticed in the book of words that hasattr works by calling getattr
> and raising an exception if no such attribute exists. If I need the value
> in any case, am I better off using getattr within a try statement myself, or
> is there some clever implementation enhancement which makes this a bad idea?
>
> i.e. should I prefer:
>     if hasattr(self,"datum"):
>         datum=getattr("datum")
>     else:
>         datum=None
>         self.datum=None
>
> over:
>     try:
>         datum=self.getattr("datum")
>     except:
>         self.datum=None
>         datum=None
>
> The concept of deliberately raising an error is still foreign to
> this python newbie, but I really like the trapping facilities. I
> just worry about the performance implications and memory usage of
> such things, especially since I'm writing for Zope.

Generally prefer
d = getattr(self, "datum", None)
if d is None:
    self.datum = None

This won't always result in the fastest code though. In particular,
there's a slight performance edge to be had if you're doing lots of
lookups (many tens of thousands) and you expect a low percentage
(single digit? is that too many?) to be None. Then try/except becomes
faster. If you think you're likely to be in this situation, it should
be pretty trivial to write some test cases to find the actual tradeoff
point.
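
Such a test might look roughly like this in current Python 3. The Present and Absent classes, the helper names, and the chosen miss percentages are all made up for the sketch; the timings it prints depend entirely on the machine and interpreter, so no particular crossover is claimed here.

```python
import timeit


class Present:
    datum = 1


class Absent:
    pass


def with_getattr(obj):
    return getattr(obj, "datum", None)


def with_try(obj):
    try:
        return obj.datum
    except AttributeError:
        return None


def cost(func, miss_percent, rounds=200):
    """Time `func` over 100 objects, `miss_percent` of which lack `datum`."""
    objs = [Absent() for _ in range(miss_percent)]
    objs += [Present() for _ in range(100 - miss_percent)]
    return timeit.timeit(lambda: [func(o) for o in objs], number=rounds)


for miss_percent in (0, 1, 5, 10, 50):
    g = cost(with_getattr, miss_percent)
    t = cost(with_try, miss_percent)
    print(f"{miss_percent:3d}% missing: getattr {g:.4f}s  try/except {t:.4f}s")
```

Reading down the two columns shows at what miss rate, if any, one form overtakes the other on a given setup.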

> And while I'm here: Is there a difference in performance when checking:
> datum is None
> over:
> datum == None

Generally prefer the former, but the difference is likely to be masked
by other factors.

> and similarly:
> if x is None or y is None:
> or:
> if None in (x,y):

I've preferred the latter thinking it was less work on the
interpreter, under the general premise that the code for the "in"
operation was one swatch of C, while is / or / is was three different
swatches of C with "python internals" gluing them together. My
analysis is obviously pretty surface though.

> I appreciate that these are trivial in the extreme, but I seem to be
> writing dozens of them, and I may as well use the right one and
> squeeze what performance I can.

Maybe I'm a heretic, but I think this is a healthy attitude to have.
If you can write it optimally the first time with no significant
increase in effort, then nobody's going to hafta come back and rewrite
it later: that's a big maintenance win.

--G.

--
Geoff Gerrietts <geoff at gerrietts net>
"A man can't be too careful in the choice of his enemies." --Oscar Wilde

**Peter Hansen** · Jul 18 '05, 03:57 AM

Geoff Gerrietts wrote:
>
> Maybe I'm a heretic, but I think this is a healthy attitude to have.
> If you can write it optimally the first time with no significant
> increase in effort, then nobody's going to hafta come back and rewrite
> it later: that's a big maintenance win.

Not unless you add Alex' constraint that the two alternatives under
consideration are equally readable. Otherwise the less readable one
is always going to cost you more at maintenance time. And I'd add
my own constraint that you actually have to *need* the speed. Otherwise
even the "insignificant" increase in effort that it will cost you will
not be paying for itself.

http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast

Making it right means making it readable too. Optimization should
always come later, and not at all if you don't actually need it.

My group has invested almost thirty person-years writing Python code in
the last few years. To the best of my ability to recall, only two of
the tasks we've worked on in that time were directly related to
performance concerns and the resulting optimization for speed. Given
that the combined optimization efforts consumed perhaps a few weeks
of our time, we spend something like 0.4% of our time focusing on
performance. This seems to me a healthy amount.

(Curiously enough, when we coded more in C, I suspect we spent a
substantially larger amount of time caught up in performance issues.
This change is due merely to greater experience, not because of
the change in language, though the two are related.)

-Peter

**Geoff Gerrietts** · Jul 18 '05, 03:57 AM

Quoting Peter Hansen (peter@engcorp.com):
>
> Not unless you add Alex' constraint that the two alternatives under
> consideration are equally readable. Otherwise the less readable one
> is always going to cost you more at maintenance time.

Yes to your first sentence, not so sure to the second. The implication
is the code will always be touched, and my contention is that if you
don't pay at least trivial attention to writing something optimal --
includes avoiding geometric algorithms -- then you're significantly
increasing the amount of maintenance work necessary.

Example: pulling out list.sort(lambda x, y: cmp(x[0],y[0])) and
putting in an abstract transform_sort is /only responsible/. The
list.sort(callable) idiom might be more readable to a novice -- it has
been to the novices I've worked with -- but its performance
implications on nontrivial lists are astonishing.
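
For readers who haven't met the idiom, here is a sketch of what such a transform_sort could look like, written for current Python 3 (where list.sort no longer accepts a comparison function, so the slow variant goes through functools.cmp_to_key). The helper below is my own guess at the shape of the thing, not the code referred to above, and the key= argument shown last only arrived in Python 2.4, after this thread.

```python
from functools import cmp_to_key


def transform_sort(items, transform):
    """Decorate-sort-undecorate: sort `items` by transform(item).

    A hypothetical helper; the index in the middle keeps the sort stable and
    stops the items themselves from ever being compared.
    """
    decorated = [(transform(item), index, item) for index, item in enumerate(items)]
    decorated.sort()
    return [item for _, _, item in decorated]


def compare_first(x, y):
    # Python 3 spelling of cmp(x[0], y[0]); called once per comparison.
    return (x[0] > y[0]) - (x[0] < y[0])


rows = [(3, "c"), (1, "a"), (2, "b")]

slow = sorted(rows, key=cmp_to_key(compare_first))  # comparison callback
fast = transform_sort(rows, lambda row: row[0])     # one transform per item
builtin = sorted(rows, key=lambda row: row[0])      # what key= does for you

print(slow == fast == builtin)  # True
```

The comparison callback runs on the order of N log N times, each time through the interpreter, while the transform runs only N times; that is where the gap on large lists comes from.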

> And I'd add my own constraint that you actually have to *need* the
> speed. Otherwise even the "insignificant" increase in effort that
> it will cost you will not be paying for itself.

Capitalism has bred a real reliance on "good enough": when you hit
your payoff point, you don't go any farther. It's a useful metric to
apply, but a dangerous premise to base all your decisions on. "Good
enough" needs to be critically evaluated for both the short term and
the long term.

A half-million micro-optimizations may not pay for themselves
individually. But in the long term, when confronted with a total
system rewrite because the collected work can no longer perform
adequately, and standard optimization techniques have met with
diminishing returns, you're going to regret not having paid attention
the first time through, when you didn't hafta re-teach yourself what
the code is doing. The little bits where you're just /paying
attention/ to the performance implications of what you're doing
aggregate over time to reduce the maintenance overhead.

> http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast

It's an interesting formulation but it stinks of propaganda to me.
When generic catchphrases are re-interpreted by almost every viewer
it's a pretty fair bet they're not precise enough to be really useful.
The discussion on this page makes me think of Biblical scholars
debating the meaning of ambiguous passages.

> Making it right means making it readable too. Optimization should
> always come later, and not at all if you don't actually need it.

I won't disagree with that.

> My group has invested almost thirty person-years writing Python code in
> the last few years. To the best of my ability to recall, only two of
> the tasks we've worked on in that time was directly related to
> performance concerns and the resulting optimization for speed. Given
> that the combined optimization efforts consumed perhaps a few weeks
> of our time, we spend something like 0.4% of our time focusing on
> performance. This seems to me a healthy amount.

My group has invested probably something like 15 person-years writing
Python code in the last few years. We have probably put about one of
those person years into trying to account for performance bottlenecks.
Management is presently of the opinion that a drastic rewrite is the
only way to resolve the remaining issues. Perhaps the most distinct
difference between your group and mine is that many of our developers
are fairly novice, and prone to select solutions that are not
well-informed about performance issues and algorithm complexity. On
the other hand, maybe our code is just more heavily used?

> (Curiously enough, when we coded more in C, I suspect we spent a
> substantially larger amount of time caught up in performance issues.
> This change is due merely to greater experience, not because of
> the change in language, though the two are related.)

Yes. Younger engineers tend to emphasize performance too much, because
it's a huge nebulous area that they don't understand, and which may
well bite them in the ass HARD. Older engineers can automatically
navigate through the most dangerous fields of landmines, and tend to
underemphasize performance too much, because the most important
aspects are habit and the less important aspects can be safely
ignored.

At first blush, I thought "maybe there's an equilibrium that needs to
be found". But I don't think so now. I think it's important for
younger (intermediate?) developers to be obsessed with performance, so
they can learn the dangers of bad algorithms, how to recognize them,
how to avoid them. And it's worth building good habits where you
choose an optimal idiom rather than a slower one.

You can disagree, but I've done a lot of reading and thinking on the
matter, in part because my experience and my beliefs have been at odds
in the past. Consequently, you're going to hafta try harder than
invoking the divine authority of Kent Beck (or even Knuth!) to
persuade me. Still, I can yet be persuaded; my mind is quite
tractable.

--G.

--
Geoff Gerrietts "I don't think it's immoral to want to
<geoff at gerrietts net> make money." -- Guido van Rossum

**Peter Hansen** · Jul 18 '05, 03:57 AM

Geoff Gerrietts wrote:
>
> Quoting Peter Hansen (peter@engcorp.com):
> >
> > Not unless you add Alex' constraint that the two alternatives under
> > consideration are equally readable. Otherwise the less readable one
> > is always going to cost you more at maintenance time.
>
> Yes to your first sentence, not so sure to the second. The implication
> is the code will always be touched, and my contention is that if you
> don't pay at least trivial attention to writing something optimal --
> includes avoiding geometric algorithms -- then you're significantly
> increasing the amount of maintenance work necessary.

I won't disagree with most of that (we're rapidly reaching near total
agreement here! :-) but I do think that assuming "the code will always
be touched" is a very healthy attitude, in the same way you think that
at least trivial attention to performance is a healthy attitude.

We certainly have code that hasn't been touched during maintenance,
but nobody could have predicted which areas of the code that would be.

> Capitalism has bred a real reliance on "good enough": when you hit
> your payoff point, you don't go any farther. It's a useful metric to
> apply, but a dangerous premise to base all your decisions on. "Good
> enough" needs to be critically evaluated for both the short term and
> the long term.

As an XP team, we tend to consider that critical evaluation to be
the domain of the customer, so we basically don't worry about it
until there is feedback that we're doing the wrong thing. This,
in cooperation with the customer, makes the best use of our
resources (for which the customer is paying, in effect). But,
yeah, that's just the XP view of things.

> A half-million micro-optimizations may not pay for themselves

Phew! I seriously hope your group hasn't examined that many
pieces of code with performance concerns in mind! We don't have
even that many lines of code, let alone areas that could be
micro-optimized.

> individually. But in the long term, when confronted with a total
> system rewrite because the collected work can no longer perform
> adequately, and standard optimization techniques have met with
> diminishing returns, you're going to regret not having paid attention
> the first time through,

There's some truth in that, but I can't shake the nagging feeling
that simply by using Python, we've moved into a realm where the
best way to optimize a serious problem area is to rewrite in C
or Pyrex, or get a faster processor. (Like you, I can be
persuaded, but this is what _my_ experience has taught me.)

> > http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast
>
> It's an interesting formulation but it stinks of propaganda to me.
> When generic catchphrases are re-interpreted by almost every viewer
> its a pretty fair bet they're not precise enough to be really useful.
> The discussion on this page makes me think of Biblical scholars
> debating the meaning of ambiguous passages.

Actually, it's probably just that re-interpretation and discussion
which proves so very useful, not the phrase itself. Like a Zen
koan or something, it's too short (or ambiguous) to have direct,
hard meaning, but the meme it carries is a valuable one with which
to be infected. ;-)

The same probably holds true about ambiguous biblical passages,
I hate to admit.

> My group has invested probably something like 15 person-years writing
> Python code in the last few years. We have probably put about one of
> those person years into trying to account for performance bottlenecks.
> Management is presently of the opinion that a drastic rewrite is the
> only way to resolve the remaining issues. Perhaps the most distinct
> difference between your group and mine is that many of our developers
> are fairly novice, and prone to select solutions that are not
> well-informed about performance issues and algorithm complexity. On
> the other hand, maybe our code is just more heavily used?

I'd vote for the latter. My group has been heavily junior in flavour.
Perhaps another cause of the difference is our greater (?) emphasis
on XP and test-driven development? I doubt anyone could say, but
for sure your code is more heavily used. I don't even need to know
what it does to say that. :-)

Maybe one example: we used += with strings a lot in the early days.
Partly junior developers, a greater part due to inexperience with
Python. I think only one or two bits of our code have been re-written
to use [].append() and ''.join() instead, because only those bits
came to the fore when performance was an issue. The rest is still
merrily chewing up CPU time doing wasteful += on strings, but nobody
cares. We refactor that (for consistency, mainly, I think) when we
get to them for other reasons, and new code probably doesn't use +=
so much, but that's about the extent of it.
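
For anyone who hasn't seen the two styles side by side, a minimal sketch in current Python 3. The function names are invented, and the caveat applies that later CPython versions optimize some += cases, so measuring is the only way to know whether the rewrite pays off in a given spot.

```python
def build_with_concat(parts):
    # Each += may copy everything accumulated so far.
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    # Collect the pieces, then allocate the final string once.
    pieces = []
    for part in parts:
        pieces.append(part)
    return "".join(pieces)


parts = [str(n) for n in range(5)]
print(build_with_concat(parts))  # 01234
print(build_with_join(parts))    # 01234
```

Both produce the same string; only the amount of copying along the way differs.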

> At first blush, I thought "maybe there's an equilibrium that needs to
> be found". But I don't think so now. I think it's important for
> younger (intermediate?) developers to be obsessed with performance, so
> they can learn the dangers of bad algorithms, how to recognize them,
> how to avoid them. And it's worth building good habits where you
> choose an optimal idiom rather than a slower one.

I would agree that new developers would benefit from that kind of
experience. One of the few reasons why a (good) university or
college education can be of value to a programmer. So can critical
reading of some decent books or web pages on the topic.
[color=blue]
> You can disagree, but I've done a lot of reading and thinking on the
> matter, in part because my experience and my beliefs have been at odds
> in the past. Consequently, you're going to hafta try harder than
> invoking the divine authority of Kent Beck (or even Knuth!) to
> persuade me. Still, I can yet be persuaded; my mind is quite
> tractable.[/color]

I think Kent is merely on a par with the Pope, but is not Himself
divine. ;-) Knuth is another story, perhaps. :-)

-Peter

**Paul Rubin** · Jul 18 '05, 03:57 AM

Re: Trivial performance questions

Peter Hansen <peter@engcorp.com> writes:[color=blue][color=green]
> > individually. But in the long term, when confronted with a total
> > system rewrite because the collected work can no longer perform
> > adequately, and standard optimization techniques have met with
> > diminishing returns, you're going to regret not having paid attention
> > the first time through,[/color]
>
> There's some truth in that, but I can't shake the nagging feeling
> that simply by using Python, we've moved into a realm where the
> best way to optimize a serious problem area is to rewrite in C
> or Pyrex, or get a faster processor. (Like you, I can be
> persuaded, but this is what _my_ experience has taught me.)[/color]

That's not always either feasible or desirable. For example, I once
worked on the user interface of an ATM switch. It had to display a
connection list in sorted order, when they were stored in memory in
random order. It did this by finding the smallest numbered
connection, then the next smallest, etc., an O(N**2) algorithm which
worked fine when the switch was originally designed and could handle
no more than 16 connections or something like that, but which ate a
lot of not-too-plentiful embedded cpu time when hardware enhancements
made hundreds of connections possible. OK, you say, rip out that
algorithm and put in a better one. The problem is that the "sorting"
code was intimately intermixed with the selection code which banged on
the hardware registers and dealt with all kinds of fault conditions,
and the display code, which was knee deep in formatting cruft, and had
grown like a jungle over years of maintenance as new releases of the
hardware kept sprouting new features. In short it was typical
embedded code written by electrical (i.e. hardware) engineers who,
while they were not stupid people, just didn't have much understanding
of software technology or methodology. We are not talking about some
three-line loop like

    concat = ''
    for s in stringlist:
        concat += s

that can be rewritten into a ''.join call. This UI module was 5000 or
so lines of extremely crufty code and there was no way to fix it
without a total rewrite. And a total rewrite couldn't ever be
scheduled, because there were always too many fires to put out in the
product. The module therefore got worse and worse. So that's a
real-world example of where a little bit more up-front design caution
would have saved an incredible amount of headache for years to come.
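
For illustration only, here is a rough Python sketch of the algorithmic difference described above. The original was embedded code, not Python, and the function names and connection numbers here are hypothetical:

```python
def sorted_by_repeated_min(connections):
    """Find the smallest item, then the next smallest, and so on.

    Each pass scans everything that is left, so the total work
    grows as O(N**2) in the number of connections.
    """
    remaining = list(connections)
    ordered = []
    while remaining:
        smallest = min(remaining)      # full scan of what is left
        remaining.remove(smallest)     # another scan to remove it
        ordered.append(smallest)
    return ordered


def sorted_builtin(connections):
    """Let the built-in sort do the work in O(N log N)."""
    return sorted(connections)


# Hypothetical connection numbers, stored in no particular order.
connections = [7, 3, 12, 1, 9]
```

With 16 connections the two are indistinguishable in practice. With hundreds, the first one does far more comparisons, which is the kind of cost that hurts on a small embedded CPU. The swap is only this easy when the sorting is not tangled up with everything else.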

And sure, there are all kinds of methodological platitudes about how
to stop that situation from happening, but they are based on wishful
thinking. They just do not always fit the real-world constraints that
real projects find imposed on them (e.g. that a complicated hardware
product is staffed mostly by hardware engineers, who bang out "grunt"
code without too much sense of how to organize large programs). All
you can do is recognize that you have a little bit of programming
sophistication available, and try to maximize your leverage in
applying it where it makes the most difference. Regardless of what
one thinks of C++, reading Stroustrup's C++ book after being through
experiences like the above makes it clear Stroustrup had had similar
experiences. It's visible in his book, how various design choices of
C++ were motivated by the tensions inherent in those experiences.

**Geoff Gerrietts** · Jul 18 '05, 03:58 AM

Re: Trivial performance questions

Quoting Peter Hansen (peter@engcorp.com):[color=blue]
>
> I won't disagree with most of that (we're rapidly reaching near total
> agreement here! :-) but I do think that assuming "the code will always
> be touched" is a very healthy attitude, in the same way you think that
> at least trivial attention to performance is a healthy attitude.[/color]

Yes, I think we're pretty close to in accord here.
[color=blue]
> As an XP team, we tend to consider that critical evaluation to be
> the domain of the customer, so we basically don't worry about it
> until there is feedback that we're doing the wrong thing. This,
> in cooperation with the customer, makes the best use of our
> resources (for which the customer is paying, in effect). But,
> yeah, that's just the XP view of things.[/color]

And I'm working from the perspective of an internal customer. But I
also think that, with an external customer, special care ought to be
paid to those pieces of software that you don't expect to live
exclusively inside the project.
[color=blue][color=green]
> > A half-million micro-optimizations may not pay for themselves[/color]
>
> Phew! I seriously hope your group hasn't examined that many
> pieces of code with performance concerns in mind! We don't have
> even that many lines of code, let alone areas that could be
> micro-optimized.[/color]

...well, no, we haven't. But we are approaching that many lines of
code. And a good deal of it is naive code whose lost performance we
won't be able to reclaim without a more profound reason to refactor.
Some of it we probably should, but it's a challenge to profile our
code effectively.
[color=blue]
> There's some truth in that, but I can't shake the nagging feeling
> that simply by using Python, we've moved into a realm where the
> best way to optimize a serious problem area is to rewrite in C
> or Pyrex, or get a faster processor. (Like you, I can be
> persuaded, but this is what _my_ experience has taught me.)[/color]

Probably some truth in that, too.
[color=blue]
> Actually, it's probably just that re-interpretation and discussion
> which proves so very useful, not the phrase itself. Like a Zen
> koan or something, it's too short (or ambiguous) to have direct,
> hard meaning, but the meme it carries is a valuable one with which
> to be infected. ;-)
>
> The same probably holds true about ambiguous biblical passages,
> I hate to admit.[/color]

There's an ambiguous koan-like meme that I like to break out now and
again -- I think it's due to Robert Anton Wilson but the years have
not been kind to my respect for authority:

Any proposition is true in some way, false in some way, and in some
way not pertinent to the matter at hand at all.

Spend enough time with the meme and it justifies both sides of the
discussion.
[color=blue]
> I'd vote for the latter. My group has been heavily junior in
> flavour. Perhaps another cause of the difference is our greater (?)
> emphasis on XP and test-driven development? I doubt anyone could
> say, but for sure your code is more heavily used. I don't even need
> to know what it does to say that. :-)[/color]

I'll believe you. :) We've scaled up to the point where we're happy
but bursting at the seams.
[color=blue]
> I would agree that new developers would benefit from that kind of
> experience. One of the few reasons why a (good) university or
> college education can be of value to a programmer. So can critical
> reading of some decent books or web pages on the topic.[/color]

Yes. It's something of a rite of passage, in some ways. And maybe the
right way to respond to optimization questions is "focus on
algorithms, and learn which built-in constructs use lousy ones". I'm
not sure, but I find "thinking about optimization before your
processors melt is premature" to be more than a little disingenuous.
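
One possible illustration of "learn which built-in constructs use lousy ones", offered as a sketch rather than a recommendation: membership tests on a list scan it element by element, while a set uses hashing. The helper name and sizes below are made up for the example:

```python
import timeit


def membership_times(n=10000, repeats=100):
    """Time a worst-case 'in' test against a list and against a set.

    Returns (list_seconds, set_seconds). The numbers depend on the
    machine and interpreter, so treat them as a rough comparison only.
    """
    items_list = list(range(n))
    items_set = set(items_list)
    target = n - 1  # last element: the list has to scan all the way

    list_seconds = timeit.timeit(lambda: target in items_list, number=repeats)
    set_seconds = timeit.timeit(lambda: target in items_set, number=repeats)
    return list_seconds, set_seconds


list_seconds, set_seconds = membership_times()
```

The point is not the exact numbers but knowing which construct does a linear scan, so that the choice is made deliberately when the data can grow.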
[color=blue]
> I think Kent is merely on a par with the Pope, but is not Himself
> divine. ;-) Knuth is another story, perhaps. :-)[/color]

Great minds, but human -- all too human. ;)

--G.

--
Geoff Gerrietts <geoff at gerrietts net> http://www.gerrietts.net/
"Now, now my good man, this is no time for making enemies."
--Voltaire, on his deathbed, when asked to renounce Satan
