reduce() anomaly?

**Alex Martelli** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

Andrew Dalke wrote:
...[color=blue]
> The other three are in csv.py and look something like
> quotechar = reduce(lambda a, b, quotes = quotes:
> (quotes[a] > quotes[b]) and a or b,
> quotes.keys())
> and are identical in intent to your use case -- select the
> object from a list which maximizes some f(obj).[/color]

I have already suggested, in a post on this thread's direct ancestor 9 days
ago, the 100%-equivalent substitution in pure, simple, faster Python:

quotechar = None
for k in quotes:
if not quotechar or quotes[k]>quotes[quotechar]:
quotechar = k

or, for those who consider fast, simple, obviously correct code too boring,

quotechar = max([ (v,k) for k,v in quotes.iteritem s() ])[-1]

which is more concise and at least as clever as the original.

[color=blue]
> This suggests the usefulness of a new function or two,
> perhaps in itertools, which applies a function to a list[/color]

Nope -- itertools is not about CONSUMERS of iterators, which this one would
be. All itertools entries RETURN iterators.
[color=blue]
> of values and returns the first object which has the
> maximum value, as in[/color]

Given that the sort method of lists now has an optional key= argument, I
think the obvious approach would be to add the same optional argument to min
and max with exactly the same semantics. I.e., just as today:

x = max(somelist)
somelist.sort()
assert x == somelist[-1]

we'd also have

x = max(somelist, key=whatever)
somelist.sort(k ey=whatever)
assert x == somelist[-1]

a perfectly natural extension, it seems to me. I've found such total and
extreme opposition to this perfectly natural extension in private
correspondence about it with another Python committer that I've so far
delayed proposing it on python-dev -- for reasons that escape me it would
appear to be highly controversial rather than perfectly obvious.
[color=blue]
> def longest(seq):
> return imax(seq, len)[/color]

That would be max(seq, key=len) in my proposal.
[color=blue]
> lines. There were very few, and the paucity suggests
> that 'sum' isn't needed all that often. Then again, I'm
> not one who suggested that that be a builtin function ;)[/color]

Right, that was my own responsibility. I did identify about
10 spots in the standard library then (including substitutions
for reduce); that's more than the typical built-in has, even though
the tasks handled by the standard library are heavily slanted
to string and text processing, networking &c, and (of course)
"pure infrastructure" , rather than typical application tasks.

Alex

**Andrew Dalke** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

Alex:[color=blue]
> I have already suggested, in a post on this thread's direct ancestor 9[/color]
days[color=blue]
> ago, the 100%-equivalent substitution in pure, simple, faster Python:[/color]

I've only been skimming this thread. Now upon reread I see you also
looked at the standard library. No fair -- I'm supposed to be the only
one who does that! :)
[color=blue]
> Nope -- itertools is not about CONSUMERS of iterators, which this one[/color]
would[color=blue]
> be. All itertools entries RETURN iterators.[/color]

Yup. There's no good, existing, natural place for such a function, that
I can tell.
[color=blue]
> Given that the sort method of lists now has an optional key= argument, I
> think the obvious approach would be to add the same optional argument to[/color]
min[color=blue]
> and max with exactly the same semantics. I.e., just as today:[/color]

That follows Moshe's comment that a "Decorate Max Undecorate",
in parallel to Decorate Sort Undecorate. Nice.
[color=blue]
> we'd also have
>
> x = max(somelist, key=whatever)
> somelist.sort(k ey=whatever)
> assert x == somelist[-1][/color]

Is 'key' also an iterable? ...
[color=blue]
> That would be max(seq, key=len) in my proposal.[/color]

Ahh, it's the function used to get the keys. What about 'getkey'
as a name?

I looked through the back thread but didn't see your function
definition. Is it backwards compatible to the existing max
function?... You know, I never realized how strange max was
in the first place, so that
max(1, 2)
and
max( (1, 2) )
are both valid. It must assume that if there's only one argument
then the latter form is appropriate. Then your form must say
that if there's only one non-keyword argument then it's of
the form

max(seq, **args)

But
max(a, b, key=len)
seems reasonable as well. Then again, so does
max(seq, len)
[color=blue]
> a perfectly natural extension, it seems to me. I've found such total and
> extreme opposition to this perfectly natural extension in private
> correspondence about it with another Python committer that I've so far
> delayed proposing it on python-dev -- for reasons that escape me it would
> appear to be highly controversial rather than perfectly obvious.[/color]

My guess is that reaction is based on the need to have both forms
max(a, b)
and
max(seq)
and allow a key function to be passed in, and make it feel
natural without running into problems when the two might be
mixed up.
[color=blue]
> even though
> the tasks handled by the standard library are heavily slanted
> to string and text processing, networking &c, and (of course)
> "pure infrastructure" , rather than typical application tasks.[/color]

Yeah, I worry about that bias error when I do my analysis.
I really should also scan some of the other (proprietary) code
bases I have access to, but then it leads to claims of being
irreproducible. Besides, despite being a scientific programmer,
I mostly do "string and text processing, networking &c."

Andrew
dalke@dalkescie ntific.com

**Bengt Richter** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

On Sun, 16 Nov 2003 15:15:45 GMT, Alex Martelli <aleaxit@yahoo. com> wrote:
[color=blue]
>Andrew Dalke wrote:
> ...[color=green]
>> The other three are in csv.py and look something like
>> quotechar = reduce(lambda a, b, quotes = quotes:
>> (quotes[a] > quotes[b]) and a or b,
>> quotes.keys())
>> and are identical in intent to your use case -- select the
>> object from a list which maximizes some f(obj).[/color][/color]
IMO the use cases in csv.py derive from a particular approach to a particular
problem, and don't mean much beyond that, except as a trigger for ideas. I.e., any
ideas have to stand on their own, without reference to the csv use cases.
(BTW I'm guessing one could do better with "for q in data.split(cand idate_quote): ..." and
looking at q[0] and q[-1] and maybe q.strip()[0] and q.strip()[-1] as appropriate
to gather the statistics for the two candidate quotes. This would also let you check
for escaped quotes. But this is another rabbit trail here ;-)[color=blue]
>
>I have already suggested, in a post on this thread's direct ancestor 9 days
>ago, the 100%-equivalent substitution in pure, simple, faster Python:
>
>quotechar = None
>for k in quotes:
> if not quotechar or quotes[k]>quotes[quotechar]:
> quotechar = k
>
>or, for those who consider fast, simple, obviously correct code too boring,
>
>quotechar = max([ (v,k) for k,v in quotes.iteritem s() ])[-1]
>
>which is more concise and at least as clever as the original.
>
>[color=green]
>> This suggests the usefulness of a new function or two,
>> perhaps in itertools, which applies a function to a list[/color]
>
>Nope -- itertools is not about CONSUMERS of iterators, which this one would
>be. All itertools entries RETURN iterators.[/color]
Ok, but what about returning an iterator -- e.g., funumerate(f, seq) -- that supplies f(x),x pairs
like enumerate(seq) supplies i,x?

[I'd suggest extending enumerate, but I already want to pass optional range parameters there,
so one could control the numeric values returned, e.g., enumerate(seq,< params>) corresponding to
zip(xrange(<par ams>),seq))]. [BTW, should xrange() default to xrange(0,sys.ma xint,1)?]
[color=blue]
>[color=green]
>> of values and returns the first object which has the
>> maximum value, as in[/color]
>
>Given that the sort method of lists now has an optional key= argument, I[/color]
This is a new one on me:[color=blue][color=green][color=darkred]
>>> seq.sort(key=la mbda x:x)[/color][/color][/color]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: sort() takes no keyword arguments

Do you mean the comparison function? Or is there something else now too?
I'm beginning to infer that key= is actually a keyword arg for a _function_
to get a "key" value from a composite object (in which case ISTM "getkeyvalu e" or "valuefunc"
would be a better name). But IMO "key" suggests it will be used on elements x like x[key],
not passing a definition key=whatever and then using key(x) to get values.
[color=blue]
>think the obvious approach would be to add the same optional argument to min
>and max with exactly the same semantics. I.e., just as today:
>
>x = max(somelist)
>somelist.sort( )
>assert x == somelist[-1]
>
>we'd also have
>
>x = max(somelist, key=whatever)
>somelist.sort( key=whatever)
>assert x == somelist[-1][/color]
I think I like it, other than the name. Maybe s/key/valuefunc/ ;-)
[color=blue]
>
>a perfectly natural extension, it seems to me. I've found such total and
>extreme opposition to this perfectly natural extension in private
>corresponden ce about it with another Python committer that I've so far
>delayed proposing it on python-dev -- for reasons that escape me it would
>appear to be highly controversial rather than perfectly obvious.
>[color=green]
>> def longest(seq):
>> return imax(seq, len)[/color]
>
>That would be max(seq, key=len) in my proposal.[/color]

That's a nice option for max (and min, and ??), but ISTM that it would
also be nice to have a factory for efficient iterators of this kind.
It would probably be pretty efficient then to write

maxlen, maxitem = max(funumerate( len,seq))

or

def longest(seq):
return max(funumerate( len,seq))[-1]

and it would be available as infrastructure for other efficient loops in
addition to being tied in to specific sequence processors like max and min.

Of course we can roll our own:
[color=blue][color=green][color=darkred]
>>> def funumerate(fun, seq):[/color][/color][/color]
... for x in seq: yield fun(x),x
...[color=blue][color=green][color=darkred]
>>> seq = 'ab cde f gh ijk'.split()
>>> seq[/color][/color][/color]
['ab', 'cde', 'f', 'gh', 'ijk'][color=blue][color=green][color=darkred]
>>> max(funumerate( len,seq))[/color][/color][/color]
(3, 'ijk')[color=blue][color=green][color=darkred]
>>> min(funumerate( len,seq))[/color][/color][/color]
(1, 'f')[color=blue][color=green][color=darkred]
>>> max(funumerate( len,seq))[-1][/color][/color][/color]
'ijk'[color=blue][color=green][color=darkred]
>>> longest(seq)[/color][/color][/color]
'ijk'[color=blue]
>[color=green]
>> lines. There were very few, and the paucity suggests
>> that 'sum' isn't needed all that often. Then again, I'm
>> not one who suggested that that be a builtin function ;)[/color]
>
>Right, that was my own responsibility. I did identify about
>10 spots in the standard library then (including substitutions
>for reduce); that's more than the typical built-in has, even though
>the tasks handled by the standard library are heavily slanted
>to string and text processing, networking &c, and (of course)
>"pure infrastructure" , rather than typical application tasks.
>[/color]

Regards,
Bengt Richter

**Andrew Dalke** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

Alex:[color=blue][color=green]
> >Given that the sort method of lists now has an optional key= argument, I[/color][/color]

Bengt Richter[color=blue]
> This is a new one on me:[color=green][color=darkred]
> >>> seq.sort(key=la mbda x:x)[/color][/color]
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: sort() takes no keyword arguments
> Do you mean the comparison function? Or is there something else now too?[/color]

Quoting from Brett Cannon's most excellent
python-dev Summary for 2003-10-01 through 2003-10-15

=============== =============== ===============
Decorate-sort-undecorate eye for the list.sort guy
--------------------------------------------------
Raymond Hettinger suggested adding built-in support for the
decorate-sort-undecorate (DSU) sorting idiom to list.sort (see the
Python Cookbook recipe at
http://aspn.activestate.com/ASPN/Coo...n/Recipe/52234 which is
recipe 2.3 in the dead tree version or Tim Peters' intro to chapter 2
for details on the idiom). After a long discussion on the technical
merits of various ways to do this, list.sort gained the keyword
arguments 'key' and 'reverse'.

'key' takes in a function that accepts one argument and returns what the
item should be sorted based on. So running ``[(0,0), (0,2),
(0,1)].sort(key=lambd a x: x[1])`` will sort the list based on the second
item in each tuple. Technically the sort algorithm just runs the item
it is currently looking at through the function and then handles the
sorting. This avoids having to actually allocate another list. If
'key' and 'cmp' are both provided then 'key' is called on the items to
be sorted and the function's return values are then passed to 'cmp'.

'reverse' does what it sounds like based on whether its argument is true
or false.

list.sort also became guaranteed to be stable (this include 'reverse').

A discussion of whether list.sort should return self came up and was
*very* quickly squashed by Guido. The idea of having a second method,
though, that did sort and returned a copy of the sorted list is still
being considered.

Contributing threads:
`decorate-sort-undecorate
<http://mail.python.org/pipermail/python-dev/2003-October/038652.html>`
`list.sort
<http://mail.python.org/pipermail/python-dev/2003-October/038772.html>`
=============== =============== ===============
[color=blue]
> I'm beginning to infer that key= is actually a keyword arg for a[/color]
_function_[color=blue]
> to get a "key" value from a composite object (in which case ISTM[/color]
"getkeyvalu e" or "valuefunc"[color=blue]
> would be a better name). But IMO "key" suggests it will be used on[/color]
elements x like x[key],[color=blue]
> not passing a definition key=whatever and then using key(x) to get values.[/color]

I concur. (Ooops, should have said "Me too!" :)

[color=blue][color=green]
> >That would be max(seq, key=len) in my proposal.[/color]
>
> That's a nice option for max (and min, and ??), but ISTM that it would
> also be nice to have a factory for efficient iterators of this kind.
> It would probably be pretty efficient then to write
>
> maxlen, maxitem = max(funumerate( len,seq))[/color]

With generator expressions this is

maxlen, maxitem = max( ((len(x), x) for x in seq) )

It still has the (slight) problem that it assumes x is comparable
when there are multiple items with the same length. Nearly
all of these quicky versions make that assumption. The
quicky will-work-for-any-item version would look more like

maxlen, pos, maxitem = max( ((len(x), i, x) for i, x in enumerate(seq)) )

Andrew
dalke@dalkescie ntific.com

**Paul Rubin** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

"Andrew Dalke" <adalke@mindspr ing.com> writes:[color=blue]
> list.sort also became guaranteed to be stable (this include 'reverse').[/color]

I don't see the reason for that. It seems like a needless restriction
on implementers.
[color=blue]
> A discussion of whether list.sort should return self came up and was
> *very* quickly squashed by Guido. The idea of having a second method,
> though, that did sort and returned a copy of the sorted list is still
> being considered.[/color]

Changing the behavior of list.sort would be really bad, but I
certainly favor adding a new method (maybe list.nsort). There should
also be list.nuniq which takes a sorted list and strips duplicate
elements.

**David Eppstein** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

In article <KLRtb.2585$sb4 .1651@newsread2 .news.pas.earth link.net>,
"Andrew Dalke" <adalke@mindspr ing.com> wrote:
[color=blue][color=green][color=darkred]
> > >That would be max(seq, key=len) in my proposal.[/color]
> >
> > That's a nice option for max (and min, and ??), but ISTM that it would
> > also be nice to have a factory for efficient iterators of this kind.
> > It would probably be pretty efficient then to write
> >
> > maxlen, maxitem = max(funumerate( len,seq))[/color]
>
> With generator expressions this is
>
> maxlen, maxitem = max( ((len(x), x) for x in seq) )
>
> It still has the (slight) problem that it assumes x is comparable
> when there are multiple items with the same length. Nearly
> all of these quicky versions make that assumption. The
> quicky will-work-for-any-item version would look more like
>
> maxlen, pos, maxitem = max( ((len(x), i, x) for i, x in enumerate(seq)) )[/color]

The new sort(key=...) option works even when the underlying objects are
incomparable. I'd expect the same of max(key=...)

So (supposing such a change ever happens) you could instead write
maxitem = max(seq, key=len)
maxlen = len(maxitem)

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science

**Terry Reedy** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

"Paul Rubin" <http://phr.cx@NOSPAM.i nvalid> wrote in message
news:7xoevcf1u2 .fsf@ruckus.bro uhaha.com...[color=blue]
> "Andrew Dalke" <adalke@mindspr ing.com> writes:[color=green]
> > list.sort also became guaranteed to be stable (this include[/color][/color]
'reverse').[color=blue]
>
> I don't see the reason for that. It seems like a needless[/color]
restriction[color=blue]
> on implementers.[/color]

Which is why Guido long resisted making such a guarantee, in spite of
many requests. (The previous list.sort was not stable.) However,
timsort is stable, tested for about a year, and so fast for lists both
random and with various types of order, and so close to optimal in
terms of number of comparisons, that Guido was willing to 'annoint' it
as list.sort for the indefinite future. If there were a generally
useful non-stable sort discovered proposed in the future, is could be
added under a different name.

Terry J. Reedy

**Paul Rubin** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

"Terry Reedy" <tjreedy@udel.e du> writes:[color=blue]
> Which is why Guido long resisted making such a guarantee, in spite of
> many requests. (The previous list.sort was not stable.) However,
> timsort is stable, tested for about a year, and so fast for lists both
> random and with various types of order, and so close to optimal in
> terms of number of comparisons, that Guido was willing to 'annoint' it
> as list.sort for the indefinite future. If there were a generally
> useful non-stable sort discovered proposed in the future, is could be
> added under a different name.[/color]

I think that decision is a little too CPython-centric. Some other
Python implementation (maybe even Jython) might like to implement the
list.sort method by calling some other sort routine (like the one
already in the library for that language) rather than by porting
timsort, even if timsort is better. Also, someone might want to
implement an indexable class that isn't a list (maybe it's a disk file
or something) and want to give it a sort method that cannot work by
sucking the keys into memory and calling timsort (maybe there are too
many keys to fit in memory, so external methods must be used). It may
turn out to work best to use some nonstable sorting method, but then
the behavior will be inconsistent with list.sort and that makes more
cruft that the application programmer has to remember.

Most sorting applications don't care about stability. I think if a
new sorting method is going to get added so there's separate methods
for guaranteed-stable and possibly-nonstable sorting, it's best to let
the new method be the stable one (maybe list.ssort) and leave the
existing one alone.

Of course then you want variants that return self instead of None, so
you have list.sort, list.ssort, list.nsort, and list.nssort. It gets
out of hand.

Maybe the best way to do this kind of thing is with keyword arguments
rather than new methods.

**Andrew Dalke** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

Paul Rubin:[color=blue]
> "Andrew Dalke" <adalke@mindspr ing.com> writes:[/color]
[actually, I was quoting from the python-dev summary of Brett C's.

Paul:[color=blue]
> Changing the behavior of list.sort would be really bad, but I
> certainly favor adding a new method (maybe list.nsort). There should
> also be list.nuniq which takes a sorted list and strips duplicate
> elements.[/color]

Do you want unix-style unique where repeats are merged into
one, so that 1 2 2 1 -> 1 2 1 or do you want it to return
items which only occur once, and you don't care about the
order (as in dict.from_keys([1, 2, 2, 1]).keys() which can
return [1, 2] or [2, 1]) or do you want the order of the keys
in the final list to maintain the same order as the initial list,
so that [2, 1, 1, 2] -> [2, 1] always?

Andrew
dalke@dalkescie ntific.com

**Andrew Dalke** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

Paul Rubin:[color=blue]
> I think that decision is a little too CPython-centric. Some other
> Python implementation (maybe even Jython) might like to implement the
> list.sort method by calling some other sort routine (like the one
> already in the library for that language)[/color]

That's not a problem with Jython, since Java has a stable sort, see

http://java.sun.com/j2se/1.4.2/docs/api/java/util/Collections.html#sort(java

..util.List)

There was a OCaml implemenation of a Python variant, called
Vyper. OCaml has a stable sort

404 Not Found

http://caml.inria.fr/devtools/doc_ocaml_classic/Array.html

C++'s STL include a stable_sort

Page not found | CodeGuru

http://www.codeguru.com/cpp/stlguide/stable_sort.shtml

C# doesn't appear to have a stable sort.

WWPyPyD? (What Would PyPy do?) I don't know.
[color=blue]
> Also, someone might want to
> implement an indexable class that isn't a list (maybe it's a disk file
> or something) and want to give it a sort method that cannot work by
> sucking the keys into memory and calling timsort (maybe there are too
> many keys to fit in memory, so external methods must be used).[/color]

There are a couple of misconceptions here:

- sort is a method on lists. It only works on lists. Unlike STL,
which distinguishes between a container and an algorithm, there
is no way to apply sort directly to anything which isn't a list. Your
case (using sort on "a disk file or something") cannot occur.

- All the keys must fit in memory. This is a consequence of being
a list. (The keys may depend on other resource to do the
comparison.) C++ is different in this regard as well.

- The sort only rearranges references so there's no way to use
sort to directly sort the values on disk as you could in C++.
[color=blue]
> It may
> turn out to work best to use some nonstable sorting method, but then
> the behavior will be inconsistent with list.sort and that makes more
> cruft that the application programmer has to remember.[/color]

It may have, but now with the restriction on what the sort must do
there's been a redefinition of what 'best' means for Python. No
implementation of Python is now allowed to do so, so the application
programmer won't have to remember it. All it does it make it
slightly harder for a C# programmer to write a Python implementation.
And I assume there's plenty of 3rd party libraries which can help out.
[color=blue]
> Most sorting applications don't care about stability.[/color]

Really? I prefer my sorts to be stable since it best agrees
with what a person would do, assuming infinite patience.
It's nice because it's well defined and platform independent --
which is important because I've seen bugs crop up in some
programs which assumed a sort (3C) under IRIX will give
the same order as under Solaris -- C's sort is not stable.
[color=blue]
> I think if a
> new sorting method is going to get added so there's separate methods
> for guaranteed-stable and possibly-nonstable sorting, it's best to let
> the new method be the stable one (maybe list.ssort) and leave the
> existing one alone.[/color]

Humbug. You want an extra method (and possibly a couple of
independent algorithms which can be used on your disk-based
but list-like data structures) which for the most widely used version
of Python (CPython) will not do anything different and for which
the second most widely used version (Jython) is a trivial change,
all so that *someday* someone who implements a Python in a
language which doesn't have a fast stable search doesn't have
to write one (copying out of any standard text book), find a
3rd party version, or translate one?

Why not worry about something more complex, like regular
expressions. CPython and Jython almost certainly have
differences in edge cases, and what will the version of Python
written in Prolog use?

Andrew
dalke@dalkescie ntific.com

**Magnus Lie Hetland** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

In article <61Ztb.3481$sb4 .1708@newsread2 .news.pas.earth link.net>,
Andrew Dalke wrote:
[snip][color=blue]
> - sort is a method on lists. It only works on lists. Unlike STL,
> which distinguishes between a container and an algorithm, there
> is no way to apply sort directly to anything which isn't a list. Your
> case (using sort on "a disk file or something") cannot occur.[/color]

Surely he was talking about implementing "list-alikes"...? I.e.
sequences polymorphically equivalent to lists, or at least wrt.
sequence-ness and sorting...? The stability requirement does make this
sort of thing a tad more difficult. (Not that I find it very
problematic; I'm just trying to interpret the meaning of the post.)

--
Magnus Lie Hetland "In this house we obey the laws of
http://hetland.org thermodynamics! " Homer Simpson

**SUZUKI Hisao** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

All in all, I agree with you. You have pointed out precisely
what has been making Python more attractive than, say, Perl.

In <lc65ho5g0n.fsf @gaffa.mit.edu> , you, Douglas Alan, wrote:[color=blue]
> The argument that some programmers might be too lazy to want to learn
> powerful, simple, and elegant features that can be taught in seconds,
> is no good reason to remove such features from Python and bloat Python
> by replacing them with a plethora of less powerful, less elegant
> features.[/color]

In <lcad73ye93.fsf @gaffa.mit.edu> , you also wrote:[color=blue]
> I like Python because it is a very simple, generic, easy-to-learn
> programming language, where I didn't have to learn much of anything
> new. It was just like all the other myriad of programming
> languages I have programmed in, only less cluttered than most, and
> "light-weight" in its implementation and accessibility.[/color]

So do I.

As to sum(), when learning string addition (concatenation) ,
one may wonder why sum() does not handle it:
[color=blue][color=green][color=darkred]
>>> sum(['a', 'b', 'c'])[/color][/color][/color]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'

while general reduce() does it just as _expected_:
[color=blue][color=green][color=darkred]
>>> reduce(str.__ad d__, ['a', 'b', 'c'])[/color][/color][/color]
'abc'

It may be sum() that is more difficult to learn...

For this particular problem, it is better to use
''.join(['a', 'b', 'c']), you know.
However, it is important for Python to have an easy and generic
way to do things. If we have to read the manual through to solve
anything, what is the point to use Python instead of Perl (or Ruby,
to some extent)?
[color=blue]
> I having nothing against learning new stuff, by the way, but when I
> want to learn new stuff, I'm not particularly interested in the
> idiosyncrasies of Python -- I'd spend my effort learning something a
> bit more exotic, like OCaml, or Haskell, that might stretch my brain a
> bit. Learning a bunch of idiosyncratic detail is just tedious. This
> is one of the most important reasons that Perl sucks so badly.[/color]

Again in <lc65ho5g0n.fsf @gaffa.mit.edu>[color=blue]
> And as I have pointed out, it goes against the principle of simplicity
> and expressiveness to remove an easy to use and easy to learn simple
> and powerful feature with a slew of specific, tailored features. If
> reduce() can be relegated to a library or for the user to implement
> for himself, then so can sum(). If the language is to only have one,
> it should be reduce().[/color]

I agree with you.

However, It may be better to give reduce() some nice notation.
Someone (sorry, but I do not remember) suggested such one:
[s: s+c for c in ['a', 'b', 'c']]
or
[s='': s+c for c in ['a', 'b', 'c']]
though it may not be the nicest.

-- SUZUKI Hisao

**Andrew Dalke** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

Magnus Lie Hetland:[color=blue]
> Surely he was talking about implementing "list-alikes"...?[/color]

Yes, I think you're right about that, and I misinterpreted
his statement.

That being the case, an alternative is to have that 'sort'
implements the Python required semantics, and that
'fsort' or somesuch ('f' for 'fast'?) implement the
appropriate data-structure specific one.

Then again, is the change that "all Python list-alikes must
implement stable sort" or that "Python native lists shall
now and forever implement stable sort"?

The official statement from the BDFL is

[Python-Dev] decorate-sort-undecorate

http://mail.python.org/pipermail/python-dev/2003-October/038773.html

] OK, I pronounce on this: Python's list.sort() shall be stable.

That's a statement only that list.sort shall be stable and
not that all .sort() methods must be stable.

BTW, there was a *HUGH* amount of discussion about
sort and stability and keys and DSU on the python-dev list.
See the archive

The Python-Dev October 2003 Archive by thread

http://mail.python.org/pipermail/python-dev/2003-October/thread.html#38772

and look for "decorate-sort-undecorate" as well as subjects
with the word 'sort' in them.

Andrew
dalke@dalkescie ntific.com

**Ville Vainio** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

SUZUKI Hisao <suzuki611@oki. com> writes:
[color=blue]
> However, It may be better to give reduce() some nice notation.[/color]

I think the idea is to hide the thing somewhere far away from
builtins, not to *increase* its prominence in the language by
introducing new syntax.

--
Ville Vainio http://www.students.tut.fi/~vainio24

**Alex Martelli** · Jul 18 '05, 05:38 AM

Re: Python's simplicity philosophy

SUZUKI Hisao wrote:
...[color=blue]
> As to sum(), when learning string addition (concatenation) ,
> one may wonder why sum() does not handle it:[/color]

I originally had sum try to detect the "summing strings" case and
delegate under the covers to ''.join -- Guido vetoed that as too
"clever" (which has a BAD connotation in Python) and had me forbid
the "summing strings" case instead, for simplicity.
[color=blue][color=green][color=darkred]
> >>> sum(['a', 'b', 'c'])[/color][/color]
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: unsupported operand type(s) for +: 'int' and 'str'[/color]

whereupon, one hopes, the user checks sum's doc:
[color=blue][color=green][color=darkred]
>>> print sum.__doc__[/color][/color][/color]
sum(sequence, start=0) -> value

Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start'. When the sequence is empty, returns start.[color=blue][color=green][color=darkred]
>>>[/color][/color][/color]

and if one tries anyway:
[color=blue][color=green][color=darkred]
>>> sum(['a','b','c'],'')[/color][/color][/color]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]

the error message should direct the user to the proper way to sum
strings. Having more than one "obvious way to do it" was avoided.

[color=blue]
> while general reduce() does it just as _expected_:
>[color=green][color=darkred]
> >>> reduce(str.__ad d__, ['a', 'b', 'c'])[/color][/color]
> 'abc'[/color]

Well, if what you "expect" is the following performance...:

[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(999))'
'reduce(str.__a dd__, x)'
1000 loops, best of 3: 1.82e+03 usec per loop
[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(999))' "''.join(x) "
10000 loops, best of 3: 68 usec per loop

i.e., a slowdown by about 2700% for a 999-items sequence,

[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(1999))'
'reduce(str.__a dd__, x)'
100 loops, best of 3: 5e+03 usec per loop
[alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(1999))' "''.join(x) "
10000 loops, best of 3: 143 usec per loop

growing to 3500% for a 1999-items sequence, and so on without bounds,
then no doubt ``reduce does it just as expected'' by YOU.

Most people, however, EXPECT sensible performance, not slow-downs by
factors of tens or hundreds of times, when they use constructs that are
considered "normal and supported" in the language and its built-ins.

This makes reduce a terrible performance trap just waiting to catch the
unwary. It SEEMS to work all right, but in fact it's doing nothing of
the kind, nor can it -- it's defined to iteratively run N repetitions
of whatever function you pass as the first argument, therefore it can
never have O(N) performance when used to add up a sequence of strings,
but always, necessarily O(N squared). It's _conceivable_ (although it
currently appears unlikely that Guido will ever countenance it) that
'sum' can be specialcased (to use faster approaches to summation when it
is dealing with sequences, not numbers) to give the O(N) performance
most people (sensibly) DO expect; that just depends on loosening its
current specs, while maintaining the concept of "sum of a sequence".
No such avenue is open for 'reduce': it will always be a terrible
performance trap just waiting to pounce on the unwary.

[color=blue]
> It may be sum() that is more difficult to learn...[/color]

I have enough experience teaching both built-in functions, by now,
that I can rule this hypothesis out entirely.

[color=blue]
> For this particular problem, it is better to use
> ''.join(['a', 'b', 'c']), you know.[/color]

Yes, and sum's error messages tells you so, so, if you DON'T know,
you learn immediately.
[color=blue]
> However, it is important for Python to have an easy and generic
> way to do things. If we have to read the manual through to solve
> anything, what is the point to use Python instead of Perl (or Ruby,
> to some extent)?[/color]

Why do you need to read the manual after getting an error message
that tells you to """ use ''.join(seq) instead """? As for the
importance of "easy and generic", I would agree -- I'd FAR rather
have 'sum' be able to handle _well_ sums of any kinds -- but Guido
doesn't, so far. If you have arguments that might convince him, make
then. But 'reduce' just isn't a candidate, as the "easy" part simply
cannot apply.

[color=blue]
> However, It may be better to give reduce() some nice notation.[/color]

"Syntax sugar causes cancer of the semicolon". The nicer the
notation, the more superficially attractive you make it to use
a construct with unacceptable performance implications, the
craziest and most terrible the performance trap you're building
for the poor unwary users. I think it's an appalling direction
to want to move the language in.

Alex

reduce() anomaly?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment