reduce() anomaly?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Alex Martelli

    Re: Python's simplicity philosophy

    Andrew Dalke wrote:
    ...[color=blue]
    > The other three are in csv.py and look something like
    > quotechar = reduce(lambda a, b, quotes = quotes:
    > (quotes[a] > quotes[b]) and a or b,
    > quotes.keys())
    > and are identical in intent to your use case -- select the
    > object from a list which maximizes some f(obj).[/color]

    I have already suggested, in a post on this thread's direct ancestor 9 days
    ago, the 100%-equivalent substitution in pure, simple, faster Python:

    quotechar = None
    for k in quotes:
    if not quotechar or quotes[k]>quotes[quotechar]:
    quotechar = k

    or, for those who consider fast, simple, obviously correct code too boring,

    quotechar = max([ (v,k) for k,v in quotes.iteritem s() ])[-1]

    which is more concise and at least as clever as the original.

    [color=blue]
    > This suggests the usefulness of a new function or two,
    > perhaps in itertools, which applies a function to a list[/color]

    Nope -- itertools is not about CONSUMERS of iterators, which this one would
    be. All itertools entries RETURN iterators.
    [color=blue]
    > of values and returns the first object which has the
    > maximum value, as in[/color]

    Given that the sort method of lists now has an optional key= argument, I
    think the obvious approach would be to add the same optional argument to min
    and max with exactly the same semantics. I.e., just as today:

    x = max(somelist)
    somelist.sort()
    assert x == somelist[-1]

    we'd also have

    x = max(somelist, key=whatever)
    somelist.sort(k ey=whatever)
    assert x == somelist[-1]

    a perfectly natural extension, it seems to me. I've found such total and
    extreme opposition to this perfectly natural extension in private
    correspondence about it with another Python committer that I've so far
    delayed proposing it on python-dev -- for reasons that escape me it would
    appear to be highly controversial rather than perfectly obvious.
    [color=blue]
    > def longest(seq):
    > return imax(seq, len)[/color]

    That would be max(seq, key=len) in my proposal.
    [color=blue]
    > lines. There were very few, and the paucity suggests
    > that 'sum' isn't needed all that often. Then again, I'm
    > not one who suggested that that be a builtin function ;)[/color]

    Right, that was my own responsibility. I did identify about
    10 spots in the standard library then (including substitutions
    for reduce); that's more than the typical built-in has, even though
    the tasks handled by the standard library are heavily slanted
    to string and text processing, networking &c, and (of course)
    "pure infrastructure" , rather than typical application tasks.


    Alex

    Comment

    • Andrew Dalke

      Re: Python's simplicity philosophy

      Alex:[color=blue]
      > I have already suggested, in a post on this thread's direct ancestor 9[/color]
      days[color=blue]
      > ago, the 100%-equivalent substitution in pure, simple, faster Python:[/color]

      I've only been skimming this thread. Now upon reread I see you also
      looked at the standard library. No fair -- I'm supposed to be the only
      one who does that! :)
      [color=blue]
      > Nope -- itertools is not about CONSUMERS of iterators, which this one[/color]
      would[color=blue]
      > be. All itertools entries RETURN iterators.[/color]

      Yup. There's no good, existing, natural place for such a function, that
      I can tell.
      [color=blue]
      > Given that the sort method of lists now has an optional key= argument, I
      > think the obvious approach would be to add the same optional argument to[/color]
      min[color=blue]
      > and max with exactly the same semantics. I.e., just as today:[/color]

      That follows Moshe's comment that a "Decorate Max Undecorate",
      in parallel to Decorate Sort Undecorate. Nice.
      [color=blue]
      > we'd also have
      >
      > x = max(somelist, key=whatever)
      > somelist.sort(k ey=whatever)
      > assert x == somelist[-1][/color]

      Is 'key' also an iterable? ...
      [color=blue]
      > That would be max(seq, key=len) in my proposal.[/color]

      Ahh, it's the function used to get the keys. What about 'getkey'
      as a name?

      I looked through the back thread but didn't see your function
      definition. Is it backwards compatible to the existing max
      function?... You know, I never realized how strange max was
      in the first place, so that
      max(1, 2)
      and
      max( (1, 2) )
      are both valid. It must assume that if there's only one argument
      then the latter form is appropriate. Then your form must say
      that if there's only one non-keyword argument then it's of
      the form

      max(seq, **args)

      But
      max(a, b, key=len)
      seems reasonable as well. Then again, so does
      max(seq, len)
      [color=blue]
      > a perfectly natural extension, it seems to me. I've found such total and
      > extreme opposition to this perfectly natural extension in private
      > correspondence about it with another Python committer that I've so far
      > delayed proposing it on python-dev -- for reasons that escape me it would
      > appear to be highly controversial rather than perfectly obvious.[/color]

      My guess is that reaction is based on the need to have both forms
      max(a, b)
      and
      max(seq)
      and allow a key function to be passed in, and make it feel
      natural without running into problems when the two might be
      mixed up.
      [color=blue]
      > even though
      > the tasks handled by the standard library are heavily slanted
      > to string and text processing, networking &c, and (of course)
      > "pure infrastructure" , rather than typical application tasks.[/color]

      Yeah, I worry about that bias error when I do my analysis.
      I really should also scan some of the other (proprietary) code
      bases I have access to, but then it leads to claims of being
      irreproducible. Besides, despite being a scientific programmer,
      I mostly do "string and text processing, networking &c."

      Andrew
      dalke@dalkescie ntific.com


      Comment

      • Bengt Richter

        Re: Python's simplicity philosophy

        On Sun, 16 Nov 2003 15:15:45 GMT, Alex Martelli <aleaxit@yahoo. com> wrote:
        [color=blue]
        >Andrew Dalke wrote:
        > ...[color=green]
        >> The other three are in csv.py and look something like
        >> quotechar = reduce(lambda a, b, quotes = quotes:
        >> (quotes[a] > quotes[b]) and a or b,
        >> quotes.keys())
        >> and are identical in intent to your use case -- select the
        >> object from a list which maximizes some f(obj).[/color][/color]
        IMO the use cases in csv.py derive from a particular approach to a particular
        problem, and don't mean much beyond that, except as a trigger for ideas. I.e., any
        ideas have to stand on their own, without reference to the csv use cases.
        (BTW I'm guessing one could do better with "for q in data.split(cand idate_quote): ..." and
        looking at q[0] and q[-1] and maybe q.strip()[0] and q.strip()[-1] as appropriate
        to gather the statistics for the two candidate quotes. This would also let you check
        for escaped quotes. But this is another rabbit trail here ;-)[color=blue]
        >
        >I have already suggested, in a post on this thread's direct ancestor 9 days
        >ago, the 100%-equivalent substitution in pure, simple, faster Python:
        >
        >quotechar = None
        >for k in quotes:
        > if not quotechar or quotes[k]>quotes[quotechar]:
        > quotechar = k
        >
        >or, for those who consider fast, simple, obviously correct code too boring,
        >
        >quotechar = max([ (v,k) for k,v in quotes.iteritem s() ])[-1]
        >
        >which is more concise and at least as clever as the original.
        >
        >[color=green]
        >> This suggests the usefulness of a new function or two,
        >> perhaps in itertools, which applies a function to a list[/color]
        >
        >Nope -- itertools is not about CONSUMERS of iterators, which this one would
        >be. All itertools entries RETURN iterators.[/color]
        Ok, but what about returning an iterator -- e.g., funumerate(f, seq) -- that supplies f(x),x pairs
        like enumerate(seq) supplies i,x?

        [I'd suggest extending enumerate, but I already want to pass optional range parameters there,
        so one could control the numeric values returned, e.g., enumerate(seq,< params>) corresponding to
        zip(xrange(<par ams>),seq))]. [BTW, should xrange() default to xrange(0,sys.ma xint,1)?]
        [color=blue]
        >[color=green]
        >> of values and returns the first object which has the
        >> maximum value, as in[/color]
        >
        >Given that the sort method of lists now has an optional key= argument, I[/color]
        This is a new one on me:[color=blue][color=green][color=darkred]
        >>> seq.sort(key=la mbda x:x)[/color][/color][/color]
        Traceback (most recent call last):
        File "<stdin>", line 1, in ?
        TypeError: sort() takes no keyword arguments

        Do you mean the comparison function? Or is there something else now too?
        I'm beginning to infer that key= is actually a keyword arg for a _function_
        to get a "key" value from a composite object (in which case ISTM "getkeyvalu e" or "valuefunc"
        would be a better name). But IMO "key" suggests it will be used on elements x like x[key],
        not passing a definition key=whatever and then using key(x) to get values.
        [color=blue]
        >think the obvious approach would be to add the same optional argument to min
        >and max with exactly the same semantics. I.e., just as today:
        >
        >x = max(somelist)
        >somelist.sort( )
        >assert x == somelist[-1]
        >
        >we'd also have
        >
        >x = max(somelist, key=whatever)
        >somelist.sort( key=whatever)
        >assert x == somelist[-1][/color]
        I think I like it, other than the name. Maybe s/key/valuefunc/ ;-)
        [color=blue]
        >
        >a perfectly natural extension, it seems to me. I've found such total and
        >extreme opposition to this perfectly natural extension in private
        >corresponden ce about it with another Python committer that I've so far
        >delayed proposing it on python-dev -- for reasons that escape me it would
        >appear to be highly controversial rather than perfectly obvious.
        >[color=green]
        >> def longest(seq):
        >> return imax(seq, len)[/color]
        >
        >That would be max(seq, key=len) in my proposal.[/color]

        That's a nice option for max (and min, and ??), but ISTM that it would
        also be nice to have a factory for efficient iterators of this kind.
        It would probably be pretty efficient then to write

        maxlen, maxitem = max(funumerate( len,seq))

        or

        def longest(seq):
        return max(funumerate( len,seq))[-1]

        and it would be available as infrastructure for other efficient loops in
        addition to being tied in to specific sequence processors like max and min.

        Of course we can roll our own:
        [color=blue][color=green][color=darkred]
        >>> def funumerate(fun, seq):[/color][/color][/color]
        ... for x in seq: yield fun(x),x
        ...[color=blue][color=green][color=darkred]
        >>> seq = 'ab cde f gh ijk'.split()
        >>> seq[/color][/color][/color]
        ['ab', 'cde', 'f', 'gh', 'ijk'][color=blue][color=green][color=darkred]
        >>> max(funumerate( len,seq))[/color][/color][/color]
        (3, 'ijk')[color=blue][color=green][color=darkred]
        >>> min(funumerate( len,seq))[/color][/color][/color]
        (1, 'f')[color=blue][color=green][color=darkred]
        >>> max(funumerate( len,seq))[-1][/color][/color][/color]
        'ijk'[color=blue][color=green][color=darkred]
        >>> longest(seq)[/color][/color][/color]
        'ijk'[color=blue]
        >[color=green]
        >> lines. There were very few, and the paucity suggests
        >> that 'sum' isn't needed all that often. Then again, I'm
        >> not one who suggested that that be a builtin function ;)[/color]
        >
        >Right, that was my own responsibility. I did identify about
        >10 spots in the standard library then (including substitutions
        >for reduce); that's more than the typical built-in has, even though
        >the tasks handled by the standard library are heavily slanted
        >to string and text processing, networking &c, and (of course)
        >"pure infrastructure" , rather than typical application tasks.
        >[/color]

        Regards,
        Bengt Richter

        Comment

        • Andrew Dalke

          Re: Python's simplicity philosophy

          Alex:[color=blue][color=green]
          > >Given that the sort method of lists now has an optional key= argument, I[/color][/color]

          Bengt Richter[color=blue]
          > This is a new one on me:[color=green][color=darkred]
          > >>> seq.sort(key=la mbda x:x)[/color][/color]
          > Traceback (most recent call last):
          > File "<stdin>", line 1, in ?
          > TypeError: sort() takes no keyword arguments
          > Do you mean the comparison function? Or is there something else now too?[/color]

          Quoting from Brett Cannon's most excellent
          python-dev Summary for 2003-10-01 through 2003-10-15

          =============== =============== ===============
          Decorate-sort-undecorate eye for the list.sort guy
          --------------------------------------------------
          Raymond Hettinger suggested adding built-in support for the
          decorate-sort-undecorate (DSU) sorting idiom to list.sort (see the
          Python Cookbook recipe at
          http://aspn.activestate.com/ASPN/Coo...n/Recipe/52234 which is
          recipe 2.3 in the dead tree version or Tim Peters' intro to chapter 2
          for details on the idiom). After a long discussion on the technical
          merits of various ways to do this, list.sort gained the keyword
          arguments 'key' and 'reverse'.

          'key' takes in a function that accepts one argument and returns what the
          item should be sorted based on. So running ``[(0,0), (0,2),
          (0,1)].sort(key=lambd a x: x[1])`` will sort the list based on the second
          item in each tuple. Technically the sort algorithm just runs the item
          it is currently looking at through the function and then handles the
          sorting. This avoids having to actually allocate another list. If
          'key' and 'cmp' are both provided then 'key' is called on the items to
          be sorted and the function's return values are then passed to 'cmp'.

          'reverse' does what it sounds like based on whether its argument is true
          or false.

          list.sort also became guaranteed to be stable (this include 'reverse').

          A discussion of whether list.sort should return self came up and was
          *very* quickly squashed by Guido. The idea of having a second method,
          though, that did sort and returned a copy of the sorted list is still
          being considered.

          Contributing threads:
          `decorate-sort-undecorate
          <http://mail.python.org/pipermail/python-dev/2003-October/038652.html>`
          `list.sort
          <http://mail.python.org/pipermail/python-dev/2003-October/038772.html>`
          =============== =============== ===============
          [color=blue]
          > I'm beginning to infer that key= is actually a keyword arg for a[/color]
          _function_[color=blue]
          > to get a "key" value from a composite object (in which case ISTM[/color]
          "getkeyvalu e" or "valuefunc"[color=blue]
          > would be a better name). But IMO "key" suggests it will be used on[/color]
          elements x like x[key],[color=blue]
          > not passing a definition key=whatever and then using key(x) to get values.[/color]

          I concur. (Ooops, should have said "Me too!" :)

          [color=blue][color=green]
          > >That would be max(seq, key=len) in my proposal.[/color]
          >
          > That's a nice option for max (and min, and ??), but ISTM that it would
          > also be nice to have a factory for efficient iterators of this kind.
          > It would probably be pretty efficient then to write
          >
          > maxlen, maxitem = max(funumerate( len,seq))[/color]

          With generator expressions this is

          maxlen, maxitem = max( ((len(x), x) for x in seq) )

          It still has the (slight) problem that it assumes x is comparable
          when there are multiple items with the same length. Nearly
          all of these quicky versions make that assumption. The
          quicky will-work-for-any-item version would look more like

          maxlen, pos, maxitem = max( ((len(x), i, x) for i, x in enumerate(seq)) )


          Andrew
          dalke@dalkescie ntific.com


          Comment

          • Paul Rubin

            Re: Python's simplicity philosophy

            "Andrew Dalke" <adalke@mindspr ing.com> writes:[color=blue]
            > list.sort also became guaranteed to be stable (this include 'reverse').[/color]

            I don't see the reason for that. It seems like a needless restriction
            on implementers.
            [color=blue]
            > A discussion of whether list.sort should return self came up and was
            > *very* quickly squashed by Guido. The idea of having a second method,
            > though, that did sort and returned a copy of the sorted list is still
            > being considered.[/color]

            Changing the behavior of list.sort would be really bad, but I
            certainly favor adding a new method (maybe list.nsort). There should
            also be list.nuniq which takes a sorted list and strips duplicate
            elements.

            Comment

            • David Eppstein

              Re: Python's simplicity philosophy

              In article <KLRtb.2585$sb4 .1651@newsread2 .news.pas.earth link.net>,
              "Andrew Dalke" <adalke@mindspr ing.com> wrote:
              [color=blue][color=green][color=darkred]
              > > >That would be max(seq, key=len) in my proposal.[/color]
              > >
              > > That's a nice option for max (and min, and ??), but ISTM that it would
              > > also be nice to have a factory for efficient iterators of this kind.
              > > It would probably be pretty efficient then to write
              > >
              > > maxlen, maxitem = max(funumerate( len,seq))[/color]
              >
              > With generator expressions this is
              >
              > maxlen, maxitem = max( ((len(x), x) for x in seq) )
              >
              > It still has the (slight) problem that it assumes x is comparable
              > when there are multiple items with the same length. Nearly
              > all of these quicky versions make that assumption. The
              > quicky will-work-for-any-item version would look more like
              >
              > maxlen, pos, maxitem = max( ((len(x), i, x) for i, x in enumerate(seq)) )[/color]

              The new sort(key=...) option works even when the underlying objects are
              incomparable. I'd expect the same of max(key=...)

              So (supposing such a change ever happens) you could instead write
              maxitem = max(seq, key=len)
              maxlen = len(maxitem)

              --
              David Eppstein http://www.ics.uci.edu/~eppstein/
              Univ. of California, Irvine, School of Information & Computer Science

              Comment

              • Terry Reedy

                Re: Python's simplicity philosophy


                "Paul Rubin" <http://phr.cx@NOSPAM.i nvalid> wrote in message
                news:7xoevcf1u2 .fsf@ruckus.bro uhaha.com...[color=blue]
                > "Andrew Dalke" <adalke@mindspr ing.com> writes:[color=green]
                > > list.sort also became guaranteed to be stable (this include[/color][/color]
                'reverse').[color=blue]
                >
                > I don't see the reason for that. It seems like a needless[/color]
                restriction[color=blue]
                > on implementers.[/color]

                Which is why Guido long resisted making such a guarantee, in spite of
                many requests. (The previous list.sort was not stable.) However,
                timsort is stable, tested for about a year, and so fast for lists both
                random and with various types of order, and so close to optimal in
                terms of number of comparisons, that Guido was willing to 'annoint' it
                as list.sort for the indefinite future. If there were a generally
                useful non-stable sort discovered proposed in the future, is could be
                added under a different name.

                Terry J. Reedy


                Comment

                • Paul Rubin

                  Re: Python's simplicity philosophy

                  "Terry Reedy" <tjreedy@udel.e du> writes:[color=blue]
                  > Which is why Guido long resisted making such a guarantee, in spite of
                  > many requests. (The previous list.sort was not stable.) However,
                  > timsort is stable, tested for about a year, and so fast for lists both
                  > random and with various types of order, and so close to optimal in
                  > terms of number of comparisons, that Guido was willing to 'annoint' it
                  > as list.sort for the indefinite future. If there were a generally
                  > useful non-stable sort discovered proposed in the future, is could be
                  > added under a different name.[/color]

                  I think that decision is a little too CPython-centric. Some other
                  Python implementation (maybe even Jython) might like to implement the
                  list.sort method by calling some other sort routine (like the one
                  already in the library for that language) rather than by porting
                  timsort, even if timsort is better. Also, someone might want to
                  implement an indexable class that isn't a list (maybe it's a disk file
                  or something) and want to give it a sort method that cannot work by
                  sucking the keys into memory and calling timsort (maybe there are too
                  many keys to fit in memory, so external methods must be used). It may
                  turn out to work best to use some nonstable sorting method, but then
                  the behavior will be inconsistent with list.sort and that makes more
                  cruft that the application programmer has to remember.

                  Most sorting applications don't care about stability. I think if a
                  new sorting method is going to get added so there's separate methods
                  for guaranteed-stable and possibly-nonstable sorting, it's best to let
                  the new method be the stable one (maybe list.ssort) and leave the
                  existing one alone.

                  Of course then you want variants that return self instead of None, so
                  you have list.sort, list.ssort, list.nsort, and list.nssort. It gets
                  out of hand.

                  Maybe the best way to do this kind of thing is with keyword arguments
                  rather than new methods.

                  Comment

                  • Andrew Dalke

                    Re: Python's simplicity philosophy

                    Paul Rubin:[color=blue]
                    > "Andrew Dalke" <adalke@mindspr ing.com> writes:[/color]
                    [actually, I was quoting from the python-dev summary of Brett C's.

                    Paul:[color=blue]
                    > Changing the behavior of list.sort would be really bad, but I
                    > certainly favor adding a new method (maybe list.nsort). There should
                    > also be list.nuniq which takes a sorted list and strips duplicate
                    > elements.[/color]

                    Do you want unix-style unique where repeats are merged into
                    one, so that 1 2 2 1 -> 1 2 1 or do you want it to return
                    items which only occur once, and you don't care about the
                    order (as in dict.from_keys([1, 2, 2, 1]).keys() which can
                    return [1, 2] or [2, 1]) or do you want the order of the keys
                    in the final list to maintain the same order as the initial list,
                    so that [2, 1, 1, 2] -> [2, 1] always?

                    Andrew
                    dalke@dalkescie ntific.com


                    Comment

                    • Andrew Dalke

                      Re: Python's simplicity philosophy

                      Paul Rubin:[color=blue]
                      > I think that decision is a little too CPython-centric. Some other
                      > Python implementation (maybe even Jython) might like to implement the
                      > list.sort method by calling some other sort routine (like the one
                      > already in the library for that language)[/color]

                      That's not a problem with Jython, since Java has a stable sort, see

                      ..util.List)

                      There was a OCaml implemenation of a Python variant, called
                      Vyper. OCaml has a stable sort


                      C++'s STL include a stable_sort


                      C# doesn't appear to have a stable sort.

                      WWPyPyD? (What Would PyPy do?) I don't know.
                      [color=blue]
                      > Also, someone might want to
                      > implement an indexable class that isn't a list (maybe it's a disk file
                      > or something) and want to give it a sort method that cannot work by
                      > sucking the keys into memory and calling timsort (maybe there are too
                      > many keys to fit in memory, so external methods must be used).[/color]

                      There are a couple of misconceptions here:

                      - sort is a method on lists. It only works on lists. Unlike STL,
                      which distinguishes between a container and an algorithm, there
                      is no way to apply sort directly to anything which isn't a list. Your
                      case (using sort on "a disk file or something") cannot occur.

                      - All the keys must fit in memory. This is a consequence of being
                      a list. (The keys may depend on other resource to do the
                      comparison.) C++ is different in this regard as well.

                      - The sort only rearranges references so there's no way to use
                      sort to directly sort the values on disk as you could in C++.
                      [color=blue]
                      > It may
                      > turn out to work best to use some nonstable sorting method, but then
                      > the behavior will be inconsistent with list.sort and that makes more
                      > cruft that the application programmer has to remember.[/color]

                      It may have, but now with the restriction on what the sort must do
                      there's been a redefinition of what 'best' means for Python. No
                      implementation of Python is now allowed to do so, so the application
                      programmer won't have to remember it. All it does it make it
                      slightly harder for a C# programmer to write a Python implementation.
                      And I assume there's plenty of 3rd party libraries which can help out.
                      [color=blue]
                      > Most sorting applications don't care about stability.[/color]

                      Really? I prefer my sorts to be stable since it best agrees
                      with what a person would do, assuming infinite patience.
                      It's nice because it's well defined and platform independent --
                      which is important because I've seen bugs crop up in some
                      programs which assumed a sort (3C) under IRIX will give
                      the same order as under Solaris -- C's sort is not stable.
                      [color=blue]
                      > I think if a
                      > new sorting method is going to get added so there's separate methods
                      > for guaranteed-stable and possibly-nonstable sorting, it's best to let
                      > the new method be the stable one (maybe list.ssort) and leave the
                      > existing one alone.[/color]

                      Humbug. You want an extra method (and possibly a couple of
                      independent algorithms which can be used on your disk-based
                      but list-like data structures) which for the most widely used version
                      of Python (CPython) will not do anything different and for which
                      the second most widely used version (Jython) is a trivial change,
                      all so that *someday* someone who implements a Python in a
                      language which doesn't have a fast stable search doesn't have
                      to write one (copying out of any standard text book), find a
                      3rd party version, or translate one?

                      Why not worry about something more complex, like regular
                      expressions. CPython and Jython almost certainly have
                      differences in edge cases, and what will the version of Python
                      written in Prolog use?

                      Andrew
                      dalke@dalkescie ntific.com


                      Comment

                      • Magnus Lie Hetland

                        Re: Python's simplicity philosophy

                        In article <61Ztb.3481$sb4 .1708@newsread2 .news.pas.earth link.net>,
                        Andrew Dalke wrote:
                        [snip][color=blue]
                        > - sort is a method on lists. It only works on lists. Unlike STL,
                        > which distinguishes between a container and an algorithm, there
                        > is no way to apply sort directly to anything which isn't a list. Your
                        > case (using sort on "a disk file or something") cannot occur.[/color]

                        Surely he was talking about implementing "list-alikes"...? I.e.
                        sequences polymorphically equivalent to lists, or at least wrt.
                        sequence-ness and sorting...? The stability requirement does make this
                        sort of thing a tad more difficult. (Not that I find it very
                        problematic; I'm just trying to interpret the meaning of the post.)

                        --
                        Magnus Lie Hetland "In this house we obey the laws of
                        http://hetland.org thermodynamics! " Homer Simpson

                        Comment

                        • SUZUKI Hisao

                          Re: Python's simplicity philosophy

                          All in all, I agree with you. You have pointed out precisely
                          what has been making Python more attractive than, say, Perl.

                          In <lc65ho5g0n.fsf @gaffa.mit.edu> , you, Douglas Alan, wrote:[color=blue]
                          > The argument that some programmers might be too lazy to want to learn
                          > powerful, simple, and elegant features that can be taught in seconds,
                          > is no good reason to remove such features from Python and bloat Python
                          > by replacing them with a plethora of less powerful, less elegant
                          > features.[/color]

                          In <lcad73ye93.fsf @gaffa.mit.edu> , you also wrote:[color=blue]
                          > I like Python because it is a very simple, generic, easy-to-learn
                          > programming language, where I didn't have to learn much of anything
                          > new. It was just like all the other myriad of programming
                          > languages I have programmed in, only less cluttered than most, and
                          > "light-weight" in its implementation and accessibility.[/color]

                          So do I.

                          As to sum(), when learning string addition (concatenation) ,
                          one may wonder why sum() does not handle it:
                          [color=blue][color=green][color=darkred]
                          >>> sum(['a', 'b', 'c'])[/color][/color][/color]
                          Traceback (most recent call last):
                          File "<stdin>", line 1, in ?
                          TypeError: unsupported operand type(s) for +: 'int' and 'str'

                          while general reduce() does it just as _expected_:
                          [color=blue][color=green][color=darkred]
                          >>> reduce(str.__ad d__, ['a', 'b', 'c'])[/color][/color][/color]
                          'abc'

                          It may be sum() that is more difficult to learn...

                          For this particular problem, it is better to use
                          ''.join(['a', 'b', 'c']), you know.
                          However, it is important for Python to have an easy and generic
                          way to do things. If we have to read the manual through to solve
                          anything, what is the point to use Python instead of Perl (or Ruby,
                          to some extent)?
                          [color=blue]
                          > I having nothing against learning new stuff, by the way, but when I
                          > want to learn new stuff, I'm not particularly interested in the
                          > idiosyncrasies of Python -- I'd spend my effort learning something a
                          > bit more exotic, like OCaml, or Haskell, that might stretch my brain a
                          > bit. Learning a bunch of idiosyncratic detail is just tedious. This
                          > is one of the most important reasons that Perl sucks so badly.[/color]

                          Again in <lc65ho5g0n.fsf @gaffa.mit.edu>[color=blue]
                          > And as I have pointed out, it goes against the principle of simplicity
                          > and expressiveness to remove an easy to use and easy to learn simple
                          > and powerful feature with a slew of specific, tailored features. If
                          > reduce() can be relegated to a library or for the user to implement
                          > for himself, then so can sum(). If the language is to only have one,
                          > it should be reduce().[/color]

                          I agree with you.

                          However, It may be better to give reduce() some nice notation.
                          Someone (sorry, but I do not remember) suggested such one:
                          [s: s+c for c in ['a', 'b', 'c']]
                          or
                          [s='': s+c for c in ['a', 'b', 'c']]
                          though it may not be the nicest.

                          -- SUZUKI Hisao



                          Comment

                          • Andrew Dalke

                            Re: Python's simplicity philosophy

                            Magnus Lie Hetland:[color=blue]
                            > Surely he was talking about implementing "list-alikes"...?[/color]

                            Yes, I think you're right about that, and I misinterpreted
                            his statement.

                            That being the case, an alternative is to have that 'sort'
                            implements the Python required semantics, and that
                            'fsort' or somesuch ('f' for 'fast'?) implement the
                            appropriate data-structure specific one.

                            Then again, is the change that "all Python list-alikes must
                            implement stable sort" or that "Python native lists shall
                            now and forever implement stable sort"?

                            The official statement from the BDFL is

                            ] OK, I pronounce on this: Python's list.sort() shall be stable.

                            That's a statement only that list.sort shall be stable and
                            not that all .sort() methods must be stable.

                            BTW, there was a *HUGH* amount of discussion about
                            sort and stability and keys and DSU on the python-dev list.
                            See the archive

                            and look for "decorate-sort-undecorate" as well as subjects
                            with the word 'sort' in them.

                            Andrew
                            dalke@dalkescie ntific.com


                            Comment

                            • Ville Vainio

                              Re: Python's simplicity philosophy

                              SUZUKI Hisao <suzuki611@oki. com> writes:
                              [color=blue]
                              > However, It may be better to give reduce() some nice notation.[/color]

                              I think the idea is to hide the thing somewhere far away from
                              builtins, not to *increase* its prominence in the language by
                              introducing new syntax.

                              --
                              Ville Vainio http://www.students.tut.fi/~vainio24

                              Comment

                              • Alex Martelli

                                Re: Python's simplicity philosophy

                                SUZUKI Hisao wrote:
                                ...[color=blue]
                                > As to sum(), when learning string addition (concatenation) ,
                                > one may wonder why sum() does not handle it:[/color]

                                I originally had sum try to detect the "summing strings" case and
                                delegate under the covers to ''.join -- Guido vetoed that as too
                                "clever" (which has a BAD connotation in Python) and had me forbid
                                the "summing strings" case instead, for simplicity.
                                [color=blue][color=green][color=darkred]
                                > >>> sum(['a', 'b', 'c'])[/color][/color]
                                > Traceback (most recent call last):
                                > File "<stdin>", line 1, in ?
                                > TypeError: unsupported operand type(s) for +: 'int' and 'str'[/color]

                                whereupon, one hopes, the user checks sum's doc:
                                [color=blue][color=green][color=darkred]
                                >>> print sum.__doc__[/color][/color][/color]
                                sum(sequence, start=0) -> value

                                Returns the sum of a sequence of numbers (NOT strings) plus the value
                                of parameter 'start'. When the sequence is empty, returns start.[color=blue][color=green][color=darkred]
                                >>>[/color][/color][/color]

                                and if one tries anyway:
                                [color=blue][color=green][color=darkred]
                                >>> sum(['a','b','c'],'')[/color][/color][/color]
                                Traceback (most recent call last):
                                File "<stdin>", line 1, in ?
                                TypeError: sum() can't sum strings [use ''.join(seq) instead]

                                the error message should direct the user to the proper way to sum
                                strings. Having more than one "obvious way to do it" was avoided.

                                [color=blue]
                                > while general reduce() does it just as _expected_:
                                >[color=green][color=darkred]
                                > >>> reduce(str.__ad d__, ['a', 'b', 'c'])[/color][/color]
                                > 'abc'[/color]

                                Well, if what you "expect" is the following performance...:

                                [alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(999))'
                                'reduce(str.__a dd__, x)'
                                1000 loops, best of 3: 1.82e+03 usec per loop
                                [alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(999))' "''.join(x) "
                                10000 loops, best of 3: 68 usec per loop

                                i.e., a slowdown by about 2700% for a 999-items sequence,

                                [alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(1999))'
                                'reduce(str.__a dd__, x)'
                                100 loops, best of 3: 5e+03 usec per loop
                                [alex@lancelot tmp]$ timeit.py -c -s'x=map(str,ran ge(1999))' "''.join(x) "
                                10000 loops, best of 3: 143 usec per loop

                                growing to 3500% for a 1999-items sequence, and so on without bounds,
                                then no doubt ``reduce does it just as expected'' by YOU.

                                Most people, however, EXPECT sensible performance, not slow-downs by
                                factors of tens or hundreds of times, when they use constructs that are
                                considered "normal and supported" in the language and its built-ins.

                                This makes reduce a terrible performance trap just waiting to catch the
                                unwary. It SEEMS to work all right, but in fact it's doing nothing of
                                the kind, nor can it -- it's defined to iteratively run N repetitions
                                of whatever function you pass as the first argument, therefore it can
                                never have O(N) performance when used to add up a sequence of strings,
                                but always, necessarily O(N squared). It's _conceivable_ (although it
                                currently appears unlikely that Guido will ever countenance it) that
                                'sum' can be specialcased (to use faster approaches to summation when it
                                is dealing with sequences, not numbers) to give the O(N) performance
                                most people (sensibly) DO expect; that just depends on loosening its
                                current specs, while maintaining the concept of "sum of a sequence".
                                No such avenue is open for 'reduce': it will always be a terrible
                                performance trap just waiting to pounce on the unwary.

                                [color=blue]
                                > It may be sum() that is more difficult to learn...[/color]

                                I have enough experience teaching both built-in functions, by now,
                                that I can rule this hypothesis out entirely.

                                [color=blue]
                                > For this particular problem, it is better to use
                                > ''.join(['a', 'b', 'c']), you know.[/color]

                                Yes, and sum's error messages tells you so, so, if you DON'T know,
                                you learn immediately.
                                [color=blue]
                                > However, it is important for Python to have an easy and generic
                                > way to do things. If we have to read the manual through to solve
                                > anything, what is the point to use Python instead of Perl (or Ruby,
                                > to some extent)?[/color]

                                Why do you need to read the manual after getting an error message
                                that tells you to """ use ''.join(seq) instead """? As for the
                                importance of "easy and generic", I would agree -- I'd FAR rather
                                have 'sum' be able to handle _well_ sums of any kinds -- but Guido
                                doesn't, so far. If you have arguments that might convince him, make
                                then. But 'reduce' just isn't a candidate, as the "easy" part simply
                                cannot apply.

                                [color=blue]
                                > However, It may be better to give reduce() some nice notation.[/color]

                                "Syntax sugar causes cancer of the semicolon". The nicer the
                                notation, the more superficially attractive you make it to use
                                a construct with unacceptable performance implications, the
                                craziest and most terrible the performance trap you're building
                                for the poor unwary users. I think it's an appalling direction
                                to want to move the language in.


                                Alex

                                Comment

                                Working...