Feature request: sorting a list slice

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • George Sakkis

    Feature request: sorting a list slice

    It would be useful if list.sort() accepted two more optional
    parameters, start and stop, so that you can sort a slice in place. In
    other words,

    x = range(1000000)
    x.sort(start=3, stop=-1)

    would be equivalent to

    x[3:-1] = sorted(x[3:-1])

    but more efficient and without repeating the slice twice. If it's not
    too late for feature additions, I'd love to see this in 2.5.


    George

  • John Machin

    #2
    Re: Feature request: sorting a list slice

    Use case?

    Comment

    • Fredrik Lundh

      #3
      Re: Feature request: sorting a list slice

      George Sakkis wrote:
      [color=blue]
      > It would be useful if list.sort() accepted two more optional
      > parameters[/color]

      useful for what? what's the use case ?

      </F>

      Comment

      • Raymond Hettinger

        #4
        Re: Feature request: sorting a list slice

        George Sakkis wrote:[color=blue]
        > It would be useful if list.sort() accepted two more optional
        > parameters, start and stop, so that you can sort a slice in place. In
        > other words,
        >
        > x = range(1000000)
        > x.sort(start=3, stop=-1)
        >
        > would be equivalent to
        >
        > x[3:-1] = sorted(x[3:-1])
        >
        > but more efficient and without repeating the slice twice. If it's not
        > too late for feature additions, I'd love to see this in 2.5.[/color]

        This is a false optimization. The slicing steps are O(n) and the sort
        step is O(n log n) unless the data has some internal structure that
        Timsort can use to get closer to O(n).

        If you implemented this and timed in it real apps, I would be surprised
        to find that the slice extraction and assignments were the performance
        bottleneck. IMO, a worthwhile performance gain is unlikely.


        Raymond

        Comment

        • George Sakkis

          #5
          Re: Feature request: sorting a list slice

          Fredrik Lundh wrote:
          [color=blue]
          > George Sakkis wrote:
          >[color=green]
          >> It would be useful if list.sort() accepted two more optional
          >> parameters[/color]
          >
          >
          > useful for what? what's the use case ?
          >
          > </F>
          >[/color]

          Although there is a use case (see below), I don't think it's strictly
          necessary in this case. Arguments to add it:
          - Don't Repeat Yourself.
          - It doesn't introduce new syntax, builtin or standard library
          function; only two optional keywords in an existing method.
          - These keywords are already used in several list and string methods
          (index,find,..) and their meaning is instantly obvious.
          - Increased (even if small) efficiency. Yes, I know that in practice,
          and especially in apps that involve I/O, network, DB connections and so
          on, it's unlikely to be the bottleneck but this isn't a general
          argument for being sloppy. The fact that having a "for i in
          xrange(100000): pass" at the top of every file costs me only a few msec
          is not a reason to keep it there.

          IMO the only reason not to add it would be if it makes sort's
          implementation much more complicate that it already is. I don't know
          Timsort's details, but I doubt that this is the case; please correct me
          if I am wrong.

          And here's (a cut-down version of) the use case: a recursive
          generalization of groupby. On each recursive call, the argument list is
          sorted and regrouped. If sort() accepted a (start,end) range I could
          pass this over instead of creating a slice of the input each time. It's
          not hard to see that there can be an exponential number of calls wrt to
          the input's length, so cutting down the average cost of each call may
          be worthwhile for medium-largish sized inputs (of course it's still
          exponential asymptotically, so the constant factor does not make much
          difference for really large inputs).


          import itertools as it

          def groupBy(iterabl e, *keys):
          return _groupBy(list(i terable), keys, len(keys))

          def _groupBy(lst, keys, depth):
          if depth == 0:
          return lst
          key = keys[-depth]
          # sort group before subgrouping
          lst.sort(key=ke y)
          # group by key and then recursively do the same for each subgroup
          return [(k, _groupBy(list(s ubgroup),keys,d epth-1))
          for k,subgroup in it.groupby(lst, key)]


          #==== example =============== =============== ==========

          class Struct(object):
          def __init__(self, args):
          self.args = args
          def __getitem__(sel f,index):
          return self.args[index]

          data = map(Struct, [
          ('a', 2, True),
          ('b', 1, False),
          ('a', 1, True),
          ('a', 1, False),
          ('b', 2, True),
          ('b', 2, True),
          ('a', 2, True),
          ])

          from pprint import pprint
          from operator import itemgetter
          pprint(groupBy( data, itemgetter(2), itemgetter(0), itemgetter(1)))


          #======= output =============== =============== ===

          [(False,
          [('a', [(1, [<__main__.Struc t object at 0xb7dc128c>])]),
          ('b', [(1, [<__main__.Struc t object at 0xb7dc124c>])])]),
          (True,
          [('a',
          [(1, [<__main__.Struc t object at 0xb7dc126c>]),
          (2,
          [<__main__.Struc t object at 0xb7dc122c>,
          <__main__.Struc t object at 0xb7dc12ec>])]),
          ('b',
          [(2,
          [<__main__.Struc t object at 0xb7dc12ac>,
          <__main__.Struc t object at 0xb7dc12cc>])])])]


          George

          Comment

          • Heiko Wundram

            #6
            Re: Feature request: sorting a list slice

            Am Donnerstag 18 Mai 2006 22:13 schrieb Raymond Hettinger:[color=blue]
            > This is a false optimization. The slicing steps are O(n) and the sort
            > step is O(n log n) unless the data has some internal structure that
            > Timsort can use to get closer to O(n).
            >
            > If you implemented this and timed in it real apps, I would be surprised
            > to find that the slice extraction and assignments were the performance
            > bottleneck. IMO, a worthwhile performance gain is unlikely.[/color]

            I personally wouldn't find this to be a false optimization, at least not
            memory-wise. If you sort really large lists, and only want to sort part of
            the list, the memory gains of not having to do a slice should be enormous,
            and there should be some time-gains too. And, additionally, implementing this
            (if I understand timsort correctly, just had a look at the sources) isn't
            hard.

            Rather, I'd pose the usage question: why are you only sorting a part of a
            list? lists are basically meant for homogenous data. And sorting only a part
            of that should be a pretty meaningless operation, mostly...

            Anyway, I've just written up a patch to extend sort() with a start/stop
            parameter (which behaves just like the default slice notation). Generally,
            this will make sort() setup slightly slower in the "standard" case (because
            there's some more pointer arithmetic involved, but this should be negligible,
            and is O(1)), but for the actual use case, the following numbers can be seen:

            modelnine@phoen ix ~/mercurial/python-modelnine $ ./python test.py
            New average time: 13.7254650593 ms per loop
            Old average time: 14.839854002 ms per loop
            [10198 refs]
            modelnine@phoen ix ~/mercurial/python-modelnine $

            This is just a very, very simple test (testing the case of a list with one
            million entries, where 99000 are sorted):
            [color=blue][color=green][color=darkred]
            >>>[/color][/color][/color]
            from random import randrange
            from time import time

            x = [randrange(256) for x in xrange(1000000)]
            timesnew, timesold = [], []

            for reps in range(1000):
            y = x[:]
            timesnew.append (time())
            y.sort(start=10 00,stop=10000)
            timesnew[-1] = time() - timesnew[-1]

            for reps in range(1000):
            y = x[:]
            timesold.append (time())
            y[1000:10000] = sorted(y[1000:10000])
            timesold[-1] = time() - timesold[-1]

            print "New average time:", sum(timesnew), "ms per loop"
            print "Old average time:", sum(timesold), "ms per loop"[color=blue][color=green][color=darkred]
            >>>[/color][/color][/color]

            I'll post the patch to the SF bugtracker tomorrow, it's too late today for me
            to review it, but generally, I wouldn't find it to be a bad addition, if
            there's actually a general use case to only sort parts of a list.

            --- Heiko.

            Comment

            • Harold Fellermann

              #7
              Re: Feature request: sorting a list slice

              Fredrik Lundh wrote:[color=blue]
              > George Sakkis wrote:
              >[color=green]
              > > It would be useful if list.sort() accepted two more optional
              > > parameters[/color][/color]

              +1
              [color=blue]
              > useful for what? what's the use case ?[/color]

              Actually, I was in need of such a construct only few weeks ago. The
              task was to organize playlists
              of an mp3 player. I wanted to sort future entries but not past ones
              (according to an index in the
              list that seperates past from future). I can imagine many more
              situations in which sorting part of
              a list is usefull.

              While I agree that the improvement of the additional keywords is minor,
              I think that the additional
              load they impose on the sort function is also minor. In my oppinion,
              this is a straight-forward
              extension of what we already find in other places in the library. I
              would like to see it in a future version.

              - harold -

              Comment

              • Heiko Wundram

                #8
                Re: Feature request: sorting a list slice

                Am Donnerstag 18 Mai 2006 19:27 schrieb George Sakkis:[color=blue]
                > It would be useful if list.sort() accepted two more optional
                > parameters, start and stop, so that you can sort a slice in place.[/color]

                I've just submitted:



                to the bugtracker, which extends the (start, stop) keyword arguments to
                list.reverse() (which I've needed more than once). The patch updates the test
                suite, documentation, list object, and sorted() builtin to accept (or
                specify) the new arguments.

                Any comment/feedback would be appreciated.

                --- Heiko.

                Comment

                • George Sakkis

                  #9
                  Re: Feature request: sorting a list slice

                  Heiko Wundram wrote:
                  [color=blue]
                  > Am Donnerstag 18 Mai 2006 19:27 schrieb George Sakkis:[color=green]
                  > > It would be useful if list.sort() accepted two more optional
                  > > parameters, start and stop, so that you can sort a slice in place.[/color]
                  >
                  > I've just submitted:
                  >
                  > http://sourceforge.net/tracker/index...70&atid=305470
                  >
                  > to the bugtracker, which extends the (start, stop) keyword arguments to
                  > list.reverse() (which I've needed more than once). The patch updates the test
                  > suite, documentation, list object, and sorted() builtin to accept (or
                  > specify) the new arguments.
                  >
                  > Any comment/feedback would be appreciated.
                  >
                  > --- Heiko.[/color]

                  This is great, thanks Heiko ! Any idea on the chances of being
                  considered for inclusion in 2.5 ?

                  George

                  Comment

                  • Heiko Wundram

                    #10
                    Re: Feature request: sorting a list slice

                    Am Freitag 19 Mai 2006 23:24 schrieb George Sakkis:[color=blue]
                    > This is great, thanks Heiko ! Any idea on the chances of being
                    > considered for inclusion in 2.5 ?[/color]

                    Don't ask me, I'm not one of the core developers... ;-) But, anyway, the
                    people on python-dev are doing their best to review patches. Just: I rather
                    write them, than review them... ;-)

                    --- Heiko.

                    Comment

                    • Edward Elliott

                      #11
                      Re: Feature request: sorting a list slice

                      John Machin wrote:
                      [color=blue]
                      > Use case?[/color]

                      quicksort! :)

                      --
                      Edward Elliott
                      UC Berkeley School of Law (Boalt Hall)
                      complangpython at eddeye dot net

                      Comment

                      • Steve Holden

                        #12
                        Re: Feature request: sorting a list slice

                        The effbot wrote:
                        [color=blue]
                        > George Sakkis wrote:
                        >[color=green][color=darkred]
                        >>> It would be useful if list.sort() accepted two more optional
                        >>> parameters[/color][/color]
                        >
                        > useful for what? what's the use case ?
                        >[/color]
                        John Machin then wrote, without quoting any context at all:[color=blue]
                        > Use case?
                        >[/color]
                        He means "under what circumstances do you see someone actually using the
                        proposed feature rather than just nodding and saying "that looks
                        useful". IOW, who would use the new feature, and why?

                        regards
                        Steve
                        --
                        Steve Holden +44 150 684 7255 +1 800 494 3119
                        Holden Web LLC/Ltd http://www.holdenweb.com
                        Love me, love my blog http://holdenweb.blogspot.com
                        Recent Ramblings http://del.icio.us/steve.holden

                        Comment

                        • Robert Kern

                          #13
                          Re: Feature request: sorting a list slice

                          Steve Holden wrote:[color=blue]
                          > The effbot wrote:
                          >[color=green]
                          >>George Sakkis wrote:
                          >>[color=darkred]
                          >>>>It would be useful if list.sort() accepted two more optional
                          >>>>parameter s[/color]
                          >>
                          >>useful for what? what's the use case ?[/color]
                          >
                          > John Machin then wrote, without quoting any context at all:
                          >[color=green]
                          >>Use case?[/color]
                          >
                          > He means "under what circumstances do you see someone actually using the
                          > proposed feature rather than just nodding and saying "that looks
                          > useful". IOW, who would use the new feature, and why?[/color]

                          I believe that John was asking George for a use case rather than asking Fredrik
                          what a use case was.

                          --
                          Robert Kern

                          "I have come to believe that the whole world is an enigma, a harmless enigma
                          that is made terrible by our own mad attempt to interpret it as though it had
                          an underlying truth."
                          -- Umberto Eco

                          Comment

                          • Steve Holden

                            #14
                            Re: Feature request: sorting a list slice

                            Robert Kern wrote:[color=blue]
                            > Steve Holden wrote:
                            >[color=green]
                            >>The effbot wrote:
                            >>
                            >>[color=darkred]
                            >>>George Sakkis wrote:
                            >>>
                            >>>
                            >>>>>It would be useful if list.sort() accepted two more optional
                            >>>>>paramete rs
                            >>>
                            >>>useful for what? what's the use case ?[/color]
                            >>
                            >>John Machin then wrote, without quoting any context at all:
                            >>
                            >>[color=darkred]
                            >>>Use case?[/color]
                            >>
                            >>He means "under what circumstances do you see someone actually using the
                            >>proposed feature rather than just nodding and saying "that looks
                            >>useful". IOW, who would use the new feature, and why?[/color]
                            >
                            >
                            > I believe that John was asking George for a use case rather than asking Fredrik
                            > what a use case was.
                            >[/color]
                            In which case, as I pointed out, John would have done better to quote a
                            little more context.

                            regards
                            Steve
                            --
                            Steve Holden +44 150 684 7255 +1 800 494 3119
                            Holden Web LLC/Ltd http://www.holdenweb.com
                            Love me, love my blog http://holdenweb.blogspot.com
                            Recent Ramblings http://del.icio.us/steve.holden

                            Comment

                            • Robert Kern

                              #15
                              Re: Feature request: sorting a list slice

                              Steve Holden wrote:[color=blue]
                              > Robert Kern wrote:[/color]
                              [color=blue][color=green]
                              >>I believe that John was asking George for a use case rather than asking Fredrik
                              >>what a use case was.[/color]
                              >
                              > In which case, as I pointed out, John would have done better to quote a
                              > little more context.[/color]

                              No argument here.

                              --
                              Robert Kern

                              "I have come to believe that the whole world is an enigma, a harmless enigma
                              that is made terrible by our own mad attempt to interpret it as though it had
                              an underlying truth."
                              -- Umberto Eco

                              Comment

                              Working...