"Collapsing" a list into a list of changes

  • Alan McIntyre

    #16
    Re: "Collapsin g" a list into a list of changes

    Tony,

    Actually I only want to remove a certain kind of duplication; if an item
    occurs in more than one contiguous run - say like this: [1,1,1,2,2,2,1,1,1] -
    then I need to keep the order and occurrence of the individual runs:
    [1,2,1]. Using a dict as you proposed loses the order of occurrence, as
    well as multiple occurrences of groups of the same item.

    If I didn't need those two qualities of the list to be preserved,
    though, I think I'd use something like your solution (if I was using a
    Python older than 2.3) or Steve Coats' solution posted above using Set.

    Thanks!
    Alan

    Tony wrote:
    > Here is a version using dictionary properties, ie no duplication of
    > keys.
    >
    > def condense(l):
    >     d={}
    >     for item in l:
    >         d[item]=1
    >     l=d.keys()
    >     return l
    >
    > Cheers
    > Tony
    >
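
    As an aside, a minimal sketch (using only the thread's own example data)
    of the two properties the dict approach gives up:

    # Tony's dict-based condense() applied to the example above:
    d = {}
    for item in [1, 1, 1, 2, 2, 2, 1, 1, 1]:
        d[item] = 1
    print d.keys()   # e.g. [1, 2] -- key order is arbitrary and the second
                     # run of 1s is gone; the desired result is [1, 2, 1]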


    • Alan McIntyre

      #17
      Re: "Collapsin g" a list into a list of changes

      Alex,

      Wow, that method turns out to be the fastest so far in a simple
      benchmark on Python 2.3 (on my machine, of course, YMMV); it takes 14%
      less time than the one that I deemed most straightforward. :)

      Thanks,
      Alan

      Alex Martelli wrote:
      > Hmmmm, what role does the enumeration play here? I don't see how you're
      > using it, at all. Why not just:
      >
      > def collapse(iterable):
      >     it = iter(iterable)
      >     lastitem = it.next()
      >     yield lastitem
      >     for item in it:
      >         if item != lastitem:
      >             yield item
      >             lastitem = item
      >
      > that's basically just the same as your code but without the strangeness
      > of making an enumerate for the purpose of ignoring what the enumerate
      > adds to an ordinary iterator.
      >
      >
      > Alex
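
      For anyone running this on a newer interpreter, a sketch of the same
      generator adapted for modern Python (that adaptation is my own, not part
      of the thread): it.next() becomes next(it), and a sentinel keeps a
      StopIteration from escaping the generator on empty input.

      def collapse(iterable):
          # minimal sketch for Python 3; an empty iterable simply yields nothing
          it = iter(iterable)
          sentinel = object()
          lastitem = next(it, sentinel)
          if lastitem is sentinel:
              return
          yield lastitem
          for item in it:
              if item != lastitem:
                  yield item
                  lastitem = item

      print(list(collapse([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])))
      # [0, 1, 2, 3, 2, 4, 5]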


      • Jack Diederich

        #18
        Re: "Collapsin g" a list into a list of changes

        On Sat, Feb 05, 2005 at 02:31:08PM +1000, Nick Coghlan wrote:
        > Jack Diederich wrote:
        > > Since this is 2.4 you could also return a generator expression.
        > >
        > > >>> def iter_collapse(myList):
        > > ...     return (x[0] for x in
        > > ...         it.groupby([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]))
        > > ...
        >
        > But why write a function that returns a generator expression, when you
        > could just turn the function itself into a generator?
        >
        > Py> def iter_collapse(myList):
        > ...     for x in it.groupby(myList):
        > ...         yield x[0]
        >
        Here is where I say, "because it is faster!" Except it isn't. Maybe.
        The results are unexpected, anyway.

        import timeit
        import itertools as it

        def collapse1(myl):
            for x in it.groupby(myl):
                yield x[0]

        def collapse2(myl):
            return (x[0] for x in it.groupby(myl))

        list_str = '[0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]'

        print "collapse1", timeit.Timer('collapse1(%s)' % (list_str), 'from __main__ import collapse1').timeit()
        print "collapse2", timeit.Timer('collapse2(%s)' % (list_str), 'from __main__ import collapse2').timeit()
        print "list(collapse1)", timeit.Timer('list(collapse1(%s))' % (list_str), 'from __main__ import collapse1').timeit()
        print "list(collapse2)", timeit.Timer('list(collapse2(%s))' % (list_str), 'from __main__ import collapse2').timeit()

        collapse1 1.06855082512
        collapse2 3.40627384186
        list(collapse1) 8.31489896774
        list(collapse2) 9.49903011322

        The overhead of creating the generator expression seems much higher than
        creating the equivalent function. If you subtract out the setup difference,
        actually running through the whole iterator is slightly faster for genexps
        than for functions that yield. At a guess it has something to do with how
        they handle lexical scope and some bytecode differences.

        I said guess, right?

        -Jack
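
        If a list is what is ultimately wanted anyway, a further sketch (my own
        variation, not one of the versions benchmarked above) skips both
        wrappers and builds the list straight from groupby:

        import itertools as it

        def list_collapse(myl):
            # keep only the key of each run of equal values
            return [key for key, group in it.groupby(myl)]

        print list_collapse([0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])
        # [0, 1, 2, 3, 2, 4, 5]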


        • Tony

          #19
          Re: "Collapsin g" a list into a list of changes


          Alan McIntyre wrote:
          > Tony,
          >
          > Actually I only want to remove a certain kind of duplication; <snip>

          How about this one-liner?

          def condense(m):
              print [m[0]]+[m[k] for k in range(1,len(m)) if m[k]!=m[k-1]]

          b=[1,1,1,2,2,2,1,1,1]
          condense(b)
          Tony Clarke
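
          A closely related sketch (my own rephrasing, assuming m is non-empty)
          avoids the explicit indexing by pairing each element with its
          predecessor via zip:

          def condense_zip(m):
              # zip(m, m[1:]) pairs each element with the one that follows it
              return [m[0]] + [cur for prev, cur in zip(m, m[1:]) if cur != prev]

          print condense_zip([1,1,1,2,2,2,1,1,1])   # [1, 2, 1]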


          • Steven Bethard

            #20
            Re: "Collapsing" a list into a list of changes

            Alex Martelli wrote:
            > Steven Bethard <steven.bethard@gmail.com> wrote:
            >
            >> Here's a solution that works for iterables other than lists:
            >>
            >> py> def collapse(iterable):
            >> ...     enumeration = enumerate(iterable)
            >> ...     _, lastitem = enumeration.next()
            >> ...     yield lastitem
            >> ...     for i, item in enumeration:
            >> ...         if item != lastitem:
            >> ...             yield item
            >> ...             lastitem = item
            >> ...
            >> py> lst = [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
            >> py> list(collapse(lst))
            >> [0, 1, 2, 3, 2, 4, 5]
            >>
            >> Again, I'm still not sure I'd call this more elegant...
            >
            >
            > Hmmmm, what role does the enumeration play here?

            None =) Sorry, it was an accidental carry-over from the list solution
            that looked at lst[i-1]. I thought about correcting it when I realized
            this (after posting), but I was too lazy. ;)

            Steve


            • Alan McIntyre

              #21
              Re: "Collapsing" a list into a list of changes

              Thanks to everybody that responded; I appreciate all the input, even if
              I didn't respond to all of it individually. :)


              • Francis Girard

                #22
                Re: "Collapsing" a list into a list of changes

                This is a perfect use case for the good old "reduce" function:

                --BEGIN SNAP

                a_lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]

                def straightforward_collapse(lst):
                    return reduce(lambda v,e: v[-1]!=e and v+[e] or v, lst[1:], [lst[0]])

                def straightforward_collapse_secu(lst):
                    return lst and reduce(lambda v,e: v[-1]!=e and v+[e] or v,
                                           lst[1:], [lst[0]]) or []

                print straightforward_collapse(a_lst)
                print straightforward_collapse_secu([])

                --END SNAP

                Regards

                Francis Girard

                On Friday 4 February 2005 at 20:08, Steven Bethard wrote:
                > Mike C. Fletcher wrote:
                > > Alan McIntyre wrote:
                > > ...
                > >
                > >> I have a list of items that has contiguous repetitions of values, but
                > >> the number and location of the repetitions is not important, so I just
                > >> need to strip them out. For example, if my original list is
                > >> [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with
                > >> [0,1,2,3,2,4,5].
                > >
                > > ...
                > >
                > >> Is there an elegant way to do this, or should I just stick with the
                > >> code above?
                > >
                > > >>> def changes( dataset ):
                > > ...     last = None
                > > ...     for value in dataset:
                > > ...         if value != last:
                > > ...             yield value
                > > ...         last = value
                > > ...
                > > >>> print list(changes(data))
                > >
                > > which is quite readable/elegant IMO.
                >
                > But fails if the list starts with None:
                >
                > py> lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
                > py> def changes(dataset):
                > ...     last = None
                > ...     for value in dataset:
                > ...         if value != last:
                > ...             yield value
                > ...         last = value
                > ...
                > py> list(changes(lst))
                > [0, 1, 2, 3, 2, 4, 5]
                >
                > A minor modification that does appear to work:
                >
                > py> def changes(dataset):
                > ...     last = object()
                > ...     for value in dataset:
                > ...         if value != last:
                > ...             yield value
                > ...         last = value
                > ...
                > py> list(changes(lst))
                > [None, 0, 1, 2, 3, 2, 4, 5]
                >
                > STeVe
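
                One caveat with the reduce version above: v+[e] copies the
                accumulator on every change of value, so the collapse can go
                quadratic on data with many runs. A sketch (my own variant, not
                from the thread) keeps the reduce shape but appends in place:

                def reduce_collapse(lst):
                    def step(acc, e):
                        # mutate the accumulator instead of rebuilding it
                        if acc[-1] != e:
                            acc.append(e)
                        return acc
                    return lst and reduce(step, lst[1:], [lst[0]]) or []

                print reduce_collapse([None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5])
                # [None, 0, 1, 2, 3, 2, 4, 5]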


                • Peter Otten

                  #23
                  Re: "Collapsing" a list into a list of changes

                  Francis Girard wrote:

                  > This is a perfect use case for the good old "reduce" function:
                  >
                  > --BEGIN SNAP
                  >
                  > a_lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
                  >
                  > def straightforward_collapse(lst):
                  >     return reduce(lambda v,e: v[-1]!=e and v+[e] or v, lst[1:], [lst[0]])

                  reduce() magically increases the number of function calls by len(a_list).
                  There will be no good use cases for reduce() until Python can inline
                  functions.

                  Apodictically Yours
                  Peter
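
                  A rough sketch (function names and the repetition factor are
                  just for illustration) of how one might measure that
                  per-element call overhead with timeit, in the same style as the
                  benchmark earlier in the thread:

                  import timeit

                  lst_expr = '[None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5] * 100'

                  def loop_collapse(lst):
                      collapsed = [lst[0]]
                      for n in lst[1:]:
                          if n != collapsed[-1]:
                              collapsed.append(n)
                      return collapsed

                  def reduce_collapse(lst):
                      return reduce(lambda v, e: v[-1] != e and v + [e] or v, lst[1:], [lst[0]])

                  # each statement runs 1000 times; the reduce version pays one extra
                  # Python-level call per element for the lambda
                  print "loop  ", timeit.Timer('loop_collapse(lst)', 'from __main__ import loop_collapse; lst = %s' % lst_expr).timeit(1000)
                  print "reduce", timeit.Timer('reduce_collapse(lst)', 'from __main__ import reduce_collapse; lst = %s' % lst_expr).timeit(1000)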


                  • Adam Przybyla

                    #24
                    Re: "Collapsing" a list into a list of changes

                    Alan McIntyre <alan.mcintyre@esrgtech.com> wrote:
                    > Hi all,
                    >
                    > I have a list of items that has contiguous repetitions of values, but
                    > the number and location of the repetitions is not important, so I just
                    > need to strip them out. For example, if my original list is
                    > [0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5], I want to end up with [0,1,2,3,2,4,5].
                    >
                    > Here is the way I'm doing this now:
                    >
                    > def straightforward_collapse(myList):
                    >     collapsed = [myList[0]]
                    >     for n in myList[1:]:
                    >         if n != collapsed[-1]:
                    >             collapsed.append(n)
                    >
                    >     return collapsed
                    >
                    > Is there an elegant way to do this, or should I just stick with the code
                    > above?
                    >>> p=[1,1,1,1,1,4,4,4,8,8,9]
                    >>> filter(lambda y: y>0, map(lambda x,y: x==y and -1 or x,[0]+p,p+[0]))
                    [1, 4, 8, 9]
                    >>>
                    With regards,
                    Adam Przybyla


                    • Francis Girard

                      #25
                      Re: "Collapsing" a list into a list of changes

                      Zut !

                      I'm very sorry that there is no good use case for the "reduce" function in
                      Python, as Peter Otten claims. That's an otherwise very useful tool for
                      many use cases. At least on paper.

                      Python documentation should say "There is no good use case for the reduce
                      function in Python and we don't know why we bother offering it."

                      Francis Girard

                      On Monday 7 February 2005 at 08:24, Peter Otten wrote:
                      > Francis Girard wrote:
                      > > This is a perfect use case for the good old "reduce" function:
                      > >
                      > > --BEGIN SNAP
                      > >
                      > > a_lst = [None,0,0,1,1,1,2,2,3,3,3,2,2,2,4,4,4,5]
                      > >
                      > > def straightforward_collapse(lst):
                      > >     return reduce(lambda v,e: v[-1]!=e and v+[e] or v, lst[1:], [lst[0]])
                      >
                      > reduce() magically increases the number of function calls by len(a_list).
                      > There will be no good use cases for reduce() until Python can inline
                      > functions.
                      >
                      > Apodictically Yours
                      > Peter


                      • Peter Otten

                        #26
                        Re: "Collapsing" a list into a list of changes

                        Francis Girard wrote:
                        > Python documentation should say "There is no good use case for the reduce
                        > function in Python and we don't know why we bother offering it."

                        Submit a patch :-)

                        Peter



                        • Steven Bethard

                          #27
                          Re: "Collapsing" a list into a list of changes

                          Francis Girard wrote:
                          > I'm very sorry that there is no good use case for the "reduce" function in
                          > Python, as Peter Otten claims. That's an otherwise very useful tool for
                          > many use cases. At least on paper.

                          Clarity aside[1], can you give an example of where reduce is as
                          efficient as the equivalent loop in Python?

                          Steve

                          [1] I generally find most reduce solutions much harder to read, but
                          we'll set aside my personal limitations for the moment. ;)


                          • John Lenton

                            #28
                            Re: "Collapsing" a list into a list of changes

                            On Mon, Feb 07, 2005 at 07:07:10PM +0100, Francis Girard wrote:
                            > Zut !
                            >
                            > I'm very sorry that there is no good use case for the "reduce" function in
                            > Python, as Peter Otten claims. That's an otherwise very useful tool for
                            > many use cases. At least on paper.
                            >
                            > Python documentation should say "There is no good use case for the reduce
                            > function in Python and we don't know why we bother offering it."

                            I am guessing you are joking, right? I think Peter exaggerates when he
                            says that "there will be no good use cases for reduce"; it is very
                            useful for writing very compact code when it does exactly what you
                            want (and you would have to jump through hoops to do it otherwise). It
                            can also be the fastest way to do something. For example, the fastest
                            way to get the factorial of a (small enough) number in pure Python is

                            factorial = lambda n: reduce(operator.mul, range(1, n+1))

                            --
                            John Lenton (john@grulic.org.ar) -- Random fortune:
                            Laugh and the world thinks you're an idiot.
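
                            One small caveat, sketched here under the assumption that
                            n == 0 should also work: without an initializer, reduce()
                            raises a TypeError on the empty range.

                            import operator

                            # passing 1 as the initial value makes factorial(0) == 1
                            factorial = lambda n: reduce(operator.mul, range(1, n + 1), 1)

                            print factorial(0), factorial(5)   # 1 120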



                            • Francis Girard

                              #29
                              Re: "Collapsing" a list into a list of changes

                              No. I just wanted to naively use one of the basic tools Python has to
                              offer. I thought it was very usable and very readable. I might be wrong.

                              Is there someone on this list using this tool and happy with it? Or is my
                              mind too focused on the FP paradigm, and do most of you really think that
                              all the functions that apply another function to each and every element
                              of a list are bad (like "reduce", "map", "filter")?

                              Francis Girard

                              On Monday 7 February 2005 at 19:25, Steven Bethard wrote:
                              > Francis Girard wrote:
                              > > I'm very sorry that there is no good use case for the "reduce" function
                              > > in Python, as Peter Otten claims. That's an otherwise very useful
                              > > tool for many use cases. At least on paper.
                              >
                              > Clarity aside[1], can you give an example of where reduce is as
                              > efficient as the equivalent loop in Python?
                              >
                              > Steve
                              >
                              > [1] I generally find most reduce solutions much harder to read, but
                              > we'll set aside my personal limitations for the moment. ;)


                              • Steven Bethard

                                #30
                                Re: "Collapsing" a list into a list of changes

                                Francis Girard wrote:
                                > Is there someone on this list using this tool and happy with it? Or is my
                                > mind too focused on the FP paradigm, and do most of you really think that
                                > all the functions that apply another function to each and every element
                                > of a list are bad (like "reduce", "map", "filter")?

                                I know there are definitely proponents for map and filter, especially
                                for simple cases like:

                                map(int, lst)
                                filter(str.strip, lst)

                                Note that, unlike reduce, map and filter aren't really going to increase
                                the number of function calls. Consider the equivalent list comprehensions:

                                [int(x) for x in lst]
                                [x for x in lst if str.strip(x)] [1]

                                The list comprehensions also require the same number of function calls
                                in these cases. Of course, in cases where a function does not already
                                exist, map and filter will require more function calls. Compare:

                                map(lambda x: x**2 + 1, lst)

                                with

                                [x**2 + 1 for x in lst]

                                Where the LC allows you to essentially "inline" the function. (You can
                                dis.dis these to see the difference if you like.)
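
                                For the curious, a quick sketch (the names here are purely
                                illustrative) of how one might actually look at that bytecode
                                difference:

                                import dis

                                # bytecode of the lambda passed to map()
                                dis.dis(lambda x: x**2 + 1)

                                # bytecode of the equivalent list comprehension,
                                # compiled as a standalone expression
                                dis.dis(compile("[x**2 + 1 for x in lst]", "<listcomp>", "eval"))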


                                As far as my personal preferences go, while the simple cases of map and
                                filter (the ones using existing functions) are certainly easy enough for
                                me to read, for more complicated cases, I find things like:

                                [x**2 + 1 for x in lst]
                                [x for x in lst if (x**2 + 1) % 3 == 1]

                                much more readable than the corresponding:

                                map(lambda x: x**2 + 1, lst)
                                filter(lambda x: (x**2 + 1) % 3 == 1, lst)

                                especially since I avoid lambda usage, and would have to write these as:

                                def f1(x):
                                    return x**2 + 1
                                map(f1, lst)

                                def f2(x):
                                    return (x**2 + 1) % 3 == 1
                                filter(f2, lst)

                                (I actually find the non-lambda code clearer, but still more complicated
                                than the list comprehensions.)

                                Given that I use list comprehensions for the complicated cases, I find
                                it to be more consistent if I use list comprehensions in all cases, so I
                                even tend to avoid the simple map and filter cases.


                                As far as reduce goes, I've never seen code that I thought was clearer
                                using reduce than using a for-loop. I have some experience with FP, and
                                I can certainly figure out what a given reduce call is doing given
                                enough time, but I can always understand the for-loop version much more
                                quickly.


                                Of course, YMMV.

                                STeVe


                                [1] While it's not technically equivalent, this would almost certainly
                                be written as:
                                [x for x in lst if x.strip()]
                                which in fact takes better advantage of Python's duck-typing -- it will
                                work for unicode objects too.
