finding items that occur more than once in a list

  • Simon Forman

    finding items that occur more than once in a list

    Is there a more efficient way to do this?

    def f(L):
        '''Return a set of the items that occur more than once in L.'''
        L = list(L)
        for item in set(L):
            L.remove(item)
        return set(L)


    >>> f([0, 0, 1, 1, 2, 2, 3])
    set([0, 1, 2])



  • Chris

    #2
    Re: finding items that occur more than once in a list

    On Mar 18, 11:57 am, Simon Forman <sajmik...@gmail.com> wrote:
    Is there a more efficient way to do this?
    >
    def f(L):
        '''Return a set of the items that occur more than once in L.'''
        L = list(L)
        for item in set(L):
            L.remove(item)
        return set(L)
    >
    >>> f([0, 0, 1, 1, 2, 2, 3])
    set([0, 1, 2])
    def f(L):
        D = dict()
        for item in L:
            if item in D:
                D[item] += 1
            else:
                D[item] = 1
        return [i for i,j in D.items() if j > 1]

    That would be my way to do it; you'd need to time both over several
    thousand iterations to see which one is most efficient, though.


    • Paul Rubin

      #3
      Re: finding items that occur more than once in a list

      Simon Forman <sajmikins@gmail.com> writes:
      Is there a more efficient way to do this?

      from collections import defaultdict
      def nonunique(lst):
          d = defaultdict(int)
          for x in lst: d[x] += 1
          return [x for x,n in d.iteritems() if n > 1]


      • Bryan Olson

        #4
        Re: finding items that occur more than once in a list

        Simon Forman wrote:
        Is there a more efficient way to do this?
        >
        def f(L):
            '''Return a set of the items that occur more than once in L.'''
            L = list(L)
            for item in set(L):
                L.remove(item)
            return set(L)
        That's neat, but quadratic time because list.remove() requires
        a linear search. We can make an efficient variant by using
        remove on a set rather than a list:

        def multiples(lst):
            singles = set(lst)
            mults = set()
            for x in lst:
                if x in singles:
                    singles.remove(x)
                else:
                    mults.add(x)
            return mults

        Though probably better is:

        def multiples(lst):
            seen = set()
            mults = set()
            for x in lst:
                if x in seen:
                    mults.add(x)
                else:
                    seen.add(x)
            return mults


        I've typically used dicts for such things, as in:

        def multiples(lst):
            h = {}
            for x in lst:
                h[x] = h.get(x, 0) + 1
            return set([x for x in h if h[x] > 1])


        --
        --Bryan
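The dict-counting idiom above collapses further with `collections.Counter`, which was added in Python 2.7 (after this thread); a minimal sketch of the same technique:

```python
from collections import Counter

def multiples(lst):
    # Counter builds the same item -> count mapping in a single pass
    counts = Counter(lst)
    return set(x for x, n in counts.items() if n > 1)

print(multiples([0, 0, 1, 1, 2, 2, 3]))  # the items seen more than once
```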


        • Raymond Hettinger

          #5
          Re: finding items that occur more than once in a list

          On Mar 18, 2:57 am, Simon Forman <sajmik...@gmail.com> wrote:
          Is there a more efficient way to do this?
          >
          def f(L):
              '''Return a set of the items that occur more than once in L.'''
              L = list(L)
              for item in set(L):
                  L.remove(item)
              return set(L)
          >
          >>> f([0, 0, 1, 1, 2, 2, 3])
          set([0, 1, 2])

          def f(L):
              seen = set()
              dups = set()
              for e in L:
                  if e in seen:
                      dups.add(e)
                  else:
                      seen.add(e)
              return dups



          Raymond


          • Hrvoje Niksic

            #6
            Re: finding items that occur more than once in a list

            Ninereeds <stephenhorne100@aol.com> writes:
            As for the growth pattern, each time you grow the table you have to
            redistribute all the items previously inserted to new locations.
            Resizes would get rarer as more items are added due to the
            exponential growth, but every table resize would take longer too
            since there are more items to move. Inserting n items still
            intuitively looks like O(n^2) to me.
            >
            That said, it does remind me of that old exponential realloc trick
            for array resizing. Same thing, I suppose, since a hash table is
            basically an array. Maybe my math "intuition" is just wrong.
            That's it. While table resizes grow linearly in complexity, they
            become geometrically rarer. This is exactly what happens when
            resizing dynamic arrays such as Python lists. Applying your
            intuition, appending n elements to a Python list would also have
            O(n^2) complexity, which it doesn't. See, for example,
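The geometric-growth argument can be made concrete with a small simulation: under capacity doubling, appending n items copies fewer than 2n elements in total, so each append is O(1) amortized. A sketch (the doubling policy is an illustrative assumption; CPython's actual growth factor is smaller):

```python
def total_moves(n):
    # Simulate a dynamic array that doubles its capacity when full,
    # counting how many existing elements are copied during resizes.
    moves, capacity, size = 0, 1, 0
    for _ in range(n):
        if size == capacity:
            moves += size      # copy every existing element over
            capacity *= 2
        size += 1
    return moves

# Total copies stay below 2*n: resizes at sizes 1, 2, 4, ... sum to < 2n.
print(total_moves(1000))
```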


            • sturlamolden

              #7
              Re: finding items that occur more than once in a list

              On 18 Mar, 10:57, Simon Forman <sajmik...@gmail.com> wrote:
              def f(L):
                  '''Return a set of the items that occur more than once in L.'''
                  L = list(L)
                  for item in set(L):
                      L.remove(item)
                  return set(L)

              def nonunique(lst):
                  slst = sorted(lst)
                  return list(set([s[0] for s in
                      filter(lambda t : not(t[0]-t[1]), zip(slst[:-1],slst[1:]))]))





              • sturlamolden

                #8
                Re: finding items that occur more than once in a list

                On 18 Mar, 22:22, sturlamolden <sturlamol...@yahoo.no> wrote:
                def nonunique(lst):
                    slst = sorted(lst)
                    return list(set([s[0] for s in
                        filter(lambda t : not(t[0]-t[1]), zip(slst[:-1],slst[1:]))]))

                Or perhaps better:

                def nonunique(lst):
                    slst = sorted(lst)
                    return list(set([s[0] for s in
                        filter(lambda t : t[0] != t[1], zip(slst[:-1],slst[1:]))]))




                • sturlamolden

                  #9
                  Re: finding items that occur more than once in a list

                  On 18 Mar, 22:25, sturlamolden <sturlamol...@yahoo.no> wrote:
                  def nonunique(lst):
                      slst = sorted(lst)
                      return list(set([s[0] for s in
                          filter(lambda t : t[0] != t[1], zip(slst[:-1],slst[1:]))]))
                  Obviously that should be 'lambda t : t[0] == t[1]'. Instead of using
                  the set function, there is more structure to exploit when the list is
                  sorted:

                  def nonunique(lst):
                      slst = sorted(lst)
                      dups = [s[0] for s in
                          filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
                      return [dups[0]] + [s[1] for s in
                          filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
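The same "exploit the sorted structure" idea can be expressed with `itertools.groupby`, which clusters equal neighbours and avoids the pairwise zip/filter gymnastics; a minimal sketch:

```python
from itertools import groupby

def nonunique(lst):
    # After sorting, groupby yields one (key, group) pair per distinct
    # value; keep the keys whose group has more than one member.
    return [key for key, group in groupby(sorted(lst))
            if sum(1 for _ in group) > 1]

print(nonunique([0, 0, 1, 1, 2, 2, 3]))  # -> [0, 1, 2]
```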






                  • Arnaud Delobelle

                    #10
                    Re: finding items that occur more than once in a list

                    On Mar 18, 9:56 pm, sturlamolden <sturlamol...@yahoo.no> wrote:
                    On 18 Mar, 22:25, sturlamolden <sturlamol...@yahoo.no> wrote:
                    >
                    def nonunique(lst):
                        slst = sorted(lst)
                        return list(set([s[0] for s in
                            filter(lambda t : t[0] != t[1], zip(slst[:-1],slst[1:]))]))
                    >
                    Obviously that should be 'lambda t : t[0] == t[1]'. Instead of using
                    the set function, there is more structure to exploit when the list is
                    sorted:
                    >
                    def nonunique(lst):
                        slst = sorted(lst)
                        dups = [s[0] for s in
                            filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
                        return [dups[0]] + [s[1] for s in
                            filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
                    Argh! What's wrong with something like:

                    def duplicates(l):
                        i = j = object()
                        for k in sorted(l):
                            if i != j == k: yield k
                            i, j = j, k

                    >>> list(duplicates([1, 2, 3, 2, 2, 4, 1]))
                    [1, 2]

                    --
                    Arnaud


                    • sturlamolden

                      #11
                      Re: finding items that occur more than once in a list

                      On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.com> wrote:
                      def nonunique(lst):
                          slst = sorted(lst)
                          dups = [s[0] for s in
                              filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
                          return [dups[0]] + [s[1] for s in
                              filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
                      >
                      Argh! What's wrong with something like:
                      >
                      def duplicates(l):
                          i = j = object()
                          for k in sorted(l):
                              if i != j == k: yield k
                              i, j = j, k

                      Nice, and more readable. But I'd use Paul Rubin's solution. It is O(N)
                      as opposed to ours, which are O(N log N).







                      • John Machin

                        #12
                        Re: finding items that occur more than once in a list

                        On Mar 19, 10:08 am, sturlamolden <sturlamol...@yahoo.no> wrote:
                        On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.com> wrote:
                        >
                        def nonunique(lst):
                            slst = sorted(lst)
                            dups = [s[0] for s in
                                filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
                            return [dups[0]] + [s[1] for s in
                                filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
                        >
                        Argh! What's wrong with something like:
                        >
                        def duplicates(l):
                            i = j = object()
                            for k in sorted(l):
                                if i != j == k: yield k
                                i, j = j, k
                        >
                        Nice, and more readable. But I'd use Paul Rubin's solution. It is O(N)
                        as opposed to ours, which are O(N log N).
                        I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
                        and is IMHO more readable than Paul's.


                        • sturlamolden

                          #13
                          Re: finding items that occur more than once in a list

                          On 19 Mar, 22:48, John Machin <sjmac...@lexicon.net> wrote:
                          I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
                          and is IMHO more readable than Paul's.
                          Is a Python set implemented using a hash table?


                          • Justin Bozonier

                            #14
                            Re: finding items that occur more than once in a list

                            On Mar 19, 2:48 pm, John Machin <sjmac...@lexicon.net> wrote:
                            On Mar 19, 10:08 am, sturlamolden <sturlamol...@yahoo.no> wrote:
                            >
                            >
                            >
                            On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.com> wrote:
                            >
                            def nonunique(lst):
                                slst = sorted(lst)
                                dups = [s[0] for s in
                                    filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
                                return [dups[0]] + [s[1] for s in
                                    filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
                            >
                            Argh! What's wrong with something like:
                            >
                            def duplicates(l):
                                i = j = object()
                                for k in sorted(l):
                                    if i != j == k: yield k
                                    i, j = j, k
                            >
                            Nice, and more readable. But I'd use Paul Rubin's solution. It is O(N)
                            as opposed to ours, which are O(N log N).
                            >
                            I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
                            and is IMHO more readable than Paul's.
                            It's not as much O(N)... Paul Rubin's uses a sort first, which is
                            definitely not O(N). Paul's could be prettied up a bit, but the general
                            principle is sound.


                            • John Machin

                              #15
                              Re: finding items that occur more than once in a list

                              On Mar 20, 9:57 am, Justin Bozonier <darkxant...@gmail.com> wrote:
                              On Mar 19, 2:48 pm, John Machin <sjmac...@lexicon.net> wrote:
                              >
                              >
                              >
                              On Mar 19, 10:08 am, sturlamolden <sturlamol...@yahoo.no> wrote:
                              >
                              On 18 Mar, 23:45, Arnaud Delobelle <arno...@googlemail.com> wrote:
                              >
                              def nonunique(lst):
                                  slst = sorted(lst)
                                  dups = [s[0] for s in
                                      filter(lambda t : t[0] == t[1], zip(slst[:-1],slst[1:]))]
                                  return [dups[0]] + [s[1] for s in
                                      filter(lambda t : t[0] != t[1], zip(dups[:-1],dups[1:]))]
                              >
                              Argh! What's wrong with something like:
                              >
                              def duplicates(l):
                                  i = j = object()
                                  for k in sorted(l):
                                      if i != j == k: yield k
                                      i, j = j, k
                              >
                              Nice, and more readable. But I'd use Paul Rubin's solution. It is O(N)
                              as opposed to ours, which are O(N log N).
                              >
                              I'd use Raymond Hettinger's solution. It is as much O(N) as Paul's,
                              and is IMHO more readable than Paul's.
                              >
                              It's not as much O(N)... Paul Rubin's uses a sort first, which is
                              definitely not O(N). Paul's could be prettied up a bit, but the general
                              principle is sound.
                              """
                              from collections import defaultdict
                              def nonunique(lst):
                              d = defaultdict(int )
                              for x in lst: d[x] += 1
                              return [x for x,n in d.iterkeys() if n 1]
                              """

                              I see no sort here.
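The complexity claims traded in the thread can be checked directly. A minimal timing sketch comparing the original quadratic version with Raymond Hettinger's linear one (function names are mine; absolute timings are machine-dependent):

```python
import timeit

def quadratic(L):
    # Original approach: list.remove() scans the list on every call,
    # so the whole loop is O(n**2).
    L = list(L)
    for item in set(L):
        L.remove(item)
    return set(L)

def linear(L):
    # Raymond Hettinger's approach: constant-time set lookups, O(n) total.
    seen, dups = set(), set()
    for e in L:
        if e in seen:
            dups.add(e)
        else:
            seen.add(e)
    return dups

data = list(range(1000)) * 2          # every value occurs exactly twice
assert quadratic(data) == linear(data) == set(range(1000))

t_quad = timeit.timeit(lambda: quadratic(data), number=20)
t_lin = timeit.timeit(lambda: linear(data), number=20)
print("quadratic: %.4fs  linear: %.4fs" % (t_quad, t_lin))
```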
