Iterator length

  • bearophileHUGS@lycos.com

    Iterator length

    Often I need to tell the len of an iterator, this is a stupid example:
    >>> l = (i for i in xrange(100) if i&1)
    len isn't able to tell it:
    >>> len(l)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'generator' has no len()

    This is a bad solution, it may need too much memory, etc:
    >>> len(list(l))
    This is a simple solution in a modern Python:
    >>> sum(1 for _ in l)
    50

    This is a faster solution (and Psyco helps even more):

    def leniter(iterator):
        """leniter(iterator): return the length of an iterator,
        consuming it."""
        if hasattr(iterator, "__len__"):
            return len(iterator)
        nelements = 0
        for _ in iterator:
            nelements += 1
        return nelements
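For readers on Python 3, the same idea can be sketched as below. This is only an illustration, not part of the original post: the name `count_items` is made up here, and it tries len() first and falls back to counting instead of checking for `__len__` by hand.

```python
def count_items(iterable):
    """Return the number of items in iterable, consuming it if it is an iterator."""
    try:
        # Sized containers (list, str, dict, ...) answer without iterating.
        return len(iterable)
    except TypeError:
        # No __len__: fall back to counting one item at a time.
        return sum(1 for _ in iterable)


odd = (i for i in range(100) if i & 1)
print(count_items(odd))        # 50
print(count_items([1, 2, 3]))  # 3
```

As with leniter(), a generator passed to this function is exhausted afterwards.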

    Is it a good idea to extend the functionalities of the built-in len
    function to cover such situation too?

    Bye,
    bearophile

  • George Sakkis

    #2
    Re: Iterator length

    bearophileHUGS@lycos.com wrote:
    > Often I need to tell the len of an iterator, this is a stupid example:
    >
    > >>> l = (i for i in xrange(100) if i&1)
    >
    > len isn't able to tell it:
    >
    > >>> len(l)
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    > TypeError: object of type 'generator' has no len()
    >
    > This is a bad solution, it may need too much memory, etc:
    >
    > >>> len(list(l))
    >
    > This is a simple solution in a modern Python:
    >
    > >>> sum(1 for _ in l)
    > 50
    >
    > This is a faster solution (and Psyco helps even more):
    >
    > def leniter(iterator):
    >     """leniter(iterator): return the length of an iterator,
    >     consuming it."""
    >     if hasattr(iterator, "__len__"):
    >         return len(iterator)
    >     nelements = 0
    >     for _ in iterator:
    >         nelements += 1
    >     return nelements
    >
    > Is it a good idea to extend the functionalities of the built-in len
    > function to cover such situation too?
    >
    > Bye,
    > bearophile

    Is this a rhetorical question? If not, try this:
    >>> x = (i for i in xrange(100) if i&1)
    >>> if leniter(x): print x.next()
    George


    • bearophileHUGS@lycos.com

      #3
      Re: Iterator length

      George Sakkis:
      > Is this a rhetorical question? If not, try this:

      It wasn't a rhetorical question.

      > >>> x = (i for i in xrange(100) if i&1)
      > >>> if leniter(x): print x.next()

      What's your point? Maybe you mean that it consumes the given iterator?
      I am aware of that, it's written in the function docstring too. But
      sometimes you don't need the elements of a given iterator, you just
      need to know how many elements it has. A very simple example:

      s = "aaabbbbbaabbbbbb"
      from itertools import groupby
      print [(h,leniter(g)) for h,g in groupby(s)]
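A Python 3 sketch of the same run-length idea, for anyone who wants to try it today. It counts each group with sum() so that no intermediate list is built. The variable name `runs` is only illustrative.

```python
from itertools import groupby

s = "aaabbbbbaabbbbbb"

# Each `group` is an iterator over one run of equal characters;
# counting it consumes it, which is fine because only the size is needed.
runs = [(ch, sum(1 for _ in group)) for ch, group in groupby(s)]
print(runs)  # [('a', 3), ('b', 5), ('a', 2), ('b', 6)]
```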

      Bye,
      bearophile


      • Ben Finney

        #4
        Re: Iterator length

        bearophileHUGS@lycos.com writes:
        > But sometimes you don't need the elements of a given iterator, you
        > just need to know how many elements it has.

        AFAIK, the iterator protocol doesn't allow for that.

        Bear in mind, too, that there's no way to tell from outside that an
        iterator even has a finite length; also, many finite-length iterators
        have termination conditions that preclude knowing the number of
        iterations until the termination condition actually happens.
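One way to live with that, sketched here in Python 3 under the assumption that the caller can name an upper bound worth waiting for: count at most `limit + 1` items and give up beyond that. The helper `bounded_len` is hypothetical, not something from the standard library.

```python
from itertools import count, islice


def bounded_len(iterator, limit):
    """Count items, returning None once more than `limit` have been seen."""
    # islice stops after limit + 1 items, so an endless iterator cannot hang us.
    n = sum(1 for _ in islice(iterator, limit + 1))
    return n if n <= limit else None


print(bounded_len(iter("abc"), 10))  # 3
print(bounded_len(count(), 10))      # None, because count() never ends
```

The iterator is still consumed (up to `limit + 1` items), so this answers "how many?" only when the items themselves are not needed afterwards.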

        --
        \ "When a well-packaged web of lies has been sold to the masses |
        `\ over generations, the truth will seem utterly preposterous and |
        _o__) its speaker a raving lunatic." -- Dresden James |
        Ben Finney


        • Gabriel Genellina

          #5
          Re: Iterator length

          At Thursday 18/1/2007 20:26, bearophileHUGS@lycos.com wrote:
          > def leniter(iterator):
          >     """leniter(iterator): return the length of an iterator,
          >     consuming it."""
          >     if hasattr(iterator, "__len__"):
          >         return len(iterator)
          >     nelements = 0
          >     for _ in iterator:
          >         nelements += 1
          >     return nelements
          >
          > Is it a good idea to extend the functionalities of the built-in len
          > function to cover such situation too?

          I don't think so, because it may consume the iterator, and that's a
          big side effect that one would not expect from builtin len().
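The side effect is easy to see in a short Python 3 session. This is just a demonstration of the behaviour being discussed; the variable names are arbitrary.

```python
gen = (i for i in range(5))

n = sum(1 for _ in gen)  # counting walks the generator to its end
leftover = list(gen)     # nothing remains afterwards

print(n, leftover)       # 5 []
```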


          --
          Gabriel Genellina
          Softlab SRL









          • Steven D'Aprano

            #6
            Re: Iterator length

            On Thu, 18 Jan 2007 16:55:39 -0800, bearophileHUGS wrote:
            > What's your point? Maybe you mean that it consumes the given iterator?
            > I am aware of that, it's written in the function docstring too. But
            > sometimes you don't need the elements of a given iterator, you just
            > need to know how many elements it has. A very simple example:
            >
            > s = "aaabbbbbaabbbbbb"
            > from itertools import groupby
            > print [(h,leniter(g)) for h,g in groupby(s)]

            s isn't an iterator. It's a sequence, a string, and an iterable, but not
            an iterator.

            I hope you know what sequences and strings are :-)

            An iterable is anything that can be iterated over -- it includes sequences
            and iterators.

            An iterator, on the other hand, is something with the iterator protocol,
            that is, it has a next() method and raises StopIteration when it's done.
            >>> s = "aaabbbbbaabbbbbb"
            >>> s.next()
            Traceback (most recent call last):
              File "<stdin>", line 1, in ?
            AttributeError: 'str' object has no attribute 'next'

            An iterator should return itself if you pass it to iter():
            >>> iter(s) is s
            False
            >>> it = iter(s); iter(it) is it
            True
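The same distinction can be checked in Python 3, where the method is spelled `__next__` instead of next(). A small sketch using the abstract base classes from collections.abc:

```python
from collections.abc import Iterable, Iterator

s = "aaabbbbbaabbbbbb"
it = iter(s)

# A string can be iterated over, but it is not itself an iterator.
print(isinstance(s, Iterable), isinstance(s, Iterator))    # True False
# The object returned by iter() is both.
print(isinstance(it, Iterable), isinstance(it, Iterator))  # True True
# An iterator returns itself from iter().
print(iter(it) is it)                                      # True
print(hasattr(s, "__next__"), hasattr(it, "__next__"))     # False True
```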

            You've said that you understand len of an iterator will consume the
            iterator, and that you don't think that matters. It might not matter in
            a tiny percentage of cases, but it will certainly matter all the rest
            of the time!

            And let's not forget, in general you CAN'T calculate the length of an
            iterator, not even in theory:

            import random

            def randnums():
                while random.random() != 0.123456789:
                    yield "Not finished yet"
                yield "Finished"

            What should the length of randnums() return?

            One last thing which people forget... iterators can have a length, the
            same as any other object, if they have a __len__ method:
            >>> s = "aaabbbbbaabbbbbb"
            >>> it = iter(s)
            >>> len(it)
            16

            So, if you want the length of an arbitrary iterator, just call len()
            and deal with the exception.
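A note for readers on current interpreters: the session above reflects the Python of the time. On Python 3, len() on a string iterator raises TypeError, and the closest equivalent is operator.length_hint(), which returns an estimate when the iterator offers one. A minimal sketch (the variable names are arbitrary):

```python
from operator import length_hint

s = "aaabbbbbaabbbbbb"
it = iter(s)

try:
    len(it)
    has_len = True
except TypeError:
    has_len = False

print(has_len)               # False on Python 3
print(length_hint(it))       # 16, but only a hint, not a guarantee

gen = (c for c in s)
print(length_hint(gen, -1))  # -1: generators offer no hint, so the default is returned
```

The hint does not consume the iterator, but it is only advisory, so it cannot replace an actual count where correctness matters.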



            --
            Steven.


            • bearophileHUGS@lycos.com

              #7
              Re: Iterator length

              Steven D'Aprano:
              > > s = "aaabbbbbaabbbbbb"
              > > from itertools import groupby
              > > print [(h,leniter(g)) for h,g in groupby(s)]
              >
              > s isn't an iterator. It's a sequence, a string, and an iterable, but not
              > an iterator.

              If you look better you can see that I use the leniter() on g, not on s.
              g is the iterator I need to compute the len of.

              > I hope you know what sequences and strings are :-)

              Well, I know little still about the C implementation of CPython
              iterators :-)

              But I agree with the successive things you say, iterators may be very
              general things, and there are too many drawbacks/dangers, so it's
              better to keep leniter() as a function separated from len(), with
              specialized use.

              Bye and thank you,
              bearophile


              • Steven D'Aprano

                #8
                Re: Iterator length

                On Fri, 19 Jan 2007 05:04:01 -0800, bearophileHUGS wrote:
                > Steven D'Aprano:
                > > > s = "aaabbbbbaabbbbbb"
                > > > from itertools import groupby
                > > > print [(h,leniter(g)) for h,g in groupby(s)]
                > >
                > > s isn't an iterator. It's a sequence, a string, and an iterable, but not
                > > an iterator.
                >
                > If you look better you can see that I use the leniter() on g, not on s.
                > g is the iterator I need to compute the len of.

                Oops, yes you're right. But since g is not an arbitrary iterator, one can
                easily do this:

                print [(h,len(list(g))) for h,g in groupby(s)]

                No need for a special function.


                > > I hope you know what sequences and strings are :-)
                >
                > Well, I know little still about the C implementation of CPython
                > iterators :-)
                >
                > But I agree with the successive things you say, iterators may be very
                > general things, and there are too many drawbacks/dangers, so it's
                > better to keep leniter() as a function separated from len(), with
                > specialized use.

                I don't think it's better to have leniter() at all. If you, the iterator
                creator, know enough about the iterator to be sure it has a predictable
                length, you know how to calculate it. Otherwise, iterators in general
                don't have a predictable length even in principle.



                --
                Steven.


                • bearophileHUGS@lycos.com

                  #9
                  Re: Iterator length

                  Steven D'Aprano:
                  > since g is not an arbitrary iterator, one can easily do this:
                  > print [(h,len(list(g))) for h,g in groupby(s)]
                  > No need for a special function.

                  If you look at my first post you can see that I have shown that
                  solution too, but it creates a list that may be long and may use a
                  lot of memory, and then throws it away each time. I think that's a
                  bad solution. It also goes against the philosophy of iterators, which
                  were created to avoid managing real lists of items.
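For what it's worth, counting without building a list can also be pushed into C-level code in Python 3. The sketch below pairs the iterable with a counter and throws every pair away through a zero-length deque. The name `ilen` is only a label chosen here, and whether this is actually faster than a plain loop would need measuring.

```python
from collections import deque
from itertools import count, groupby


def ilen(iterable):
    """Count the items of an iterable in constant memory, consuming it."""
    counter = count()
    # zip pulls from `iterable` first, so `counter` advances once per item;
    # deque(maxlen=0) discards every pair as soon as it arrives.
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)


print([(h, ilen(g)) for h, g in groupby("aaabbbbbaabbbbbb")])
# [('a', 3), ('b', 5), ('a', 2), ('b', 6)]
```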

                  > If you, the iterator
                  > creator, know enough about the iterator to be sure it has a predictable
                  > length, you know how to calculate it.

                  I don't agree. Sometimes I know I have a finite iterator, but I may
                  not know how many elements it yields (and sometimes there may be a lot).
                  See the simple example with the groupby.

                  Bye,
                  bearophile
