Re: Peek inside iterator (is there a PEP about this?)

  • Terry Reedy

    Luis Zarrabeitia wrote:
    > Hi there.
    >
    > For most use cases I think about, the iterator protocol is more than enough.
    > However, in a few cases, I've needed some ugly hacks.
    >
    > Ex 1:
    >
    > a = iter([1,2,3,4,5]) # assume you got the iterator from a function and
    > b = iter([1,2,3])     # these two are just examples.
    >
    > then,
    >
    > zip(a,b)
    >
    > has a different side effect from
    >
    > zip(b,a)
    >
    > After the execution, in the first case, iterator a contains just [5]; in the
    > second, it contains [4,5]. I think the second one is correct (the 5 was never
    > used, after all). I tried to implement my 'own' zip, but there is no way to
    > know the length of the iterator (obviously), and there is also no way
    > to 'rewind' a value after calling 'next'.

    Interesting observation. Iterators are intended for 'iterate through
    once and discard' usages. To zip a long sequence with several short
    sequences, either use itertools.chain (short sequences) or put the short
    sequences as the first zip arg.
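
    The side effect described above can be reproduced with a small sketch. This is an illustration in Python 3 (where zip is lazy, hence the list() calls), not code from the original post; the variable names are made up for the example.

```python
# zip() asks its arguments for items left to right and stops as soon as
# one of them is exhausted.
a = iter([1, 2, 3, 4, 5])
b = iter([1, 2, 3])
pairs_long_first = list(zip(a, b))   # a hands over the 4 before b runs out
left_long_first = list(a)            # so the 4 has been consumed and dropped

a = iter([1, 2, 3, 4, 5])
b = iter([1, 2, 3])
pairs_short_first = list(zip(b, a))  # b runs out before a is asked again
left_short_first = list(a)

print(left_long_first)   # [5]
print(left_short_first)  # [4, 5]
```

    Putting the shorter iterator first means zip never pulls an item it cannot pair.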

    > Ex 2:
    >
    > Will this iterator yield any value? Like with most iterables, a construct
    >
    > if iterator:
    >     # do something
    >
    > would be a very convenient thing to have, instead of wrapping a 'next' call in
    > a try...except and consuming the first item.

    To test without consuming, wrap the iterator in a trivial-to-write
    one_ahead or peek class such as has been posted before.
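
    Short of a full wrapper class, one way to test an iterator without losing its first item is to pull that item and chain it back in front. The helper below is a hypothetical Python 3 sketch (the name peek and its return shape are assumptions for illustration, not an existing API).

```python
from itertools import chain

_MISSING = object()  # sentinel, so that None or 0 can still be real items


def peek(iterator):
    """Return (has_item, first_item, replacement_iterator).

    The first item is pulled out and chained back in front, so the caller
    must keep using the returned iterator rather than the original one.
    """
    first = next(iterator, _MISSING)
    if first is _MISSING:
        return False, None, iter(())
    return True, first, chain([first], iterator)


has_item, first, it = peek(iter([10, 20, 30]))
print(has_item, first, list(it))  # True 10 [10, 20, 30]

has_item, first, it = peek(iter([]))
print(has_item, list(it))         # False []
```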

    > Ex 3:
    >
    > if any(iterator):
    >     # do something ... but the first true value was already consumed and
    >     # cannot be reused. "Any" cannot peek inside the iterator without
    >     # consuming the value.

    If you are going to do something with the true value, use a for loop and
    break. If you just want to peek inside, use a sequence (list(iterator) ).
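
    A minimal Python 3 sketch of the for-loop-and-break approach, with made-up sample data, might look like this:

```python
iterator = iter([0, "", 7, 8, 9])

first_true = None
for item in iterator:
    if item:
        first_true = item  # the true value is still in hand here
        break

print(first_true)  # 7
rest = list(iterator)
print(rest)        # [8, 9]
```

    Unlike any(), the loop keeps the first true value in a variable, and the iterator resumes right after it.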

    > Instead,
    >
    > i1, i2 = tee(iterator)
    > if any(i1):
    >     # do something with i2

    This effectively makes two partial lists and tosses one. That may or
    may not be a better idea.
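
    To make the buffering concrete, here is a small Python 3 sketch with illustrative data. Everything any() consumes from i1 is held in memory by tee until i2 catches up.

```python
from itertools import tee

source = iter([0, 0, 3, 4])
i1, i2 = tee(source)

items = []
if any(i1):           # advances i1 past 0, 0, 3
    items = list(i2)  # i2 still starts from the beginning

print(items)  # [0, 0, 3, 4]
```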

    > Question/Proposal:
    >
    > Has there been any PEP regarding the problem of 'peeking' inside an iterator?

    Iterators are not sequences and, in general, cannot be made to act like
    them. The iterator protocol is a bare-minimum, least-common-denominator
    requirement for inter-operability. You can, of course, add methods to
    iterators that you write for the cases where one-ahead or random access
    *is* possible.

    > Knowing if the iteration will end or not, and/or accessing the next value,
    > without consuming it? Is there any (simple, elegant) way around it?

    That much is trivial. As suggested above, write a wrapper with the
    exact behavior you want. A sample (untested)

    class one_ahead():
        "Self.peek is the next item or undefined"
        def __init__(self, iterator):
            try:
                self.peek = next(iterator)
                self._it = iterator
            except StopIteration:
                pass
        def __bool__(self):
            return hasattr(self, 'peek')
        def __next__(self): # 3.0, 2.6?
            try:
                # not named 'next', which would shadow the builtin used below
                item = self.peek
                try:
                    self.peek = next(self._it)
                except StopIteration:
                    del self.peek
                return item
            except AttributeError:
                raise StopIteration

    Terry Jan Reedy

  • Aaron "Castironpi" Brady

    #2

    Terry's is close. For 2.5, use '__nonzero__' instead of '__bool__', add the
    missing '__iter__', name the method 'next' rather than '__next__', and call
    'self._it.next( )' rather than 'next(self._it)'.

    Then just define your own 'peekzip'. Short:

    def peekzip( *itrs ):
        while 1:
            if not all( itrs ):
                raise StopIteration
            yield tuple( [ itr.next( ) for itr in itrs ] )

    In some cases, you could require 'one_ahead' instances in peekzip, or
    create them yourself in new iterators.

    Here is your output: The first part uses zip, the second uses peekzip.

    [(1, 1), (2, 2), (3, 3)]
    5
    [(1, 1), (2, 2), (3, 3)]
    4

    4 is what you expect.

    Here's the full code.

    class one_ahead(object):
        "Self.peek is the next item or undefined"
        def __init__(self, iterator):
            try:
                self.peek = iterator.next( )
                self._it = iterator
            except StopIteration:
                pass
        def __nonzero__(self):
            return hasattr(self, 'peek')
        def __iter__(self):
            return self
        def next(self): # 2.x spelling; __next__ in 3.0
            try:
                next = self.peek
                try:
                    self.peek = self._it.next( )
                except StopIteration:
                    del self.peek
                return next
            except AttributeError:
                raise StopIteration


    a= one_ahead( iter( [1,2,3,4,5] ) )
    b= one_ahead( iter( [1,2,3] ) )
    print zip( a,b )
    print a.next()

    def peekzip( *itrs ):
        while 1:
            if not all( itrs ):
                raise StopIteration
            yield tuple( [ itr.next( ) for itr in itrs ] )

    a= one_ahead( iter( [1,2,3,4,5] ) )
    b= one_ahead( iter( [1,2,3] ) )
    print list( peekzip( a,b ) )
    print a.next()
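
    For readers on Python 3, the same idea could be ported roughly as follows. This is a sketch, not the code from this thread: the class name OneAhead and the helper _advance are illustrative, and the generator ends by falling out of its loop because, since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is turned into RuntimeError.

```python
class OneAhead:
    """Python 3 port sketch: self.peek is the next item, or is undefined."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._advance()

    def _advance(self):
        # Load the next item into self.peek, or remove the attribute at the end.
        try:
            self.peek = next(self._it)
        except StopIteration:
            if hasattr(self, "peek"):
                del self.peek

    def __bool__(self):
        return hasattr(self, "peek")

    def __iter__(self):
        return self

    def __next__(self):
        if not self:
            raise StopIteration
        item = self.peek
        self._advance()
        return item


def peekzip(*itrs):
    # Check every input before pulling from any of them, so nothing is
    # consumed and then dropped. The 'itrs and' guard makes peekzip()
    # with no arguments yield nothing, as zip() does.
    while itrs and all(itrs):
        yield tuple([next(itr) for itr in itrs])


a = OneAhead([1, 2, 3, 4, 5])
b = OneAhead([1, 2, 3])
zipped = list(zip(a, b))
after_zip = next(a)
print(zipped, after_zip)          # [(1, 1), (2, 2), (3, 3)] 5

a = OneAhead([1, 2, 3, 4, 5])
b = OneAhead([1, 2, 3])
peekzipped = list(peekzip(a, b))
after_peekzip = next(a)
print(peekzipped, after_peekzip)  # [(1, 1), (2, 2), (3, 3)] 4
```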

    There's one more option, which is to create your own 'push-backable'
    class, which accepts a 'previous( item )' message.

    (Unproduced)
    >>> a= push_backing( iter( [1,2,3,4,5] ) )
    >>> a.next( )
    1
    >>> a.next( )
    2
    >>> a.previous( 2 )
    >>> a.next( )
    2
    >>> a.next( )
    3
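
    Since the session above was not actually run, here is one way such a class might be written. It is a hypothetical Python 3 sketch (so it uses next(a) rather than a.next( )); the name PushBacking and the internal list are assumptions made for illustration.

```python
class PushBacking:
    """Sketch of a push-back iterator: previous(item) un-reads one item."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pushed = []  # items handed back, most recent last

    def __iter__(self):
        return self

    def __next__(self):
        # Serve handed-back items first, newest first, then the source.
        if self._pushed:
            return self._pushed.pop()
        return next(self._it)

    def previous(self, item):
        self._pushed.append(item)


a = PushBacking([1, 2, 3, 4, 5])
first, second = next(a), next(a)  # 1, 2
a.previous(second)                # hand the 2 back
again, third = next(a), next(a)
print(again, third)               # 2 3
```

    With this, a custom zip can hand back an item it pulled but could not pair.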


    • Steven D'Aprano

      #3

      On Wed, 01 Oct 2008 16:14:09 -0400, Terry Reedy wrote:
      > Iterators are intended for 'iterate through once and discard' usages.

      Also for reading files, which are often seekable.

      I don't disagree with the rest of your post; I just thought I'd make the
      observation that if the data you are iterating over supports random
      access, it's possible to write an iterator that also supports random
      access.
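
      As a rough illustration of that observation, an iterator over an indexable sequence can expose peeking and seeking because the underlying data allows it. The class below is a hypothetical Python 3 sketch; the names SeekableIter, peek and seek are assumptions, not an existing API.

```python
class SeekableIter:
    """Hypothetical iterator over an indexable sequence, with peek and seek."""

    def __init__(self, seq):
        self._seq = seq
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._seq):
            raise StopIteration
        item = self._seq[self._pos]
        self._pos += 1
        return item

    def peek(self):
        # Look at the next item without advancing; IndexError at the end.
        return self._seq[self._pos]

    def seek(self, pos):
        # Jump to an absolute position, much like seek() on a file.
        self._pos = pos


it = SeekableIter("abcde")
head = (next(it), next(it))
upcoming = it.peek()
it.seek(0)
everything = list(it)
print(head, upcoming, everything)  # ('a', 'b') c ['a', 'b', 'c', 'd', 'e']
```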

      --
      Steven
