Do you have real-world use cases for map's None fill-in feature?

  • Raymond Hettinger

    Do you have real-world use cases for map's None fill-in feature?

    I am evaluating a request for an alternate version of itertools.izip()
    that has a None fill-in feature like the built-in map function:

    >>> map(None, 'abc', '12345') # demonstrate map's None fill-in feature
    [('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')]

    The motivation is to provide a means for looping over all data elements
    when the input lengths are unequal. The question of the day is whether
    that is both a common need and a good approach to real-world problems.
    The answer to the question can likely be found in results from other
    programming languages or from real-world Python code that has used
    map's None fill-in feature.

    I scanned the docs for Haskell, SML, and Perl and found that the norm
    for map() and zip() is to truncate to the shortest input or raise an
    exception for unequal input lengths. I scanned the standard library
    and found no instances where map's fill-in feature was used. Likewise,
    I found no examples in all of the code I've ever written.

    The history of Python's current zip() function serves as another
    indicator that the proposal is weak. PEP 201 contemplated and rejected
    the idea as one that likely had unintended consequences. In the years
    since zip() was introduced in Py2.0, SourceForge has shown no requests
    for a fill-in version of zip().

    My request for readers of comp.lang.python is to search your own code
    to see if map's None fill-in feature was ever used in real-world code
    (not toy examples). I'm curious about the context, how it was used,
    and what alternatives were rejected (i.e. did the fill-in feature
    improve the code).

    Also, I'm curious as to whether someone has seen a zip fill-in feature
    employed to good effect in some other programming language, perhaps
    LISP or somesuch?

    Maybe a few real-world code examples and experiences from other
    languages will shed light on the question of whether lock-step
    iteration has meaning beyond the length of the shortest matching
    elements. If ordinal position were considered as a record key, then
    the proposal equates to a database-style outer join operation (where
    data elements with unmatched keys are included) and order is
    significant. Does an outer-join have anything to do with lock-step
    iteration? Is this a fundamental looping construct or just a
    theoretical wish-list item? IOW, does Python really need
    itertools.izip_longest() or would that just become a distracting piece
    of cruft?


    Raymond Hettinger


    P.S. FWIW, the OP's use case involved printing files in multiple
    columns:

    for f, g in itertools.izip_longest(file1, file2, fillin_value=''):
        print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())

    The alternative was straight-forward but not as terse:

    while 1:
        f = file1.readline()
        g = file2.readline()
        if not f and not g:
            break
        print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())
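
For readers on current Python: a fill-in variant along these lines did later land in the standard library as `itertools.izip_longest` (Python 2.6), renamed `itertools.zip_longest` in Python 3, with a `fillvalue` keyword rather than the `fillin_value` spelling used in the post. Below is a minimal sketch of the two-column use case, assuming Python 3. The helper name `two_columns` and the sample contents are hypothetical, and `io.StringIO` objects stand in for the two files.

```python
import io
from itertools import zip_longest

def two_columns(file1, file2, width=20):
    """Return lines pairing file1 and file2 side by side, padding the shorter one."""
    rows = []
    for f, g in zip_longest(file1, file2, fillvalue=''):
        # '%-*s' left-justifies each column to the requested width.
        rows.append('%-*s\t|\t%-*s' % (width, f.rstrip(), width, g.rstrip()))
    return rows

# Hypothetical sample data standing in for two real files.
left = io.StringIO("alpha\nbeta\ngamma\n")
right = io.StringIO("one\n")

for row in two_columns(left, right):
    print(row)
```

Using an empty string as the fill value keeps the `rstrip()` calls valid for the padded rows, which is the same reason the post passes `''` instead of relying on `None`.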

  • Paul Rubin

    #2
    Re: Do you have real-world use cases for map's None fill-in feature?

    "Raymond Hettinger" <python@rcn.com> writes:
    > I am evaluating a request for an alternate version of itertools.izip()
    > that has a None fill-in feature like the built-in map function:
    >
    > >>> map(None, 'abc', '12345') # demonstrate map's None fill-in feature

    I think finding different ways to write it was an entertaining
    exercise but it's too limited in usefulness to become a standard
    feature.

    I do think some idiom ought to develop to allow checking whether an
    iterator is empty, without consuming an item. Here's an idea:
    introduce something like

    iterator = check_empty(iterator)

    where check_empty would work roughly like (untested):

    def check_empty(iterator):
        iclass = iterator.__class__
        class buffered(iclass):
            def __init__(self):
                n = iter((self.next(),)) # might raise StopIteration
                self.__save = chain(n, self)
            def next(self):
                return self.__save.next()
            # all other operations are inherited from iclass

        return buffered(iterator)

    The idea is you get back a new iterator which yields the same stream
    and supports the same operations as the old one, if the old one is
    non-empty. Otherwise it raises StopIteration.

    There are some obvious problems with the above:

    1) the new iterator should support all of the old one's attributes,
    not just inherit its operations
    2) In the case where the old iterator is already buffered, the
    constructor should just peek at the lookahead instead of making
    a new object. That means that checking an iterator multiple times
    won't burn more and more memory.

    Maybe there is some way of doing the above with metaclasses but I've
    never been able to wrap my head around those.
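
The peek-without-consuming idea can also be sketched without subclassing. This is a minimal Python 3 sketch, not the code from the post: the hypothetical `check_empty` below pulls one item and stitches it back on with `itertools.chain`. The result is a plain iterator, so none of the original object's extra attributes survive (problem 1 in the list above still applies).

```python
from itertools import chain

def check_empty(iterable):
    """Return an iterator over the same items, or raise StopIteration if empty.

    Hypothetical helper: one item is consumed to test for emptiness and is
    then re-attached in front of the remaining items.
    """
    it = iter(iterable)
    first = next(it)            # raises StopIteration when there is nothing to yield
    return chain((first,), it)

stream = check_empty(x * x for x in range(4))
print(list(stream))

try:
    check_empty([])
except StopIteration:
    print("empty")
```

Each call wraps the stream in one more `chain` object, which is the memory concern raised in problem 2.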

    • Bengt Richter

      #3
      Re: Do you have real-world use cases for map's None fill-in feature?

      On 7 Jan 2006 23:19:41 -0800, "Raymond Hettinger" <python@rcn.com> wrote:
      >I am evaluating a request for an alternate version of itertools.izip()
      >that has a None fill-in feature like the built-in map function:
      >
      >>>> map(None, 'abc', '12345') # demonstrate map's None fill-in feature
      >[('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')]
      >
      I don't like not being able to supply my own sentinel. None is too common
      a value. Hm, ... <bf warning> unless maybe it can also be a type that we can instantiate with
      really-mean-it context level like None(5) ;-)
      >>> map(None(5), 'abc', '12345') # demonstrate map's None fill-in feature
      [('a', '1'), ('b', '2'), ('c', '3'), (None(5), '4'), (None(5), '5')]

      But seriously, standard sentinels for "missing data" and "end of data" might be nice to have,
      and to have produced in appropriate standard contexts. Singleton string subclass
      instances "NOD" and "EOD"? Doesn't fit with EOF=='' though.
      </bf warning>
      >The motivation is to provide a means for looping over all data elements
      >when the input lengths are unequal. The question of the day is whether
      >that is both a common need and a good approach to real-world problems.
      >The answer to the question can likely be found in results from other
      >programming languages or from real-world Python code that has used
      >map's None fill-in feature.
      >
      What about some semantics like my izip2 in
      http://groups.google.com/group/comp....1ddb1f46?hl=en

      (which doesn't even need a separate name, since it would be backwards compatible)

      Also, what about factoring sequence-related stuff into being methods or attributes
      of iter instances? And letting iter take multiple sequences or callable/sentinel pairs,
      which could be a substitute for izip and then some? Methods could be called via a returned
      iterator before or after the first .next() call, to control various features, such as
      sentinel testing by 'is' instead of '==' for callable/sentinel pairs, or buffering n
      steps of lookahead supported by a .peek(n) method defaulting to .peek(1), etc. etc.
      The point being to have a place to implement universal sequence stuff.
      >I scanned the docs for Haskell, SML, and Perl and found that the norm
      >for map() and zip() is to truncate to the shortest input or raise an
      >exception for unequal input lengths. I scanned the standard library
      >and found no instances where map's fill-in feature was used. Likewise,
      >I found no examples in all of the code I've ever written.
      >
      >The history of Python's current zip() function serves as another
      >indicator that the proposal is weak. PEP 201 contemplated and rejected
      >the idea as one that likely had unintended consequences. In the years
      >since zip() was introduced in Py2.0, SourceForge has shown no requests
      >for a fill-in version of zip().
      >
      >My request for readers of comp.lang.python is to search your own code
      >to see if map's None fill-in feature was ever used in real-world code
      >(not toy examples). I'm curious about the context, how it was used,
      >and what alternatives were rejected (i.e. did the fill-in feature
      >improve the code).
      >
      >Also, I'm curious as to whether someone has seen a zip fill-in feature
      >employed to good effect in some other programming language, perhaps
      >LISP or somesuch?
      ISTM in general there is a chicken-egg problem where workarounds are easy.
      I.e., the real question is how many workaround situations there are
      that would have been handled conveniently with a builtin feature,
      and _then_ to see whether the convenience would be worth enough.
      >
      >Maybe a few real-world code examples and experiences from other
      >languages will shed light on the question of whether lock-step
      >iteration has meaning beyond the length of the shortest matching
      >elements. If ordinal position were considered as a record key, then
      >the proposal equates to a database-style outer join operation (where
      >data elements with unmatched keys are included) and order is
      >significant. Does an outer-join have anything to do with lock-step
      >iteration? Is this a fundamental looping construct or just a
      >theoretical wish-list item? IOW, does Python really need
      >itertools.izip_longest() or would that just become a distracting piece
      >of cruft?
      Even if there is little use for continuing in correct code, IWT getting
      at the state of the iterator in an erroneous situation would be a benefit.
      Being able to see the result of the last attempt at gathering tuple elements
      could help. (I can see reasons for wanting variations of trying all streams
      vs shortcutting on the first to exhaust though).

      Regards,
      Bengt Richter

      • Raymond Hettinger

        #4
        Re: Do you have real-world use cases for map's None fill-in feature?

        [Bengt Richter]
        > What about some semantics like my izip2 in
        > http://groups.google.com/group/comp....1ddb1f46?hl=en
        >
        > (which doesn't even need a separate name, since it would be backwards compatible)
        >
        > Also, what about factoring sequence-related stuff into being methods or attributes
        > of iter instances? And letting iter take multiple sequences or callable/sentinel pairs,
        > which could be a substitute for izip and then some? Methods could be called via a returned
        > iterator before or after the first .next() call, to control various features, such as
        > sentinel testing by 'is' instead of '==' for callable/sentinel pairs, or buffering n
        > steps of lookahead supported by a .peek(n) method defaulting to .peek(1), etc. etc.
        > The point being to have a place to implement universal sequence stuff.

        ISTM, these cures are worse than the disease ;-)

        > Even if there is little use for continuing in correct code, IWT getting
        > at the state of the iterator in an erroneous situation would be a benefit.
        > Being able to see the result of the last attempt at gathering tuple elements
        > could help. (I can see reasons for wanting variations of trying all streams
        > vs shortcutting on the first to exhaust though).

        On the one hand, that seems reasonable. On the other hand, I can't see
        how to use it without snarling the surrounding code in which case it is
        probably better to explicitly manage individual iterators within a
        while loop.


        Raymond

        • Raymond Hettinger

          #5
          Re: Do you have real-world use cases for map's None fill-in feature?

          [Raymond Hettinger]
          > > I am evaluating a request for an alternate version of itertools.izip()
          > > that has a None fill-in feature like the built-in map function:
          > >
          > > >>> map(None, 'abc', '12345') # demonstrate map's None fill-in feature

          [Paul Rubin]
          > I think finding different ways to write it was an entertaining
          > exercise but it's too limited in usefulness to become a standard
          > feature.

          Makes sense.
          > I do think some idiom ought to develop to allow checking whether an
          > iterator is empty, without consuming an item. Here's an idea:
          > introduce something like
          >
          > iterator = check_empty(iterator)

          There are so many varieties of iterator that it's probably not workable
          to alter the iterator API for all of them. In any case, a broad
          API change like this would need its own PEP.

          > There are some obvious problems with the above:
          >
          > 1) the new iterator should support all of the old one's attributes,
          > not just inherit its operations
          > 2) In the case where the old iterator is already buffered, the
          > constructor should just peek at the lookahead instead of making
          > a new object. That means that checking an iterator multiple times
          > won't burn more and more memory.
          >
          > Maybe there is some way of doing the above with metaclasses but I've
          > never been able to wrap my head around those.

          Metaclasses are unlikely to be of help because there are so many,
          unrelated kinds of iterator -- most do not inherit from a common
          parent.


          Raymond

          • Paul Rubin

            #6
            Re: Do you have real-world use cases for map's None fill-in feature?

            "Raymond Hettinger" <python@rcn.com> writes:
            > > iterator = check_empty(iterator)
            >
            > There are so many varieties of iterator that it's probably not workable
            > to alter the iterator API for all of them. In any case, a broad
            > API change like this would need its own PEP.

            The hope was that it wouldn't be an API change, but rather just a new
            function dropped into the existing library, that could wrap any
            existing iterator without having to change or break anything that's
            already been written. Maybe the resulting iterator couldn't support
            every operation, or maybe it could have a __getattr__ that delegates
            everything except "next" to the wrapped iterator, or something. The
            obvious implementation methods that I can see are very kludgy but
            maybe something better is feasible. I defer to your knowledge about
            this.
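
The `__getattr__` delegation mentioned above can be sketched as follows. This is only an illustration in Python 3 syntax: the class name `Peekable` and its `is_empty` method are hypothetical, and only public attribute names are forwarded to the wrapped iterator.

```python
class Peekable:
    """Hypothetical wrapper: buffers one item and delegates other attributes."""

    _MISSING = object()   # sentinel meaning "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._head = self._MISSING

    def __iter__(self):
        return self

    def __next__(self):
        # Hand back the buffered item first, if there is one.
        if self._head is not self._MISSING:
            item, self._head = self._head, self._MISSING
            return item
        return next(self._it)

    def is_empty(self):
        """Report whether the stream is exhausted, without losing an item."""
        if self._head is self._MISSING:
            try:
                self._head = next(self._it)
            except StopIteration:
                return True
        return False

    def __getattr__(self, name):
        # Called only when normal lookup fails. Private names are not
        # forwarded, which also avoids recursion if _it is not yet set.
        if name.startswith('_'):
            raise AttributeError(name)
        return getattr(self._it, name)


def numbers():
    yield 1
    yield 2

p = Peekable(numbers())
print(p.is_empty())   # peeks without dropping an item
print(list(p))
p.close()             # forwarded to the wrapped generator via __getattr__
```

Because the wrapper is a new object rather than a subclass, it works for unrelated iterator types, at the cost of not passing `isinstance` checks against the original class.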

            • Tim Peters

              #7
              Re: Do you have real-world use cases for map's None fill-in feature?

              [Raymond Hettinger]
              > ...
              > I scanned the docs for Haskell, SML, and Perl and found that the norm
              > for map() and zip() is to truncate to the shortest input or raise an
              > exception for unequal input lengths.
              > ...
              > Also, I'm curious as to whether someone has seen a zip fill-in feature
              > employed to good effect in some other programming language, perhaps
              > LISP or somesuch?

              FYI, Common Lisp's `pairlis` function requires that its first two
              arguments be lists of the same length. It's a strain to compare to
              Python's zip() though, as the _intended_ use of `pairlis` is to add
              new pairs to a Lisp association list. For that reason, `pairlis`
              accepts an optional third argument; if present, this should be an
              association list, and pairs from zipping the first two arguments are
              prepended to it. Also for this reason, the _order_ in which pairs are
              taken from the first two arguments isn't defined(!).



              For its intended special-purpose use, it wouldn't make sense to allow
              arguments of different lengths.

              • Bengt Richter

                #8
                Re: Do you have real-world use cases for map's None fill-in feature?

                On 10 Jan 2006 00:47:36 -0800, "Raymond Hettinger" <python@rcn.com> wrote:
                >[Bengt Richter]
                >> What about some semantics like my izip2 in
                >> http://groups.google.com/group/comp....1ddb1f46?hl=en
                >>
                >> (which doesn't even need a separate name, since it would be backwards compatible)
                >>
                >> Also, what about factoring sequence-related stuff into being methods or attributes
                >> of iter instances? And letting iter take multiple sequences or callable/sentinel pairs,
                >> which could be a substitute for izip and then some? Methods could be called via a returned
                >> iterator before or after the first .next() call, to control various features, such as
                >> sentinel testing by 'is' instead of '==' for callable/sentinel pairs, or buffering n
                >> steps of lookahead supported by a .peek(n) method defaulting to .peek(1), etc. etc.
                >> The point being to have a place to implement universal sequence stuff.
                >
                >ISTM, these cures are worse than the disease ;-)
                Are you reacting to my turgidly rambling post, or to
                >>> from ut.izip2 import izip2 as izip
                >>> it = izip('abc','12','ABCD')
                >>> for t in it: print t
                ...
                ('a', '1', 'A')
                ('b', '2', 'B')

                Then after a backwards-compatible izip, if the iterator has
                been bound, it can be used to continue, with sentinel substitution:
                >>> for t in it.rest('<sentinel>'): print t
                ...
                ('c', '<sentinel>', 'C')
                ('<sentinel>', '<sentinel>', 'D')

                or optionally in sentinel substitution mode from the start:
                >>> for t in izip('abc','12','ABCD').rest('<sentinel>'): print t
                ...
                ('a', '1', 'A')
                ('b', '2', 'B')
                ('c', '<sentinel>', 'C')
                ('<sentinel>', '<sentinel>', 'D')

                Usage-wise, this seems not too diseased to me, so I guess I want to make sure
                this is what you were reacting to ;-)

                (Implementation was just to hack together a working demo. I'm sure it can be improved upon ;-)
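
The two-phase behaviour shown in the session above can be sketched as below. This is not the `ut.izip2` implementation referred to in the post, which is not reproduced in this thread. It is a minimal Python 3 sketch under the assumption that the matched phase tries every stream before stopping, so that items already fetched (such as `'c'`) are kept for `rest()`. The class name `TwoPhaseZip` is hypothetical.

```python
class TwoPhaseZip:
    """Hypothetical two-phase zip: truncating iteration, then .rest(fill)."""

    _MISSING = object()   # marks a stream that had no item for the current row

    def __init__(self, *iterables):
        self._its = [iter(i) for i in iterables]
        self._pending = None   # partially gathered row, kept for rest()

    def _gather(self):
        # Try every stream so that no already-fetched item is thrown away.
        return [next(it, self._MISSING) for it in self._its]

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending is not None:
            raise StopIteration            # matched phase already ended
        row = self._gather()
        if not row or any(item is self._MISSING for item in row):
            self._pending = row            # remember the partial row
            raise StopIteration
        return tuple(row)

    def rest(self, fill=None):
        """Yield the remaining rows, substituting fill for exhausted streams."""
        row = self._pending if self._pending is not None else self._gather()
        while any(item is not self._MISSING for item in row):
            yield tuple(fill if item is self._MISSING else item for item in row)
            row = self._gather()
        self._pending = row                # all streams are now exhausted


it = TwoPhaseZip('abc', '12', 'ABCD')
print(list(it))
print(list(it.rest('<sentinel>')))
```

When all inputs have the same length, the partial row contains no real items and `rest()` yields nothing.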
                >
                >> Even if there is little use for continuing in correct code, IWT getting
                >> at the state of the iterator in an erroneous situation would be a benefit.
                >> Being able to see the result of the last attempt at gathering tuple elements
                >> could help. (I can see reasons for wanting variations of trying all streams
                >> vs shortcutting on the first to exhaust though).
                >
                >On the one hand, that seems reasonable. On the other hand, I can't see
                >how to use it without snarling the surrounding code in which case it is
                >probably better to explicitly manage individual iterators within a
                >while loop.
                >
                The above would seem to allow separation of concerns. I.e., if you care why
                a normal iteration terminated, you can test after the fact. I.e., if all sequences
                were the same length, the .rest() iterator will be empty. And if you don't care at
                all about possible data, you can just try: it.rest().next() and catch StopIteration
                to check.

                BTW, is there any rule against passing information with StopIteration?

                Regards,
                Bengt Richter
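
On the closing question about StopIteration: in Python 3.3 and later, a generator's `return` value is attached to the `StopIteration` it raises, as the exception's `value` attribute. Below is a minimal sketch assuming Python 3.3+. `zip_reporting` is a hypothetical helper that reports the index of the input that ran out first.

```python
def zip_reporting(*iterables):
    """Truncating zip whose StopIteration carries the index of the stream that ended."""
    its = [iter(i) for i in iterables]
    while its:
        row = []
        for index, it in enumerate(its):
            try:
                row.append(next(it))
            except StopIteration:
                return index      # becomes StopIteration.value
        yield tuple(row)

gen = zip_reporting('abc', '12', 'ABCD')
rows = []
while True:
    try:
        rows.append(next(gen))
    except StopIteration as stop:
        exhausted = stop.value    # index of the exhausted input
        break

print(rows, exhausted)
```

A plain `for` loop discards this value, so the caller has to drive the generator with `next()` to see it.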

                • Szabolcs Nagy

                  #9
                  Re: Do you have real-world use cases for map's None fill-in feature?

                  > There are so many varieties of iterator that it's probably not workable
                  > to alter the iterator API for all of them.

                  i always wondered if it can be implemented:

                  there are iterators which have length:
                  >>> i = iter([1,2,3])
                  >>> len(i)
                  3

                  now isn't there a way to make this length inheritable?
                  eg. generators could have length in this case:
                  >>> g = (x for x in [1,2,3])
                  >>> # len(g) == len([1,2,3]) == 3

                  of course in special cases length would remain undefined:
                  >>> f = (x for x in [1,2,3] if x>2)
                  >>> # len(f) == ?

                  IMHO there are special cases when this is useful:
                  L=list(it)
                  here if it has length, then list creation can be more efficient
                  (required memory is known in advance)

                  nsz
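
Later Python versions added a protocol close to what is described here: `operator.length_hint()` (Python 3.4+) returns an estimated length, consulting `__len__` and then `__length_hint__`, and falls back to a default when neither is available. Below is a minimal sketch assuming Python 3.4+. `Countdown` is a hypothetical iterator that advertises how many items remain.

```python
from operator import length_hint

class Countdown:
    """Hypothetical iterator that advertises how many items remain."""

    def __init__(self, start):
        self._n = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._n <= 0:
            raise StopIteration
        self._n -= 1
        return self._n + 1

    def __length_hint__(self):
        # An estimate only; consumers must not rely on it being exact.
        return self._n

print(length_hint(iter([1, 2, 3])))        # list iterators provide a hint
print(length_hint(x for x in [1, 2, 3]))   # generators do not, so the default of 0 is returned
print(length_hint(Countdown(3)), list(Countdown(3)))
```

The hint is advisory, which sidesteps the filtered-generator case above where the true length cannot be known in advance.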

                    • Paul Rubin

                      #11
                      Re: Do you have real-world use cases for map's None fill-in feature?

                      "Szabolcs Nagy" <nszabolcs@gmail.com> writes:
                      > there are iterators which have length:
                      > >>> i = iter([1,2,3])
                      > >>> len(i)
                      > 3
                      >
                      > now isn't there a way to make this length inheritable?

                      I expect that's a __len__ method, which can be inherited.
                      > eg. generators could have length in this case:
                      > >>> g = (x for x in [1,2,3])
                      > >>> # len(g) == len([1,2,3]) == 3

                      I dunno what happens with this now.

                      • Fredrik Lundh

                        #12
                        Re: Do you have real-world use cases for map's None fill-in feature?

                        Szabolcs Nagy wrote:
                        > there are iterators which have length:
                        > >>> i = iter([1,2,3])
                        > >>> len(i)
                        > 3

                        that's a bug, which has been fixed in 2.5:

                        Python 2.5a0 (#5, Dec 14 2005, 22:28:52)
                        Type "help", "copyright", "credits" or "license" for more information.
                        >>> i = iter([1, 2, 3])
                        >>> len(i)
                        Traceback (most recent call last):
                          File "<stdin>", line 1, in <module>
                        TypeError: len() of unsized object

                        </F>



                        • Raymond Hettinger

                          #13
                          Re: Do you have real-world use cases for map's None fill-in feature?

                          [Raymond]
                          > >ISTM, these cures are worse than the disease ;-)

                          [Bengt]
                          > Are you reacting to my turgidly rambling post, or to
                          >
                          > >>> from ut.izip2 import izip2 as izip
                          > >>> it = izip('abc','12','ABCD')
                          > >>> for t in it: print t
                          > ...
                          > ('a', '1', 'A')
                          > ('b', '2', 'B')
                          >
                          > Then after a backwards-compatible izip, if the iterator has
                          > been bound, it can be used to continue, with sentinel substitution:
                          >
                          > >>> for t in it.rest('<sentinel>'): print t
                          > ...
                          > ('c', '<sentinel>', 'C')
                          > ('<sentinel>', '<sentinel>', 'D')
                          >
                          > or optionally in sentinel substitution mode from the start:
                          >
                          > >>> for t in izip('abc','12','ABCD').rest('<sentinel>'): print t
                          > ...
                          > ('a', '1', 'A')
                          > ('b', '2', 'B')
                          > ('c', '<sentinel>', 'C')
                          > ('<sentinel>', '<sentinel>', 'D')
                          >
                          > Usage-wise, this seems not too diseased to me, so I guess I want to make sure
                          > this is what you were reacting to ;-)

                          There is an elegance to the approach; however, if some sort of fill-in
                          were needed, I'm more inclined to add a separate function than to
                          append a method to the izip object. The latter API presents a bit of a
                          learning/memory challenge, not because it is hard, but because it is
                          atypical.

                          A unique advantage for your API is that the loop can be run in two
                          phases, matched and unmatched. The question then turns to whether there
                          is a need for that option. So far, the three threads on the subject
                          have shown us to be starved for use cases for a single phase
                          izip_longest function, much less a two-pass version of the same.



                          Raymond
