Real-world use cases for map's None fill-in feature?

  • Raymond Hettinger

    Real-world use cases for map's None fill-in feature?

    Proposal
    --------
    I am gathering data to evaluate a request for an alternate version of
    itertools.izip() with a None fill-in feature like that for the built-in
    map() function:

    >>> map(None, 'abc', '12345') # demonstrate map's None fill-in feature
    [('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')]
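
A note for readers on current Python: in Python 3, map() stops at the shortest input and no longer accepts None as the function, so the session above cannot be reproduced as written. A minimal sketch of the same padding behaviour, assuming Python 3 and the itertools.zip_longest function that entered the standard library after this thread:

```python
from itertools import zip_longest

# zip_longest pads the shorter input; the fill value defaults to None,
# which mirrors what Python 2's map(None, ...) produced.
pairs = list(zip_longest('abc', '12345'))
print(pairs)
# [('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')]
```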

    The motivation is to provide a means for looping over all data elements
    when the input lengths are unequal. The question of the day is whether
    that is both a common need and a good approach to real-world problems.
    The answer can likely be found in results from other programming
    languages and from surveying real-world Python code.

    Other languages
    ---------------
    I scanned the docs for Haskell, SML, and Perl6's yen operator and found
    that the norm for map() and zip() is to truncate to the shortest input
    or raise an exception for unequal input lengths. Ruby takes the
    opposite approach and fills in nil values -- the reasoning behind the
    design choice is somewhat inscrutable:
    http://blade.nagaokaut.ac.jp/cgi-bin...ruby-dev/18651

    Real-world code
    ---------------
    I scanned the standard library, my own code, and a few third-party
    tools. I found no instances where map's fill-in feature was used.

    History of zip()
    ----------------
    PEP 201 (lock-step iteration) documents that a fill-in feature was
    contemplated and rejected for the zip() built-in introduced in Py2.0.
    In the years before and after, SourceForge logs show no requests for a
    fill-in feature.

    Request for more information
    ----------------------------
    My request for readers of comp.lang.python is to search your own code
    to see if map's None fill-in feature was ever used in real-world code
    (not toy examples). I'm curious about the context, how it was used,
    and what alternatives were rejected (i.e. did the fill-in feature
    improve the code). Likewise, I'm curious as to whether anyone has seen
    a zip-style fill-in feature employed to good effect in some other
    programming language.

    Parallel to SQL?
    ----------------
    If an iterator element's ordinal position were considered as a record
    key, then the proposal equates to a database-style full outer join
    operation (one which includes unmatched keys in the result) where record
    order is significant. Does an outer-join have anything to do with
    lock-step iteration? Is this a fundamental looping construct or just a
    theoretical wish-list item? Does Python need itertools.izip_longest()
    or would it just become a distracting piece of cruft?
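
The outer-join reading above can be made concrete with a small sketch. It is written for Python 3, and the helper name outer_join_by_position is hypothetical (it comes from neither the thread nor the standard library): each element's position becomes its key, and the result keeps every key found in either input.

```python
def outer_join_by_position(left, right, fill=None):
    """Full outer join of two sequences, keyed on ordinal position."""
    left_rows = dict(enumerate(left))     # position -> element
    right_rows = dict(enumerate(right))
    # Union of keys: positions present in either input, in order.
    keys = sorted(left_rows.keys() | right_rows.keys())
    return [(left_rows.get(k, fill), right_rows.get(k, fill)) for k in keys]

joined = outer_join_by_position('abc', '12345')
print(joined)
```

With these inputs the result matches the map(None, ...) output shown in the proposal, which is the sense in which the fill-in feature behaves like an outer join on position.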



    Raymond Hettinger


    FWIW, the OP's use case involved printing files in multiple
    columns:

    for f, g in itertools.izip_longest(file1, file2, fillin_value=''):
        print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())

    The alternative was straightforward but less terse:

    while 1:
        f = file1.readline()
        g = file2.readline()
        if not f and not g:
            break
        print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())
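
For comparison, here is how the two-column use case might look on Python 3, where the function eventually shipped as itertools.zip_longest with a fillvalue keyword rather than the fillin_value spelling used above. The generator name two_columns and the sample data are illustrative assumptions.

```python
from itertools import zip_longest

def two_columns(lines1, lines2, width=20):
    """Yield side-by-side rows, padding the shorter input with ''."""
    for left, right in zip_longest(lines1, lines2, fillvalue=''):
        # '%-*s' takes the column width from the argument tuple.
        yield '%-*s\t|\t%-*s' % (width, left.rstrip(), width, right.rstrip())

rows = list(two_columns(['a\n', 'b\n', 'c\n'], ['1\n']))
for row in rows:
    print(row)
```

Any iterable of lines works here, including open file objects, since zip_longest only needs to iterate over its arguments.
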
  • Alex Martelli

    #2
    Re: Real-world use cases for map's None fill-in feature?

    Raymond Hettinger <python@rcn.com> wrote:
    ...
    > Request for more information
    > ----------------------------
    > My request for readers of comp.lang.python is to search your own code
    > to see if map's None fill-in feature was ever used in real-world code
    > (not toy examples). I'm curious about the context, how it was used,
    > and what alternatives were rejected (i.e. did the fill-in feature

    I had (years ago, version was 1.5.2) one real-world case of map(max,
    seq1, seq2). The sequences represented alternate scores for various
    features, using None to mean "the score for this feature cannot be
    computed by the algorithm used to produce this sequence", and it was
    common to have one sequence longer (using a later-developed algorithm
    that computed more features). This use may have been an abuse of my
    observation that max(None, N) and max(N, None) were always N on the
    platform I was using at the time. I was relatively new at Python, and
    in retrospect I feel I might have been going for "use all the new toys
    we've just gotten" -- looping on feature index to compute the scores,
    and explicitly testing for None, might have been a better approach than
    building those lists (with seq1=map(scorer1, range(N)), btw) and then
    running map on them, anyway. At any rate, I later migrated to a lazily
    computed version, don't recall the exact details but it was something
    like (in today's Python):

    class LazyMergedList(object):
        def __init__(self, *fs):
            self.fs = fs        # scoring functions, one per algorithm
            self.known = {}     # cache of scores already computed
        def __getitem__(self, n):
            try: return self.known[n]
            except KeyError: pass
            result = self.known[n] = max(f(n) for f in self.fs)
            return result

    when it turned out that in most cases the downstream code wasn't
    actually using all the features (just a small subset in each case), so
    computing all of them ahead of time was a waste of cycles.

    I don't recall ever relying on map's None-filling feature in other
    real-world cases, and, as I mentioned, even here the reliance was rather
    doubtful. OTOH, if I had easily been able to specify a different
    filler, I _would_ have been able to use it a couple of times.
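
As a hedged illustration of the "explicitly testing for None" alternative mentioned above: on Python 3, max(None, N) raises TypeError, so the quirk the original map(max, seq1, seq2) relied on is no longer available. The function name best_scores and the sample scores below are hypothetical.

```python
from itertools import zip_longest

def best_scores(*score_lists):
    """Per-feature maximum, treating None or a missing entry as 'no score'."""
    merged = []
    for scores in zip_longest(*score_lists):   # pads short lists with None
        known = [s for s in scores if s is not None]
        merged.append(max(known) if known else None)
    return merged

print(best_scores([1, None, 5], [3, 2, None, 7]))
# [3, 2, 5, 7]
```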


    Alex


    • Anders Hammarquist

      #3
      Re: Real-world use cases for map's None fill-in feature?

      In article <mailman.194.1136781640.27775.python-list@python.org>,
      Raymond Hettinger <python@rcn.com> wrote:
      >Request for more information
      >----------------------------
      >My request for readers of comp.lang.python is to search your own code
      >to see if map's None fill-in feature was ever used in real-world code
      >(not toy examples).

      I had a quick look through our (Strakt's) codebase and found one example.

      The code is used to process user-designed macros, where the user wants
      to append data to strings stored in the system. Note that all data is
      stored as lists of whatever the relevant data type is.

      While I didn't write this bit of code (so I can't say what, if any,
      alternatives were considered), it does seem to me the most straight-
      forward way to do it. Being able to say what the fill-in value should
      be would make the code even simpler.

      oldAttrVal is the original stored data, and attrValue is what the macro
      wants to append.

      --->8---
      newAttrVal = []
      for x, y in map(None, oldAttrVal, attrValue):
          newAttrVal.append(u''.join((x or '', y or '')))
      --->8---

      /Anders

      --
      -- Of course I'm crazy, but that doesn't mean I'm wrong.
      Anders Hammarquist | iko@cd.chalmers.se
      Physics student, Chalmers University of Technology, | Hem: +46 31 88 48 50
      Göteborg, Sweden. RADIO: SM6XMM and N2JGL | Mob: +46 707 27 86 87
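
The remark above that a configurable fill value would simplify the loop can be sketched as follows, assuming Python 3 and itertools.zip_longest. The function name append_values is hypothetical. Unlike the original `x or ''` idiom, this version does not convert None values that are already stored in either list, so it is only equivalent when both lists hold strings.

```python
from itertools import zip_longest

def append_values(old_values, new_values):
    """Concatenate strings position by position, padding the shorter list with ''."""
    return [x + y for x, y in zip_longest(old_values, new_values, fillvalue='')]

print(append_values(['a', 'b'], ['1', '2', '3']))
# ['a1', 'b2', '3']
```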


      • Raymond Hettinger

        #4
        Re: Real-world use cases for map's None fill-in feature?

        [Alex Martelli]
        > I had (years ago, version was 1.5.2) one real-world case of map(max,
        > seq1, seq2). The sequences represented alternate scores for various
        > features, using None to mean "the score for this feature cannot be
        > computed by the algorithm used to produce this sequence", and it was
        > common to have one sequence longer (using a later-developed algorithm
        > that computed more features). This use may have been an abuse of my
        > observation that max(None, N) and max(N, None) were always N on the
        > platform I was using at the time.

        Analysis
        --------

        That particular dataset has three unique aspects allowing the map(max,
        s1, s2, s3) approach to work at all.

        1) Fortuitous alignment in various meanings of None:
           - the input sequence using it to mean "feature cannot be computed"
           - the auto-fillin of None meaning "feature used in later
             algorithms, but not earlier ones"
           - the implementation quirk where max(None, n) == max(n, None) == n

        2) Use of a reduction function like max() which does not care about the
        order of inputs (i.e. the output sequence does not indicate which
        algorithm produced the best score).

        3) Later-developed sequences had to be created with the knowledge of
        the features used by all earlier sequences (lest two of the sequences
        get extended with different features corresponding to the same ordinal
        position).

        Getting around the latter limitation suggests using a mapping
        (feature->score) rather than tracking scores by ordinal position (with
        position corresponding to a particular feature):

        bestscore = {}
        for d in d1, d2, d3:
            for feature, score in d.iteritems():
                bestscore[feature] = max(bestscore.get(feature, 0), score)

        Such an approach also gets around dependence on the other two unique
        aspects of the dataset. With dict.get() any object can be specified as
        a default value (with zero being a better choice for a null input to
        max()). Also, the pattern is not limited to commutative reduction
        functions like max(); instead, it would work just as well with a
        result.setdefault(feature, []).append(score) style accumulation of all
        results or with other combining/analysis functions.
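
Both dictionary patterns described above, the dict.get() reduction and the setdefault() accumulation, can be sketched as runnable Python 3 (items() replaces iteritems()). The function names and sample data are hypothetical, and the default of 0 assumes scores are never negative.

```python
def best_by_feature(*score_dicts):
    """Keep the highest score seen for each feature."""
    best = {}
    for d in score_dicts:
        for feature, score in d.items():
            best[feature] = max(best.get(feature, 0), score)
    return best

def all_by_feature(*score_dicts):
    """Collect every score for each feature, in input order."""
    collected = {}
    for d in score_dicts:
        for feature, score in d.items():
            collected.setdefault(feature, []).append(score)
    return collected

d1 = {'a': 1, 'b': 5}
d2 = {'b': 3, 'c': 2}
print(best_by_feature(d1, d2))   # {'a': 1, 'b': 5, 'c': 2}
print(all_by_feature(d1, d2))    # {'a': [1], 'b': [5, 3], 'c': [2]}
```

Neither function cares whether the inputs cover the same features, which is the flexibility the mapping approach is credited with here.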

        So, while map's None fill-in feature happened to apply to this
        dataset's unique features, I wonder if its availability steered you
        away from a better data-structure with greater flexibility, less
        dependence on quirks, and more generality.

        Perhaps the lesson is that outer-join operations are best expressed
        with dictionaries rather than sequences with unequal lengths.

        > I was relatively new at Python, and
        > in retrospect I feel I might have been going for "use all the new toys
        > we've just gotten"

        That suggests that if itertools.zip_longest() doesn't turn out to be
        TheRightTool(tm) for many tasks, then it may have ill effects beyond
        just being cruft -- it may steer folks away from better solutions. As
        you know, it can take a while for Python newcomers to realize the full
        power and generality of dictionary-based approaches. I wonder if this
        proposed itertool would distract from that realization.

        > I don't recall ever relying on map's None-filling feature in other
        > real-world cases, and, as I mentioned, even here the reliance was rather
        > doubtful. OTOH, if I had easily been able to specify a different
        > filler, I _would_ have been able to use it a couple of times.

        Did you run across any cookbook code that would have been improved by
        the proposed itertools.zip_longest() function?



        Raymond


        • rurpy@yahoo.com

          #5
          Re: Real-world use cases for map's None fill-in feature?


          "Raymond Hettinger" <python@rcn.com> wrote in message
          news:mailman.194.1136781640.27775.python-list@python.org...
          > Proposal
          > --------
          > I am gathering data to evaluate a request for an alternate version of
          > itertools.izip() with a None fill-in feature like that for the built-in
          > map() function:
          >
          > >>> map(None, 'abc', '12345') # demonstrate map's None fill-in feature
          > [('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')]
          >
          > The motivation is to provide a means for looping over all data elements
          > when the input lengths are unequal. The question of the day is whether
          > that is both a common need and a good approach to real-world problems.
          > The answer can likely be found in results from other programming
          > languages and from surveying real-world Python code.
          >
          > Other languages
          > ---------------
          > I scanned the docs for Haskell, SML, and Perl6's yen operator and found
          > that the norm for map() and zip() is to truncate to the shortest input
          > or raise an exception for unequal input lengths. Ruby takes the
          > opposite approach and fills in nil values -- the reasoning behind the
          > design choice is somewhat inscrutable:
          > http://blade.nagaokaut.ac.jp/cgi-bin...ruby-dev/18651

          From what I can make out (with help of internet
          language translation sites) the relevant part
          (section [2]) of this presents three options for
          handling unequal length arguments:
          1. zip to longest (Perl6 does it this way)
          2. zip to shortest (Python does it this way)
          3. use zip method and choose depending on
          whether argument list is shorter or longer
          than object's list.
          It then solicits opinions on the best way.
          It does not state or justify any particular choice.

          If "perl6"=="perl6 yen operator" then there
          is a contradiction with your earlier statement.

          > Real-world code
          > ---------------
          > I scanned the standard library, my own code, and a few third-party
          > tools. I found no instances where map's fill-in feature was used.
          >
          > History of zip()
          > ----------------
          > PEP 201 (lock-step iteration) documents that a fill-in feature was
          > contemplated and rejected for the zip() built-in introduced in Py2.0.
          > In the years before and after, SourceForge logs show no requests for a
          > fill-in feature.

          My perception is that many people view the process
          of advocating for a library addition as
          1. Very time consuming due to the large amount of
          work involved in presenting and defending a proposal.
          2. Having a very small chance of acceptance.
          I do not know whether this is really the case or even if my
          perception is correct, but if it is, it could account for the
          lack of feature requests.

          > Request for more information
          > ----------------------------
          > My request for readers of comp.lang.python is to search your own code
          > to see if map's None fill-in feature was ever used in real-world code
          > (not toy examples). I'm curious about the context, how it was used,
          > and what alternatives were rejected (i.e. did the fill-in feature
          > improve the code). Likewise, I'm curious as to whether anyone has seen
          > a zip-style fill-in feature employed to good effect in some other
          > programming language.

          How well correlated is the use of map()-with-fill with the
          (need for the) use of zip/izip-with-fill?

          > Parallel to SQL?
          > ----------------
          > If an iterator element's ordinal position were considered as a record
          > key, then the proposal equates to a database-style full outer join
          > operation (one which includes unmatched keys in the result) where record
          > order is significant. Does an outer-join have anything to do with
          > lock-step iteration? Is this a fundamental looping construct or just a
          > theoretical wish-list item? Does Python need itertools.izip_longest()
          > or would it just become a distracting piece of cruft?
          >
          > Raymond Hettinger
          >
          > FWIW, the OP's use case involved printing files in multiple
          > columns:
          >
          > for f, g in itertools.izip_longest(file1, file2, fillin_value=''):
          >     print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())
          >
          > The alternative was straightforward but less terse:
          >
          > while 1:
          >     f = file1.readline()
          >     g = file2.readline()
          >     if not f and not g:
          >         break
          >     print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())

          Actually my use case did not have quite so much
          perlish line noise :-)
          Compared to
          for f, g in izip2(file1, file2, fill=''):
              print '%s\t%s' % (f, g)
          the above looks like a relatively minor loss
          of conciseness, but consider the uses of the
          current izip, for example

          for i1, i2 in itertools.izip(iterable_1, iterable_2):
              print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())

          can be replaced by:
          while 1:
              i1 = iterable_1.next()
              i2 = iterable_2.next()
              print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())

          yet that was not justification for rejecting izip()'s
          inclusion in itertools.

          The other use case I had was a simple file diff.
          All I cared about was if the files were the same or
          not, and if not, what were the first differing lines.
          This was to compare output from a process that
          was supposed to match some saved reference
          data. Because of error propagation, lines beyond
          the first difference were meaningless. The code,
          using an "iterate to longest with fill" izip would be
          roughly:

          # Simple file diff to ident
          for ln1, ln2 in izip_long(file1, file2, fill="<EOF>"):
              if ln1 != ln2:
                  break
          if ln1 == ln2:
              print "files are identical"
          else:
              print "files are different"

          This same use case occurred again very recently
          when writing unit tests to compare output of a parser
          with known correct output during refactoring.
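
A runnable sketch of the first-difference comparison described above, assuming Python 3 and itertools.zip_longest in place of the hypothetical izip_long. The function name first_difference is illustrative, and a unique sentinel object is used instead of the "<EOF>" string so that a file containing that literal text cannot be mistaken for end-of-file.

```python
from itertools import zip_longest

EOF = object()  # unique sentinel; never equal to any real line

def first_difference(lines1, lines2):
    """Return (line_number, line1, line2) for the first mismatch, or None."""
    pairs = zip_longest(lines1, lines2, fillvalue=EOF)
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return number, a, b
    return None  # the inputs are identical

print(first_difference(['x\n', 'y\n'], ['x\n', 'z\n']))
# (2, 'y\n', 'z\n')
```

When one input is a prefix of the other, the mismatch is reported with EOF on the shorter side.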

          With file iterators one can imagine many potential
          use cases for izip but not imap, but there are probably
          few real uses extant because generally files may be
          of different lengths, and there currently is no usable
          izip for this case.

          [jan09 08:30 utc]


          • Duncan Booth

            #6
            Re: Real-world use cases for map's None fill-in feature?

            Raymond Hettinger wrote:

            > My request for readers of comp.lang.python is to search your own code
            > to see if map's None fill-in feature was ever used in real-world code
            > (not toy examples). I'm curious about the context, how it was used,
            > and what alternatives were rejected (i.e. did the fill-in feature
            > improve the code). Likewise, I'm curious as to whether anyone has seen
            > a zip-style fill-in feature employed to good effect in some other
            > programming language.

            One example of padding out iterators (although I didn't use map's fill-in
            to implement it) is turning a single column of items into a multi-column
            table with the items laid out across the rows first. The last row may have
            to be padded with some empty cells.

            Here's some code I wrote to do that. Never mind for the moment that the use
            of zip isn't actually defined here, it could use izip, but notice that the
            input iterator has to be converted to a list first so that I can add a
            suitable number of empty strings to the end. If there was an option to izip
            to pad the last element with a value of choice (such as a blank string) the
            code could work with iterators throughout:

            def renderGroups(self, group_size=2, allow_add=True):
                """Iterates over the items rendering one item for each group.
                Each group contains an iterator for group_size elements.
                The last group may be padded out with empty strings.
                """
                elements = list(self.renderIterator(allow_add)) + ['']*(group_size-1)
                eliter = iter(elements)
                return zip(*[eliter]*group_size)

            If there was a padding option to izip this code could have been something
            like:

            def renderGroups(self, group_size=2, allow_add=True):
                """Iterates over the items rendering one item for each group.
                Each group contains an iterator for group_size elements.
                The last group may be padded out with empty strings.
                """
                iter = self.renderIterator(allow_add)
                return itertools.izip(*[iter]*group_size, pad='')

            The code is then used to build a table using tal like this:

            <tal:loop repeat="row python:slot.renderGroups(group_size=4);">
              <tr tal:define="isFirst repeat/row/start"
                  tal:attributes="class python:test(isFirst, 'slot-top','')">
                <td class="slotElement" tal:repeat="cell row"
                    tal:content="structure cell">4X Slot element</td>
              </tr>
            </tal:loop>


            • Raymond Hettinger

              #7
              Re: Real-world use cases for map's None fill-in feature?

              [Anders Hammarquist]:
              > I had a quick look through our (Strakt's) codebase and found one example.

              Thanks for the research :-)

              > The code is used to process user-designed macros, where the user wants
              > to append data to strings stored in the system. Note that all data is
              > stored as lists of whatever the relevant data type is.
              >
              > While I didn't write this bit of code (so I can't say what, if any,
              > alternatives were considered), it does seem to me the most straight-
              > forward way to do it. Being able to say what the fill-in value should
              > be would make the code even simpler.
              >
              > oldAttrVal is the original stored data, and attrValue is what the macro
              > wants to append.
              >
              > newAttrVal = []
              > for x, y in map(None, oldAttrVal, attrValue):
              >     newAttrVal.append(u''.join((x or '', y or '')))

              I'm finding this case difficult to analyze and generalize without
              knowing the significance of position in the list. It looks like None
              fill-in is used because attrValue may be a longer list whenever the
              user is specifying new system strings, and it may be shorter when
              there are no new strings for some entries and those system strings
              aren't being updated at all. Either way, it looks like the ordinal
              position has some meaning that is shared by both oldAttrVal and
              newAttrVal, perhaps a message number or somesuch. If that is the
              case, is there some other table that assigns meanings to the
              resulting strings according to their
              index? What does the code look like that accesses newAttrVal and how
              does it know the significance of various positions in the list? This
              is important because it could shed some light on how an app finds
              itself looping over two lists which share a common meaning for each
              index position, yet they are unequal in length.



              Raymond


              • Raymond Hettinger

                #8
                Re: Real-world use cases for map's None fill-in feature?

                Duncan Booth wrote:
                > One example of padding out iterators (although I didn't use map's fill-in
                > to implement it) is turning a single column of items into a multi-column
                > table with the items laid out across the rows first. The last row may have
                > to be padded with some empty cells.

                ANALYSIS
                --------

                This case relies on the side-effects of zip's implementation details --
                the trick of windowing or data grouping with code like: zip(it(),
                it(), it()). The remaining challenge is handling missing values when
                the reshape operation produces a rectangular matrix with more elements
                than provided by the iterable input.

                The proposed function directly meets the challenge:

                it = iter(iterable)
                result = izip_longest(*[it]*group_size, pad='')

                Alternately, the need can be met with existing tools by pre-padding the
                iterator with enough extra values to fill any holes:

                it = chain(iterable, repeat('', group_size-1))
                result = izip_longest(*[it]*group_size)
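
The two itertools approaches can be compared side by side in a short Python 3 sketch. It assumes zip_longest(fillvalue=...) stands in for the proposed izip_longest(pad=...), and it uses plain zip() for the pre-padded variant, in line with the correction posted later in the thread. The function names are hypothetical.

```python
from itertools import chain, repeat, zip_longest

def grouped_longest(iterable, group_size, pad=''):
    """Group via the padding variant of zip: holes in the last row get `pad`."""
    it = iter(iterable)
    # The same iterator object repeated group_size times, so each row
    # consumes group_size consecutive items.
    return zip_longest(*[it] * group_size, fillvalue=pad)

def grouped_prepadded(iterable, group_size, pad=''):
    """Group with plain zip after appending just enough padding up front."""
    it = chain(iterable, repeat(pad, group_size - 1))
    return zip(*[it] * group_size)

print(list(grouped_longest('abcde', 2)))
print(list(grouped_prepadded('abcde', 2)))
# both print [('a', 'b'), ('c', 'd'), ('e', '')]
```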

                Both approaches require a certain measure of inventiveness, rely on
                advanced tricks, and forgo readability to gain the raw speed and
                conciseness afforded by a clever use of itertools. They are also a
                challenge to review, test, modify, read, or explain to others.

                In contrast, a simple generator is trivially easy to create and read,
                albeit less concise and not as speedy:

                it = iter(iterable)
                while 1:
                    row = tuple(islice(it, group_size))
                    if len(row) == group_size:
                        yield row
                    else:
                        yield row + ('',) * (group_size - len(row))
                        break

                The generator version is plain, simple, boring, and uninspirational.
                But it took only seconds to write and did not require a knowledge of
                advanced itertool combinations. It is more easily explained than the
                versions with zip tricks.
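
Wrapped in a function, the plain generator might look like the Python 3 sketch below. The name grouped and the pad parameter are hypothetical. One detail is an assumption about the intended behaviour rather than something stated in the thread: an empty slice ends the loop without yielding, so an input whose length is an exact multiple of group_size does not gain an all-padding row.

```python
from itertools import islice

def grouped(iterable, group_size, pad=''):
    """Yield tuples of group_size items, padding only a partial last row."""
    it = iter(iterable)
    while True:
        row = tuple(islice(it, group_size))
        if not row:
            return                      # input exhausted on a row boundary
        if len(row) < group_size:
            yield row + (pad,) * (group_size - len(row))
            return                      # a short row is always the last one
        yield row

print(list(grouped('abcde', 2)))
# [('a', 'b'), ('c', 'd'), ('e', '')]
```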


                Raymond


                • Paul Rubin

                  #9
                  Re: Real-world use cases for map's None fill-in feature?

                  "Raymond Hettinger" <python@rcn.com> writes:
                  > The generator version is plain, simple, boring, and uninspirational.
                  > But it took only seconds to write and did not require a knowledge of
                  > advanced itertool combinations. It is more easily explained than the
                  > versions with zip tricks.

                  I had this cute idea of using dropwhile to detect the end of an iterable:

                  it = chain(iterable, repeat(''))
                  while True:
                      row = tuple(islice(it, group_size))
                      # next line raises StopIteration if row is entirely null-strings
                      dropwhile(lambda x: x=='', row).next()
                      yield row
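
A caveat for current Python: since PEP 479 (the default behaviour from Python 3.7), a StopIteration that escapes inside a generator body is turned into RuntimeError, so the dropwhile trick no longer ends the generator quietly. A sketch of the same idea with an explicit test, where the function name grouped_until_blank is hypothetical:

```python
from itertools import chain, islice, repeat

def grouped_until_blank(iterable, group_size, pad=''):
    """Yield rows from an endlessly padded stream until a row is all padding."""
    it = chain(iterable, repeat(pad))
    while True:
        row = tuple(islice(it, group_size))
        if all(x == pad for x in row):
            return          # nothing but padding left: the input is exhausted
        yield row

print(list(grouped_until_blank('abcde', 2)))
# [('a', 'b'), ('c', 'd'), ('e', '')]
```

As with the original, a row of genuine data that happens to consist entirely of the pad value would end the iteration early.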


                  • Duncan Booth

                    #10
                    Re: Real-world use cases for map's None fill-in feature?

                    Raymond Hettinger wrote:

                    > The generator version is plain, simple, boring, and uninspirational.
                    > But it took only seconds to write and did not require a knowledge of
                    > advanced itertool combinations. It is more easily explained than the
                    > versions with zip tricks.
                    >
                    I can't argue with that.


                    • Raymond Hettinger

                      #11
                      Re: Real-world use cases for map's None fill-in feature?

                      rurpy@yahoo.com wrote:
                      > The other use case I had was a simple file diff.
                      > All I cared about was if the files were the same or
                      > not, and if not, what were the first differing lines.
                      > This was to compare output from a process that
                      > was supposed to match some saved reference
                      > data. Because of error propagation, lines beyond
                      > the first difference were meaningless.
                      . . .
                      > This same use case occurred again very recently
                      > when writing unit tests to compare output of a parser
                      > with known correct output during refactoring.

                      Analysis
                      --------

                      Both of these cases compare two data streams and report the first
                      mismatch, if any. Data beyond the first mismatch is discarded.

                      The example code seeks to avoid managing two separate iterators and the
                      attendant code for trapping StopIteration and handling end-cases. The
                      simplification is accomplished by generating a single fill element so
                      that the end-of-file condition becomes its own element capable of being
                      compared or reported back as a difference. The EOF element serves as a
                      sentinel and allows a single line of comparison to handle all cases.
                      This is a normal and common use for sentinels.

                      The OP's code appends the sentinel using a proposed variant of zip()
                      which pads unequal iterables with a specified fill element:

                      for x, y in izip_longest(file1, file2, fill='<EOF>'):
                          if x != y:
                              return 'Mismatch', x, y
                      return 'Match'

                      Alternately, the example can be written using existing itertools:

                      for x, y in izip(chain(file1, ['<EOF>']), chain(file2, ['<EOF>'])):
                          if x != y:
                              return 'Mismatch', x, y
                      return 'Match'

                      This is a typical use of chain() and not at all tricky. The chain()
                      function was specifically designed for tacking one or more elements
                      onto the end of another iterable. It is ideal for appending sentinels.
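
The chain() version above can be wrapped in a function so it runs as written. This is a Python 3 sketch in which the built-in zip() plays the role of izip(); the function name compare_streams and the sentinel parameter are hypothetical.

```python
from itertools import chain

def compare_streams(lines1, lines2, sentinel='<EOF>'):
    """Report the first mismatch between two streams, treating EOF as an element."""
    padded1 = chain(lines1, [sentinel])   # tack the sentinel onto each stream
    padded2 = chain(lines2, [sentinel])
    for x, y in zip(padded1, padded2):
        if x != y:
            return 'Mismatch', x, y
    return 'Match'

print(compare_streams(['a', 'b'], ['a', 'b']))   # Match
print(compare_streams(['a'], ['a', 'b']))        # ('Mismatch', '<EOF>', 'b')
```

Because each stream carries exactly one sentinel, the shorter stream's sentinel always lines up against a real element of the longer one, and zip() truncating afterwards loses nothing of interest.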


                      Raymond


                      • Raymond Hettinger

                        #12
                        Re: Real-world use cases for map's None fill-in feature?

                        > Alternately, the need can be met with existing tools by pre-padding the
                        > iterator with enough extra values to fill any holes:
                        >
                        > it = chain(iterable, repeat('', group_size-1))
                        > result = izip_longest(*[it]*group_size)

                        Typo: That should be izip() instead of izip_longest()


                        • rurpy@yahoo.com

                          #13
                          Re: Real-world use cases for map's None fill-in feature?


                          "Raymond Hettinger" <python@rcn.com> wrote:
                          > Duncan Booth wrote:
                          > > One example of padding out iterators (although I didn't use map's fill-in
                          > > to implement it) is turning a single column of items into a multi-column
                          > > table with the items laid out across the rows first. The last row may have
                          > > to be padded with some empty cells.
                          >
                          > ANALYSIS
                          > --------
                          >
                          > This case relies on the side-effects of zip's implementation details --
                          > the trick of windowing or data grouping with code like: zip(it(),
                          > it(), it()). The remaining challenge is handling missing values when
                          > the reshape operation produces a rectangular matrix with more elements
                          > than provided by the iterable input.
                          >
                          > The proposed function directly meets the challenge:
                          >
                          > it = iter(iterable)
                          > result = izip_longest(*[it]*group_size, pad='')
                          >
                          > Alternately, the need can be met with existing tools by pre-padding the
                          > iterator with enough extra values to fill any holes:
                          >
                          > it = chain(iterable, repeat('', group_size-1))
                          > result = izip_longest(*[it]*group_size)

                          I assumed you meant izip() here (and saw your followup).

                          > Both approaches require a certain measure of inventiveness, rely on
                          > advanced tricks, and forgo readability to gain the raw speed and
                          > conciseness afforded by a clever use of itertools. They are also a
                          > challenge to review, test, modify, read, or explain to others.

                          The inventiveness is in the "(*[it]*group_size, " part. The
                          rest is straightforward (assuming, of course, that itertools
                          has good documentation and that it was read first).

                          > In contrast, a simple generator is trivially easy to create and read,
                          > albeit less concise and not as speedy:
                          >
                          > it = iter(iterable)
                          > while 1:
                          >     row = tuple(islice(it, group_size))
                          >     if len(row) == group_size:
                          >         yield row
                          >     else:
                          >         yield row + ('',) * (group_size - len(row))
                          >         break

                          Yes, with 4 times the amount of code. (Yes, I am
                          one of those who believes production and maintenance
                          cost is, under many circumstances, roughly correlated
                          with LOC.)

                          And frankly, I don't find the above any more
                          comprehensible than:
                          > result = izip_longest(*[it]*group_size, pad='')
                          once a little thought is given to the *[it]*group_size
                          part. I see much more opaque code every time
                          I look at source code in the standard library.

                          > The generator version is plain, simple, boring, and uninspirational.
                          > But it took only seconds to write and did not require a knowledge of
                          > advanced itertool combinations.

                          "advanced itertool combinations"?? Even I, newbie
                          that I am, found the concepts of repeat() and chain()
                          pretty straightforward. Of course, having to
                          understand and use 3 itertools tools is more difficult
                          than understanding one (izip_longest). Better
                          documentation could mitigate that a lot.
                          But the solution using "advanced itertool combinations"
                          was yours, and is avoided altogether with an izip_longest().

                          Also, this same argument (uses of x can be easily
                          coded without x by using a generator) is equally
                          applicable to itertools.izip() itself, yes?

                          > It is more easily explained than the versions with zip tricks.

                          Calling this a "trick" is unfair. The (current pre-2.5)
                          documentation still mentions no requirement that
                          izip() arguments be independent, despite the fact
                          that this issue was discussed here a couple of months
                          ago, as I remember. If I remember correctly, it was not clear
                          whether that should be a requirement or not: it would
                          prevent any use of the same iterable more than
                          once in izip's arg list, it has not been documented
                          for 3(?) Python versions, and clearly people are
                          using the current behavior.

                          • Cappy2112

                            #14
                            Re: Real-world use cases for map's None fill-in feature?


                            I haven't used itertools yet, so I don't know their capabilities.

                            I have used map twice recently with None as the first argument. This
                            was also the first time I've used map, and I was disappointed when I
                            found out about the truncation. The lists map was iterating over in my
                            case were of unequal lengths, so I had to pad the lists to make sure
                            nothing was truncated.

                            The most universal solution would be to provide a mechanism to
                            truncate, pad, or remain the same length. However, with the pad
                            feature, room should be provided for the user to add the pad item.

                            • Peter Otten

                              #15
                              Re: Real-world use cases for map's None fill-in feature?

                              Raymond Hettinger wrote:

                              > Alternately, the need can be met with existing tools by pre-padding the
                              > iterator with enough extra values to fill any holes:
                              >
                              > it = chain(iterable, repeat('', group_size-1))
                              > result = izip_longest(*[it]*group_size)
                              >
                              > Both approaches require a certain measure of inventiveness, rely on
                              > advanced tricks, and forgo readability to gain the raw speed and
                              > conciseness afforded by a clever use of itertools. They are also a
                              > challenge to review, test, modify, read, or explain to others.

                              Is this the author of itertools becoming its most articulate opponent? What
                              use is this collection of small functions sharing an underlying concept if
                              you are not supposed to combine them to your heart's content? You probably
                              cannot pull off some of those tricks until you have good working knowledge
                              of the iterator protocol, but that is becoming increasingly important to
                              understand all Python code.

                              > In contrast, a simple generator is trivially easy to create and read,
                              > albeit less concise and not as speedy:
                              >
                              > it = iter(iterable)
                              > while 1:
                              >     row = tuple(islice(it, group_size))
                              >     if len(row) == group_size:
                              >         yield row
                              >     else:
                                        if row:
                                            yield row + ('',) * (group_size - len(row))
                              >         break
                              >
                              > The generator version is plain, simple, boring, and uninspirational.

                              I can't argue with that :-) But nobody spotted the bug within a day; so
                              dumbing down the code didn't pay off. Furthermore, simple code like above
                              is often inlined and therefore harder to test and an impediment to
                              modification. Once you put the logic into a separate function/generator it
                              doesn't really matter which version you use. You can't get the
                              chain/repeat/izip variant to meet your (changing) requirements? Throw it
                              away and just keep the (modified) test suite.
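
As I read it, the bug is that the quoted generator emits a row made entirely of padding whenever the input length is an exact multiple of group_size (including an empty input), and the `if row:` guard suppresses it. A minimal sketch of both variants, assuming Python 3; the names rows_buggy and rows_fixed and the pad parameter are hypothetical:

```python
from itertools import islice


def rows_buggy(iterable, group_size, pad=''):
    """The generator as originally posted."""
    it = iter(iterable)
    while True:
        row = tuple(islice(it, group_size))
        if len(row) == group_size:
            yield row
        else:
            # Runs even when row is empty, producing an all-pad row.
            yield row + (pad,) * (group_size - len(row))
            break


def rows_fixed(iterable, group_size, pad=''):
    """The same generator with the `if row:` guard applied."""
    it = iter(iterable)
    while True:
        row = tuple(islice(it, group_size))
        if len(row) == group_size:
            yield row
        else:
            if row:  # only pad a genuinely partial final row
                yield row + (pad,) * (group_size - len(row))
            break


print(list(rows_buggy('abcdef', 3)))  # [('a', 'b', 'c'), ('d', 'e', 'f'), ('', '', '')]
print(list(rows_fixed('abcdef', 3)))  # [('a', 'b', 'c'), ('d', 'e', 'f')]
```

A test with an input whose length is a multiple of group_size is the one that tells the two versions apart.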

                              A newbie, by the way, would have /written/ neither. The it = iter(iterable)
                              voodoo isn't obvious and the barrier to switch from lst[:group_size] to
                              islice(it, group_size) to /improve/ one's code is high. I expect to see an
                              inlined list-based solution. The two versions are both part of a learning
                              experience and both worth the effort.

                              Regarding the thread's topic, I have no use cases for a map(None, ...)-like
                              izip_longest(), but occasionally I would prefer izip() to throw a
                              ValueError if its iterable arguments do not have the same "length".
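
A length-checking variant along those lines can be sketched on top of a fill-in zip. This is only an illustration, assuming Python 3's itertools.zip_longest(); the name zip_strict and the _MISSING marker are hypothetical. (Python 3.10 and later also offer zip(..., strict=True) for the same purpose.)

```python
from itertools import zip_longest

_MISSING = object()  # unique marker that cannot appear in user data


def zip_strict(*iterables):
    """Like zip(), but raise ValueError if the inputs differ in length."""
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        # A marker in the row means at least one input ran out early.
        if any(item is _MISSING for item in row):
            raise ValueError('iterables have different lengths')
        yield row


print(list(zip_strict('ab', '12')))  # [('a', '1'), ('b', '2')]
```

Because this is a generator, the error surfaces only when iteration reaches the point of the mismatch, after the matching prefix has already been yielded.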

                              Peter
