Python 3000, zip, *args and iterators

  • Steven Bethard

    Python 3000, zip, *args and iterators

    So, as I understand it, in Python 3000, zip will basically be replaced
    with izip, meaning that instead of returning a list, it will return an
    iterator. This is great for situations like:

    zip(*[iter1, iter2, iter3])

    where I want to receive tuples of (item1, item2, item3) from the
    iterables. But it doesn't work well for a situation like:

    zip(*tuple_iter)

    where tuple_iter is an iterator to tuples of the form
    (item1, item2, item3) and I want to receive three iterators, one to the
    item1s, one to the item2s and one to the item3s. I don't think this
    is too unreasonable of a desire as the current zip, in a situation like:

    zip(*tuple_list)

    where tuple_list is a list of tuples of the form (item1, item2, item3),
    returns a list of three tuples, one of the item1s, one of the item2s and
    one of the item3s.

    Of course, the reason this doesn't work currently is that the fn(*itr)
    notation converts 'itr' into a tuple, exhausting the iterator:

    >>> import itertools as it
    >>> def g(x):
    ...     for i in xrange(x):
    ...         yield (i, i+1, i+2)
    ...     print "exhausted"
    ...
    >>> zip(*g(4))
    exhausted
    [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
    >>> it.izip(*g(4))
    exhausted
    <itertools.izip object at 0x01157710>
    >>> x, y, z = it.izip(*g(4))
    exhausted
    >>> x, y, z
    ((0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5))
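
A minimal Python 3 sketch of the same effect, assuming Python 3 (where zip is already lazy). The log list is a hypothetical stand-in for the print statement, so the moment of exhaustion can be checked in code:

```python
def g(n, log):
    """Yield n tuples, then record that the generator has run dry."""
    for i in range(n):
        yield (i, i + 1, i + 2)
    log.append("exhausted")


log = []
columns = zip(*g(4, log))  # the call unpacks g(...) completely before zip runs
exhausted_before_iteration = bool(log)  # True: nothing was requested from zip yet
result = list(columns)
```

Even though zip returns a lazy object here, log is already non-empty before the first tuple is requested, because the call itself unpacks g(...) into positional arguments.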

    What I would prefer is something like:

    >>> zip(*g(4))
    <iterator object at ...>
    >>> x, y, z = zip(*g(4))
    >>> x, y, z
    (<iterator object at ...>, <iterator object at ...>, <iterator object at ...>)

    Of course, I can write a separate function that will do what I want
    here[1] -- my question is if Python's builtin zip will support this in
    Python 3000. It's certainly not a trivial change -- it requires some
    pretty substantially backwards incompatible changes in how *args is
    parsed for a function call -- namely that fn(*itr) only extracts as many
    of the items in the iterable as necessary, e.g.

    >>> def h(x, y, *args):
    ...     print x, y, args
    ...     print list(it.islice(args, 4))
    ...
    >>> h(*it.count())
    0 1 count(2)
    [2, 3, 4, 5]

    So I guess my real question is, should I expect Python 3000 to play
    nicely with *args and iterators? Are there reasons (besides backwards
    incompatibility) that parsing *args this way would be bad?


    Steve


    [1] In fact, with the help of the folks from this list, I did:
    http://aspn.activestate.com/ASPN/Coo.../Recipe/302325

  • Terry Reedy

    #2
    Re: Python 3000, zip, *args and iterators


    "Steven Bethard" <steven.bethard@gmail.com> wrote in message
    news:3yGzd.289258$HA.962@attbi_s01...
    > So, as I understand it, in Python 3000, zip will basically be replaced
    > with izip, meaning that instead of returning a list, it will return an
    > iterator.

    I think it worth repeating that Python 3 is as yet something of a
    pipedream, as indicated by the joke name Python 3000 (that also being in
    part a satire on Windows 2000, and the like). So, while Guido has said he
    would like to make Python iterator-oriented in the way that it used to be
    list-oriented, nothing is set in stone, certainly not the details.

    Guido has also said that he would like there to be funding to pay him to
    spend a year on its development. He wants to take that long so there will
    be adequate discussion, thought, and testing so he can 'get it right' at
    least in the sense of having everything work well together.

    Terry J. Reedy




    • Steven Bethard

      #3
      Re: Python 3000, zip, *args and iterators

      Terry Reedy wrote:
      > "Steven Bethard" <steven.bethard@gmail.com> wrote in message
      > news:3yGzd.289258$HA.962@attbi_s01...
      >
      >> So, as I understand it, in Python 3000, zip will basically be replaced
      >> with izip, meaning that instead of returning a list, it will return an
      >> iterator.
      >
      > I think it worth repeating that Python 3 is as yet something of a
      > pipedream, as indicated by the joke name Python 3000 (that also being in
      > part a satire on Windows 2000, and the like).

      True, true. And worth repeating.

      > So, while Guido has said he
      > would like to make Python iterator-oriented in the way that it used to be
      > list-oriented, nothing is set in stone, certainly not the details.

      Right, though my understanding of PEP 3000[1] is that though "Python
      3000" may never exist, the PEP is there as a road-map of where Python
      as a language would like to go. I guess the point of my question is to
      find out if this kind of nice interaction of *args and iterators is
      something that's in the road-map. If it is, then maybe there are parts
      of it that could be implemented in a way that's backwards compatible,
      even if the full system wouldn't be available for some time. (Perhaps
      something along the lines of "from __future__ import iter_args".)

      Steve

      [1] http://www.python.org/peps/pep-3000.html


      • Terry Reedy

        #4
        Re: Python 3000, zip, *args and iterators


        "Steven Bethard" <steven.bethard@gmail.com> wrote in message
        news:O5Jzd.566495$wV.471519@attbi_s54...
        > Terry Reedy wrote:
        >> I think it worth repeating that Python 3 is as yet something of a
        >> pipedream, as indicated by the joke name Python 3000

        > Right, though my understanding of PEP 3000[1] is that though "Python
        > 3000" may never exist, the PEP is there as a road-map of where Python as
        > a language would like to go.

        A major backwards compatibility break will not happen without a major
        number change to Py3. And I expect it to happen -- the 'as yet' was
        intentional. In fact, here is my New Year's prediction (with subjective
        certainty > .5):

        a. The PyPy project will succeed.
        b. Python3 (actually, the reference implementation thereof) will be written
        in Python3 (perhaps with 'draft' in Py2).
        c. We will see it within 5 years.

        We will see if I am any better than the tabloid 'psychics'.

        > I guess the point of my question is to find out if this kind of nice
        > interaction of *args and iterators is something that's in the road-map.
        > If it is, then maybe there are parts of it that could be implemented in a
        > way that's backwards compatible, even if the full system wouldn't be
        > available for some time. (Perhaps something along the lines of "from
        > __future__ import iter_args".)

        You can certainly share your concerns with the PEP author. I believe that
        there is also a PyWiki page that you can directly add to.

        Terry J. Reedy




        • Steven Bethard

          #5
          Re: Python 3000, zip, *args and iterators

          Terry Reedy wrote:
          > "Steven Bethard" <steven.bethard@gmail.com> wrote in message
          > news:O5Jzd.566495$wV.471519@attbi_s54...
          >
          >> I guess the point of my question is to find out if this kind of nice
          >> interaction of *args and iterators is something that's in the road-map.
          >> If it is, then maybe there are parts of it that could be implemented in a
          >> way that's backwards compatible, even if the full system wouldn't be
          >> available for some time. (Perhaps something along the lines of "from
          >> __future__ import iter_args".)
          >
          > You can certainly share your concerns with the PEP author. I believe that
          > there is also a PyWiki page that you can directly add to.

          Yeah, I found the wiki page too[1]. Does anyone know if it's okay to
          add things to this page? I had avoided doing so since it gives as its
          description "This page lists features that GvR has mentioned as goals
          for Python 3.0" which sounds like it's not intended for commentary by
          the general Python community.

          Maybe I should start a Python3.0Wishlist page?

          Steve

          [1] http://www.python.org/moin/Python3_2e0

          P.S. I thought about posting to python-dev where GvR might hear directly
          about this kind of thing, but it seems a little premature since most
          predictions put Python 3.0 at least 3-5 years from now.


          • Raymond Hettinger

            #6
            Re: Python 3000, zip, *args and iterators

            [Steven Bethard]
            > What I would prefer is something like:
            >
            > >>> zip(*g(4))
            > <iterator object at ...>
            > >>> x, y, z = zip(*g(4))
            > >>> x, y, z
            > (<iterator object at ...>, <iterator object at ...>, <iterator object at ...>)
            . . .
            > So I guess my real question is, should I expect Python 3000 to play
            > nicely with *args and iterators? Are there reasons (besides backwards
            > incompatibility) that parsing *args this way would be bad?
            . . .
            > In fact, with the help of the folks from this list, I did:
            > http://aspn.activestate.com/ASPN/Coo.../Recipe/302325

            * The answer to the first question is Yes. The point of Python 3000 is
            building on what was learned and writing a simpler, cleaner language
            without the encumbrance of backwards compatibility.

            * However, IMHO, the proposed behavior doesn't qualify as "playing
            nicely".

            * Your excellent recipe provides a good basis for discussion and it
            highlights some of the issues around the proposed behavior:

            1. The current implementation's behavior is easy to learn, easy to
            explain, and does what most folks expect (not folks who are pushing the
            iterator and *arg protocols to the outer limits). In contrast, the
            proposed recipe is somewhat complex and its implications are not
            immediately obvious. The itertools.tee() component is of extra concern
            because it invisibly introduces memory intensive characteristics into
            an otherwise lightweight, low-overhead function.

            2. It is instructive to look at Guido's reactions to other *args
            proposals. His receptivity to a,b,*c=it wanes whenever someone then
            requests support for a,*b,c=it. Likewise, he considers zip(*args) as a
            transpose function to be an abuse of the *arg protocol. IOW,
            supporting "odd" usages does not bode well for a proposal.

            3. The recipe discussion and newsgroup posting present only toy
            examples -- real use cases have not yet emerged. If some do emerge, I
            suspect that each problem will have a better solution (using existing
            tools) than the one being proposed. If so, then adopting the proposal
            will have the negative effect of leading folks away from the correct
            solution.
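
The memory concern about itertools.tee() in point 1 can be made concrete with a short Python 3 sketch. The source generator and the pulled list are hypothetical helpers added for illustration; they only record how many items have been produced:

```python
import itertools

pulled = []  # records every item the underlying source has produced


def source(n):
    for i in range(n):
        pulled.append(i)
        yield i


a, b = itertools.tee(source(1000))
total = sum(a)                        # drains the source through branch a
produced_before_b_starts = len(pulled)
first_of_b = next(b)                  # b still starts from the first item
```

Branch b has consumed nothing while a runs to the end, yet it can still start from item 0, so tee has to hold every item in between. As long as the branches advance together the buffer stays small; once they drift apart it grows with the gap.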


            Raymond Hettinger


            "Not everything that can be done, should be done."


            • Alex Martelli

              #7
              Re: Python 3000, zip, *args and iterators

              Raymond Hettinger <python@rcn.com> wrote:
              ...
              > "Not everything that can be done, should be done."

              Or, to quote Scripture...:

              "'Everything is permissible for me' -- but not everything is beneficial"
              (1 Cor 6:12)...


              Alex


              • Steve Holden

                #8
                Re: Python 3000, zip, *args and iterators

                Raymond Hettinger wrote:
                [...]
                >
                > "Not everything that can be done, should be done."
                >

                ... and not everything that should be done, can be done.

                regards
                Steve
                --
                Steve Holden http://www.holdenweb.com/
                Python Web Programming http://pydish.holdenweb.com/
                Holden Web LLC +1 703 861 4237 +1 800 494 3119


                • Steven Bethard

                  #9
                  Re: Python 3000, zip, *args and iterators

                  Raymond Hettinger wrote:
                  > [Steven Bethard]
                  >
                  >> What I would prefer is something like:
                  >>
                  >> >>> zip(*g(4))
                  >> <iterator object at ...>
                  >> >>> x, y, z = zip(*g(4))
                  >> >>> x, y, z
                  >> (<iterator object at ...>, <iterator object at ...>, <iterator object at ...>)
                  >
                  > 2. It is instructive to look at Guido's reactions to other *args
                  > proposals. His receptivity to a,b,*c=it wanes whenever someone then
                  > requests support for a,*b,c=it.

                  Yeah, I've seen his responses to those kinds of suggestions. I don't
                  think what I'm suggesting (at least in terms of *args) is quite as
                  extreme though -- I'm still only talking about *args in function
                  definitions. I'm just suggesting that in a function with a *args in the
                  def, the args variable be an iterator instead of a tuple. (This doesn't
                  entirely solve my zip problem of course, but it's the only *args change
                  I was suggesting.)

                  > Likewise, he considers zip(*args) as a
                  > transpose function to be an abuse of the *arg protocol.

                  Ahh, I didn't know that. Is there another (preferred) way to do this?

                  > 3. The recipe discussion and newsgroup posting present only toy
                  > examples -- real use cases have not yet emerged.

                  Ok, I'll try to give you one of my use cases. It's a little
                  complicated, so sorry if my explanation goes on for a bit here.

                  Basically, I'm parsing one file format to another. The files can be
                  quite large, so it's important to use iterators wherever possible. My
                  conversion function is a generator that generates a (label,
                  feature_dict) pair for each line in the input file.

                  Now, two possible things can happen at this point (depending on
                  parameters from the user):

                  CASE 1: I output the (label, feature_dict) pairs as is, with code
                  something like:

                  for label, feature_dict in generator:
                      write_instance(label, feature_dict)

                  This is, of course, the simple case.

                  CASE 2: I need to apply a windowing function to the iterables so that
                  each line includes not only its feature_dict's values, but also the
                  values of some of the surrounding feature_dicts. Note that I only want
                  to window the feature_dicts, not the labels. This gives me code
                  something like:

                  labels, feature_dicts = starzip(generator)
                  for label, feature_window in izip(labels, window(feature_dicts)):
                      write_instance(label, combine_dicts(feature_window))

                  Note that I can't write the code like:

                  for label, feature_dict in generator:
                      feature_dict = combine_dicts(window(feature_dict))  # WRONG!
                      write_instance(label, feature_dict)

                  because window produces an iterable from an *iterable* of feature_dicts,
                  not from a single feature_dict. So basically what I've done here is to
                  "transpose" (to use your word) the iterators, apply my function, and
                  then transpose the iterators back.


                  Hopefully this gives a little better justification for starzip? If you
                  have a cleaner way to do this kind of thing, I'd welcome any suggestions
                  of course.


                  If zip(*) is discouraged as a transpose function, maybe I should be
                  lobbying for adding a transpose function instead? (For now, of course,
                  it would go into itertools, but when iterators become the standard in
                  Python 3.0, maybe it could be moved into the builtins...)


                  Thanks for your comments!

                  Steve


                  • Raymond Hettinger

                    #10
                    Re: Python 3000, zip, *args and iterators

                    [Steven Bethard]
                    > I'm just suggesting that in a function with a
                    > *args in the def, the args variable be an iterator instead of
                    > a tuple.

                    So people would lose the useful abilities to check len(args) or extract
                    an argument with args[1]?

                    Besides, if a function really wants an iterator, then its signature
                    should accept one directly -- no need for the star operator.
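
A minimal Python 3 sketch of that alternative, assuming a hypothetical function h that takes the remaining values as an ordinary iterator parameter (named rest here) instead of *args:

```python
import itertools


def h(x, y, rest):
    """Take two leading values plus an iterator over the remainder."""
    # Only four items are drawn from rest, so an infinite iterator is fine.
    return x, y, list(itertools.islice(rest, 4))


counter = itertools.count()
result = h(next(counter), next(counter), counter)
```

The caller peels off the leading values explicitly, and the function never needs the call machinery to unpack an unbounded iterable.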


                    >> Likewise, he considers zip(*args) as a
                    >> transpose function to be an abuse of the *arg protocol.
                    >
                    > Ahh, I didn't know that. Is there another (preferred) way to do this?

                    I prefer the abusive approach ;-) however, the Right Way (tm) is
                    probably nested list comps or just plain for-loops. And, if you have
                    Numeric, there is an obvious preferred approach.
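
For illustration, a nested list comprehension that transposes a list of equal-length tuples might look like the Python 3 sketch below. The sample rows are made up, and the data is assumed to be fully in memory already:

```python
rows = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]

# One inner list per column; every row is assumed to have the same length.
columns = [[row[col] for row in rows] for col in range(len(rows[0]))]
```

Unlike zip(*rows), this spells out the indexing and does not pass each row as a separate positional argument.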


                    > So basically what I've done here is to
                    > "transpose" (to use your word) the iterators, apply my function, and
                    > then transpose the iterators back.

                    If you follow the data movements, you'll find that iterators provide no
                    advantage here. To execute transpose(map(f, transpose(iterator))), the
                    whole iterator necessarily has to be read into memory so that the first
                    function application will have all of its arguments present -- using
                    the star operator only obscures that fact.

                    Realizing that the input has to be in memory anyway, then you might as
                    well take advantage of the code simplification offered by indexing:

                    >>> def twistedmap(f, iterable):
                    ...     data = list(iterable)
                    ...     rows = range(len(data))
                    ...     for col in xrange(len(data[0])):
                    ...         args = [data[row][col] for row in rows]
                    ...         yield f(*args)



                    Raymond Hettinger


                    • Steven Bethard

                      #11
                      Re: Python 3000, zip, *args and iterators

                      Raymond Hettinger wrote:
                      > [Steven Bethard]
                      >
                      >> I'm just suggesting that in a function with a
                      >> *args in the def, the args variable be an iterator instead of
                      >> a tuple.
                      >
                      > So people would lose the useful abilities to check len(args) or extract
                      > an argument with args[1]?

                      No more than you lose these abilities with any other iterators:

                      def f(x, y, *args):
                          args = list(args)  # or tuple(args)
                          if len(args) == 3:
                              print args[0], args[1], args[2]

                      True, if you do want to check argument counts, this is an extra step of
                      work. I personally find that most of my functions with *args parameters
                      look like:

                      def f(x, y, *args):
                          do_something1(x)
                          do_something2(y)
                          for arg in args:
                              do_something3(arg)

                      where having *args be an iterable would not be a problem.

                      >> So basically what I've done here is to
                      >> "transpose" (to use your word) the iterators, apply my function, and
                      >> then transpose the iterators back.
                      >
                      > If you follow the data movements, you'll find that iterators provide no
                      > advantage here. To execute transpose(map(f, transpose(iterator))), the
                      > whole iterator necessarily has to be read into memory so that the first
                      > function application will have all of its arguments present -- using
                      > the star operator only obscures that fact.

                      I'm not sure I follow you here. Looking at my code:

                      labels, feature_dicts = starzip(generator)
                      for label, feature_window in izip(labels, window(feature_dicts)):
                          write_instance(label, combine_dicts(feature_window))

                      A few points:

                      (1) starzip uses itertools.tee, so it is not going to read the entire
                      contents of the generator in at once as long as the two parallel
                      iterators do not run out of sync.

                      (2) window does not exhaust the iterator passed to it; instead, it uses
                      the items of that iterator to generate a new iterator in sync with the
                      original, so izip(labels, window(feature_ dicts)) will keep the labels
                      and feature_dicts iterators in sync.

                      (3) the for loop just iterates over the izip iterator, so it should be
                      consuming (label, feature_window) pairs in sync.
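
The three points above can be checked with a small Python 3 sketch. The starzip and window below are simplified stand-ins written for illustration, not the recipe's actual code: starzip takes an explicit width (an assumption), window is a trailing window over the current item and its predecessor (also an assumption), and the rows generator and pulled list are hypothetical helpers that record what the source has produced.

```python
import itertools
from collections import deque
from operator import itemgetter


def starzip(rows, width):
    """Split an iterable of width-tuples into `width` lazy column iterators."""
    copies = itertools.tee(rows, width)
    return tuple(map(itemgetter(i), copy) for i, copy in enumerate(copies))


def window(iterable, size=2):
    """For each item, yield a tuple of that item and up to size-1 predecessors."""
    recent = deque(maxlen=size)
    for item in iterable:
        recent.append(item)
        yield tuple(recent)


pulled = []  # records which rows the source has actually produced


def rows(n):
    for i in range(n):
        pulled.append(i)
        yield ("label%d" % i, {"f": i})


labels, feature_dicts = starzip(rows(5), 2)
pairs = zip(labels, window(feature_dicts))

first = next(pairs)
pulled_after_first = list(pulled)  # snapshot: how far the source has advanced
rest = list(pairs)
```

After the first pair is consumed, only one row has been pulled from the source, so with these stand-ins the two columns advance in lockstep and tee buffers at most one row at a time. A window that looks ahead would hold back correspondingly more.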

                      I assume you disagree with one of these points or you wouldn't say that
                      "iterators provide no advantage here". Could you explain what doesn't
                      work here?

                      Steve
