Beginner Question : Iterators and zip

  • moogyd@yahoo.co.uk

    Beginner Question : Iterators and zip

    Hi group,

    I have a basic question on the zip built-in function.

    I am writing a simple text file comparison script, that compares line
    by line and character by character. The output is the original file,
    with an X in place of any characters that are different.

    I have managed a solution for a fixed number (3) of files, but I want
    a solution for any number of input files.

    The outline of my solution:

    for vec in zip(vec_list[0], vec_list[1], vec_list[2]):
        res = ''
        for entry in zip(vec[0], vec[1], vec[2]):
            if len(set(entry)) > 1:
                res = res + 'X'
            else:
                res = res + entry[0]
        outfile.write(res)

    So vec is a tuple containing a line from each file, and then entry is
    a tuple containing a character from each line.

    2 questions
    1) What is the general solution? Using zip in this way looks wrong. Is
    there another function that does what I want?
    2) I am using set to remove any repeated characters. Is there a
    "better" way?

    Any other comments/suggestions appreciated.

    Thanks,

    Steven





  • Larry Bates

    #2
    Re: Beginner Question : Iterators and zip

    You should take a look at Python's difflib library. It probably already does
    what you are attempting to "re-invent".

    -Larry
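
For reference, a minimal sketch of what difflib offers for two inputs. The sample lines are made up for illustration; `difflib.ndiff` and `difflib.SequenceMatcher` are standard-library calls. Note that difflib compares two sequences at a time and aligns insertions and deletions, so it is not a drop-in replacement for the position-by-position masking across N files described in the original post.

```python
import difflib

# Hypothetical sample data standing in for the lines of two files.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "bets\n", "gamma\n"]

# ndiff yields a human-readable, line-by-line delta.
delta = list(difflib.ndiff(old, new))
print("".join(delta), end="")

# SequenceMatcher reports which character ranges differ between two strings.
matcher = difflib.SequenceMatcher(None, "beta", "bets")
changed = [op for op in matcher.get_opcodes() if op[0] != "equal"]
print(changed)
```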


    • bruno.desthuilliers@gmail.com

      #3
      Re: Beginner Question : Iterators and zip

      On 12 Jul, 20:55, moo...@yahoo.co.uk wrote:
      > 2 questions
      > 1) What is the general solution. Using zip in this way looks wrong. Is
      > there another function that does what I want

      zip is (mostly) ok. What you're missing is how to use it for any
      arbitrary number of sequences. Try this instead:

      >>> lists = [list(range(5)), list(range(5, 11)), list(range(11, 16))]
      >>> lists
      [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
      >>> for item in zip(*lists):
      ...     print(item)
      ...
      (0, 5, 11)
      (1, 6, 12)
      (2, 7, 13)
      (3, 8, 14)
      (4, 9, 15)
      >>> lists = [range(5), range(5, 11), range(11, 16), range(16, 20)]
      >>> for item in zip(*lists):
      ...     print(item)
      ...
      (0, 5, 11, 16)
      (1, 6, 12, 17)
      (2, 7, 13, 18)
      (3, 8, 14, 19)
      >>>

      The only caveat with zip() is that it will only use as many items as
      there are in your shortest sequence, i.e.:

      >>> list(zip(range(3), range(10)))
      [(0, 0), (1, 1), (2, 2)]
      >>> list(zip(range(30), range(10)))
      [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)]
      >>>

      So you'd better pad your sequences to make them as long as the longest
      one. There are idioms for doing this using the itertools module's
      chain and repeat iterators, but I'll leave a concrete example as an
      exercise to the reader !-)

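
One possible answer to that exercise, as a hedged sketch in Python 3. The helper name `zip_padded` and its `fill` parameter are made up for illustration; `chain`, `repeat` and `islice` are the real itertools functions. Each sequence is extended with an endless run of the fill value and then cut back to the length of the longest one.

```python
from itertools import chain, islice, repeat


def zip_padded(sequences, fill=None):
    """Zip sequences after padding the shorter ones with `fill` (hypothetical helper)."""
    sequences = [list(s) for s in sequences]
    longest = max((len(s) for s in sequences), default=0)
    # chain(s, repeat(fill)) never ends, so islice trims it to the longest length.
    padded = [islice(chain(s, repeat(fill)), longest) for s in sequences]
    return list(zip(*padded))


print(zip_padded(["abc", "a"], fill=" "))
```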
      > 2) I am using set to remove any repeated characters. Is there a
      > "better" way ?

      That's probably what I'd do too.

      > Any other comments/suggestions appreciated.

      There's a difflib package in the standard lib. Did you give it a try ?


      • Terry Reedy

        #4
        Re: Beginner Question : Iterators and zip



        moogyd@yahoo.co.uk wrote:
        > 2 questions
        > 1) What is the general solution. Using zip in this way looks wrong. Is
        > there another function that does what I want

        zip(*vec_list) will zip together all entries in vec_list.
        Do be aware that zip stops on the shortest iterable. So if vec[1] is
        shorter than vec[0] and matches otherwise, your output line will be
        truncated. Or if vec[1] is longer and vec[0] matches as far as it goes,
        there will be no signal either.

        res = res + whatever can be written as res += whatever

        > 2) I am using set to remove any repeated characters. Is there a
        > "better" way ?

        I might have written a third loop to compare vec[0] to vec[1]..., but
        your set solution is easier and prettier.

        If speed is an issue, don't rebuild the output line char by char. Just
        change what is needed in a mutable copy. I like this better anyway.

        res = list(vec[0])  # if all ascii, in 3.0 use bytearray
        for n, entry in enumerate(zip(vec[0], vec[1], vec[2])):
            if len(set(entry)) > 1:
                res[n] = 'X'
        outfile.write(''.join(res))  # in 3.0, write(res)

        tjr
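
Putting the suggestions in this post together, a minimal Python 3 sketch that handles any number of inputs. The function names `mask_differences` and `compare_streams` are made up for illustration, and `io.StringIO` objects stand in for open files. Because it uses plain zip, the truncation caveat above still applies: characters beyond the shortest line are copied from the first line without being checked.

```python
import io


def mask_differences(lines, marker="X"):
    """Return the first line with `marker` wherever the lines disagree."""
    res = list(lines[0])  # mutable copy of the first line
    for n, chars in enumerate(zip(*lines)):
        if len(set(chars)) > 1:
            res[n] = marker
    return "".join(res)


def compare_streams(streams, outfile):
    """Write one masked line per group of corresponding input lines."""
    for lines in zip(*streams):
        outfile.write(mask_differences(lines))


# Hypothetical in-memory files used in place of real ones.
files = [
    io.StringIO("abc\ndef\n"),
    io.StringIO("abc\ndxf\n"),
    io.StringIO("abd\ndef\n"),
]
out = io.StringIO()
compare_streams(files, out)
print(out.getvalue(), end="")
```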





        • moogyd@yahoo.co.uk

          #5
          Re: Beginner Question : Iterators and zip

          On 12 Jul, 21:50, "bruno.desthuilli...@gmail.com"
          <bruno.desthuilli...@gmail.com> wrote:
          > zip is (mostly) ok. What you're missing is how to use it for any
          > arbitrary number of sequences. Try this instead:
          >
          > >>> lists = [list(range(5)), list(range(5, 11)), list(range(11, 16))]
          > >>> lists
          > [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
          > >>> for item in zip(*lists):
          > ...     print(item)
          > ...
          > (0, 5, 11)
          > (1, 6, 12)
          > (2, 7, 13)
          > (3, 8, 14)
          > (4, 9, 15)

          What is this *lists operation called? I am having trouble finding any
          reference to it in the Python docs or the book Learning Python.

          > > Any other comments/suggestions appreciated.
          >
          > There's a difflib package in the standard lib. Did you give it a try ?

          I'll check it out, but I am a newbie, so I am writing this as a
          (useful) learning exercise.

          Thanks for the help

          Steven



          • Terry Reedy

            #6
            Re: Beginner Question : Iterators and zip

            moogyd@yahoo.co.uk wrote:
            > What is this *lists operation called? I am having trouble finding any
            > reference to it in the python docs or the book learning python.

            One might call this argument unpacking, but
            Language Manual / Expressions / Primaries / Calls
            simply calls it *expression syntax.

            "If the syntax *expression appears in the function call, expression must
            evaluate to a sequence. Elements from this sequence are treated as if
            they were additional positional arguments; if there are positional
            arguments x1,...,xN, and expression evaluates to a sequence
            y1,...,yM, this is equivalent to a call with M+N positional arguments
            x1,...,xN,y1,...,yM."

            See Compound Statements / Function definitions for the mirror syntax in
            definitions.
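
A small sketch of both sides of that syntax, with made-up names (`show`, `rows`). In a call, `*rows` spreads the elements of a sequence into separate positional arguments; in a definition, `*args` collects whatever positional arguments arrive into a tuple.

```python
def show(*args):
    """Collect any number of positional arguments into a tuple."""
    return args


rows = ["abc", "abd", "abe"]

# These two calls are equivalent: *rows unpacks the list into three arguments.
unpacked = show(*rows)
explicit = show("abc", "abd", "abe")
print(unpacked == explicit)

# The same unpacking is what lets zip accept any number of sequences.
columns = list(zip(*rows))
print(columns)
```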

            tjr


            • cokofreedom@gmail.com

              #7
              Re: Beginner Question : Iterators and zip

              > zip(*vec_list) will zip together all entries in vec_list
              > Do be aware that zip stops on the shortest iterable. So if vec[1] is
              > shorter than vec[0] and matches otherwise, your output line will be
              > truncated. Or if vec[1] is longer and vec[0] matches as far as it goes,
              > there will be no signal either.

              Do note that from Python 3.0 there is another form of zip that will
              read until all lists are exhausted, with the shorter ones being padded
              with a settable default value. Very useful!
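
The function meant here is presumably `itertools.zip_longest` (spelled `izip_longest` in Python 2.6 and later 2.x releases), whose `fillvalue` argument sets the padding. A minimal sketch follows; the helper `mask_all` is a made-up name showing how the padding makes a length mismatch show up as a difference instead of being silently dropped.

```python
from itertools import zip_longest

# Shorter inputs are padded with fillvalue until the longest is exhausted.
pairs = list(zip_longest("abc", "a", fillvalue=" "))
print(pairs)


def mask_all(lines, marker="X"):
    """Mask every position where the lines differ, including length mismatches."""
    return "".join(
        chars[0] if len(set(chars)) == 1 else marker
        for chars in zip_longest(*lines, fillvalue="")
    )


print(mask_all(["abcd", "ab"]))
```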


              • moogyd@yahoo.co.uk

                #8
                Re: Beginner Question : Iterators and zip

                On 13 Jul, 19:49, Terry Reedy <tjre...@udel.edu> wrote:
                > moo...@yahoo.co.uk wrote:
                > > What is this *lists operation called? I am having trouble finding any
                > > reference to it in the python docs or the book learning python.
                >
                > One might call this argument unpacking, but
                > Language Manual / Expressions / Primaries / Calls
                > simply calls it *expression syntax.
                > "If the syntax *expression appears in the function call, expression must
                > evaluate to a sequence. Elements from this sequence are treated as if
                > they were additional positional arguments; if there are positional
                > arguments x1,...,xN, and expression evaluates to a sequence
                > y1,...,yM, this is equivalent to a call with M+N positional arguments
                > x1,...,xN,y1,...,yM."
                >
                > See Compound Statements / Function definitions for the mirror syntax in
                > definitions.
                >
                > tjr

                Thanks,
                Thanks,

                It's starting to make sense :-)

                Steven
