file tell in a for-loop

  • Magdoll

    file tell in a for-loop

    I was trying to map various locations in a file to a dictionary. At
    first I read through the file using a for-loop, but tell() gave back
    weird results, so I switched to a while loop, and then it worked.

    The for-loop version was something like:
        d = {}
        for line in f:
            if line.startswith('>'): d[line] = f.tell()

    And the while version was:
        d = {}
        while 1:
            line = f.readline()
            if len(line) == 0: break
            if line.startswith('>'): d[line] = f.tell()


    In the for-loop version, f.tell() would sometimes return the same
    result several times in a row, even though the loop was clearly
    advancing through the file. I have no idea why this happens, but
    switching to a while loop fixed it.

    Does anyone have any ideas as to why this is so?

    Thanks,
    Magdoll
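In Python 3 this pattern no longer fails silently: a text-mode file refuses tell() while a for-loop is pulling lines from it and raises OSError with the message "telling position disabled by next() call". A small sketch, assuming Python 3; the helper name tell_during_iteration and the sample records are made up for illustration:

```python
import os
import tempfile


def tell_during_iteration(path):
    """Run the for-loop + tell() pattern from the post on a Python 3 text file.

    Returns the dict on success, or the OSError message if tell() is refused.
    """
    d = {}
    with open(path, encoding="utf-8") as f:
        try:
            for line in f:
                if line.startswith(">"):
                    d[line] = f.tell()
        except OSError as exc:
            # Python 3's TextIOWrapper disables tell() while next() is in use.
            return str(exc)
    return d


# Hypothetical sample data written to a throwaway file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write(">first\nACGT\n>second\nTTGA\n")
try:
    print(tell_during_iteration(path))  # the OSError message, not a dict
finally:
    os.remove(path)
```

The while-loop version from the post avoids this, because readline() does not switch tell() off.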
  • Justin Ezequiel

    #2
    Re: file tell in a for-loop

    On Nov 19, 7:00 am, Magdoll <magd...@gmail.com> wrote:
    > Does anyone have any ideas as to why this is so?
    I got bitten by that too a while back. The for line in f loop reads
    ahead, so f.tell() does not give the position of the end of the
    current line. I had to use a while True loop instead as well.
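The read-ahead can be pictured with a toy model, a sketch assuming Python 3 and an in-memory io.BytesIO file. The generator read_ahead_lines and its chunk_size parameter are hypothetical and only mimic the idea; this is not the real Python 2 file iterator. Because lines are handed out from a chunk that has already been read, tell() reports how far the reader has got, and lines served from the same chunk share one value, which matches the repeated results described in the first post.

```python
import io


def read_ahead_lines(raw, chunk_size=8192):
    """Hypothetical toy model of a read-ahead line iterator.

    Pulls large chunks from `raw` first, then hands out lines from memory,
    so raw.tell() reflects the chunk boundary rather than the line end.
    """
    pending = b""
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line + b"\n"
    if pending:
        yield pending  # last line without a trailing newline


raw = io.BytesIO(b">a\nxx\n>b\nyy\n")
seen = [raw.tell() for _ in read_ahead_lines(raw, chunk_size=4)]
print(seen)  # [4, 8, 12, 12]: where the reader is, not where each line ends

# The readline() loop, by contrast, stops exactly at each line end.
raw.seek(0)
ends = []
while True:
    line = raw.readline()
    if not line:
        break
    ends.append(raw.tell())
print(ends)  # [3, 6, 9, 12]
```

With the default chunk size the whole sample fits in one read, so every call in the for-loop model reports 12.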



    • Tim Chase

      #3
      Re: file tell in a for-loop

      Magdoll wrote:
      > Does anyone have any ideas as to why this is so?
      I suspect that the iterator version, at least, uses internal
      buffering, so tell() returns the position up to which the buffer
      has been filled, not the position of the line you are currently
      processing. I've also had problems with tell() returning bogus
      results while reading through large non-binary files (in this
      case a text file of about 530 MB) once the file offset passed
      some point I wasn't able to identify. It may have had to do with
      newline translation, as this was Python 2.4 on Win32. Switching to
      binary ("b") mode resolved the issue for me.
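The newline-translation point can be illustrated in Python 3 (an assumption: the thread concerns Python 2.4, whose text mode relied on the Windows C runtime). In text mode a line read back as '>a\n' occupies 4 bytes on disk, so character counts and byte offsets drift apart, and Python 3 documents text-mode tell() as an opaque value meant to be handed back to seek(), not used as a byte count. Binary mode returns plain byte offsets. The sample data below is made up:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b">a\r\nACGT\r\n>b\r\nTTGA\r\n")  # Windows-style line endings

# Binary mode: tell() is a plain byte count, "\r\n" included.
with open(path, "rb") as f:
    first = f.readline()
    byte_offset = f.tell()

# Text mode: universal newlines turn "\r\n" into "\n" on read.
with open(path, encoding="utf-8") as f:
    text_first = f.readline()
    cookie = f.tell()        # opaque position, only meant for seek()
    second = f.readline()
    f.seek(cookie)           # returning to the cookie re-reads the same line
    again = f.readline()
os.remove(path)

print(first, byte_offset)                 # b'>a\r\n' 4
print(repr(text_first), len(text_first))  # '>a\n' 3
print(second == again)                    # True
```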

      I created the following generator to make my life a little easier:

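        def offset_iter(fp):
            # Refuse text-mode files: their tell() values were unreliable here.
            assert 'b' in fp.mode.lower(), \
                "offset_iter must have a binary file"
            while True:
                addr = fp.tell()      # offset of the *start* of the next line
                line = fp.readline()
                if not line: break    # empty string means EOF
                yield (addr, line.rstrip('\n\r'))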

      That way, I can just use

        d = {}
        f = open('foo.txt', 'rb')   # 'rb': read in binary mode
        for offset, line in offset_iter(f):
            if line.startswith('>'): d[line] = offset

      This bookmarks the *beginning* (I think your code notes the
      *end*) of each line that starts with ">".
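For Python 3 readers, a sketch of the same generator under stated assumptions: binary readline() returns bytes, so the strip characters and the '>' prefix must be bytes too, and the mode check uses io.TextIOBase because in-memory streams such as io.BytesIO have no mode attribute. It raises ValueError instead of using assert, since asserts are skipped under python -O. Recording tell() before each readline() gives the start of each line, which can later be passed to seek():

```python
import io


def offset_iter(fp):
    """Python 3 sketch: yield (start_offset, line) pairs from a binary stream,
    with the trailing line ending stripped."""
    if isinstance(fp, io.TextIOBase):
        raise ValueError("offset_iter must have a binary file")
    while True:
        addr = fp.tell()      # position *before* the line is read
        line = fp.readline()
        if not line:          # b"" means EOF
            break
        yield addr, line.rstrip(b"\r\n")


# Hypothetical sample data with mixed line endings.
f = io.BytesIO(b">first\nACGT\n>second\r\nTTGA\n")
d = {}
for offset, line in offset_iter(f):
    if line.startswith(b">"):
        d[line] = offset
print(d)  # {b'>first': 0, b'>second': 12}

# Jump straight back to a bookmarked record later on.
f.seek(d[b">second"])
print(f.readline())  # b'>second\r\n'
```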

      -tkc






      • Magdoll

        #4
        Re: file tell in a for-loop

        Gotcha. Thanks!

        Magdoll

        On Nov 19, 2:57 am, Tim Chase <python.l...@tim.thechases.com> wrote:
