file.tell() ?

  • Chris McAvoy

    file.tell() ?

    Is this a bug? (file is an open text file):
    >>> for i in range(0,5):
    ...     var = file.next()
    ...     file.tell()
    ...
    1675L
    1675L
    1675L
    1675L
    1675L

    I would have thought that it would increase as the position in the
    file advances.

    When I use readline, it works as I would expect:
    >>> for i in range(0,5):
    ...     var = file.readline()
    ...     file.tell()
    ...
    18L
    31L
    53L
    67L
    85L

    The reason I ask is, I have a very large file to parse line by line.
    I thought I'd try and use an iterator, but it looks like the iterator
    is really reading the entire file into memory before it starts
    iterating. So my best option is still to use file.readline() .

    Am I understanding this correctly? Am I using the iterator
    incorrectly?

    Thanks,
    Chris
  • Erik Max Francis

    #2
    Re: file.tell() ?

    Chris McAvoy wrote:
    > The reason I ask is, I have a very large file to parse line by line.
    > I thought I'd try and use an iterator, but it looks like the iterator
    > is really reading the entire file into memory before it starts
    > iterating. So my best option is still to use file.readline() .
    >
    > Am I understanding this correctly? Am I using the iterator
    > incorrectly?

    The iterating methods of file input tend to read ahead into a buffer,
    so calling things like .tell, or trying to read data manually while
    iterating, is not going to work properly.

    If it's important to you that you have total control over the current
    "read pointer" in the file, call .readline manually. If you don't care
    and just want to read through everything, use the iterators.
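
As a rough illustration of the readline approach, here is a minimal sketch that records the offset at which each line starts. The helper name `line_offsets` is made up for this example, and opening the file in binary mode is an assumption of the sketch so that tell() reports plain byte offsets.

```python
def line_offsets(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    # Binary mode is an assumption of this sketch: tell() then returns byte offsets.
    with open(path, "rb") as f:
        while True:
            pos = f.tell()        # position before reading the next line
            line = f.readline()   # returns one line, or an empty string at EOF
            if not line:
                break
            offsets.append(pos)
    return offsets
```

Any of the recorded offsets can later be passed to seek() to jump straight back to that line.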

    --
    __ Erik Max Francis && max@alcyone.com && http://www.alcyone.com/max/
    / \ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
    \__/ The average dog is a nicer person than the average person.
    -- Andrew A. Rooney


    • Jeff Epler

      #3
      Re: file.tell() ?

      When using a file as an iterator, multiple lines are read at a time.
      If you have a long file, not all the lines will be read at once.
      When I wrote xreadlines (for Python 2.1 or 2.2) it was defined in terms of
      readlines(sizehint), but I am no longer familiar with the implementation.
      I don't think the exact details are documented anywhere, or guaranteed
      not to change between releases.
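
To make the idea of reading several lines at a time concrete, here is a small sketch of a generator built on readlines with a size hint. It is not the actual xreadlines or file-iterator implementation, only an illustration of the general technique; the name `batched_lines` and the default hint of 8192 are assumptions chosen for the example.

```python
def batched_lines(f, sizehint=8192):
    """Yield lines from f one by one, fetching them in batches."""
    while True:
        # readlines(sizehint) returns roughly sizehint bytes' worth of
        # complete lines, or an empty list at end of file.
        batch = f.readlines(sizehint)
        if not batch:
            break
        for line in batch:
            yield line
```

Because a whole batch is fetched before the first of its lines is handed out, the file position seen from outside jumps a batch at a time rather than a line at a time, while memory use stays bounded by the batch size.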

      I wrote a small program to read all the lines in /usr/share/dict/words
      and keep a record of all the positions returned by tell(). Here are the
      results:
      [jepler@parrot jepler]$ cat /tmp/mcavoy.py
      d = {}
      f = file("/usr/share/dict/words")
      for l in f:
          d[f.tell()] = None  # record each distinct position seen while iterating
      dk = d.keys()
      dk.sort()
      print dk
      print dk[-1] * 1.0 / len(dk) # Average block size

      [jepler@parrot jepler]$ python /tmp/mcavoy.py
      [8196L, 16393L, 24596L, 32793L, 40994L, 49186L, 57379L, 65576L,
      73776L, 81972L, 90167L, 98362L, 106557L, 114750L, 122945L, 131137L,
      139332L, 147532L, 155729L, 163922L, 172119L, 180316L, 188515L,
      196710L, 204910L, 213105L, 221306L, 229505L, 237697L, 245896L,
      254088L, 262291L, 270486L, 278688L, 286893L, 295092L, 303288L,
      311488L, 319687L, 327884L, 336082L, 344277L, 352474L, 360675L,
      368869L, 377061L, 385261L, 393459L, 401656L, 409305L]
      8186.1

      As you can see, my Python reads about 8K at a time, which is a perfectly
      reasonable amount on any machine I still use.

      Jeff
