Reading a text file backwards

  • Jay

    Reading a text file backwards

    I have a very large text file (being read by a CGI script on a web server),
    and I get memory errors when I try to read the whole file into a list of
    strings. The problem is, I want to read the file backwards, starting with
    the last line.

    Previously, I did:

    myfile = open('myfile.txt', 'r')
    mylines = myfile.readlines()
    myfile.close()
    for line in range(len(mylines)-1, -1, -1):
        # do something with mylines[line]
        pass

    This, however, caused a MemoryError, so I want to do something like:

    myfile = open('myfile.txt', 'r')
    for line in myfile:
        # do something with line
        pass
    myfile.close()

    Only, I want to iterate backwards, starting with the last line of the file.
    Can anybody suggest a simple way of doing this? Do I need to jump around
    with myfile.seek() and use myfile.readline()?




  • Rick Holbert

    #2
    Re: Reading a text file backwards

    Jay,

    Try this:

    myfile = open('myfile.txt', 'r')
    mylines = myfile.readlines()
    myfile.close()
    mylines.reverse()

    Rick



    • Andrew Dalke

      #3
      Re: Reading a text file backwards

      Jay wrote:
      > Only, I want to iterate backwards, starting with the last line of the file.
      > Can anybody suggest a simple way of doing this? Do I need to jump around
      > with myfile.seek() and use myfile.readline()?

      Python Cookbook has a recipe. Or two.




      I've not looked at them to judge the quality.

      Another approach is to read the lines forwards and save
      the position where each line starts. Then iterate backwards
      through the positions, seeking to each one and reading a line.

      def find_offsets(infile):
          offsets = []
          offset = 0
          for line in infile:
              offsets.append(offset)
              offset += len(line)
          return offsets

      def iter_backwards(infile):
          # make sure it's seekable and at the start
          infile.seek(0)
          offsets = find_offsets(infile)
          for offset in offsets[::-1]:
              infile.seek(offset)
              yield infile.readline()

      for line in iter_backwards(open("spam.py")):
          print(repr(line))

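      This won't work on MS Windows because of the
      '\r\n' -> '\n' conversion: in text mode len(line)
      no longer matches the number of bytes on disk, so
      the computed offsets drift. You would instead
      need something like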

      def find_offsets(infile):
          offsets = []
          while True:
              offset = infile.tell()
              if not infile.readline():
                  break
              offsets.append(offset)
          return offsets


      Just submitted this solution to the cookbook.
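For Python 3, here is a minimal sketch of the same offsets idea, adapted from the tell()-based find_offsets above. It assumes the file is opened in binary mode ('rb') and holds UTF-8 text; the `encoding` parameter is an assumption, not part of the original. Binary mode does no newline translation, so tell() returns plain byte offsets on every platform. The trade-off is that a '\r\n' file keeps its '\r' at the end of each decoded line.

```python
def find_offsets(infile):
    """Return the byte offset at which each line of binary file `infile` starts."""
    offsets = []
    infile.seek(0)
    while True:
        offset = infile.tell()
        if not infile.readline():  # b"" means end of file
            break
        offsets.append(offset)
    return offsets

def iter_backwards(infile, encoding="utf-8"):
    """Yield the decoded lines of binary file `infile`, last line first."""
    for offset in reversed(find_offsets(infile)):
        infile.seek(offset)
        yield infile.readline().decode(encoding)
```

Use it as `with open('myfile.txt', 'rb') as f:` followed by `for line in iter_backwards(f): ...`. The offsets list still costs one integer per line, but that is usually far less than holding every line in memory.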

      Andrew
      dalke@dalkescientific.com


      • Daniel Yoo

        #4
        Re: Reading a text file backwards

        Rick Holbert <holbertr@dma.org> wrote:
        : Jay,

        : Try this:

        : myfile = open('myfile.txt', 'r')
        : mylines = myfile.readlines()
        : myfile.close()
        : mylines.reverse()


        Hi Rick,

        But this probably won't work for Jay: he's running into memory issues
        because the file's too large to hold in memory at once. The point is
        to avoid readlines().

        Here's a generator that tries to iterate backwards across a file. We
        first record the file position of each newline, and then walk those
        offsets in reverse, reading the line that starts just after each one.

        ###

        def backfileiter(myfile):
            """Iterates the lines of a file, but in reverse order."""
            myfile.seek(0)
            offsets = _getLineOffsets(myfile)
            myfile.seek(0)
            offsets.reverse()
            for i in offsets:
                myfile.seek(i + 1)
                line = myfile.readline()
                # A file ending in '\n' would otherwise yield '' first.
                if line:
                    yield line

        def _getLineOffsets(myfile):
            """Return a list of offsets where newlines are located."""
            offsets = [-1]
            i = 0
            while True:
                byte = myfile.read(1)
                if not byte:
                    break
                elif byte == '\n':
                    offsets.append(i)
                i += 1
            return offsets
        ###



        For example:

        ###
        >>> from StringIO import StringIO
        >>> f = StringIO("""
        ... hello world
        ... this
        ... is a
        ... test""")
        >>> f.seek(0)
        >>> for line in backfileiter(f): print repr(line)
        ...
        'test'
        'is a\n'
        'this\n'
        'hello world\n'
        '\n'
        ###


        Hope this helps!


        • Graham Fawcett

          #5
          Re: Reading a text file backwards

          It's just shifting the burden perhaps, but if you're on a Unix system
          you should be able to use tac(1) to reverse your file a bit faster:

          import os
          for line in os.popen('tac myfile.txt'):
              # do something with the line
              pass
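In Python 3 the same pipe can be opened with the subprocess module. It passes the file name as a separate argument rather than through a shell, so names containing spaces need no quoting. This is a minimal sketch, assuming a `tac` executable is on the PATH; the generator name `iter_tac` and the UTF-8 default are hypothetical.

```python
import subprocess

def iter_tac(path, encoding="utf-8"):
    """Yield the lines of `path`, last line first, by streaming tac(1)'s output."""
    argv = ["tac", path]  # assumes a `tac` executable on PATH
    # No shell is involved, so `path` needs no quoting.
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc:
        for raw in proc.stdout:  # iterating the pipe yields one line (bytes) at a time
            yield raw.decode(encoding)
    # Leaving the with block waits for tac, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)
```

Lines arrive as tac writes them, so only one line at a time is held in Python. If tac exits with a non-zero status (for example, a missing file), that is raised as CalledProcessError once its output is exhausted.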


          • Andrew Dalke

            #6
            Re: Reading a text file backwards

            Graham Fawcett wrote:
            > It's just shifting the burden perhaps, but if you're on a Unix system
            > you should be able to use tac(1) to reverse your file a bit faster:

            Huh. Hadn't heard of that one. It's not installed
            on my OS X box. It's on my FreeBSD account as gtac.
            Ah, but it is available on a Linux account.

            Andrew
            dalke@dalkescientific.com


            • Jeremy Bowers

              #7
              Re: Reading a text file backwards

              On Thu, 30 Sep 2004 17:41:14 -0700, Graham Fawcett wrote:

              > It's just shifting the burden perhaps, but if you're on a Unix system
              > you should be able to use tac(1) to reverse your file a bit faster:
              >
              > import os
              > for line in os.popen('tac myfile.txt'):
              >     #do something with the line

              It probably isn't shifting the burden; they probably do it right.

              Doing it right involves reading the file in chunks backwards and scanning
              backwards for newlines, but getting it right when lines cross chunk
              boundaries, while perhaps not *hard*, is exactly the kind of tricky
              programming it is best to do once... preferably somebody else's once. :-)

              This way you don't read the file twice (once to find the line offsets and
              once to read the lines), and that first pass can take a while on a large file.
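Here is a sketch of the chunked approach described above, assuming a binary file object. It reads fixed-size blocks from the end toward the start, carries any partial line over to the next (earlier) block, and yields complete lines as bytes, last line first. The name `reverse_lines` and the default block size are my own choices. This is not a description of how tac itself is implemented.

```python
import os

def reverse_lines(f, blocksize=64 * 1024):
    """Yield the lines of binary file `f` as bytes, last line first.

    Blocks are read from the end toward the start, so each byte is read once.
    """
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    buf = b""  # bytes from `pos` onward not yet yielded (at most one line)
    while pos > 0:
        step = min(blocksize, pos)
        pos -= step
        f.seek(pos)
        buf = f.read(step) + buf
        end = len(buf)
        while True:
            # Find the newline that ends the previous line, ignoring the
            # final byte (which may be this line's own b"\n").
            idx = buf.rfind(b"\n", 0, end - 1)
            if idx == -1:
                break  # this line may start in an earlier block
            yield buf[idx + 1:end]
            end = idx + 1
        buf = buf[:end]
    if buf:
        yield buf  # the first line of the file
```

Open the file with `open('myfile.txt', 'rb')` and decode each yielded line as needed. Memory is bounded by roughly one block plus the longest line. A last line without a trailing newline is yielded as-is, and a trailing newline at end of file does not produce an extra empty line.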


              • Paul Rubin

                #8
                Re: Reading a text file backwards

                Andrew Dalke <adalke@mindspring.com> writes:
                > > It's just shifting the burden perhaps, but if you're on a Unix system
                > > you should be able to use tac(1) to reverse your file a bit faster:
                >
                > Huh. Hadn't heard of that one. It's not installed
                > on my OS X box. It's on my FreeBSD account as gtac.
                > Ah, but it is available on a Linux account.

                You can try tail(1).
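Given the availability differences reported in this thread (tac on Linux, gtac on a FreeBSD account, neither on an OS X box), a small helper can pick whichever reversing command exists. This sketch assumes that where neither tac nor gtac is installed, tail is the BSD version, whose -r flag prints lines in reverse order. GNU tail has no -r. The function name `reverse_command` is hypothetical.

```python
import shutil

def reverse_command(path):
    """Return an argv list for a command that prints `path` last line first."""
    # tac comes with GNU coreutils (typical on Linux); some FreeBSD setups
    # install the same tool as gtac.
    for name in ("tac", "gtac"):
        exe = shutil.which(name)
        if exe:
            return [exe, path]
    # Assumption: with no tac available, tail is BSD tail, whose -r flag
    # prints lines in reverse order. GNU tail has no -r option.
    exe = shutil.which("tail")
    if exe:
        return [exe, "-r", path]
    raise FileNotFoundError("none of tac, gtac or tail found on PATH")
```

Pass the returned list to subprocess.Popen(..., stdout=subprocess.PIPE) and iterate over its stdout as usual.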
