Iteration on file reading

  • Paul Watson

    Iteration on file reading

    for line in sys.stdin:

    Does this statement cause all of stdin to be read before the loop begins?

    I may need to read several GB and I do not want to swamp the machine's
    memory.


  • Andrew Dalke

    #2
    Re: Iteration on file reading

    Paul Watson:
    > for line in sys.stdin:
    >
    > Does this statement cause all of stdin to be read before the loop begins?

    No. It will read a block of text at a time and break that block
    into lines. This gives great performance and is scalable to
    large files (so long as you can afford to keep that extra
    block around). However, it's lousy for interactive work.

    Andrew
    dalke@dalkescientific.com
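
As a rough illustration of the answer above: iterating over a file object directly consumes it one line at a time, so only the running result has to stay in memory. The sketch below is written for Python 3 (an assumption, since the thread predates it), uses `io.StringIO` as a stand-in for stdin, and `count_lines_and_chars` is a hypothetical helper name, not something from the thread.

```python
import io


def count_lines_and_chars(stream):
    """Walk a text stream line by line, keeping only two running totals."""
    lines = 0
    chars = 0
    for line in stream:  # the file object hands out one line per iteration
        lines += 1
        chars += len(line)
    return lines, chars


# io.StringIO stands in for sys.stdin so the example is self-contained
demo = io.StringIO("first\nsecond\nthird\n")
print(count_lines_and_chars(demo))
```

For real input, passing `sys.stdin` in place of the `StringIO` object should behave the same way.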



    • Paul McGuire

      #3
      Re: Iteration on file reading

      Try a generator. This will just read a line at a time.
      -- Paul

      <code>
      from sys import stdin

      def lineReader( strm ):
          while 1:
              yield strm.readline().rstrip("\n")

      for f in lineReader( stdin ):
          print ">>> " + f
      </code>

      "Paul Watson" <pwatson@redlinec.com> wrote in message
      news:3f7ca9ea$1_1@themost.net...
      > for line in sys.stdin:
      >
      > Does this statement cause all of stdin to be read before the loop begins?
      >
      > I may need to read several GB and I do not want to swamp the machine's
      > memory.
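
One caveat with the generator as posted: `readline()` returns an empty string at end of input, so the `while 1` loop never exits and keeps yielding empty strings once stdin is exhausted. A minimal sketch of a terminating variant follows, written for Python 3 (an assumption, since the thread uses Python 2 `print` statements). `line_reader` is a renamed, hypothetical version of `lineReader`, and `io.StringIO` stands in for stdin.

```python
import io


def line_reader(strm):
    """Yield lines from strm without their trailing newline, stopping at EOF."""
    while True:
        line = strm.readline()
        if not line:  # readline() returns "" only at end of input
            break
        yield line.rstrip("\n")


for text in line_reader(io.StringIO("alpha\nbeta\n")):
    print(">>> " + text)
```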



      • Andrew Dalke

        #4
        Re: Iteration on file reading

        Paul McGuire:
        > def lineReader( strm ):
        >     while 1:
        >         yield strm.readline().rstrip("\n")
        >
        > for f in lineReader( stdin ):
        >     print ">>> " + f

        You can simplify that with the iter builtin.

        for f in iter(stdin.readline, ""):
            print ">>> " + f

        (Hmm... maybe I should test it? Naaaaahhh.)

        Andrew
        dalke@dalkescientific.com
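
For anyone who does want to test it: the two-argument form `iter(callable, sentinel)` calls `callable` repeatedly and stops when the return value equals `sentinel`. The small Python 3 sketch below uses `io.StringIO` as a stand-in for stdin (an assumption made to keep the demo self-contained).

```python
import io

stream = io.StringIO("one\n\ntwo\n")

# A blank line comes back as "\n", not "", so it does not end the iteration;
# only the "" returned at end of input matches the sentinel.
lines = list(iter(stream.readline, ""))

for line in lines:
    print(repr(line))
```

Each item still carries its trailing newline, which matters when the lines are printed.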



        • Just

          #5
          Re: Iteration on file reading

          In article <3f7ca9ea$1_1@themost.net>,
          "Paul Watson" <pwatson@redlinec.com> wrote:

          > for line in sys.stdin:
          >
          > Does this statement cause all of stdin to be read before the loop begins?

          Nope.

          Just


          • Alex Martelli

            #6
            Re: Iteration on file reading

            Andrew Dalke wrote:
            > Paul McGuire:
            >> def lineReader( strm ):
            >>     while 1:
            >>         yield strm.readline().rstrip("\n")
            >>
            >> for f in lineReader( stdin ):
            >>     print ">>> " + f
            >
            > You can simplify that with the iter builtin.
            >
            > for f in iter(stdin.readline, ""):
            >     print ">>> " + f
            >
            > (Hmm... maybe I should test it? Naaaaahhh.)

            There is a difference in behavior: the readline method
            returns a line WITH a trailing \n, which then gets
            printed, giving a "double-spaced" effect. Sure, you
            can strip the \n in the loop body, but if you always
            want a sequence of newline-stripped lines, that is
            somewhat repetitious. If the use of readline is
            mandated (i.e., no direct looping on the file for one
            reason or another), my favourite way of expression is:

            def linesof(somefile):
                for line in iter(somefile.readline, ''):
                    yield line.rstrip('\n')

            not as concise as either of the above, but, I think,
            a wee little bit clearer.


            Alex
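
The difference described in this post can be seen side by side. This is a sketch in Python 3 with `io.StringIO` standing in for a real file (both are assumptions for the demo); `linesof` is copied from the post, and the names `raw` and `stripped` are illustrative only.

```python
import io


def linesof(somefile):
    # Stop when readline() returns '' (end of input), strip one trailing newline
    for line in iter(somefile.readline, ''):
        yield line.rstrip('\n')


text = "first\nsecond\n"
raw = list(iter(io.StringIO(text).readline, ''))  # newlines kept
stripped = list(linesof(io.StringIO(text)))       # newlines removed

print(raw)
print(stripped)
```

Printing the items of `raw` one by one would add a second newline after each, which is the "double-spaced" effect mentioned above.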




            • Jeremy Fincher

              #7
              Re: Iteration on file reading

              "Paul Watson" <pwatson@redlinec.com> wrote in message news:<3f7ca9ea$1_1@themost.net>...
              > for line in sys.stdin:
              >
              > Does this statement cause all of stdin to be read before the loop begins?
              >
              > I may need to read several GB and I do not want to swamp the machine's
              > memory.

              Have you considered simply inputting this into an interactive
              interpreter and seeing if it swamps the machine's memory?

              Jeremy


              • Andrew Dalke

                #8
                Re: Iteration on file reading

                Alex:
                > There is a difference in behavior: the readline method
                > returns a line WITH a trailing \n, which then gets
                > printed, giving a "double-spaced" effect. Sure, you
                > can strip the \n in the loop body, ....

                Quite true.

                As it turns out, the OP wanted to know about

                for line in sys.stdin:

                The post to which I replied changed the spec to
                remove the newline, but the main point was to
                use a generator ... which could, if desired, do extra
                work to get rid of the "\n". It could just as
                easily have converted everything to uppercase or done
                rot13 conversion on the text.

                My reply meant to point out that the iter builtin
                can be used to turn a "function returns the next
                object each time it's called and a sentinel when
                it's done" into an iterable. I just left out the extra
                work his code did since it wasn't needed by the OP.

                Andrew
                dalke@dalkescientific.com
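
The sentinel pattern described here is not specific to `readline`. As a sketch (Python 3; `read_chunks` and the chunk sizes are hypothetical choices for illustration, not from the thread), `iter` can turn repeated `read(size)` calls into an iterator over fixed-size blocks, stopping when `read` returns an empty bytes object.

```python
import io
from functools import partial


def read_chunks(stream, size=4096):
    """Iterate over a binary stream in blocks of at most `size` bytes."""
    # read(size) returns b"" at end of input, which serves as the sentinel
    return iter(partial(stream.read, size), b"")


# io.BytesIO stands in for a real binary file
for chunk in read_chunks(io.BytesIO(b"abcdefg"), size=3):
    print(chunk)
```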

