Having trouble with tail -f standard input

  • sab

    Having trouble with tail -f standard input

    Hello,

    I have been working on a python script to parse a continuously growing
    log file on a UNIX server. The input is the standard in, piped in
    from the log file. The application works well for the most part, but
    the problem is when attempting to continuously pipe information into
    the application via the tail -f command. The command line looks
    something like this:

    tail -f <logfile> | grep <search string> | python parse.py

    If I don't pipe the standard in to the python script, it displays any
    new entries immediately on the screen. However, if I pipe the
    information into the script, sys.stdin.readline() doesn't get any
    new data until a buffer fills, after which it parses a block of new
    information all at once (output is fine). I need it to read the data
    in real-time instead of waiting for the buffer to fill. I have tried
    running the script with the -u parameter but that doesn't seem to be
    doing anything. Also, if I run the program against a text file and
    add a line to the text file (via cat >> <text file>) it picks it up
    right away. I'm sure that it's just a simple parameter that needs to
    be passed or something along those lines but have been unable to find
    the answer. Any ideas would be appreciated.

    Thanks!
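    For context, the reading loop in parse.py is presumably something like
    this minimal sketch (the function name and the per-line handling are
    placeholders, not the poster's actual code):

    ```python
    import sys

    def process(stream):
        """Read lines one at a time from a stream, the way parse.py presumably does."""
        parsed = []
        while True:
            line = stream.readline()
            if not line:               # EOF: the upstream pipe closed
                break
            parsed.append(line.rstrip("\n"))
        return parsed

    if __name__ == "__main__":
        for entry in process(sys.stdin):
            print(entry)
    ```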
  • Diez B. Roggisch

    #2
    Re: Having trouble with tail -f standard input

    sab schrieb:
    > tail -f <logfile> | grep <search string> | python parse.py
    >
    > If I don't pipe the standard in to the python script, it displays any
    > new entries immediately on the screen. However, if I pipe the
    > information into the script, sys.stdin.readline() doesn't get any
    > new data until a buffer fills, after which it parses a block of new
    > information all at once (output is fine). [...]
    Get rid of tail; it's useless here anyway and most probably causing the
    problem.

    If for whatever reason you can't get rid of it, try and see if there is
    any other way of skipping most of the input file - maybe creating *one*
    python script to seek to the end, grep & parse.
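    Diez's one-script suggestion (seek to the end, then do the grep and the
    parse in the same process) might look roughly like this sketch; `path`,
    `pattern`, and the polling interval are placeholders, not anything from
    the thread:

    ```python
    import re
    import time

    def read_matching(stream, rx):
        """Return whatever complete new lines are currently available and match rx."""
        matches = []
        while True:
            line = stream.readline()
            if not line:                  # no complete new line yet
                break
            if rx.search(line):
                matches.append(line.rstrip("\n"))
        return matches

    def follow(path, pattern, poll=1.0):
        """tail -f plus grep in one process: seek to the end, then poll for new lines."""
        rx = re.compile(pattern)
        with open(path) as f:
            f.seek(0, 2)                  # 2 = os.SEEK_END: skip the existing data
            while True:
                for hit in read_matching(f, rx):
                    yield hit
                time.sleep(poll)          # nothing new; wait and try again
    ```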

    You can't do anything about it in Python, though - the buffering and
    potential flushing are courtesy of the upper end of the pipe, not Python.

    Diez


    • Derek Martin

      #3
      Re: Having trouble with tail -f standard input

      On Thu, Aug 21, 2008 at 02:58:24PM -0700, sab wrote:
      > I have been working on a python script to parse a continuously growing
      > log file on a UNIX server.
      If you weren't aware, there are already a plethora of tools which do
      this... You might save yourself the trouble by just using one of
      those. Try searching for something like "parse log file" on google or
      freshmeat.net or whatever...
      > The input is the standard in, piped in from the log file. The
      > application works well for the most part, but the problem is when
      > attempting to continuously pipe information into the application via
      > the tail -f command. The command line looks something like this:
      >
      > tail -f <logfile> | grep <search string> | python parse.py
      The pipe puts STDIN/STDOUT into "fully buffered" mode, which results
      in the behavior you're seeing. You can set the buffering mode of
      those files in your program, but unfortunately tail and grep are not
      your program... You might get this to work by setting stdin to
      non-blocking I/O in your Python program, but I don't think it will be
      that easy...
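      One pragmatic fix at the shell level, where available: GNU grep's
      --line-buffered flag flushes each matching line as it is written, and
      python -u keeps Python's side unbuffered (both are GNU/CPython
      specifics; check your platform). The tail -f pipeline is shown as a
      comment since it never terminates on its own:

      ```shell
      # Same pipeline, but with grep flushing each match immediately:
      #
      #   tail -f <logfile> | grep --line-buffered <search string> | python -u parse.py
      #
      # A finite demonstration that --line-buffered still passes matches through:
      printf 'alpha match\nbeta other\n' | grep --line-buffered match
      ```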

      You can get around this in a couple of ways. One is to call tail and
      grep from within your program, using something like os.popen()...
      Then set the blocking mode on the resulting files. You'll have to
      feed the output of one to the input of the other, then read the output
      of grep and parse that. Yucky. That method isn't very efficient,
      since Python can do everything that tail and grep are doing for you...
      So I'd suggest you read the file directly in your python program, and
      use Python's regex parsing functionality to do what you're doing with
      grep.

      As for how to actually do what tail does, I'd suggest looking at the
      source code for tail to see how it does what it does.
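      Derek's idea of calling tail from within the program could be sketched
      today with subprocess rather than os.popen (a sketch, not the poster's
      code; tail itself typically writes promptly when following, and with
      Python doing grep's job only one pipe remains):

      ```python
      import re
      import subprocess

      def filter_lines(lines, rx):
          # yield only the lines matching rx, newline stripped (grep's job, in-process)
          for line in lines:
              if rx.search(line):
                  yield line.rstrip("\n")

      def tail_grep(path, pattern):
          # run tail -f ourselves and filter its output in Python
          rx = re.compile(pattern)
          proc = subprocess.Popen(
              ["tail", "-f", path],
              stdout=subprocess.PIPE,
              text=True,
              bufsize=1,                # line-buffer our end of the pipe
          )
          try:
              for hit in filter_lines(proc.stdout, rx):
                  yield hit
          finally:
              proc.terminate()
      ```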

      But, if I were you, I'd just download something like swatch, and be
      done with it. :)

      --
      Derek D. Martin

      GPG Key ID: 0x81CFE75D




      • norseman

        #4
        Re: Having trouble with tail -f standard input

        Derek Martin wrote:
        > You can get around this in a couple of ways. One is to call tail and
        > grep from within your program, using something like os.popen()...
        > [...]
        > So I'd suggest you read the file directly in your python program, and
        > use Python's regex parsing functionality to do what you're doing with
        > grep.
        ================================
        I have to agree with Derek about using Python as the control here. Pipe
        or otherwise redirect incoming data to Python. If the incoming data is
        buffered, the program terminates only by force (deleted from memory,
        system shutdown, or crash).

        Python's print >>file, str (see Python's lib.pdf)
        acts like incoming | tee -a file in the sense of double output:
        one copy to a file and one to standard out. str can be a .read() on
        stdin; as long as it is a string, it doesn't care how it got there.

        Depending on choice (per Unix):
        incoming | tee -a logfile | program.py
        incoming | program.py (copy all to (log)file) | programsub1.py
        with all parsing in the .py's
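        The tee -a style double output above can be sketched in current
        Python like this (the function name and paths are illustrative, not
        from the thread):

        ```python
        import sys

        def tee_lines(stream, log_path, out=sys.stdout):
            # copy every incoming line to `out` and append it to a log file,
            # like `incoming | tee -a logfile`
            with open(log_path, "a") as log:
                for line in stream:
                    log.write(line)
                    log.flush()        # keep the on-disk log current
                    out.write(line)
        ```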

        The advantage is that Python can control the buffers, keeping the
        programs open and running whether or not data is in the pipe at the
        moment. This way the logfile gets a full data set and is not further
        disturbed, and there is no need to work out where the last-read record
        is located.
        OR
        Last time I looked, the syslog facility was NOT disallowed the use of
        named pipes (which default to first in, first out (FIFO)).
        This allows pgm.py to read the named pipe, append everything it reads
        to the log, and parse each line as desired, sleeping for a time when
        the pipe is empty and going again.
        Once more, sequence is maintained, with no digging to find the last
        tested input.
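        That named-pipe reader might be sketched like this (the FIFO would be
        created beforehand with os.mkfifo() or mkfifo(1); `consume` and
        `handle_line` are illustrative names; this sketch stops at EOF, where
        a long-running reader would sleep briefly and reopen the pipe
        instead):

        ```python
        import os  # os.mkfifo() creates the named pipe on POSIX systems

        def consume(fifo_path, log_path, handle_line):
            # read the named pipe, append everything read to the log, and
            # hand each line to the parser
            with open(fifo_path) as fifo, open(log_path, "a") as log:
                for line in fifo:
                    log.write(line)
                    handle_line(line)
        ```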



        Hope this helps.

        Steve
        norseman@hughes.net
