issues simply parsing a whitespace-delimited textfile in pythonscript

  • Damon Getsman


    Okay, so I'm writing a script in Python right now as a dirty fix for a
    problem we're having at work. Unfortunately, this is the first really
    non-trivial script that I've had to write in Python, and the book
    that I have on it really kind of sucks.

    I'm having an issue parsing lines of 'last' output that I have stored
    in a /tmp file. The first time I call .readline(), I get the full
    line of output, which I can then split() and whose individual fields
    I can work with without any problem. Unfortunately, the second time
    that I call .readline() on the file, I only receive the first
    character of the first field. Looking through the /tmp file shows
    that it isn't corrupted at all; it's in exactly the format it should
    be in. Here's the relevant script:

    ----
    #parse
    Lastdump = open('/tmp/esd_tmp', 'r')

    #find out what the last day entry is in the wtmp
    cur_rec = Lastdump.readline()
    work = cur_rec.split()

    if debug == 1:
        print work
        print " is our split record line from /tmp/esd_tmp\n"

    startday = work[3]

    if debug == 1:
        print startday + " is the starting day\n"
        print days
        print " is our dictionary of days\n"
        print days[startday] + " is our ending day\n"

    for cur_rec in Lastdump.readline():
        work = cur_rec.split()

        if debug == 1:
            print "Starting table building pass . . .\n"
            print work
            print " is the contents of our split record line now\n"
            print cur_rec + " is the contents of cur_rec\n"

        #only go back 2 days

        while work[0] != days[startday]:
            tmp = work[1]
            if table.has_key(work[0]):
                continue
            elif tmp[0] != ':':
                #don't keep it if it isn't a SunRay terminal identifier
                continue
            else:
                #now we keep it
                table[work[0]] = tmp
    ----

    The first and second sets of debugging output show everything as it
    should be... the third shows that the next working line (in cur_rec),
    and therefore 'work' as well, holds only the first character of the
    line. Here's the output:

    ----
    Debugging run


    Building table . . .

    ['dgetsman', 'pts/3', ':0.0', 'Wed', 'May', '21', '10:21', 'still', 'logged', 'in']
    is our split record line from /tmp/esd_tmp

    Wed is the starting day

    {'Wed': 'Mon', 'Sun': 'Fri', 'Fri': 'Wed', 'Thurs': 'Tues', 'Tues': 'Sun', 'Mon': 'Sat', 'Sat': 'Thurs'}
    is our dictionary of days

    Mon is our ending day

    Starting table building pass . . .

    ['d']
    is the contents of our split record line now

    d is the contents of cur_rec

    ----
    As a result, everything fails when I try to work with the individual
    fields later in the script. Does anybody have an idea as to why
    this would be happening?

    Oh, and in case it's relevant, here are the first few lines of the data file:

    ----
    dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in
    dgetsman pts/2 :0.0 Wed May 21 09:04 still logged in
    dgetsman pts/1 :0.0 Wed May 21 08:56 - 10:21 (01:24)
    dgetsman pts/0 :0.0 Wed May 21 08:56 still logged in
    ----

    I would really appreciate any pointers or suggestions you can give.

    Damon Getsman
    Linux/Solaris System Administrator
  • Damon Getsman

    #2
    Re: issues simply parsing a whitespace-delimited textfile in pythonscript

    Okay, so I managed to kludge around the issue by not using
    .readline() in my 'for' statement. Instead, I'm slurping the
    whole file into a new list that I added for that purpose, and
    everything seems to be working just fine. However, I don't know WHY
    the other method failed while this one works. I'd really like to
    understand the why of this issue so that I don't have to resort to
    crappy coding practice and kludge around it the next time I run into
    something like this.
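The post doesn't show the revised code, so the following is only a guess at the shape of this workaround: a Python 3 sketch with a hypothetical in-memory sample standing in for /tmp/esd_tmp. readlines() returns a list holding one string per line, so looping over that list (rather than over the characters of a single line) hands split() a complete record each time:

```python
import io

# Hypothetical in-memory stand-in for open('/tmp/esd_tmp', 'r')
lastdump = io.StringIO(
    "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in\n"
    "dgetsman pts/2 :0.0 Wed May 21 09:04 still logged in\n"
    "dgetsman pts/1 :0.0 Wed May 21 08:56 - 10:21 (01:24)\n"
)

# readlines() slurps the file into a list with one string per line
all_lines = lastdump.readlines()

# Day field of the first (most recent) record
startday = all_lines[0].split()[3]

# Looping over the list yields whole lines, so split() sees every field
rest = [cur_rec.split() for cur_rec in all_lines[1:]]

print(startday)     # Wed
print(len(rest))    # 2
print(rest[0][:2])  # ['dgetsman', 'pts/2']
```

Holding the whole file in memory is harmless for a short 'last' dump like this one.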

    Any ideas much appreciated.

    Damon G.


    • Paul McGuire

      #3
      Re: issues simply parsing a whitespace-delimited textfile in pythonscript

      On May 21, 10:59 am, Damon Getsman <dgets...@amirehab.net> wrote:
      I'm having an issue parsing lines of 'last' output that I have stored
      in a /tmp file.  The first time it does a .readline() I get the full
      line of output, which I'm then able to split() and work with the
      individual fields of without any problem.  Unfortunately, the second
      time that I do a .readline() on the file, I am only receiving the
      first character of the first field.  Looking through the /tmp file
      shows that it's not corrupted from the format that it should be in at
      all...  Here's the relevant script:
      >
      ----
          #parse
          Lastdump = open('/tmp/esd_tmp', 'r')
      >
          #find out what the last day entry is in the wtmp
          cur_rec = Lastdump.readline()
          work = cur_rec.split()
      >
          if debug == 1:
              print work
              print " is our split record line from /tmp/esd_tmp\n"
      >
          startday = work[3]
      >
          if debug == 1:
              print startday + " is the starting day\n"
              print days
              print " is our dictionary of days\n"
              print days[startday] + " is our ending day\n"
      >
          for cur_rec in Lastdump.readline():
              work = cur_rec.split()
      >
      <snip>


      for cur_rec in Lastdump.readline():

      is the problem. readline() returns a string containing the next
      line's worth of text, NOT an iterator over all the subsequent lines in
      the file. So your code is really saying:

      next_line_in_file = Lastdump.readline()
      for cur_rec in next_line_in_file:

      which, of course, iterates over a string character by character.
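A minimal Python 3 sketch of the effect described above, using io.StringIO as a hypothetical stand-in for /tmp/esd_tmp and two records from the sample data file. Looping over the value returned by the second readline() walks one string character by character, which reproduces the ['d'] seen in the debug output:

```python
import io

# Hypothetical stand-in for open('/tmp/esd_tmp', 'r'), with two sample records
lastdump = io.StringIO(
    "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in\n"
    "dgetsman pts/2 :0.0 Wed May 21 09:04 still logged in\n"
)

first = lastdump.readline()  # one whole line, returned as a str

# The buggy loop: readline() runs once and returns the next line as a str,
# so the for statement iterates over that string one character at a time
seen = []
for cur_rec in lastdump.readline():
    seen.append(cur_rec.split())

print(first.split()[3])  # Wed
print(seen[0])           # ['d']
```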

      Since Lastdump is an open file object (not great casing for a variable
      name, BTW: with that leading capital letter it looks like a class name),
      it already gives you an iterator over its lines. Try this instead:

      lastdump = open('/tmp/esd_tmp', 'r')

      cur_rec = lastdump.next()

      ...

      for cur_rec in lastdump:

      ...

      This should get you over the hump on reading the file.
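The snippet above is Python 2 (lastdump.next()). Here is a sketch of the same idea in Python 3, where the built-in next() takes the place of the .next() method. The read_records helper and the io.StringIO sample are hypothetical; a real script would pass in the open /tmp/esd_tmp file instead:

```python
import io


def read_records(lastdump):
    """Hypothetical helper: split the first record and all later records."""
    # next() consumes exactly one line: the most recent 'last' record
    first = next(lastdump).split()
    rest = []
    # Iterating over the file object resumes at the second line and
    # yields one complete line per pass, not one character
    for cur_rec in lastdump:
        rest.append(cur_rec.split())
    return first, rest


# Hypothetical in-memory stand-in for open('/tmp/esd_tmp', 'r')
sample = io.StringIO(
    "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in\n"
    "dgetsman pts/1 :0.0 Wed May 21 08:56 - 10:21 (01:24)\n"
)
first, rest = read_records(sample)
print(first[3])     # Wed
print(rest[0][:3])  # ['dgetsman', 'pts/1', ':0.0']
```

With the real file this would be called as `with open('/tmp/esd_tmp') as lastdump: first, rest = read_records(lastdump)`. Note that next() raises StopIteration if the file is empty.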

      Also, may I suggest this method for splitting up each record line, and
      assigning individual fields to variables:

      user, s1, s2, day, month, date, time, desc = cur_rec.split(None, 7)
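Here is a quick Python 3 check of that call against two records from the sample data file. With None as the separator, split() breaks on any run of whitespace, and maxsplit=7 stops after seven splits. That means the variable-length tail of each record ('still logged in' or '- 10:21 (01:24)') lands intact in desc:

```python
# Two records copied from the sample data file in the original post
lines = [
    "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in",
    "dgetsman pts/1 :0.0 Wed May 21 08:56 - 10:21 (01:24)",
]

descs = []
for cur_rec in lines:
    # None splits on any whitespace run; 7 caps the number of splits,
    # so everything after the seventh field ends up in desc
    user, s1, s2, day, month, date, time, desc = cur_rec.split(None, 7)
    descs.append(desc)
    print(user, s2, day, repr(desc))
```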

      -- Paul


      • Damon Getsman

        #4
        Re: issues simply parsing a whitespace-delimited textfile in pythonscript

        On May 21, 11:15 am, Paul McGuire <pt...@austin.rr.com> wrote:
        <snip>
        >
        for cur_rec in Lastdump.readline():
        >
        is the problem. readline() returns a string containing the next
        line's worth of text, NOT an iterator over all the subsequent lines in
        the file. So your code is really saying:
        >
        next_line_in_file = Lastdump.readline()
        for cur_rec in next_line_in_file:
        >
        which of course, is iterating over a string character by character.
        >
        Since you are opening Lastdump (not great casing for a variable name,
        BTW - looks like a class name with that leading capital letter), it
        gives you an iterator already. Try this instead:
        >
        lastdump = open('/tmp/esd_tmp', 'r')
        >
        cur_rec = lastdump.next()
        >
        ...
        >
        for cur_rec in lastdump:
        >
        ...
        >
        This should get you over the hump on reading the file.
        >
        Also, may I suggest this method for splitting up each record line, and
        assigning individual fields to variables:
        >
        user, s1, s2, day, month, date, time, desc = cur_rec.split(None, 7)
        >
        -- Paul

        Well, assigning every field to its own variable isn't really
        appropriate here, since I'm only going to be using 2 of the fields.
        I think I will assign those two to individual variables using a slice
        of the split you suggested, though, for readability. Thank you for
        the tips; they were all much appreciated.
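Putting the thread's suggestions together, here is a minimal Python 3 sketch of the table-building pass from the first post. It is not the poster's code. build_table, TWO_DAYS_BACK and the extra sample records (jdoe, root, asmith) are hypothetical. The sketch assumes that 'last' prints three-letter day names (so 'Tue' and 'Thu' rather than the 'Tues' and 'Thurs' keys in the debug output above). It also assumes the day and the SunRay display sit in the fourth and third fields, as in the sample data, and that the pass stops at the first record from the ending day:

```python
import io

# Assumption: 'last' prints three-letter day names like these
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# Map each day to the day two days earlier ("only go back 2 days")
TWO_DAYS_BACK = {day: DAY_NAMES[(i - 2) % 7] for i, day in enumerate(DAY_NAMES)}


def build_table(lastdump):
    """Hypothetical helper: map each user to the first SunRay display seen."""
    table = {}
    end_day = None
    for cur_rec in lastdump:
        fields = cur_rec.split(None, 7)
        if len(fields) < 4:
            continue  # skip blank or short lines
        # Assumed positions: user, tty, display, day (as in the sample data)
        user, display, day = fields[0], fields[2], fields[3]
        if end_day is None:
            # The first (most recent) record fixes the cutoff day
            end_day = TWO_DAYS_BACK[day]
        elif day == end_day:
            break  # reached records from two days back: stop
        if user in table or not display.startswith(":"):
            continue  # already recorded, or not a SunRay display
        table[user] = display
    return table


# Sample input: the dgetsman lines come from the thread; the rest are made up
sample = io.StringIO(
    "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in\n"
    "dgetsman pts/2 :0.0 Wed May 21 09:04 still logged in\n"
    "jdoe pts/4 :1.0 Tue May 20 17:02 - 17:30 (00:28)\n"
    "root pts/5 adminhost Tue May 20 16:00 - 16:10 (00:10)\n"
    "asmith pts/6 :2.0 Mon May 19 12:00 - 13:00 (01:00)\n"
)
table = build_table(sample)
print(table)  # {'dgetsman': ':0.0', 'jdoe': ':1.0'}
```

Iterating over the file object keeps memory use flat, and the break ends the pass as soon as the cutoff day appears instead of looping on an unchanged record.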

        -Damon
