Issue reading data lines multiple times from a file

  • rka77
    New Member
    • Nov 2009
    • 1

    Hi,
    I am trying to write a Python 2.6 script on Win32 that reads all the text files stored in a directory and prints only the lines containing actual data. A sample file:
    Set : 1
    Date: 10212009
    12 34 56
    25 67 90
    End Set
    ********
    Set: 2
    Date: 10222009
    34 56 89
    25 67 89
    End Set

    In the above example file, I want to print only lines 3, 4 and 9, 10 (the actual data values). The program should do this iteratively on all txt files.
    I wrote the script below and am testing it on a single txt file as I go.
    My logic is to read the input files one by one and search for a start string. As soon as a match is found, start searching for the end string. When both are found, print the lines between the start string and the end string, then repeat on the rest of the file before opening the next file.
    The problem I am having is that it successfully reads Set 1 of the data, but then fails on subsequent sets in the file. For Set 2, it identifies the number of lines to read, but prints them starting at an incorrect line number.
    A little digging turned up the following suggestions:
    1. Using seek and tell to reposition before the 2nd iteration of the loop. This did not work, since the file is read through a buffer and that throws off the "tell" value.
    2. Opening the file in binary mode. This helped someone else, but it is not working for me.
    3. Opening the file unbuffered (buffer size 0). This did not work either.

    The second problem is that when it prints data from Set 1, it inserts a blank line between the two lines of data values. How can I get rid of it?

    Note: Ignore all references to next_run in the code below. I was trying it out for repositioning the line read, so that subsequent searches for the start string begin from the last position of the end string.

    Code:
    #!C:/Python26 python
    
    # Import necessary modules
    import os, glob, string, sys, fileinput, linecache
    from goto import goto, label
    
    # Set working path
    path = 'C:\\System_Data'
    
    
    # --------------------
    # PARSE DATA MODULE
    # --------------------
    
    # Define the search strings for data
    start_search = "Set :"
    end_search ="End Set"
    # For Loop to read the input txt files one by one
    for inputfile in glob.glob( os.path.join( path, '*.txt' ) ):
      inputfile_fileHandle = open ( inputfile, 'rb', 0 )
      print( "Current file being read: " +inputfile )
      # start_line initializes to first line
      start_line = 0
      # After first set of data is extracted, next_run will store the position to read the rest of the file
      # next_run = 0
      # start reading the input files, one line by one line
      for line in inputfile:
        line = inputfile_fileHandle.readline()
        start_line += 1
        # next_run+=1
        # If a line matched with the start_search string
        has_match = line.find( start_search )
        if has_match >= 0:
          print ( "Start String found at line number: %d" %( start_line ) )
          # Store the location where the search will be restarted
          # next_run = inputfile_fileHandle.tell() #inputfile_fileHandle.lineno()
          print ("Current Position: %d" % next_run)
          end_line = start_line
          print ( "Start_Line: %d" %start_line )
          print ( "End_Line: %d" %end_line )
          #print(line)
          for line in inputfile:
            line = inputfile_fileHandle.readline()
            #print (line)
            end_line += 1
            has_match = line.find(end_search)
            if has_match >= 0:
              print 'End   String found at line number: %d' % (end_line)
              # total lines to print:
              k=0
              # for loop to print all the lines from start string to end string
              for j in range(0,end_line-start_line-1):
                print linecache.getline(inputfile, start_line +1+ j )
                k+=1
              print ( "Number of lines Printed: %d " %k )
              # Using goto to get out of 2 loops at once
              goto .re_search_start_string
        label .re_search_start_string
        #inputfile_fileHandle.seek(next_run,0)
    
      inputfile_fileHandle.close ()
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    I take it this is not homework.

    You only need to iterate on the file once. This part of your code:
    Code:
      for line in inputfile:
        line = inputfile_fileHandle.readline()
    is part of your problem. There is no need to reassign the variable line inside the loop; iterating over an open file already yields one line per pass.
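One further detail that may be worth checking, offered as a hedged reading of the posted code rather than something stated in the thread: `inputfile` is the path string returned by `glob.glob`, while `inputfile_fileHandle` is the open file. Looping over a string yields its characters, so `for line in inputfile` would run once per character of the file name, not once per line of the file. The sketch below uses a hypothetical path and an in-memory file (`io.StringIO`, an assumption made only so the example is self-contained) to show the difference.

```python
import io

# Hypothetical path, standing in for one entry returned by glob.glob().
inputfile = "C:\\System_Data\\a.txt"

# In-memory stand-in for the opened file (an assumption for illustration).
file_handle = io.StringIO(u"Set : 1\n12 34 56\nEnd Set\n")

# Iterating over the path string yields single characters.
chars = [item for item in inputfile]

# Iterating over the file object yields whole lines.
lines = [line for line in file_handle]

print(chars[:3])
print(len(lines))
```

If that reading is right, the number of `readline()` calls in the original loop is tied to the length of the file name, which could explain why later sets start at the wrong line.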

    You have a space after "Set" in one place but not in the other ("Set : 1" vs. "Set: 2"). Change start_search to "Set".

    Assuming you have a list of file names fnList, the following will compile a list of all the data from the files in fnList:
    Code:
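    results = []
    start_search = "Set"
    end_search = "End Set"
    for fn in fnList:
        # "with" closes the file even if an error occurs (Python 2.6+)
        with open(fn) as f:
            inData = False
            for line in f:
                if line.startswith(start_search):
                    # "Set" header: data lines follow
                    inData = True
                elif line.startswith(end_search):
                    # "End Set" footer: stop collecting
                    inData = False
                elif inData and not line.startswith("Date"):
                    # strip() drops the trailing newline
                    results.append(line.strip())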
    The following prints out the results using string method join():
    Code:
    print "\n".join(results)
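To try the approach end to end, here is a minimal sketch that wraps the same loop in a function and runs it on the sample file from the question. The function name `collect_data`, the temporary directory, and the file name `sample.txt` are assumptions made for illustration; in the original setup `fn_list` would come from `glob.glob(os.path.join(path, '*.txt'))`.

```python
import glob
import os
import tempfile


def collect_data(fn_list, start_search="Set", end_search="End Set"):
    """Return the stripped data lines found between Set / End Set markers."""
    results = []
    for fn in fn_list:
        with open(fn) as f:
            in_data = False
            for line in f:
                if line.startswith(start_search):
                    in_data = True
                elif line.startswith(end_search):
                    in_data = False
                elif in_data and not line.startswith("Date"):
                    results.append(line.strip())
    return results


# Sample content copied from the question.
SAMPLE = (
    "Set : 1\nDate: 10212009\n12 34 56\n25 67 90\nEnd Set\n"
    "********\n"
    "Set: 2\nDate: 10222009\n34 56 89\n25 67 89\nEnd Set\n"
)

# Write the sample to a throwaway directory so the sketch is self-contained.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "sample.txt"), "w") as out:
    out.write(SAMPLE)

fn_list = sorted(glob.glob(os.path.join(data_dir, "*.txt")))
results = collect_data(fn_list)
print("\n".join(results))
```

Each line read from a file keeps its trailing newline, and `print` adds another one. That is the likely source of the blank lines in the original output, and it is why the data lines are passed through `strip()` before being stored.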


    • Glenton
      Recognized Expert Contributor
      • Nov 2008
      • 391

      #3
      In the past, when I've done similar things, I've used a boolean variable called something like "recording". I initialize recording=False and go through the lines as you suggest. If I find the start_search, I set recording to True; if I find the end_search, I set it to False.

      So in pseudocode it's something like this:
      Code:
      recording=False
      Loop through the files:
          Loop through the lines:
              if startCondition: recording=True
              if endCondition: recording=False
              if recording:
                  Print it, or save it to a file or whatever you want
      Last edited by Glenton; Nov 19 '09, 05:55 AM. Reason: pretty up the pseudocode
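The pseudocode above can be sketched as runnable Python roughly as follows. This is one possible reading, not Glenton's actual code: the generator name `recorded_lines` and the sample list are assumptions, and the `continue` statements are an addition. With the checks exactly as ordered in the pseudocode, the start marker line itself would also be recorded, because the flag is switched on before the `if recording` test runs.

```python
def recorded_lines(lines, start_search="Set", end_search="End Set"):
    """Yield the lines seen while the recording flag is switched on."""
    recording = False
    for line in lines:
        if line.startswith(start_search):
            recording = True
            continue  # skip the start marker line itself
        if line.startswith(end_search):
            recording = False
            continue  # skip the end marker line as well
        if recording:
            # Drop the line ending so print does not emit blank lines.
            yield line.rstrip("\r\n")


# Sample lines copied from the question.
sample = [
    "Set : 1\n", "Date: 10212009\n", "12 34 56\n", "25 67 90\n", "End Set\n",
    "********\n",
    "Set: 2\n", "Date: 10222009\n", "34 56 89\n", "25 67 89\n", "End Set\n",
]

# The "whatever you want" step: here, drop the Date lines and keep the rest.
kept = [ln for ln in recorded_lines(sample) if not ln.startswith("Date")]
print(kept)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample`.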
