Extract Values between two strings in a text file using python

  • helloR
    New Member
    • Jun 2015
    • 8

    Let's say I have a text file (input_file.txt, about 10 GB in size).

    I need to write Python code that reads the text file and copies the contents between a start line and an end line to another file.

    I wrote the following code:

    Code:
    import re  
    
    with open(r'C:\Python27\log\master_input.txt', 'r') as infile, open(r'C:\Python27\log\output', 'w') as outfile:  
       copy = False  
       for line in infile:  
          if re.match("Jun  6 17:58:16(.*)", line):  
             copy = True  
          elif re.match("Jun  6 17:58:31(.*)", line):  
             copy = False  
          elif copy:  
             outfile.write(line)
    I'm not getting the expected output.

    Output of the code (output_of_my_code.txt):

    Expected output (Expected_output.txt):

    Please help me do this in the best way.
    Attached Files
    Last edited by bvdet; Jun 10 '15, 02:14 PM. Reason: Edited code tags. Place open tag "[code]" before code and closing tag "[/code]" after code.
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    To get the output you need, use re to extract the seconds as an integer and compare it against the lower and upper bounds. Here's an example:
    Code:
    import re
    
    data = """Jun  6 17:58:13 other strings
    Jun  6 17:58:13 other strings
    Jun  6 17:58:14 other strings
    Jun  6 17:58:14 other strings
    Jun  6 17:58:15 other strings
    Jun  6 17:58:15 other strings
    Jun  6 17:58:15 other strings
    Jun  6 17:58:15 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:17 other strings
    Jun  6 17:58:17 other strings
    Jun  6 17:58:17 other strings
    Jun  6 17:58:17 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:19 other strings
    Jun  6 17:58:19 other strings
    Jun  6 17:58:20 other strings
    Jun  6 17:58:20 other strings
    Jun  6 17:58:21 other strings
    Jun  6 17:58:21 other strings
    Jun  6 17:58:21 other strings
    Jun  6 17:58:21 other strings
    Jun  6 17:58:22 other strings
    Jun  6 17:58:23 other strings
    Jun  6 17:58:24 other strings
    Jun  6 17:58:27 other strings
    Jun  6 17:58:28 other strings
    Jun  6 17:58:28 other strings
    Jun  6 17:58:29 other strings
    Jun  6 17:58:29 other strings
    Jun  6 17:58:29 other strings
    Jun  6 17:58:29 other strings
    Jun  6 17:58:30 other strings
    Jun  6 17:58:31 other strings
    Jun  6 17:58:31 other strings
    Jun  6 17:58:32 other strings
    Jun  6 17:58:33 other strings
    Jun  6 17:58:33 other strings
    Jun  6 17:58:33 other strings
    Jun  6 17:58:33 other strings"""
    
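    # Capture the seconds field; the raw string keeps \d from being read as a string escape.
    patt = re.compile(r"Jun  6 17:58:(\d+) (.*)")
    upper = 31
    lower = 16
    
    for line in data.split("\n"):
        m = patt.match(line)
        if m:
            i = int(m.group(1))
            # Keep lines whose seconds value lies within [lower, upper].
            if lower <= i <= upper:
                print(line)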
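
For a file on the order of 10 GB, the same comparison can be applied while iterating over the file object, so the whole file never has to be held in memory. A minimal sketch, assuming every line of interest starts with the "Jun  6 17:58:" prefix; the function name copy_seconds_range is hypothetical, and the paths in the usage comment are the ones from the first post:

```python
import re

# Assumed prefix: only lines stamped "Jun  6 17:58:SS" are considered.
PATT = re.compile(r"Jun  6 17:58:(\d+) ")


def copy_seconds_range(lines, out, lower, upper):
    """Write each line whose seconds value lies in [lower, upper]; return the count."""
    written = 0
    for line in lines:  # a file object yields one line at a time
        m = PATT.match(line)
        if m and lower <= int(m.group(1)) <= upper:
            out.write(line)
            written += 1
    return written


# Usage with the paths from the first post:
# with open(r'C:\Python27\log\master_input.txt', 'r') as infile, \
#         open(r'C:\Python27\log\output', 'w') as outfile:
#     copy_seconds_range(infile, outfile, 16, 31)
```

Unlike a start/stop flag, this keeps the boundary lines themselves and does not depend on how many lines share the same timestamp.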


    • helloR
      New Member
      • Jun 2015
      • 8

      #3
      @bvdet: Thanks for the solution. In my case I do not know the upper and lower values. How did you get those values?


      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        You knew the upper and lower values in your original post. How did you know them? If you are dealing with dates and times instead of strictly formatted data, look into using the time and datetime modules. Example of creating a datetime object from the date/time string:
        Code:
        >>> import datetime
        >>> datetime.datetime.strptime("Jun  6 17:58:13", "%b  %d %H:%M:%S")
        datetime.datetime(1900, 6, 6, 17, 58, 13)
        >>>
        From there you can create timedelta objects:
        Code:
        >>> d1 = datetime.datetime.strptime("Jun  6 17:58:13", "%b  %d %H:%M:%S")
        >>> d2 = datetime.datetime.strptime("Jun  7 12:55:48", "%b  %d %H:%M:%S")
        >>> d1-d2
        datetime.timedelta(-1, 18145)
        >>> d2-d1
        datetime.timedelta(0, 68255)
        >>> dt1 = d1-d2
        >>> dt1.days
        -1
        >>> dt1.total_seconds()
        -68255.0
        >>> dt2 = d2-d1
        >>> dt2.days
        0
        >>> dt2.total_seconds()
        68255.0
        >>>
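
Building on the strptime call above, lines can be filtered by comparing parsed timestamps directly, which also works when the range crosses a minute, hour, or day boundary. A rough sketch, assuming each log line begins with a 15-character timestamp like "Jun  6 17:58:13"; lines_between, parse_stamp, and STAMP_LEN are hypothetical names, not part of the datetime module:

```python
import datetime

FMT = "%b  %d %H:%M:%S"  # same format string as in the strptime example
STAMP_LEN = 15           # assumed length of the "Jun  6 17:58:13" prefix


def parse_stamp(text):
    """Parse a timestamp string; raises ValueError if it does not match FMT."""
    return datetime.datetime.strptime(text, FMT)


def lines_between(lines, start, end):
    """Yield lines whose leading timestamp falls within [start, end]."""
    lo, hi = parse_stamp(start), parse_stamp(end)
    for line in lines:
        try:
            stamp = parse_stamp(line[:STAMP_LEN])
        except ValueError:
            continue  # no timestamp in the assumed format; skip the line
        if lo <= stamp <= hi:
            yield line
```

Because the function is a generator, passing an open file object as lines processes the file one line at a time. The log lines carry no year, so strptime fills in 1900, as in the output shown above; a range that spans a year boundary would need extra handling.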
