How to delete lines between two words from text file?

  • scharnisto
    New Member
    • Dec 2010
    • 8

    Hi,

    I need a script that deletes all the lines between two given words in a text file. The two words appear several times in the file.

    I have code that works up to a point. The problem is that once the first block has been deleted, the search finds an end_word that comes before the next start_word. So I need to tell the program to start searching for the end_word only after the start_word.

    Here is the code; I hope it makes the problem clearer:

    Code:
    import fnmatch

    orig_file = open("*Strata1CombinedAll.txt", "r")
    lines = orig_file.readlines()

    def find(seq, pattern):
        pattern = pattern.lower()
        for i, n in enumerate(seq):
            if fnmatch.fnmatch(n.lower(), pattern):
                return i
                return -1
    
    def index(seq, pattern):
        result = find(seq, pattern)
        if result == -1:
            raise ValueError
        return result
    
    
    
    try:
        for item in lines:
            begin_line = index(lines, "*start_word*")
            end_line = index(lines, "*end_word*") 
            del lines[begin_line:end_line]
    
    except:
        new_text_file = open ("*test604", "w")
        new_text_file.writelines(lines)
        new_text_file.close()
  • dwblas
    Recognized Expert Contributor
    • May 2008
    • 626

    #2
    In the function index(), result will never equal negative one: the "return -1" in find() sits inside the if block, right after "return i", so it is unreachable and find() returns None when nothing matches. Also, under the try block you delete records from the same list that the for loop is iterating over, which leads to errors. To illustrate, run the code below and compare the original length of the list with the number of records that actually print. (You should copy the records you want to keep to a new list instead of deleting from the original list.)
    Code:
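    test_data = [["abc"], ["def"], ["ghi"], ["abc"], ["xyz"], ["abc"], ["mno"], ["rst"]]
    ctr = 0
    for rec in test_data:
        ctr += 1
        print(ctr, rec)
        # Deleting from the list being iterated removes records the loop
        # has not reached yet, so they are never printed.
        if ctr % 2:
            del test_data[ctr]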
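
As one possible way to apply this advice to the original script, here is a minimal sketch. It assumes the intent of find() was to return -1 when nothing matches. The `start` parameter and the `remove_blocks()` helper are hypothetical additions, not part of the original post; they make the search for the end word begin after the start word. Lines to keep are copied to a new list rather than deleted in place. Like the original slice `lines[begin_line:end_line]`, the sketch keeps the end_word line itself.

```python
import fnmatch


def find(seq, pattern, start=0):
    """Return the index of the first item at or after `start` that matches
    the wildcard pattern (case-insensitive), or -1 if there is none."""
    pattern = pattern.lower()
    for i in range(start, len(seq)):
        if fnmatch.fnmatch(seq[i].lower(), pattern):
            return i
    return -1  # reached only when no item matched


def remove_blocks(lines, start_pattern, end_pattern):
    """Copy lines to a new list, skipping each block that runs from a
    start_pattern line up to (not including) the next end_pattern line."""
    kept = []
    pos = 0
    while True:
        begin = find(lines, start_pattern, pos)
        if begin == -1:
            kept.extend(lines[pos:])  # no more blocks: keep the rest
            break
        kept.extend(lines[pos:begin])
        # Search for the end word only after the start word.
        end = find(lines, end_pattern, begin + 1)
        if end == -1:
            break  # assumption: an unterminated block is dropped entirely
        pos = end
    return kept


sample = ["a\n", "x START y\n", "b\n", "END 1\n", "c\n",
          "start\n", "d\n", "end 2\n", "e\n"]
print(remove_blocks(sample, "*start*", "*end*"))
```

Because nothing is deleted from `lines` while it is being scanned, the indexes returned by find() stay valid for the whole pass.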


    • dwblas
      Recognized Expert Contributor
      • May 2008
      • 626

      #3
      The code below removes all records from "abc" through "def", including the two delimiter records themselves. There should be a way to do this using groupby from itertools, but I don't have time to try it now. Perhaps someone else will post it.
      Code:
      test_data = ["abc", "def", "ghi", "xabcy", "xyz", "def", "mno", "rst"]
      
      start = False      ## True while inside an "abc" ... "def" block
      saved_list = []
      
      for rec in test_data:
          if "abc" in rec.lower():
              start = True
          if not start:
              saved_list.append(rec)
              ## or in your case
              ##output_file.write(rec)
          ## compare in lower case here too, to match the "abc" test
          if "def" in rec.lower():
              start = False
      print(saved_list)
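
One way the groupby idea mentioned above might look is sketched here. It is an illustration only, not dwblas's code: the class name `InsideBlock` is hypothetical, and the approach assumes groupby calls the key function once per record, in order, so that the key object can carry the "inside a block" state from one record to the next.

```python
from itertools import groupby


class InsideBlock:
    """Stateful key function: returns True for every record from a
    start_word record through the matching end_word record."""

    def __init__(self, start_word, end_word):
        self.start_word = start_word
        self.end_word = end_word
        self.inside = False

    def __call__(self, rec):
        text = rec.lower()
        if self.start_word in text:
            self.inside = True
        flag = self.inside          # the end_word record still counts as inside
        if self.end_word in text:
            self.inside = False
        return flag


test_data = ["abc", "def", "ghi", "xabcy", "xyz", "def", "mno", "rst"]

# groupby yields runs of consecutive records that share the same key;
# keep only the runs that lie outside a block.
saved_list = [
    rec
    for inside, group in groupby(test_data, key=InsideBlock("abc", "def"))
    if not inside
    for rec in group
]
print(saved_list)
```

The plain loop with a flag is arguably easier to read; the groupby version mainly helps if you also want to process each deleted block as a unit.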


      • scharnisto
        New Member
        • Dec 2010
        • 8

        #4
        The last code works. Thank you, dwblas!
        I have it like this now:

        Code:
        with open(input(), "r") as orig_file:
            lines = orig_file.readlines()
        
        start = False
        saved_list = []
        for rec in lines:
            if "count" in rec:
                start = True
            if not start:
                saved_list.append(rec)
            if "Volume" in rec:
                start = False
        
        with open(input(), "w") as new_text_file:
            new_text_file.writelines(saved_list)
        The only problem left is that the last line of each block should not be deleted. That line always contains the word 'Volume'. The line before it is always blank, and that blank line should be deleted.
        Any ideas how to get this to work?


        • scharnisto
          New Member
          • Dec 2010
          • 8

          #5
          I figured it out. The following code does what I need:

          Code:
          with open(input(), "r") as orig_file:
              lines = orig_file.readlines()
           
          start = False
          saved_list = []
          for rec in lines:
              if "count" in rec:
                  start = True
              # Check for the end word before saving, so the "Volume" line is kept.
              if "Volume" in rec:
                  start = False
              if not start:
                  saved_list.append(rec)
           
          with open(input(), "w") as new_text_file:
              new_text_file.writelines(saved_list)
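
The same logic could also be wrapped in a reusable function. The sketch below is only one way to do it: the names `drop_blocks()` and `filter_file()` are hypothetical, and the default words "count" and "Volume" are taken from the code above. It streams the file line by line instead of reading it into a list first.

```python
def drop_blocks(lines, start_word="count", end_word="Volume"):
    """Yield the lines that lie outside each start_word..end_word block.

    The start_word line and everything up to the end_word line are dropped;
    the end_word line itself is kept.
    """
    skipping = False
    for line in lines:
        if start_word in line:
            skipping = True
        if end_word in line:      # checked before saving, so this line survives
            skipping = False
        if not skipping:
            yield line


def filter_file(src_path, dst_path):
    """Copy src_path to dst_path without the blocks described above."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(drop_blocks(src))


# Small in-memory check; no files are touched here.
sample = ["header\n", "count 1\n", "data\n", "\n", "Volume 5\n", "footer\n"]
print(list(drop_blocks(sample)))
```

To run it on real files, something like `filter_file(input(), input())` would mirror the prompts used in the post above.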
