Text file manipulation.

**kudos** · Jan 22 '08, 06:42 PM

hi,
start by searching for <BACKSPACE (for instance by using the find method). Note its index, then check which character follows BACKSPACE (you handle it differently if it is > or :). Take substrings of the string and glue them back together without the backspace tag, going n characters back depending on the value. I would recommend first doing a string replace of every <BACKSPACE> with <BACKSPACE: 1> (so you only need to handle one case).
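A minimal sketch of the find-and-splice approach described above. It assumes the tags are written exactly as `<BACKSPACE>` and `<BACKSPACE: n>`, and that backspaces reaching past the start of the string simply stop there; `apply_backspaces` is a hypothetical name. The `str.replace` call is the "string replace" step: after it, only the counted form has to be handled.

```python
# Sketch of the find-based approach: normalise every plain tag to the
# counted form, then repeatedly locate a tag, read its count, and splice
# the string back together without the tag and the n characters before it.
PLAIN_TAG = "<BACKSPACE>"
COUNTED_PREFIX = "<BACKSPACE:"


def apply_backspaces(text):
    # Step 1: turn every <BACKSPACE> into <BACKSPACE: 1> so only one case remains.
    text = text.replace(PLAIN_TAG, "<BACKSPACE: 1>")
    while True:
        start = text.find(COUNTED_PREFIX)
        if start == -1:
            return text
        end = text.find(">", start)
        if end == -1:
            # Malformed tag with no closing '>': leave the rest untouched.
            return text
        # Characters between ':' and '>' hold the count, e.g. " 4".
        count = int(text[start + len(COUNTED_PREFIX):end])
        # Delete the tag and the `count` characters before it (not past the start).
        cut = max(start - count, 0)
        text = text[:cut] + text[end + 1:]


print(apply_backspaces("repatre<BACKSPACE: 4>resent"))  # represent
print(apply_backspaces("re <BACKSPACE>present"))        # represent
```

Each pass removes one tag, so the loop always finishes; a count that isn't a number raises `ValueError` from `int()`.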

Did this help, or was my post confusing?
-kudos

Originally posted by Joe1986

Hi there, I'm trying to create a Python program that can read a text file line by line, search for specified words/text/strings and remove them from the text file, then finally save the modified text to an output file. The only problem is that the text file contains a code for a BACKSPACE typed in the text, e.g. "<BACKSPACE>". This needs to be removed, which sounds quite simple, but often there are numbers involved, e.g. "<BACKSPACE: 4>", which represents 4 backspaces, so the code needs to be removed and 4 backspaces applied to the text before it. For example, "repatre<BACKSPACE: 4>resent" would become "represent". I know a search-and-delete script that can search for a word at the beginning of a line and then delete the whole line, but I'm not sure how to do this. Any ideas would be great.
Cheers,
Joe
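A different technique that fits the behaviour described above: replay the text as keystrokes in a single left-to-right pass, keeping a buffer of typed characters and deleting the last n of them at each tag. This is a sketch, assuming the tag syntax shown in the post (`<BACKSPACE>` and `<BACKSPACE: n>`, optionally with spaces after the colon) and that extra backspaces at the very start of the text are ignored; `replay_keystrokes` and `TAG` are hypothetical names.

```python
import re

# Matches <BACKSPACE> and <BACKSPACE: n>; group 1 is the optional count.
TAG = re.compile(r"<BACKSPACE(?::\s*(\d+))?>")


def replay_keystrokes(text):
    """Rebuild typed text by treating each tag as n presses of Backspace."""
    typed = []  # characters that survive so far
    pos = 0
    for m in TAG.finditer(text):
        # Ordinary characters before the tag are "typed" as-is.
        typed.extend(text[pos:m.start()])
        count = int(m.group(1)) if m.group(1) else 1
        # Each backspace removes the most recently typed character, if any.
        del typed[max(len(typed) - count, 0):]
        pos = m.end()
    # Whatever follows the last tag is typed unchanged.
    typed.extend(text[pos:])
    return "".join(typed)


print(replay_keystrokes("repatre<BACKSPACE: 4>resent"))  # represent
```

Because a tag only ever removes characters typed before it, one pass over the matches is enough and nothing has to be rescanned after a deletion.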

**Joe1986** · Jan 23 '08, 11:30 AM


Hi there,

I'm a little confused. I'm not sure I understand the string replace method you mentioned at the end of your post.
cheers,
Joe

**Joe1986** · Jan 23 '08, 02:54 PM

Hi,
Here is a solution. I can write the results to a txt file, which is great. All I need to do now is read from a text file, which I'm guessing will just go in place of the 'tests' list in the code, and iterate this code so that I can run through a whole paragraph containing multiple <BACKSPACE> sections and remove all of them (rather than stopping after the first one). Looking at this, would you have any ideas how to do such a thing? Any help would be great. Cheers. Joe

Code:

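#! /usr/bin/python

import re

# Matches <BACKSPACE> or <BACKSPACE: n>; group 1 is the optional ": n" part
bs = re.compile('<BACKSPACE(:[ ]*[0-9]+)?>')

#theInFile = open("test2.txt", "r")

tests = ['re <BACKSPACE>present', 'This is bound to repatreaaaabbbbcccc <BACKSPACE: 17>resent']

def bs_remove(s):
    # Each branch returns straight away, so only the first tag in s is handled
    for m in bs.finditer(s):
        if m.groups()[0] is None:
            # Plain <BACKSPACE>: drop the tag and one character before it
            return s[:m.start() - 1] + s[m.end():]
        else:
            # <BACKSPACE: n>: skip the ':' and drop n characters before the tag
            return s[:m.start() - int(m.groups()[0][1:])] + s[m.end():]
    # No tag found: return the string unchanged (instead of None)
    return s

# Write each processed test string to the output file; 'with' closes it
with open("backspace_out.txt", "w") as theOutFile:
    for s in tests:
        theOutFile.write(bs_remove(s) + " ")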

**bvdet** · Jan 24 '08, 05:13 PM


To account for single or multiple occurrences of backspaces in each line:
[code=Python]# <BACKSPACE>     1 backspace
# <BACKSPACE: 4>  4 backspaces

import re

def remove_bs(s):
    patt = re.compile(r'<BACKSPACE:? ?(\d+)?>')
    # Search again after every edit until no tag is left
    while True:
        m = patt.search(s)
        if m:
            if m.group(1):
                start = m.start()-int(m.group(1))
            else:
                start = m.start()-1
            # Don't let a large count wrap around to a negative index
            start = max(start, 0)
            s = ''.join([s[:start], s[m.end():]])
        else:
            break
    return s

def parse_file(fn):
    with open(fn) as f:
        return ''.join([remove_bs(line) for line in f])

fn = 'input.txt'
fnOut = 'output.txt'
f = open(fnOut, 'w')
f.write(parse_file(fn))
f.close()[/code]

**thisissuma** · Feb 2 '08, 06:21 AM

Hi there, I'm trying to create a Python program that can read a text file line by line, search for specified words/text/strings and place them in another text file, where the line length should be 65 characters and the number of lines per page is 70.
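A sketch of one way to do this, using the standard `textwrap` module to wrap at 65 characters and grouping the wrapped lines into pages of 70. The details are assumptions, since the post doesn't specify them: a line is selected if it contains any of the search words as a substring, and pages are separated by a form feed (`\f`). `select_lines`, `paginate` and `extract_to_file` are hypothetical names.

```python
import textwrap

LINE_WIDTH = 65       # maximum characters per output line
LINES_PER_PAGE = 70   # output lines before a page break


def select_lines(lines, words):
    """Yield input lines that contain any of the search words."""
    for line in lines:
        if any(word in line for word in words):
            yield line.rstrip("\n")


def paginate(lines, width=LINE_WIDTH, per_page=LINES_PER_PAGE):
    """Wrap lines to `width` and group them into pages of `per_page` lines."""
    wrapped = []
    for line in lines:
        # textwrap.wrap returns [] for an empty line; keep it as a blank line.
        wrapped.extend(textwrap.wrap(line, width) or [""])
    return [wrapped[i:i + per_page] for i in range(0, len(wrapped), per_page)]


def extract_to_file(in_name, out_name, words):
    """Copy matching lines from in_name to out_name, wrapped and paginated."""
    with open(in_name) as src:
        pages = paginate(select_lines(src, words))
    with open(out_name, "w") as dst:
        # A form feed ("\f") is assumed as the page separator.
        dst.write("\f".join("\n".join(page) + "\n" for page in pages))
```

For example, `extract_to_file("input.txt", "output.txt", ["python", "file"])` would write the matching lines to `output.txt` (the file names and words are placeholders). Note that `textwrap.wrap` also normalises whitespace (tabs and newlines become spaces) and splits words longer than 65 characters.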
