Regular Expression Help, getting over the newline \n

**bvdet** · May 7 '07, 10:11 PM

Originally posted by BLaw

Hello all,

I am trying to parse an HTML file, but every time my regex bumps into a newline character it stops matching. How do I get past the newline and keep grabbing text until the next paragraph starts? When I try re.DOTALL, it is too greedy and grabs the paragraph dividers as well. Thanks so much!

Code:

import re

# Sample HTML text:
text = '<p>&nbsp;&nbsp;&nbsp;We operate forever. \nWe will become Representatives. \n<p>&nbsp;&nbsp;&nbsp; Any conference'

# My regex (without re.DOTALL, the (.*) group stops at the first newline):
results = open("results.txt", "a")
speechPattern = re.compile(r'''
<p>&nbsp;&nbsp;&nbsp;
(.*)
''', re.VERBOSE)
test = speechPattern.findall(text)
results.writelines(test)
results.close()

Thanks again!

Law
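A quick way to reproduce both behaviors described in the question, using the sample text from the post (the variable names here are only for illustration): by default . does not match \n, so each capture ends at the first newline; with re.DOTALL, a greedy .* keeps going to the end of the string and swallows the next divider.

```python
import re

# Sample text from the question
text = '<p>&nbsp;&nbsp;&nbsp;We operate forever. \nWe will become Representatives. \n<p>&nbsp;&nbsp;&nbsp; Any conference'
pattern = r'<p>&nbsp;&nbsp;&nbsp;(.*)'

# Default flags: '.' does not match '\n', so each capture ends at the first newline
without_dotall = re.findall(pattern, text)
print(without_dotall)  # ['We operate forever. ', ' Any conference']

# re.DOTALL: '.' also matches '\n', and the greedy '.*' runs to the end of the string,
# so the second <p> divider ends up inside the first (and only) capture
with_dotall = re.findall(pattern, text, re.DOTALL)
print(with_dotall)
```

Note that the first result silently loses "We will become Representatives.", which is the newline problem from the question.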

If you must have a regex solution, this will not help:

Code:

>>> text = '<p>&nbsp;&nbsp;&nbsp;We operate forever. \nWe will become Representatives. \n<p>&nbsp;&nbsp;&nbsp; Any conference'
>>> [s.strip() for s in text.replace('\n', '').split('<p>&nbsp;&nbsp;&nbsp;') if s != '']
['We operate forever. We will become Representatives.', 'Any conference']
>>>
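For comparison, the same split-then-clean idea can be written with re.split. This is a sketch that assumes the divider is <p> followed by any number of &nbsp; entities, and it collapses runs of whitespace rather than only removing \n.

```python
import re

# Sample text from the question
text = '<p>&nbsp;&nbsp;&nbsp;We operate forever. \nWe will become Representatives. \n<p>&nbsp;&nbsp;&nbsp; Any conference'

# Split on each divider; the leading divider produces an empty first chunk
chunks = re.split(r'<p>(?:&nbsp;)*', text)

# Drop empty chunks and collapse internal whitespace (including '\n') to single spaces
paragraphs = [' '.join(c.split()) for c in chunks if c.strip()]
print(paragraphs)  # ['We operate forever. We will become Representatives.', 'Any conference']
```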

**BLaw** · May 7 '07, 11:52 PM

A great example of my beginner's eyes not seeing a better way; thanks so much!

**ghostdog74** · May 8 '07, 12:06 AM

To make . match newlines over multiple lines, add re.DOTALL to your re.compile() statement (re.M only changes how ^ and $ behave, so it is harmless but not required for this), e.g.
re.compile("reg exp", re.DOTALL|re.M)
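DOTALL on its own still leaves the greediness problem from the question. A sketch of one common fix, assuming each paragraph starts with <p> followed by zero or more &nbsp; entities: combine DOTALL with a non-greedy (.*?) and a lookahead that stops at the next <p> or at the end of the string. The whitespace-collapsing step is an added assumption about the desired output.

```python
import re

# Sample text from the question
text = '<p>&nbsp;&nbsp;&nbsp;We operate forever. \nWe will become Representatives. \n<p>&nbsp;&nbsp;&nbsp; Any conference'

speech_pattern = re.compile(r'''
    <p>(?:&nbsp;)*    # paragraph divider plus any leading &nbsp; entities
    (.*?)             # non-greedy: capture as little as possible...
    (?=<p>|\Z)        # ...stopping just before the next <p> or at the end of the string
''', re.VERBOSE | re.DOTALL)

# Collapse runs of whitespace (including the embedded newlines) into single spaces
paragraphs = [' '.join(p.split()) for p in speech_pattern.findall(text)]
print(paragraphs)  # ['We operate forever. We will become Representatives.', 'Any conference']
```

Because the lookahead does not consume the next <p>, findall can start its next match right at that divider instead of skipping it.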

**bartonc** · May 25 '07, 03:30 PM

This is worth a second look, so I'm bumping the thread. I'm studying regular expressions at the moment, and it bugged me that I didn't know the answer. Then, out of the blue, while reading Mastering Regular Expressions, it came to me (Python has a DOTALL flag). Thanks, GD, for your succinct expertise on this matter.

**ghostdog74** · May 26 '07, 02:09 AM

hey bc no prob...:) yup that book is good.
