Hello all,
I am trying to parse an HTML file, but every time my regex reaches a newline character, the match stops. How do I get past the newline and keep grabbing text until the next paragraph (<p>) starts? When I try re.DOTALL, the match is too greedy and grabs the paragraph dividers (<p> tags) as well. Thanks so much!
Thanks again!
Code:
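    import re

    # Sample HTML text:
    text = '<p> We operate forever. \nWe will become Representatives. \n<p> Any conference'

    # My regex (re.VERBOSE ignores the whitespace inside the pattern, so this is just <p>(.*)).
    # Without re.DOTALL, "." does not match "\n", so each match stops at the first newline.
    speechPattern = re.compile(r'''
        <p>
        (.*)
    ''', re.VERBOSE)
    test = speechPattern.findall(text)

    # Append the matches to results.txt, one per line
    with open("results.txt", "a") as results:
        results.writelines(match + "\n" for match in test)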
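One possible fix, offered as a sketch rather than a definitive answer: keep re.DOTALL so "." can cross newlines, but make the group lazy (`.*?`) and end each match with a lookahead for the next `<p>` or the end of the string, so the divider is never swallowed. This assumes, as in the sample, that paragraphs are separated only by bare `<p>` tags with no closing `</p>`; the names `speech_pattern`, `paragraphs` and `cleaned` are just illustrative.

```python
import re

# Sample input from the post; each "\n" here is a real newline character.
text = '<p> We operate forever. \nWe will become Representatives. \n<p> Any conference'

# re.DOTALL: "." also matches "\n".
# (.*?): lazy, so the group stays as short as possible.
# (?=<p>|\Z): stop just before the next <p> (or at the end of the string)
# without consuming it, so the next match can start there.
speech_pattern = re.compile(r'<p>\s*(.*?)\s*(?=<p>|\Z)', re.DOTALL)

paragraphs = speech_pattern.findall(text)

# Optional: collapse the embedded newlines (and any runs of whitespace)
# into single spaces, i.e. "skip" the newline.
cleaned = [" ".join(p.split()) for p in paragraphs]

for paragraph in cleaned:
    print(paragraph)
```

With the sample text this should give two entries, one per `<p>`, with the first paragraph's two lines joined by a single space in `cleaned`.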
Law
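If the real files are messier than the sample (attributes on the tags, closing `</p>` tags, comments), a regex tends to get fragile. A minimal alternative sketch uses the standard library's `html.parser.HTMLParser`; `ParagraphCollector` is a hypothetical helper, and it assumes a new paragraph begins at every `<p>` start tag and that text before the first `<p>` can be ignored.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: gathers the text after each <p> tag into its own string."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # Every <p> start tag (closed or not) opens a new paragraph.
        if tag == "p":
            self.paragraphs.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks; append it to the current paragraph.
        # Anything before the first <p> is ignored.
        if self.paragraphs:
            self.paragraphs[-1] += data


text = '<p> We operate forever. \nWe will become Representatives. \n<p> Any conference'

collector = ParagraphCollector()
collector.feed(text)
collector.close()  # flush any text still buffered at the end of the input

# Collapse newlines and extra whitespace inside each paragraph.
paragraphs = [" ".join(p.split()) for p in collector.paragraphs]
print(paragraphs)
```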