Can anyone recommend a good HTML/XHTML parser, similar to
HTMLParser.HTMLParser or htmllib.HTMLParser, but able to intelligently
know that certain tags, like <br>, are implicitly closed? I need to
iterate through the entire DOM, building up a DOM path, but the stdlib
parsers aren't calling handle_endtag() for any implicitly closed tags.
I looked at BeautifulSoup, but it only seems to work by first parsing
the entire document, then allowing you to query the document
afterwards. I need something like a SAX parser.
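
To make the requirement concrete, here is a minimal sketch of the event-driven behaviour I'm after. It assumes Python 3, where the HTMLParser module is named html.parser. PathParser, VOID_TAGS, stack, and paths are hypothetical names made up for this illustration. It only covers void elements such as <br>, not tags with optional end tags like <p> or <li>, so it is a rough workaround rather than a real answer.

```python
from html.parser import HTMLParser

# Void elements never take an end tag (names taken from the HTML spec).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathParser(HTMLParser):
    """Hypothetical subclass that tracks the current DOM path as it parses."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, outermost first
        self.paths = []  # DOM path recorded at every start tag

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.append("/".join(self.stack))
        if tag in VOID_TAGS:
            # The stdlib parser reports no end tag here, so synthesize one.
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        # Ignore end tags with no matching open element (e.g. a stray </br>).
        if tag in self.stack:
            # Pop everything up to and including the matching element.
            while self.stack.pop() != tag:
                pass


parser = PathParser()
parser.feed('<html><body><p>one<br>two</p>'
            '<div><img src="x"><span>hi</span></div></body></html>')
parser.close()
for path in parser.paths:
    print(path)
```

Without the VOID_TAGS check, the <br> and <img> would stay on the stack and every later path would be wrong, which is exactly the problem I am hitting.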