lxml question

**Mark Thomas** · Sep 26 '08, 04:55 PM

Re: lxml question

On Sep 26, 11:19 am, Uwe Schmitt <rocksportroc...@googlemail.com>
wrote:

I have to parse some text which pretends to be XML. lxml does not want
to parse it, because it lacks a root element.
I think that this situation is not unusual, so: is there a way to
force lxml to parse it?

By "pretends to be XML" you mean XML-like but not really XML?

My workaround is wrapping the text with "<root>...</root>" before
feeding it to lxml's parser.

That's actually not a bad solution, if you know that the document is
otherwise well-formed. Another thing you can do is use libxml2's
"recover" mode which accommodates non-well-formed XML.

from lxml import etree

parser = etree.XMLParser(recover=True)
tree = etree.XML(your_xml_string, parser)

You'll still need to use your wrapper root element, because recover
mode will ignore everything after the first root closes (and it won't
throw an error).

-- Mark.
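
For reference, the two suggestions above (a wrapper root element plus recover mode) can be combined roughly as follows. This is only a sketch: `parse_fragments` and its `wrapper` argument are hypothetical names, not part of lxml, and the fallback to the standard library's ElementTree (which has no recover mode) is an assumption added so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree            # the library discussed in this thread
    HAVE_LXML = True
except ImportError:
    import xml.etree.ElementTree as etree   # stdlib fallback, no recover mode
    HAVE_LXML = False


def parse_fragments(text, wrapper="root"):
    """Wrap rootless XML-like text in a synthetic root element and parse it.

    `wrapper` is a hypothetical tag name; pick one that cannot clash with
    tags in the input.
    """
    wrapped = "<%s>%s</%s>" % (wrapper, text, wrapper)
    if HAVE_LXML:
        # recover=True asks libxml2 to tolerate non-well-formed input
        parser = etree.XMLParser(recover=True)
        return etree.fromstring(wrapped, parser)
    # Without lxml, the wrapped text must be well-formed
    return etree.fromstring(wrapped)


root = parse_fragments("<a>1</a><b>2</b>")
print([child.tag for child in root])
```

Note that this simple wrapping assumes the text does not start with an XML declaration (`<?xml ...?>`), since a declaration is only allowed at the very beginning of a document.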

**alex23** · Sep 27 '08, 01:15 AM

Re: lxml question

On Sep 27, 1:19 am, Uwe Schmitt <rocksportroc...@googlemail.com>
wrote:

I have to parse some text which pretends to be XML. lxml does not want
to parse it, because it lacks a root element.

Another option is BeautifulSoup, which handles badly formed XML really
well:

Beautiful Soup: We called him Tortoise because he taught us.

http://www.crummy.com/software/BeautifulSoup/

**Stefan Behnel** · Oct 3 '08, 04:25 PM

Re: lxml question

Uwe Schmitt wrote:

I have to parse some text which pretends to be XML. lxml does not want
to parse it, because it lacks a root element.
I think that this situation is not unusual, so: is there a way to
force lxml to parse it?

My workaround is wrapping the text with "<root>...</root>" before
feeding it to lxml's parser.

Yes, you can do that. To avoid creating an intermediate string, you can use
the feed parser and do something like this:

from lxml import etree

parser = etree.XMLParser()
parser.feed("<root>")
parser.feed(your_xml_tag_sequence_data)
parser.feed("</root>")
root = parser.close()

Stefan
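
A small, self-contained variation on the feed-parser idea above, in case the data arrives in several pieces. `parse_chunks` and `wrapper` are hypothetical names used for illustration only, and the fallback import is an assumption so the sketch also runs without lxml (the standard library's `XMLParser` offers the same `feed()`/`close()` calls).

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree   # same feed()/close() interface


def parse_chunks(chunks, wrapper="root"):
    """Feed a synthetic opening tag, the data chunks, then the closing tag.

    No intermediate "<root>" + data + "</root>" string is built.
    """
    parser = etree.XMLParser()
    parser.feed("<%s>" % wrapper)
    for chunk in chunks:
        parser.feed(chunk)          # chunks may split a tag in the middle
    parser.feed("</%s>" % wrapper)
    return parser.close()           # returns the synthetic root element


root = parse_chunks(["<item>1</item>", "<item>2</it", "em>"])
print([child.text for child in root])
```

Because the parser consumes the input incrementally, the chunks do not need to line up with element boundaries.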
