DOM with HTML

**F. GEIGER** · Jul 18 '05, 12:14 AM

Re: DOM with HTML

> Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
> but unfortunately is *not* xhtml valid.. If I try to parse it with

I use mx.Tidy in such cases, with great success.

Cheers
Franz

"Alessio Pace" <puccio_13@yahoo.it> wrote in message
news:3GbMa.4404$FI4.118833@tornado.fastwebnet.it...
> Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
> but unfortunately is *not* xhtml valid.. If I try to parse it with
> xml.dom.minidom I get error with expat (as I supposed), so I was told to
> try in this way, with a "forgiving" html parser:
>
> from xml.dom.ext.reader import HtmlLib
> reader = HtmlLib.Reader()
> dom = reader.fromUri(url) # 'url' the web page
>
> FIRST ISSUE:
> It seemed to me, reading the source code in
> $MY_PYTHON_INSTALLATION_DIR/site-packages/_xmlplus/dom/ext/reader/ ,
> that these are 4DOM APIs, so from what I know of python distributions, they
> are extra packages, or not? I would like to use *only* libs that are
> available in the python2.2 suite, not any extra.
>
> SECOND ISSUE:
> If the above libs were included in python (and so I would continue using
> them), how do I print a string representation of a (sub) tree of the DOM? I
> tried with .toxml() as in the XML tutorial but that method does not exist
> for the FtNode objects that are involved there... Any idea??
>
> Thanks so much for who can help me
>
> --
> bye
> Alessio Pace
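
For anyone reading this later who, like the original poster, wants to stay within the standard library: below is a minimal sketch for a current Python 3, not something posted in this thread and not a replacement for mx.Tidy. It feeds sloppy HTML through html.parser.HTMLParser and builds an xml.dom.minidom tree, so .toxml() works on any subtree. The names MiniDomBuilder, parse_html, SAMPLE and the synthetic <root> wrapper element are hypothetical, chosen only for illustration. It does not apply HTML's implied end-tag rules (an unclosed <p> followed by another <p> simply nests), so badly broken pages will probably still want a real tidying step first.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MiniDomBuilder(HTMLParser):
    """Hypothetical helper: build a minidom tree from not-quite-valid HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.document = minidom.Document()
        # Synthetic wrapper (an assumption of this sketch) so that stray
        # text or several top-level elements still have a legal parent.
        root = self.document.createElement("root")
        self.document.appendChild(root)
        self._open = [root]  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        element = self.document.createElement(tag)
        for name, value in attrs:
            # Valueless attributes such as "checked" get their own name.
            element.setAttribute(name, name if value is None else value)
        self._open[-1].appendChild(element)
        if tag not in VOID_TAGS:
            self._open.append(element)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and everything inside it;
        # an end tag with no matching start tag is silently ignored.
        for index in range(len(self._open) - 1, 0, -1):
            if self._open[index].tagName == tag:
                del self._open[index:]
                break

    def handle_data(self, data):
        self._open[-1].appendChild(self.document.createTextNode(data))


def parse_html(text):
    """Return a minidom Document for the given HTML string."""
    builder = MiniDomBuilder()
    builder.feed(text)
    builder.close()
    return builder.document


# Sample markup (made up): unquoted attribute, unclosed <br>, bare "&",
# a <b> that is never closed, and a missing </html>.
SAMPLE = ('<html><body><p class=intro>Hello<br>world & friends</p>'
          '<div><b>bold</div></body>')

doc = parse_html(SAMPLE)
first_paragraph = doc.getElementsByTagName("p")[0]
print(first_paragraph.toxml())  # string form of just this subtree
```

Calling toxml() on doc itself instead of on an element would also emit the XML declaration and the synthetic <root> wrapper.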
