HTMLDocument and Xpath

**Alan Kennedy** · Feb 3 '06, 12:15 PM

Re: HTMLDocument and Xpath

[swilson@acs.on.ca]
> Hi, I want to use XPath to scrape info from a website using PyXML, but I
> keep getting no results.
>
> For example, in the following, I want to return the text "Element1", but I
> can't get XPath to return anything at all. What's wrong with this
> code?

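Your XPath expression is wrong.

> test = Evaluate('td', doc_node.documentElement)

A bare 'td' is a relative location path, so it only matches td elements that are direct children of the context node (here the document element, i.e. <html>). Try one of the following alternatives, all of which should work.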

test = Evaluate('//td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td', doc_node.documentElement)
test = Evaluate('/html/body/table/tr/td[1]', doc_node.documentElement)
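PyXML is long unmaintained, so the sketch below swaps in the standard library's xml.etree.ElementTree (which supports only a subset of XPath) to show the same relative-versus-descendant distinction. The page markup is a hypothetical stand-in, since the original HTML isn't shown in the thread. ElementTree paths are evaluated relative to the element you call findall() on, so the absolute '/html/body/...' form becomes 'body/...' from the <html> root, and './/td' plays the role of '//td'.

```python
import xml.etree.ElementTree as ET  # stand-in for PyXML's Evaluate (assumption)

# Hypothetical well-formed stand-in for the scraped page.
PAGE = (
    "<html><body><table><tr>"
    "<td>Element1</td><td>Element2</td>"
    "</tr></table></body></html>"
)

root = ET.fromstring(PAGE)  # root is the <html> element, i.e. the context node

# A bare 'td' is relative: it only looks at direct children of <html>.
relative = root.findall("td")

# './/td' searches every descendant, like '//td' in full XPath.
anywhere = root.findall(".//td")

# Spell out the path below <html>; [1] picks the first cell in its row.
explicit = root.findall("body/table/tr/td")
first = root.findall("body/table/tr/td[1]")

print([e.text for e in relative])  # []
print([e.text for e in anywhere])  # ['Element1', 'Element2']
print([e.text for e in explicit])  # ['Element1', 'Element2']
print([e.text for e in first])     # ['Element1']
```

The empty result for the bare 'td' is the same symptom as in the original question: the expression is valid, it just never looks below the <html> element.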

HTH,

Alan.

**swilson@acs.on.ca** · Feb 3 '06, 02:35 PM

Re: HTMLDocument and Xpath


I tried all of those and in every case, test returns "[]". Does
Evaluate only work with XML documents?

Shawn

**swilson@acs.on.ca** · Feb 7 '06, 10:15 AM

Re: HTMLDocument and Xpath

Got the answer: there's a bug in XPath. I think the HTML parser converts all the tag names (but not the attribute names) to uppercase, so the lowercase names in my expressions never match. XPath definitely does not like my original strings, but these uppercase versions work fine:

test = Evaluate('//TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD', doc_node.documentElement)
test = Evaluate('/HTML/BODY/TABLE/TR/TD[1]', doc_node.documentElement)
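To illustrate why only the uppercase versions match, here is a sketch that again uses the standard library's xml.etree.ElementTree as a stand-in for PyXML, on a hypothetical tree whose tag names are already uppercase (as Shawn describes) while attribute names stay lowercase. Element name tests compare tag names exactly, so the case has to match. find_tags_ignoring_case is a hypothetical helper, not part of any library; it shows one way to sidestep the problem when you can't be sure which case the parser will produce.

```python
import xml.etree.ElementTree as ET  # stand-in for PyXML's Evaluate (assumption)

# Hypothetical tree mimicking an HTML DOM that reports tag names in
# uppercase while keeping attribute names lowercase.
PAGE = (
    '<HTML><BODY><TABLE><TR>'
    '<TD class="cell">Element1</TD><TD class="cell">Element2</TD>'
    '</TR></TABLE></BODY></HTML>'
)

root = ET.fromstring(PAGE)

# Name tests are case-sensitive, so lowercase names find nothing.
lower = root.findall(".//td")

# Matching the case the parser actually produced works.
upper = root.findall(".//TD")
first = root.findall("BODY/TABLE/TR/TD[1]")

# Attribute names were left lowercase, so the predicate uses '@class'.
by_attr = root.findall(".//TD[@class='cell']")


def find_tags_ignoring_case(element, name):
    """Hypothetical helper: descendants (or self) whose tag equals name, ignoring case."""
    wanted = name.lower()
    return [e for e in element.iter() if e.tag.lower() == wanted]


either = find_tags_ignoring_case(root, "td")

print(len(lower))               # 0
print([e.text for e in upper])  # ['Element1', 'Element2']
print([e.text for e in first])  # ['Element1']
print(len(by_attr))             # 2
print([e.text for e in either]) # ['Element1', 'Element2']
```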

Shawn
