Hi Paul...
Thanks for the reply. Came to the same conclusion a few minutes before I saw
your email.
Another question:
tr=d.xpath(foo)
gets me an array of nodes.
is there a way for me to then iterate through the node tr[x] to see if a
child node exists???
"d" is a document object, while "tr" would be a node object?, or would i
convert the "tr[x]" to a string, and then feed that into the
libxml2dom.parseString()...
thanks
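
A minimal sketch of the idea in the question above: checking each node of a query result for a child element. It uses the standard library's xml.etree.ElementTree as a stand-in, not libxml2dom, so the calls differ from the ones in this thread. The sample markup and the helper name has_child are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the real page is not reproduced here.
SAMPLE = """
<table>
  <tr><td>first</td><td><a href="#">link</a></td></tr>
  <tr><td>second</td></tr>
</table>
"""


def has_child(node, tag):
    """Return True if `node` contains a descendant element named `tag`."""
    return node.find(".//" + tag) is not None


doc = ET.fromstring(SAMPLE)
rows = doc.findall("./tr")  # a list of element nodes, one per table row

# Each item in the list is itself a node, so it can be queried directly.
for index, row in enumerate(rows):
    print(index, has_child(row, "a"))
```

In this sketch the rows are queried as nodes, with no need to serialise one to a string and parse it again.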
-----Original Message-----
From: python-list-bounces+bedouglas=earthlink.net@python.org
[mailto:python-list-bounces+bedouglas=earthlink.net@python.org]On Behalf
Of Paul Boddie
Sent: Friday, June 13, 2008 12:49 PM
To: python-list@python.org
Subject: Re: python screen scraping/parsing
On 13 Jun, 20:10, "bruce" <bedoug...@earthlink.net> wrote:
[...]
"/html/body/div[@id='pgSiteContainer']/div[@id='pgPageContent']/table[2]/tbo
[...]
Yes, I can confirm this.
no
Yes, but the DOM tool in Firefox probably inserts virtual nodes for
its own purposes. Remember that it has to do a lot of other stuff like
implement CSS rendering and DOM event models.
You can confirm that there really is no tbody by printing the result
of this...
d.xpath("/html/body/div[@id='pgSiteContainer']/
div[@id='pgPageContent']/table[2]")[0].toString()
This should fetch the second table in a single element list and then
obviously give you the only element of that list. You'll see that the
raw HTML doesn't have any tbody tags at all.
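
The same check can be sketched with the standard library's xml.etree.ElementTree, used here as a stand-in for libxml2dom. The sample markup is invented for illustration and is not the page discussed in this thread. It shows that a path step naming an element that is absent from the source matches nothing, while the same path without that step finds the rows.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with two tables and no tbody elements in the source.
SAMPLE = """
<html><body>
  <table><tr><td>ignored</td></tr></table>
  <table><tr><td>a</td></tr><tr><td>b</td></tr></table>
</body></html>
"""

doc = ET.fromstring(SAMPLE)

# A single-element list holding the second table; [0] takes that element.
second_table = doc.findall("./body/table[2]")[0]

# Print the raw markup of the table to inspect which tags are really there.
print(ET.tostring(second_table, encoding="unicode"))

with_tbody = second_table.findall("./tbody/tr")  # step not present in source
without_tbody = second_table.findall("./tr")     # rows directly under table

print(len(with_tbody), len(without_tbody))
```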
Paul
--
> url ="http://www.pricegrabber.com/rating_summary.php/page=1"
> tr =
> dy/tr[4]"
>
> tr_=d.xpath(tr)
>
> my issue appears to be related to the last "tbody", or tbody/tr[4]...
>
> if i leave off the tbody, i can display data, as the tr_ is an array with
> data...
>
> with the "tbody" it appears that the tr_ array is not defined, or it has
> data... however, i can use the DOM tool with firefox to observe the fact
> that the "tbody" is there...