RE: python screen scraping/parsing

bruce
#1


Jun 27 '08, 04:28 PM

Hi Paul...

Thanks for the reply. Came to the same conclusion a few minutes before I saw
your email.

Another question:

tr=d.xpath(foo)

gets me an array of nodes.

is there a way for me to then iterate through the node tr[x] to see if a
child node exists???

"d" is a document object, while "tr" would be a node object?, or would i
convert the "tr[x]" to a string, and then feed that into the
libxml2dom.parseString()...

thanks

-----Original Message-----
From: python-list-bounces+bedouglas=earthlink.net@python.org
[mailto:python-list-bounces+bedouglas=earthlink.net@python.org] On Behalf
Of Paul Boddie
Sent: Friday, June 13, 2008 12:49 PM
To: python-list@python.org
Subject: Re: python screen scraping/parsing

On 13 Jun, 20:10, "bruce" <bedoug...@earthlink.net> wrote:

>
> url = "http://www.pricegrabber.com/rating_summary.php/page=1"

[...]

> tr = "/html/body/div[@id='pgSiteContainer']/div[@id='pgPageContent']/table[2]/tbody/tr[4]"
>
> tr_ = d.xpath(tr)

[...]

> my issue appears to be related to the last "tbody", or tbody/tr[4]...
>
> if i leave off the tbody, i can display data, as the tr_ is an array with
> data...

Yes, I can confirm this.

> with the "tbody" it appears that the tr_ array is not defined, or it has no
> data... however, i can use the DOM tool with firefox to observe the fact
> that the "tbody" is there...

Yes, but the DOM tool in Firefox probably inserts virtual nodes for
its own purposes. Remember that it has to do a lot of other stuff like
implement CSS rendering and DOM event models.

You can confirm that there really is no tbody by printing the result
of this...

    d.xpath("/html/body/div[@id='pgSiteContainer']/div[@id='pgPageContent']/table[2]")[0].toString()

This should fetch the second table in a single element list and then
obviously give you the only element of that list. You'll see that the
raw HTML doesn't have any tbody tags at all.
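
The same check can be made without a browser by listing the tags that actually occur in the raw markup. The sketch below is only an illustration: it uses the standard library's html.parser rather than libxml2dom, and SAMPLE_HTML, TagCollector and tags_in are hypothetical names standing in for the downloaded page and the helper code.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        self.tags.append(tag)


def tags_in(html_text):
    """Return the start tags found in html_text, in document order."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tags


# Hypothetical stand-in for the downloaded page: rows sit directly under <table>.
SAMPLE_HTML = "<html><body><table><tr><td>4.5</td></tr></table></body></html>"

found = tags_in(SAMPLE_HTML)
has_tbody = "tbody" in found
print(found)
print("tbody present in raw markup:", has_tbody)
```

If "tbody" is missing from the collected tags, an XPath expression copied from a browser's DOM view needs that step removed before it will match the parsed source.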

Paul
Paul Boddie
#2

Jun 27 '08, 04:28 PM

Re: python screen scraping/parsing

On 13 Jun, 23:09, "bruce" <bedoug...@earthlink.net> wrote:

> Thanks for the reply. Came to the same conclusion a few minutes before I saw
> your email.
>
> Another question:
>
> tr=d.xpath(foo)
>
> gets me an array of nodes.
>
> is there a way for me to then iterate through the node tr[x] to see if a
> child node exists???

You can always use the DOM or perform another XPath query:

    for node in tr[x].childNodes:
        pass  # do something with node

    for node in tr[x].xpath(some_other_query_inside_tr):
        pass  # do something with node

> "d" is a document object, while "tr" would be a node object?, or would i
> convert the "tr[x]" to a string, and then feed that into the
> libxml2dom.parseString()...

There's no need to parse anything again: just use the methods on the
object that tr[x] produces, including the xpath method, of course.
Remember that the document object is just a special node object, so
most of the methods are available on both. If in doubt, run your
program using Python's -i option and then inspect the objects at the
interactive prompt.
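
As a rough, self-contained illustration of checking each node for children through the DOM, the sketch below uses the standard library's xml.dom.minidom as a stand-in for libxml2dom. This is an assumption made so the example runs anywhere: minidom exposes the same DOM attribute names (childNodes, nodeType, nodeName) but has no xpath method, so getElementsByTagName takes the place of d.xpath(foo). SAMPLE and element_children are hypothetical names.

```python
from xml.dom import minidom

# Hypothetical sample markup: the first row has two cells, the second has none.
SAMPLE = "<table><tr><td>a</td><td>b</td></tr><tr/></table>"

doc = minidom.parseString(SAMPLE)
rows = doc.getElementsByTagName("tr")  # stands in for tr = d.xpath(foo)


def element_children(node):
    """Return only the element children, skipping text and comment nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


# hasChildNodes() answers "does any child exist?" without a second parse.
has_children = [row.hasChildNodes() for row in rows]
cell_names = [[c.nodeName for c in element_children(row)] for row in rows]

for index, names in enumerate(cell_names):
    print(index, names if names else "no child elements")
```

Filtering on nodeType matters with real pages, because whitespace between tags usually shows up as text nodes in childNodes.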

Paul