Help extracting info from HTML source ..

**Miki** · Jan 26 '07, 01:55 PM

Re: Help extracting info from HTML source ..

Hello Shelton,

> I am learning Python, and have never worked with HTML. However, I would
> like to write a simple script to audit my 100+ Netware servers via their web
> portal.

Always use the right tool. BeautifulSoup
(http://www.crummy.com/software/BeautifulSoup/) is best for web
scraping (IMO).

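# Python 2, BeautifulSoup 3
from urllib import urlopen
from BeautifulSoup import BeautifulSoup

# Fetch the page and parse it
html = urlopen("http://www.python.org").read()
soup = BeautifulSoup(html)
# soup("a") finds every <a> tag; print each link target and its contents
for link in soup("a"):
    print link["href"], "-->", link.contents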
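
For anyone without BeautifulSoup to hand, here is a rough standard-library sketch of the same link-listing idea in Python 3. It is only an illustration: `LinkCollector`, `extract_links` and `SAMPLE_HTML` are made-up names, the sample page is invented so the code runs without a network, and `html.parser` is far less forgiving of broken markup than BeautifulSoup is.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...>
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, inlined so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <a href="/about/">About</a>
  <a name="top">no href, skipped</a>
  <p>See the <a href="http://docs.python.org/">Python <b>docs</b></a>.</p>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, "-->", text)
```

To run it against a live page in Python 3 you would fetch the HTML with `urllib.request.urlopen(url).read()` and decode the bytes before passing them to `extract_links`. Anchors without an `href` are skipped here rather than raising an error.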

HTH,
--
Miki

PythonWise

http://pythonwise.blogspot.com/

If it won't be simple, it simply won't be.

**Nikita the Spider** · Jan 26 '07, 07:05 PM

Re: Help extracting info from HTML source ..

In article <1169819118.201093.267320@h3g2000cwc.googlegroups.com>,
"Miki" <miki.tebeka@gmail.com> wrote:

> Hello Shelton,
>
>> I am learning Python, and have never worked with HTML. However, I would
>> like to write a simple script to audit my 100+ Netware servers via their web
>> portal.
>
> Always use the right tool. BeautifulSoup
> (http://www.crummy.com/software/BeautifulSoup/) is best for web
> scraping (IMO).
>
> from urllib import urlopen
> from BeautifulSoup import BeautifulSoup
>
> html = urlopen("http://www.python.org").read()
> soup = BeautifulSoup(html)
> for link in soup("a"):
>     print link["href"], "-->", link.contents

Agreed. HTML scraping is really complicated once you get into it. It
might be interesting to write such a library just for your own
satisfaction, but if you want to get something done then use a module
that's already written, like BeautifulSoup. Another module that will do
the same job but works differently (and more simply, IMO) is HTMLData by
Connelly Barnes:

http://oregonstate.edu/~barnesc/htmldata/

--
Philip

Nikita the Spider

http://NikitaTheSpider.com/

Whole-site HTML validation, link checking and more
