Regex Help

**Miki** · Sep 23 '08, 09:05 AM

Re: Regex Help

> Hello,
>
> Anybody know of a good regex to parse html links from html code?

BeautifulSoup is *the* library to handle HTML.

from BeautifulSoup import BeautifulSoup
from urllib import urlopen

soup = BeautifulSoup(urlopen("http://python.org/"))
# href=True skips anchors such as <a name="..."> that have no href
for a in soup("a", href=True):
    print a["href"]

HTH,
--
Miki <miki.tebeka@gmail.com>

PythonWise

http://pythonwise.blogspot.com

If it won't be simple, it simply won't be.
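The snippet above is Python 2 code for BeautifulSoup 3 (`from BeautifulSoup import ...`, `urllib.urlopen`, the `print` statement). For readers on Python 3 who want to avoid a third-party install, here is a minimal sketch that swaps in the standard library's `html.parser` module instead of BeautifulSoup. `LinkExtractor` and `extract_links` are hypothetical names introduced for illustration, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs, value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href of each <a> tag in document order (hypothetical helper)."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '''<p><a href="http://somesite.com">one</a>
<A HREF='http://somesite.ph' class="ext">two</A>
<a name="anchor">no href</a></p>'''
    print(extract_links(sample))
    # ['http://somesite.com', 'http://somesite.ph']
```

As with BeautifulSoup, a real parser copes with attribute order, quoting style, and tag-name case, which is where hand-written regexes tend to break.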

**Lawrence D'Oliveiro** · Sep 23 '08, 11:55 PM

Re: Regex Help

In message <mailman.1369.1222101506.3487.python-list@python.org>, Support
Desk wrote:

> Anybody know of a good regex to parse html links from html code? The one I
> am currently using seems to be cutting off the last letter of some links,
> and returning links like
>
> http://somesite.co
>
> or http://somesite.ph
>
> the code I am using is
>
> regex = r'<a href=["|\']([^"|\']+)["|\']>'

Can you post some example HTML sequences that this regexp is not handling
correctly?
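One way to answer that question is to run the pattern against a few sample tags. A minimal sketch, assuming Python 3 and made-up sample HTML; the `fixed` pattern is a hypothetical alternative, not something proposed in the thread. Note that inside a character class `|` is a literal character, not alternation, so `["|\']` also accepts a pipe as a quote and `[^"|\']+` stops at any pipe inside a URL.

```python
import re

# The pattern exactly as posted in the thread.
posted = re.compile(r'<a href=["|\']([^"|\']+)["|\']>')

# Hypothetical alternative: no literal "|" in the character classes,
# any whitespace after "<a", other attributes allowed after href,
# and case-insensitive matching.
fixed = re.compile(r'<a\s+href=["\']([^"\']+)["\'][^>]*>', re.IGNORECASE)

html = '''<a href="http://somesite.com">one</a>
<a href='http://somesite.ph'>two</a>
<a href="http://somesite.org" class="ext">three</a>
<a href="http://example.com/a|b">four</a>'''

print(posted.findall(html))
# ['http://somesite.com', 'http://somesite.ph']
print(fixed.findall(html))
# ['http://somesite.com', 'http://somesite.ph', 'http://somesite.org', 'http://example.com/a|b']
```

On these samples the posted pattern returns whole URLs rather than dropping the last letter; where it fails is an `href` followed by other attributes and a URL containing `|`.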

**Support Desk** · Sep 24 '08, 02:25 PM

RE: Regex Help

Thanks for the reply. I found out the problem was occurring later on in the
script; the regexp works well.

-----Original Message-----
From: Lawrence D'Oliveiro [mailto:ldo@geek-central.gen.new_zealand]
Sent: Tuesday, September 23, 2008 6:51 PM
To: python-list@python.org
Subject: Re: Regex Help

In message <mailman.1369.1222101506.3487.python-list@python.org>, Support
Desk wrote:

> Anybody know of a good regex to parse html links from html code? The one I
> am currently using seems to be cutting off the last letter of some links,
> and returning links like
>
> http://somesite.co
>
> or http://somesite.ph
>
> the code I am using is
>
> regex = r'<a href=["|\']([^"|\']+)["|\']>'

Can you post some example HTML sequences that this regexp is not handling
correctly?

**Lawrence D'Oliveiro** · Sep 25 '08, 09:25 AM

RE: Regex Help

In message <mailman.1450.1222266191.3487.python-list@python.org>, Support
Desk wrote:

> Thanks for the reply ...

A: The vulture doesn't get Frequent Poster miles.
Q: What's the difference between a top-poster and a vulture?
