Does anybody know of a good regex for parsing links out of HTML code? The one I
am currently using seems to be cutting off the last letter of some links,
and returning links like
or http://somesite.ph
The code I am using is:
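import re
import urllib.request

regex = r'<a href=["|\']([^"|\']+)["|\']>'
# Python 3 form of urllib.urlopen(...).read(); assumes the page is UTF-8
page_text = urllib.request.urlopen('http://somesite.com').read().decode('utf-8', errors='replace')
links = re.findall(regex, page_text, re.IGNORECASE)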
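For comparison, here is a sketch of a different technique from the regex above: Python 3's standard-library `html.parser.HTMLParser`, which copes with upper-case tags, other attributes before or after `href`, single or double quotes, and unquoted values. The posted regex only matches when `href` is the only attribute and is followed directly by `>`, and the `|` inside a character class such as `["|\']` is matched as a literal pipe rather than acting as alternation. The names `LinkExtractor`, `extract_links`, and the sample HTML are hypothetical, chosen for illustration only.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs, value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Hypothetical wrapper: feed a whole document and return its hrefs."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Sample input mixing case, extra attributes, and quoting styles
sample = ('<A HREF="http://somesite.com/a">x</A> '
          '<a class="nav" href=\'/b\'>y</a> '
          '<a href=/c>z</a>')
print(extract_links(sample))
```

Relative links such as `/b` come back exactly as written; if absolute URLs are needed, `urllib.parse.urljoin(base_url, link)` can resolve each one against the page's URL.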