urllib2.HTTPError: HTTP Error 204: NoContent
-
silk.odyssey
Philip Semanchuk
Re: urllib2.HTTPError: HTTP Error 204: NoContent
On Oct 19, 2008, at 6:13 AM, silk.odyssey wrote:
Are you changing the user-agent? Some sites sniff user agents and
return different results to browsers than to suspected bots.
I'd try it from here if you post a self-contained sample that
demonstrates the problem. Should only take a couple of lines.
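For reference, the User-Agent that the standard library sends by default can be checked without touching the network. This is a minimal sketch, assuming Python 3, where urllib2 was folded into urllib.request; default_user_agent is a hypothetical helper name, not something from the thread.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent a freshly built opener would send, or None."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs applied to every request
    # made through this opener.
    for name, value in opener.addheaders:
        if name.lower() == "user-agent":
            return value
    return None


if __name__ == "__main__":
    # Prints something of the form Python-urllib/<major>.<minor>
    print(default_user_agent())
```

A site that sniffs user agents can match on that "Python-urllib/" prefix, which is one way a script and a browser can end up with different responses for the same URL.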
-
Mark Sapiro
Re: urllib2.HTTPError: HTTP Error 204: NoContent
On Oct 19, 9:49 am, Philip Semanchuk <phi...@semanchuk.com> wrote:
> On Oct 19, 2008, at 6:13 AM, silk.odyssey wrote:
>> I am getting the following error trying to download an html page using
>> urllib2.
>> urllib2.HTTPError: HTTP Error 204: NoContent
>> The url is of this type:
>> I can open it in my browser without problems. Any ideas on a solution?
> Are you changing the user-agent? Some sites sniff user agents and
> return different results to browsers than to suspected bots.
I tried it.
>>> import urllib2
>>> url = 'http://www.amazon.com/gp/offer-listing/B000KJX3A0%3FSubscriptionId%3D183VXJS74KNQ89D0NRR2%26tag%3Dws%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3DB000KJX3A0'
>>> op = urllib2.urlopen(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.5/urllib2.py", line 380, in open
    response = meth(req, response)
  File "/usr/lib/python2.5/urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 204: NoContent
>>> headers = {}
>>> headers['User-Agent'] = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3'
>>> ro = urllib2.Request(url, None, headers)
>>> op = urllib2.urlopen(ro)
>>> page = op.read()
>>> page
(lots of HTML)
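When a request fails like the first one in this session, the HTTPError object itself carries the status code, the reason phrase, the headers and whatever body the server sent, so it can be inspected rather than just printed. A rough sketch, assuming Python 3 (urllib.error in place of urllib2); describe_http_error and fetch_or_describe are hypothetical helper names. Note that whether a 204 is raised as an error at all may depend on the Python version; the traceback here is from 2.5.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Collect the useful parts of an HTTPError into a plain dict."""
    # err.fp is the underlying response stream; it may be None if the
    # error was constructed without a body.
    body = err.fp.read() if err.fp is not None else b""
    return {
        "code": err.code,        # numeric status, e.g. 204
        "reason": err.reason,    # reason phrase, e.g. "NoContent"
        "headers": err.headers,  # response headers sent with the error
        "body": body,            # raw bytes of any body
    }


def fetch_or_describe(url, timeout=10):
    """Return page bytes on success, or a description dict on HTTPError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        return describe_http_error(err)
```

Comparing the headers and body returned for the default User-Agent with those returned for a browser-like one is a quick way to confirm that the server is treating the two differently.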
So the answer is as Philip suggests - amazon.com doesn't like
'Python-urllib/2.5' as a User-Agent. You have to give it something that
looks like a browser.
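The same workaround can be written as a small reusable function. This is only a sketch, assuming Python 3, where urllib2.Request and urllib2.urlopen live in urllib.request; build_browser_request and fetch are hypothetical names, and the User-Agent string is the one used in the session above.

```python
import urllib.request

# Browser-like User-Agent copied from the interactive session above.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) "
    "Gecko/2008092417 Firefox/3.0.3"
)


def build_browser_request(url, user_agent=BROWSER_USER_AGENT):
    """Build a GET request that identifies itself with a browser-like agent."""
    return urllib.request.Request(
        url, data=None, headers={"User-Agent": user_agent}
    )


def fetch(url, timeout=10):
    """Download url with the browser-like User-Agent and return the raw bytes."""
    request = build_browser_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch(url) with the URL from the session corresponds to the ro / op / page steps above. Keeping the request construction separate from the network call makes it possible to check the headers without actually contacting the site.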
--
(for email use this address please - you can figure it out)
Mark Sapiro mark at msapiro net
San Francisco Bay Area, California
Any clod can have the facts; having opinions is an art. - C. McCabe, The Fearless Spectator