Try:
import re
import urllib2
url = 'http://www.google.com/search?num=20&hl=en&q=ipod&btnG=Search'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
req = urllib2.Request(url, None, headers)
file_source = open("google_source.txt", 'w')
file_source.write(urllib2.urlopen(req).read())
file_source.close()
I think Google blocks the User-Agent urllib2 sends.
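To see what is actually different between the two requests, here is a minimal sketch that prints the User-Agent the default opener would send next to the overridden one. It makes no network call. The helper names `default_user_agent` and `build_request` are made up for illustration, and the `try`/`except` import is only there so the same lines also run on Python 3, where `urllib2` became `urllib.request`.

```python
# Sketch: compare the default urllib User-Agent with an overridden one.
# Runs on Python 2 (urllib2) and Python 3 (urllib.request); no network access.
try:
    import urllib2 as urlreq          # Python 2, as used in this thread
except ImportError:
    import urllib.request as urlreq   # Python 3 equivalent

def default_user_agent():
    """Hypothetical helper: the User-Agent a default opener would send."""
    opener = urlreq.build_opener()
    # addheaders is a list of (name, value) pairs added to every request.
    return dict(opener.addheaders).get('User-agent')

def build_request(url, user_agent):
    """Hypothetical helper: a Request carrying a custom User-Agent."""
    return urlreq.Request(url, None, {'User-Agent': user_agent})

url = 'http://www.google.com/search?num=20&hl=en&q=ipod&btnG=Search'
req = build_request(url, 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')

# The default value starts with "Python-urllib/" followed by a version.
print(default_user_agent())
# Request stores header names capitalized, hence 'User-agent' here.
print(req.get_header('User-agent'))
```

Passing `req` to `urlopen` sends the overridden header instead of the default one. Whether a given server then accepts the request is up to that server, so this only shows what changes on the client side.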
--Jonas Galvez, http://jonasgalvez.com.br/log
On Thu, Jul 3, 2008 at 3:52 AM, spandana g <spandanagella@gmail.com> wrote:
Hello,
>
I have written code to get the page source of the Google search
page. It works for other URLs, but I have this problem with the one below:
>
import re
from urllib2 import urlopen
string='http://www.google.com/search?num=20&hl=en&q=ipod&btnG=Search'
file_source=file("google_source.txt",'w')
file_source.write(urlopen(string).read())
page_content=file_source.readlines()
>
Traceback (most recent call last):
  File "C:/Python25/google.py", line 5, in <module>
    file_source.write(urlopen(string).read())
  File "C:\Python25\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "C:\Python25\lib\urllib2.py", line 387, in open
    response = meth(req, response)
  File "C:\Python25\lib\urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python25\lib\urllib2.py", line 425, in error
    return self._call_chain(*args)
  File "C:\Python25\lib\urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "C:\Python25\lib\urllib2.py", line 506, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
>
Actually, urlopen works for the Google Labs Sets page but not for
google.com, and I have the same problem with Wikipedia. Please let me
know if anyone has any idea about this.
>
Thank You,
Spandana.