Re: using urllib2

This topic is closed.

Alexnb
#1

Re: using urllib2

Jun 27 '08, 08:25 PM

Okay, I tried to follow that, and it is kinda hard. But since you obviously
know what you are doing, where did you learn this? Or where can I learn
this?

Maric Michaud wrote:

On Friday 27 June 2008 10:43:06, Alexnb wrote:

>I have never used urllib or urllib2. I really have looked online
>for help on this issue, and on mailing lists, but I can't figure out my
>problem because people haven't been helping me, which is why I am here!
>:]
>Okay, so basically I want to be able to submit a word to dictionary.com
>and then get the definitions. However, to start off learning urllib2, I
>just want to do a simple Google search. Before you get mad, what I have
>found on urllib2 hasn't helped me. Anyway, how would you go about doing
>this? No, I did not post the HTML, but if you want, right-click in your
>browser and hit "view source" on the Google homepage. Basically, what I
>want to know is how to submit the values (the search term) and then
>search for that value. Here's what I know:
>
>import urllib2
>response = urllib2.urlopen("http://www.google.com/")
>html = response.read()
>print html
>
>Now I know that all this does is print the source, but that's about all I
>know. I know it may be a lot to ask to have someone show/help me, but I
>really would appreciate it.

This example is for Google. Of course, using pygoogle is easier in this
case, but it is a valid example for the general case:

>>>[207]: import urllib, urllib2

You need to trick the server with an imaginary User-Agent.

>>>[208]: def google_search(terms):
    return urllib2.urlopen(urllib2.Request(
        "http://www.google.com/search?" + urllib.urlencode({'hl': 'fr', 'q': terms}),
        headers={'User-Agent': 'MyNav 1.0 (compatible; MSIE 6.0; Linux'})
    ).read()
   .....:

>>>[212]: res = google_search("python & co")
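
For readers on Python 3, where urllib2 and urllib.urlencode were moved into urllib.request and urllib.parse, the same idea can be sketched roughly as below. This is only an illustration, not the code from the message: `build_search_request` is a hypothetical helper name, the User-Agent value is a made-up placeholder, and the sketch cannot promise that the server accepts such a request. Building the request is kept separate from sending it, so the URL can be inspected without touching the network.

```python
# Python 3 sketch: urllib2 and urllib.urlencode now live in
# urllib.request and urllib.parse.
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_URL = "http://www.google.com/search?"  # endpoint taken from the message above


def build_search_request(terms, lang="fr"):
    """Return a Request with the encoded query and a User-Agent (nothing is sent yet)."""
    # urlencode escapes spaces and '&' so the terms survive inside the URL.
    query = urlencode({"hl": lang, "q": terms})
    # Placeholder User-Agent, an assumption for illustration only.
    return Request(SEARCH_URL + query, headers={"User-Agent": "MyNav 1.0"})


def google_search(terms):
    """Send the request and return the raw response body as bytes."""
    with urlopen(build_search_request(terms)) as response:
        return response.read()


req = build_search_request("python & co")
print(req.full_url)  # http://www.google.com/search?hl=fr&q=python+%26+co
```

Note that in Python 3 `read()` returns bytes, so the result would need a `.decode(...)` with a suitable encoding before any string matching is applied to it.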

Now that you have the whole HTML response, you'll have to parse it to
recover the data. A quick and dirty attempt on the Google results page:

>>>[213]: import re

>>>[214]: [ re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res) ]
...[229]:
['Python Gallery',
'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
'Re: os x, panther, python & co: msg#00041',
'Re: os x, panther, python & co: msg#00040',
'Cardiff Web Site Design, Professional web site design services ...',
'Python Properties',
'Frees < Programs < Python < Bin-Co',
'Torb: an interface between Tcl and CORBA',
'Royal Python Morphs',
'Python & Co']
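
The two-step regex trick above (find each heading block, then strip the tags inside it) can be tried without any network access. The following is a minimal sketch under stated assumptions: `sample_html` is a made-up stand-in for a fetched page, `extract_titles` is a hypothetical helper name, and the `re.DOTALL` flag is an addition that is not in the original one-liner.

```python
import re

# Made-up stand-in for a fetched results page; real markup will differ.
sample_html = (
    '<h2 class=r><a href="http://example.com/a">Python <b>Gallery</b></a></h2>'
    '<p>snippet text</p>'
    '<h2 class=r><a href="http://example.com/b">Royal Python Morphs</a></h2>'
)


def extract_titles(html):
    """Find each <h2 class=r>...</h2> block, then remove every tag inside it."""
    # Non-greedy match so each heading is captured separately;
    # DOTALL lets a heading span several lines.
    blocks = re.findall(r'<h2 class=r>.*?</h2>', html, re.DOTALL)
    return [re.sub(r'<.+?>', '', block) for block in blocks]


titles = extract_titles(sample_html)
print(titles)  # ['Python Gallery', 'Royal Python Morphs']
```

As the message says, this is quick and dirty: it depends on the exact `<h2 class=r>` markup and stops working as soon as the page layout changes.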

--
_____________

Maric Michaud
--

Mailman 3 Info | python-list@python.org - python.org

http://mail.python.org/mailman/listinfo/python-list

--
View this message in context: http://www.nabble.com/using-urllib2-...p18160312.html
Sent from the Python - python-list mailing list archive at Nabble.com.
Jeff McNeil
#2

Jun 27 '08, 08:25 PM

Re: using urllib2

I stumbled across this a while back: http://www.voidspace.org.uk/python/a.../urllib2.shtml.
It covers quite a bit. The urllib2 module is pretty straightforward
once you've used it a few times. Some of the class naming and whatnot
takes a bit of getting used to (I found that to be the most confusing
bit).

Alexnb
#3

Jun 27 '08, 11:35 PM

Re: using urllib2

I have read that multiple times. It is hard to understand, but it did help a
little. I found a bit of a workaround for now, which is not what I
ultimately want. However, even when I can get to the page I want, let's say
"http://dictionary.reference.com/browse/cheese", I look in Firebug, an
extension, and see the definition in javascript:

<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">1. </td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td>

The problem is that if I use code like this to get the HTML of that
page in Python:

response = urllib2.urlopen("the website....")
html = response.read()
print html

then I get a bunch of stuff, but it doesn't show me the code with the
table that the definition is in. So I am asking: how do I access this
javascript? Also, could someone point me to a better reference than the
last one, whether it be a book or anything else? That one really doesn't
tell me much.
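
If the table shown by Firebug is present in the HTML that comes back (it may not be, for instance if the page builds it with JavaScript or serves different markup to an unrecognised client), pulling the definition out could look something like the sketch below. It is written for Python 3 and works on a hard-coded copy of the snippet above, so it says nothing about what the live site returns. `DefinitionParser` is a hypothetical class name, and the rule "take every cell that is not class `dn`" is an assumption based only on that snippet.

```python
from html.parser import HTMLParser


class DefinitionParser(HTMLParser):
    """Collect text from <td> cells (except class="dn") inside <table class="luna-Ent">."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.definitions = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and attrs.get("class") == "luna-Ent":
            self.in_table = True
        elif tag == "td" and self.in_table and attrs.get("class") != "dn":
            # Start a new definition; the "dn" cells only hold the numbering.
            self.in_cell = True
            self.definitions.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        if self.in_cell:
            self.definitions[-1] += data


# Hard-coded copy of the snippet from the post, with the closing tags added.
sample = """<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">1. </td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td>
</tr>
</tbody>
</table>"""

parser = DefinitionParser()
parser.feed(sample)
parser.close()
# Collapse the line break and trailing space left over from the markup.
definitions = [" ".join(text.split()) for text in parser.definitions]
print(definitions)
```

A first thing to check is whether the string `luna-Ent` occurs at all in the fetched page; if it does not, no parser will find the table, and the request itself (for example its headers) is the place to look.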

Alexnb
#4

Jun 27 '08, 11:45 PM

Re: using urllib2

I have read that multiple times. It is hard to understand, but it did help a
little. I found a bit of a workaround for now, which is not what I
ultimately want. However, even when I can get to the page I want, let's say
"http://dictionary.reference.com/browse/cheese", I look in Firebug, an
extension, and see the definition in javascript:

<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">1. </td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td>

The problem is that if I use code like this to get the HTML of that page
in Python:

response = urllib2.urlopen("the website....")
html = response.read()
print html

then I get a bunch of stuff, but it doesn't show me the code with the table
that the definition is in. So I am asking: how do I access this javascript?
Also, could someone point me to a better reference than the last one,
whether it be a book or anything else? That one really doesn't tell me much.
