Re: using urllib2

This topic is closed.
  • Alexnb

    Re: using urllib2


    Okay, I tried to follow that, and it is kinda hard. But since you obviously
    know what you are doing, where did you learn this? Or where can I learn
    this?


    Maric Michaud wrote:

    On Friday 27 June 2008 10:43:06, Alexnb wrote:
    > I have never used urllib or urllib2. I really have looked online for
    > help on this issue, and on mailing lists, but I can't figure out my
    > problem because people haven't been helping me, which is why I am here!
    > :].
    > Okay, so basically I want to be able to submit a word to dictionary.com
    > and then get the definitions. However, to start off learning urllib2, I
    > just want to do a simple Google search. Before you get mad, what I have
    > found on urllib2 hasn't helped me. Anyway, how would you go about doing
    > this? No, I did not post the HTML, but if you want, right-click in your
    > browser and hit "view source" on the Google homepage. Basically, what I
    > want to know is how to submit the values (the search term) and then
    > search for that value. Here's what I know:
    >
    > import urllib2
    > response = urllib2.urlopen("http://www.google.com/")
    > html = response.read()
    > print html
    >
    > Now I know that all this does is print the source, but that's about all
    > I know. I know it may be a lot to ask to have someone show/help me, but
    > I really would appreciate it.

    This example is for Google. Of course, using pygoogle is easier in this
    case, but this is a valid example for the general case:

    >>>[207]: import urllib, urllib2

    You need to trick the server with an imaginary User-Agent.

    >>>[208]: def google_search(terms):
                  return urllib2.urlopen(urllib2.Request(
                      "http://www.google.com/search?"
                      + urllib.urlencode({'hl': 'fr', 'q': terms}),
                      headers={'User-Agent': 'MyNav 1.0 (compatible; MSIE 6.0; Linux'})
                  ).read()
       .....:

    >>>[212]: res = google_search("python & co")

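For readers on Python 3, where urllib2 was split into urllib.request and urllib.error and urlencode moved to urllib.parse, the request above can be sketched roughly as follows. This is only a sketch: the helper name build_search_request and its lang parameter are made up for illustration, and the request is built but not sent, so nothing here depends on how Google responds today.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(terms, lang="fr"):
    """Build (but do not send) a GET request for a search query.

    Hypothetical helper; the name and the lang parameter are not from the thread.
    """
    # urlencode percent-escapes the values, so "&" and spaces in the
    # search terms cannot be mistaken for query-string separators.
    query = urlencode({"hl": lang, "q": terms})
    return Request(
        "http://www.google.com/search?" + query,
        # User-Agent string copied verbatim from the post above.
        headers={"User-Agent": "MyNav 1.0 (compatible; MSIE 6.0; Linux"},
    )


req = build_search_request("python & co")
print(req.full_url)
# Request stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen(req).read() would send it, as the original function does; in Python 3 the result is bytes and usually needs decoding before any text processing.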
    Now that you have the whole HTML response, you'll have to parse it to
    recover the data. A quick and dirty try on the Google response page:

    >>>[213]: import re

    >>>[214]: [re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res)]
    ...[229]:
    ['Python Gallery',
     'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
     'Re: os x, panther, python &amp; co: msg#00041',
     'Re: os x, panther, python &amp; co: msg#00040',
     'Cardiff Web Site Design, Professional web site design services ...',
     'Python Properties',
     'Frees &lt; Programs &lt; Python &lt; Bin-Co',
     'Torb: an interface between Tcl and CORBA',
     'Royal Python Morphs',
     'Python &amp; Co']
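
The regex approach above leaves HTML entities such as &amp; in the titles, as the output shows. Below is a small sketch of the same idea with the entities decoded, written for Python 3. The sample string is invented stand-in markup, not a real Google response, and extract_titles is a hypothetical helper name.

```python
import html
import re

# Stand-in for the fetched page (an assumption); real result markup will differ.
sample = (
    '<h2 class=r><a href="/a">Python <b>Gallery</b></a></h2>'
    '<p>snippet</p>'
    '<h2 class=r><a href="/b">Python &amp; Co</a></h2>'
)


def extract_titles(page):
    """Return the text of every <h2 class=r> heading in page."""
    # Non-greedy match, with DOTALL so a heading may span several lines.
    headings = re.findall(r"<h2 class=r>.*?</h2>", page, flags=re.DOTALL)
    # Drop the tags first, then turn entities such as &amp; back into text.
    return [html.unescape(re.sub(r"<.+?>", "", h)) for h in headings]


print(extract_titles(sample))
```

Regular expressions are fragile against markup changes; for anything beyond a quick experiment, an HTML parser is usually the safer choice.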


    --
    Maric Michaud

    View this message in context: http://www.nabble.com/using-urllib2-...p18160312.html
    Sent from the Python - python-list mailing list archive at Nabble.com.

  • Jeff McNeil

    #2
    Re: using urllib2

    I stumbled across this a while back: http://www.voidspace.org.uk/python/a.../urllib2.shtml.
    It covers quite a bit. The urllib2 module is pretty straightforward
    once you've used it a few times. Some of the class naming and whatnot
    takes a bit of getting used to (I found that to be the most confusing
    bit).


    • Alexnb

      #3
      Re: using urllib2


      I have read that multiple times. It is hard to understand, but it did help
      a little. I found a bit of a workaround for now, which is not what I
      ultimately want. However, even when I can get to the page I want, let's
      say "http://dictionary.reference.com/browse/cheese", I look at it in
      Firebug (a browser extension) and see the definition in the JavaScript:

      <table class="luna-Ent">
      <tbody>
      <tr>
      <td class="dn" valign="top">1. </td>
      <td valign="top">the curd of milk separated from the whey and prepared in
      many ways as a food. </td>

      The problem is that if I use code like this to get the HTML of that page
      in Python:

      response = urllib2.urlopen("the website....")
      html = response.read()
      print html

      then I get a bunch of stuff, but it doesn't show me the code with the
      table that the definition is in. So I am asking: how do I access this
      JavaScript? Also, could someone point me to a better reference than the
      last one, whether it be a book or anything else? That one really doesn't
      tell me much.

      Jeff McNeil-2 wrote:
      > I stumbled across this a while back:
      > http://www.voidspace.org.uk/python/a.../urllib2.shtml.
      > It covers quite a bit. The urllib2 module is pretty straightforward
      > once you've used it a few times. Some of the class naming and whatnot
      > takes a bit of getting used to (I found that to be the most confusing
      > bit).
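
One possible explanation, which the thread itself does not confirm, is that Firebug shows the page after scripts have run, or that the server sends different markup to clients it does not recognise as browsers, so the raw response need not match what Firebug displays. When the table is present in the HTML that comes back, the standard library can pull the cell text out without regular expressions. What follows is a minimal Python 3 sketch that uses the fragment from the post (with closing tags added here so it is complete) rather than a live request; DefinitionParser is a hypothetical name.

```python
from html.parser import HTMLParser


class DefinitionParser(HTMLParser):
    """Collect the text of <td> cells inside <table class="luna-Ent">."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("class") == "luna-Ent":
            self.in_table = True
        elif tag == "td" and self.in_table:
            self.in_cell = True
            self.cells.append("")  # start a new cell

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so append to the current cell.
        if self.in_cell:
            self.cells[-1] += data


# Fragment quoted in the post; the closing tags are an addition.
snippet = """
<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">1. </td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td>
</tr>
</tbody>
</table>
"""

parser = DefinitionParser()
parser.feed(snippet)
parser.close()
# Collapse runs of whitespace (including the line break) in each cell.
definitions = [" ".join(cell.split()) for cell in parser.cells]
print(definitions)
```

The sketch ignores nested tables and assumes the class name luna-Ent seen in the post; a page whose markup has changed would need a different test in handle_starttag.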

      View this message in context: http://www.nabble.com/using-urllib2-...p18165634.html
