Trying to read Google search page from Python

  • tedpottel@gmail.com

    Trying to read Google search page from Python

    Hi,

    My program reads as follows:

    import urllib

    print "-------- Google Web Page --------"
    print urllib.urlopen('http://www.google.com//').read()

    print "-------- Google Search Web Page --------"
    print urllib.urlopen('http://www.google.com/search?hl=en&q=ted').read()

    The first urllib read works fine. The second one, where I try to
    read Google's search results, returns a web page saying I do not have
    permission:
    "Your client does not have permission to get URL "
    Is there a way to do this? I am trying to write a program that reads
    Google's search results.

    -Ted
  • WalterGR

    #2
    Re: Trying to read Google search page from Python

    On Aug 19, 9:47 am, "tedpot...@gmail.com" <tedpot...@gmail.com> wrote:
    > Hi,
    >
    > My program reads as follows
    > import urllib
    >
    > print "-------- Google Web Page --------"
    > print urllib.urlopen('http://www.google.com//').read()
    >
    > print "-------- Google Search Web Page --------"
    > print urllib.urlopen('http://www.google.com/search?hl=en&q=ted').read()
    >
    > The first urllib read works fine. The second one, when I am trying to
    > read in Google's search results, I get a web page saying I do not have
    > permission.
    > "Your client does not have permission to get URL "
    > Is there a way to do this? I am trying to write a program to read in
    > Google's search results.
    >
    > -Ted

    This is a PHP discussion group - not a Python group - but I'll answer.

    It's against Google's Terms of Service to do what you're doing, so
    they're blocking you. (Not you specifically, but anyone who requests
    their search results in that manner.)

    If you want to do it anyway, you'd have to trick Google into thinking
    you're an actual web user. So you'd have to do some spoofing. I'll
    leave that as an exercise for the reader.

    Walter
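
The spoofing Walter mentions usually comes down to sending a browser-like User-Agent header. Below is a minimal sketch in Python 3 (where the thread's `urllib.urlopen` became `urllib.request.urlopen`), assuming the block is keyed on that header. The header string and the helper name `build_request` are illustrative assumptions, not taken from the thread, and sending such requests may still run against Google's Terms of Service.

```python
import urllib.request

# Hypothetical browser-like identifier; a real browser's string could be used instead.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("http://www.google.com/search?hl=en&q=ted")

# Request stores header names capitalized, so the key is "User-agent".
print(req.get_header("User-agent"))

# Nothing is sent until the request is opened, for example:
# html = urllib.request.urlopen(req).read()
```

The request is only constructed here, not sent, so the snippet can be inspected without touching the network.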


    • Jerry Stuckle

      #3
      Re: Trying to read Google search page from Python

      tedpottel@gmail.com wrote:
      > Hi,
      >
      > My program reads as follows
      > import urllib
      >
      > print "-------- Google Web Page --------"
      > print urllib.urlopen('http://www.google.com//').read()
      >
      > print "-------- Google Search Web Page --------"
      > print urllib.urlopen('http://www.google.com/search?hl=en&q=ted').read()
      >
      > The first urllib read works fine. The second one, when I am trying to
      > read in Google's search results, I get a web page saying I do not have
      > permission.
      > "Your client does not have permission to get URL "
      > Is there a way to do this? I am trying to write a program to read in
      > Google's search results.
      >
      > -Ted

      Actually, the problem is not Google blocking you. Your request is
      incorrect. But as Walter indicated, this is not a Python support group.
      Try comp.lang.python.

      And also, as Walter indicated, it is against Google's TOS. They aren't
      blocking you now - but they will if they catch you.

      --
      ==================
      Remove the "x" from my email address
      Jerry Stuckle
      JDS Computer Training Corp.
      jstucklex@attglobal.net
      ==================


      • WalterGR

        #4
        Re: Trying to read Google search page from Python

        > Actually, the problem is not google blocking you. Your request is
        > incorrect.

        http://www.google.com// is strangely formed, but it works. Google
        doesn't appear to block automated requests to their front page.

        Google _is_ blocking the other request.

        Viewing http://www.google.com/search?hl=en&q=ted in Firefox works
        fine.

        "curl http://www.google.com/search?hl=en&q=ted" returns the error he
        mentioned previously. Python probably gets the error for the
        same reason.

        Walter
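
One plausible reason curl and Python get the same error is that both announce themselves with a non-browser User-Agent. The small sketch below shows what Python 3's `urllib.request` sends by default (the Python 2 `urllib` used in the thread sent a similar `Python-urllib/...` string). Whether Google keys its block on this header is the hypothesis under discussion, not something this snippet proves.

```python
import urllib.request

# build_opener() creates an opener with the same default headers urlopen() uses.
opener = urllib.request.build_opener()

# addheaders is a list of (name, value) pairs added to every request.
default_headers = dict(opener.addheaders)

# Prints something of the form "Python-urllib/<major>.<minor>".
print(default_headers["User-agent"])
```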


        • Jerry Stuckle

          #5
          Re: Trying to read Google search page from Python

          WalterGR wrote:
          >> Actually, the problem is not google blocking you. Your request is
          >> incorrect.
          >
          > http://www.google.com// is strangely formed, but it works. Google
          > doesn't appear to block automated requests to their front page.
          >
          > Google _is_ blocking the other request.
          >
          > Viewing http://www.google.com/search?hl=en&q=ted in Firefox works
          > fine.
          >
          > "curl http://www.google.com/search?hl=en&q=ted" returns the error he
          > mentioned previously. Probably returns the error via Python for the
          > same reason.
          >
          > Walter

          Your crystal ball must be working better than mine. I can't tell that.
          But I could see a lot of other possibilities.

          But this is not a Python group, so I won't discuss them here.


          --
          ==================
          Remove the "x" from my email address
          Jerry Stuckle
          JDS Computer Training Corp.
          jstucklex@attglobal.net
          ==================


          • WalterGR

            #6
            Re: Trying to read Google search page from Python

            On Aug 19, 7:10 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
            > Your crystal ball must be working better than mine. I can't tell that.
            > But I could see a lot of other possibilities.

            Sorry to hear about your crystal ball. But you don't need one in this
            particular case.

            All one needs is knowledge of user agents and user agent overriding,
            and then one can test my hypothesis. (Which, given that I've now
            tested it, is in fact, fact.)

            Walter
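
The kind of test Walter describes (same URL, different user agent) can be sketched without contacting Google at all. The example below is a hedged illustration: `PickyHandler` is a hypothetical local server that returns 403 unless the User-Agent starts with `Mozilla/`, standing in for the behavior the thread attributes to Google, and `fetch_status` is an illustrative helper name. It is written for Python 3 rather than the Python 2 used in the original post.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical server: 403 unless the client claims to be a browser."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        status = 200 if agent.startswith("Mozilla/") else 403
        body = b"ok" if status == 200 else b"forbidden"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging to keep the demo quiet

# Empty ProxyHandler: ignore proxy settings so the local demo stays local.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def fetch_status(url, user_agent=None):
    """Return the HTTP status code, optionally overriding the User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises on 4xx/5xx; the code is on the exception

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), PickyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/search?hl=en&q=ted" % server.server_port

default_status = fetch_status(url)  # sends the default Python-urllib agent
spoofed_status = fetch_status(url, "Mozilla/5.0 (compatible; ExampleBrowser/1.0)")

server.shutdown()
server.server_close()
print(default_status, spoofed_status)
```

If the two status codes differ only because of the header, the user-agent explanation holds for that server. Running the same comparison against a real site is subject to that site's terms of service.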
