http request doesn't work!

  • bart3rNOSPAM@clinicdesignNOSPAM.com.au

    http request doesn't work!

    Could someone please let me know what I'm doing wrong here:


    #!/usr/bin/python

    import httplib

    WEB_SITE = 'adsl.internode.on.net'
    #WEB_SITE = 'www.google.com'
    #PAGE_PATH = '/about.html'
    PAGE_PATH = '/htm/un-metered-sites-ip-list.htm'

    # Build and send the request line and headers
    http = httplib.HTTP(WEB_SITE)
    http.putrequest('GET', PAGE_PATH)
    http.putheader('Accept', 'text/html')
    http.putheader('Accept', 'text/plain')
    http.endheaders()

    # Read the status line, then the response body
    httpcode, httpmsg, headers = http.getreply()
    print 'msg: ' + httpmsg
    print httpcode
    doc = http.getfile()
    data = doc.read()
    doc.close()
    print data

    The output of the above is different from what I get when I open the
    same page in my browser.

    What's my problem?!
  • Ben Finney

    #2
    Re: http request doesn't work!

    bart3rNOSPAM@clinicdesignNOSPAM.com.au wrote:
    > Could someone please let me know what i'm doing wrong here:

    Using the obsolete httplib.HTTP class.

    <http://www.python.org/doc/lib/module-httplib.html>

    Try using HTTPConnection and HTTPResponse objects:
    >>> import httplib
    >>>
    >>> webhost = 'adsl.internode.on.net'
    >>> pagepath = '/htm/un-metered-sites-ip-list.htm'
    >>>
    >>> http_conn = httplib.HTTPConnection( webhost )
    >>> http_conn.request( 'GET', pagepath )
    >>>
    >>> http_resp = http_conn.getresponse()
    >>> ( http_resp.status, http_resp.reason )
    (200, 'OK')
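
    For Python 3, where httplib was renamed http.client, the same
    HTTPConnection/HTTPResponse pattern might look like the sketch below. It is
    an illustration under assumptions, not the original poster's code: to keep
    it runnable offline, a throwaway local server built with http.server (the
    hypothetical DemoHandler, serving a made-up page) stands in for
    adsl.internode.on.net, and run_demo() is a hypothetical helper that starts
    it on a free port, fetches one page, and shuts it down.

```python
import http.client
import http.server
import threading


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real site: serves one small HTML page."""

    def do_GET(self):
        body = b"<html><body>demo page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def fetch(host, port, path):
    """GET `path` via HTTPConnection; return (status, reason, body bytes)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        # One Accept header listing both types, instead of two Accept headers.
        conn.request("GET", path, headers={"Accept": "text/html, text/plain"})
        resp = conn.getresponse()  # an http.client.HTTPResponse
        return resp.status, resp.reason, resp.read()
    finally:
        conn.close()


def run_demo(path="/htm/un-metered-sites-ip-list.htm"):
    """Start the stand-in server on a free local port, fetch one page, stop it."""
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        return fetch(host, port, path)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, reason, body = run_demo()
    print((status, reason))  # (200, 'OK')
```

    Against a real host you would call fetch() with its name and port 80
    directly; the local server only exists so the sketch runs without network
    access.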

    --
    Ben Finney <http://bignose.squidly.org/>


    • f29

      #3
      Re: http request doesn't work!

      >
      > The output of the above is different if you go to the following in
      > your browser:
      > http://adsl.internode.on.net/htm/un-...es-ip-list.htm
      >
      >
      > Whats my problem?!?!??!

      Try adding the User-Agent header of some popular browser (e.g.
      "Mozilla/5.0 (Windows; U; Windows NT; en-US; rv:1.6) Gecko") so that
      the remote site can't recognise your script as a robot and refuse to
      serve it the content.
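
      Why would the User-Agent matter? Neither httplib.HTTP nor Python 3's
      http.client sends a User-Agent header unless you add one, so a site can
      easily tell such requests apart from a browser's. The sketch below
      (Python 3) assumes a hypothetical server policy: UAFilteringHandler
      answers 403 to any request whose User-Agent doesn't start with
      "Mozilla/". Nobody in this thread knows what adsl.internode.on.net
      actually checks; get_status() and run_demo() are hypothetical helpers.

```python
import http.client
import http.server
import threading

BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT; en-US; rv:1.6) Gecko"


class UAFilteringHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical site policy: refuse clients that don't look like a browser."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if ua.startswith("Mozilla/"):
            code, body = 200, b"full page for browsers"
        else:
            code, body = 403, b"robots not welcome"
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def get_status(host, port, path, user_agent=None):
    """GET `path`, adding a User-Agent header only if one is given."""
    headers = {"Accept": "text/html"}
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing
        return resp.status
    finally:
        conn.close()


def run_demo():
    """Return (status without User-Agent, status with a browser User-Agent)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), UAFilteringHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        return (get_status(host, port, "/"),
                get_status(host, port, "/", BROWSER_UA))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())  # (403, 200)
```

      Under this assumed policy the bare request is refused and the
      browser-like one is served; a real site may use other signals entirely.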

      Also, have a look at the urllib2 module; it is much more powerful.
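
      In Python 3, urllib2 was merged into urllib.request. A minimal sketch of
      the same fetch with it follows, assuming the page URL is WEB_SITE plus
      PAGE_PATH from the original post (the site may no longer serve that
      page); build_request() and fetch() are hypothetical helper names.
      Headers go into the Request up front, and urlopen() follows HTTP
      redirects for you, which raw httplib does not.

```python
import urllib.request

# Assumed URL: WEB_SITE + PAGE_PATH from the original post.
PAGE_URL = "http://adsl.internode.on.net/htm/un-metered-sites-ip-list.htm"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT; en-US; rv:1.6) Gecko"


def build_request(url, user_agent=BROWSER_UA):
    """Build (but don't send) a GET request carrying browser-like headers."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept": "text/html, text/plain"},
    )


def fetch(url, timeout=10):
    """Send the request and return the body decoded as text."""
    req = build_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to latin-1 if the server doesn't declare a charset.
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset)


if __name__ == "__main__":
    print(fetch(PAGE_URL)[:200])  # first 200 characters of the page
```

      Note that Request stores header names with str.capitalize(), so the
      header is looked up afterwards as get_header("User-agent").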

      rgrds,
      f29

