Determine Whether File Exists On HTTP Server

  • OvErboRed

    Determine Whether File Exists On HTTP Server

    Hi, I'm trying to determine whether a given URL exists. I'm new to Python
    but I think that urllib is the tool for the job. However, if I give it a
    non-existent file, it simply returns the 404 page. Aside from grepping this
    for '404', is there a better way to do this? (Preferably, there is a
    solution that can be applied to both HTTP and FTP.) Thanks in advance.
  • Troy Melhase

    #2
    Re: Determine Whether File Exists On HTTP Server

    On Saturday 22 May 2004 12:28 am, OvErboRed wrote:
    > Hi, I'm trying to determine whether a given URL exists. I'm new to Python
    > but I think that urllib is the tool for the job. However, if I give it a
    > non-existent file, it simply returns the 404 page. Aside from grepping this
    > for '404', is there a better way to do this? (Preferably, there is a
    > solution that can be applied to both HTTP and FTP.) Thanks in advance.

    Try urllib2.urlopen, and put a try/except block around it. Here's what an
    unhandled exception from a 404 response looks like:

    Python 2.3.3 (#1, May 14 2004, 09:49:22)
    [GCC 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import urllib2
    >>> handle = urllib2.urlopen('http://google.com/this_page_doesnt_exist')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/lib/python2.3/urllib2.py", line 129, in urlopen
        return _opener.open(url, data)
      File "/usr/lib/python2.3/urllib2.py", line 326, in open
        '_open', req)
      File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
        return self.do_open(httplib.HTTP, req)
      File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
        return self.parent.error('http', req, fp, code, msg, hdrs)
      File "/usr/lib/python2.3/urllib2.py", line 346, in error
        result = self._call_chain(*args)
      File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.3/urllib2.py", line 472, in http_error_302
        return self.parent.open(new)
      File "/usr/lib/python2.3/urllib2.py", line 326, in open
        '_open', req)
      File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
        return self.do_open(httplib.HTTP, req)
      File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
        return self.parent.error('http', req, fp, code, msg, hdrs)
      File "/usr/lib/python2.3/urllib2.py", line 352, in error
        return self._call_chain(*args)
      File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.3/urllib2.py", line 412, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 404: Not Found
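
For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the try/except approach suggested above might look like the following minimal sketch. The function name url_exists and the timeout default are assumptions made for illustration and do not come from the thread. Statuses other than 404 are re-raised, since they do not say whether the file exists.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10):
    """Return True if `url` can be opened, False if the server answers 404.

    `url_exists` and `timeout` are illustrative names, not part of urllib.
    """
    try:
        # urlopen raises HTTPError for 4xx/5xx responses instead of
        # returning the error page, so no grepping for '404' is needed.
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        # 403, 500 and so on do not answer the question; let the caller decide.
        raise
```

Under these assumptions, url_exists('http://google.com/this_page_doesnt_exist') would return False if the server answers 404. An ftp:// URL goes through the same urlopen call, but a missing file there should surface as a plain URLError rather than an HTTPError, which this sketch deliberately lets propagate.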

    --
    Troy Melhase, troy@gci.net
    --
    When Christ calls a man, he bids him come and die. - Dietrich Bonhoeffer



    • FeU Hagen

      #3
      Re: Determine Whether File Exists On HTTP Server

      This works with HTTP:

      import sys      # exc_info
      import httplib  # HTTPConnection

      HOST = "www.python.org"
      PAGE = "/path/to/some/file.html"

      try:
          c = httplib.HTTPConnection(HOST)
          # c._http_vsn = 10; c._http_vsn_str = "HTTP/1.0"
          c.connect()
          c.putrequest("GET", PAGE)
          c.endheaders()
          r = c.getresponse()
          print "%s\n%s\n%s\n" % (r.status, r.reason, r.msg)
          if r.status == 200:  # OK
              print "%s exists" % PAGE
              PageContent = r.read()  # this is the requested html file in a string
          elif r.status == 404:  # not found
              print "%s does not exist" % PAGE
              Page404 = r.read()  # this is the 404 page in a string
          else:
              print "%s : status %s %s %s" % (PAGE, r.status, r.reason, r.msg)
      except:
          print sys.exc_info()[1]
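
A possible Python 3 variant of the status check above, using http.client (the renamed httplib). As a swapped-in technique it sends HEAD instead of GET, so the server returns only the status line and headers and the page body is never downloaded. The function name head_status and its port and timeout defaults are assumptions for illustration. Some servers handle HEAD differently from GET, so treat this as a sketch rather than a drop-in replacement.

```python
import http.client


def head_status(host, path, port=80, timeout=10):
    """Send a HEAD request and return the numeric HTTP status code.

    `head_status` is an illustrative helper, not part of http.client.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status  # e.g. 200 for OK, 404 for not found
    finally:
        # Always release the socket, even if the request failed.
        conn.close()
```

With the HOST and PAGE values from the post, head_status(HOST, PAGE) == 200 would indicate that the page exists and 404 that it does not. Any other status, such as a redirect, needs its own handling.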



      Greetings
      Harald Walter



      "OvErboRed" <publicNO@SPAMoverbored.net> wrote in message
      news:Xns94F1EA84483Byangstaoverbored@127.0.0.1...
      > Hi, I'm trying to determine whether a given URL exists. I'm new to Python
      > but I think that urllib is the tool for the job. However, if I give it a
      > non-existent file, it simply returns the 404 page. Aside from grepping this
      > for '404', is there a better way to do this? (Preferably, there is a
      > solution that can be applied to both HTTP and FTP.) Thanks in advance.

