[urllib2 + Tor] How to handle 404?

  • Gilles Ganault

    [urllib2 + Tor] How to handle 404?

    Hello

    I'm using the urllib2 module and Tor as a proxy to download data
    from the web.

    Occasionally, urllib2 returns 404, probably because of some issue
    with the Tor network. This code doesn't solve the issue, as it just
    loops through the same error indefinitely:

    =====
    for id in rows:
        url = 'http://www.acme.com/?code=' + id[0]
        while True:
            try:
                req = urllib2.Request(url, None, headers)
                response = urllib2.urlopen(req).read()
            except HTTPError, e:
                print 'Error code: ', e.code
                time.sleep(2)
                continue
    =====
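
    (The snippet above assumes urllib2 has already been routed through Tor;
    that is typically done by pointing urllib2 at a local HTTP front-end for
    Tor such as Privoxy or Polipo. A minimal sketch of that setup, with the
    127.0.0.1:8118 address assumed for illustration and not taken from the
    post:)

    =====
    import urllib2

    # Assumed setup: Tor exposed through a local HTTP proxy (e.g. Privoxy
    # listening on 127.0.0.1:8118). The address is illustrative only.
    proxy = urllib2.ProxyHandler({'http': 'http://127.0.0.1:8118'})
    opener = urllib2.build_opener(proxy)
    urllib2.install_opener(opener)  # later urllib2.urlopen() calls go via the proxy
    =====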

    Any idea of what I should do to handle this error properly?

    Thank you.
  • Chris Rebert

    #2
    Re: [urllib2 + Tor] How to handle 404?

    On Fri, Nov 7, 2008 at 12:05 AM, Gilles Ganault <nospam@nospam.com> wrote:
    Hello

    I'm using the urllib2 module and Tor as a proxy to download data
    from the web.

    Occasionally, urllib2 returns 404, probably because of some issue
    with the Tor network. This code doesn't solve the issue, as it just
    loops through the same error indefinitely:

    =====
    for id in rows:
        url = 'http://www.acme.com/?code=' + id[0]
        while True:
            try:
                req = urllib2.Request(url, None, headers)
                response = urllib2.urlopen(req).read()
            except HTTPError, e:
                print 'Error code: ', e.code
                time.sleep(2)
                continue
            else:  # should align with the `except`
                break
        handle_success(response)  # should align with `url =` line

    =====

    Any idea of what I should do to handle this error properly?

    Thank you.

    Cheers,
    Chris
    --
    Follow the path of the Iguana...
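    (Put together, Chris's suggestion is the original loop with an else: break,
    so that a successful fetch leaves the retry loop before handle_success is
    called. A self-contained sketch of that shape; rows, headers and
    handle_success are placeholders standing in for the objects from the
    original post:)

    =====
    import time
    import urllib2
    from urllib2 import HTTPError

    headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder headers
    rows = [('A123',), ('B456',)]            # placeholder result set

    def handle_success(data):
        # placeholder for whatever the original code does with the page
        print 'Fetched %d bytes' % len(data)

    for id in rows:
        url = 'http://www.acme.com/?code=' + id[0]
        while True:
            try:
                req = urllib2.Request(url, None, headers)
                response = urllib2.urlopen(req).read()
            except HTTPError, e:
                print 'Error code: ', e.code
                time.sleep(2)
                continue
            else:
                break  # fetch succeeded, stop retrying
        handle_success(response)
    =====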

    • Steven McKay

      #3
      Re: [urllib2 + Tor] How to handle 404?

      On Fri, Nov 7, 2008 at 2:28 AM, Chris Rebert <clp@rebertia.com> wrote:
      On Fri, Nov 7, 2008 at 12:05 AM, Gilles Ganault <nospam@nospam.com> wrote:

      Hello

      I'm using the urllib2 module and Tor as a proxy to download data
      from the web.

      Occasionally, urllib2 returns 404, probably because of some issue
      with the Tor network. This code doesn't solve the issue, as it just
      loops through the same error indefinitely:

      =====
      *snip*
      =====

      Any idea of what I should do to handle this error properly?

      Thank you.

      Cheers,
      Chris
      --
      Follow the path of the Iguana...

      It sounds like Gilles may be having an issue with persistent 404s, in
      which case something like this could be more appropriate:

      for id in rows:
          url = 'http://www.acme.com/?code=' + id[0]
          retries = 0
          while retries < 10:
              try:
                  req = urllib2.Request(url, None, headers)
                  response = urllib2.urlopen(req).read()
              except HTTPError, e:
                  print 'Error code: ', e.code
                  retries += 1
                  time.sleep(2)
                  continue
              else:  # should align with the `except`
                  break
          else:
              print 'Fetch of ' + url + ' failed after ' + str(retries) + ' tries.'
              continue  # give up on this id rather than reuse a stale `response`
          handle_success(response)  # should align with `url =` line
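
      (If the same bounded-retry logic is needed for several URLs, it can also
      be pulled into a small helper. A sketch along the lines of Steven's
      suggestion; the fetch name and the retry/delay values are illustrative,
      not from the thread:)

      =====
      import time
      import urllib2
      from urllib2 import HTTPError

      def fetch(url, headers, retries=10, delay=2):
          """Return the page body, or None once all attempts have failed."""
          for attempt in range(retries):
              try:
                  req = urllib2.Request(url, None, headers)
                  return urllib2.urlopen(req).read()
              except HTTPError, e:
                  print 'Error %d on attempt %d of %s' % (e.code, attempt + 1, url)
                  time.sleep(delay)
          return None
      =====

      (Callers would then test the return value for None and skip the ids that
      could not be fetched.)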
