Agnostic fetching

  • jorpheus

    Agnostic fetching


    OK, that sounds stupid. Anyway, I've been learning Python for some
    time now, and am currently having fun with the urllib and urllib2
    modules, but have run into a problem(?) - is there any way to fetch
    (urllib.retrieve) files from a server without knowing the filenames?
    For instance, there is smth like folder/spam.egg,
    folder/unpredictable.egg and so on. If not, perhaps some kind of glob to
    create a list of existing files? I'd really appreciate some help,
    since I'm really out of my (newb) depth here.
  • Bruce Frederiksen

    #2
    Re: Agnostic fetching

    On Fri, 01 Aug 2008 17:05:00 -0700, jorpheus wrote:
    >OK, that sounds stupid. Anyway, I've been learning Python for some
    >time now, and am currently having fun with the urllib and urllib2
    >modules, but have run into a problem(?) - is there any way to fetch
    >(urllib.retrieve) files from a server without knowing the filenames?
    >For instance, there is smth like folder/spam.egg,
    >folder/unpredictable.egg and so on. If not, perhaps some kind of glob to
    >create a list of existing files? I'd really appreciate some help,
    >since I'm really out of my (newb) depth here.
    You might try the os.path module and/or the glob module in the standard
    python library.


    • Terry Reedy

      #3
      Re: Agnostic fetching



      jorpheus wrote:
      >OK, that sounds stupid. Anyway, I've been learning Python for some
      >time now, and am currently having fun with the urllib and urllib2
      >modules, but have run into a problem(?) - is there any way to fetch
      >(urllib.retrieve) files from a server without knowing the filenames?
      >For instance, there is smth like folder/spam.egg,
      >folder/unpredictable.egg and so on. If not, perhaps some kind of glob to
      >create a list of existing files? I'd really appreciate some help,
      >since I'm really out of my (newb) depth here.
      If you are asking whether servers will let you go fishing around in
      their file system, the answer is that HTTP is not designed for that
      (whereas FTP is, as long as you stay under the main FTP directory). You
      can try random file names, but the server may get unhappy and think you
      are trying to break in through a back door or something. You are
      *expected* to start at ..../index.html and follow the links given
      there, or to use a valid filename that was retrieved by that method.
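
For the FTP case mentioned above, a minimal sketch with the standard library's ftplib might look like the following. It is written for Python 3, and the host and directory names passed in are hypothetical; the server also has to permit anonymous login and directory listing, which many do not.

```python
from ftplib import FTP


def list_directory(ftp, directory):
    """Return the names the server reports for `directory`, sorted.

    `ftp` is any object with an ftplib.FTP-style nlst() method.
    """
    return sorted(ftp.nlst(directory))


def list_anonymous(host, directory):
    """Log in anonymously to `host` (a hypothetical server) and list `directory`."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login; assumes the server allows it
        return list_directory(ftp, directory)
```

Each returned name could then be downloaded with `ftp.retrbinary`. Nothing comparable is part of plain HTTP, which is the point made above.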


      • Diez B. Roggisch

        #4
        Re: Agnostic fetching

        Bruce Frederiksen schrieb:
        >On Fri, 01 Aug 2008 17:05:00 -0700, jorpheus wrote:
        >>
        >>OK, that sounds stupid. Anyway, I've been learning Python for some
        >>time now, and am currently having fun with the urllib and urllib2
        >>modules, but have run into a problem(?) - is there any way to fetch
        >>(urllib.retrieve) files from a server without knowing the filenames?
        >>For instance, there is smth like folder/spam.egg,
        >>folder/unpredictable.egg and so on. If not, perhaps some kind of glob to
        >>create a list of existing files? I'd really appreciate some help,
        >>since I'm really out of my (newb) depth here.
        >>
        >You might try the os.path module and/or the glob module in the standard
        >python library.
        Not on remote locations. They only work on your local filesystem.

        Diez
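
To illustrate the point about local-only matching, here is a small sketch (Python 3; the file names are taken from the question, the temporary folder and the example.com URL are made up for the demonstration). glob expands patterns against directories on the local disk, so a URL is just treated as a local path that does not exist.

```python
import glob
import os
import tempfile


def local_eggs(folder):
    """Return the sorted base names of *.egg files inside a local folder."""
    pattern = os.path.join(folder, "*.egg")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


# Build a throwaway local folder so the pattern has something to match.
with tempfile.TemporaryDirectory() as folder:
    for name in ("spam.egg", "unpredictable.egg", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = local_eggs(folder)

# A URL is not fetched; glob looks for a local directory by that name.
remote = glob.glob("http://example.com/folder/*.egg")
```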


        • Michael Torrie

          #5
          Re: Agnostic fetching

          jorpheus wrote:
          >OK, that sounds stupid. Anyway, I've been learning Python for some
          >time now, and am currently having fun with the urllib and urllib2
          >modules, but have run into a problem(?) - is there any way to fetch
          >(urllib.retrieve) files from a server without knowing the filenames?
          >For instance, there is smth like folder/spam.egg,
          >folder/unpredictable.egg and so on. If not, perhaps some kind of glob to
          >create a list of existing files? I'd really appreciate some help,
          >since I'm really out of my (newb) depth here.
          If you happen to have a URL that simply lists files, then what you have
          to do is relatively simple. Just fetch the HTML from the folder URL,
          then parse the HTML and look for the anchor tags. You can then fetch
          those anchor URLs that interest you. BeautifulSoup can help out with
          this; you should be able to list all anchor tags in an HTML string in
          just one line of code. Combine urllib2 and BeautifulSoup and you'll
          have a winner.
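
A rough sketch of the approach described above, with two substitutions that should be read as assumptions: it targets Python 3, where urllib2 became urllib.request, and it uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra installs. The names `AnchorCollector`, `extract_links` and `list_folder` are made up for this example, and it only helps if the server actually serves an HTML listing at the folder URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for all anchors found in an HTML string."""
    parser = AnchorCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def list_folder(folder_url):
    """Fetch a folder listing page and return the URLs it links to."""
    with urlopen(folder_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html, folder_url)
```

The returned URLs can be filtered (for instance, keeping those that end in `.egg`) and then downloaded one by one with `urllib.request.urlretrieve`.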
