Get directory from http web site

This topic is closed.
  • rock69

    Get directory from http web site

    Hi all :)

    I was wondering if there's some neat and easy way to get the entire
    contents of a directory at a specific web URL.

    I have the following link:

    http://www.infomedia.it/immagini/riviste/covers/cp

    and as you can see it's just a list containing all the files (images)
    that I need. Is it possible to retrieve this list (not the physical
    files) and have it stored in a variable of type list or something?

    And, if so, what would be the easiest and most efficient way?

    Thank you so much in advance.

    Rock

  • Sybren Stuvel

    #2
    Re: Get directory from http web site

    rock69 enlightened us with:
    > I was wondering if there's some neat and easy way to get the entire
    > contents of a directory at a specific web url address. [...] Is it
    > possible to retrieve this list (not the physical files) and have it
    > stored in a variable of type list or something?

    Check out the chapter on HTML parsing in Mark Pilgrim's "Dive Into
    Python".
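A minimal Python 3 sketch of the same idea, using only the standard library's `html.parser` rather than whatever that chapter uses: subclass `HTMLParser`, collect the `href` of every `<a>` tag, and return them as a list. The names `LinkCollector`, `links_in` and `links_at`, and the Latin-1 fallback when the server sends no charset, are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def links_in(html):
    """Return every href in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def links_at(url):
    """Fetch a page over HTTP and return its hrefs."""
    with urlopen(url) as response:
        # Fall back to Latin-1 when the server declares no charset (an assumption).
        charset = response.headers.get_content_charset() or "latin-1"
        return links_in(response.read().decode(charset))
```

Calling `links_at("http://www.infomedia.it/immagini/riviste/covers/cp/")` would return the raw list of hrefs on that index page, assuming the page is still online.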


    Sybren
    --
    The problem with the world is stupidity. Not saying there should be a
    capital punishment for stupidity, but why don't we just take the
    safety labels off of everything and let the problem solve itself?
    Frank Zappa


    • Kent Johnson

      #3
      Re: Get directory from http web site

      rock69 wrote:
      > Hi all :)
      >
      > I was wondering if there's some neat and easy way to get the entire
      > contents of a directory at a specific web url address.
      >
      > I have the following link:
      >
      > http://www.infomedia.it/immagini/riviste/covers/cp
      >
      > and as you can see it's just a list containing all the files (images)
      > that I need. Is it possible to retrieve this list (not the physical
      > files) and have it stored in a variable of type list or something?

      BeautifulSoup and urllib do this easily:
      >>> from BeautifulSoup import BeautifulSoup
      >>> import urllib
      >>> data = urllib.urlopen('http://www.infomedia.it/immagini/riviste/covers/cp/').read()
      >>> soup = BeautifulSoup(data)
      >>> anchors = soup.fetch('a')
      >>> len(anchors)
      164
      >>> for a in anchors[:10]:
      ...     print a['href'], a.string
      ...
      ?N=D Name
      ?M=A Last modified
      ?S=A Size
      ?D=A Description
      /immagini/riviste/covers/ Parent Directory
      cp100.jpg cp100.jpg
      cp100sm.jpg cp100sm.jpg
      cp101.jpg cp101.jpg
      cp101sm.jpg cp101sm.jpg
      cp102.jpg cp102.jpg
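The first five anchors in that output are the index page's column-sorting links (starting with `?`) and the Parent Directory link (starting with `/`), not files. A possible Python 3 sketch for keeping only the file entries, given a plain list of hrefs such as `[a['href'] for a in anchors]`. It assumes an Apache-style listing, a base URL ending in `/`, and a hypothetical set of image extensions; `directory_files` is an illustrative name.

```python
from urllib.parse import urljoin

# Hypothetical set of extensions to keep; adjust to the files you need.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


def directory_files(hrefs, base_url, extensions=IMAGE_EXTENSIONS):
    """Keep only file entries from an Apache-style index listing.

    Skips the column-sorting links (which start with '?') and absolute
    paths such as the Parent Directory entry (which start with '/').
    Returns (filename, absolute_url) pairs; base_url should end with '/'.
    """
    files = []
    for href in hrefs:
        if href.startswith(("?", "/")):
            continue
        if not href.lower().endswith(extensions):
            continue
        files.append((href, urljoin(base_url, href)))
    return files
```

Keeping the absolute URL next to each name means the images can be downloaded later without rebuilding paths by hand.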



      Kent


      • lemon97@gmail.com

        #4
        Re: Get directory from http web site

        You might also want to modify your c:/python/Lib/urllib.py file
        by adding/modifying the following headers. Put both in a single
        list; a second assignment to self.addheaders would replace the first.

        self.addheaders = [
            # Trick the server into thinking it is Explorer.
            ('User-agent', 'Mozilla/4.0'),
            # Trick the site into thinking you clicked a link on their site.
            ('Referer', 'http://www.infomedia.it'),
        ]
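If you'd rather not edit the standard library (the change affects every program using that Python install and is likely to be overwritten on upgrade), Python 3's `urllib.request.Request` accepts headers for a single request. This is a sketch; whether the server actually checks User-Agent or Referer is an assumption, and `make_request`/`fetch` are illustrative names.

```python
from urllib.request import Request, urlopen


def make_request(url, referer):
    """Build a request that carries its own User-Agent and Referer headers."""
    return Request(url, headers={
        # Browser-like agent string; assumes the server filters on it.
        "User-Agent": "Mozilla/4.0",
        # Pretend the request came from a link on the site itself.
        "Referer": referer,
    })


def fetch(url, referer):
    """Download the raw bytes of url using the custom headers."""
    with urlopen(make_request(url, referer)) as response:
        return response.read()
```

For the same effect on every request made through one opener, `urllib.request.build_opener()` returns an object whose `addheaders` list can be set, without touching any library file.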
