php to spider a website

This topic is closed.
  • Kyle Mizell

    php to spider a website

    I am looking for a script that I can use to spider a website, and then pull
    the images... I know how to do it for a single page, but, I would like to be
    able to do this for the entire site. Any suggestions?

    Thanks,
    Kyle Mizell



  • jn

    #2
    Re: php to spider a website

    "Kyle Mizell" <kyle@pimpinonl ine.comNOSPAM> wrote in message
    news:qewyb.1747 52$Dw6.686810@a ttbi_s02...[color=blue]
    > I am looking for a script that I can use to spider a website, and then[/color]
    pull[color=blue]
    > the images... I know how to do it for a single page, but, I would like to[/color]
    be[color=blue]
    > able to do this for the entire site. Any suggestions?
    >
    > Thanks,
    > Kyle Mizell
    > http://www.pimpinonline.com
    >
    >
    >[/color]

    I don't know about your question, but pimpinonline.com is awesome.



    • Kevin Thorpe

      #3
      Re: php to spider a website

      Kyle Mizell wrote:
      > I am looking for a script that I can use to spider a website, and then pull
      > the images... I know how to do it for a single page, but, I would like to be
      > able to do this for the entire site. Any suggestions?

      Why PHP? Use wget if all you want is a simple spider job.


      • Andy Hassall

        #4
        Re: php to spider a website

        On Mon, 01 Dec 2003 00:49:26 GMT, "Kyle Mizell" <kyle@pimpinonline.comNOSPAM>
        wrote:

        >I am looking for a script that I can use to spider a website, and then pull
        >the images... I know how to do it for a single page, but, I would like to be
        >able to do this for the entire site. Any suggestions?

        PHP has HTTP client functions; you can simply use file() with a URL.

        However, to extract information from the HTML, you need an HTML parser
        (regular expressions alone are not sufficient). PHP doesn't have one built in
        or as one of the standard extensions. Personally I'd use Perl for this (e.g.
        HTML::Parser). I think there is an HTML parser for PHP called HTML-Sax, have a
        search for that.
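
To illustrate the parser-based extraction described above, here is a minimal sketch in Python rather than PHP, using the standard library's `html.parser` (roughly the counterpart of Perl's HTML::Parser). The names `ImageCollector` and `extract_images` are made up for this example, and it assumes the page source has already been downloaded into a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URL of every <img src=...> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page, used to resolve relative paths
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(urljoin(self.base_url, src))


def extract_images(html, base_url):
    """Return the image URLs found in html, in document order."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images
```

Because the parser tokenizes tags instead of pattern-matching text, it copes with mixed-case tags, attributes in any order, and self-closing `<img ... />` forms that tend to trip up a single regular expression.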

        --
        Andy Hassall (andy@andyh.co.uk) icq(5747695) (http://www.andyh.co.uk)
        Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space)


        • lazo

          #5
          Re: php to spider a website

          "Kyle Mizell" <kyle@pimpinonl ine.comNOSPAM> wrote in message news:<qewyb.174 752$Dw6.686810@ attbi_s02>...[color=blue]
          > I am looking for a script that I can use to spider a website, and then pull
          > the images... I know how to do it for a single page, but, I would like to be
          > able to do this for the entire site. Any suggestions?
          >
          > Thanks,
          > Kyle Mizell
          > http://www.pimpinonline.com[/color]

          Do for every page what you already do for one page.
          Store all the links found on the first page in an array (eliminating
          duplicates), then process each of those pages the same way as the first.
          I think the best approach is to write a function that saves one page and
          returns the links it found, then call that function for every URL.
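
The loop described above might look roughly like the following sketch, written in Python rather than PHP. Everything named here (`LinkCollector`, `fetch`, `crawl`, `max_pages`) is hypothetical, and staying on the starting host and capping the page count are assumptions added so the crawl terminates; the post itself does not mention them.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the absolute target of every <a href=...> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop "#fragment" so that
                # page.html and page.html#top count as the same page.
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                self.links.append(absolute)


def fetch(url):
    """Download one page as text (hypothetical helper, no error handling)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=100):
    """Visit same-host pages reachable from start_url; return them in visit order."""
    host = urlparse(start_url).netloc
    seen = {start_url}            # duplicates are eliminated here
    queue = deque([start_url])    # pages still waiting to be processed
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        collector = LinkCollector(url)
        collector.feed(fetch_page(url))
        for link in collector.links:
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Saving each page or pulling its images would happen at the point where `fetch_page(url)` returns. Passing the fetcher in as a parameter is only a convenience so the loop can be tried against canned pages without touching the network.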
          While saving a page you have to rewrite its links, because the names of
          the static files will be different from the original URLs,
          e.g.
          members.php?search_sex=Male&search=kyle@pimpinonline.com&unset_search=true
          is replaced with
          members_php_search_sex_Male_search_kyle_pimpinonline_com_unset_search_true.HTML

          Name all the stored pages this way.
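
One way to produce names of that shape is sketched below in Python (not PHP). The function name `url_to_filename` is made up, and the rule it applies (every run of characters other than letters and digits becomes a single underscore) is an assumption inferred from the single example above, which it reproduces apart from using a lowercase `.html` suffix.

```python
import re


def url_to_filename(relative_url, suffix=".html"):
    """Flatten a dynamic URL into a file name safe to store on disk."""
    # Each run of non-alphanumeric characters (., ?, =, &, @, _ ...) becomes "_".
    stem = re.sub(r"[^A-Za-z0-9]+", "_", relative_url).strip("_")
    return stem + suffix


name = url_to_filename(
    "members.php?search_sex=Male&search=kyle@pimpinonline.com&unset_search=true"
)
print(name)
```

A mapping this simple can collide: `a.b` and `a_b` both flatten to `a_b.html`, so a real spider would want to detect duplicates, for example by keeping a dictionary from original URL to assigned name and reusing it when rewriting links.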

          enjoy
