Downloading and parsing web-stuff

  • David Rasmussen

    Downloading and parsing web-stuff

    Very basic:

    What is the easiest way in PHP to download the source code (HTML etc.)
    of a given URL (say, http://www.google.com) and parse this code for
    certain patterns?

    I guess my question can be split in two:

    1) How do I download a webpage (into a string or whatever)?

    2) How can I do string manipulation, regexp matching, information
    extraction etc. on the downloaded information?

    /David

  • BKDotCom

    #2
    Re: Downloading and parsing web-stuff


    David Rasmussen wrote:
    > I guess my question can be split in two:
    >
    > 1) How do I download a webpage (into a string or whatever)?

    $string = file_get_contents('http://some.url/blah');
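
As a minimal sketch of the same call with failure handling: `file_get_contents()` returns `false` when the request fails, and reading URLs this way needs `allow_url_fopen` enabled in php.ini. The helper name `fetch_page`, the timeout default, and the user-agent string are placeholders invented for this example, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the body as a string, or null on failure.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    // The 'http' options apply to http:// and https:// URLs only.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'example-fetcher/0.1', // placeholder value
        ],
    ]);

    // file_get_contents() returns false on failure and raises a warning;
    // "@" silences the warning so the caller can test the return value.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Calling `fetch_page('http://some.url/blah')` then gives either the page source or `null`, which is easier to check than a warning in the output.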

    > 2) How can I do string manipulation, regexp matching, information
    > extraction etc. on the downloaded information?

    Now look at the docs for preg_match or ereg.
    I prefer preg_match.

    // $matches[0] is the whole match; $matches[1] is the text between the tags.
    if (preg_match('|<title>(.*?)</title>|', $string, $matches)) {
        print_r($matches);
    }
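
A slightly more forgiving variant, as a sketch only: real pages often write `<TITLE>` in upper case or break the title across lines, which the pattern above would miss. The function name `extract_title` is made up for this example; the `i` and `s` modifiers are standard PCRE modifiers.

```php
<?php
// Hypothetical helper: returns the page title, or null if none is found.
function extract_title(string $html): ?string
{
    // i: case-insensitive, s: let "." match newlines as well.
    if (preg_match('|<title[^>]*>(.*?)</title>|is', $html, $matches)) {
        // Collapse runs of whitespace (including line breaks) to one space.
        $title = preg_replace('/\s+/', ' ', $matches[1]);
        // Turn entities such as &amp; back into plain characters.
        return trim(html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}
```

For anything beyond a single simple tag, a regex gets fragile quickly, so treat this as good enough for quick scraping rather than a general HTML parser.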


    • Guest

      #3
      Re: Downloading and parsing web-stuff

      Treat a full URL as a file.

      // file() keeps each line's trailing newline, so join with an empty string.
      $contents = implode('', file("http://www.google.com/"));

      Then go to www.php.net/preg_match/ to read up on PCRE (Perl-compatible
      regular expressions). See also the ereg_* functions.
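
To pull out every occurrence of a pattern rather than just the first, `preg_match_all()` is the usual PCRE call. The sketch below collects link targets from a downloaded page; the function name `extract_links` and the pattern are assumptions made for this example, and the pattern only handles quoted `href` values.

```php
<?php
// Hypothetical helper: returns the href values of all <a> tags in $html.
function extract_links(string $html): array
{
    // Group 1 captures the opening quote, group 2 the URL;
    // \1 requires the same quote character to close the value.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';

    // With the default PREG_PATTERN_ORDER, $matches[2] holds every
    // capture of group 2, or an empty array when nothing matched.
    preg_match_all($pattern, $html, $matches);

    return $matches[2];
}
```

Fed the string from `file_get_contents()` or `implode('', file(...))`, this returns a plain array of URLs that can be looped over or filtered further.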

      HTH.

      -Mike





