HTML Dom Parser

  • GoodMan

    So I was looking for a way to parse images (<img> tags) from a
    given URL. I happened to stumble upon a nice little piece of
    code called "PHP Simple HTML DOM Parser", found here:
    http://simplehtmldom.sourceforge.net/

    On the first page, I made a form where you can enter a URL, and then
    the script tries to fetch the images.
    It was indeed what I needed. Simple code like this did part of
    the job:

    <?php
    // This is the script; download simple_html_dom.php from the
    // project's SourceForge page.
    include('simple_html_dom.php');

    // Create DOM from URL or file. In my code, this was replaced by
    // a URL variable.
    $dom = file_get_dom('http://www.google.com');

    // Find all <img> elements and print their src attribute
    foreach ($dom->find('img') as $element) {
        echo $element->src . "<br />";
    }

    // Free the memory held by the DOM tree
    $dom->clear();
    unset($dom);
    ?>
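The post mentions a form that supplies the URL to fetch. A minimal sketch of checking that input before it is handed to the parser, assuming the form sends a field named `url` by GET; the field name and the helper `get_requested_url()` are hypothetical, not part of the original script or of Simple HTML DOM.

```php
<?php
// Sketch only. Assumption: the form submits a field named "url" via GET.
// get_requested_url() is a hypothetical helper, not a library function.

/**
 * Return the submitted URL if it is a well-formed http(s) URL, otherwise null.
 */
function get_requested_url(array $query): ?string
{
    if (!isset($query['url']) || !is_string($query['url'])) {
        return null; // field missing or not a plain string
    }
    $url = trim($query['url']);

    // FILTER_VALIDATE_URL checks the overall URL syntax.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    // Only allow web URLs, so "file://", "ftp://" etc. are never fetched.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }
    return $url;
}

// Usage in the page that receives the form:
$url = get_requested_url($_GET);
if ($url === null) {
    echo "Please enter a valid http:// or https:// URL.\n";
} else {
    // $url can now be passed to the parser instead of a hard-coded address.
    echo "Fetching images from " . htmlspecialchars($url) . "\n";
}
```

Restricting the scheme matters because the parser will happily open local files if given a path or `file://` URL.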

    I understood the code, but I'm still a newbie in PHP. What I still
    want to do is:

    * Be able to specify that it only fetches, for example, .jpeg files.
    * Only allow images that are bigger than certain dimensions.
    * For now it only gives me the URL (relative or absolute, depending on
      the HTML of the source page). I also want it to display the
      parsed images.

    This is mainly for educational purposes, as the best way to learn PHP
    is to keep writing small applications with it. So if anyone can point
    me in the right direction, that would be great. And if you know of
    another script with the same functionality, please mention it too; I
    like learning different ways to achieve something.

    Thanks!
  • Rik Wasmus

    Re: HTML Dom Parser

    On Mon, 19 May 2008 08:30:41 +0200, GoodMan <shoofionline@gmail.com> wrote:
    > So I was looking for a way to be able to parse images (<img>) from a
    > given url. It so happened that I stumbled upon a nice little piece of
    > code called "PHP Simple HTML DOM Parser" found here :
    > http://simplehtmldom.sourceforge.net/
    Is it faster than PHP5's native DOM? (Don't mix up Dom & DOM in the
    manual, though...)
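For comparison with the question above, a minimal sketch of the same <img> listing done with PHP's built-in DOM extension (`DOMDocument`), which needs no extra download. The function name `find_image_sources()` is hypothetical, and the sketch makes no claim about which approach is faster.

```php
<?php
// Sketch: list the src of every <img> using PHP's DOM extension (ext-dom).
// find_image_sources() is a hypothetical name, not a library function.

function find_image_sources(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml's warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) { // skip <img> tags without src
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Usage: for a live page, fetch the HTML first, e.g.
// $html = file_get_contents($url);
$html = '<p><img src="/logo.png" alt=""> <img src="photo.jpg"><img alt="no src"></p>';
foreach (find_image_sources($html) as $src) {
    echo $src . "<br />\n";
}
```

Timing both versions on the same page (for example with `microtime(true)` before and after each) would answer the speed question for a given workload.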
    > On the first page, I made a form where you can enter a URL, and then
    > the script tries to fetch the images.
    > It was indeed what I needed. A simple code like this did a portion of
    > the job:
    >
    > <?php
    > include('simple_html_dom.php'); // This is the script, download it on their sourceforge.
    >
    > // Create DOM from URL or file
    > $dom = file_get_dom('http://www.google.com'); // In my code, this was replaced by a url variable.
    >
    > // Find all <img>
    > foreach($dom->find('img') as $element)
    >
    > echo $element->src . "<br />";
    >
    > $dom->clear();
    > unset($dom);
    > ?>
    >
    > I understood the code, but I'm still a newbie in PHP. What I still
    > want to do is:
    >
    > *Be able to specify that it only fetches .jpeg files for example.
    preg_match() the src attribute you found, or use DOM & XPath with a more
    sophisticated XPath query.
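Both suggestions can be sketched briefly. The helper names below are hypothetical; the first filters src values with `preg_match()`, the second lets a `DOMXPath` query select only matching <img> elements. Since XPath 1.0 has no `ends-with()`, the query compares the last characters of `@src` using `substring()`.

```php
<?php
// Sketch of both approaches; filter_jpeg_sources() and
// find_jpeg_sources_xpath() are hypothetical names.

// (1) Keep only sources ending in .jpg or .jpeg (case-insensitive).
//     A trailing query string or fragment is allowed.
function filter_jpeg_sources(array $sources): array
{
    $jpegs = [];
    foreach ($sources as $src) {
        if (preg_match('/\.jpe?g(?:[?#].*)?$/i', $src)) {
            $jpegs[] = $src;
        }
    }
    return $jpegs;
}

// (2) Let XPath do the filtering (needs ext-dom).
//     Note: this query is case-sensitive and ignores query strings.
function find_jpeg_sources_xpath(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // substring() is 1-based: the last 4 chars start at length - 3.
    $query = "//img[substring(@src, string-length(@src) - 3) = '.jpg'"
           . " or substring(@src, string-length(@src) - 4) = '.jpeg']/@src";

    $sources = [];
    foreach ($xpath->query($query) as $attr) {
        $sources[] = $attr->value; // each result node is a DOMAttr
    }
    return $sources;
}

$sources = ['photo.JPG', 'banner.png', 'thumb.jpeg?size=small'];
echo implode("\n", filter_jpeg_sources($sources)), "\n";
// prints photo.JPG and thumb.jpeg?size=small, one per line
```

The regex version is more forgiving (case, query strings); the XPath version avoids looping over elements you will discard.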
    > *Only allow images that are bigger than a certain dimensions.
    getimagesize(). Keep in mind that the page may use relative URLs, so
    build a proper absolute URL string for each image first.
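A minimal sketch of this step, assuming hypothetical helpers `absolute_url()` and `is_big_enough()`. The resolver only covers the common forms ("http://...", "//host/x", "/x", "x") and does not collapse "../" segments; calling `getimagesize()` on a remote URL needs `allow_url_fopen` enabled and downloads the image, so it is slow on pages with many images.

```php
<?php
// Sketch only; absolute_url() and is_big_enough() are hypothetical helpers.

// Turn a (possibly relative) src into an absolute URL, using the page URL as base.
function absolute_url(string $base, string $src): string
{
    if (parse_url($src, PHP_URL_SCHEME) !== null) {
        return $src; // already absolute, e.g. "http://..."
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    if (substr($src, 0, 2) === '//') {
        return $scheme . ':' . $src; // protocol-relative: reuse the page's scheme
    }
    $root = $scheme . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if (substr($src, 0, 1) === '/') {
        return $root . $src; // root-relative
    }
    // Relative to the page's directory: keep the path up to its last "/".
    $path = $b['path'] ?? '/';
    return $root . substr($path, 0, strrpos($path, '/') + 1) . $src;
}

// True if the image is at least $minWidth x $minHeight pixels.
function is_big_enough(string $imageUrl, int $minWidth, int $minHeight): bool
{
    $info = @getimagesize($imageUrl); // false if unreadable or not an image
    if ($info === false) {
        return false;
    }
    [$width, $height] = $info; // index 0 = width, index 1 = height
    return $width >= $minWidth && $height >= $minHeight;
}

echo absolute_url('http://www.example.com/gallery/index.html', 'thumbs/a.jpg'), "\n";
// prints http://www.example.com/gallery/thumbs/a.jpg

// In the image loop, something like:
// $url = absolute_url($pageUrl, $element->src);
// if (is_big_enough($url, 200, 150)) { echo $url . "<br />"; }
```

A complete resolver would also handle "./" and "../" segments as described in RFC 3986, section 5.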
    > *For now it only gives me the URL (relative or absolute, depending on
    > the html of the source). What I also want is that it displays the
    > images parsed.
    Then output HTML: <img> tags with the proper src attributes.
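A minimal sketch of that last step, with a hypothetical `images_to_html()` helper. It assumes the URLs have already been made absolute: a relative src would otherwise be resolved against your own page rather than the site the images came from.

```php
<?php
// Sketch: turn a list of absolute image URLs into HTML so the browser
// displays the images. images_to_html() is a hypothetical name.

function images_to_html(array $urls): string
{
    $html = '';
    foreach ($urls as $url) {
        // Escape the URL so quotes, "<" or "&" in it cannot break the markup.
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $html .= '<img src="' . $safe . '" alt="" /><br />' . "\n";
    }
    return $html;
}

echo images_to_html([
    'http://www.example.com/gallery/a.jpg',
    'http://www.example.com/b.jpg?size=large&v=2',
]);
```

Escaping is worth the extra call here because the src values come from someone else's page.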
    --
    Rik Wasmus
    ....spamrun finished
