about preg_match_all statement

  • swethak
    New Member
    • May 2008
    • 118

    about preg_match_all statement

    Hi,

    I wrote the code below to capture the images from a website when I submit its URL. In the same way, I want to capture the text information from the website. Please tell me what the code for that would be.


    [php]
    <?php
    // $url is expected to hold the URL submitted by the user.
    $content = file_get_contents($url);

    // Match every <img> tag; group 3 captures the value of its src attribute.
    preg_match_all("/<img(.*)src=(\"|')(.*)(\"|')(.*)[\/]?>/siU", $content, $match, PREG_PATTERN_ORDER);

    echo "<b>Capture Images :</b><br>";
    echo "<br>";
    print_r($match[0]);
    echo "<br>";
    echo "<br>";
    echo "<b>Capture Images URLS :</b><br><br>";
    // The matches from the call above are reused; a second identical call is not needed.
    print_r($match[3]);
    [/php]
  • pbmods
    Recognized Expert Expert
    • Apr 2007
    • 5821

    #2
    Heya, Swethak.

    What is your code doing now that is different from what you want it to do?


    • swethak
      New Member
      • May 2008
      • 118

      #3
      Originally posted by pbmods
      Heya, Swethak.

      What is your code doing now that is different from what you want it to do?

      It captures the images from the website. I want to capture the text information from the website.


      • Gulzor
        New Member
        • Jul 2008
        • 27

        #4
        If you are working with PHP5, you can use the DOM API for that.

        Adapt this to your needs :
        [php]
        <?php
        $htmlString = file_get_contents('url_or_path_to_html_file');

        // loadHTML() is an instance method, so create the document first.
        $htmlDoc = new DOMDocument();
        $htmlDoc->loadHTML($htmlString);
        $xpath = new DOMXPath($htmlDoc);

        /* fetch the content of all <p> tags */
        $pNodesList = $xpath->query('//p');
        foreach ($pNodesList as $pNode) {
            echo $pNode->nodeValue, "\n";
        }

        ?>
        [/php]

        It may not be the best method, but I prefer handling HTML documents with the DOM API instead of banging my head against the wall with regex :P
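The same DOM approach can also replace the image regex from the first post. The following is only a minimal sketch, assuming a current PHP version; the helper name extractImageUrls and the sample markup are made up for illustration and do not come from the thread.

```php
<?php
// Hypothetical helper (not from the thread): collect the src attribute of every <img>.
function extractImageUrls(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $xpath = new DOMXPath($doc);
    $urls = [];
    // '//img/@src' selects the src attribute node of each <img> element.
    foreach ($xpath->query('//img/@src') as $attr) {
        $urls[] = $attr->nodeValue;
    }
    return $urls;
}

// Made-up sample markup; in practice the string would come from file_get_contents().
$sample = '<p>Logo: <img src="/img/logo.png" alt="logo"></p><img src=\'banner.jpg\'>';
print_r(extractImageUrls($sample));
```

Because the parser handles the quoting, single- and double-quoted src values need no special treatment here.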


        • swethak
          New Member
          • May 2008
          • 118

          #5
          Originally posted by Gulzor
          If you are working with PHP5, you can use the DOM API for that.

          I used it that way and got the errors below. Please tell me what the mistake is.

          Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 34 in C:\wamp\www\test\textdata.php on line 3

          Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 34 in C:\wamp\www\test\textdata.php on line 3

          Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 34 in C:\wamp\www\test\textdata.php on line 3

          Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 34 in C:\wamp\www\test\textdata.php on line 3


          • Gulzor
            New Member
            • Jul 2008
            • 27

            #6
            These are "just" warnings, caused by invalid or unsupported HTML entities or other markup problems in the page. It is nearly impossible to parse a real-world HTML document without getting some of them...

            If your text is not between <p></p> tags, you can replace //p with //td. Like I said, you need to adapt it to your needs.
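If the warnings get in the way, one option is to let libxml collect them instead of printing them. This is a rough sketch rather than a definitive fix: libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() are standard PHP functions, while the helper name loadHtmlQuietly and the sample string are assumptions made up for illustration.

```php
<?php
// Hypothetical helper: parse HTML without emitting warnings.
// Returns the DOMDocument together with whatever errors libxml collected.
function loadHtmlQuietly(string $html): array
{
    // Buffer parse errors internally; remember the previous setting.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $errors = libxml_get_errors(); // array of LibXMLError objects, possibly empty
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$doc, $errors];
}

// Made-up sample: an unescaped "&" is the kind of input that tends to trigger entity warnings.
[$doc, $errors] = loadHtmlQuietly('<table><tr><td>Tom &Jerry=1 cartoons</td></tr></table>');

foreach ($errors as $error) {
    echo 'libxml: ', trim($error->message), ' (line ', $error->line, ")\n";
}

// The document is still usable, here with //td instead of //p.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//td') as $node) {
    echo $node->nodeValue, "\n";
}
```

Whether any errors are reported for a given page depends on the libxml version, so the loop over $errors may print nothing.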


            • swethak
              New Member
              • May 2008
              • 118

              #7
              Originally posted by Gulzor
              These are "just" warnings, caused by invalid or unsupported HTML entities or other markup problems in the page. It is nearly impossible to parse a real-world HTML document without getting some of them...

              If your text is not between <p></p> tags, you can replace //p with //td. Like I said, you need to adapt it to your needs.

              When the data is between <p> tags, it shows the data; otherwise it shows nothing and gives no error. How do I write a condition to handle that? Please help me.


              • Gulzor
                New Member
                • Jul 2008
                • 27

                #8
                Originally posted by swethak
                When the data is between <p> tags, it shows the data; otherwise it shows nothing and gives no error. How do I write a condition to handle that? Please help me.
                I don't understand what your problem is now... <p> tags are not the only ones that hold text. <li>, <td>, <span> and others do as well.
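To cover several tags at once, the XPath query can be a union of paths. The sketch below is only one possible way to do it; the helper name extractText, its default tag list and the sample markup are assumptions for illustration. Note that nested elements (a <span> inside a <p>, for instance) would have their text returned twice.

```php
<?php
// Hypothetical helper: return the trimmed text of every element with one of the given tag names.
function extractText(string $html, array $tags = ['p', 'li', 'td', 'span']): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Build a union query such as "//p | //li | //td | //span".
    $query = implode(' | ', array_map(fn($tag) => '//' . $tag, $tags));

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') { // skip elements that hold no text
            $texts[] = $text;
        }
    }
    return $texts;
}

// Made-up sample markup.
$sample = '<p>Intro</p><ul><li>One</li><li>Two</li></ul><table><tr><td>Cell</td></tr></table>';
print_r(extractText($sample));
```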


                • mobs
                  New Member
                  • Aug 2008
                  • 1

                  #9
                  Say that I just wanted to retrieve the number 30735 from the following code, how would you go about doing that?

                  Code:
                  <a href="/?item=30735">River Runner</a>


                  • pbmods
                    Recognized Expert Expert
                    • Apr 2007
                    • 5821

                    #10
                    Heya, Mobs. Welcome to Bytes!

                    The only part that we really care about is:
                    Code:
                    <a href="/?item=30735
                    Now, we have to make a couple of assumptions:
                    • The URL might have a path and/or other query variables prepended. E.g.:
                      Code:
                      <a href="/path/to/some.php?file=test&item=123456"
                    • The URL might have some stuff after it. E.g.:
                      Code:
                      <a href="/?item=654321&amp;visitor=1"
                    • The anchor tag might have attributes before the href attribute. E.g.:
                      Code:
                      <a target="_blank" href="/?item=13579"


                    We are going to assume that the tag is well-formed (ends with a '>' and the href attribute is properly-quoted with any quotes inside of it percent- or ampersand-escaped).

                    With that in mind, we need to be able to skip over anything we don't care about and focus only on what we want:

                    [code=regexp]
                    /<a[^>]*href="[^"]+item=(\d+)/
                    [/code]

                    This should be enough to harvest item IDs from anchor tags on the page.
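As a quick check, the pattern can be run with preg_match_all() against the example anchors from this post. This is only a sketch: the helper name extractItemIds and the link texts other than "River Runner" are placeholders, not part of the original examples.

```php
<?php
// Hypothetical helper wrapping the pattern from the post above.
function extractItemIds(string $html): array
{
    preg_match_all('/<a[^>]*href="[^"]+item=(\d+)/', $html, $matches);
    return $matches[1]; // contents of the first capture group, one entry per matching anchor
}

// The four anchor variants discussed above; link texts A, B, C are placeholders.
$html = <<<'HTML'
<a href="/?item=30735">River Runner</a>
<a href="/path/to/some.php?file=test&item=123456">A</a>
<a href="/?item=654321&amp;visitor=1">B</a>
<a target="_blank" href="/?item=13579">C</a>
HTML;

print_r(extractItemIds($html)); // the IDs 30735, 123456, 654321 and 13579, as strings
```

The IDs come back as strings; cast them with (int) if numbers are needed.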


                    • swethak
                      New Member
                      • May 2008
                      • 118

                      #11
                      about preg_match_all statement

                      Hi,

                      I wrote code to capture all the information between <p> tags. However, there are also some <img> tags inside the <p> tags. I want to capture everything between the <p> tags except the <img> tags. How do I write that condition? Please help me.

                      [php]
                      <?php
                      $content = file_get_contents('http://www.website.com');
                      // [^>]* stays inside the opening tag; (.*?) is non-greedy so each <p>...</p> pair matches separately.
                      preg_match_all('/<p\b[^>]*>(.*?)<\/p>/si', $content, $match, PREG_PATTERN_ORDER);

                      echo "<b>Captured Paragraphs :</b><br>";
                      echo "<br>";
                      print_r($match[0]);
                      ?>
                      [/php]


                      In the preg_match_all() call above, how do I add a condition so that the image tags are not captured? Could anybody please reply?
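One possible approach, sketched here under assumptions rather than given as a definitive answer, is to capture each paragraph first and then strip the <img> tags from it in a second step with preg_replace(). The helper name paragraphsWithoutImages and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: return the inner HTML of each <p>, with <img> tags removed.
function paragraphsWithoutImages(string $html): array
{
    // Non-greedy (.*?) so that each <p>...</p> pair is matched separately.
    preg_match_all('/<p\b[^>]*>(.*?)<\/p>/si', $html, $matches);

    $paragraphs = [];
    foreach ($matches[1] as $inner) {
        // Remove every <img ...> tag, then trim surrounding whitespace.
        $paragraphs[] = trim(preg_replace('/<img\b[^>]*>/i', '', $inner));
    }
    return $paragraphs;
}

// Made-up sample markup.
$sample = '<p class="a">Hello<img src="x.png"> world</p><p><img src="y.png"/>Second</p>';
print_r(paragraphsWithoutImages($sample));
```

If all tags should go, not only images, PHP's built-in strip_tags() could be applied to each captured paragraph instead of the preg_replace() call.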
