Could do with a little help with this preg_match()

  • jeddiki
    Contributor
    • Jan 2009
    • 290

    Could do with a little help with this preg_match()

    Hi,
    I want to do a simple extraction of the
    three key header elements from a web page, namely these:


    <title>This the Title</title>
    <meta name="keywords" content="PHP, javascript, other keywords" />
    <meta name="description" content="This is the description." />
    Is the preg_match() function the best way to find them and put them into variables?

    If they are not found on the web page, I would like to fill the relevant variable with "Not found".

    I have written this code, but I am not sure if it is the best approach or if the logic is correct.

    Code:
    $title = preg_match("/<title>(.*?)</title>/",$text,$matches);
    if ($title === false) {
       $title = "None found";
       }
    
    $descrip = preg_match("/<meta name=\"description\" content=\"(.*?)\"/",$text,$matches);
    if ($descrip === false) {
       $descrip = "None found";
       }
    
    $keys = preg_match("/<meta name=\"keywords\" content=\"(.*?)\"/",$text,$matches);
    if ($keys === false) {
       $keys = "None found";
       }
    Any suggestions or corrections are most welcome. :)
  • Dormilich
    Recognized Expert Expert
    • Aug 2008
    • 8694

    #2
    Question number one: does it work?

    • jeddiki
      Contributor
      • Jan 2009
      • 290

      #3
      That's a great question.

      I couldn't test it because of server problems.

      Now back online :)

      From the code below, I get this result:

      Warning: preg_match() [function.preg-match]: Unknown modifier 't' in /home/guru54gt5/public_html/sys/get_google.php on line 130

      Title: None found
      Descrip: 0
      Keys: 0

      Code:
      $title = preg_match("/<title>(.*?)</title>/",$text,$matches);
       if ($title === false) {
          $title = "None found";
          }
        
      $descrip = preg_match("/<meta name=\"description\" content=\"(.*?)\"/",$text,$matches);
       if ($descrip === false) {
          $descrip = "None found";
         }
        
      $keys = preg_match("/<meta name=\"keywords\" content=\"(.*?)\"/",$text,$matches);
       if ($keys === false) {
          $keys = "None found";
          }
      
      echo "<br>Title: $title<br>Descrip: $descrip<br>Keys: $keys";
      Must be something in the first preg_match() I guess.

      • jeddiki
        Contributor
        • Jan 2009
        • 290

        #4
        I changed it and got rid of the errors,
        but I am still not picking up any content.

        This is what I have:

        Code:
        $title = "None found";
        $descrip = "None found";
        $keys = "None found";
        
        $flag = preg_match("/<title>(.*?)<\/title>/",$text,$matches);
         if ($flag == 1) {
            $title = $matches[0];
            }
          
        $flag = preg_match("/<meta name=\"description\" content=\"(.*?)\"/",$text,$matches);
         if ($flag == 1) {
            $descrip = $matches[0];
            }
        
        $flag = preg_match("/<meta name=\"keywords\" content=\"(.*?)\"/",$text,$matches);
         if ($flag == 1) {
            $keys = $matches[0];
            }
        
        echo "<br>Title: $title<br>Descrip: $descrip<br>Keys: $keys";
        Of course my output is:
        Title: None found
        Descrip: None found
        Keys: None found
        Any ideas?

        • Markus
          Recognized Expert Expert
          • Jun 2007
          • 6092

          #5
          Ah, the problem is that the forward slash in </title> closes the regular expression pattern (the opening delimiter is also a forward slash, so the next unescaped forward slash ends the pattern). PHP then treats the following 't' as a modifier (a character that follows the closing delimiter and changes how the pattern behaves), and 't' is not a valid one.

          So, to solve this you could either
          • change the delimiters to some other character (I generally use #)
          • or escape the forward slash with a backslash; </title> would become <\/title>
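
          A minimal sketch of both options, assuming a made-up sample string in $text (it is not the page from the question) and reading the captured text from $matches[1]:

```php
<?php
// Hypothetical sample input, for illustration only.
$text = '<html><head><title>This is the Title</title></head></html>';

// Option 1: use # as the delimiter, so the / in </title> needs no escaping.
$found = preg_match('#<title>(.*?)</title>#', $text, $matches);
$titleHash = ($found === 1) ? $matches[1] : 'None found';

// Option 2: keep / as the delimiter and escape the slash inside the pattern.
$found = preg_match('/<title>(.*?)<\/title>/', $text, $matches);
$titleSlash = ($found === 1) ? $matches[1] : 'None found';

echo "Title (# delimiter): $titleHash\n";
echo "Title (escaped /): $titleSlash\n";
```

          Either form should give the same result; the # delimiter tends to be easier to read when the pattern contains closing tags.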


          Mark.

          • jeddiki
            Contributor
            • Jan 2009
            • 290

            #6
            Thanks Markus,
            But I beat you to it ;-)

            I fixed that and posted the updated version.

            Any idea why I don't pick up the data?
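
            The thread does not establish why nothing matches, so the following is only a sketch of things worth checking. It assumes the real page may use different letter case or contain line breaks where the pattern expects none, and that $text really holds the page source. The sample $text and the helper name extract_or_default are made up for illustration. It also reads $matches[1] (the captured group) rather than $matches[0] (the whole match, tags included).

```php
<?php
// Hypothetical sample page. Upper-case tags and a line break inside <title>
// are assumptions chosen to show how a strict pattern can miss real markup.
$text = "<HTML><HEAD><TITLE>This is\nthe Title</TITLE>\n"
      . "<META NAME=\"keywords\" CONTENT=\"PHP, javascript\" />\n"
      . "</HEAD></HTML>";

// Hypothetical helper: returns capture group 1, or a default when nothing matches.
function extract_or_default(string $pattern, string $text, string $default = 'None found'): string
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    if (preg_match($pattern, $text, $matches) === 1) {
        return trim($matches[1]);
    }
    return $default;
}

// i = case-insensitive matching, s = let . match newlines as well.
$title   = extract_or_default('#<title>(.*?)</title>#is', $text);
$keys    = extract_or_default('#<meta\s+name="keywords"\s+content="(.*?)"#is', $text);
$descrip = extract_or_default('#<meta\s+name="description"\s+content="(.*?)"#is', $text);

echo "Title: $title\nKeys: $keys\nDescrip: $descrip\n";
```

            If the patterns still find nothing, echoing strlen($text) or the first few hundred characters of $text would show whether the page content was fetched at all.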
