[regex] Why doesn't preg_replace work?

  • Gilles Ganault

    [regex] Why doesn't preg_replace work?

    Hello

    I went through some examples, tried a bunch of things... but still
    can't figure out why I can't extract the TITLE section of a web page
    using preg_replace():

    -----------
    <?php

    $url = "http://www.cnn.com";

    $response = file_get_contents($url);

    $output = preg_replace("|<title>(.+?)</title>|smiU",
                           "TITLE=$1",
                           $response);

    $fp = fopen("output.html", "w");
    fputs($fp, $output);
    fclose($fp);
    -----------

    Any idea?

    Thanks!
  • klenwell

    #2
    Re: Why doesn't preg_replace work?

    Hi Gilles,

    I'm not a regex guru, but I can spot a couple of problem areas in
    your expression:

    1. The core syntax could probably be simplified using something like
    this:

    |^<title>([^<]+)</title>$|i

    I hope I got that right -- I usually have to test my expression a few
    times before I get all the nuances right. :)

    2. smiU - That's modifier overkill. The U here and the ? in your
    expression are probably reacting to each other in unexpected ways. If
    you don't know about this page, it can help:



    I have a prefab function I've used for this very thing, but
    unfortunately I don't have access to it at the moment. Hopefully,
    someone will be along shortly with the proper syntax. In the
    meantime, I hope this helps in a more general sense.
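
    One caveat on the pattern in point 1, offered as a rough sketch rather
    than a tested answer: without the m modifier, ^ and $ anchor the match
    to the start and end of the whole subject string, so the anchored form
    only matches when the string is nothing but the title element. The
    sample strings and variable names below are made up for illustration.

```php
<?php
// Hypothetical sample inputs, not taken from a real page.
$page  = "<html><head><title>Breaking News</title></head><body></body></html>";
$alone = "<title>Breaking News</title>";

$anchored   = '|^<title>([^<]+)</title>$|i';
$unanchored = '|<title>([^<]+)</title>|i';

// Anchored: matches only when the subject is exactly the title element.
var_dump(preg_match($anchored, $alone));   // int(1)
var_dump(preg_match($anchored, $page));    // int(0)

// Unanchored: finds the title anywhere in the page.
if (preg_match($unanchored, $page, $m) === 1) {
    echo $m[1], "\n";                      // Breaking News
}
```

    So for a full page fetched with file_get_contents(), dropping the
    anchors and keeping the [^<]+ character class is probably closer to
    what is wanted.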

    Regards,
    Tom



    On May 7, 4:30 pm, Gilles Ganault <nos...@nospam.com> wrote:
    >Hello
    >
    >I went through some examples, tried a bunch of things... but still
    >can't figure out why I can't extract the TITLE section of a web page
    >using preg_replace():
    >
    >-----------
    ><?php
    >
    >$url = "http://www.cnn.com";
    >
    >$response = file_get_contents($url);
    >
    >$output = preg_replace("|<title>(.+?)</title>|smiU",
    >                       "TITLE=$1",
    >                       $response);
    >
    >$fp = fopen("output.html", "w");
    >fputs($fp, $output);
    >fclose($fp);
    >-----------
    >
    >Any idea?
    >
    >Thanks!


    • Gilles Ganault

      #3
      Re: Why doesn't preg_replace work?

      On 7 May 2007 16:46:27 -0700, klenwell <klenwell@gmail.com> wrote:
      >2. smiU - That's modifier overkill. The U here and the ? in your
      >expression are probably reacting to each other in unexpected ways.
      Ah... Indeed, it seems you should pick one or the other: either the U
      switch to make the whole pattern non-greedy, or the ? suffix on a
      quantifier (e.g. ".+?"), but not both. Thanks for pointing it out.
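
      A small sketch of that interaction, using a made-up sample string:
      the U modifier inverts greediness, so a quantifier followed by ?
      becomes greedy again when U is set. The variable names here are only
      for illustration.

```php
<?php
// Hypothetical sample string with two elements on one line.
$html = "<b>one</b> and <b>two</b>";

// Lazy via "?" only: stops at the first closing tag.
preg_match('|<b>(.+?)</b>|', $html, $lazy);

// Lazy via the U modifier only: same result.
preg_match('|<b>(.+)</b>|U', $html, $ungreedy);

// Both together: U inverts greediness, so ".+?" becomes greedy again.
preg_match('|<b>(.+?)</b>|U', $html, $both);

echo $lazy[1], "\n";      // one
echo $ungreedy[1], "\n";  // one
echo $both[1], "\n";      // one</b> and <b>two
```

      On a page with a single title element the greedy and lazy forms
      would usually capture the same text, so this is likely a side issue
      rather than the cause of the original problem.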

      Found it: To extract bits, I shouldn't use preg_replace() but
      preg_match():

      --------------
      $url = "http://www.cnn.com";
      $response = file_get_contents($url);

      // preg_match() fills $matches; index 1 holds the first captured group.
      preg_match("|<title>(.+?)</title>|smi", $response, $matches);
      $response = $matches[1];

      $fp = fopen("output.html", "w");
      fputs($fp, $response);
      fclose($fp);
      --------------

      Thank you.
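
      For anyone landing here later, a minimal sketch of why the switch
      matters: preg_replace() returns the whole subject with only the
      matched part rewritten, while preg_match() hands back the captured
      group on its own. The $response string below is a hypothetical
      stand-in for the downloaded page, and the check on the return value
      is an addition, not part of the code above.

```php
<?php
// Hypothetical stand-in for the downloaded page.
$response = "<html><head><title>Breaking News</title></head><body>Hello</body></html>";
$pattern  = '|<title>(.+?)</title>|si';

// preg_replace() returns the whole subject with only the match rewritten.
$replaced = preg_replace($pattern, 'TITLE=$1', $response);
echo $replaced, "\n";
// <html><head>TITLE=Breaking News</head><body>Hello</body></html>

// preg_match() returns 1 on a match, 0 on no match, false on error.
$title = null;
if (preg_match($pattern, $response, $matches) === 1) {
    $title = trim($matches[1]);
}
echo $title ?? "(no title found)", "\n";   // Breaking News
```

      Checking the return value before reading $matches[1] avoids an
      undefined-index notice on pages that have no title element.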
