Complex replace in php 4

  • Fabri

    Complex replace in php 4

    I searched and tried to develop (with no luck) a function to do the
    following:


    I have a string that may be:

    "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
    new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"


    What I have to do is replace occurrences of "car" with <a
    href="/...">car</a> BUT NOT in these cases:

    - if there is already a wrapped link
    - if car is part of another word


    Also, I'm using php4 so I can't use str_ireplace for case insensitive
    replace.

    Can you help me?

    Regards.

    --
    Fabri
    Tag Wii: 8680 1598 2246 2466

  • Mike P2

    #2
    Re: Complex replace in php 4

    On Apr 30, 6:13 pm, Fabri <farsi.i.cazzi.pro...@mai.eh> wrote:
    > I searched and tried to develop (with no luck) a function to do the
    > following:
    >
    > I have a string that may be:
    >
    > "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
    > new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"
    >
    > What I have to do is replace occurrences of "car" with <a
    > href="/...">car</a> BUT NOT in these cases:
    >
    > - if there is already a wrapped link
    > - if car is part of another word

    Use regular expressions. preg_replace() uses Perl-compatible regular
    expressions; that's just my preference. You can use PHP's POSIX regex
    functions too (ereg_replace()).

    $string = preg_replace( '#([\s\(])my car([\s\)\.])#i', '$1<a href="...">my car</a>$2', $string );

    The character groups on either side of "my car" allow "my car" to be
    next to newlines, spaces, tabs, parentheses, and periods. You can
    take away the character groups, and $1 and $2 in the second argument, to
    let it be replaced in any context, even in the middle of some text
    with no spaces. If you want to learn more about regular expressions,
    there might be a newsgroup for that too, and of course your favorite
    search engine will be a big help. There are entire books written on
    regex.
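A quick way to see what those character groups accept and reject is to run the same pattern on a couple of sample strings. The sketch below wraps the pattern in a hypothetical helper, `link_my_car()`; the sample sentences and the `...` link target are placeholders. It was written for a current PHP, but preg_replace() and the i modifier (the usual PHP 4 stand-in for str_ireplace()) also exist in PHP 4.

```php
<?php
// Hypothetical helper wrapping the pattern from the post above.
// "my car" must be preceded by whitespace or "(" and followed by
// whitespace, ")" or "."; the i modifier makes the match case-insensitive.
function link_my_car($string)
{
    return preg_replace(
        '#([\s\(])my car([\s\)\.])#i',
        '$1<a href="...">my car</a>$2',
        $string
    );
}

// Both occurrences are wrapped; the replacement text is the literal
// "my car", so "My Car" comes out in lower case.
$plain = link_my_car('I sold (my car) and My Car.');

// Left alone: the character before "my car" is ">", which is in neither group.
$linked = link_my_car('See <a href="...">my car</a> here.');

echo $plain, "\n", $linked, "\n";
```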

    preg_replace():



    • Chung Leong

      #3
      Re: Complex replace in php 4

      On May 1, 12:13 am, Fabri <farsi.i.cazzi.pro...@mai.eh> wrote:
      > I searched and tried to develop (with no luck) a function to do the
      > following:
      >
      > I have a string that may be:
      >
      > "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
      > new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"
      >
      > What I have to do is replace occurrences of "car" with <a
      > href="/...">car</a> BUT NOT in these cases:
      >
      > - if there is already a wrapped link
      > - if car is part of another word
      >
      > Also, I'm using php4 so I can't use str_ireplace for case insensitive
      > replace.
      >
      > Can you help me?
      >
      > Regards.
      >
      > --
      > Fabri
      > Tag Wii: 8680 1598 2246 2466
      > http://www.consolerecords.it/forum/viewtopic.php?t=217

      Ah, you also need to avoid doing the replacement when the word appears
      in an HTML attribute. For example:

      <a href="http://www.google.pl/search?q=car">...</a>

      A simple search and replace, even with a regular expression, isn't
      always going to work. You will need to parse the HTML to some degree.
      Where is this text coming from? HTML that can contain JavaScript will
      require a full-fledged parser.
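Chung's point is easy to reproduce. The sketch below uses a hypothetical `naive_link()` helper (not code from this thread) that applies a plain word-boundary match for "car" to a link whose URL happens to contain that word; the `/...` target is a placeholder.

```php
<?php
// Hypothetical helper: a word-boundary replace that ignores HTML structure.
function naive_link($html)
{
    return preg_replace('#\bcar\b#i', '<a href="/...">$0</a>', $html);
}

$html = '<a href="http://www.google.pl/search?q=car">search</a>';
$out  = naive_link($html);

// "car" inside the href attribute is wrapped too, which breaks the markup:
// <a href="http://www.google.pl/search?q=<a href="/...">car</a>">search</a>
echo $out, "\n";
```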


      • Mike P2

        #4
        Re: Complex replace in php 4

        I assumed that if it were wrapped in an anchor tag there would be no
        whitespace on the inside of the anchor tag. It won't replace the
        following:

        <a href="...">my car</a>

        unless he takes out the character groups

        -Mike PII


        • Jon Slaughter

          #5
          Re: Complex replace in php 4


          "Fabri" <farsi.i.cazzi.propri@mai.eh> wrote in message
          news:46366a73$0$4788$4fafbaef@reader4.news.tin.it...
          > I searched and tried to develop (with no luck) a function to do the
          > following:
          >
          >
          > I have a string that may be:
          >
          > "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a new
          > car. My new car is <em>red</em>! Please don't think to be in Nascar!!"
          >
          >
          > What I have to do is replace occurrences of "car" with <a
          > href="/...">car</a> BUT NOT in these cases:
          >
          > - if there is already a wrapped link
          > - if car is part of another word
          >
          >
          > Also, I'm using php4 so I can't use str_ireplace for case insensitive
          > replace.
          >
          > Can you help me?
          >

          You want a basic recursive parser, but it's probably overly complicated
          for what you need.

          It's better if you can add some structural information to the text that
          will be ignored by HTML. This will help you search for car more
          efficiently.

          You could do something like
          "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a new
          car. My new <car>car</car> is <em>red</em>! Please don't think to be in
          Nascar!!"
          and then just search for <car>car</car> and replace it with the link.

          You could also do something like

          <span class="MyNewCar">car</span>

          and essentially do the same, with the added bonus that you can modify the
          style using CSS.

          i.e., search for <span class="MyNewCar"> and replace it with

          <a href="..."><span class="MyNewCar">car</span></a>
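Because the marker is an exact, known string, the replacement described above needs nothing more than str_replace(), which PHP 4 has. A minimal sketch, assuming the author has already wrapped every occurrence to be linked in `<span class="MyNewCar">car</span>`; the helper name and the `...` target are hypothetical placeholders.

```php
<?php
// Hypothetical helper: wrap each pre-marked occurrence of "car" in a link.
// Only the exact marker string is touched, so unmarked words such as
// "Nascar", or a "car" inside an existing link, are never considered.
function link_marked_cars($html, $href)
{
    $marker = '<span class="MyNewCar">car</span>';
    return str_replace($marker, '<a href="' . $href . '">' . $marker . '</a>', $html);
}

$html = 'My new <span class="MyNewCar">car</span> is red. Nascar is not.';
$out  = link_marked_cars($html, '...');
echo $out, "\n";
```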


          Now, if you can do the processing offline, you want to write a simple
          recursive parser. What you do here is search for all instances of car and
          then search backwards to make sure they are not contained in any <a href>
          tags. The issue here is that, in theory, this could take a very long time.
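The "find every match, then look backwards" idea can be sketched as follows. This is an illustration under stated assumptions, not Jon's code: the helper names are hypothetical, a match counts as already linked when the nearest `<a ...>` before it is more recent than the nearest `</a>`, and a match is also skipped when it sits inside a tag (which covers the attribute case raised earlier in the thread). Each match rescans the text before it, which is where the potential slowness comes from.

```php
<?php
// Hypothetical helper: is byte offset $pos inside a tag or inside <a>...</a>?
function is_inside_link_or_tag($html, $pos)
{
    $before = substr($html, 0, $pos);

    // Inside a tag (e.g. an attribute value) if the last "<" comes after the last ">".
    $lt = strrpos($before, '<');
    $gt = strrpos($before, '>');
    if ($lt !== false && ($gt === false || $lt > $gt)) {
        return true;
    }

    // Inside a link if the last opening <a ...> is more recent than the last </a>.
    $lastOpen  = -1;
    $lastClose = -1;
    if (preg_match_all('#<a\b#i', $before, $m, PREG_OFFSET_CAPTURE)) {
        $lastOpen = $m[0][count($m[0]) - 1][1];
    }
    if (preg_match_all('#</a\s*>#i', $before, $m, PREG_OFFSET_CAPTURE)) {
        $lastClose = $m[0][count($m[0]) - 1][1];
    }
    return $lastOpen > $lastClose;
}

// Hypothetical helper: link every whole-word, case-insensitive match of $word
// that is not already inside a link or a tag.
function link_word_backwards($html, $word, $href)
{
    preg_match_all('#\b' . preg_quote($word, '#') . '\b#i', $html, $m, PREG_OFFSET_CAPTURE);

    // Work from the last match to the first so earlier offsets stay valid.
    for ($i = count($m[0]) - 1; $i >= 0; $i--) {
        $text = $m[0][$i][0];
        $pos  = $m[0][$i][1];
        if (!is_inside_link_or_tag($html, $pos)) {
            $html = substr($html, 0, $pos)
                  . '<a href="' . $href . '">' . $text . '</a>'
                  . substr($html, $pos + strlen($text));
        }
    }
    return $html;
}

$str = 'Le\'ts go to <a href="my.htm">my car</a>. Tomorrow I\'ll have to buy a new car. My new car is <em>red</em>! Please don\'t think to be in Nascar!!';
$out = link_word_backwards($str, 'car', '/...');
echo $out, "\n";
```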

          Since you are making car something special, I would imagine you could just
          add some structural information to it to mark it as special. If you are
          worried about applying the same thing twice, so that you get something like

          <a href="..."><a href="..."><span class="MyNewCar">car</span></a></a>

          then it's pretty easy to check for and prevent that.
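One simple way to make that check, assuming the `<span class="MyNewCar">car</span>` marker described above (the helper name and the `...` href are hypothetical): first strip any link that already wraps the marker directly, then wrap every marker once. Running the function a second time then gives the same result as running it once.

```php
<?php
// Hypothetical helper: link each marked "car" exactly once, even if the
// input has already been processed (no <a><a>...</a></a> nesting).
function link_marked_cars_once($html, $href)
{
    $marker = '<span class="MyNewCar">car</span>';

    // Remove a link that already wraps the marker directly...
    $html = preg_replace(
        '#<a\b[^>]*>\s*(' . preg_quote($marker, '#') . ')\s*</a>#i',
        '$1',
        $html
    );

    // ...then wrap every marker in a fresh link.
    return str_replace($marker, '<a href="' . $href . '">' . $marker . '</a>', $html);
}

$once  = link_marked_cars_once('My new <span class="MyNewCar">car</span>.', '...');
$twice = link_marked_cars_once($once, '...');
echo $once, "\n", $twice, "\n";
```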

          I would suggest you play around with it using simple examples and see what
          you come up with. It's essentially just searching, and I don't think you'll
          need more than that (and I doubt you'll need regular expressions).

          Jon




          • Toby A Inkster

            #6
            Re: Complex replace in php 4

            Fabri wrote:
            > "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
            > new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"

            Try the following... [Credits due: Brad Choate, John Gruber, Matthew
            McGlynn and Alex Rosenberg for the _tokenize() function.]

            <?php

            // how many times do you want to allow the tokenizer to loop?
            // The higher the value, the longer your system could churn
            // given an infinite-loop bug (or really really really long text string).
            define('MAX_TOKENIZER_LOOPS', 2000);

            // print error on tokenizer loop problem?
            define('ADVISE_TOKENIZER_FAILURE', FALSE);

            // keys for $tokens hash
            define('TOKENS_TYPE_TEXT', 'text');
            define('TOKENS_TYPE_TAG', 'tag');

            function _tokenize(&$str, &$tokens) {
            #
            # Parameter: Pointer to string containing HTML markup,
            # pointer to array to store results.
            #
            # Output array contains tokens comprising the input
            # string. Each token is either a tag (possibly with nested,
            # tags contained therein, such as <a href="<MTFoo>"> , or a
            # run of text between tags. Each element of the array is a
            # two-element array; the first is either 'tag' or 'text';
            # the second is the actual value.
            #
            # Based on the _tokenize() subroutine from Brad Choate's MTRegex plugin.
            # <http://www.bradchoate.com/past/mtregex.php>


            $len = strlen($str);

            $depth = 6;
            $nested_tags = str_repeat('(?: <(?:[^<>]|', $depth);
            $nested_tags = substr($nested_tags, 0, -1);
            $nested_tags .= str_repeat(')*> )', $depth);

            // comments | processing instructions | (possibly nested) tags
            $match = "/(?s: <! ( -- .*? -- \s* )+ > ) |
            (?s: <\? .*? \?> ) |
            $nested_tags/x";

            $last_tag_end = -1;
            $loops = $offset = 0;

            //433 PHP 4.3.3 is required for this
            //433 while (preg_match($match, $str, $hits, PREG_OFFSET_CAPTURE, $offset)) {
            while (preg_match($match, substr($str, $offset), $hits, PREG_OFFSET_CAPTURE)) {

            $extracted_tag = $hits[0][0]; // contains the full HTML tag
            //433 $tag_start = (int)$hits[0][1]; // position of captured in string
            $tag_start = $offset + (int)$hits[0][1]; // position of captured in string
            $offset = $tag_start + strlen($extracted_tag); // resume just past this tag on the next iteration

            // if this tag isn't next to the previous one, store the interstitial text
            if ($tag_start > $last_tag_end + 1) {
            $tokens[] = array('type' => TOKENS_TYPE_TEXT,
            'body' => substr($str, $last_tag_end+1, $tag_start-$last_tag_end-1));
            }

            $tokens[] = array('type' => TOKENS_TYPE_TAG,
            'body' => $extracted_tag);

            $last_tag_end = $tag_start + strlen($extracted_tag) - 1;

            if ($loops++ > MAX_TOKENIZER_LOOPS) {

            if (ADVISE_TOKENIZER_FAILURE) {
            print "SmartyPants _tokenize failure.";
            }
            return;
            }
            }


            // if text remains after the close of the last tag, grab it
            if ($last_tag_end + 1 < $len) {
            $tokens[] = array('type' => TOKENS_TYPE_TEXT,
            'body' => substr($str, $last_tag_end + 1));
            }

            return;

            }

            /**
            * Make a particular word in an HTML string into a link.
            *
            * @copyright Copyright (C) 2007 Toby A Inkster
            * @param string $haystack HTML string to search through.
            * @param string $needle Word or phrase to find.
            * @param string $link Link to add to this word. Opt; default Wikipedia.
            * @param boolean $case_sensitive Matching sensitivity. Opt; FALSE.
            */
            function linkity ($haystack, $needle, $link='', $case_sensitive=FALSE)
            {
            if ($link=='')
            $link = 'http://en.wikipedia.org/wiki/'.ucfirst($needle);

            // preg_quote() escapes any regex metacharacters in the needle
            $regexp = '#\b('.preg_quote($needle, '#').')\b#'.($case_sensitive?'':'i');
            $inlink = FALSE;
            $out = '';

            $tokens = array();
            _tokenize($haystack, $tokens);

            foreach ($tokens as $t)
            {
            if ($t['type']==TOKENS_TYPE_TAG)
            {
            // only <a ...> and </a> toggle the state; <abbr>, <area> etc. do not
            if (preg_match('#^<a\b#i', $t['body']))
            $inlink = TRUE;
            elseif (preg_match('#^</a\b#i', $t['body']))
            $inlink = FALSE;
            $out .= $t['body'];
            }
            else
            {
            if ($inlink)
            $out .= $t['body'];
            else
            $out .= preg_replace($regexp,
            "<a href=\"{$link}\">$1</a>",
            $t['body']);
            }
            }
            return $out;
            }

            # Test -- should only link the second and third occurrences of the word 'car'.
            $str = 'Le\'ts go to <a href="my.htm">my car</a>. Tomorrow I\'ll have to buy a new car. My new car is <em>red</em>! Please don\'t think to be in Nascar!!';
            print linkity($str, 'car')."\n";

            ?>
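For comparison, the same tokenize-then-replace idea can be written more compactly with preg_split() and PREG_SPLIT_DELIM_CAPTURE. This is a swapped-in technique, not Toby's code, and it assumes that tags never contain a literal ">" inside an attribute value, so a simple `<[^>]*>` is enough to find them; the nested-tag regex above does not need that assumption. The function name and the `/...` target are hypothetical.

```php
<?php
// Hypothetical helper: link whole-word matches of $word that are not
// inside an existing <a>...</a>, leaving tags (and their attributes) alone.
function link_word_outside_anchors($html, $word, $href)
{
    // With one capturing group, the result alternates text, tag, text, tag...
    $parts  = preg_split('#(<[^>]*>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regexp = '#\b(' . preg_quote($word, '#') . ')\b#i';
    $inLink = false;
    $out    = '';

    foreach ($parts as $i => $part) {
        if ($i % 2 == 1) {                 // odd indexes are captured tags
            if (preg_match('#^<a\b#i', $part)) {
                $inLink = true;
            } elseif (preg_match('#^</a\b#i', $part)) {
                $inLink = false;
            }
            $out .= $part;
        } elseif ($inLink) {
            $out .= $part;                 // text already inside a link
        } else {
            $out .= preg_replace($regexp, '<a href="' . $href . '">$1</a>', $part);
        }
    }
    return $out;
}

$str = 'Le\'ts go to <a href="my.htm">my car</a>. Tomorrow I\'ll have to buy a new car. My new car is <em>red</em>! Please don\'t think to be in Nascar!!';
$out = link_word_outside_anchors($str, 'car', '/...');
echo $out, "\n";
```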


            --
            Toby A Inkster BSc (Hons) ARCS

            Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux


            • Toby A Inkster

              #7
              Re: Complex replace in php 4

              Mike P2 wrote:
              > I assumed that if it were wrapped in an anchor tag there would be no
              > whitespace on the inside of the anchor tag. It won't replace the
              > following:
              >
              > <a href="...">my car</a>
              >
              > unless he takes out the character groups

              However, yours will replace:

              <a href="...">my car is very fuel-efficient</a>

              --
              Toby A Inkster BSc (Hons) ARCS

              Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux


              • Mike P2

                #8
                Re: Complex replace in php 4

                On May 1, 6:54 am, Toby A Inkster <usenet200...@tobyinkster.co.uk>
                wrote:
                > Mike P2 wrote:
                >> I assumed that if it were wrapped in an anchor tag there would be no
                >> whitespace on the inside of the anchor tag. It won't replace the
                >> following:
                >>
                >> <a href="...">my car</a>
                >>
                >> unless he takes out the character groups
                >
                > However, yours will replace:
                >
                > <a href="...">my car is very fuel-efficient</a>

                Actually, it will not. '>' is not an accepted character in either
                character group.

                BTW, as I mentioned before, my idea assumes there will be no
                whitespace on the ends of the content of the link if there is one
                already. That can be fixed like this:

                <?php
                $search = 'my car';
                $link = '...';
                $string = 'my car is very fuel-efficient';

                // str_ireplace() requires PHP 5; on PHP 4 use preg_replace() with the i modifier
                $string = str_ireplace( $search, " $search ", $string );
                $string = preg_replace( '#(<a[^>]+>)(\s+)#i', '$2$1', " $string" );
                $string = preg_replace( '#(\s+)</a>#i', '</a>$1', $string );
                $string = preg_replace( "#([\s\(])$search([\s\)\.])#i", "$1<a href='$link'>$search</a>$2", $string );

                echo $string;
                ?>

                My idea is to add those two extra preg_replace()s before the main
                one; they move whitespace at the edges of a link's text from inside
                to outside of the anchor tags. The middle preg_replace() may be
                optional, since the last one will not work if the words are butted
                up against the opening tag anyway. Finally, I added that
                str_ireplace() so it can even replace the keywords when they are next
                to or inside some other tag. If you think this is too slow, consider
                dropping case insensitivity or the middle preg_replace() (or maybe
                the first one).

                -Mike PII


                • Mike P2

                  #9
                  Re: Complex replace in php 4

                  Oh yeah, and with the example I just posted, if you are going to
                  replace multiple keywords, you only need to run the first two
                  preg_replace()s once (or just one of them, if you only use one). If
                  you plan to make a function out of this, take out the first two
                  preg_replace()s and run them once separately before calling the
                  function.
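Split along the lines Mike describes, the code might look like the sketch below. The function names are hypothetical, the keyword is passed through preg_quote(), and the str_ireplace() padding step is swapped for preg_replace() with the i modifier, since str_ireplace() only arrived in PHP 5 and the original poster is on PHP 4 ($0 also keeps the keyword's original case). Like Mike's version, it leaves extra spaces around each keyword.

```php
<?php
// Hypothetical: run once per document. Moves whitespace at the edges of a
// link's text from inside the <a>...</a> to outside it.
function move_link_whitespace_out($string)
{
    $string = preg_replace('#(<a[^>]+>)(\s+)#i', '$2$1', " $string");
    return preg_replace('#(\s+)</a>#i', '</a>$1', $string);
}

// Hypothetical: run once per keyword.
function link_keyword($string, $search, $link)
{
    $quoted = preg_quote($search, '#');

    // Pad the keyword with spaces (PHP 4 stand-in for str_ireplace()).
    $string = preg_replace('#' . $quoted . '#i', ' $0 ', $string);

    return preg_replace(
        '#([\s\(])(' . $quoted . ')([\s\)\.])#i',
        '$1<a href="' . $link . '">$2</a>$3',
        $string
    );
}

$text = move_link_whitespace_out('I like my car and my bike.');
foreach (array('my car' => '/car', 'my bike' => '/bike') as $search => $link) {
    $text = link_keyword($text, $search, $link);
}
echo $text, "\n";
```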

                  -Mike PII


                  • gosha bine

                    #10
                    Re: Complex replace in php 4

                    On 01.05.2007 00:13 Fabri wrote:
                    > I searched and tried to develop (with no luck) a function to do the
                    > following:
                    >
                    >
                    > I have a string that may be:
                    >
                    > "Le'ts go to <a href="my.htm">my car</a>. Tomorrow I'll have to buy a
                    > new car. My new car is <em>red</em>! Please don't think to be in Nascar!!"
                    >
                    >
                    > What I have to do is replace occurrences of "car" with <a
                    > href="/...">car</a> BUT NOT in these cases:
                    >
                    > - if there is already a wrapped link
                    > - if car is part of another word
                    >
                    >
                    > Also, I'm using php4 so I can't use str_ireplace for case insensitive
                    > replace.
                    >
                    > Can you help me?
                    >
                    > Regards.
                    >

                    Well, over 30 hours and still no correct answer... weird ;)

                    How about this:


                    $text = <<<EE
                    "Le'ts go to <a href="my.htm">my car</a>.
                    Tomorrow I'll have to buy a
                    new car. My new car is <em>red</em>!
                    Please don't think to be in Nascar!!"
                    EE;

                    echo preg_replace(
                    '~\bcar\b(?![^<>]*</a>)~i',
                    "<a href='zzz'>$0</a>",
                    $text);

                    If you need comments, feel free to ask.
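Since comments were offered, here is one reading of the pattern, rewritten with the x (extended) modifier so each part can carry a comment; the commented form should behave the same as the one-line original. The negative lookahead skips a "car" whose following text reaches a `</a>` without passing any other tag, i.e. the word is the trailing text of an existing link. As noted earlier in the thread, a word inside an attribute value (such as `href="...?q=car"`) is still replaced.

```php
<?php
// gosha's pattern with the x modifier: whitespace is ignored and "#" starts
// a comment, so the parts can be annotated.
$pattern = '~
    \b car \b      # the whole word "car" (so "Nascar" does not match)
    (?!            # ...unless it is followed by
        [^<>]*     #    any text that contains no tag at all
        </a>       #    and then the closing tag of a link
    )
~ix';

$text = 'Le\'ts go to <a href="my.htm">my car</a>. Tomorrow I\'ll have to buy a new car. My new car is <em>red</em>! Please don\'t think to be in Nascar!!';

$out = preg_replace($pattern, '<a href="zzz">$0</a>', $text);
echo $out, "\n";

// Limitation: a "car" inside an attribute value is still wrapped.
$attr = preg_replace($pattern, '<a href="zzz">$0</a>', '<a href="http://www.google.pl/search?q=car">search</a>');
```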



                    --
                    gosha bine

                    extended php parser ~ http://code.google.com/p/pihipi
                    blok ~ http://www.tagarga.com/blok


                    • Toby A Inkster

                      #11
                      Re: Complex replace in php 4

                      Mike P2 wrote:
                      > Toby A Inkster wrote:
                      >> However, yours will replace:
                      >> <a href="...">my car is very fuel-efficient</a>
                      >
                      > Actually, it will not. '>' is not an accepted character in either
                      > character group.

                      Sorry -- hadn't noticed that you'd made "my car" the link target instead
                      of "car", which was what the OP had requested. OK then, yours will screw
                      up when it sees this as input:

                      <a href="...">and my car is very fuel-efficient</a>
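Toby's counterexample can be checked directly. Below, Mike's pattern from earlier in the thread is applied to that input and, for contrast, so is gosha's negative-lookahead pattern adapted by substituting "my car" for "car"; the `...` link target is a placeholder.

```php
<?php
$input = '<a href="...">and my car is very fuel-efficient</a>';

// Mike's pattern: " my car " has whitespace on both sides, so it matches
// even though it is already inside a link, producing a nested <a>.
$mike = preg_replace('#([\s\(])my car([\s\)\.])#i', '$1<a href="...">my car</a>$2', $input);

// gosha's lookahead adapted to "my car": the text after the match reaches
// </a> without crossing another tag, so the match is skipped.
$gosha = preg_replace('~\bmy car\b(?![^<>]*</a>)~i', '<a href="...">$0</a>', $input);

echo $mike, "\n", $gosha, "\n";
```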


                      --
                      Toby A Inkster BSc (Hons) ARCS

                      Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux


                      • Toby A Inkster

                        #12
                        Re: Complex replace in php 4

                        gosha bine wrote:
                        > Well, over 30 hours and still no correct answer... weird ;)

                        8-O (That's an "emoticon", not the Brazil/Andorra football results.)




                        --
                        Toby A Inkster BSc (Hons) ARCS

                        Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux

