Regex Nested Backreferences

  • Allen

    Regex Nested Backreferences

    For my web-based PHP regex find/replace doohickey, I need to match
    individual back references and wrap a tag around each one so it stands
    apart from the rest of the match for individual color markup. Initially
    this would seem easy enough, but not all of a potential regex match is
    going to be within a back reference. So it's necessary to replace the
    back reference, and only the back reference, while preserving the context
    of the match. For example, if I were to search the text

    fish this fish fish

    looking for
    .*?(?<=this )(fish).*

    I'd match everything, capturing the second instance of fish into the back
    reference. I can't simply take the match and run a replace for fish in
    order to apply the highlighting, because then I'd end up with 3
    highlighted "fish", 2 of which weren't supposed to be. I also couldn't
    simply return the back reference with the markup, as that wouldn't return
    the text outside the back reference.
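
    As a small illustration of the example above (a sketch only, assuming
    PHP's preg_match with the PREG_OFFSET_CAPTURE flag), the match reports
    both the captured text and where it starts in the subject:

```php
<?php
// Subject and pattern taken from the example in the post.
$text = 'fish this fish fish';
$pattern = '/.*?(?<=this )(fish).*/';

// PREG_OFFSET_CAPTURE turns every entry into [matched text, byte offset].
preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE);

list($group, $offset) = $m[1];
echo "whole match starts at offset {$m[0][1]}\n"; // whole match starts at offset 0
echo "group 1 = '$group' at offset $offset\n";    // group 1 = 'fish' at offset 10
```

    Only the second "fish" (offset 10) is inside the group, which is why a
    plain search-and-replace for "fish" over the match would highlight too much.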

    My initial solution was to run the original find pattern over the match to
    get the back references, using an extra flag to have it return the offset
    of each back reference. So now I have the location of the text within the
    string, and can get its length from the captured text itself. Working
    backwards, so as not to disturb the offsets of anything earlier in the
    string, it wraps the back references without losing context or data.
    Perfect.
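
    The back-to-front idea can be sketched roughly as follows. The helper name
    wrap_flat_groups is hypothetical, and the sketch assumes the groups do not
    nest or overlap and appear left to right in the subject:

```php
<?php
// Hypothetical helper: wraps each capture group of the first match in the
// "backN:: ... ::bk" pseudo-markup. Assumes groups are flat (not nested)
// and that higher-numbered groups sit further right in the subject.
function wrap_flat_groups(string $pattern, string $text): string
{
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return $text; // nothing matched, nothing to mark
    }
    // Walk from the last group to the first so earlier offsets stay valid.
    for ($n = count($m) - 1; $n > 0; $n--) {
        list($str, $pos) = $m[$n];
        if ($pos < 0) {
            continue; // group did not take part in the match
        }
        $marked = 'back' . $n . '::' . $str . '::bk';
        $text = substr_replace($text, $marked, $pos, strlen($str));
    }
    return $text;
}

echo wrap_flat_groups('/.*?(?<=this )(fish).*/', 'fish this fish fish'), "\n";
// fish this back1::fish::bk fish
```

    Because each replacement only touches text to the right of the groups still
    waiting to be processed, none of the stored offsets needs adjusting.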

    . . . until back references are nested.

    In this example:
    (.*?(?<=this )(fish).*)

    back reference 1 would be fish this fish fish, back reference 2 would be
    fish -- here's where the problem surfaces.

    If I wrap back reference 2 in the markup, then when I apply back reference
    1's markup the end tag lands in the wrong place, since the string has
    grown and the length originally calculated no longer applies. If I
    replace back reference 1 first, same problem: back reference 2's offset
    has shifted. I'm sure there's some obvious, simple solution I'm
    overlooking, having exhausted a bunch of complex attempts to compensate
    for it. Any fresh perspectives on the best way to mark up nested groups
    while preserving the integrity of the return?
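
    One possible way around the shifting offsets, offered as a sketch and not
    as the approach the tool itself ended up using: treat every opening and
    closing marker as a separate insertion point, sort those points, and
    rebuild the string from slices of the untouched original. The helper name
    wrap_nested_groups is hypothetical, and the sketch assumes group spans are
    either disjoint or properly nested:

```php
<?php
// Hypothetical helper: marks every capture group of the first match,
// nested ones included, with the "backN:: ... ::bk" pseudo-markup.
function wrap_nested_groups(string $pattern, string $text): string
{
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return $text;
    }

    // Each event is [position, kind, group length, group number, marker].
    // kind 0 = closing marker, kind 1 = opening marker.
    $events = [];
    for ($n = 1; $n < count($m); $n++) {
        list($str, $pos) = $m[$n];
        $len = strlen($str);
        if ($pos < 0 || $len === 0) {
            continue; // skip unmatched and empty groups
        }
        $events[] = [$pos, 1, $len, $n, 'back' . $n . '::'];
        $events[] = [$pos + $len, 0, $len, $n, '::bk'];
    }

    usort($events, function (array $a, array $b): int {
        if ($a[0] !== $b[0]) return $a[0] <=> $b[0]; // left to right
        if ($a[1] !== $b[1]) return $a[1] <=> $b[1]; // closers before openers
        if ($a[2] !== $b[2]) {
            // openers: longer (outer) group first; closers: shorter (inner) first
            return $a[1] === 1 ? $b[2] <=> $a[2] : $a[2] <=> $b[2];
        }
        // identical spans: the lower group number is treated as the outer one
        return $a[1] === 1 ? $a[3] <=> $b[3] : $b[3] <=> $a[3];
    });

    // Copy slices of the original between insertion points, so no offset
    // ever has to be corrected for markers that were added earlier.
    $out = '';
    $cursor = 0;
    foreach ($events as $e) {
        $out .= substr($text, $cursor, $e[0] - $cursor) . $e[4];
        $cursor = $e[0];
    }
    return $out . substr($text, $cursor);
}

echo wrap_nested_groups('/(.*?(?<=this )(fish).*)/', 'fish this fish fish'), "\n";
// back1::fish this back2::fish::bk fish::bk
```

    Since every offset is read against the original string and the output is
    built in a single left-to-right pass, the order in which groups are
    handled stops mattering.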

    Below is the function the matches are passed through. You'll see I'm
    using preg_match_all to get the capture groups as well as the match
    location, and then using substr_replace to insert the pseudo-markup.

    function hltr($text, $find) {
        preg_match_all($find, $text, $hlight, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
        if ( isset($_POST['debug']) || isset($_GET['debug']) ) {
            echo "<pre>";
            print_r($hlight);
            echo "</pre>";
        }
        $n = count($hlight[0]) - 1;
        $text = $hlight[0][0][0];
        while ( $n > 0 ) {
            $text = substr_replace($text, "back$n::" . $hlight[0][$n][0] . "::bk", $hlight[0][$n][1], strlen($hlight[0][$n][0]));
            $n--;
        }
        return('<strong class="result">' . $text . '</strong>');
    }

    To see it highlight backreferences correctly:

    And failing on nested groups


    Thanks . . .

    Allen
  • bobzimuta

    #2
    Re: Regex Nested Backreferences




    • Allen

      #3
      Re: Regex Nested Backreferences

      On Mon, 06 Feb 2006 20:20:58 -0500, bobzimuta <ejmalone@gmail.com> wrote:
      > http://roblocher.com/technotes/regexp.aspx

      I don't believe you read my message, Bob -- I'm not asking for help with
      regex, I know regex. My problem is that I'm trying to take a regex and
      highlight various aspects of the syntax, in this case the different sub
      groups. Had you read the post, you'd have seen from the links that what
      I'm working on can already do everything the page you linked to does,
      and more. Thanks anyway.

      Allen


      • bobzimuta

        #4
        Re: Regex Nested Backreferences

        I skimmed. I saw you wanted to do some highlighting of regex matches.
        This guy (Rob Locher) wrote a nice regex highlighter. Thought you could
        possibly get something useful out of it (i.e. analyze his algorithm).
        You're welcome anyway.


        • Allen

          #5
          Re: Regex Nested Backreferences

          On Tue, 07 Feb 2006 19:50:12 -0500, bobzimuta <ejmalone@gmail.com> wrote:
          > I skimmed. I saw you wanted to do some highlighting of regex matches.
          > This guy (Rob Locher) wrote a nice regex highlighter. Thought you could
          > possibly get something useful out of it (i.e. analyze his algorithm).
          > You're welcome anyway.

          I'd have appreciated that explanation -- at any rate, I'm sorry for my
          curt response. I'd spent too many hours with code to be any good with
          people. I did put together a solution; the working model is linked
          below. I might have to check out his source to see if there's anything I
          can glean from it anyway. Thanks.

          A.

          --

          A Web based regular expressions powered find/replace utility

