Grab Text Between Tags

  • mgriggs13
    New Member
    • Apr 2007
    • 2

    Grab Text Between Tags

    I have researched the gist of making this work, but I am having a unique problem that I can't seem to fix.

    I use the following code to grab the text between these two tags.
    Code:
    $content = "[tag]Hello[/tag]";
    preg_match_all("/(\[([\w]+)\])(.*)(\[\/\\2\])/", $content, $matches);
    print_r($matches);
    This yields the following results successfully:
    Code:
    Array
    (
        [0] => Array
            (
                [0] => [tag]Hello[/tag]
            )
        [1] => Array
            (
                [0] => [tag]
            )
        [2] => Array
            (
                [0] => tag
            )
        [3] => Array
            (
                [0] => Hello
            )
        [4] => Array
            (
                [0] => [/tag]
            )
    )
    The problem arises when there are duplicate tags.
    Code:
    $content = "[tag]Hello[/tag] More Text [tag]Hello2[/tag]";
    preg_match_all("/(\[([\w]+)\])(.*)(\[\/\\2\])/", $content, $matches);
    print_r($matches);
    With this I end up with the following:
    Code:
    Array
    (
        [0] => Array
            (
                [0] => [tag]Hello[/tag] More Text [tag]Hello2[/tag]
            )
        [1] => Array
            (
                [0] => [tag]
            )
        [2] => Array
            (
                [0] => tag
            )
        [3] => Array
            (
                [0] => Hello[/tag] More Text [tag]Hello2
            )
        [4] => Array
            (
                [0] => [/tag]
            )
    )
    I need it to stop at the first close tag and then register the second open and close tag in a different array. Any ideas?
  • bucabay
    New Member
    • Apr 2007
    • 18

    #2
    The .* quantifier in your regular expression is greedy: it matches as many characters as it can, so it runs on to the last closing tag in the string.

    To make a quantifier match as few characters as possible, add the "un-greedy" indicator, "?", right after it.

    For example:
    Code:
    preg_match_all("/(\[([\w]+)\])(.*?)(\[\/\\2\])/", $content, $matches);

    Notice that (.*?) now matches the characters between the tags, instead of the previous (.*).

    (.*?) will match until it reaches the first (\[\/\\2\])

    Before, you had:

    (.*) will match until it reaches the last (\[\/\\2\])
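
    Here is a rough, self-contained sketch of the un-greedy version run against the two-tag string from the question. The PREG_SET_ORDER flag is an addition that was not in the original code; it is one way to get each open/close pair in its own array, which seems to be what was asked for. The pattern is written in single quotes (so \2 needs no doubling), and the loop variable $m is just an illustrative name.

```php
<?php
// Sample input from the question: two separate [tag]...[/tag] pairs.
$content = "[tag]Hello[/tag] More Text [tag]Hello2[/tag]";

// Same groups as the original pattern, but with the lazy quantifier (.*?)
// so the match stops at the first closing tag whose name equals group 2.
$pattern = '/(\[(\w+)\])(.*?)(\[\/\2\])/';

// PREG_SET_ORDER (an assumption about the desired layout) returns one
// array per match instead of one array per capture group.
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] = whole match, $m[1] = open tag, $m[2] = tag name,
    // $m[3] = text between the tags, $m[4] = close tag
    echo $m[2], ': ', $m[3], "\n";
}
// prints:
// tag: Hello
// tag: Hello2
```

    One thing to keep in mind: "." does not match newlines by default, so if the text between the tags can span several lines you would probably want to add the "s" modifier after the closing delimiter.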


    • mgriggs13
      New Member
      • Apr 2007
      • 2

      #3

      Rock on! Thanks a lot. I thought I had tried that but I guess working at 4 am with 0 sleep can screw with your head. Thanks again.
