preg_replace help: removing target="_blank" from links

  • Sugapablo

    preg_replace help: removing target="_blank" from links

    I admit, I'm terrible at creating regexes.

    I'm trying to create a preg_replace that would remove the target
    attribute from an <a href> tag, regardless of what the value might be.

    Any ideas?


    --
    [ Sugapablo ]
    [ http://www.sugapablo.net <--personal | http://www.sugapablo.com <--music ]
    [ http://www.2ra.org <--political | http://www.subuse.net <--discuss ]

  • Joe Estock

    #2
    Re: preg_replace help: removing target="_blank" from links

    Sugapablo wrote:
    > I admit, I'm terrible at creating regexes.
    >
    > I'm trying to create a preg_replace that would remove the target
    > attribute from an <a href> tag, regardless of what the value might be.
    >
    > Any ideas?

    $foo = '<a href="http://www.foo.com" target="_blank">www.foo.com</a>';
    $foo = preg_replace('/(<a.*?)[ ]?target="_blank"(.*?)/', '$1$2', $foo);

    That should work.

    Joe Estock
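
    The pattern above only matches the literal value "_blank". For the
    original question (any target value, any quoting), one possible sketch
    follows. It is not from the thread: the helper name strip_target is made
    up for illustration, and the code assumes PHP 7 or later for the type
    declarations. It first isolates each opening <a ...> tag and only then
    removes target=..., so text outside the tag is left alone.

```php
<?php
// Hypothetical helper (not from the thread): strip the target attribute
// from every opening <a ...> tag, whatever its value or quoting style.
function strip_target(string $html): string
{
    return preg_replace_callback(
        '/<a\b[^>]*>/i',                       // one opening anchor tag
        function (array $m): string {
            // Inside that tag only, drop target="...", target='...' or target=bare
            return preg_replace(
                '/\s+target\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',
                '',
                $m[0]
            );
        },
        $html
    );
}

$foo = '<a href="http://www.foo.com" target="_blank">www.foo.com</a>';
echo strip_target($foo), "\n";
```

    Like every regex approach in this thread, it can still be fooled by a
    > character inside an attribute value.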


    • Sugapablo

      #3
      Re: preg_replace help: removing target="_blank" from links

      On Tue, 12 Apr 2005 13:41:09 +0000, Joe Estock wrote:

      > $foo = '<a href="http://www.foo.com" target="_blank">www.foo.com</a>';
      > $foo = preg_replace('/(<a.*?)[ ]?target="_blank"(.*?)/', '$1$2', $foo);
      >
      > That should work.

      Actually, now that I think about it, what about a preg_replace that would
      remove any attribute except for the href attribute?

      --
      [ Sugapablo ]
      [ http://www.sugapablo.net <--personal | http://www.sugapablo.com <--music ]
      [ http://www.2ra.org <--political | http://www.subuse.net <--discuss ]


      • Joe Estock

        #4
        Re: preg_replace help: removing target="_blank" from links

        Sugapablo wrote:
        > On Tue, 12 Apr 2005 13:41:09 +0000, Joe Estock wrote:
        >
        >> $foo = '<a href="http://www.foo.com" target="_blank">www.foo.com</a>';
        >> $foo = preg_replace('/(<a.*?)[ ]?target="_blank"(.*?)/', '$1$2', $foo);
        >>
        >> That should work.
        >
        > Actually, now that I think about it, what about a preg_replace that would
        > remove any attribute except for the href attribute?

        preg_replace('/(<a href=).*?(.*?)>(.*?)/', '$1$2$3', $text); should do
        it for you.

        Joe Estock


        • Sugapablo

          #5
          Re: preg_replace help: removing target="_blank" from links

          On Tue, 12 Apr 2005 14:11:12 +0000, Joe Estock wrote:

          > preg_replace('/(<a href=).*?(.*?)>(.*?)/', '$1$2$3', $text); should do
          > it for you.

          No, sorry.

          This:
          <?php

          $text = "<a href=\"test.html\" target=\"_blank\">test</a>";

          echo preg_replace('/(<a href=).*?(.*?)>(.*?)/', '$1$2$3', $text);

          ?>

          produced this:
          <a href="test.html" target="_blank" test></a>

          --
          [ Sugapablo ]
          [ http://www.sugapablo.net <--personal | http://www.sugapablo.com <--music ]
          [ http://www.2ra.org <--political | http://www.subuse.net <--discuss ]


          • Justin Koivisto

            #6
            Re: preg_replace help: removing target="_blank" from links

            Sugapablo wrote:

            > On Tue, 12 Apr 2005 13:41:09 +0000, Joe Estock wrote:
            >
            >> $foo = '<a href="http://www.foo.com" target="_blank">www.foo.com</a>';
            >> $foo = preg_replace('/(<a.*?)[ ]?target="_blank"(.*?)/', '$1$2', $foo);
            >>
            >> That should work.
            >
            > Actually, now that I think about it, what about a preg_replace that would
            > remove any attribute except for the href attribute?

            Try something like this (untested):

            $pattern='`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU';
            $string=preg_replace($pattern, "<a $1>$3</a>",$string);

            --
            Justin Koivisto - justin@koivi.com


            • Joe Estock

              #7
              Re: preg_replace help: removing target="_blank" from links

              Sugapablo wrote:
              > On Tue, 12 Apr 2005 14:11:12 +0000, Joe Estock wrote:
              >
              >> preg_replace('/(<a href=).*?(.*?)>(.*?)/', '$1$2$3', $text); should do
              >> it for you.
              >
              > No, sorry.
              >
              > This:
              > <?php
              >
              > $text = "<a href=\"test.html\" target=\"_blank\">test</a>";
              >
              > echo preg_replace('/(<a href=).*?(.*?)>(.*?)/', '$1$2$3', $text);
              >
              > ?>
              >
              > produced this:
              > <a href="test.html" target="_blank" test></a>

              Sorry, that should have been preg_replace('/(<a href=".*?").*?(>.*?)/',
              '$1$2', $text);


              • Steven Vasilogianis

                #8
                Re: preg_replace help: removing target="_blank" from links

                Joe Estock <jestock@NOSPAMnutextonline.com> writes:

                [remove all attributes except href from anchor tag]

                > Sorry, that should have been preg_replace('/(<a
                > href=".*?").*?(>.*?)/', '$1$2', $text);

                This, along with every other solution thus far posted, relies on href
                being the first attribute. Parsing HTML with regexes is extremely
                painful; the job is much better suited to an HTML parser.
                Unfortunately, I've never used an HTML parser in PHP (AFAIK, one was
                not available until about a year ago, which is actually kind of ironic).

                See <URL:http://us3.php.net/tidy>. Good luck though, there doesn't
                seem to be much documentation. There's a tutorial here:
                <URL:http://www.zend.com/php5/articles/php5-tidy.php>.
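
                As a rough sketch of the parser route: the example below uses
                PHP's bundled DOM extension (DOMDocument) rather than the tidy
                extension linked above, so it is a swapped-in technique and not
                what the tutorial describes. The function name keep_only_href and
                the <div> wrapper trick are illustrative assumptions, and the code
                assumes PHP 7 or later. Note that loadHTML treats input as
                ISO-8859-1 unless told otherwise, so non-ASCII text would need
                extra care.

```php
<?php
// Hypothetical helper: keep href on every <a>, drop all other attributes.
function keep_only_href(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it can be serialized back on its own.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        // Walk backwards because the attribute map shrinks as we remove.
        for ($i = $a->attributes->length - 1; $i >= 0; $i--) {
            $name = $a->attributes->item($i)->nodeName;
            if (strtolower($name) !== 'href') {
                $a->removeAttribute($name);
            }
        }
    }

    // Serialize only the children of our wrapper, not the implied <html><body>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$text = '<a target="_blank" href="test.html" class="x">test</a>';
echo keep_only_href($text), "\n";
```

                Because a real parser reads the tag, the position of href among
                the attributes does not matter here.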


                • Justin Koivisto

                  #9
                  Re: preg_replace help: removing target="_blank" from links

                  Steven Vasilogianis wrote:

                  > This, along with every other solution thus far posted, relies on href
                  > being the first attribute. Parsing HTML with regexes is extremely
                  > painful; the job is much better suited to an HTML parser.
                  > Unfortunately, I've never used an HTML parser in PHP (AFAIK, one was
                  > not available until about a year ago, which is actually kind of ironic).

                  Um... did you even look at the solution I posted?

                  '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU'
                         ^^^^^

                  It only requires that href is there, and it is not case-sensitive, AND
                  the link doesn't need to be completely on one line...

                  --
                  Justin Koivisto - justin@koivi.com


                  • Steven Vasilogianis

                    #10
                    Re: preg_replace help: removing target="_blank" from links

                    Justin Koivisto <justin@koivi.com> writes:

                    > Steven Vasilogianis wrote:
                    >
                    >> This, along with every other solution thus far posted, relies on href
                    >> being the first attribute. [...]
                    >
                    > Um... did you even look at the solution I posted?

                    Not closely enough, apparently :-(.

                    > '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU'
                    >        ^^^^^
                    >
                    > It only requires that href is there, and it is not case-sensitive, AND
                    > the link doesn't need to be completely on one line...

                    There are still situations where your regex will break[1] (admittedly,
                    many of them are awfully contrived). I still maintain that regular
                    expressions are not very well suited for parsing HTML.

                    I have to admit though, as far as regexes for parsing HTML can go,
                    that's a pretty good one.

                    [1] Various combinations of < and/or > characters in attribute
                    values and/or in the anchor text.
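
                    To make the footnote concrete, here is a small sketch that
                    runs the pattern from post #6 against one well-behaved link
                    and two of the problem cases. The helper name
                    strip_with_regex and the sample inputs are invented for
                    illustration, and the code assumes PHP 7 or later. In the
                    two problem cases the pattern fails to match at all, so
                    preg_replace hands back the input unchanged and the target
                    attribute survives.

```php
<?php
// Pattern copied from post #6 of the thread.
$pattern = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU';

// Hypothetical wrapper so the three cases read the same way.
function strip_with_regex(string $pattern, string $html): string
{
    return preg_replace($pattern, '<a $1>$3</a>', $html);
}

$cases = [
    // href is not first, and the pattern still copes
    'plain'      => '<a class="x" href="test.html" target="_blank">test</a>',
    // a > inside an earlier attribute value stops [^>]* before it reaches href
    'gt in attr' => '<a title="a > b" href="test.html" target="_blank">test</a>',
    // markup in the anchor text defeats ([^<]*)</a>
    'nested tag' => '<a href="test.html" target="_blank"><b>test</b></a>',
];

foreach ($cases as $label => $html) {
    echo $label, ': ', strip_with_regex($pattern, $html), "\n";
}
```

                    The nested-tag case is arguably not contrived at all, since
                    links wrapping an image or bold text are common.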
