regex question - can't get unclosed quote marks inside of HTML tags

  • lawrence k

    I do not know much about regex.

    I'm worried about lines like this:

    <a href="myFile>my file</a>

    There is only one quote mark in that html.

    I wanted to fix this problem, so I tried this:


    function command($string = false) {
        $pattern = '/(.*)<a (.*)"(.*)>/i';
        $replacement = '$1<$2"$3">';
        $newString = preg_replace($pattern, $replacement, $string);
        return $newString;
    }


    This finds no matches, even when I feed it the line above as a test.

    What have I done wrong?

  • dennis.alund@gmail.com

    #2

    This isn't really a solution to your problem, just a hint about what's
    wrong: a quote is also a character, so the (.*) in your expression will
    also match a quote.
    What you want to look into is negative lookarounds: (?<!x)y matches a
    'y' not preceded by an 'x', and x(?!y) matches an 'x' not followed by a
    'y'.
    But if you're not a regexp ninja, I'd recommend finding an easier
    solution... negative lookarounds aren't trivial.
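
A minimal sketch of the two points in this reply, assuming PHP's PCRE functions and the test line from the first post. Tracing the posted pattern by hand, it does seem to match that line, but each greedy (.*) swallows quotes and angle brackets alike, so the third group runs past the end of the tag. The lookaround strings ('xy ay y' and 'xy xa x') are made up for illustration.

```php
<?php
$line = '<a href="myFile>my file</a>';

// Pattern from the first post: every (.*) is greedy and accepts any
// character, including " and >.
preg_match('/(.*)<a (.*)"(.*)>/i', $line, $m);
// $m[2] is 'href=' and $m[3] is 'myFile>my file</a'
echo $m[2], "\n", $m[3], "\n";

// Negative lookbehind: a 'y' that is not preceded by an 'x'.
$notAfterX = preg_match_all('/(?<!x)y/', 'xy ay y');

// Negative lookahead: an 'x' that is not followed by a 'y'.
$notBeforeY = preg_match_all('/x(?!y)/', 'xy xa x');

// Each call finds two matches in its sample string.
echo $notAfterX, " ", $notBeforeY, "\n";
```

The lookarounds only assert what is next to a position; they do not consume characters, which is part of why they take some getting used to.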

    • lawrence k

      #3


      dennis.alund@gmail.com wrote:
      > This isn't really a solution to your problem, just a hint about what's
      > wrong: a quote is also a character, so the (.*) in your expression will
      > also match a quote.
      > What you want to look into is negative lookarounds: (?<!x)y matches a
      > 'y' not preceded by an 'x', and x(?!y) matches an 'x' not followed by a
      > 'y'.
      > But if you're not a regexp ninja, I'd recommend finding an easier
      > solution... negative lookarounds aren't trivial.

      I see. So if I have a string like this:

      <a href="myfile">myfile</a>

      and I feed it to this function:

      function command($string = false) {
          $pattern = '/(.*)<a (.*)"(.*)>/i';
          $replacement = '$1<$2"$3">';
          $newString = preg_replace($pattern, $replacement, $string);
          return $newString;
      }

      I will get a match? But does that mean I'll end up with this:

      <a href="myfile"">myfile</a>

      with an extra quote mark? That is not the problem I was having.

      But you are saying that I need to replace the final (.*) with something
      that says "everything but a quote mark"?
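
"Everything but a quote mark" is usually written as a negated character class. The sketch below is only an illustration, assuming PHP's preg_match and two made-up sample strings; it compares a greedy (.*) after the quote with a class that excludes both the quote and the closing angle bracket.

```php
<?php
$broken = '<a href="myfile>myfile</a>';   // quote never closed
$valid  = '<a href="myfile">myfile</a>';  // well-formed

// Greedy (.*) after the quote: runs on to the last > in the string.
preg_match('/<a [^"]*"(.*)>/', $broken, $greedy);
// $greedy[1] is 'myfile>myfile</a'

// [^">]+ stops at the first quote or > it meets.
preg_match('/<a [^"]*"([^">]+)>/', $broken, $strict);
// $strict[1] is 'myfile'

// On the well-formed tag the stricter pattern finds nothing to fix.
$matchesValid = preg_match('/<a [^"]*"([^">]+)>/', $valid);
// $matchesValid is 0

echo $greedy[1], "\n", $strict[1], "\n", $matchesValid, "\n";
```

Excluding > as well as the quote is what keeps the match inside a single tag.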

      • lawrence k

        #4

        So I want this?

        function command($string = false) {
            $pattern = '/(.*)<a (.*)"[^"]+>/i';
            $replacement = '$1<a $2"$3">';
            $newString = preg_replace($pattern, $replacement, $string);
            return $newString;
        }

        • lawrence k

          #5

          Thanks for the reply. I just created this file and uploaded it to my
          server for testing:

          <?php

          echo "hey";
          flush();

          function command($string = false) {
              echo "hey";
              $pattern = '/(.*)<a (.*)"[^"]+>/i';
              $replacement = '$1<a $2"$3">';
              // $newString = preg_replace($pattern, $replacement, $string);
              return $newString;
          }

          $string = "<p><a href=\"myfile>myfile</a> ";
          echo $string;
          flush();
          $string = command($string);
          echo $string;

          ?>


          If I uncomment the preg_replace line, I get one "hey" echoed to the
          screen and then the script apparently dies without any error message.
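
When a script stops without a message, one thing worth trying is to turn error display on and to check whether preg_replace returned null. This is a sketch only, assuming PHP 7 or later and that you are allowed to change these settings on the server; replaceOrFail is a hypothetical helper, not something from this thread.

```php
<?php
// Show every error instead of failing silently.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical wrapper: preg_replace returns null when the regex
// engine fails, and preg_last_error() then holds the error code.
function replaceOrFail(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        throw new RuntimeException(
            'preg_replace failed, error code ' . preg_last_error()
        );
    }
    return $result;
}

// Same pattern, replacement, and test string as in the post above.
$string = "<p><a href=\"myfile>myfile</a> ";
echo replaceOrFail('/(.*)<a (.*)"[^"]+>/i', '$1<a $2"$3">', $string), "\n";
```

Note that ini_set cannot reveal a syntax error in the same file, because the file never starts running; in that case the server's error log is the place to look.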

          • lawrence k

            #6

            Well, okay, for anyone interested, I worked it out and found the
            correct pattern is this:

            <a ([^">]+)"([^">]+)>
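
For completeness, here is how that pattern might be dropped into a function. This is a sketch under assumptions: the delimiters, the /i flag, the replacement string, and the name closeUnclosedQuote are mine, not from the post, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper built around the pattern from the post above.
function closeUnclosedQuote(string $html): string
{
    // "<a ", attribute text with no quote or >, exactly one quote,
    // a value with no quote or >, then the closing >.
    $pattern = '/<a ([^">]+)"([^">]+)>/i';
    // Put the missing quote back before the closing >.
    $replacement = '<a $1"$2">';

    $result = preg_replace($pattern, $replacement, $html);
    // Fall back to the input if the regex engine reports a failure.
    return $result === null ? $html : $result;
}

// Broken tag: the quote gets closed.
echo closeUnclosedQuote('<p><a href="myfile>myfile</a>'), "\n";
// Well-formed tag: left unchanged.
echo closeUnclosedQuote('<a href="myfile">myfile</a>'), "\n";
```

One limitation to be aware of: because the first group cannot contain a quote, a tag with an earlier, properly quoted attribute (for example a class before the href) will not match, so the unclosed quote in it stays unfixed.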
