Help me with a regular expression for PHP

  • cendrizzi

    Help me with a regular expression for PHP

    I have no idea where to get help on RE stuff. Since it's for a PHP app
    I thought I would ask here to see if there were some RE pros. Basically
    I'm doing some template stuff and I want to use a
    preg_replace_callback function to call another function when the
    criteria of the RE are matched, but I have no idea how to
    accomplish it.

    So I start with this:
    /<(input|select|textarea)[^>]*name\s*\=\s*\"[_a-zA-Z0-9\s]*\"[^>]*>/

    but need to modify it so it only matches if it has '{' characters in
    the name but to not match if it does not.

    So this would not match:
    <input name="test">

    But this would match:
    <input name="test{0}">

    Thanks much in advance.

  • Pedro Graca

    #2
    Re: Help me with a regular expression for PHP

    cendrizzi wrote:
    > So I start with this:
    > /<(input|select|textarea)[^>]*name\s*\=\s*\"[_a-zA-Z0-9\s]*\"[^>]*>/

    You'd better not use regular expressions to validate HTML.
    The following line is perfectly valid HTML (I think in any version)

    <input type="text" name="x><y" id="xy">

    > but need to modify it so it only matches if it has '{' characters in
    > the name but to not match if it does not.
    >
    > So this would not match:
    > <input name="test">
    >
    > But this would match:
    > <input name="test{0}">

    Get the name. Verify it has '{' and '}' (in that order and once only?)

    <?php
    // get_name() is only a stub here; it should extract the name attribute.
    $name = get_name('<input name="test{0}">'); // 'test{0}'
    if (name_is_valid($name)) {
        // whatever
    }

    function get_name($html) {
        return 'test{0}'; // sorry!
    }

    // Valid only if the name has exactly one '{' and one '}', in that order.
    function name_is_valid($name) {
        if (($p1 = strpos($name, '{')) === false) return false;  // no '{'
        if (strpos($name, '{', $p1 + 1) !== false) return false; // a second '{'
        if (($p2 = strpos($name, '}')) === false) return false;  // no '}'
        if (strpos($name, '}', $p2 + 1) !== false) return false; // a second '}'
        return $p1 < $p2;
    }
    ?>
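
One way the get_name() stub above could be filled in is sketched below. This is not from the original post: it assumes the name attribute is double-quoted and that the first name="..." in the string is the one wanted. Because the value is captured with [^"]*, a '>' or '{' inside the quotes does not cut the match short.

```php
<?php
// Hypothetical implementation of get_name(); not from the original post.
// Assumes a double-quoted name attribute and returns the first one found.
function get_name(string $html): ?string
{
    // \b stops "classname=" from matching; note that a hyphenated attribute
    // such as data-name= would still match, so treat this as a sketch only.
    if (preg_match('/\bname\s*=\s*"([^"]*)"/i', $html, $m) === 1) {
        return $m[1];
    }
    return null; // no double-quoted name attribute found
}

echo get_name('<input name="test{0}">'), "\n";                  // test{0}
echo get_name('<input type="text" name="x><y" id="xy">'), "\n"; // x><y
```

The result can then be passed to name_is_valid() as in the post; a null return means there was no name to check.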

    --
    I (almost) never check the dodgeit address.
    If you *really* need to mail me, use the address in the Reply-To
    header with a message in *plain* *text* *without* *attachments*.


    • cendrizzi

      #3
      Re: Help me with a regular expression for PHP

      It's not for validation. It's for some custom template stuff that
      tells my code where to store the value of the form element in the
      session. That may not make sense, but it's what I need for my
      application. So I use ob_start and the related output-buffering
      functions and run regular expressions against the buffer to manipulate
      the HTML or change the behavior of certain elements. I could just get
      the name of each element and check it using strpos or strstr for the
      '{' character, but I hoped I could use an RE to check from the start
      whether it had one, so it wouldn't require the extra string searches.

      Hope that makes sense, it's always a bit of a challenge to explain
      things clearly, especially if the program is quite a big one.
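
The setup described above (output buffering plus a regular expression run over the buffer) might look roughly like the sketch below. The function name process_buffer and the "bound" comment it appends are assumptions made for illustration; the real application would do its session bookkeeping in the callback instead. The pattern only matches names of the form word{word}word, so no separate strpos check is needed.

```php
<?php
// Hypothetical output-buffer callback: PHP calls it with the buffered HTML.
function process_buffer(string $buffer, int $phase = 0): string
{
    // Group 1 captures the name, which must contain one {...} placeholder.
    $pattern = '/<(?:input|select|textarea)\b[^>]*\bname\s*=\s*'
             . '"([\w\s]*\{[\w\s]*\}[\w\s]*)"[^>]*>/';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is the whole tag. Appending a comment is a stand-in for the
        // real work (for example recording $m[1] in the session).
        return $m[0] . '<!-- bound: ' . $m[1] . ' -->';
    }, $buffer);

    return $result ?? $buffer; // fall back to the raw buffer on a PCRE error
}

ob_start('process_buffer');
echo '<input name="test">', "\n";    // no '{': passes through unchanged
echo '<input name="test{0}">', "\n"; // has '{': the callback fires
ob_end_flush();
```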

      On Oct 29, 4:17 pm, Pedro Graca <hex...@dodgeit.com> wrote:
      > Get the name. Verify it has '{' and '}' (in that order and once only?)

      • Pedro Graca

        #4
        Re: Help me with a regular expression for PHP

        cendrizzi top-posted and totally messed it up:
        > I hoped I could use RE to check from the start if it had that so it
        > wouldn't require the extra string searches.

        <?php
        // Test inputs: only names of the form word{word}word should match.
        $data = array(
            '<input type="text" name="no!" id="test0">',
            '<input type="text" name="no{!}" id="test0">',
            '<input type="text" name="test0" id="test0">',
            '<input type="text" name="test 0" id="test0">',
            '<input type="text" name="test{0}" id="test0">',
            '<input type="text" name="test {0}" id="test0">',
            '<input type="text" name="test{0}test" id="test0">',
            '<input type="text" name="test {0} test" id="test0">',
        );
        $rx = '/<(input|select|textarea)[^>]*' .
            # 'name\s*\=\s*\"[_a-zA-Z0-9\s]*\"' . // your original version
            'name\s*\=\s*\"[_a-zA-Z0-9\s]*{[_a-zA-Z0-9\s]*}[_a-zA-Z0-9\s]*\"' .
            #                          ---^---         ---^---
            '[^>]*>/';
        ### I think there's a few \ too many in there,
        ### I didn't look at it very attentively

        foreach ($data as $val) {
            echo $val, ' :: ';
            if (preg_match($rx, $val)) {
                echo 'M';
            } else {
                echo 'No m';
            }
            echo "atch.\n";
        }
        ?>


        • BKDotCom

          #5
          Re: Help me with a regular expression for PHP


          Pedro Graca wrote:
          > The following line is perfectly valid HTML (I think in any version)
          >
          > <input type="text" name="x><y" id="xy">

          I would have to disagree
          <input type="text" name="x is invalid: no closing quote around
          name value
          <y" id="xy" is invalid. y" isn't a valid cname (only
          alphanumeric?)

          if you want 'x><y' as a value you'd need to use name="x&gt;&lt;y"


          • BKDotCom

            #6
            Re: Help me with a regular expression for PHP

            I had a similar RE problem and never figured it out or found an
            answer. I basically ended up using two callbacks, or doing the second
            check (does it contain "x") in the first callback.

            Capture and send all name values to the first callback (whether or
            not they contain the '{'), then check inside that callback whether
            the name value contains '{'.
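
A minimal sketch of that "match every name, filter inside the callback" idea follows. The names filter_by_name and handle_placeholder are hypothetical, and wrapping the tag in a span is only a stand-in for whatever the template code really does with a matching element.

```php
<?php
// Hypothetical handler for elements whose name contains '{'.
function handle_placeholder(string $tag, string $name): string
{
    return '<span class="bound">' . $tag . '</span>';
}

function filter_by_name(string $html): string
{
    // Loose pattern: captures ANY double-quoted name value as group 1.
    $pattern = '/<(?:input|select|textarea)\b[^>]*\bname\s*=\s*"([^"]*)"[^>]*>/';

    $result = preg_replace_callback($pattern, function (array $m): string {
        if (strpos($m[1], '{') === false) {
            return $m[0]; // second check failed: leave the tag as it was
        }
        return handle_placeholder($m[0], $m[1]);
    }, $html);

    return $result ?? $html; // preg_replace_callback returns null on error
}

echo filter_by_name('<input name="test"> <input name="test{0}">'), "\n";
// <input name="test"> <span class="bound"><input name="test{0}"></span>
```

The extra strpos call costs little compared with the regex match itself, and it keeps the pattern simple.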


            • Chung Leong

              #7
              Re: Help me with a regular expression for PHP


              cendrizzi wrote:
              > So I start with this:
              > /<(input|select|textarea)[^>]*name\s*\=\s*\"[_a-zA-Z0-9\s]*\"[^>]*>/
              >
              > but need to modify it so it only matches if it has '{' characters in
              > the name but to not match if it does not.

              Well, just change the [_a-zA-Z0-9\s]* part to [\w\s]*{[\w\s]*}. Of
              course, you'll need to do proper capturing in order to form the
              replacement string.

              \w is equivalent to [_a-zA-Z0-9] by the way.


              • cendrizzi

                #8
                Re: Help me with a regular expression for PHP

                No, I didn't know that \w was the same. What do you mean by proper
                capturing? I really am a 2 year old when it comes to RE stuff.

                Thanks!

                On Oct 29, 10:04 pm, "Chung Leong" <chernyshev...@hotmail.com> wrote:
                > Well, just change the [_a-zA-Z0-9\s]* part to [\w\s]*{[\w\s]*}. Of
                > course, you'll need to do proper capturing in order to form the
                > replacement string.
                >
                > \w is equivalent to [_a-zA-Z0-9] by the way.

                • John Dunlop

                  #9
                  Re: Help me with a regular expression for PHP

                  BKDotCom:
                  > Pedro Graca wrote:
                  >> The following line is perfectly valid HTML (I think in any version)
                  >>
                  >> <input type="text" name="x><y" id="xy">

                  Yes, yes it is. In any version.

                  > I would have to disagree

                  Run it through a validator. You'll find it's valid.

                  The 'name' attribute is defined as CDATA, so pretty much anything goes
                  if the attribute value is quoted, including literal less-than and
                  greater-than signs.

                  > <input type="text" name="x is invalid: no closing quote around
                  > name value

                  Yes, as a start-tag _in itself_. That wasn't Pedro's example though;
                  his example was the whole

                  | <input type="text" name="x><y" id="xy">

                  > <y" id="xy" is invalid. y" isn't a valid cname

                  As a tag in itself, it is invalid HTML, yes. It isn't invalid as part
                  of the example above.

                  > (only alphanumeric?)

                  Generic identifiers (aka, element type names) must begin with upper- or
                  lowercase letters.

                  > if you want 'x><y' as a value you'd need to use name="x&gt;&lt;y"

                  No. You only need to replace '<' and '>' with references where they
                  would be understood as something other than character data.

                  --
                  Jock


                  • Pedro Graca

                    #10
                    Re: Help me with a regular expression for PHP

                    Chung Leong wrote:
                    > \w is equivalent to [_a-zA-Z0-9] by the way.

                    It is /almost/ equivalent:

                    ~$ php -r 'echo (preg_match("/^\w+$/", "Graça"))?("yes"):("no"), "\n";'
                    yes
                    ~$ php -r 'echo (preg_match("/^[_a-zA-Z0-9]+$/", "Graça"))?("yes"):("no"), "\n";'
                    no


                    • Jerry Stuckle

                      #11
                      Re: Help me with a regular expression for PHP

                      BKDotCom wrote:
                      > Pedro Graca wrote:
                      >> The following line is perfectly valid HTML (I think in any version)
                      >>
                      >> <input type="text" name="x><y" id="xy">
                      >
                      > I would have to disagree
                      > <input type="text" name="x is invalid: no closing quote around
                      > name value
                      > <y" id="xy" is invalid. y" isn't a valid cname (only
                      > alphanumeric?)
                      >
                      > if you want 'x><y' as a value you'd need to use name="x&gt;&lt;y"

                      Actually, it is legal. name="x><y" is a perfectly valid attribute and
                      value. &lt; and &gt; aren't required here because they are within a
                      quoted string in a tag.

                      You do need &lt; and &gt; in plain text, however, when they may be
                      mistaken for the start/end of a tag.
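
To see why a quoted '>' matters for the patterns in this thread, here is a small sketch comparing a naive [^>]* tag pattern with a quote-aware one. The quote-aware pattern is an assumption added for illustration (it handles double quotes only) and is not something proposed in the thread.

```php
<?php
$tag = '<input type="text" name="x><y" id="xy">';

// Naive pattern: [^>]* stops at the first '>', which here sits inside the
// quoted name value, so the match is cut short.
preg_match('/<input[^>]*>/', $tag, $naive);

// Quote-aware pattern: each "..." run is consumed as one unit, so a '>'
// inside quotes cannot end the tag.
preg_match('/<input\b(?:[^>"]|"[^"]*")*>/', $tag, $aware);

echo $naive[0], "\n"; // <input type="text" name="x>
echo $aware[0], "\n"; // <input type="text" name="x><y" id="xy">
```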

                      --
                      ==================
                      Remove the "x" from my email address
                      Jerry Stuckle
                      JDS Computer Training Corp.
                      jstucklex@attglobal.net
                      ==================


                      • Chung Leong

                        #12
                        Re: Help me with a regular expression for PHP


                        cendrizzi wrote:
                        > No I didn't know that \w was the same. What do you mean by proper
                        > capturing. I really am a 2 year old when it comes to RE stuff.

                        By that I mean you need to grab the substrings that precede and
                        follow the text inside the quotation marks. If the input is

                        <input name="test{0}" size="40">

                        you'd want

                        <input name="

                        and

                        " size="40">

                        so that you can form the replacement <input name=" + DATA + " size="40">.

                        Presumably you'd want 'test' and '0' as well for looking up the data.
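
The capturing described above could be sketched as follows. The lookup_data function and its sample data are hypothetical stand-ins for however the application fetches the stored value; only the four capture groups (text before the name, key, index, text after the name) come from the description.

```php
<?php
// Hypothetical data source; the real application might read $_SESSION.
function lookup_data(string $key, string $index): string
{
    $store = ['test' => ['0' => 'first_value']];
    return $store[$key][$index] ?? '';
}

function fill_names(string $html): string
{
    $pattern = '/(<(?:input|select|textarea)\b[^>]*\bname\s*=\s*")' // 1: before the name
             . '([\w\s]*)\{([\w\s]*)\}'                            // 2: key, 3: index
             . '("[^>]*>)/';                                       // 4: after the name

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Rebuild the tag: prefix + DATA + suffix.
        return $m[1] . lookup_data(trim($m[2]), trim($m[3])) . $m[4];
    }, $html);

    return $result ?? $html; // null means a PCRE error; keep the input
}

echo fill_names('<input name="test{0}" size="40">'), "\n";
// <input name="first_value" size="40">
```

This version requires the closing '}' to come right before the closing quote, matching the [\w\s]*{[\w\s]*} suggestion; a name such as "test{0}test" would need a trailing [\w\s]* added after the brace.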


                        • BKDotCom

                          #13
                          Re: Help me with a regular expression for PHP


                          John Dunlop wrote:
                          > Run it through a validator. You'll find it's valid.

                          Will I?

                          http://validator.w3.org/check

                          Warning character "<" is the first character of a delimiter but
                          occurred as data
                          This message may appear in several cases:
                          * You tried to include the "<" character in your page: you should
                          escape it as "&lt;"
                          * You used an unescaped ampersand "&": this may be valid in some
                          contexts, but it is recommended to use "&amp;", which is always safe.
                          * Another possibility is that you forgot to close quotes in a
                          previous tag.


                          • Andy Hassall

                            #14
                            Re: Help me with a regular expression for PHP

                            On 30 Oct 2006 09:02:18 -0800, "BKDotCom" <bkfake-google@yahoo.com> wrote:
                            > John Dunlop wrote:
                            >> Run it through a validator. You'll find it's valid.
                            >
                            > Will I?

                            You certainly should. I've just tried it against the W3C validator, and it
                            agreed it's valid.

                            > http://validator.w3.org/check
                            > Warning character "<" is the first character of a delimiter but
                            > occurred as data
                            > This message may appear in several cases:
                            > * You tried to include the "<" character in your page: you should
                            > escape it as "&lt;"
                            > * You used an unescaped ampersand "&": this may be valid in some
                            > contexts, but it is recommended to use "&amp;", which is always safe.
                            > * Another possibility is that you forgot to close quotes in a
                            > previous tag.

                            Result: Passed validation
                            File: test.html
                            Encoding: iso-8859-1
                            Doctype: HTML 4.01 Transitional
                            This Page Is Valid HTML 4.01 Transitional!

                            Here's what I uploaded:

                            <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                            "http://www.w3.org/TR/html4/loose.dtd">
                            <html>
                            <head>
                            <title>test</title>
                            <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
                            </head>
                            <body>
                            <form method="post" action="test.php">
                            <input type="text" name="x><y" id="xy">
                            </form>
                            </body>
                            </html>

                            (the <meta> being there because I validated it by upload rather than from a
                            real site that would have sent the relevant HTTP header instead)

                            What did you upload?

                            --
                            Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
                            http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
