Non-capturing parentheses

  • Michael Winter

    Non-capturing parentheses

    I'll be frank: what is the point? The use, and usefulness, of a set of
    capturing parentheses in a regular expression is clear. However, isn't an
    expression such as "/(?:x)/", exactly the same as "/x/", or is the former
    a better way of including a literal substring within an expression?

    I know there must be a valid reason for including them, but I'm not seeing
    it at the moment.

    Mike

    --
    Michael Winter
    M.Winter@blueyonder.co.invalid (replace ".invalid" with ".uk" to reply)
  • Martin Honnen

    #2
    Re: Non-capturing parentheses



    Michael Winter wrote:
    > I'll be frank: what is the point? The use, and usefulness, of a set of
    > capturing parentheses in a regular expression is clear. However, isn't
    > an expression such as "/(?:x)/", exactly the same as "/x/", or is the
    > former a better way of including a literal substring within an expression?

    Often you use parentheses to group a more complex subexpression, e.g.
    /Windows\s+(95|98|ME|NT|XP)/
    and there you can avoid capturing the value in parentheses if you don't
    need it:

    // Capturing group: the result array also holds the matched alternative.
    var pattern1 = /Windows\s+(95|98|ME|NT|XP)/;
    var string = "Windows XP";
    alert(pattern1.exec(string));
    // Non-capturing group: the result array holds only the whole match.
    var pattern2 = /Windows\s+(?:95|98|ME|NT|XP)/;
    alert(pattern2.exec(string));
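
The difference between the two patterns shows up in the array that exec returns. The following is a minimal sketch using console.log instead of alert; the variable names and the extra "ab" quantifier example are illustrative assumptions, not part of the original post.

```javascript
// Capturing group: the matched alternative is stored as element 1.
const capturing = /Windows\s+(95|98|ME|NT|XP)/;
// Non-capturing group: same grouping for the alternation, nothing stored.
const nonCapturing = /Windows\s+(?:95|98|ME|NT|XP)/;

const input = "Windows XP";
const withCapture = capturing.exec(input);
const withoutCapture = nonCapturing.exec(input);

// Two elements: the whole match and the captured alternative.
console.log(withCapture.length, withCapture[0], withCapture[1]);
// One element: the whole match only.
console.log(withoutCapture.length, withoutCapture[0]);

// Grouping also decides what a quantifier applies to.
const repeatedGroup = /^(?:ab)+$/.test("ababab"); // the whole "ab" repeats
const repeatedChar = /^ab+$/.test("ababab");      // only the "b" repeats
console.log(repeatedGroup, repeatedChar);
```

So "/(?:x)/" and "/x/" match the same strings, but the group starts to matter as soon as an alternation or a quantifier has to apply to more than one character.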

    --

    Martin Honnen



    • Michael Winter

      #3
      Re: Non-capturing parentheses

      On Sun, 18 Jan 2004 16:19:57 +0100, Martin Honnen <mahotrash@yahoo.de>
      wrote:

      > Often you use parentheses to group some more complex subexpression e.g.
      > /Windows\s+(95|98|ME|NT|XP)/
      > and there you can avoid capturing the value in parentheses if you don't
      > need it:

      I thought that would be a sensible use for them - as evidenced by their
      common use in various programming languages - but I don't remember reading
      anything about parentheses, of either type, acting as grouping operators
      in either Netscape's JavaScript reference (v1.5) or ECMA-262[1].

      Mike


      [1] I find that document *incredibly* difficult to read. I've read various
      standards and specs covering a wide range of topics, from hardware and
      drivers to encryption algorithms, but it's the most obscure one yet.

      --
      Michael Winter
      M.Winter@blueyonder.co.invalid (replace ".invalid" with ".uk" to reply)


      • Martin Honnen

        #4
        Re: Non-capturing parentheses



        Michael Winter wrote:
        > On Sun, 18 Jan 2004 16:19:57 +0100, Martin Honnen <mahotrash@yahoo.de>
        > wrote:
        >
        >> Often you use parentheses to group some more complex subexpression e.g.
        >> /Windows\s+(95|98|ME|NT|XP)/
        >> and there you can avoid capturing the value in parentheses if you
        >> don't need it:
        >
        > I thought that would be a sensible use for them - as evidenced by their
        > common use in various programming languages - but I don't remember
        > reading anything about parentheses, of either type, acting as grouping
        > operators in either Netscape's JavaScript reference (v1.5) or ECMA-262[1].

        As far as ECMAScript edition 3 is concerned, look at section
        15.10.1, where the grammar for regular expression patterns is given; you
        will find that the above example is correct syntax.
        The evaluation of disjunctions is explained in section 15.10.2.3.
        --

        Martin Honnen



        • Judas

          #5
          Re: Non-capturing parentheses

          Michael Winter <M.Winter@blueyonder.co.invalid> wrote in message news:<opr1x4u5tg5vklcq@news-text.blueyonder.co.uk>...
          > I'll be frank: what is the point? The use, and usefulness, of a set of
          > capturing parentheses in a regular expression is clear. However, isn't an
          > expression such as "/(?:x)/", exactly the same as "/x/", or is the former
          > a better way of including a literal substring within an expression?
          >
          > I know there must be a valid reason for including them, but I'm not seeing
          > it at the moment.
          >
          > Mike

          Mike,
          I am really confused by your question. As far as I can see, your
          first regular expression is equivalent to
          "/({0,1}:x)/"
          As the {0,1}, or the ?, doesn't actually follow anything, I'd be
          surprised if it was a valid regular expression, and if it was,
          something very funny would be going on. If you want to match
          "agivenstring", then "/agivenstring/" seems to be a fine way
          of doing it.
          If you do choose to use parentheses, then, as you said, the
          usefulness is obvious, particularly if you're using wildcards,
          numbers of matches, or searching for a non-fixed string. The
          parentheses enable you to retrieve the string that was actually
          found.
          HTH,
          Judas.


          • Lasse Reichstein Nielsen

            #6
            Re: Non-capturing parentheses

            jsrobins@sghms.ac.uk (Judas) writes:

            > am really confused by your question. As far as I can see your
            > first regular expression is equivalent to
            > "/({0,1}:x)/"

            > As the {0,1}, or the ? don't actually follow anything, I'd be
            > surprised if it was a valid regular expression, and if it was,
            > something very funny would be going on.

            Since you can't write ? (or {0,1}) without a token before it, the syntax
            "(?" was previously illegal. That made it a prime candidate for use
            when new features were needed.

            In new and improved regular expressions, the sequence "(?" starts one of
            the new features.
            (?: ... ) - is a non-capturing group; it groups just like ( ... ) but
            does not store what it matched as a capture
            (?= ... ) - is a positive lookahead. It has zero width but matches
            if the following text is matched by the expression inside.
            (?! ... ) - is the negative lookahead. It has zero width and matches
            if the following text is not matched by the expression inside.
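
A minimal sketch of the three constructs side by side. The patterns and the "foo"/"bar"/"baz" test strings are made up for illustration and do not come from the thread.

```javascript
// (?: ... ) groups without adding an entry to the match array.
const grouped = /(?:foo|bar)baz/.exec("barbaz");
console.log(grouped.length, grouped[0]); // only the whole match is present

// (?= ... ) succeeds only if the inner pattern matches next, but consumes nothing.
const positive = /foo(?=bar)/.exec("foobar");
console.log(positive[0]); // the looked-ahead text is not part of the match

// (?! ... ) succeeds only if the inner pattern does NOT match next.
const negativeHit = /foo(?!bar)/.test("foobaz");
const negativeMiss = /foo(?!bar)/.test("foobar");
console.log(negativeHit, negativeMiss);
```

Because both lookaheads are zero-width, whatever follows them in the pattern starts matching at the same position the lookahead inspected.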


            Example:
            /^(?=\d{10}$)0*1*$/
            This matches a sequence of ten 0's and 1's where all the zeros come
            before the ones. The (?=\d{10}$) matches a zero-width string where
            the remainder of the string consists of 10 digits. The 0*1* matches
            a string of any length of 0's and 1's where all the 0's come before
            all the 1's.
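
The pattern above can be exercised as follows. This is a quick sketch; the sample strings and the name zerosThenOnes are illustrative assumptions.

```javascript
const zerosThenOnes = /^(?=\d{10}$)0*1*$/;

// Hypothetical sample inputs covering the interesting cases.
const orderedSamples = [
  "0000011111",  // ten digits, zeros first
  "0000000000",  // ten zeros (no ones at all is allowed)
  "1111111111",  // ten ones
  "0101010101",  // ten digits, wrong order
  "00011",       // right order, too short
  "00000111111"  // right order, too long
];

for (const s of orderedSamples) {
  console.log(JSON.stringify(s), zerosThenOnes.test(s));
}
```

Only the first three samples pass: the lookahead enforces the length, and 0*1* enforces the ordering.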

            Example:
            /^(?!\d{10}).{10}$/
            This matches a string of ten characters that are not all digits.
            The (?!\d{10}) matches a zero-width string that is not followed
            by ten digits. The .{10} matches any ten characters (except newline).
            So a string is matched only if it is exactly ten characters long
            and, looking ahead from its start, those ten characters are not all digits.
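
A similar sketch for the negative lookahead pattern. Again, the sample strings and the name notAllDigits are assumptions made for illustration.

```javascript
const notAllDigits = /^(?!\d{10}).{10}$/;

// Hypothetical sample inputs.
const mixedSamples = [
  "abcdefghij",  // ten characters, no digits
  "123456789a",  // ten characters, one non-digit
  "1234567890",  // ten characters, all digits
  "abc",         // too short
  "12345678901"  // too long
];

for (const s of mixedSamples) {
  console.log(JSON.stringify(s), notAllDigits.test(s));
}
```

The first two samples pass; the all-digit string is rejected by the lookahead and the last two by the .{10}$ length check.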

            Both examples are of tests that are longer if you need to do them
            without lookahead (at least I couldn't find a shorter way, and I'm
            pretty good with regular expressions :).
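
For comparison, one way to get the same results without lookahead is to split each test into two separate checks outside the pattern. This is only a sketch of that alternative, not a shorter single regular expression; the function names and sample strings are hypothetical.

```javascript
const orderedWithLookahead = /^(?=\d{10}$)0*1*$/;
// Same test without lookahead: check the length separately.
function orderedWithoutLookahead(s) {
  return s.length === 10 && /^0*1*$/.test(s);
}

const mixedWithLookahead = /^(?!\d{10}).{10}$/;
// Same test without lookahead: ten characters, and not ten digits.
function mixedWithoutLookahead(s) {
  return /^.{10}$/.test(s) && !/^\d{10}$/.test(s);
}

// Hypothetical inputs on which to compare both formulations.
const comparisonSamples = ["0000011111", "1100000000", "1234567890", "abcdefghij", "00011", ""];

for (const s of comparisonSamples) {
  console.log(
    JSON.stringify(s),
    orderedWithLookahead.test(s) === orderedWithoutLookahead(s),
    mixedWithLookahead.test(s) === mixedWithoutLookahead(s)
  );
}
```

The two-step versions agree with the lookahead versions on these inputs, at the cost of moving part of the logic out of the pattern.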

            /L
            --
            Lasse Reichstein Nielsen - lrn@hotpop.com
            DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
            'Faith without judgement merely degrades the spirit divine.'


            • rh

              #7
              Re: Non-capturing parentheses

              Lasse Reichstein Nielsen <lrn@hotpop.com> wrote in message news:<hdytrky2.fsf@hotpop.com>...

              > Since you can't write ? (or {0,1}) without a token before, the syntax
              > "(?" was previously illegal. That made it a prime candidate use for
              > when new features were needed.

              Well, "can't write" is what I'd expect and, as I read it, what the
              ECMA-262/3 syntax specifies. However, both Opera (7.11) and Netscape
              (7.1) will each allow:

              a = /{0,1}/;

              and a number of other variations. So it appears neither excludes an
              unescaped "{" as a pattern character. Otherwise, Opera's RegExp
              support seems to be very good.
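
Whether a given engine accepts such a pattern can be probed at run time rather than assumed. The helper below is a hypothetical sketch (the name compiles is made up); it deliberately prints the result for the bare "{0,1}" instead of predicting it, since that is the engine-dependent part. The "u" flag used in the second probe postdates this thread, and unicode mode is, as far as I know, strict about a quantifier with nothing before it.

```javascript
// Hypothetical helper: reports whether a pattern source compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    // A rejected pattern surfaces as a SyntaxError; anything else is unexpected.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log("bare {0,1}, no flags:", compiles("{0,1}", ""));
console.log("bare {0,1}, unicode mode:", compiles("{0,1}", "u"));
console.log("a{0,1}, no flags:", compiles("a{0,1}", ""));
```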

              ../rh
