[RegExp] Making non-greedy; Escaping parentheses?

  • Jane Doe

    [RegExp] Making non-greedy; Escaping parentheses?

    Hello,

    I need to browse a list of hyperlinks, each followed by an
    author, and remove the links only for certain authors.

    1. I searched the archives on Google, but didn't find how to tell the
    RegExp object to be non-greedy as using the ? quantifier doesn't seem
    to work.

    --------- SAMPLE ----------------
    var items = new Array("johndoe", "janedoe"
    // Add parentheses to match any item in items()
    var list = '('
    list += items.join("|")
    list += ')'

    //Example: <A href="dummy.php?page=10#934569">TITLE
    </A>, AUTHOR, April 12, 2003<br>

    pattern = '<A href=".+?#[0-9]+?">.+?</A>, '
    pattern += list
    pattern += ',.+?<br>'

    var input = new RegExp(temp,"gi");
    var output = 'TROLL<br>'
    document.body.innerHTML = body.replace(input,output);
    --------- SAMPLE ----------------

    Does somebody know how to do this?

    2. Also, I notice that when using (johndoe|janedoe) in a pattern, the
    value is copied into one of the $x variables. In this particular case,
    I don't need this.
    Is there a way to escape parentheses to tell RegEx _not_ to put this
    item into a variable? I tried "\(" and "((", to no avail.

    Thank you very much for any help
    JD.
  • Martin Honnen

    #2
    Re: [RegExp] Making non-greedy; Escaping parentheses?



    Jane Doe wrote:

    > 2. Also, I notice that when using (johndoe|janedoe) in a pattern, the
    > value is copied into one of the $x variables. In this particular case,
    > I don't need this.
    > Is there a way to escape parentheses to tell RegEx _not_ to put this
    > item into a variable? I tried "\(" and "((", to no avail.

    I think you are looking for non-capturing parentheses, e.g.
    /(?:john|jane)doe/
    but that is only supported in IE5.5+ and Netscape 6+.

    --

    Martin Honnen



    • Lasse Reichstein Nielsen

      #3
      Re: [RegExp] Making non-greedy; Escaping parentheses?

      Jane Doe <jane.doe@acme.com> writes:

      > I need to browse a list of hyperlinks, each followed by an
      > author, and remove the links only for certain authors.
      >
      > 1. I searched the archives on Google, but didn't find how to tell the
      > RegExp object to be non-greedy as using the ? quantifier doesn't seem
      > to work.

      It should, if the browser is sufficiently new. The improved regular
      expressions (non-greedy +, *, ? and {}, non-capturing brackets and
      lookahead) are part of Javascript 1.5 and ECMAScript, not the earlier
      Javascript versions.
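To make the greedy/non-greedy difference concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; the only thing taken from the thread is the trailing "?" on the quantifier.

```javascript
// Hypothetical sample text with two bold runs on one line.
var sample = '<b>one</b> and <b>two</b>';

// Greedy: ".+" grabs as much as it can, so the match runs to the last </b>.
var greedy = sample.match(/<b>.+<\/b>/)[0];

// Non-greedy: ".+?" stops at the first </b> that lets the match succeed.
var lazy = sample.match(/<b>.+?<\/b>/)[0];

console.log(greedy); // <b>one</b> and <b>two</b>
console.log(lazy);   // <b>one</b>
```

If both lines print the same thing, the script engine is probably ignoring the "?" modifier, which would point to an older implementation.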
      > --------- SAMPLE ----------------
      > var items = new Array("johndoe", "janedoe"

      Missing end parenthesis (and semicolon! Always end your statements
      with a semicolon).
      > // Add parentheses to match any item in items()
      > var list = '('
      > list += items.join("|")
      > list += ')'
      >
      > //Example: <A href="dummy.php?page=10#934569">TITLE
      > </A>, AUTHOR, April 12, 2003<br>

      Is the entire string always on one line?
      As a stupid convention, the regular expression "." matches
      all non-EOL characters, but there is no shorthand for matching
      any character. If the text contains newlines, you may need to
      change "." to, e.g., "[\s\S]".
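A small sketch of the newline issue, assuming the entry is wrapped after TITLE as in the example comment. The sample string and variable names are hypothetical.

```javascript
// Hypothetical entry that is split over two lines after the title.
var wrapped = '<A href="dummy.php?page=10#934569">TITLE\n</A>, johndoe, April 12, 2003<br>';

// "." does not match the line break, so this pattern cannot span both lines.
var dotPattern = /<A href=".+?">.+?<\/A>/i;

// [\s\S] means "whitespace or non-whitespace", i.e. any character at all.
var anyPattern = /<A href=".+?">[\s\S]+?<\/A>/i;

console.log(dotPattern.test(wrapped)); // false
console.log(anyPattern.test(wrapped)); // true
```

Engines that implement ECMAScript 2018 or later also accept an "s" flag that makes "." match line breaks, but "[\s\S]" works everywhere.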
      > pattern = '<A href=".+?#[0-9]+?">.+?</A>, '
      > pattern += list
      > pattern += ',.+?<br>'

      If your code is inside a script tag, and not in an external file,
      you should escape your "</"'s as "<\/". Most browsers are forgiving.
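To illustrate the escaping advice, a minimal sketch: inside a JavaScript string literal, "\/" is just another way of writing "/", so the escaped form builds exactly the same pattern text while keeping a literal "</" out of the inline script. The variable names are only for illustration.

```javascript
// Both literals hold the same characters; only the source text differs.
var plain = '[^<]+?</A>, ';
var escaped = '[^<]+?<\/A>, ';

console.log(plain === escaped); // true
```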
      > var input = new RegExp(temp,"gi");

      Do you mean "pattern" instead of "temp"?
      > var output = 'TROLL<br>'
      > document.body.innerHTML = body.replace(input,output);
      > --------- SAMPLE ----------------
      >
      > Does somebody know how to do this?

      One problem is that a minimal match will still start as early as possible.
      If you have two entries in a row, and only the second has an author on
      your hit-list, the search still begins at the first "<A". It finds the
      minimal match starting there, which includes both entries, so
      both are replaced.
      To avoid this, you can restrict the .'s so they can't match too far:

      pattern = '<A href="[^"]+?#\\d+?">[^<]+?</A>, ';
      pattern += list;
      pattern += ',[^>]+?<br>';

      This prevents the match from extending further than we want. If there
      are tags inside the TITLE or in the date after the author name, then
      "[^<]" is too strict and the entry won't match at all, so a different
      restriction is needed.
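Here is a sketch of the two-entries problem using a plain string instead of document.body, so it can be run outside a browser. The sample HTML, the author "someoneelse", and the names loose and strict are assumptions made up for the demonstration; the two patterns are the ones discussed in this thread.

```javascript
var items = ["johndoe", "janedoe"];
var list = "(" + items.join("|") + ")";

// Hypothetical input: the first author is NOT on the hit-list, the second is.
var html =
  '<A href="dummy.php?page=10#934569">FIRST</A>, someoneelse, April 12, 2003<br>' +
  '<A href="dummy.php?page=11#934570">SECOND</A>, johndoe, April 13, 2003<br>';

// Original pattern: "." may run across "</A>" and "<br>" into the next entry.
var loose = new RegExp('<A href=".+?#[0-9]+?">.+?</A>, ' + list + ',.+?<br>', "gi");

// Restricted pattern: the character classes cannot cross quotes or tags.
var strict = new RegExp('<A href="[^"]+?#\\d+?">[^<]+?</A>, ' + list + ',[^>]+?<br>', "gi");

var looseResult = html.replace(loose, "TROLL<br>");
var strictResult = html.replace(strict, "TROLL<br>");

console.log(looseResult);  // TROLL<br>  (both entries swallowed)
console.log(strictResult); // first entry kept, second replaced by TROLL<br>
```

With the loose pattern the match starts at the first "<A" and stretches to the end of the second entry, which is the behaviour described above.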
      > 2. Also, I notice that when using (johndoe|janedoe) in a pattern, the
      > value is copied into one of the $x variables. In this particular case,
      > I don't need this.
      > Is there a way to escape parentheses to tell RegEx _not_ to put this
      > item into a variable? I tried "\(" and "((", to no avail.

      Yes.
      (?: ... )
      This pair of parentheses is purely for grouping, and the match won't be
      remembered.
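A quick way to see the difference is to compare what exec() returns for the two kinds of parentheses. This is only a sketch; the sample text and variable names are made up.

```javascript
// Hypothetical entry text.
var text = '<A href="x.php#1">TITLE</A>, johndoe, April 12, 2003<br>';

// Capturing group: the author name is stored as element 1 of the result.
var capturing = /, (johndoe|janedoe),/i.exec(text);

// Non-capturing group: same alternation, but nothing is stored.
var grouping = /, (?:johndoe|janedoe),/i.exec(text);

console.log(capturing.length, capturing[1]); // 2 johndoe
console.log(grouping.length);                // 1
```

Both patterns match the same text; only the captured values differ, so $1 in a replacement string would refer to the author only in the first case.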

      /L
      --
      Lasse Reichstein Nielsen - lrn@hotpop.com
      Art D'HTML: <URL:http://www.infimum.dk/HTML/randomArtSplit.html>
      'Faith without judgement merely degrades the spirit divine.'


      • Jane Doe

        #4
        Re: [RegExp] Making non-greedy; Escaping parentheses?

        On 12 Sep 2003 18:35:18 +0200, Lasse Reichstein Nielsen
        <lrn@hotpop.com> wrote:
        >It should, if the browser is sufficiently new. The improved regular
        >expressions (non-greedy +, *, ? and {}, non-capturing brackets and
        >lookahead) are part of Javascript 1.5 and ECMAScript, not the earlier
        >Javascript versions.

        Thank you very much Martin and Lasse :-) Finally got it working thanks
        to you. I didn't know non-greedy regexes were so recent in JS.

        Thanks again
        JD.
