Regex help please

  • Tim Nash (aka TMN)

    Regex help please

    Hi

    Can anyone help me match this div below - my regex does not work - if
    you could tell me why I would appreciate it.

    var aStr = "<div class='feedflare'>dfgdg dg</div>";
    var reg = new RegExp("<div class='feedflare'.*?</div>'","gim");


    thanks
    Tim
  • pr

    #2
    Re: Regex help please

    Tim Nash (aka TMN) wrote:
    > Can anyone help me match this div below - my regex does not work - if
    > you could tell me why I would appreciate it.
    >
    > var aStr = "<div class='feedflare'>dfgdg dg</div>";
    > var reg = new RegExp("<div class='feedflare'.*?</div>'","gim");
    -------------------------------------------------------^
    That apostrophe (the one right after </div>) shouldn't be there.

    The 'm' flag is unnecessary: it only changes what ^ and $ match, and
    this pattern uses neither.
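A minimal check of the fix, runnable in any ECMAScript 3+ engine such as Node.js. The variable names `broken` and `fixed` are only illustrative, and the class name is written as feedflare:

```javascript
// Test string from the question.
var aStr = "<div class='feedflare'>dfgdg dg</div>";

// Original pattern: the stray ' after </div> demands a quote character
// that never follows the closing tag, so nothing matches.
var broken = new RegExp("<div class='feedflare'.*?</div>'", "gi");

// Fixed pattern: stray apostrophe removed and the 'm' flag dropped,
// since the pattern contains no ^ or $ anchors for it to affect.
var fixed = new RegExp("<div class='feedflare'.*?</div>", "gi");

var brokenMatches = aStr.match(broken); // null
var fixedMatches = aStr.match(fixed);   // one match: the whole of aStr
```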


    • Tim Nash (aka TMN)

      #3
      Re: Regex help please

      After a fresh start this morning I got this to work, taking into
      account white space around 'class' and '=' etc., and also
      allowing ' or " to be used:

      var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');

      Tim


      • Thomas 'PointedEars' Lahn

        #4
        Re: Regex help please

        Tim Nash (aka TMN) wrote:
        > After a fresh start this morning I got this to work, taking into
        > account white space around 'class' and '=' etc., and also
        > allowing ' or " to be used:
        >
        > var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');

        Single-escaping the apostrophe within a double-quoted string literal is
        useless ("\'" == "'"), and attr=['"]...['"]* is pointless: the star repeats
        the preceding atom (here: ['"]) zero or more times, so the closing quote
        becomes optional. It would also be a lot easier to maintain if you used a
        RegExp literal instead.

        var reg = /<div[^>]class\s*=\s*['"]feedflare['"]>(.*?)<\/div>/gi;
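To illustrate the maintenance point, here is a sketch of the same pattern written both ways: once built from a string literal (every regex backslash doubled) and once as a RegExp literal (where only the / in </div> needs escaping). The sample input is made up:

```javascript
// Built from a string literal: "\\s" is needed to get \s into the pattern.
var fromString = new RegExp("<div[^>]class\\s*=\\s*['\"]feedflare['\"]>(.*?)</div>", "gi");

// The same pattern as a RegExp literal.
var fromLiteral = /<div[^>]class\s*=\s*['"]feedflare['"]>(.*?)<\/div>/gi;

// Inside a double-quoted string, "\'" is simply "'", so escaping it does nothing.
var same = ("\'" === "'"); // true

var sample = "<div class='feedflare'>content</div>";
var a = fromString.exec(sample);
var b = fromLiteral.exec(sample);
// Both capture "content" in group 1.
```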

        That still does not exclude the possibility of e.g.

        <divaclass="feedflare'>...</div>

        which is not Valid. As for the element type identifier followed by optional
        attributes, you should use

        <ident(|\s+attr ...)>

        because whitespace after the identifier is required if there are attributes.
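A small sketch contrasting the two forms on made-up input. The stricter pattern simply requires \s+ after the identifier; only the class-attribute case is shown, not the full optional-attributes form:

```javascript
// Form from the posts above: exactly one arbitrary non-">" character
// between "div" and "class".
var sloppy = /<div[^>]class\s*=\s*['"]feedflare['"]>/i;

// Stricter: at least one whitespace character must separate "div" from "class".
var strict = /<div\s+class\s*=\s*['"]feedflare['"]>/i;

var bogus = '<divaclass="feedflare">';      // not Valid markup
var twoSpaces = '<div  class="feedflare">'; // Valid, with two spaces
```

The one-character form both accepts the bogus tag and rejects the Valid one with two spaces; `\s+` fixes both cases.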
        As for the matching quotes, you should use

        ('foo'|"foo")
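A sketch of why the alternation matters, again on made-up input: two independent ['"] classes accept mismatched quotes, while ('feedflare'|"feedflare") requires the closing quote to be the same character as the opening one:

```javascript
// Independent character classes: any quote may open, any quote may close.
var independent = /class\s*=\s*['"]feedflare['"]/;

// Alternation: both quotes must be the same character.
var paired = /class\s*=\s*('feedflare'|"feedflare")/;

var mismatched = "class=\"feedflare'";
var matched = "class='feedflare'";
```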

        However, RegExp literals and non-greedy matching (`.*?') are not universally
        supported, with the latter being the more important fact here. See also:

        <http://pointedears.de/scripts/es-matrix/>

        Also note that a single regular expression cannot be used to parse an
        *arbitrary* fragment of an SGML-based markup language; either it is too
        greedy or not greedy enough. For example, in

        <div class="foo"><div>bar</div></div>

        this non-greedy expression would match `<div class="foo"><div>bar</div>',
        with the outer `div' element not being closed.
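The example can be reproduced directly. In this sketch the second input is made up to show the opposite failure, where a greedy `.*` overshoots:

```javascript
var nested = '<div class="foo"><div>bar</div></div>';
var siblings = '<div class="foo">a</div><div>b</div>';

// Non-greedy: stops at the first </div>, which belongs to the inner div.
var nonGreedy = /<div class="foo">(.*?)<\/div>/;
// Greedy: runs to the last </div>, which may belong to a later sibling.
var greedy = /<div class="foo">(.*)<\/div>/;

var tooShort = nonGreedy.exec(nested)[0]; // '<div class="foo"><div>bar</div>'
var justRight = greedy.exec(nested)[0];   // the whole nested string
var tooLong = greedy.exec(siblings)[0];   // also swallows '<div>b</div>'
```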

        So, for reliable parsing, you will need to implement a push-down automaton;
        however, its parsing algorithm can be made more efficient with regular
        expressions.
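A minimal sketch of that idea: a regular expression only tokenizes start and end tags, and an explicit stack decides where the element ends. `extractElement` is a hypothetical helper, not an existing API. It assumes well-formed markup with no comments, CDATA sections, void elements such as <br>, or ">" inside attribute values:

```javascript
// Returns the outer markup of the first element whose start tag matches
// startTagRe, or null if there is none or it is not properly closed.
function extractElement(html, startTagRe) {
  var start = html.search(startTagRe);
  if (start < 0) return null;

  // Group 1 is "/" for end tags and "" for start tags; group 2 is the name.
  var tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  tagRe.lastIndex = start;

  var stack = [];
  var m;
  while ((m = tagRe.exec(html)) !== null) {
    var name = m[2].toLowerCase();
    if (m[1] === "") {
      stack.push(name);                        // start tag: push
    } else {
      if (stack.pop() !== name) return null;   // mismatched end tag
      if (stack.length === 0) {
        return html.slice(start, tagRe.lastIndex); // outermost element closed
      }
    }
  }
  return null; // ran out of tags before the element was closed
}
```

Usage: `extractElement('<div class="foo"><div>bar</div></div>', /<div class="foo"/)` returns the whole string, including the outer `</div>` that the single non-greedy expression misses.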

        Unsurprisingly, all this has been discussed here before. Please search
        before you post.

        <http://jibbering.com/faq/>


        PointedEars
        --
        Use any version of Microsoft Frontpage to create your site.
        (This won't prevent people from viewing your source, but no one
        will want to steal it.)
        -- from <http://www.vortex-webdesign.com/help/hidesource.htm>


        • pr

          #5
          Re: Regex help please

          Tim Nash (aka TMN) wrote:
          > After a fresh start this morning I got this to work, taking into
          > account white space around 'class' and '=' etc., and also
          > allowing ' or " to be used:
          >
          > var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');

          To match a string starting with any of the following common permutations:

          <div class="feedflare">
          <div style="color: red;" class="feedflare">
          <div class="feedflare" id="div1">
          <div class="class1 feedflare class3">

          you will instead need something like:

          /<div\b[^>]+\bclass\s*=\s*(['"])[\w\s]*\bfeedflare\b[\w\s]*\1[^>]*>(.*?)<\/div\s*>/gi
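A quick check of that pattern against the four permutations listed above, plus a mismatched-quote case. The content strings A to E are made up, and `feedflareContent` is a hypothetical wrapper:

```javascript
var feedflareRe = /<div\b[^>]+\bclass\s*=\s*(['"])[\w\s]*\bfeedflare\b[\w\s]*\1[^>]*>(.*?)<\/div\s*>/gi;

// Returns the div's content (capture group 2) or null.
function feedflareContent(s) {
  feedflareRe.lastIndex = 0; // with the g flag, exec() resumes from lastIndex
  var m = feedflareRe.exec(s);
  return m ? m[2] : null;
}
```

The backreference `\1` is what rejects `<div class="feedflare'>`: the closing quote must repeat the opening one.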

          I have simplified it by presuming you won't use the characters '.-:' in
          class names. But as PointedEars points out, '.*?' is a problem in old
          browsers and you're in trouble if there's a nested div in your string.

          Possibly you would be better served by reading the string into the DOM
          (using a DOMParser or innerHTML, for example) and extracting information
          from it there.


          • Tim Nash (aka TMN)

            #6
            Re: Regex help please

            Thank you PointedEars and pr for your input.

            Tim


            • pr

              #7
              Re: Regex help please

              Thomas 'PointedEars' Lahn wrote:
              > As for the matching quotes, you should use
              >
              > ('foo'|"foo")

              Or

              (['"])foo\1

              > However, RegExp literals and non-greedy matching (`.*?') are not universally
              > supported, with the latter being the more important fact here.

              Does this seem a reasonable feature test to you?

              var ngq = /.+?/.exec("ab");
              var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;

              I can only lay hands on one browser old enough to fail. I assume the
              presence of literal notation, obviously.


              • Thomas 'PointedEars' Lahn

                #8
                Re: Regex help please

                pr wrote:
                > Thomas 'PointedEars' Lahn wrote:
                >> As for the matching quotes, you should use
                >>
                >> ('foo'|"foo")
                >
                > Or
                >
                > (['"])foo\1

                Correct. To my surprise, this feature, standardized only with ECMAScript
                Ed. 3 (like regular expressions in general), appears to be widely supported:

                The bookmarklet

                javascript:window.alert(/^(["'])a\1b$/.test("'a'b"));

                shows `true' in all my test environments, which currently are:

                - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.1)
                  Gecko/2008070208 Firefox/3.0.1
                - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14)
                  Gecko/20080404 Firefox/2.0.0.14
                - Mozilla/4.78 [de] (Windows NT 5.0; U)

                - Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE)
                  AppleWebKit/525.19 (KHTML, like Gecko) Version/3.1.2 Safari/525.21
                - Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; de-de)
                  AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.22

                - Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; {...};
                  .NET CLR 1.1.4322; .NET CLR 2.0.50727) (IE 8 beta 1)
                - Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; {...};
                  .NET CLR 1.1.4322; .NET CLR 2.0.50727)
                - Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; {...};
                  .NET CLR 1.1.4322; .NET CLR 2.0.50727)
                - Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.1; {...};
                  .NET CLR 1.1.4322; .NET CLR 2.0.50727)
                - Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0;
                  .NET CLR 1.1.4322; .NET CLR 2.0.50727)
                - Mozilla/4.0 (compatible; MSIE 4.01; Windows NT 5.0; {...})

                - Opera/9.52 (Windows NT 5.1; U; de)
                - Opera/9.51 (Windows NT 5.1; U; de)
                - Opera/9.27 (Windows NT 5.1; U; en)
                - Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0
                - Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1)
                  Opera 7.02 [en]

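The bookmarklet test can also be run outside a browser. This sketch adds a double-quoted case and a mismatched one; `sameQuote` and the result names are only illustrative:

```javascript
// \1 repeats whatever group 1 actually matched, so the second quote
// must be the same character as the first.
var sameQuote = /^(["'])a\1b$/;

var singleOk = sameQuote.test("'a'b");  // true
var doubleOk = sameQuote.test('"a"b');  // true
var mixedOk = sameQuote.test("'a\"b");  // false
```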
                >> However, RegExp literals and non-greedy matching (`.*?') are not universally
                >> supported, with the latter being the more important fact here.
                >
                > Does this seem a reasonable feature test to you?
                >
                > var ngq = /.+?/.exec("ab");
                > var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;

                No, it could already throw a (non-catchable) SyntaxError when /.+?/ is
                parsed, before execution (you can test that with IE 5.0, for example). And
                I have yet to devise a bullet-proof test for possibly unsupported syntax (a
                more sophisticated application of eval() comes to mind) that does not itself
                break the ECMAScript program.

                However,

                var ngq = null;

                try
                {
                  ngq = new RegExp(".+?");
                }
                catch (e)
                {
                }

                if (ngq)
                {
                  // ...
                }

                would work for script engines that support basic exception handling but not
                non-greedy quantifiers (such as JScript 5.1 in IE 5.01; tested positive).
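One way to combine both suggestions, sketched under the assumption that the engine supports try/catch: build the pattern with the RegExp constructor, so that an unsupported quantifier surfaces as a catchable exception rather than a parse-time SyntaxError, then apply pr's length check. `hasNonGreedyQuantifiers` is a hypothetical function name:

```javascript
function hasNonGreedyQuantifiers() {
  var ngq;
  try {
    // Passing the pattern as a string defers its compilation to run time,
    // so an engine that rejects ".+?" throws here instead of refusing
    // to parse the whole script.
    ngq = new RegExp(".+?");
  } catch (e) {
    return false;
  }
  // pr's check: a lazy .+? matches only the first character of "ab".
  var m = ngq.exec("ab");
  return m !== null && m[0].length === 1;
}

var supported = hasNonGreedyQuantifiers();
```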


                PointedEars


                • pr

                  #9
                  Re: Regex help please

                  Thomas 'PointedEars' Lahn wrote:
                  > pr wrote:
                  >> Does this seem a reasonable feature test to you?
                  >>
                  >> var ngq = /.+?/.exec("ab");
                  >> var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;
                  >
                  > No, it could already throw a (non-catchable) SyntaxError when /.+?/ is
                  > parsed, before execution (you can test that with IE 5.0, for example).

                  You're right. IE 5 reports "Unexpected quantifier".
