Capitalize regex

  • Marek Mand

    Capitalize regex

    <script>
    var newval = '';
    var name = 'marek mänd-österreich a';

    // http://www.faqts.com/knowledge_base/...html/aid/15940
    correctedname = name.replace(/\b\w+b/g, function(word) {
    return word.substring(0, 1).toUpperCase()
    + word.substring(1);
    });
    alert(correctedname);
    </script>


    How would I get the intended output from it, that is:
    "Marek Mänd-Österreich A"
    Mozilla works as intended and desired; MSIE fails totally.
    So what is the "word boundary" regex for languages other than English?
    I think JavaScript regular expressions are weak and unusable.
  • Janwillem Borleffs

    #2
    Re: Capitalize regex

    Marek Mand wrote:
    > How would I get the intended output from it, that is:
    > "Marek Mänd-Österreich A"
    > Mozilla works as intended and desired; MSIE fails totally.
    > So what is the "word boundary" regex for languages other than English?
    > I think JavaScript regular expressions are weak and unusable.

    var name = 'marek mänd-österreich a';

    name = name.replace(/(^|\s|\-)(.)/g, function (c) {
        return c.toUpperCase();
    });
    alert(name);


    JW




    • Marek Mand

      #3
      Re: Capitalize regex

      Marek Mand wrote:
      > // http://www.faqts.com/knowledge_base/...html/aid/15940
      > correctedname = name.replace(/\b\w+b/g, function(word) {

      Just a correction to my original post:

      of course what I had in my file was (with the \ that is missing above)
      correctedname = name.replace(/\b\w+\b/g, function(word) {
      Writing it out here just caused a typo, but correcting that
      still shows how MSIE differs from Mozilla.


      • Marek Mand

        #4
        Re: Capitalize regex

        Janwillem Borleffs wrote:
        > Marek Mand wrote:
        >> How would I get the intended output from it, that is:
        >> "Marek Mänd-Österreich A"
        >> Mozilla works as intended and desired; MSIE fails totally.
        >> So what is the "word boundary" regex for languages other than English?

        > var name = 'marek mänd-österreich a';
        > name = name.replace(/(^|\s|\-)(.)/g, function (c) {
        >     return c.toUpperCase();
        > });
        > alert(name);

        So I understand that I have to define explicitly, on my own,
        what makes up a separator between words.
        I understand I have to add comma, semicolon, colon and lots
        more to that for it to work more reliably.
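
A minimal sketch of what such an extended separator list could look like, building on the regex quoted above. The names `separators`, `afterSeparator` and `capitalizeAfterSeparators` are hypothetical, and the list of punctuation is an assumption that is far from complete. The second group is negated so that a run such as ", " does not swallow the letter that follows it.

```javascript
// Hypothetical, deliberately incomplete separator list:
// whitespace, hyphen, comma, semicolon, colon, full stop.
var separators = '\\s\\-,;:.';

// A separator (or the start of the string) followed by one
// non-separator character.
var afterSeparator = new RegExp(
  '(^|[' + separators + '])([^' + separators + '])', 'g');

function capitalizeAfterSeparators(text) {
  return text.replace(afterSeparator, function (match) {
    // match is the separator plus the first letter; upper-casing
    // leaves the separator itself unchanged.
    return match.toUpperCase();
  });
}

var capitalized = capitalizeAfterSeparators('marek mänd-österreich, a;b');
// capitalized is "Marek Mänd-Österreich, A;B"
```

Keeping the list in one string means new separators only have to be added in one place.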

        However, I have no idea what all those symbols should be, taking into
        account that there are lots of dots, full stops and weird symbols in
        Unicode. Is there a ready-made list somewhere that says how word
        boundaries should be treated, in the sense of what splits words?


        Thanks for the answer! =D

        --
        marekmand


        • Lasse Reichstein Nielsen

          #5
          Re: Capitalize regex

          Marek Mand <cador.soft@mail.ee> writes:

          > So I understand that I have to define explicitly, on my own,
          > what makes up a separator between words.

          Yes. Javascript, or more precisely: ECMAScript v3, defines the \b
          regexp as a boundary between a word character and a non-word
          character. It also defines word character as [0-9a-zA-Z_]
          (section 15.10.2.6).

          Apparently Mozilla isn't following the ECMAScript standard.
          I like their approach better, but it's not something you can
          rely on.
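
To illustrate the definition above, here is a small sketch of what an engine that follows it does with the original input. The variable name `strictResult` is only for illustration. Because 'ä' and 'ö' are not in [0-9a-zA-Z_], \b matches on both sides of them and the words are split there.

```javascript
var name = 'marek mänd-österreich a';

// \w is [0-9a-zA-Z_], so 'ä' and 'ö' count as non-word characters
// and \b matches on either side of them.
var strictResult = name.replace(/\b\w+\b/g, function (word) {
  return word.charAt(0).toUpperCase() + word.substring(1);
});
// strictResult is "Marek MäNd-öSterreich A"
```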

          /L
          --
          Lasse Reichstein Nielsen - lrn@hotpop.com
          DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
          'Faith without judgement merely degrades the spirit divine.'


          • Marek Mand

            #6
            Re: Capitalize regex

            Lasse Reichstein Nielsen wrote:
            > Marek Mand <cador.soft@mail.ee> writes:
            >> So I understand that I have to define explicitly, on my own,
            >> what makes up a separator between words.

            > Yes. Javascript, or more precisely: ECMAScript v3, defines the \b
            > regexp as a boundary between a word character and a non-word
            > character. It also defines word character as [0-9a-zA-Z_]
            > (section 15.10.2.6).

            It would be a blast if, in a later JavaScript version, I could define
            character classes of my own at runtime for each regex object individually.

            [[:estoniancharacters:]]
            [[:germancharacters:]]
            [[:danishcharacters:]]
            [[:finnishcharacters:]]

            This would definitely save the programmer time and make
            the code more readable: \w complemented with the characters
            specific to a 'foreign' (non-English) language would not have to be
            written out every time one needs it, but could be replaced with
            a 'virtual', self-defined character class.

            I don't know whether it would be death to regex engine performance,
            but the English-language-centredness and the lack of options for easy
            customisation are what make JavaScript regexes weak.
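
Something close to this wish can be approximated today by keeping the letter list in a string and building the regex with the RegExp constructor. This is only a sketch: `ESTONIAN_LETTERS` and `capitalizeWords` are hypothetical names, and the letter list is an illustrative assumption, not a checked alphabet.

```javascript
// Hypothetical reusable "character class", kept as a plain string so it
// can be spliced into any pattern. The letter list is illustrative only.
var ESTONIAN_LETTERS = 'a-zA-ZäöüõšžÄÖÜÕŠŽ';

function capitalizeWords(text, letters) {
  // Build the pattern at runtime from the supplied letter list.
  var wordPattern = new RegExp('[' + letters + ']+', 'g');
  return text.replace(wordPattern, function (word) {
    return word.charAt(0).toUpperCase() + word.substring(1);
  });
}

var estonianResult = capitalizeWords('marek mänd-österreich a', ESTONIAN_LETTERS);
// estonianResult is "Marek Mänd-Österreich A"
```

The string is written once and reused, which covers the readability part of the wish, though the class still has to be maintained by hand.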

            > Apparently Mozilla isn't following the ECMAScript standard.

            Just FYI for others: Opera does it like MSIE.
            On a loosely related note, it is a bit of fun to watch the
            argument about CSS text-transform: capitalize and what counts as a word.

            > I like their approach better, but it's not something you can
            > rely on.

            Me too ;D



            • Fox

              #7
              Re: Capitalize regex



              Marek Mand wrote:
              >
              > <script>
              > var newval = '';
              > var name = 'marek mänd-österreich a';
              >
              > // http://www.faqts.com/knowledge_base/...html/aid/15940
              > correctedname = name.replace(/\b\w+b/g, function(word) {
              > return word.substring(0, 1).toUpperCase()
              > + word.substring(1);
              > });
              > alert(correctedname);
              > </script>
              >
              > How would I get the intended output from it, that is:
              > "Marek Mänd-Österreich A"
              > Mozilla works as intended and desired; MSIE fails totally.
              > So what is the "word boundary" regex for languages other than English?
              > I think JavaScript regular expressions are weak and unusable.

              \S is better at capturing "foreign" characters [char codes > 127 (they're
              not foreign to you)] than \w -- however, it *includes* the hyphen
              character in its set.
              \S is roughly the same as [^ \f\n\r\t\v] (and exactly [^\s])

              if you add the hyphen to the list:

              String.prototype.initialCaps = function()
              {
                  return this.replace(/[^\s-]+/g,
                      function(str)
                      {
                          return str.charAt(0).toUpperCase() + str.substring(1);
                      });
              }

              it should work as expected (at least for your example):

              var name = 'marek mänd-österreich a';

              alert( name.initialCaps() );


              • Dr John Stockton

                #8
                Re: Capitalize regex

                JRS: In article <y8oalmag.fsf@hotpop.com>, seen in
                news:comp.lang.javascript, Lasse Reichstein Nielsen <lrn@hotpop.com>
                posted at Sun, 2 May 2004 20:24:39:
                >Marek Mand <cador.soft@mail.ee> writes:
                >
                >> So I understand that I have to define explicitly, on my own,
                >> what makes up a separator between words.
                >
                >Yes. Javascript, or more precisely: ECMAScript v3, defines the \b
                >regexp as a boundary between a word character and a non-word
                >character. It also defines word character as [0-9a-zA-Z_]
                >(section 15.10.2.6).

                That ought to be changed in future ECMA.

                The underline character, and the digits, cannot normally be part of a
                word; but, at least since the days of the fabulous Æsop, certain other
                characters (joined and accented letters, for example) can.

                The simple fix is to change the word "word" to "identifier" or other
                suitable computer-jargon term (not "name").

                After that, the term "word separator" becomes available to mean what it
                means to the ordinary literate Briton, Dane, Estonian, etc.
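
For readers on engines that support Unicode property escapes (ES2018 and later, so well after this thread), a "word" in roughly this everyday sense can be sketched as a run of letters. This is an assumption about what a suitable definition looks like, not something the standard calls a word; `letterRun` and `capitalizeLetterRuns` are hypothetical names.

```javascript
// \p{L} matches any Unicode letter; it requires the u flag (ES2018+).
var letterRun = /\p{L}+/gu;

function capitalizeLetterRuns(text) {
  return text.replace(letterRun, function (word) {
    return word.charAt(0).toUpperCase() + word.substring(1);
  });
}

var fixed = capitalizeLetterRuns('marek mänd-österreich a');
// fixed is "Marek Mänd-Österreich A"

// Digits and the underscore are not letters; joined and accented letters are.
var digitIsLetter = /\p{L}/u.test('7');      // false
var underscoreIsLetter = /\p{L}/u.test('_'); // false
var ashIsLetter = /\p{L}/u.test('Æ');        // true
```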

                --
                © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
                <URL:http://jibbering.com/faq/> Jim Ley's FAQ for news:comp.lang.javascript
                <URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
                <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.

