regex with accents

  • albert

    regex with accents

    Hi,

    I can't get characters with accents to match in a regex. This is my code:
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    var MyText1 = "éléphant1" ;
    var MyText2 = "elephant1" ;
    var MyReg = /^[\w]+$/ ;

    if(MyReg.test(MyText1))
    alert(MyText1 + " is OK") ;
    else
    alert(MyText1 + " is not valid") ;


    if(MyReg.test(MyText2))
    alert(MyText2 + " is OK") ;
    else
    alert(MyText2 + " is not valid") ;
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    Here's what I get:
    éléphant1 is not valid
    elephant1 is OK

    I'd like éléphant1 to be OK as well, but I can't get it to work.
    Can you help me?

    Thanks in advance,

    Albert


  • Douglas Crockford

    #2
    Re: regex with accents

    albert wrote:
    >I can't get characters with accents to match in a regex. This is my code:
    >- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    >var MyText1 = "éléphant1" ;
    >var MyText2 = "elephant1" ;
    >var MyReg = /^[\w]+$/ ;
    >- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    >Here's what I get:
    >éléphant1 is not valid
    >elephant1 is OK
    >
    >I'd like éléphant1 to be OK as well, but I can't get it to work.
    >Can you help me?

    ECMA262 15.10.2.12 defines \w as being equivalent to the character class
    [0-9A-Za-z_]. The w suggests word, but that is deceptive. Support for
    internationalization in JavaScript's RegExp is virtually nonexistent.

    You need to define your own character class.
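
    A minimal sketch of that behavior, runnable in Node.js or a browser console; the helper `check` is only an illustrative name. `\w` accepts the unaccented word but rejects `éléphant1`, exactly like the spelled-out ASCII class:

    ```javascript
    // \w is shorthand for the 63 ASCII characters [A-Za-z0-9_],
    // so a single accented letter makes the whole-string test fail.
    const wordRe = /^\w+$/;
    const asciiRe = /^[A-Za-z0-9_]+$/;

    // Illustrative helper: report how both patterns treat one string.
    function check(text) {
      return { text: text, word: wordRe.test(text), ascii: asciiRe.test(text) };
    }

    console.log(check("elephant1")); // word: true, ascii: true
    console.log(check("éléphant1")); // word: false, ascii: false
    ```

    Because the two patterns always agree, accepting accented letters means writing a wider class by hand.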



    • albert

      #3
      Re: regex with accents

      >ECMA262 15.10.2.12 defines \w as being equivalent to the character class
      >[0-9A-Za-z_]. The w suggests word, but that is deceptive. Support for
      >internationalization in JavaScript's RegExp is virtually nonexistent.
      >
      >You need to define your own character class.

      How can I do that?


      albert



      • Evertjan.

        #4
        Re: regex with accents

        albert wrote on 22 sep 2007 in comp.lang.javascript:
        >>ECMA262 15.10.2.12 defines \w as being equivalent to the character
        >>class [0-9A-Za-z_]. The w suggests word, but that is deceptive.
        >>Support for internationalization in JavaScript's RegExp is virtually
        >>nonexistent.
        >>
        >>You need to define your own character class.
        >
        >How can I do that?

        var MyReg = /^[\wáéíóäëïöúàèìòù]+$/i;

        Adjust the list of characters to your local requirements.
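
        A sketch of two ways to write such a class for French, with illustrative names. The first is an explicit list in the style of the regex above; the second is an alternative not taken from the thread: the letter ranges of the Latin-1 Supplement block (U+00C0 to U+00FF, skipping × at U+00D7 and ÷ at U+00F7), plus Œ, œ and Ÿ (U+0152, U+0153, U+0178), which French uses but Latin-1 lacks.

        ```javascript
        // Explicit list; the i flag lets É, À, ... match their lowercase entries.
        const listedRe = /^[\wáéíóäëïöúàèìòù]+$/i;

        // Range-based alternative: Latin-1 letters without × (U+00D7) and
        // ÷ (U+00F7), plus Œ œ (U+0152, U+0153) and Ÿ (U+0178).
        const latin1Re = /^[\w\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0152\u0153\u0178]+$/;

        for (const text of ["éléphant1", "ÉLÉPHANT1", "cœur", "élé phant"]) {
          console.log(text, listedRe.test(text), latin1Re.test(text));
        }
        ```

        The explicit list is easy to read but misses anything you forget (here œ, ç, â and others), while the ranges cover every Latin-1 letter at once.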



        --
        Evertjan.
        The Netherlands.
        (Please change the x'es to dots in my emailaddress)


        • albert

          #5
          Re: regex with accents

          >var MyReg = /^[\wáéíóäëïöúàèìòù]+$/i;
          >
          >Adjust the list of characters to your local requirements.
          >
          >--
          >Evertjan.
          >The Netherlands.
          >(Please change the x'es to dots in my emailaddress)

          I've got French... that's no pb.
          But I also have Arabic & Hebrew, which is more difficult.


          albert



          • Evertjan.

            #6
            Re: regex with accents

            albert wrote on 22 sep 2007 in comp.lang.javascript:
            >>var MyReg = /^[\wáéíóäëïöúàèìòù]+$/i;
            >>
            >>Adjust the list of characters to your local requirements.
            [please do not quote signatures on usenet. removed]
            >
            >I've got French... that's no pb.
            pb? [please no sms-language on usenet]
            >But I also have Arabic & Hebrew, which is more difficult.
            Why should it be easy?

            JavaScript accommodates Unicode.
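
            A minimal sketch of what that means in practice (variable names are illustrative): a `\uXXXX` escape in a string or regex denotes the same character as the literal, so a character class can be written entirely in ASCII. That keeps the pattern intact even if the script file is saved or served in the wrong encoding.

            ```javascript
            // "\u00e9" is the escape for é: both spellings give the same one-character string.
            console.log("\u00e9" === "é"); // true

            // Common French lowercase letters written as escapes:
            // à â ç è é ê ë î ï ô ù û (the i flag adds their uppercase forms)
            const frenchRe = /^[\w\u00e0\u00e2\u00e7\u00e8-\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb]+$/i;

            console.log(frenchRe.test("éléphant1")); // true
            console.log(frenchRe.test("Français")); // true
            ```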


            • Dr J R Stockton

              #7
              Re: regex with accents

              In comp.lang.javascript message <S_8Ji.28213$eY.19207@newssvr13.news.prodigy.net>,
              Sat, 22 Sep 2007 13:44:18, Douglas Crockford
              <nospam@sbcglobal.net> posted:
              >
              >ECMA262 15.10.2.12 defines \w as being equivalent to the character
              >class [0-9A-Za-z_]. The w suggests word, but that is deceptive. Support
              >for internationalization in JavaScript's RegExp is virtually
              >nonexistent.
              <URL:http://www.merlyn.demon.co.uk/humourous.htm#FredHoyleadvises> <G>
              :-
              Fred Hoyle (1915-2001) :-
              "'Dam’ good idea. Always force foreigner to learn English.'"
              Alexis Ivan Alexandrov, in "The Black Cloud", Chap. 10, para 4.

              --
              (c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
              Web <URL:http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms & links;
              Astro stuff via astron-1.htm, gravity0.htm ; quotings.htm, pascal.htm, etc.
              No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.


              • albert

                #8
                Re: regex with accents

                >>I've got French... that's no pb.
                >>
                >pb? [please no sms-language on usenet]
                pb = problem (sorry, I thought it was obvious).
                >
                >>But I also have Arabic & Hebrew, which is more difficult.
                >>
                >Why should it be easy?
                I never said it should be easy. Don't waste your time answering, then...
                >
                >JavaScript accommodates Unicode.
                >
                Well, I tried a simple word in Arabic with the following regex:

                ^[\w]+$

                Still, the "test" method always returned false. Do you have a good
                working example?


                thx, oops, sorry, I meant "Thanks" ;-)


                albert



                • Evertjan.

                  #9
                  Re: regex with accents

                  albert wrote on 23 sep 2007 in comp.lang.javascript:
                  >>>I've got French... that's no pb.
                  >>>
                  >>pb? [please no sms-language on usenet]
                  >
                  >pb = problem (sorry, I thought it was obvious).
                  Not to me. Usenet has its own limited set of abbreviations.
                  If anything, Pb would be lead.
                  >>>But I also have Arabic & Hebrew, which is more difficult.
                  >>>
                  >>Why should it be easy?
                  >
                  >I never said it should be easy. Don't waste your time answering, then...
                  You are the OP, so ...
                  >>JavaScript accommodates Unicode.
                  >>
                  >
                  >Well, I tried a simple word in Arabic with the following regex:
                  >
                  >^[\w]+$
                  Would you allow for figures 0-9?
                  Otherwise this is better for simple Latin chars:

                  /^[a-z]+$/i
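
                  A small sketch of the difference (names are illustrative): `\w` also accepts digits and the underscore, whereas `[a-z]` with the `i` flag accepts only the 52 unaccented Latin letters, so figures have to be added to the class explicitly if they are allowed.

                  ```javascript
                  const wordCharsRe = /^\w+$/;            // letters, digits and _
                  const lettersRe = /^[a-z]+$/i;          // unaccented Latin letters only
                  const lettersDigitsRe = /^[a-z0-9]+$/i; // letters plus figures 0-9

                  for (const text of ["Elephant", "elephant1", "ele_phant"]) {
                    console.log(text, wordCharsRe.test(text), lettersRe.test(text), lettersDigitsRe.test(text));
                  }
                  // Elephant true true true
                  // elephant1 true false true
                  // ele_phant true false false
                  ```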
                  >Still, the "test" method always returned false.
                  I showed you how to do that for accented characters;
                  did you understand the regex?

                  Why would Arabic characters match
                  where accented characters do not?
                  >Do you have a good
                  >working example?
                  I am not into working examples, but I will give you a hint.

                  Arabic should work the same way as the accented characters:

                  /^[a-z\u0600-\u06ff]+$/

                  [http://unicode.org/charts/PDF/U0600.pdf]

                  Not knowing Arabic, I cannot test that.
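
                  A sketch of that hint extended to the rest of the OP's requirements, with illustrative names and sample words: it adds the figures 0-9 and the Hebrew block (U+0590 to U+05FF) next to the Arabic block (U+0600 to U+06FF). Both ranges are whole Unicode blocks, so they also admit punctuation and other signs from those blocks, not just letters.

                  ```javascript
                  // Figures, unaccented Latin letters (any case), Arabic block, Hebrew block.
                  const mixedRe = /^[0-9a-z\u0600-\u06ff\u0590-\u05ff]+$/i;

                  const samples = [
                    "\u0633\u0644\u0627\u0645", // "salam" in Arabic letters
                    "\u05e9\u05dc\u05d5\u05dd", // "shalom" in Hebrew letters
                    "elephant1",
                    "\u00e9l\u00e9phant1"       // é is outside every range in the class
                  ];
                  for (const text of samples) {
                    console.log(text, mixedRe.test(text));
                  }
                  // prints true, true, true, false for the four samples
                  ```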
                  >thx, oops, sorry, I meant "Thanks" ;-)

                  • albert

                    #10
                    Re: regex with accents

                    >You are the OP, so ...

                    Now it's my turn :-)
                    What does OP mean?
                    >>
                    >>Well, I tried a simple word in Arabic with the following regex:
                    >>
                    >>^[\w]+$
                    >
                    >Would you allow for figures 0-9?
                    Yes
                    >Otherwise this is better for simple Latin chars:
                    >
                    >/^[a-z]+$/i
                    >
                    >>Still, the "test" method always returned false.
                    >
                    >I showed you how to do that for accented characters;
                    >did you understand the regex?
                    Yes
                    >
                    >Why would Arabic characters match
                    >where accented characters do not?
                    You're right.
                    >
                    >>Do you have a good
                    >>working example?
                    >
                    >I am not into working examples, but I will give you a hint.
                    >
                    >Arabic should work the same way as the accented characters:
                    >
                    >/^[a-z\u0600-\u06ff]+$/
                    >
                    >[http://unicode.org/charts/PDF/U0600.pdf]
                    >
                    >Not knowing Arabic, I cannot test that.
                    I tested. It works :-)

                    Thank you for your help!


                    albert
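
                    ECMAScript 2018 later added Unicode property escapes (used with the `u` flag), which can replace hand-built ranges like the ones above. A sketch, assuming an engine with that support (current browsers, Node.js 10 or later), with illustrative names: `\p{L}` matches any letter, `\p{M}` any combining mark (Arabic vowel signs, or an accent typed as a separate character), `\p{Nd}` any decimal digit, and `\p{Script=Arabic}` restricts the match to one script.

                    ```javascript
                    // Letters, combining marks, decimal digits and _, in any script.
                    const anyWordRe = /^[\p{L}\p{M}\p{Nd}_]+$/u;
                    // Arabic-script characters plus ASCII figures only.
                    const arabicRe = /^[\p{Script=Arabic}0-9]+$/u;

                    const samples = [
                      "éléphant1",
                      "e\u0301le\u0301phant1",    // é spelled as e + combining acute accent
                      "\u0633\u0644\u0627\u0645", // Arabic
                      "\u05e9\u05dc\u05d5\u05dd", // Hebrew
                      "élé phant"                 // the space is rejected by both
                    ];
                    for (const text of samples) {
                      console.log(JSON.stringify(text), anyWordRe.test(text), arabicRe.test(text));
                    }
                    ```

                    Including `\p{M}` matters: without it, text whose accents or Arabic vowel signs are stored as separate combining characters would fail a letters-only test.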

