javascript - regular expression - foreign characters

  • Smash

    javascript - regular expression - foreign characters

    I have this function:

    ------------------------------------------------------------
    function isAlfaNumeric(vnos,space) {
      if (space==false) {
        validRegExp = /^[a-zA-Z0-9]{0,}$/;
      }
      else {
        validRegExp = /^[a-zA-Z0-9\s]{0,}$/;
      }
      return vnos.search(validRegExp)
    }
    -------------------------------------------------------------

    The function checks whether the string "vnos" contains any non-alphanumeric
    characters. It works fine, except that it returns true if the string contains
    characters from my language, such as ž and š. I tried the following:

    validRegExp = /^[a-zA-Z0-9žš]{0,}$/; and also

    validRegExp = /^[a-zA-Z0-9\ž\š]{0,}$/; but the result was the same.

    Does anyone know how to check for foreign characters in a string using a
    regular expression?
  • Evertjan.

    #2
    Re: javascript - regular expression - foreign characters

    Smash wrote on 20 jan 2004 in comp.lang.javascript:

    > function isAlfaNumeric(vnos,space) {
    > if (space==false) {
    > validRegExp = /^[a-zA-Z0-9]{0,}$/;
    > }
    > else {
    > validRegExp = /^[a-zA-Z0-9\s]{0,}$/;
    > }
    > return vnos.search(validRegExp)
    > }
    > -------------------------------------------------------------
    >
    > the function is checking if the string "vnos" contains any
    > non-alphanumeric characters... it works fine except it returns true if
    > the string contains my country characters like ž,š.....i tried to do
    > the following
    >
    > validRegExp = /^[a-zA-Z0-9žš]{0,}$/; and also
    >
    > validRegExp = /^[a-zA-Z0-9\ž\š]{0,}$/; but result was the same

    {0,} is the same as *; use + if an empty string should fail
    for 0-9 use \d
    \s is all kinds of whitespace, like tabs etc.
    use test() if you want a true/false answer for a string

    try this:

    <SCRIPT>
    function isAlfaNumeric(s,sp) {
      var r = /^[a-zA-Z\džš]+$/;      // letters, digits, ž and š only
      var rs = /^[a-zA-Z\džš\s]+$/;   // same, but whitespace is allowed
      return (sp)? rs.test(s) : r.test(s);
    };

    alert(isAlfaNumeric("12astš",true));   // true
    alert(isAlfaNumeric("34astš",false));  // true
    alert(isAlfaNumeric("56as tš",true));  // true
    alert(isAlfaNumeric("78as tš",false)); // false, space not allowed
    </SCRIPT>

    If you want to accept empty strings as true, use:

    r = /^[a-zA-Z\džš]*$/;
    rs = /^[a-zA-Z\džš\s]*$/;

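    This one works the other way around: it looks for any character outside
    the allowed set, so it also accepts empty strings:

    <SCRIPT>
    function isAlfaNumeric(s,sp) {
      var r = /[^a-zA-Z\džš]/;      // any character that is not allowed
      var rs = /[^a-zA-Z\džš\s]/;   // same, but whitespace is allowed
      return ! ((sp)? rs.test(s) : r.test(s));
    };

    alert(isAlfaNumeric("12astš",true));   // true
    alert(isAlfaNumeric("34astš",false));  // true
    alert(isAlfaNumeric("56as tš",true));  // true
    alert(isAlfaNumeric("78as tš",false)); // false, space not allowed
    </SCRIPT>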
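
A small sketch of why the advice to use test() matters. It assumes the caller treated the return value of the original function as a boolean, which the thread does not actually show. search() returns an index, or -1 when nothing matches, so a failed match comes out "truthy" and a successful match at index 0 comes out "falsy". The name onlyAscii is made up for this illustration.

```javascript
// The pattern from the original post ({0,} written as *).
var onlyAscii = /^[a-zA-Z0-9]*$/;

// search() returns the index of the first match, or -1 when there is none.
console.log("abc123".search(onlyAscii)); // 0  (matched, but 0 is falsy)
console.log("abcž".search(onlyAscii));   // -1 (no match, but -1 is truthy)

// Used as a condition, the result is therefore inverted.
console.log(Boolean("abcž".search(onlyAscii))); // true

// test() returns a real boolean and can be used directly in an if.
console.log(onlyAscii.test("abc123")); // true
console.log(onlyAscii.test("abcž"));   // false
```

If that assumption holds, the "true" reported for ž and š was really the -1 of a failed match.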



    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)


    • Lasse Reichstein Nielsen

      #3
      Re: javascript - regular expression - foreign characters

      smashit@email.si (Smash) writes:

      > Does anyone know how to check for foreign characters in string using regular
      > expression??

      I think the safest is to use the \w escape, which matches "word characters".
      That includes letters (international ones included), digits and the underscore.
      If you can live with that:

      if (space==false) {
      validRegExp = /^[\w]*$/;
      }
      else {
      validRegExp = /^[\w\s]*$/;
      }

      /L
      --
      Lasse Reichstein Nielsen - lrn@hotpop.com
      DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
      'Faith without judgement merely degrades the spirit divine.'


      • Dr John Stockton

        #4
        Re: javascript - regular expression - foreign characters

        JRS: In article <72913601.0401200037.4af29a0@posting.google.com>, seen
        in news:comp.lang.javascript, Smash <smashit@email.si> posted at Tue, 20
        Jan 2004 00:37:31 :-

        >function isAlfaNumeric(vnos,space) {
        > if (space==false) {

        if (!space) { // or if (space) and swap the rest

        >Does anyone know how to check for foreign characters in string using regular
        >expression??


        "Foreign" does not mean "non-Anglo"; Americans & British are foreigners
        too.

        AIUI, a string can contain any Unicode character, and there are tens of
        thousands of those, a large proportion of which are letters in some
        language or other. Therefore, to test fully for letters outside A-Za-z,
        one needs in some form or another either a list of *all* letters or a
        list of *all* non-letters, or both.

        I don't know Slovenian; but I guess that it has a relatively small
        number of non-Anglo letters; those could be listed and tested for, but
        that would not be entirely helpful to a Scandinavian visitor.

        There *should* be a javascript function to test whether the current font
        has a specific glyph for a given character, or for all those in a
        string; but AFAIK there is not.

        --
        © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
        <URL:http://jibbering.com/faq/> Jim Ley's FAQ for news:comp.lang.javascript
        <URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
        <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.


        • Evertjan.

          #5
          Re: javascript - regular expression - foreign characters

          Dr John Stockton wrote on 20 jan 2004 in comp.lang.javascript:
          > There *should* be a javascript function to test whether the current font
          > has a specific glyph for a given character, or for all those in a
          > string; but AFAIK there is not.

          If we had a Regex syntax for a character above-a/below-a/in-a-range-of
          certain char number(s), even without the knowledge of the specific font,
          that would be nice.

          regex.definerange('%3','>#80')
          regex.definerange('%5','>#0','<#20')

          boolean = /aa\%5+bb[^\%3]?/.test(string)



          • Lasse Reichstein Nielsen

            #6
            Re: javascript - regular expression - foreign characters

            "Evertjan." <exjxw.hannivoort@interxnl.net> writes:

            > If we had a Regex syntax for a character above-a/below-a/in-a-range-of
            > certain char number(s), even without the knowledge of the specific font,
            > that would be nice.
            >
            > regex.definerange('%3','>#80')

            > regex.definerange('%5','>#0','<#20')
            >
            > boolean = /aa\%5+bb[^\%3]?/.test(string)

            Try:
            var boolean = /aa[\x01-\x1f]+bb[^\x81-\uffff]?/.test(string);
            It says true for
            var string = "aa\n\rbb\u1268";
            (which is 7 characters long).
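
The range idea above can be checked with a short sketch. The variable names re and s are only for illustration. Note that the sample string matches even though \u1268 lies inside the excluded range, because the trailing negated class is optional.

```javascript
// Character ranges written with \x and \u escapes, as suggested above.
var re = /aa[\x01-\x1f]+bb[^\x81-\uffff]?/;

// "aa", newline, carriage return, "bb", and one character above U+0080.
var s = "aa\n\rbb\u1268";

console.log(s.length);   // 7
console.log(re.test(s)); // true

// Without a control character between "aa" and "bb" there is no match.
console.log(re.test("aabb")); // false
```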

            /L

            • Evertjan.

              #7
              Re: javascript - regular expression - foreign characters

              Lasse Reichstein Nielsen wrote on 21 jan 2004 in comp.lang.javascript:

              > Try:
              > var boolean = /aa[\x01-\x1f]+bb[^\x81-\uffff]?/.test(string);
              > It says true for
              > var string = "aa\n\rbb\u1268";
              > (which is 7 characters long).
              >

              [\x01-\x1f] etc

              Nice, never thought of that !


              • Dr John Stockton

                #8
                Re: javascript - regular expression - foreign characters

                JRS: In article <8yk2uway.fsf@hotpop.com>, seen in
                news:comp.lang.javascript, Lasse Reichstein Nielsen <lrn@hotpop.com>
                posted at Tue, 20 Jan 2004 22:47:33 :-
                >smashit@email.si (Smash) writes:
                >
                >> Does anyone know how to check for foreign characters in string using regular
                >> expression??
                >
                >I think the safest is to use the \w escape, which matches "word characters".
                >That includes letters, international included, digits and the underscore.

                In MSIE4, it does not match É (E-acute), ä (a-umlaut), Å (A-ring); and,
                I suppose, others.

                A Netscape 1.3 reference page include(s|d) :
                Matches any alphanumeric character including the underscore.
                Equivalent to [A-Za-z0-9_].

                It would be nice to be able to match *any* letter, including non-anglo;
                but ISTM that \w is fundamentally matching the characters that normally
                appear in identifiers, and there it would be very wrong for that to be
                altered.
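
The [A-Za-z0-9_] equivalence quoted above is easy to confirm with a few test() calls. This is only a sketch of the observation made in this post, and the variable name word is arbitrary.

```javascript
// \w without any flags is ASCII-only: letters, digits and the underscore.
var word = /^\w+$/;

console.log(word.test("abc_123")); // true
console.log(word.test("žš"));      // false
console.log(word.test("É"));       // false
console.log(word.test("ä"));       // false
```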


                • Lasse Reichstein Nielsen

                  #9
                  Re: javascript - regular expression - foreign characters

                  Dr John Stockton <spam@merlyn.demon.co.uk> writes:

                  > In MSIE4, it does not match É (E-acute), ä (a-umlaut), Å (A-ring); and,
                  > I suppose, others.

                  Yes, that was me misremembering. Bummer. It would have been nice to have
                  an escape that matches alphanumeric Unicode characters, and not just
                  ASCII ones, and I thought ECMAScript had it. That was apparently
                  just wishful thinking.
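
For readers coming to this thread later: ECMAScript 2018 added Unicode property escapes, which provide the kind of escape wished for here. It was not available when the thread was written. The sketch below assumes an engine that supports the u flag with \p{...}; the function name isAlphaNumericUnicode and its parameters are made up for this example.

```javascript
// \p{L} matches any Unicode letter, \p{Nd} any decimal digit.
// Both require the "u" flag on the regular expression.
function isAlphaNumericUnicode(s, allowSpace) {
  var strict = /^[\p{L}\p{Nd}]*$/u;
  var withSpace = /^[\p{L}\p{Nd}\s]*$/u;
  return (allowSpace ? withSpace : strict).test(s);
}

console.log(isAlphaNumericUnicode("12astš", false));  // true
console.log(isAlphaNumericUnicode("56as tš", true));  // true
console.log(isAlphaNumericUnicode("78as tš", false)); // false
console.log(isAlphaNumericUnicode("abc!", true));     // false
```

Like the * versions earlier in the thread, this accepts the empty string; change * to + to reject it.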

                  /L

                  • Dr John Stockton

                    #10
                    Re: javascript - regular expression - foreign characters

                    JRS: In article <ptddnrk3.fsf@hotpop.com>, seen in
                    news:comp.lang.javascript, Lasse Reichstein Nielsen <lrn@hotpop.com>
                    posted at Wed, 21 Jan 2004 18:24:12 :-
                    >
                    >Try:
                    > var boolean = /aa[\x01-\x1f]+bb[^\x81-\uffff]?/.test(string);
                    >It says true for
                    > var string = "aa\n\rbb\u1268";
                    >(which is 7 characters long).

                    But for that approach to do the original job in full, one needs to read
                    the entire Unicode table and decide which squashed spiders are foreign
                    letters and which are foreign non-letters.

                    I've seen AJF's Unicode table in HTML; but I don't recall seeing one
                    written in ISO-7 and intended for simple machine-reading.




                    • Evertjan.

                      #11
                      Re: javascript - regular expression - foreign characters

                      Dr John Stockton wrote on 21 jan 2004 in comp.lang.javascript:
                      > But for that approach to do the original job in full, one needs to read
                      > the entire Unicode table and decide which squashed spiders are foreign
                      > letters and which are foreign non-letters.

                      A perfect solution is impossible, as long as Unicode is not redesigned
                      to have separate ranges for both types. And that probably will not happen.

                      An imperfect solution, say for most European languages, could be done
                      in a standard string along the lines of [\x01-\x1f].

                      Seems a perfect job for you, John, to collect suggestions from many of us
                      about their local lingo needs. ;-}

                      Would the same unicode number stand for different [letter vs nonletter]
                      types in different European languages ?
                      Or in different fonts ?

                      I hope not.
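
One possible shape for the "imperfect solution" mentioned in this post, as a sketch only. It assumes that the letters of the Latin-1 Supplement block (U+00C0 to U+00FF, minus the × and ÷ signs) and the Latin Extended-A block (U+0100 to U+017F) are enough for the languages in question. Greek, Cyrillic and everything else are deliberately left out, and the name europeanAlnum is invented here.

```javascript
// ASCII letters and digits, plus:
//   U+00C0-U+00D6, U+00D8-U+00F6, U+00F8-U+00FF  Latin-1 letters
//     (U+00D7 is the multiplication sign, U+00F7 the division sign)
//   U+0100-U+017F                                Latin Extended-A
var europeanAlnum =
  /^[a-zA-Z0-9\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F]*$/;

console.log(europeanAlnum.test("žš"));   // true  (U+017E, U+0161)
console.log(europeanAlnum.test("ÉäÅ"));  // true
console.log(europeanAlnum.test("3×4"));  // false (× is not a letter)
console.log(europeanAlnum.test("a b"));  // false (space not allowed)
```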


                      • Dr John Stockton

                        #12
                        Re: javascript - regular expression - foreign characters

                        JRS: In article <Xns947863B527680eejj99@194.109.133.29>, seen in
                        news:comp.lang.javascript, Evertjan. <exjxw.hannivoort@interxnl.net>
                        posted at Thu, 22 Jan 2004 08:47:39 :-
                        >Dr John Stockton wrote on 21 jan 2004 in comp.lang.javascript:
                        >> But for that approach to do the original job in full, one needs to read
                        >> the entire Unicode table and decide which squashed spiders are foreign
                        >> letters and which are foreign non-letters.
                        >
                        >A perfect solution is impossible, as long as the unicode is not redesigned
                        >to have separate ranges for both types. And that probably will not happen.
                        >
                        >For an imperfect solution, say for most European languages, could be done
                        >in a standard string along the lines of [\x01-\x1f].
                        >
                        >Seems a perfect job for you, John, to collect suggestions from many of us
                        >about their local lingo needs. ;-}
                        >
                        >Would the same unicode number stand for different [letter vs nonletter]
                        >types in different European languages ?
                        >Or in different fonts ?

                        Read AJF's cited page, and others, on Unicode. Look and see what is
                        actually in Unicode.

                        AIUI, the idea of Unicode is that a given character has a given number,
                        independently of font, size, and language; \u0041 is 'A' and \u0061 is
                        'a' *everywhere*. If it's not \u0061, it's not our 'a', whatever it
                        looks like.

                        In practice, though, a letter only counts as a letter if it is a letter
                        of the current language. In English, Nijmegen has eight letters; I
                        suspect it of having only seven in Dutch, only six of which are English.
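
The point about a character having one number regardless of font or language can be seen from script. The sketch below compares the Latin 'a' with the Cyrillic small letter a (U+0430), which looks the same in most fonts. The Cyrillic example is an addition for illustration and was not mentioned in the thread.

```javascript
// The escape and the literal are the same character.
console.log("\u0041" === "A"); // true
console.log("\u0061" === "a"); // true

// U+0430 CYRILLIC SMALL LETTER A looks like 'a' but is a different character.
var cyrillicA = "\u0430";
console.log(cyrillicA === "a");                    // false
console.log(cyrillicA.charCodeAt(0).toString(16)); // "430"
console.log("a".charCodeAt(0).toString(16));       // "61"
```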


                        • Evertjan.

                          #13
                          Re: javascript - regular expression - foreign characters

                          Dr John Stockton wrote on 22 jan 2004 in comp.lang.javascript:
                          > In practice, though, a letter only counts as a letter if it is a
                          > letter of the current language.

                          I do not think so. It depends on the definition, of course. I would say a
                          letter in computer strings can also be a letter in another language and
                          still be counted as a letter. The u-umlaut [ü] is definitely a letter in
                          English in the sense that it is definitely not a non-letter, like
                          !?#%&.,.

                          > In English, Nijmegen has eight letters; I
                          > suspect it of having only seven in Dutch, only six of which are
                          > English.

                          That concept was abandoned long ago, in this age of computer-generated
                          and sorted telephone books. The "ij", though it counts as one letter in
                          the Dutch linguistic sense, has definitely become a two-letter "thing"
                          like the "ph".

                          The "ph", however, can also be pronounced as two separate letters in
                          words like:

                          poephark
                          ophaalbrug
                          Generaal van Opheusden ;-)

                          If there were a word with the ij pronounced as separate letters, the j
                          should have two little dots [the trema] like an umlaut. This is not
                          available in current fonts, I definitely presume, because the j is
                          usually thought of as a consonant.

                          [The above thoughts are not tested on recent or old versions of eastern
                          languages, nor on Netscape]
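
As a footnote to the "ij" discussion: Unicode does contain a single-character ligature, U+0133 (LATIN SMALL LIGATURE IJ), alongside the ordinary two-letter spelling. The sketch below shows how script sees both forms. It relies on String.prototype.normalize, which arrived with ECMAScript 2015 and so postdates this thread; the variable names are illustrative.

```javascript
var twoLetters = "Nijmegen";   // i and j as separate characters
var ligature = "N\u0133megen"; // U+0133 LATIN SMALL LIGATURE IJ

console.log(twoLetters.length); // 8
console.log(ligature.length);   // 7

// Compatibility normalization turns the ligature back into i + j.
console.log(ligature.normalize("NFKC") === twoLetters); // true
```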
