Regex for languages other than English

  • Ricky Romaya

    Regex for languages other than English

    Hi,

    Could anybody show me a regex for capturing words (alphas, without
    numerics) in languages other than English (languages with special
    characters, e.g. French, German)? I've tried '[a-zA-Z]+' but the special
    letters for some languages (e.g. French) are not captured. The '\w+' works
    fine, but it also includes numerics, which I don't want.

    TIA
  • John Dunlop

    #2
    Re: Regex for languages other than English

    Ricky Romaya wrote:
    > Could anybody show me a regex for capturing words (alphas, without
    > numerics) in languages other than English (languages with special
    > characters, e.g. French, German)? I've tried '[a-zA-Z]+' but the special
    > letters for some languages (e.g. French) are not captured. The '\w+' works
    > fine, but it also includes numerics, which I don't want.

    You'd need to specify what characters you want individually
    or, carefully, with a character range, as in [a-z].

    --
    Jock


    • Ricky Romaya

      #3
      Re: Regex for languages other than English

      John Dunlop <usenet+2004@john.dunlop.name> wrote in
      news:MPG.1c6d5bb090cb526f989835@News.Individual.NET:

      > You'd need to specify what characters you want individually
      > or, carefully, with a character range, as in [a-z].

      Well, that much I know. The problem is that in my native tongue, and in
      English (my second language), there are no such special characters. My
      work requires me to also include support for other languages (such as
      French, German, etc.) which I can't speak, let alone write. I don't know
      the list of those special characters or how to input them with an ordinary
      101-key US keyboard. Care to point me to an (internet) resource where the
      complete list of those special characters can be found, and how to input
      them?

      BTW, as I said '\w+' works fine, except it also includes numerics. Are
      there ways to simulate '\w' without including the numerics, and without
      knowing the list of all special characters?

      TIA


      • John Dunlop

        #4
        Re: Regex for languages other than English

        Ricky Romaya wrote:
        > The problem is that in my native tongue, and in English (my second
        > language), there are no such special characters. My work requires me to
        > also include support for other languages (such as French, German, etc.)
        > which I can't speak, let alone write. I don't know the list of those
        > special characters or how to input them with an ordinary 101-key US
        > keyboard. Care to point me to an (internet) resource where the complete
        > list of those special characters can be found

        If it isn't English, then I'm afraid I'm not overly familiar
        with it. I think, though you'd better check yourself, that
        German is covered by the Latin-1 alphabet, lists of which
        are abundant on the web; French I think, again I'm not sure,
        uses a character or two, such as the oe ligature, which are
        outside Latin-1.
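
The Latin-1 claim above is easy to check mechanically. The following is only a sketch in Python (not PHP, which this thread is about); the helper name `in_latin1` and the sample letters are my own choices for illustration, not an exhaustive alphabet for either language.

```python
def in_latin1(ch):
    """Return True if the single character can be encoded in ISO-8859-1 (Latin-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Sample letters, written as escapes so the file encoding does not matter.
german = "\u00e4\u00f6\u00fc\u00df\u00c4\u00d6\u00dc"  # a-umlaut, o-umlaut, u-umlaut, sharp s, and capitals
french = "\u00e9\u00e8\u00e0\u00e7\u0153"              # e-acute, e-grave, a-grave, c-cedilla, oe ligature

# Collect every sample character that does NOT fit in Latin-1.
outside = [c for c in german + french if not in_latin1(c)]

if __name__ == "__main__":
    print("outside Latin-1:", [f"U+{ord(c):04X}" for c in outside])
```

With these samples only the oe ligature (U+0153) falls outside Latin-1, which matches the guess above.
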

        > and how to input them?

        How you enter those special characters depends on your
        system. On Windows I would press and hold down the Alt key
        and type the character's position in the native character
        set, in decimal, with a leading zero, on the numeric keypad,
        not on the numbers above the letters. So to type the
        character 'é' (LATIN SMALL LETTER E WITH ACUTE), hold down
        Alt and using the numeric keypad type 0233.

        In PCREs, you can also enter characters indirectly, by way
        of an escape notation: A backslash followed by the letter
        'x' followed by the code position in hexadecimal (case
        insensitive) of the character; e.g., \xE9 represents 'é'.
        This works both inside and outside of character classes.

        So the regular expressions `^[a-zA-Zé]+$` and
        `^[a-zA-Z\xE9]+$` are equivalent, and can be extended to match
        other special characters.
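
A small sketch of that equivalence, for anyone who wants to try it. It is written in Python, whose `re` module is not PCRE but accepts the same `\xhh` escape; the helper `same_verdict` is a made-up name for this illustration. It assumes 'é' is stored as the single character at position E9 (as in Latin-1 or precomposed Unicode), not as 'e' plus a combining accent.

```python
import re

# Decimal 233 (the Alt+0233 keypad code) and hexadecimal E9 name the same position.
assert 233 == 0xE9
assert chr(0xE9) == "\u00e9"

# Two spellings of the same character class: a literal e-acute versus the \xE9 escape.
literal = re.compile("^[a-zA-Z\u00e9]+$")
escaped = re.compile(r"^[a-zA-Z\xE9]+$")


def same_verdict(word):
    """Return the shared match result, or raise if the two patterns disagree."""
    a = literal.match(word) is not None
    b = escaped.match(word) is not None
    if a != b:
        raise AssertionError(f"patterns disagree on {word!r}")
    return a


if __name__ == "__main__":
    for word in ["caf\u00e9", "cafe", "na\u00efve", "abc123"]:
        print(word, same_verdict(word))
```

Both patterns accept "café" and "cafe" and reject "naïve" (the 'ï' is not in the class) and "abc123", so every further letter still has to be added by hand.
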

        > BTW, as I said '\w+' works fine, except it also includes numerics. Are
        > there ways to simulate '\w' without including the numerics, and without
        > knowing the list of all special characters?

        There is no PCRE metacharacter for that. Although you can
        specify a character class that would simulate that, you'd
        need to know what characters you want to include.
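
One workaround that avoids listing the letters is a negated class built from `\w` itself: `[^\W\d_]` reads as "not a non-word character, not a digit, not an underscore". The sketch below is in Python rather than PHP, and the names `LETTERS` and `words` are hypothetical. As far as I know the same class is valid PCRE syntax, and PCRE also offers `[[:alpha:]]` and, in UTF-8 mode, `\p{L}`. Whether any of these match accented letters in PHP depends on the locale or on the `u` modifier, so treat this as an assumption to test on your own setup.

```python
import re

# \w minus digits and underscore. In Python 3, \w on str patterns is Unicode-aware,
# so accented letters are included without listing them individually.
LETTERS = re.compile(r"[^\W\d_]+")


def words(text):
    """Return all runs of word characters that are neither digits nor underscores."""
    return LETTERS.findall(text)


if __name__ == "__main__":
    print(words("Gr\u00fc\u00dfe aus M\u00fcnchen, 2 caf\u00e9s \u00e0 3 km"))
    print(words("abc123def snake_case"))
```

This is only roughly "letters": a few characters that count as alphanumeric without being decimal digits (superscript digits, for instance) would still get through.
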

        Maybe there's another way. PHP keeps on surprising me.

        --
        Jock


        • R. Rajesh Jeba Anbiah

          #5
          Re: Regex for languages other than English

          Ricky Romaya wrote:
          > Hi,
          >
          > Could anybody show me a regex for capturing words (alphas, without
          > numerics) in languages other than English (languages with special
          > characters, e.g. French, German)? I've tried '[a-zA-Z]+' but the
          > special letters for some languages (e.g. French) are not captured. The
          > '\w+' works fine, but it also includes numerics, which I don't want.

          Use something like [\xc8-\xcb]+
          <http://in2.php.net/manual/en/reference.pcre.pattern.syntax.php>
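
Note that the range `\xc8-\xcb` by itself covers only four capital letters (È, É, Ê, Ë), so "something like" is doing real work in that suggestion. Below is a sketch of what a wider Latin-1 letter class could look like, again in Python rather than PHP. The names `LATIN1_WORD` and `latin1_words` are made up for the example. The ranges skip positions D7 and F7, which are the multiplication and division signs rather than letters.

```python
import re

# The range from the post: positions C8..CB only.
narrow = re.compile(r"[\xc8-\xcb]+")
covered = [chr(cp) for cp in range(0xC8, 0xCC)]  # the four capital E variants

# ASCII letters plus the accented letters of Latin-1 (C0-D6, D8-F6, F8-FF).
LATIN1_WORD = re.compile(r"[A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\xFF]+")


def latin1_words(text):
    """Return runs of ASCII or Latin-1 letters; digits and symbols split words."""
    return LATIN1_WORD.findall(text)


if __name__ == "__main__":
    print(covered)
    print(latin1_words("d\u00e9j\u00e0 vu 42 \u00d7 Stra\u00dfe"))
    print(latin1_words("c\u0153ur"))  # the oe ligature is outside Latin-1
```

The last call splits "cœur" into "c" and "ur", which shows the limit of a Latin-1-only class for French.
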

          --
          <?php echo 'Just another PHP saint'; ?>
          Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/
