Translate UTF16 into lower ascii

This topic is closed.
  • Bob

    Translate UTF16 into lower ascii

    Is there an easy way to translate odd UTF8/16 characters (like letters
    with umlauts, vowels with accent symbols above) into the closest
    'look-alike' lower ascii equivalent (A-Z, a-z)?

    This is something that has probably been done, but I can't think of a
    good search key for finding the code.

  • Bob

    #2
    Re: Translate UTF16 into lower ascii

    On Fri, 21 Nov 2008 02:44:24 -0500, "Michael B. Trausch"
    <mike@trausch.us> wrote:
    >On Fri, 21 Nov 2008 01:56:04 -0500
    >Bob <Bob@nospam.com> wrote:
    >
    >>Is there an easy way to translate odd UTF8/16 characters (like letters
    >>with umlauts, vowels with accent symbols above) into the closest
    >>'look-alike' lower ascii equivalent (A-Z, a-z)?
    >>
    >>This is something that has probably been done, but I can't think of a
    >>good search key for finding the code.
    >
    >There may be a library out there somewhere, but I am sure that it is so
    >obscure that I can't find it.
    >
    >Your best bet would be to try to transliterate what you can and drop
    >what you can't transliterate. A table-based approach would be the only
    >way I can see being able to do it reasonably. Maybe looking for a list
    >of transliterations that you could preprocess into a table would be
    >ideal?
    >
    > --- Mike
    Very likely that someone has already done this, as there are occasions
    that plain 'lower ascii' must be used, like on cell phone keypads. If
    someone wanted to enter the name "Andre" on a cell phone, there would
    be no access to an E with the accent over it.

    Now, to find it...
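
    The table-based approach Mike describes (transliterate what you can, drop what you can't) might look roughly like the Python sketch below. The table contents and the names TRANSLIT_TABLE and transliterate are hypothetical and deliberately incomplete; a real table would be preprocessed from a fuller transliteration list.

```python
# Hypothetical, deliberately tiny lookup table (an assumption for
# illustration); extend it with whatever characters you expect to see.
TRANSLIT_TABLE = {
    "À": "A", "Á": "A", "Â": "A", "Ä": "A",
    "à": "a", "á": "a", "â": "a", "ä": "a",
    "È": "E", "É": "E", "Ê": "E", "Ë": "E",
    "è": "e", "é": "e", "ê": "e", "ë": "e",
    "Ö": "O", "ö": "o", "Ü": "U", "ü": "u",
    "ñ": "n", "ç": "c",
    # A table can also map one character to several ASCII letters.
    "ß": "ss", "Æ": "AE", "æ": "ae",
}


def transliterate(text, table=TRANSLIT_TABLE):
    """Map each character to ASCII via the table; drop what is not covered."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)          # already plain ASCII, keep as-is
        elif ch in table:
            out.append(table[ch])   # known look-alike
        # anything else is silently dropped
    return "".join(out)


print(transliterate("André"))
print(transliterate("Straße"))
```

    The upside of a table is full control over multi-letter replacements such as "ss" for "ß"; the downside is that every character has to be listed by hand.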

    • Bob

      #3
      Re: Translate UTF16 into lower ascii

      On Fri, 21 Nov 2008 09:00:53 +0100, Jérémy Jeanson
      <jeremy.jeanson@free.fr> wrote:
      >System.Text.ASCIIEncoding has some methods to convert and translate
      >chars. You can find many examples on MSDN.
      Entirely appropriate to hear from someone with two accents in their
      name. <G> Good example here, as I wouldn't know how to type your name
      as you have it spelled above. And you wouldn't want to drop the two
      E's...you'd translate to lower ascii E when necessary.

      I presume that you're referring to the Decoder.Convert functions via
      ASCIIEncoding classes. I didn't see anything that looked like it would
      do this.
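
      To illustrate why a plain ASCII encoder is not enough on its own, here is a small Python analogue (an assumption for illustration, not the .NET API discussed above). Encoding to ASCII can only replace or drop characters it cannot represent; it never picks a look-alike letter.

```python
name = "J\u00e9r\u00e9my"  # "Jérémy", written with escapes for clarity

# Unencodable characters are replaced with "?" ...
replaced = name.encode("ascii", errors="replace").decode("ascii")

# ... or dropped entirely; neither mode substitutes a plain "e".
ignored = name.encode("ascii", errors="ignore").decode("ascii")

print(replaced)  # J?r?my
print(ignored)   # Jrmy
```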


      • Michael Justin

        #4
        Re: Translate UTF16 into lower ascii

        Bob wrote:
        >Very likely that someone has already done this, as there are occasions
        >that plain 'lower ascii' must be used, like on cell phone keypads. If
        >someone wanted to enter the name "Andre" on a cell phone, there would

        Really? I have all umlauts available on my mobile (and it is not a
        special or expensive model). It depends on the language setting: if it
        is set to English, then there are no special characters, of course.
        Think about Chinese or Japanese mobiles. They do not have 2000+ tiny
        keys, but I guess you can send Chinese text using the keypad somehow...

        Michael


        • MC

          #5
          Re: Translate UTF16 into lower ascii


          "Bob" <Bob@nospam.com> wrote in message news:8dlci45ll65ietjf373srgng93u747cqt1@4ax.com...
          >Is there an easy way to translate odd UTF8/16 characters (like letters
          >with umlauts, vowels with accent symbols above) into the closest
          >'look-alike' lower ascii equivalent (A-Z, a-z)?
          >
          >This is something that has probably been done, but I can't think of a
          >good search key for finding the code.

          Check an earlier thread here about "Remove accents" or somesuch.

          The key idea is to "normalize" the Unicode to a decomposed form, so that the accents become combining characters (e.g., the acute accent is a separate character from the letter it appears on), then remove the combining characters (which have codes in a particular range above ASCII; the common accents sit in U+0300 to U+036F).
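
          A minimal Python sketch of this normalize-then-strip idea, using the standard unicodedata module. The function names strip_accents and to_lower_ascii are hypothetical. In .NET the analogous starting point would be string.Normalize with NormalizationForm.FormD, though that variant is not shown here.

```python
import unicodedata


def strip_accents(text):
    """Decompose accented letters, then drop the combining marks."""
    # NFD splits e.g. "é" into "e" followed by U+0301 (combining acute).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for non-combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_lower_ascii(text):
    """Strip accents, then drop anything that is still not plain ASCII."""
    return strip_accents(text).encode("ascii", errors="ignore").decode("ascii")


print(strip_accents("J\u00e9r\u00e9my"))  # Jeremy
print(to_lower_ascii("Andr\u00e9"))       # Andre
```

          This only helps for characters that actually decompose into a base letter plus marks. Letters such as "ß" or "ø" have no such decomposition, so they pass through strip_accents unchanged and are dropped by to_lower_ascii; covering them would still take a small lookup table.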
