Unicode characters and System.Globalization.CultureInfo

  • Dilip

    I ran into a situation at work regarding Unicode character encodings
    and .NET cultures that left me a bit confused.

    I was trying to instantiate a CultureInfo object from a locale
    identifying a southern Chinese culture, the Hmong. It's usually
    represented as hm-HMN. I read the NativeName property to get the
    name of the culture and display it in my application. When I do:

    CultureInfo ci = new CultureInfo("hm-HMN");

    ci.NativeName prints something that looks like this:

    H'mong

    However, between the letters 'o' and 'n' I see what the Unicode
    Consortium calls the replacement character
    (http://en.wikipedia.org/wiki/Replacement_character), which is
    basically a diamond with a question mark inside it. According to the
    Wikipedia article, this character shows up whenever an application
    cannot decode the original byte stream correctly, in which case it
    substitutes U+FFFD for the bytes it could not decode.
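To make the mechanism described above concrete, here is a minimal sketch of how a decoding mismatch can produce U+FFFD. The byte values and the class and method names are illustrative assumptions, not anything taken from CultureInfo itself: the bytes are a Windows-1252 style spelling with an accented letter, deliberately decoded as UTF-8.

```csharp
using System;
using System.Text;

public static class ReplacementDemo
{
    // Decodes the given bytes as UTF-8; invalid sequences are replaced
    // by the decoder's default fallback character, U+FFFD.
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        // Hypothetical input: "Hm", then 0xF4 (an accented 'o' in
        // Windows-1252), then "ng". A lone 0xF4 is not valid UTF-8.
        byte[] bytes = { 0x48, 0x6D, 0xF4, 0x6E, 0x67 };
        string decoded = DecodeAsUtf8(bytes);

        // Print code points rather than glyphs, so the result does not
        // depend on the console font or code page.
        foreach (char c in decoded)
        {
            Console.Write("U+{0:X4} ", (int)c);
        }
        Console.WriteLine();
    }
}
```

Printing the code points of a suspect string this way shows whether U+FFFD is really stored in the string or whether only the display is at fault.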

    What I would like to know is what exactly is causing this problem?

    1) Does the native Windows API, or whatever is called when I
    instantiate a new CultureInfo object (I haven't had a chance to look
    at it in Reflector yet), encode that character differently, so that
    .NET cannot display it because it is trying to decode it using
    UTF-16 rules?

    2) Or can the character not be displayed because the default code
    page is set to 1252?
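One way to tell the two cases apart is to look at what a code-page limitation does to a character. The sketch below is only an illustration under stated assumptions: it uses ISO-8859-1 as a stand-in for code page 1252 (whether 1252 is available depends on the runtime), and the sample character and names are hypothetical. With the default settings, a character the code page cannot represent is encoded as an ASCII question mark (0x3F), not as U+FFFD.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    // Encodes s with ISO-8859-1, used here as a stand-in for a
    // single-byte Western code page such as 1252 (an assumption).
    public static byte[] EncodeLatin1(string s)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(s);
    }

    public static void Main()
    {
        // U+4E2D is a CJK ideograph with no ISO-8859-1 equivalent,
        // chosen only as an example of an unrepresentable character.
        byte[] bytes = EncodeLatin1("\u4E2D");

        // Prints "3F": the encoder's default fallback is '?'.
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```

So a plain question mark usually points at a lossy conversion to a code page, while the diamond-shaped U+FFFD usually points at bytes that were decoded with the wrong encoding somewhere upstream.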

    Can anyone offer some insights on how to get it to display the
    characters correctly and also clue me in on the differences between
    encodings and code pages?

    thanks!
