UTF8 encoding - Problem

  • Frank Esser

    UTF8 encoding - Problem

    Hello!

    On a PC with German Codepage settings I want to get UTF8 out of string in my
    application.

    I use this function:

    Byte[] array = Encoding.UTF8.GetBytes("à");

    When I look at the Unicode tables then this character is in the Latin table
    and has a hex value of 0x00E0.

    But when I look at my byte array then I see 0xC3A0.

    What's wrong ???

    Thanks!


  • Morten Wennevik

    #2
    Re: UTF8 encoding - Problem

    Hi Frank,

    What you are seeing is correct. You can verify that the value is stored correctly by translating it back using

    Encoding.UTF8.GetString(array)
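
    A minimal sketch of that round trip, assuming a plain console program. The class name Utf8RoundTrip and the ToHex helper are made up for illustration; the character is written as the escape \u00E0 so the result does not depend on how the source file is saved.

    ```csharp
    using System;
    using System.Text;

    public static class Utf8RoundTrip
    {
        // Hypothetical helper: encodes text as UTF-8 and formats the bytes as hex, e.g. "C3-A0".
        public static string ToHex(string text)
        {
            return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
        }

        public static void Main()
        {
            // U+00E0 is "à".
            byte[] array = Encoding.UTF8.GetBytes("\u00E0");
            Console.WriteLine(BitConverter.ToString(array)); // C3-A0

            // Decoding the same bytes gives back the original string.
            string decoded = Encoding.UTF8.GetString(array);
            Console.WriteLine(decoded == "\u00E0"); // True
        }
    }
    ```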

    You can't assume that a character's UTF-8 bytes match its Unicode code point once you are above the ASCII range (U+007F). 'à' (U+00E0) lies above that range, so UTF-8 encodes it as two bytes, 0xC3 0xA0. For an explanation of how a UTF-8 character is calculated, take a look at this page.
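
    As a rough illustration of where 0xC3 0xA0 comes from: code points from U+0080 to U+07FF are stored in two bytes, with the upper 5 bits placed in a 110xxxxx lead byte and the lower 6 bits in a 10xxxxxx continuation byte. The sketch below does this by hand and compares it with Encoding.UTF8. The class Utf8TwoByte and the method EncodeTwoByte are hypothetical names, and the helper deliberately covers only the two-byte range.

    ```csharp
    using System;
    using System.Text;

    public static class Utf8TwoByte
    {
        // Hypothetical helper: hand-encodes a code point in the range U+0080..U+07FF.
        public static byte[] EncodeTwoByte(int codePoint)
        {
            if (codePoint < 0x80 || codePoint > 0x7FF)
                throw new ArgumentOutOfRangeException(nameof(codePoint));

            // Lead byte: 110xxxxx, carrying the bits above the low 6.
            byte lead = (byte)(0xC0 | (codePoint >> 6));
            // Continuation byte: 10xxxxxx, carrying the low 6 bits.
            byte trail = (byte)(0x80 | (codePoint & 0x3F));
            return new[] { lead, trail };
        }

        public static void Main()
        {
            // 0xE0 = 1110 0000 -> upper bits 00011, lower bits 100000
            byte[] manual = EncodeTwoByte(0x00E0);
            byte[] framework = Encoding.UTF8.GetBytes("\u00E0");

            Console.WriteLine(BitConverter.ToString(manual));    // C3-A0
            Console.WriteLine(BitConverter.ToString(framework)); // C3-A0
        }
    }
    ```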




    On Thu, 28 Apr 2005 15:19:25 +0200, Frank Esser <Mistral@nurfuerspam.de> wrote:
    > Hello!
    >
    > On a PC with German Codepage settings I want to get UTF8 out of string in my
    > application.
    >
    > I use this function:
    >
    > Byte[] array = Encoding.UTF8.GetBytes("à");
    >
    > When I look at the Unicode tables then this character is in the Latin table
    > and has a hex value of 0x00E0.
    >
    > But when I look at my byte array then I see 0xC3A0.
    >
    > What's wrong ???
    >
    > Thanks!



    --
    Happy coding!
    Morten Wennevik [C# MVP]
