UTF-8 and C string

  • Mike

    UTF-8 and C string

    Hi there,

    Here is my question:

    If I pass a value to a string, like "xyz\xc2\xbfwww", then the runtime
    value (VC++) of this string is "xyz¿www". Is this runtime value in
    UTF-8 encoding? How can I check this?

    Thanks a lot.

    Mike
  • Roger Leigh

    #2
    Re: UTF-8 and C string

    Mike.Hao@gmail.com (Mike) writes:

    > If I pass a value to a string, like "xyz\xc2\xbfwww", then the
    > runtime value (VC++) of this string is "xyz¿www". Is this runtime
    > value in UTF-8 encoding? How can I check this?

    Walk the string and print it out as hex, byte by byte.
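
    As a rough sketch of that check (hex_dump and is_valid_utf8 are
    hypothetical helper names, not part of any library), something like the
    following dumps the bytes and then tests whether they form well-formed
    UTF-8. It assumes a C99 compiler and a NUL-terminated string:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: write the bytes of s into out as space-separated
       hex. Returns the number of bytes dumped, or 0 if out is too small. */
    size_t hex_dump(const char *s, char *out, size_t out_size)
    {
        assert(s != NULL && out != NULL);
        size_t n = strlen(s);
        if (out_size < n * 3 + 1)
            return 0;
        out[0] = '\0';
        for (size_t i = 0; i < n; ++i)
            sprintf(out + i * 3, "%02X%s",
                    (unsigned int) (unsigned char) s[i], i + 1 < n ? " " : "");
        return n;
    }

    /* Hypothetical helper: return 1 if s is well-formed UTF-8, otherwise 0. */
    int is_valid_utf8(const char *s)
    {
        /* Smallest code point allowed for each sequence length (1..4 bytes). */
        static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
        const unsigned char *p = (const unsigned char *) s;

        assert(s != NULL);
        while (*p) {
            int extra;          /* number of continuation bytes expected */
            unsigned long cp;   /* code point being decoded */

            if (*p < 0x80)                { extra = 0; cp = *p; }
            else if ((*p & 0xE0) == 0xC0) { extra = 1; cp = *p & 0x1F; }
            else if ((*p & 0xF0) == 0xE0) { extra = 2; cp = *p & 0x0F; }
            else if ((*p & 0xF8) == 0xF0) { extra = 3; cp = *p & 0x07; }
            else return 0;      /* stray continuation byte or invalid lead */
            ++p;

            for (int i = 0; i < extra; ++i, ++p) {
                if ((*p & 0xC0) != 0x80)    /* also catches a premature NUL */
                    return 0;
                cp = (cp << 6) | (*p & 0x3F);
            }

            /* Reject overlong forms, surrogates and values above U+10FFFF. */
            if (cp < min_cp[extra] || cp > 0x10FFFF ||
                (cp >= 0xD800 && cp <= 0xDFFF))
                return 0;
        }
        return 1;
    }
    ```

    For "xyz\xc2\xbfwww" the dump reads 78 79 7A C2 BF 77 77 77, and C2 BF
    is the UTF-8 encoding of U+00BF (¿). A string that passes the check is
    well-formed UTF-8, which is strong evidence but not proof that UTF-8 was
    the intended encoding.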

    On my Linux system, GCC by default encodes all narrow string literals
    as UTF-8 and all wide string literals as UCS-4 (wchar_t is 32 bits
    there). How they are displayed to the user (the output encoding)
    depends on the locale, which causes them to be recoded on the fly if
    required.

    The following test should be portable, but does require that your
    compiler accept UTF-8 source (recode the source file if required).

    Regards,
    Roger


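    #include <locale.h>
    #include <stdio.h>
    #include <string.h>
    #include <wchar.h>

    int main(void)
    {
        /* Pick up the user's locale so wide output can be converted. */
        setlocale(LC_ALL, "");

        const char *narrow = "Test Unicode (narrow): ïàý Ноя けたいと願う!\n";
        fprintf(stdout, "%s\n", narrow);

        /* Dump the narrow string one byte at a time. */
        fprintf(stdout, "Narrow bytes:\n");
        size_t narrow_len = strlen(narrow);
        for (size_t i = 0; i < narrow_len; ++i)
            fprintf(stdout, "%3zu: %02X\n", i,
                    (unsigned int) ((const unsigned char *) narrow)[i]);

        /* stdout is already byte-oriented, so wide output goes to stderr. */
        if (fwide(stderr, 1) <= 0)
            fprintf(stdout, "Failed to set stderr to wide orientation\n");

        const wchar_t *wide = L"Test Unicode (wide): ïàý Ноя けたいと願う!\n";
        fwprintf(stderr, L"\n%ls\n", wide);

        /* %s in a wide format converts a multibyte string to wide. */
        fwprintf(stderr, L"\nNarrow-to-wide: %s\n", narrow);

        /* %ls in a narrow format converts a wide string to multibyte. */
        fprintf(stdout, "\nWide-to-narrow: %ls\n", wide);

        /* Dump the in-memory representation of the wide string. */
        fprintf(stdout, "Wide bytes:\n");
        size_t wide_size = wcslen(wide) * sizeof(wchar_t);
        for (size_t i = 0; i < wide_size; ++i)
            fprintf(stdout, "%3zu: %02X\n", i,
                    (unsigned int) ((const unsigned char *) wide)[i]);

        return 0;
    }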
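
    The wide-to-narrow step in the test is done implicitly by %ls. As a
    hedged sketch of the same conversion done explicitly with the standard
    wcstombs function (wide_to_locale is a hypothetical wrapper name), the
    result depends on the current LC_CTYPE locale, so call setlocale first
    if you want the user's encoding rather than the default "C" locale:

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <wchar.h>

    /* Hypothetical helper: convert a wide string to the multibyte encoding
       of the current LC_CTYPE locale. Returns the number of bytes written
       (excluding the terminating NUL), or (size_t)-1 if a character cannot
       be represented in this locale or the buffer is too small. */
    size_t wide_to_locale(const wchar_t *wide, char *out, size_t out_size)
    {
        assert(wide != NULL && out != NULL);
        if (out_size == 0)
            return (size_t) -1;

        size_t n = wcstombs(out, wide, out_size);
        if (n == (size_t) -1)
            return n;               /* unrepresentable character */
        if (n == out_size)
            return (size_t) -1;     /* output filled, no room for the NUL */
        return n;
    }
    ```

    Hex-dumping the converted buffer then shows exactly which bytes the
    locale produced, for example UTF-8 sequences under a UTF-8 locale.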

    --
    Roger Leigh
    Printing on GNU/Linux? http://gimp-print.sourceforge.net/
    Debian GNU/Linux http://www.debian.org/
    GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.