wchar_t -> UTF-8?

  • Jon Willeke

    wchar_t -> UTF-8?

    This feels like a FAQ, but I've been unable to find a satisfactory
    answer. Given a Unicode encoding (such as UCS-4) in wchar_t, I want to
    convert to UTF-8 (or another locale-specific encoding) in a manner not
    entirely unlike the following:

    wstring w = L"H\xe9llo";

    locale loc( "en_US.UTF-8" );
    wcout.imbue( loc );
    wcout << w << endl;

    I've tried Visual C++ 6.0 and Borland C++ 5.6.4 on Windows, as well as
    GCC 3.3.1 on Linux. They don't seem to do anything approximately like
    this. I tried plugging in a hand-written codecvt subclass, but it
    doesn't seem to be used.
  • John Ericson

    #2
    Re: wchar_t -> UTF-8?

    "Jon Willeke" <j.dot.willeke@verizon.dot.net> wrote in
    message news:PLxVb.8466$M8.7247@nwrdny02.gnilink.net...
    > This feels like a FAQ, but I've been unable to find a satisfactory
    > answer. Given a Unicode encoding (such as UCS-4) in wchar_t, I want to
    > convert to UTF-8 (or another locale-specific encoding) in a manner not
    > entirely unlike the following:
    >
    > wstring w = L"H\xe9llo";
    >
    > locale loc( "en_US.UTF-8" );
    > wcout.imbue( loc );
    > wcout << w << endl;
    >
    > I've tried Visual C++ 6.0 and Borland C++ 5.6.4 on Windows, as well as
    > GCC 3.3.1 on Linux. They don't seem to do anything approximately like
    > this. I tried plugging in a hand-written codecvt subclass, but it
    > doesn't seem to be used.

    IIRC, Dinkumware has a library for various code conversions.
    You might want to Google a bit in comp.lang.c++ and
    comp.lang.c++.moderated, since there are some good threads
    in there on the various issues. Be prepared for some quirks,
    depending on your system. Best regards, JE
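
For readers who want to sidestep the locale quirks mentioned above, here is a minimal sketch of a hand-rolled conversion that does not touch locales or codecvt at all. It is not from any of the libraries named in this thread. The names `append_utf8` and `to_utf8` are made up for illustration, and the sketch assumes wchar_t holds either UCS-4 code points (as on typical Linux systems) or UTF-16 code units (as on Windows). Anything that is not a valid code point is replaced with U+FFFD.

```cpp
#include <cassert>  // only needed for the sanity checks in the usage note
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// The caller is expected to pass a value in [0, 0x10FFFF].
static void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to UTF-8.
// ASSUMPTION: each wchar_t is a UCS-4 code point or a UTF-16 code unit.
std::string to_utf8(const std::wstring& w) {
    std::string out;
    for (std::wstring::size_type i = 0; i < w.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(w[i]);

        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            unsigned long lo = static_cast<unsigned long>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }

        // Lone surrogates and out-of-range values become U+FFFD.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With the string from the original post, `to_utf8(L"H\xe9llo")` yields the bytes `"H\xc3\xa9llo"`, which can then be written through a narrow stream such as cout without imbuing anything.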


    • Tilman Kuepper

      #3
      Re: wchar_t -> UTF-8?

      Hello Jon,
      > [...] I tried plugging in a hand-written codecvt
      > subclass, but it doesn't seem to be used.

      You can find some codecvt facets as part of the
      Arabica XML parser toolkit:


      Boost has similar facets ready for download:


      Plauger wrote a pair of columns in the April and
      May 1999 issues of the C/C++ Users Journal. You
      can find the source code on the CUJ server:


      Tilman
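
To show how a codecvt facet of the kind mentioned above is usually plugged in, here is a rough sketch. It does not use the Arabica, Boost, or CUJ code. Instead it assumes a C++11-or-later standard library that still ships `<codecvt>` (deprecated since C++17), and the helper names `write_utf8_file` and `read_bytes` are invented for the example. The facet is attached to a wide file stream rather than wcout, because the standard only promises that file buffers perform the conversion. The locale is imbued before the file is opened, which is the conventional safe order.

```cpp
#include <cassert>   // only needed for the sanity checks in the usage note
#include <codecvt>   // std::codecvt_utf8 (ASSUMPTION: C++11..C++23 library)
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: write `text` to `path` as UTF-8 by giving the
// wide file stream a locale whose codecvt facet does the conversion.
bool write_utf8_file(const char* path, const std::wstring& text) {
    std::wofstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out << text;
    out.close();          // flushes; sets failbit if the write failed
    return !out.fail();
}

// Hypothetical helper: read a file back as raw bytes for inspection.
std::string read_bytes(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, after `write_utf8_file("out.txt", L"H\xe9llo")` succeeds, `read_bytes("out.txt")` should return `"H\xc3\xa9llo"`. A hand-written codecvt subclass would be installed the same way, by passing it as the second argument to the `std::locale` constructor.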