What are the Minimum requirements for all 17 planes of Unicode in C++?

  • SwissProgrammer
    New Member
    • Jun 2020
    • 220

    What are the Minimum requirements for all 17 planes of Unicode in C++?


    In YOUR EXPERIENCE !
    (Sometimes official descriptions have not been accurate. I want experienced answers.)


    C++0 to C++20; Which version is the minimum that can work with ALL of the 17 planes without requiring a work-around or a third party dll?
    I am not interested in Visual Studio or .net . I used these in the past and I am aware that they are powerful but I specifically do not want them now. Just C++.

    In case you might not know what Unicode "planes" are, see https://en.wikipedia.org/wiki/Plane_%28Unicode%29


    I am currently only using plane 0 and C++11. I want to be able to use all of the planes 0-16. I want to know the minimum requirements in C++.


    Thank you.
  • Banfa
    Recognized Expert Expert
    • Feb 2006
    • 9067

    #2
    I am partly replying because I'd like to know the answer if someone else replies and partly because I'm not sure you are asking the right question.

    Support for Unicode is not a programming language or version matter but rather it is related to the execution environment supported character encoding.

    Execution environment and source code (or build environment) character sets can be different, although they often aren't in the case of building and executing on the same platform.

    So take this example program

    Code:
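    #include <iostream>
    using namespace std;

    int main()
    {
        // Note: without an L prefix, '\u0444' is a narrow multi-character literal whose value is
        // implementation-defined; with this compiler it holds the UTF-8 bytes 0xD184 (see the output).
        wchar_t c = '\u0444';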
    
        cout << "cout: ф" << endl;
        cout << "cout: " << u8"\u0444" << endl;
        cout << "cout: " << c << endl;
    
        wcout << "wcout: ф" << endl << flush;
        wcout << "wcout: " << u8"\u0444" << endl << flush;
        wcout << "wcout: " << c << endl << flush;
    
        return 0;
    }
    Compiled as C++14 and run in PowerShell I get the following output

    Code:
    cout: Ðä
    cout: Ðä
    cout: d184
    wcout: Ð
    That is because PowerShell does not interpret the output as Unicode by default. Run this command in PowerShell

    Code:
    $OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
    to tell PowerShell to use UTF-8, then re-run the same program without recompiling and you get this output

    Code:
    cout: ф
    cout: ф
    cout: d184
    wcout:
    Recompile the program using C++98 and you get this output
    Code:
    cout: ф
    cout: ф
    cout: 53636
    wcout:
    The only thing that has changed is that the wchar_t variable is being displayed in decimal instead of hexadecimal.

    Support for Unicode is depressingly non-standard across platforms so it is hard to write portable code using Unicode.

    P.S. I have no idea why only 1 of the 3 wcout lines is producing output in all cases.


    • Banfa
      Recognized Expert Expert
      • Feb 2006
      • 9067

      #3
      Sorry that post didn't even try to answer the question, the point I was trying to make was, ignoring your problem of how to put plane 1+ Unicode characters into standard C++ code, outputting them requires a system that understands them.

      Actually using them in code is complicated because standard C++ only really has support for UTF-8 (and in theory that only came in with C++11). Outputting a character from plane 1+ then becomes rather painful. Take, for example, character U+1FA0F (Black King rotated, from plane 1): even if the system understands this plane and can display it, which isn't a given, in standard C++ your only option would be to use UTF-8 encoding, which looks something like u8"\xF0\x9F\xA8\x8F". That has to be looked up by hand and is a pain to type, and because I haven't got an environment that knows how to interpret it I don't even know if it is correct.
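For what it is worth, the four bytes quoted for U+1FA0F can be checked by doing the UTF-8 bit-packing by hand instead of looking it up. The sketch below is only an illustration: encode_utf8 is a made-up helper name, not a standard library function. As far as I know, C++11 also accepts universal character names such as u8"\U0001FA0F" in literals, which avoids the hex escapes altogether.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the standard library):
// encode one Unicode code point (U+0000..U+10FFFF, no surrogates) as UTF-8.
std::string encode_utf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: planes 1 to 16
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0x1FA0F) yields the bytes F0 9F A8 8F, so the sequence quoted above is right; encode_utf8(0x444) yields D1 84, which matches the d184 seen in the earlier output.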

      I realise this post also doesn't answer the question, (still hoping someone else can) but at least it doesn't answer the question as opposed to not answering a different question.


      • SwissProgrammer
        New Member
        • Jun 2020
        • 220

        #4
        Banfa,

        I did not want to lead someone with an answer, but your answer says close to what I have found.
        UTF-8, in my opinion, is the most universal of the UTF options. I have not found any limit to the expandability of the UTF-8 encoding.

        UTF-8, if my memory is correct, was what I was using back in Windows 2000 and (I think) in Windows NT. But, I was not programming in C++ at that time. Therefore, I thought to ask the local C++ experts here.

        Before the Unicode consortium expanded their published scope past plane 0, I used Unicode a lot. Currently, as I transition into C++11, my coding time is greatly limited by my struggling through the learning curve. I was wanting someone with experience in the planes above 0 to speak to the issues encountered.

        Your response, though you might not think it was so appropriate, I enjoyed.

        Thank you.



        One, but not the only, goal that I have with C++11 and Unicode is to be able to have a text box in which someone pastes a Unicode character or sentence and my program automatically shows in another text box the Unicode representation of that input:

        Example input 办法 .

        My program would show the UTF-8 encoding for that and maybe even split it apart into the two words that it contains 办 (ban) , and 法 (fa), each with their own UTF8 encoding.
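A rough sketch of the splitting step, assuming the text has already been converted to UTF-8 and is well formed: the lead byte of each sequence says how many bytes belong to that character. The names utf8_sequence_length and split_utf8 are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 sequence that starts with lead byte b.
// Stray continuation bytes and invalid lead bytes count as 1 so the scan always advances.
std::size_t utf8_sequence_length(unsigned char b)
{
    if (b >= 0xF0 && b <= 0xF4) return 4;   // planes 1 to 16
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xE0) == 0xC0) return 2;
    return 1;                               // ASCII (or malformed input)
}

// Split a UTF-8 string into one substring per encoded code point.
// Assumes well-formed input; the only safety check is not running past the end.
std::vector<std::string> split_utf8(const std::string& s)
{
    std::vector<std::string> parts;
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
        if (len > s.size() - i) len = s.size() - i;   // truncated tail
        assert(len >= 1 && len <= s.size() - i);
        parts.push_back(s.substr(i, len));
        i += len;
    }
    return parts;
}
```

For the example input, 办 is U+529E (bytes E5 8A 9E) and 法 is U+6CD5 (bytes E6 B3 95), so split_utf8 returns two three-byte pieces.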

        But, first I wanted to know more about what I am dealing with in C++. Am I using at least the C++ version that can give me that response? Am I using the version that can give me the correct response for every plane?

        So, I started with the simplest of the questions: Am I using the minimum version of C++ that can do all of this.

        Later, I might have struggled to get the example to work, being confident that the C++ version was capable of doing the job.

        Again, Thank you Banfa.

        .


        • dev7060
          Recognized Expert Contributor
          • Mar 2017
          • 655

          #5
          P.S. I have no idea why only 1 of the 3 wcout lines is producing output in all cases.
          Code:
          #include <iostream>
          using namespace std;

          int main() {
            wchar_t c = '\u0444';
            wcout << "wcout: ф" << endl << flush;
            if (wcout.fail()) {
              cout << "\nwide to narrow conversion didn't succeed; Unicode is not representable in the codepage";
              cout << endl;
              wcout << "\nThis won't get printed. Other wcouts don't have any effect at this point";
              wcout.clear();
            }
            wcout << "wcout: " << u8"\u0444" << endl << flush;
            if (wcout.fail()) {
              cout << "\nattempt #2 didn't succeed";
              cout << endl;
              wcout << "not shown on the console";
              wcout.clear();
              wcout << "hello user\n";
            }
            wcout << "wcout: " << c << endl << flush;
            wcout << "not available on the console as well ";
            return 0;
          }
          also,
          A program should not mix output operations on wcout with output operations on cout
          (or with other narrow-oriented output operations on stdout): Once an output operation has
          been performed on either, the standard output stream acquires an
          orientation (either narrow or wide) that can only be safely changed by calling freopen on stdout.


          In YOUR EXPERIENCE !
          (Sometimes official descriptions have not been accurate. I want experienced answers.)
          Disclaimer: As you specifically asked for an experienced answer, I'm a student and not experienced at all when it comes to professional development. The below is just how I view it with my understanding.

          My understanding of Unicode is that it isn't concerned with a language, as pointed out by Banfa. Every system or environment has kind of its way of dealing with it, has its own character set, and uses workarounds to set up compatibilities with others for exchanging the data. It depends on how encoding is done; what code points are being used, how many bytes for a character, which two code points are combined to represent a new character, what byte order, endian system, etc. Mapping is implementation-dependent.

          Here's a char : 🮕 (it is not showing up on my screen; I just copied a random one off Wikipedia)
          In JS console,
          Code:
          console.log("🮕".length)
          shows the output 2.
          In PHP,
          Code:
          echo strlen("🮕")
          shows the output 4.
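The two answers differ because the JS length counts UTF-16 code units while PHP's strlen counts bytes (UTF-8 here). A small sketch of the same arithmetic in C++, with made-up function names and assuming the character above is U+1FB95 (any plane 1+ code point gives the same counts):

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes a code point occupies in UTF-8.
int utf8_length(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;                       // planes 1 to 16
}

// Number of 16-bit code units a code point occupies in UTF-16.
int utf16_length(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;    // plane 1+ needs a surrogate pair
}
```

With these, utf16_length(0x1FB95) is 2 and utf8_length(0x1FB95) is 4, matching the JS and PHP results.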

          One solution is to use a fixed-length encoding everywhere, such as UTF-32, which uses 4 bytes per code point. The con is that it is space-inefficient: a 5-byte ASCII character array in UTF-8 takes 20 bytes in UTF-32, which adds up to a mess on a larger scale, because every ASCII character carries leading zero bytes that consume memory for no reason. Variable-length encodings such as UTF-8 and UTF-16 use only as many bytes as each character needs.

          Let's say you write a program in an IDE and convert between char* and wchar_t* back and forth between function calls. The libraries would process them (or not?) using their implemented mappings, but the output on the terminal may still come out wrong, because the terminal follows the encoding and mappings of the OS. Whatever bytes our binary sends may not be available in the character map of the OS to produce a relevant output. Windows uses UTF-16 natively, hence its apps use the same. If you run the same code on Unix or Linux, the output may be different (UTF-8).

          For having uniformity; I guess the engine, language, os, environment, third party compilers, linkers, ide, libraries, dependencies, databases, binaries, web connections, etc. (whatever is interacting with your encoded data in between) all have to agree on a common set of rules to represent the chars; which would be a hypothetical concept (maybe). I mean, for example, to communicate over the networks, you'd need maximum compression for the fast travel of the packets, hence would choose a variable size encoding. And if you see Java, it stores data as UTF-16 internally and on the other hand, UTF-16 is not used in internet websites because it's incompatible with ASCII. Workarounds seem to be the only solution to build the bridge and for the devs; trial and error if you don't know how the encodings are being done in a system. For example, Java docs states clearly:

          The Java programming language is based on the Unicode character set, and several libraries implement the Unicode standard. Unicode is an international character set standard which supports all of the major scripts of the world, as well as common technical symbols. The original Unicode specification defined characters as fixed-width 16-bit entities, but the Unicode standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF. An encoding defined by the standard, UTF-16, allows to represent all Unicode code points using one or two 16-bit units.
          The primitive data type char in the Java programming language is an unsigned 16-bit integer that can represent a Unicode code point in the range U+0000 to U+FFFF, or the code units of UTF-16. The various types and classes in the Java platform that represent character sequences - char[], implementations of java.lang.CharSequence (such as the String class), and implementations of java.text.CharacterIterator - are UTF-16 sequences. Most Java source code is written in ASCII, a 7-bit character encoding, or ISO-8859-1, an 8-bit character encoding, but is translated into UTF-16 before processing.
          Ref: https://www.oracle.com/technical-res...lementary.html

          I've used java references just for the demonstration of a system.

          One, but not the only, goal that I have with C++11 and Unicode is to be able to have a text box in which someone pastes a Unicode character or sentence and my program automatically shows in another text box the Unicode representation of that input:
          Here's my guess: Whatever GUI library you're using is probably making calls to the WinAPI behind the scenes and using the OS's layout and character set to display everything. When you paste something in the text box, if the OS's character set couldn't map it, you probably won't see it properly in the text box field in the first place. You may be able to pass the character to the library's event handler functions, and the background processing may interpret everything correctly (or not), but you have to depend on an external OS to see the output, i.e. you need an external environment to interact with your app anyway, the same as you need a third-party compiler (mingw, gcc, etc.) to process the text and produce the binaries. That's where workarounds come into play. You need a way to make the text recognizable so it is displayed properly, using third-party libs or your own logic if you can figure out how stuff is happening behind the scenes.
          Last edited by dev7060; Oct 28 '20, 11:24 AM. Reason: code formatting


          • SwissProgrammer
            New Member
            • Jun 2020
            • 220

            #6
            dev7060,

            You pointed directly to the issues. I agree.

            Maybe if I get help one tiny step at a time.

            I do not yet know how to make a Unicode capable, drag-n-drop text box in C++11.

            I would like to be able to input to, and to read from, that text box with both wchar_t* and TCHAR*.

            Help.

            4 hours later (I told you I am new at this) I have the following.

            Code:
            #define _UNICODE
            #define UNICODE
            
            #include <windows.h>
            
            #include<iostream>
            #include <string>
            using namespace std;
            
            #define MAX_LOADSTRING 100
            
            HWND Handle_Main_Window = NULL;
            
            
            void create_controls( const HWND hwnd );
            
            
            LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);
            
            wchar_t g_szClassName[] = L"myWindowClass";
            
            int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
                LPSTR lpCmdLine, int nCmdShow)
            {
                WNDCLASSEX wc;
                MSG Msg;
            
                wc.cbSize        = sizeof(WNDCLASSEX);
                wc.style         = 0;
                wc.lpfnWndProc   = WndProc;
                wc.cbClsExtra    = 0;
                wc.cbWndExtra    = 0;
                wc.hInstance     = hInstance;
                wc.hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
                wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
                wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
                wc.lpszMenuName  = nullptr;
                wc.lpszClassName = g_szClassName;
                wc.hIconSm       = LoadIcon(nullptr, IDI_APPLICATION);
            
                if(!RegisterClassEx(&wc))
                {
                    MessageBox(nullptr, L"Window Registration Failed!", L"Error!",
                        MB_ICONEXCLAMATION | MB_OK);
                    return 0;
                }
            
            	Handle_Main_Window = CreateWindowEx(
            		WS_EX_CLIENTEDGE,
                    g_szClassName,
                    L"Title",
                    WS_OVERLAPPEDWINDOW,
                    CW_USEDEFAULT,
                    CW_USEDEFAULT,
                    500,
                    500,
                    nullptr,
                    nullptr,
                    hInstance,
                    nullptr);
            
                if(Handle_Main_Window == NULL)
                {
                    MessageBox(nullptr, L"Window Creation Failed!", L"Error!",
                        MB_ICONEXCLAMATION | MB_OK);
                    return 0;
                }
            
                ShowWindow(Handle_Main_Window, nCmdShow);
                UpdateWindow(Handle_Main_Window);
            
                while(GetMessage(&Msg, nullptr, 0, 0) > 0)
                {
                    TranslateMessage(&Msg);
                    DispatchMessage(&Msg);
                }
                return Msg.wParam;
            }
            
            
            LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
                {
                    switch (msg)
                    {
            
                        case WM_CREATE:
                            create_controls( hWnd );
                            break;
            
                        case WM_COMMAND:
                            switch(LOWORD(wParam)) {
                            case 1:{
                                    ::MessageBox( hWnd, L"PUSH BUTTON 1 was clicked", L"message from PUSH BUTTON 1", MB_SETFOREGROUND );
                                    break;
                                }
            
                            case 2:{
                                    const HWND text_box = GetDlgItem( hWnd, 3 );
                                    const int n = GetWindowTextLength( text_box );
                                    wstring text( n + 1, L'#' );
                                    if( n > 0 )
                                        {
                                            GetWindowText( text_box, &text[0], text.length() );
                                        }
                                    text.resize( n );
                                    ::MessageBox(hWnd, text.c_str(), L"The INPUT TEXT WINDOW", MB_SETFOREGROUND );
                                    break;
                                }
            
                            case 3:{
                                    break;
                                }
            
                            case 4:{
                                    break;
                                }
            
                            case 5:{
                                    const HWND text_box = GetDlgItem( hWnd, 5 );
                                    const int n = GetWindowTextLength( text_box );
                                    wstring text( n + 1, L'#' );
                                    if( n > 0 )
                                        {
                                            GetWindowText( text_box, &text[0], text.length() );
                                        }
                                    text.resize( n );
                                    ::MessageBox(hWnd, L"SAVE BUTTON was clicked", L"message from SAVE BUTTON", MB_SETFOREGROUND );
                                    break;
                                }
            
                            default:{
                                }
                        }
                        break;
            
                        case WM_CLOSE:{
                                DestroyWindow(hWnd);
                                break;
                            }
            
                        case WM_DESTROY:{
                                PostQuitMessage(0);
                                break;
                            }
            
                        default:{
                                return DefWindowProc(hWnd, msg, wParam, lParam);
                            }
                    }
                    return FALSE;
                }
            
            void create_controls( const HWND hwnd )
                {
            
                    CreateWindow( L"BUTTON",
                        L"PUSH BUTTON 1",
                        WS_VISIBLE | WS_CHILD | WS_BORDER,
                        10,10,
                        130,20,
                        hwnd, (HMENU) 1, GetModuleHandle( nullptr ), nullptr
                        )  ;
            
                    CreateWindow( L"EDIT",
                        L"INPUT TEXT WINDOW",
                        WS_VISIBLE | WS_CHILD | WS_BORDER,
                        10,50,
                        200,25,
                        hwnd, (HMENU) 3, GetModuleHandle( nullptr ), nullptr
                        );
            
                    CreateWindow( L"BUTTON",
                        L"SAVE BUTTON",
                        WS_VISIBLE | WS_CHILD | WS_BORDER,
                        10,80,
                        110,20,
                        hwnd, (HMENU) 5, GetModuleHandle( nullptr ), nullptr
                        );
            
                    CreateWindow( L"EDIT",
                        L"OUTPUT TEXT WINDOW",
                        WS_VISIBLE | WS_CHILD | WS_BORDER,
                        10,130,
                        300,300,
                        hwnd, (HMENU) 4, GetModuleHandle( nullptr ), nullptr
                        );
                }
            I can paste "Example input 办法 ." into the INPUT BOX, but what do I do with it next? I want to be able to click the button below that and see the UTF8 representation in the bottom box.




            Help please.

            Banfa said: "Support for Unicode is not a programming language or version matter but rather it is related to the execution environment supported character encoding."
            I agree.
            This is a start to having my program to be able to adapt to that.
            I am trying to get to the final answer of my original question in this post. One step at a time.

            Thank you.

            • SioSio
              Contributor
              • Dec 2019
              • 272

              #7
              I did some research.
              UTF-8: To stay compatible with ASCII, the ASCII range is encoded in 1 byte and everything else in 2 to 4 bytes (the original design allowed sequences of up to 6 bytes). A 4-byte sequence can express up to 21 bits (0x1FFFFF), but values beyond plane 16, that is, larger than U+10FFFF, are outside the Unicode range and are not accepted.
              UTF-16, UTF-32: Unlike UTF-8, it is not ASCII compatible.
              Therefore, the condition that meets the requirement of # 1 is to look for a version of C++ that supports UTF-8.
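As a small illustration of the range described above: each plane holds 0x10000 code points, so the plane number is simply the code point shifted right by 16 bits, and anything above U+10FFFF is rejected. The function names here are made up for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// True if cp lies inside the Unicode code space (planes 0 to 16).
bool is_in_unicode_range(std::uint32_t cp)
{
    return cp <= 0x10FFFF;
}

// Plane number (0..16) of a code point that is inside the Unicode range.
int unicode_plane(std::uint32_t cp)
{
    assert(is_in_unicode_range(cp));
    return static_cast<int>(cp >> 16);   // each plane is 0x10000 code points wide
}
```

So U+0444 is in plane 0, U+1FA0F is in plane 1, U+10FFFF is the last code point of plane 16, and 0x110000 is out of range even though a 4-byte UTF-8 pattern has room for it.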

              Support status of UTF-8 depending on the version of C++

              C++17 can process UTF-8 data as "char" data. This allows you to use std::regex, std::fstream, std::cout, etc. without loss.
              In C++20, char8_t and std::u8string were added for UTF-8. However, UTF-8 I/O is still not supported because there is no std::u8fstream. Therefore, we need a way to convert between UTF-8 and the execution character set.


              • SioSio
                Contributor
                • Dec 2019
                • 272

                #8
                I forgot to write.
                In the C++11 Standard Library, UTF-8 is not supported for string and integer conversion functions, and I/O functions. Therefore, it needs to be converted to the system multibyte character code.


                • Banfa
                  Recognized Expert Expert
                  • Feb 2006
                  • 9067

                  #9
                  Looks like you are using the WIN32 API. The Windows GUI natively uses UTF-16, I believe, and the WIN32 API has wide char and multibyte versions of many functions, signified by a postfix W or A.

                  It also has a set of helper functions, Unicode and Character Set Functions, and I think the one you are interested in is WideCharToMultiByte.

                  I know there are Googleable examples out there.


                  • dev7060
                    Recognized Expert Contributor
                    • Mar 2017
                    • 655

                    #10
                    I can paste "Example input 办法 ." into the INPUT BOX, but what do I do with it next? I want to be able to click the button below that and see the UTF8 representation in the bottom box.
                    Like this?
                    Code:
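                    case 5: {
                      // Read the text from the input edit control (id 3)
                      HWND InputTextBox = GetDlgItem(hWnd, 3);
                      const int n = GetWindowTextLength(InputTextBox);
                      wstring text(n + 1, L'#');
                      if (n > 0) {
                        GetWindowText(InputTextBox, &text[0], text.length());
                      }
                      text.resize(n); // drop the terminator slot (and the '#' filler when n == 0)
                      // Copy it unchanged into the output edit control (id 4)
                      SetDlgItemText(hWnd, 4, text.c_str());
                      break;
                    }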


                    If you're using UTF-8 chars inside the Code::Blocks,
                    Settings -> Editor -> Encoding -> Change it from 'default' to UTF-8

                    Code::Blocks is smart enough to change the encoding automatically to prevent losing data. But it would do that temporarily for every time you click on build.

                    Cygwin environment can be used for CLI testing. It supports UTF-8. https://www.cygwin.com/


                    • Banfa
                      Recognized Expert Expert
                      • Feb 2006
                      • 9067

                      #11
                      How's it going?

                      You probably need a couple of helper functions

                      Convert wide character to multibyte character aka UTF16 to UTF8
                      Code:
                      // Convert a wide Unicode string to an UTF8 string
                      std::string utf8_encode(const std::wstring &wstr)
                      {
                          if (wstr.empty())
                          {
                              return std::string();
                          }
                      
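                          // First call: ask how many bytes the UTF-8 result needs
                          int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), NULL, 0, NULL, NULL);
                      
                          // Second call: convert straight into a std::string of that size.
                          // (With an explicit input length no '\0' is written, so a raw char buffer would be unterminated.)
                          std::string strTo(size_needed, '\0');
                          WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), &strTo[0], size_needed, NULL, NULL);
                      
                          return strTo;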
                      }
                      Get the character values of the multibyte character string
                      Code:
                      std::wstring utf8_byte_values(const std::string &str)
                      {
                          if (str.empty())
                          {
                              return std::wstring();
                          }
                      
                          bool first = true;
                          std::wstringstream out;
                      
                          for(auto iter = str.begin(); iter != str.end(); ++iter)
                          {
                              if (first)
                              {
                                  first = false;
                              }
                              else
                              {
                                  out << L" ";
                              }
                      
                              unsigned int value = ((unsigned)*iter) & 0xFF;
                              out << L"0x" << std::hex << std::setw(2) << std::setfill(L'0') << value;
                          }
                      
                          return out.str();
                      }
                      Then your Save button code could look something like
                      Code:
                             case BTN_SAVE:
                              {
                                  const HWND in_box = GetDlgItem( hWnd, EDT_INPUT_TEXT );
                                  const int n = GetWindowTextLength( in_box  );
                                  if( n > 0 )
                                  {
                                      wchar_t text[n+1]; // +1 for terminator
                                      GetWindowText( in_box, text, n+1 );
                                      string utf8 = utf8_encode(wstring(text));
                                      // utf8_byte_values returns a std::wstring, so this calls the wide (W) version
                                      SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
                                  }
                                  break;
                              }
                      Note I defined symbols for all your dialog item ids to aid readability.

                      More importantly note that WIN32 API is a C API and expects C style strings, that is '\0' terminated. This does not play nicely with C++ particularly C++ strings because they are not '\0' terminated which makes passing &text[0] to a WIN32 API where text is a std::string or std::wstring a very risky business. Instead, if the WIN32 API accepts a constant pointer prefer text.c_str() or if the WIN32 API function expects a non-constant pointer use a standard C array and convert to a std::(w)string later.
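One hedged footnote on the &text[0] point: as far as I understand the C++11 rules, std::basic_string storage is contiguous and c_str() is always '\0' terminated, so sizing the string first and letting a C function write into &text[0] (never more than size() characters) is well defined from C++11 on. The sketch below shows that pattern with a stand-in function; fake_get_text is purely hypothetical and only imitates the shape of a call such as GetWindowTextW.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical stand-in for a C-style API such as GetWindowTextW:
// copies at most capacity-1 characters into dst, appends L'\0',
// and returns the number of characters copied.
std::size_t fake_get_text(wchar_t* dst, std::size_t capacity)
{
    assert(dst != nullptr && capacity > 0);
    static const wchar_t source[] = L"hello";
    std::size_t n = std::wcslen(source);
    if (n > capacity - 1) n = capacity - 1;
    std::wmemcpy(dst, source, n);
    dst[n] = L'\0';
    return n;
}

// Size the string first, let the C function fill it, then trim to the real length.
std::wstring read_text()
{
    std::wstring buf(16, L'\0');                       // room for 15 characters + terminator
    std::size_t n = fake_get_text(&buf[0], buf.size());
    buf.resize(n);                                     // drop the unused tail
    return buf;
}
```

Here read_text() returns L"hello" with size 5. A plain C array converted to a std::wstring afterwards, as suggested above, works just as well.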

                      Of course I have been slightly naughty in my code and used variable length arrays, which are a C rather than a C++ feature, but my GNU compiler lets me get away with that with a warning :D


                      • SwissProgrammer
                        New Member
                        • Jun 2020
                        • 220

                        #12
                        SioSio, Thank you.

                        I feel like I should parse the input into ASCII and non-ASCII first.

                        Then, I should parse the non-ASCII incoming text and characters and test each as to how well they work in UTF-8 first, then in UTF-16 (to see if it is larger than U + 10FFFF). Compare the results. Thus at least finding out if I am receiving input that is in plane 0 or plane 1+.

                        Then respond into the second text box with the resultant U.
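A possible sketch of the plane 0 versus plane 1+ check, assuming the input arrives as UTF-16 (which is what a Windows edit control hands back): a code point above U+FFFF shows up as a surrogate pair, a high unit in 0xD800..0xDBFF followed by a low unit in 0xDC00..0xDFFF. The names decode_utf16 and has_supplementary are invented for this example, and unpaired surrogates are simply passed through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-16 code units into code points, combining surrogate pairs.
// Unpaired surrogates are kept as-is (no error reporting in this sketch).
std::vector<std::uint32_t> decode_utf16(const std::u16string& s)
{
    std::vector<std::uint32_t> cps;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size()) {
            std::uint32_t lo = s[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                assert(u <= 0x10FFFF);   // a valid pair can never exceed plane 16
                ++i;                     // the low surrogate has been consumed
            }
        }
        cps.push_back(u);
    }
    return cps;
}

// True if the text contains at least one code point outside plane 0.
bool has_supplementary(const std::u16string& s)
{
    const std::vector<std::uint32_t> cps = decode_utf16(s);
    for (std::size_t i = 0; i < cps.size(); ++i) {
        if (cps[i] > 0xFFFF) return true;
    }
    return false;
}
```

For example, the units 0x529E, 0xD83E, 0xDE0F decode to U+529E (plane 0) and U+1FA0F (plane 1), so has_supplementary reports true for that input.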



                        Separately:
                        You said, "In the C++11 Standard Library, UTF-8 is not supported for string and integer conversion functions, and I/O functions. Therefore, it needs to be converted to the system multibyte character code."
                        In my CODE::BLOCKS 17.12 Settings/Editor/General settings/Encoding settings I have been using UTF-8 with the following choices chosen:
                        / "As default encoding (bypassing C::B's auto-detection)"
                        / "If conversion fails using the settings above, try system local settings".

                        But, I am concerned about system local settings on a user's computer that is different from my tested systems. Maybe then I should just catch any errors of such and deal with that separately.

                        I think that this is correct. What do you think? How would you handle this?


                        Thank you.


                        • SwissProgrammer
                          New Member
                          • Jun 2020
                          • 220

                          #13
                          Banfa, Thank you.

                          When I started learning C++11 I used WideCharToMultiByte and MultiByteToWideChar.

                          They seemed to work. But I read, in maybe 2 or 3 places, that these should be avoided. I should have asked here at that time, but I did not. Since I see you using them, I shall use them with more confidence.


                          You used:
                          Code:
                                  std::wstringstream out;
                          For that I got
                          error: aggregate 'std::wstringstream out' has incomplete type and cannot be defined

                          For future readers:
                          I added
                          Code:
                          #include <sstream>
                          which fixed that.
                          You used:
                          Code:
                                       out << L"0x" << std::hex << std::setw(2) << std::setfill(L'0') << value;
                          For that I got
                          error: 'setw' is not a member of 'std'

                          For future readers:
                          I added
                          Code:
                          #include <iomanip>
                          which fixes that.


                          You used:
                          Code:
                                          const HWND in_box = GetDlgItem( hWnd, EDT_INPUT_TEXT );
                          which I changed to:
                          Code:
                          const HWND in_box = GetDlgItem(hWnd, 3);
                          I like the EDT_INPUT_TEXT but I am not certain how to get my CreateWindow to use that. So, I used 3 instead.


                          You used:
                          Code:
                                              SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
                          which I changed to
                          Code:
                                              SetDlgItemText(hWnd, 4, utf8_byte_values(utf8).c_str());
                          Again, I like the way that you did it, but I am having difficulty getting your line to work.


                          I am not certain what the
                          Code:
                          //            text.resize( n );
                          is. But thank you.


                          It works. Thank you. I am getting closer to being able to test on different platforms in different versions of C++.

                          For 办
                          I get 0xe5 0x8a 0x9e
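As a cross-check that does not need the Windows API, the same bytes can be derived by hand from the code point. 办 is U+529E, which falls in the three-byte UTF-8 range (U+0800 to U+FFFF). The sketch below uses hypothetical helper names (`encode_utf8`, `hex_bytes`) and is only meant to illustrate the bit layout.

```cpp
#include <cstdio>
#include <string>

// Encode one code point (U+0000..U+10FFFF, surrogates excluded) as UTF-8.
// Returns an empty string for values outside that range.
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out; // surrogates are not characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0x10FFFF)
    {
        // Planes 1 to 16 always take four bytes.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Format bytes in the same style as the output box: "0xe5 0x8a 0x9e".
std::string hex_bytes(const std::string& bytes)
{
    std::string out;
    char buf[8];
    for (unsigned char b : bytes)
    {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "0x%02x", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}
```

`hex_bytes(encode_utf8(0x529E))` gives `0xe5 0x8a 0x9e`, the same three bytes as above. A character from plane 1, such as U+1F600, comes out as four bytes.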

                          Getting closer.

                          Lots of times I have wanted to see an update of the progress of code changes that other people were working on.

                          For future readers here is what currently works for me:
                          Code:
                          #define _UNICODE
                          #define UNICODE
                          
                          #include <windows.h>
                          
                          #include <iostream>
                          #include <sstream>      // for std::wstringstream
                          #include <iomanip>      // for std::setw
                          #include <string>
                          using namespace std;
                          
                          #define MAX_LOADSTRING 100
                          
                          HWND Handle_Main_Window = NULL;
                          
                          
                          void create_controls( const HWND hwnd );
                          
                          
                          LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);
                          
                          wchar_t g_szClassName[] = L"myWindowClass";
                          
                          // Previous declarations
                              std::string utf8_encode(const std::wstring &wstr);
                              std::wstring utf8_byte_values(const std::string &str);
                          
                          
                              // Convert a wide Unicode string to an UTF8 string
                              std::string utf8_encode(const std::wstring &wstr)
                              {
                                  if (wstr.empty())
                                  {
                                      return std::string();
                                  }
                          
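                                  const int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), nullptr, 0, nullptr, nullptr);
                                  if (size_needed <= 0)
                                  {
                                      return std::string();
                                  }
                          
                                  // Convert straight into the std::string. With an explicit input length the
                                  // API writes no terminator, and variable length arrays are not standard C++.
                                  std::string strTo( size_needed, '\0' );
                                  WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.size(), &strTo[0], size_needed, nullptr, nullptr);
                                  return strTo;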
                              }
                          
                          
                          
                              std::wstring utf8_byte_values(const std::string &str)
                              {
                                  if (str.empty())
                                  {
                                      return std::wstring();
                                  }
                          
                                  bool first = true;
                                  std::wstringstream out;
                                  // std::wstringstream is defined in <sstream>; <iosfwd> only forward-declares it,
                                  // which is why omitting #include <sstream> gives the "incomplete type" error.
                          
                                  for(auto iter = str.begin(); iter != str.end(); ++iter)
                                  {
                                      if (first)
                                      {
                                          first = false;
                                      }
                                      else
                                      {
                                          out << L" ";
                                      }
                          
                                      unsigned int value = ((unsigned)*iter) & 0xFF;
                                      out << L"0x" << std::hex << std::setw(2) << std::setfill(L'0') << value;
                                  }
                          
                                  return out.str();
                              }
                          
                          
                          
                          int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
                              LPSTR lpCmdLine, int nCmdShow)
                          {
                              WNDCLASSEX wc;
                              MSG Msg;
                          
                              wc.cbSize        = sizeof(WNDCLASSEX);
                              wc.style         = 0;
                              wc.lpfnWndProc   = WndProc;
                              wc.cbClsExtra    = 0;
                              wc.cbWndExtra    = 0;
                              wc.hInstance     = hInstance;
                              wc.hIcon         = LoadIcon(nullptr, IDI_APPLICATION);
                              wc.hCursor       = LoadCursor(nullptr, IDC_ARROW);
                              wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
                              wc.lpszMenuName  = nullptr;
                              wc.lpszClassName = g_szClassName;
                              wc.hIconSm       = LoadIcon(nullptr, IDI_APPLICATION);
                          
                              if(!RegisterClassEx(&wc))
                              {
                                  MessageBox(nullptr, L"Window Registration Failed!", L"Error!",
                                      MB_ICONEXCLAMATION | MB_OK);
                                  return 0;
                              }
                          
                          	Handle_Main_Window = CreateWindowEx(
                          		WS_EX_CLIENTEDGE,
                                  g_szClassName,
                                  L"Title",
                                  WS_OVERLAPPEDWINDOW,
                                  CW_USEDEFAULT,
                                  CW_USEDEFAULT,
                                  500,
                                  500,
                                  nullptr,
                                  nullptr,
                                  hInstance,
                                  nullptr);
                          
                              if(Handle_Main_Window == NULL)
                              {
                                  MessageBox(nullptr, L"Window Creation Failed!", L"Error!",
                                      MB_ICONEXCLAMATION | MB_OK);
                                  return 0;
                              }
                          
                              ShowWindow(Handle_Main_Window, nCmdShow);
                              UpdateWindow(Handle_Main_Window);
                          
                              while(GetMessage(&Msg, nullptr, 0, 0) > 0)
                              {
                                  TranslateMessage(&Msg);
                                  DispatchMessage(&Msg);
                              }
                              return Msg.wParam;
                          }
                          
                          
                          LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
                              {
                                  switch (msg)
                                  {
                          
                                      case WM_CREATE:
                                          create_controls( hWnd );
                                          break;
                          
                                      case WM_COMMAND:
                                          switch(LOWORD(wParam)) {
                                          case 1:{
                                                  ::MessageBox( hWnd, L"PUSH BUTTON 1 was clicked", L"message from PUSH BUTTON 1", MB_SETFOREGROUND );
                                                  break;
                                              }
                          
                                          case 2:{
                                                  const HWND text_box = GetDlgItem( hWnd, 3 );
                                                  const int n = GetWindowTextLength( text_box );
                                                  wstring text( n + 1, L'#' );
                                                  if( n > 0 )
                                                      {
                                                          GetWindowText( text_box, &text[0], text.length() );
                                                      }
                                                  text.resize( n );
                                                  ::MessageBox(hWnd, text.c_str(), L"The INPUT TEXT WINDOW", MB_SETFOREGROUND );
                                                  break;
                                              }
                          
                                          case 3:{
                                                  break;
                                              }
                          
                                          case 4:{
                                                  break;
                                              }
                          
                                         case 5:  //BTN_SAVE:
                                          {
                          //                    const HWND in_box = GetDlgItem( hWnd, EDT_INPUT_TEXT );
                                              const HWND in_box = GetDlgItem(hWnd, 3);
                                              const int n = GetWindowTextLength( in_box  );
                                              if( n > 0 )
                                              {
                                                  wstring text( n + 1, L'\0' ); // +1 for terminator
                                                  const int copied = GetWindowText( in_box, &text[0], n + 1 );
                                                  text.resize( copied );        // drop the terminator
                                                  string utf8 = utf8_encode( text );
                                                  // UNICODE is defined, so this resolves to SetDlgItemTextW
                          //                        SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
                                                  SetDlgItemText(hWnd, 4, utf8_byte_values(utf8).c_str());
                          
                                              }
                                  //            text.resize( n );
                                              break;
                                          }
                          
                                          default:{
                                              }
                                      }
                                      break;
                          
                                      case WM_CLOSE:{
                                              DestroyWindow(hWnd);
                                              break;
                                          }
                          
                                      case WM_DESTROY:{
                                              PostQuitMessage(0);
                                              break;
                                          }
                          
                                      default:{
                                              return DefWindowProc(hWnd, msg, wParam, lParam);
                                          }
                                  }
                                  return FALSE;
                              }
                          
                          void create_controls( const HWND hwnd )
                              {
                          
                                  CreateWindow( L"BUTTON",
                                      L"PUSH BUTTON 1",
                                      WS_VISIBLE | WS_CHILD | WS_BORDER,
                                      10,10,
                                      130,20,
                                      hwnd, (HMENU) 1, GetModuleHandle( nullptr ), nullptr
                                      )  ;
                          
                                  CreateWindow( L"EDIT",
                                      L"办",
                                      WS_VISIBLE | WS_CHILD | WS_BORDER,
                                      10,50,
                                      200,25,
                                      hwnd, (HMENU) 3, GetModuleHandle( nullptr ), nullptr
                                      );
                          
                                  CreateWindow( L"BUTTON",
                                      L"SAVE BUTTON",
                                      WS_VISIBLE | WS_CHILD | WS_BORDER,
                                      10,80,
                                      110,20,
                                      hwnd, (HMENU) 5, GetModuleHandle( nullptr ), nullptr
                                      );
                          
                                  CreateWindow( L"EDIT",
                                      L"OUTPUT TEXT WINDOW",
                                      WS_VISIBLE | WS_CHILD | WS_BORDER,
                                      10,130,
                                      300,300,
                                      hwnd, (HMENU) 4, GetModuleHandle( nullptr ), nullptr
                                      );
                              }

                          Thank you.


                          • Banfa
                            Recognized Expert Expert
                            • Feb 2006
                            • 9067

                            #14
                            You used:

                            Code:
                            SetDlgItemText( hWnd, EDT_OUTPUT_TEXT, utf8_byte_values(utf8).c_str());
                            which I changed to

                            Code:
                            SetDlgItemText(hWnd, 4, utf8_byte_values(utf8).c_str());
                            Again, I like the way that you did it, but I am having difficulty getting your line to work.
                            Code:
                            #define EDT_OUTPUT_TEXT 4
                            Put this at the top of the file.

                            If you have to use a number more than once, it is a magic number. Magic numbers are very poor practice; you remove them by assigning them to a symbol. In C++, a const variable is the preferred way to create this type of constant.

                            Code:
                            const int EDT_OUTPUT_TEXT = 4;
                            But this is WIN32, which I used with C, so I reverted to #define.
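A third option is to group all the control IDs in one enum, so each number appears exactly once. This is only a sketch without the Windows headers: the values 1, 3, 4 and 5 are the ones passed to CreateWindow earlier in the thread, `BTN_PUSH` is a hypothetical name for control 1, and `describe_command` is a hypothetical stand-in for the WM_COMMAND switch.

```cpp
#include <string>

// Control IDs taken from the CreateWindow calls in this thread.
// BTN_PUSH is a hypothetical name; the other three appear in the posts above.
enum ControlId
{
    BTN_PUSH        = 1,
    EDT_INPUT_TEXT  = 3,
    EDT_OUTPUT_TEXT = 4,
    BTN_SAVE        = 5
};

// Stand-in for the WM_COMMAND switch: the case labels use names, not numbers.
std::string describe_command(int id)
{
    switch (id)
    {
        case BTN_PUSH:        return "push button clicked";
        case BTN_SAVE:        return "save button clicked";
        case EDT_INPUT_TEXT:  return "input box notification";
        case EDT_OUTPUT_TEXT: return "output box notification";
        default:              return "unhandled";
    }
}
```

In the real program the same names would replace the literals in the CreateWindow calls, for example `(HMENU) EDT_OUTPUT_TEXT`, and in `GetDlgItem` and `SetDlgItemText`.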
                            Last edited by Banfa; Oct 31 '20, 05:50 PM.


                            • SioSio
                              Contributor
                              • Dec 2019
                              • 272

                              #15
                              Tip 1.
                              An example of determining whether a character string consists only of ASCII letters, digits, and symbols.
                              Code:
                              #include <iostream>
                              #include <regex>
                              #include <string>
                              
                              /**
                               * @brief Determine whether a string consists only of ASCII letters, digits and symbols.
                               *
                               * @return true: only ASCII letters, digits and symbols / false: contains other characters
                               */
                              bool IsAlphabetNumericSymbol(const std::string& src)
                              {
                              	// Printable ASCII except space (0x21 to 0x7E); '[' must be escaped as "\\[".
                              	static const std::regex pattern("^[a-zA-Z0-9!-/:-@\\[-`{-~]+$");
                              	return std::regex_match(src, pattern);
                              }
                              
                              int main()
                              {
                              	// ASCII-only case
                              	std::cout << IsAlphabetNumericSymbol("abc012@") << std::endl;
                              
                              	// Contains non-ASCII characters
                              	std::cout << IsAlphabetNumericSymbol("1漢字A") << std::endl;
                              	return 0;
                              }
                              Tip 2.
                              "123漢字ABC" encoded as UTF-16 is 16 bytes (8 code units of 2 bytes each, not counting a byte order mark).
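The byte count can be checked with a few lines. This is a small sketch; `utf16_byte_count` is a hypothetical helper, and the two kanji are written as the escapes \u6F22 and \u5B57 so that the result does not depend on the encoding of the source file.

```cpp
#include <cstddef>
#include <string>

// Size in bytes of UTF-16 text, without byte order mark or terminator.
// Every UTF-16 code unit occupies 2 bytes in the encoded form.
std::size_t utf16_byte_count(const std::u16string& s)
{
    return s.size() * 2;
}

// "123" + U+6F22 + U+5B57 + "ABC": all eight characters are in plane 0.
const std::u16string sample = u"123\u6F22\u5B57ABC";
```

`utf16_byte_count(sample)` is 16. A character outside plane 0 needs a surrogate pair, so `utf16_byte_count(u"\U0001F600")` is 4 for a single character.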

                              Tip 3.
                              Mutual conversion UTF-8 <=> UTF-16
                              Code:
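                              #include <codecvt>
                              #include <locale>
                              #include <string>
                              
                              // Note: <codecvt> and std::wstring_convert are deprecated since C++17.
                              // codecvt_utf8_utf16 emits surrogate pairs, so planes 1 to 16 survive;
                              // codecvt_utf8<wchar_t> handles only UCS-2 where wchar_t is 16 bits (Windows).
                              inline std::wstring convertUtf8ToUtf16(char const* iString)
                              {
                                  std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
                                  return converter.from_bytes(iString);
                              }
                              
                              inline std::string convertUtf16ToUtf8(wchar_t const* iString)
                              {
                                  std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
                                  return converter.to_bytes(iString);
                              }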
                              Referenced URL.


                              I hope you find this information helpful.
