Unicode Characters Size

  • Gobi Sakthivel
    New Member
    • Jan 2013
    • 26

    Unicode Characters Size

    Code:
    std::string str ="����������";
    std::cout<<str.length()<<std::endl;

    This code prints 20, so each Unicode character seems to take 4 bytes, but the equivalent code in Java returns 10, as if each Unicode character took 2 bytes.

    My question: why do Java and C++ report different sizes for the same characters? If it comes down to their own implementations, how can I convert C++ Unicode strings into strings that Java can read?

    Eg:
    Code:
    std::string cpluscplustoJava(std::string str)
    {
    /**
     * This function should return the same
     * string with a length of 10, as in Java. Is that possible?
     */
    }
  • Gobi Sakthivel
    New Member
    • Jan 2013
    • 26

    #2
    Actually, my string contains only 5 characters, each a square-shaped Unicode character; the bytes.com viewer shows them as 10 '?' characters. Don't be confused by that.

    • weaknessforcats
      Recognized Expert Expert
      • Mar 2007
      • 9214

      #3
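      std::string works in bytes, so length() counts bytes, not characters. Unicode characters are not a fixed 16 bits (2 bytes): code points run up to U+10FFFF, and the number of bytes each one takes depends on the encoding.

      For wide characters, instead of char you use wchar_t, and instead of string and cout you use wstring and wcout. The w is for "wide character". Note that wchar_t is 16 bits on Windows but typically 32 bits on Linux.

      Only if the bit pattern in the wchar_t matches a valid Unicode character do you have a Unicode character. Be advised that the mapping (called the encoding) varies in Unicode: UTF-8 uses 1 to 4 bytes per character, while UTF-16 uses 2 or 4 bytes. Five characters that need 4 bytes each in UTF-8 make 20 bytes in C++, and the same five need two 16-bit units each in UTF-16, which is the 10 that Java's length() reports.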
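      The difference between the two lengths can be checked with a minimal sketch like the one below. It assumes the std::string holds valid UTF-8, and it uses U+1F7E5 (a square-shaped character) as a stand-in, since the original characters did not survive the forum viewer. The helper names count_code_points, count_utf16_units, and five_squares are made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Counts the 16-bit units the same text would need in UTF-16,
// which is what Java's String.length() reports. Assumes valid UTF-8.
std::size_t count_utf16_units(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) == 0x80) continue;  // continuation byte, already counted
        n += (c >= 0xF0) ? 2 : 1;          // a 4-byte sequence needs a surrogate pair
    }
    return n;
}

// Stand-in sample (assumption): five copies of U+1F7E5 encoded as UTF-8.
std::string five_squares() {
    std::string s;
    for (int i = 0; i < 5; ++i) s += "\xF0\x9F\x9F\xA5";
    return s;
}
```

      With this sample, five_squares().length() is 20 (bytes), count_code_points gives 5, and count_utf16_units gives 10.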

      On Windows you convert from wide characters with WideCharToMultiByte() and in the other direction with MultiByteToWideChar().

      You usually need to call OS functions to create Unicode strings.
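      Where OS functions are not available, the conversion can also be written by hand. The following is only a sketch under stated assumptions: the input is well-formed UTF-8 (nothing is validated), and utf8_to_utf16 is a hypothetical helper name, not a standard library function. The resulting 16-bit units are what a Java String stores internally.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
// Assumes well-formed input; malformed bytes are not detected.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;      // decoded code point
        std::size_t len;  // length of this UTF-8 sequence in bytes
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Fold in the low 6 bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

      For a 4-byte character such as U+1F7E5 the result holds two units, so five such characters give a std::u16string of size 10, matching the Java length.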

      This is a bigger topic than can be addressed in a thread like this.

      • Gobi Sakthivel
        New Member
        • Jan 2013
        • 26

        #4
        @weaknessforcats Thanks for your reply. Can you give me a link to help me understand Unicode in C++ much better?

        • weaknessforcats
          Recognized Expert Expert
          • Mar 2007
          • 9214

          #5
          Try this: http://www.cprogramming.com/tutorial/unicode.html
