char * to wchar_t * conversion

  • Prasad1980
    New Member
    • Aug 2012
    • 2


    Hi all,

    I am new to this forum. I am developing an application in which
    I need to parse the Oracle literal below, extract the Unicode (UTF-16) code units from it,
    and convert them all to UTF-8 multibyte sequences:
    UNISTR('\9091\0908\090Aabc\909B')
    The problem I am facing is:

    1. I have the data in a char *string. I have prefixed \x to each of the Unicode escapes in the above line,
    like this:
    string [i] = '//';
    string [i + 1 ] = 'x';

    The modified string is now /x9091/x0909/x090Aabc/x909B.
    I pass this string to

    mbstowcs(&wchar_ptr, char_ptr, length);

    and then pass wchar_ptr to

    wcstombs(output_char_ptr, wchar_ptr, length);

    But I am not getting the actual Unicode characters or the equivalent UTF-8 bytes. I get back the same text as above:
    /x9091/x0909/x090Aabc/x909B.

    But if I write wchar_ptr = L"/x9091/x0909/x090Aabc/x909B"; it works. The constant works.

    But I want to pass dynamic data. How can I do this?

    Why is wcstombs not converting the above code points to the corresponding UTF-8 characters?


    Am I doing anything wrong here? Please help me.
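A minimal sketch reproducing the behaviour described above, assuming the program runs in the default "C" locale (no setlocale call) and that the input is plain ASCII. The helper name round_trip and the buffer size WIDE_CAP are hypothetical. It passes runtime-built text through the same two calls, mbstowcs and wcstombs:

```c
#include <stdlib.h> /* mbstowcs, wcstombs, size_t, wchar_t */

#define WIDE_CAP 256 /* hypothetical buffer size */

/* Widen `in` with mbstowcs, then narrow it back into `out` with wcstombs,
   mirroring the two calls in the question. Returns the number of wide
   characters produced, or (size_t)-1 on a conversion error or overflow. */
size_t round_trip(const char *in, char *out, size_t out_size)
{
    wchar_t wide[WIDE_CAP];

    size_t nw = mbstowcs(wide, in, WIDE_CAP);
    if (nw == (size_t)-1 || nw == WIDE_CAP)
        return (size_t)-1; /* invalid input, or no room for the L'\0' */

    size_t nb = wcstombs(out, wide, out_size);
    if (nb == (size_t)-1 || nb >= out_size)
        return (size_t)-1; /* unconvertible character, or no room for '\0' */

    /* Both functions convert character by character: a backslash followed
       by "x9091" is still six separate characters after the round trip. */
    return nw;
}
```

Called on the nine characters \x9091abc built at run time, it reports nine wide characters and hands back the same text. The two functions only change the representation of existing characters; neither one interprets \x sequences.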
  • weaknessforcats
    Recognized Expert Expert
    • Mar 2007
    • 9214

    #2
    Code:
    string [i] = '//';   <<<<<----------!
     string [i + 1 ] = 'x';
    You cannot use a string object for Unicode. A string object holds char elements (CHAR at Microsoft), whereas a Unicode string holds wchar_t elements (WCHAR at Microsoft).

    The C++ equivalent is wstring, which has wchar_t elements.

    Try using that and post again. If you are using Windows, create your Unicode strings with a Microsoft allocator such as SysAllocString and avoid the generic C++ string and wstring objects.

    There's a whole chapter about this in the book Windows via C/C++ by Jeffrey Richter.


    • Prasad1980
      New Member
      • Aug 2012
      • 2

      #3
      Thanks, weaknessforcats, for your reply.

      Unfortunately, I am on Linux. I don't know the magic behind L"", but it works only for static strings. It works for me, but I want something dynamic. I have built the string to look like a sequence of hex escapes. If I put it inside L"" and declare it statically, it works and I see the (Unicode) characters. If I pass it through a string without L"", it doesn't work. Is there a way to do this on Linux?


      • weaknessforcats
        Recognized Expert Expert
        • Mar 2007
        • 9214

        #4
        The L is an instruction to the compiler to generate wchar_t for your literal instead of char. It is applied at compile time, together with any \x escapes inside the literal, so it is no wonder it does not work at run time. In any case, I advise against using L because, while it uses 16 bits for each character (on Windows; with gcc on Linux wchar_t is 32 bits), it does not encode the character correctly in all cases.
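To illustrate the compile-time versus run-time distinction, a small sketch (the names compiled, runtime_length and compiled_length are hypothetical): the escape inside the L"" literal is decoded by the compiler, while the same text assembled at run time stays as plain characters.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* The compiler turns the escape in this literal into ONE wchar_t whose
   value is 0x9091; the backslash and hex digits never exist at run time. */
const wchar_t compiled[] = L"\x9091";

/* Number of wide characters in the compiled literal. */
size_t compiled_length(void)
{
    return wcslen(compiled);
}

/* The same text assembled at run time is six ordinary characters. */
size_t runtime_length(void)
{
    char buf[8];
    buf[0] = '\\';
    memcpy(buf + 1, "x9091", 6); /* five characters plus the '\0' */
    return strlen(buf);
}
```

compiled_length() returns 1 and runtime_length() returns 6. The width of wchar_t itself differs by platform: 16 bits with Microsoft's compiler, 32 bits with gcc on Linux.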

        Here's a place to start: http://hektor.umcs.lublin.pl/~mikosm...x-unicode.html

        A word of caution: you need to use already-written API functions to parse your Unicode strings, unless it is your intent to rewrite the Universe.

        I would search the web for an example I could copy, then change the variable names to my own or perhaps embed the example inside a function of my own.
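One already-written API available on Linux is iconv(3). The sketch below is a hypothetical helper, not a tested recipe. It assumes the input is the body of an Oracle UNISTR literal in which a backslash followed by exactly four hex digits is one UTF-16 code unit and every other character is ASCII. It does not handle UNISTR's doubled backslash for a literal backslash. The escapes are decoded at run time into UTF-16BE bytes, and iconv produces the UTF-8:

```c
#include <ctype.h>
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: turn "\XXXX" escapes into big-endian UTF-16 code
   units; any other character is copied as one code unit (ASCII assumed). */
static size_t unistr_to_utf16be(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (*s) {
        unsigned long unit;
        if (s[0] == '\\' && isxdigit((unsigned char)s[1]) &&
            isxdigit((unsigned char)s[2]) && isxdigit((unsigned char)s[3]) &&
            isxdigit((unsigned char)s[4])) {
            char hex[5] = { s[1], s[2], s[3], s[4], '\0' };
            unit = strtoul(hex, NULL, 16); /* exactly four hex digits */
            s += 5;
        } else {
            unit = (unsigned char)*s++;
        }
        if (n + 2 > cap)
            return (size_t)-1;
        out[n++] = (unsigned char)(unit >> 8);
        out[n++] = (unsigned char)(unit & 0xFF);
    }
    return n;
}

/* Convert the UNISTR body to a NUL-terminated UTF-8 string with iconv.
   Returns the number of UTF-8 bytes written, or (size_t)-1 on failure. */
size_t unistr_to_utf8(const char *s, char *out, size_t cap)
{
    unsigned char utf16[512];
    size_t inleft = unistr_to_utf16be(s, utf16, sizeof utf16);
    if (inleft == (size_t)-1 || cap == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("UTF-8", "UTF-16BE"); /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)utf16;
    char *dst = out;
    size_t outleft = cap - 1; /* keep room for the terminator */
    size_t rc = iconv(cd, &in, &inleft, &dst, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *dst = '\0';
    return (size_t)(dst - out);
}
```

For the input \9091\0908\090Aabc\909B it should produce 15 bytes. Each of the four escaped code points takes three bytes (U+9091 becomes E9 82 91), plus one byte each for a, b and c. Because iconv is handed UTF-16, surrogate pairs are combined into single four-byte UTF-8 sequences without extra code.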

        Some Unicode encodings do not use 16 bits for each character but a variable number of bits depending on the character (UTF-8 uses 8 to 32, UTF-16 uses 16 or 32). Unless you know the encoding, you simply cannot read the string. That's why you need an already-written library of functions.
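To make the variable-width point concrete, here is a sketch of encoding one code point as UTF-8. The function name utf8_encode is hypothetical, and a real program would normally call a library such as iconv instead:

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into buf (at least 4 bytes).
   Returns the number of bytes used (1 to 4), or 0 for values that are
   not valid scalar values (surrogate halves or above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    if (cp < 0x80) {                 /* 7 bits: one byte */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 11 bits: two bytes */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                    /* UTF-16 surrogates are not characters */
    if (cp < 0x10000) {              /* 16 bits: three bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {            /* 21 bits: four bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this encoder, 'a' takes one byte, U+00E9 takes two (C3 A9), U+9091 takes three (E9 82 91), and U+1F600 takes four (F0 9F 98 80).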
        Last edited by weaknessforcats; Aug 7 '12, 01:43 AM. Reason: typo

