unicode (UCS-2 encoded)

  • wael

    unicode (UCS-2 encoded)

    Hello all,
    I want to convert a wchar_t to its UCS-2 encoding (for example 0041, which is a character encoded in UCS-2).
    Please look at this:



    Every language has a chart.
    For example:
    char 'A' = 0041 (UCS-2 encoded)

    a character from another language = 0628
    I hope you understand what I mean.



    --
    Thank you for taking the time to read this.
    best regards
    wael ahmed
  • Jason

    #2
    Re: unicode (UCS-2 encoded)

    Do you mean that you want to convert locale-specific strings (ASCII,
    UTF-8, Big5, etc.) into Unicode UCS-2 two-byte entities, and then store them in
    a wchar_t?

    When porting a web browser, we had a type tChar16 to put Unicode into; it's
    worthwhile using typedefs anyway, even though C++ has wchar_t. We wrote our
    own language conversion libraries; it depends on what you want to do. It's
    probably not so much of a C++ question. Look up internationalization on the
    web, in relation to C++.



    • wael

      #3
      Re: unicode (UCS-2 encoded)

      "Jason" <@> wrote in message news:<3f468dcd@shknews01>...
      > Do you mean that you want to convert locale-specific strings (ASCII,
      > UTF-8, Big5, etc.) into Unicode UCS-2 two-byte entities, and then store them in
      > a wchar_t?
      >
      > When porting a web browser, we had a type tChar16 to put Unicode into; it's
      > worthwhile using typedefs anyway, even though C++ has wchar_t. We wrote our
      > own language conversion libraries; it depends on what you want to do. It's
      > probably not so much of a C++ question. Look up internationalization on the
      > web, in relation to C++.


      I have a wchar_t character.
      Suppose it is 16 bits wide, like this:

      wchar_t ch = L'A';
      // 'A' encoded in UCS-2 is 0041
      How do I get ch as 2 UCS-2 encoded bytes?


      Some data is stored in the database as UCS-2:
      0041 0065 etc ... 0628 0627 ....
      Thank you for taking the time to read this.


      • John Harrison

        #4
        Re: unicode (UCS-2 encoded)


        "wael" <twaeltt@hotmail.com> wrote in message
        news:f7db3cfc.0308240115.59f86b1@posting.google.com...
        > "Jason" <@> wrote in message news:<3f468dcd@shknews01>...
        > > Do you mean that you want to convert locale-specific strings (ASCII,
        > > UTF-8, Big5, etc.) into Unicode UCS-2 two-byte entities, and then store them in
        > > a wchar_t?
        > >
        > > When porting a web browser, we had a type tChar16 to put Unicode into; it's
        > > worthwhile using typedefs anyway, even though C++ has wchar_t. We wrote our
        > > own language conversion libraries; it depends on what you want to do. It's
        > > probably not so much of a C++ question. Look up internationalization on the
        > > web, in relation to C++.
        >
        >
        > I have a wchar_t character.
        > Suppose it is 16 bits wide, like this:
        >
        > wchar_t ch = L'A';
        > // 'A' encoded in UCS-2 is 0041
        > How do I get ch as 2 UCS-2 encoded bytes?
        >
        >
        > Some data is stored in the database as UCS-2:
        > 0041 0065 etc ... 0628 0627 ....
        > Thank you for taking the time to read this.

        Like this?

        wchar_t ch = L'A';
        unsigned char bytes[2]; // unsigned, so byte values above 0x7F are stored intact
        bytes[0] = ch / 256; // high byte: bytes[0] == 0x00
        bytes[1] = ch % 256; // low byte:  bytes[1] == 0x41

        Still not completely clear what you are trying to do.

        john
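
The same high-byte/low-byte split can be applied to a whole string, for instance when writing UCS-2 values such as 0041 0628 to a database. The following is only a sketch under stated assumptions: the names to_ucs2_be and from_ucs2_be are hypothetical, the byte order is assumed to be big-endian (high byte first, as in the 0041 notation used in this thread), and every character is assumed to fit in 16 bits.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Serialize each character as two bytes, high byte first (big-endian),
// so L'A' becomes 0x00 0x41 and U+0628 becomes 0x06 0x28.
// Assumption: every character is in the Basic Multilingual Plane (<= 0xFFFF).
std::vector<unsigned char> to_ucs2_be(const std::wstring& text)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() * 2);
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        unsigned long code = static_cast<unsigned long>(text[i]) & 0xFFFFul;
        bytes.push_back(static_cast<unsigned char>(code >> 8));   // high byte
        bytes.push_back(static_cast<unsigned char>(code & 0xFF)); // low byte
    }
    return bytes;
}

// Inverse: rebuild a wide string from big-endian UCS-2 bytes.
// A trailing odd byte, if any, is ignored.
std::wstring from_ucs2_be(const std::vector<unsigned char>& bytes)
{
    std::wstring text;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        text.push_back(static_cast<wchar_t>((bytes[i] << 8) | bytes[i + 1]));
    }
    return text;
}
```

If the other end of the connection expects the low byte first, the two push_back calls (and the shift in the decoder) would have to be swapped.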



        • Jason

          #5
          Re: unicode (UCS-2 encoded)

          I am afraid I don't understand the question either. Perhaps you can tell us
          what data you're reading, whether it's in Unicode or ASCII, and what you want to
          do with it, instead of talking about wchar_t and UCS-2 straight away.



          • John Harrison

            #6
            Re: unicode (UCS-2 encoded)


            "wael" <twaeltt@hotmail.com> wrote in message
            news:f7db3cfc.0308240536.2fc85fcd@posting.google.com...
            > "John Harrison" <john_andronicus@hotmail.com> wrote in message
            > news:<bia0qk$6vmre$1@ID-196037.news.uni-berlin.de>...
            > > "wael" <twaeltt@hotmail.com> wrote in message
            > > news:f7db3cfc.0308240115.59f86b1@posting.google.com...
            > > > "Jason" <@> wrote in message news:<3f468dcd@shknews01>...
            > > > > Do you mean that you want to convert locale-specific strings (ASCII,
            > > > > UTF-8, Big5, etc.) into Unicode UCS-2 two-byte entities, and then store them in
            > > > > a wchar_t?
            > > > >
            > >
            > > Like this?
            > >
            > > wchar_t ch = L'A';
            > > unsigned char bytes[2]; // unsigned, so byte values above 0x7F are stored intact
            > > bytes[0] = ch / 256; // high byte: bytes[0] == 0x00
            > > bytes[1] = ch % 256; // low byte:  bytes[1] == 0x41
            > >
            > > Still not completely clear what you are trying to do.
            > >
            > > john
            >
            > Thank you for the help.
            > Let me be more clear:
            > 1- I receive text from an ASCII socket like this ( i will assume it is on
            > {0041 , 0628 , 0627 , 0042}
            > Each group of 4 characters is a UCS-2 code in hex, as defined in
            > http://www.unicode.org/charts/
            > The 00 prefix is for English; 41 is the character as defined in the chart
            > http://www.unicode.org/charts/PDF/U0000.pdf
            > The 06 prefix is for Arabic; 28 is the character as defined in the chart
            > http://www.unicode.org/charts/PDF/U0600.pdf
            > I also receive data for other languages, such as Greek.
            >
            > I want to convert the incoming string to wchar_t, and to convert wchar_t back
            > in order to send it by socket.
            >

            OK, try again, perhaps like this?

            ascii_data is your string of Unicode numbers separated by commas, with a
            leading { and trailing }, i.e. what you read from the socket. At the end,
            unicode_data is a wide string of Unicode characters; you can write that to
            your other socket.

            #include <algorithm>
            #include <istream>
            #include <sstream>
            #include <string>

            int main()
            {
                std::string ascii_data = "{0041 , 0628 , 0627 , 0042}";
                // remove leading and trailing {}
                ascii_data.erase(ascii_data.begin());
                ascii_data.erase(ascii_data.end() - 1);
                // replace commas with spaces
                std::replace(ascii_data.begin(), ascii_data.end(), ',', ' ');
                // use string as stream
                std::istringstream str(ascii_data);
                // read hex numbers from stream
                std::wstring unicode_data;
                unsigned char_value;
                while (str >> std::hex >> char_value)
                {
                    unicode_data.push_back(static_cast<wchar_t>(char_value));
                }
            }

            john
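
For the opposite direction mentioned in the question (from wchar_t back to text that can be sent over the ASCII socket), a possible sketch is shown here. It assumes the outgoing format is the same as the incoming one, i.e. four hex digits per character, separated by " , " and wrapped in braces; the function name format_code_units is hypothetical.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Format each wide character as a 4-digit uppercase hex number,
// e.g. L'A' -> "0041", and wrap the list as "{0041 , 0628}".
// Assumption: every character fits in 16 bits (UCS-2).
std::string format_code_units(const std::wstring& text)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    out << '{';
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (i != 0)
        {
            out << " , ";
        }
        // setw applies only to the next item written, so it is set per number
        out << std::setw(4) << (static_cast<unsigned long>(text[i]) & 0xFFFFul);
    }
    out << '}';
    return out.str();
}
```

The mask to 16 bits keeps the output at four digits even on platforms where wchar_t is wider than 16 bits.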



            • John Harrison

              #7
              Re: unicode (UCS-2 encoded)


              "wael" <twaeltt@hotmail.com> wrote in message
              news:f7db3cfc.0308250449.3c9f64a@posting.google.com...
              > Sorry,
              > // byte order
              > that was by mistake.
              > // I am sorry if I cannot make myself understood; it may be because of my poor English.
              > I am trying to encode a wchar_t according to 'The Unicode Standard and ISO/IEC 10646'.
              > This is the format I need.
              > Please look at:
              > Figure 1-1. Wide ASCII
              > http://www.unicode.org/book/uc20ch1.html
              > The picture displays how characters look in binary format.
              > You will find 'A' is '0000 0000 0100 0001' = (in hex) '0 0 4 1'.
              > The other character further down (an Arabic character) is '0000 0110 0011 0011' = in hex '0 6 3 3'.
              > The problem is that when I get the binary string of this Arabic character,
              > it is '0000 0000 1101 0011', which != '0000 0110 0011 0011'.
              > So I must convert '0000 0000 1101 0011' to ISO/IEC 10646.
              >
              >
              >
              > Sorry for disturbing you, John,
              > and thank you for trying to help me.
              >
              > Thanks

              Wael, at last I think I understand you!

              You have characters encoded in one character set and you want to convert
              them to Unicode. Do you know which character set you have currently? I think
              there are two commonly used character sets for Arabic: one is ISO-8859-6,
              the other is CP1256. Do you know which you have?

              Whatever you have, it should just be a simple matter of setting up a table
              to convert between them.

              Here is a table that converts CP1256 to Unicode



              and here's a table that converts ISO-8859-6 to Unicode



              Hope this helps.
              john
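
As an illustration of the table approach for the ISO-8859-6 case, here is a minimal sketch. The function names are hypothetical, and the mapping is written from the published ISO-8859-6 layout as I understand it (Arabic letters at 0xC1..0xDA and 0xE0..0xF2, each offset by 0x0560 from its Unicode code point), so it should be checked against the official mapping table before use. CP1256 has a less regular layout and would need its full table typed in instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Build a 256-entry lookup table: index = ISO-8859-6 byte, value = Unicode code point.
// Bytes that ISO-8859-6 leaves unassigned map to U+FFFD (REPLACEMENT CHARACTER).
std::vector<wchar_t> make_iso8859_6_table()
{
    std::vector<wchar_t> table(256, static_cast<wchar_t>(0xFFFD));
    for (int b = 0x00; b <= 0xA0; ++b)
        table[b] = static_cast<wchar_t>(b);          // ASCII, C1 controls, no-break space
    table[0xA4] = 0x00A4;                            // currency sign
    table[0xAD] = 0x00AD;                            // soft hyphen
    table[0xAC] = 0x060C;                            // Arabic comma
    table[0xBB] = 0x061B;                            // Arabic semicolon
    table[0xBF] = 0x061F;                            // Arabic question mark
    for (int b = 0xC1; b <= 0xDA; ++b)
        table[b] = static_cast<wchar_t>(b + 0x0560); // U+0621..U+063A
    for (int b = 0xE0; b <= 0xF2; ++b)
        table[b] = static_cast<wchar_t>(b + 0x0560); // U+0640..U+0652
    return table;
}

// Convert a string of ISO-8859-6 bytes to a wide string of Unicode code points.
std::wstring iso8859_6_to_unicode(const std::string& bytes)
{
    static const std::vector<wchar_t> table = make_iso8859_6_table();
    std::wstring text;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        text.push_back(table[static_cast<unsigned char>(bytes[i])]);
    }
    return text;
}
```

With this table, the byte 0xD3 ('0000 0000 1101 0011' in the question) comes out as 0x0633, which is the value the Unicode chart shows.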



              You have characters which are encoded using ISO-8859-6 (Arabic)



              • John Ericson

                #8
                Re: unicode (UCS-2 encoded)

                "wael" <twaeltt@hotmail.com> wrote in message
                news:f7db3cfc.0308250449.3c9f64a@posting.google.com...
                > Sorry,
                > // byte order
                > that was by mistake.
                > // I am sorry if I cannot make myself understood; it may be because of my poor English.
                > I am trying to encode a wchar_t according to 'The Unicode Standard and ISO/IEC 10646'.
                > This is the format I need.
                > Please look at:
                > Figure 1-1. Wide ASCII
                > http://www.unicode.org/book/uc20ch1.html
                > The picture displays how characters look in binary format.
                > You will find 'A' is '0000 0000 0100 0001' = (in hex) '0 0 4 1'.
                > The other character further down (an Arabic character) is '0000 0110 0011 0011' = in hex '0 6 3 3'.
                > The problem is that when I get the binary string of this Arabic character,
                > it is '0000 0000 1101 0011', which != '0000 0110 0011 0011'.
                > So I must convert '0000 0000 1101 0011' to ISO/IEC 10646.

                <snip>

                You might want to check out
                http://www.dinkumware.com/libDCorX.html CoreX library
                character set converters, google on Plauger and/or Pete
                Becker IIRC for more info in this newsgroup (apologies if
                I'm misunderstanding what you're trying to do).

                - -
                Best Regards, John E.



                • wael

                  #9
                  Re: unicode (UCS-2 encoded)

                  "John Ericson" <jericson@pacbell.net> wrote in message news:<W8p2b.5781$Qw7.3083@newssvr25.news.prodigy.com>...
                  > "wael" <twaeltt@hotmail.com> wrote in message
                  > news:f7db3cfc.0308250449.3c9f64a@posting.google.com...
                  > > Sorry,
                  > > // byte order
                  > > that was by mistake.
                  > > // I am sorry if I cannot make myself understood; it may be because of my poor English.
                  > > I am trying to encode a wchar_t according to 'The Unicode Standard and ISO/IEC 10646'.
                  > > This is the format I need.
                  > > Please look at:
                  > > Figure 1-1. Wide ASCII
                  > > http://www.unicode.org/book/uc20ch1.html
                  > > The picture displays how characters look in binary format.
                  > > You will find 'A' is '0000 0000 0100 0001' = (in hex) '0 0 4 1'.
                  > > The other character further down (an Arabic character) is '0000 0110 0011 0011' = in hex '0 6 3 3'.
                  > > The problem is that when I get the binary string of this Arabic character,
                  > > it is '0000 0000 1101 0011', which != '0000 0110 0011 0011'.
                  > > So I must convert '0000 0000 1101 0011' to ISO/IEC 10646.
                  >
                  > <snip>
                  >
                  > You might want to check out
                  > http://www.dinkumware.com/libDCorX.html CoreX library
                  > character set converters, google on Plauger and/or Pete
                  > Becker IIRC for more info in this newsgroup (apologies if
                  > I'm misunderstanding what you're trying to do).
                  >
                  > - -
                  > Best Regards, John E.
                  Thank you, John E., for trying to help.
                  http://www.dinkumware.com/libDCorX.html CoreX library
                  Is this library free? And does it work with VC++ 6?

                  Thanks
