ReadString and Unicode

  • leejwen
    New Member
    • Jun 2007
    • 50

    ReadString and Unicode

    I'm confused about Unicode. Can I read either a Unicode or an ASCII text file using the same code below? If not, how can I do that? (Compiler: VC++ 2005)

    CFile fDict;  // must be opened (e.g. with CFile::Open) before use
    CArchive arDict(&fDict, CArchive::load);
    CString dictLine = _T("");
    while (arDict.ReadString(dictLine))
    {
    ...
    }
  • Laharl
    Recognized Expert Contributor
    • Sep 2007
    • 849

    #2
    This looks like C# to me, perhaps it belongs there?


    • leejwen
      New Member
      • Jun 2007
      • 50

      #3
      Originally posted by Laharl
      This looks like C# to me, perhaps it belongs there?
      This is pure C++ code. I expected ReadString to be able to identify whether the opened file is Unicode, but disappointingly it doesn't seem to, unless I'm mistaken.


      • Banfa
        Recognized Expert Expert
        • Feb 2006
        • 9067

        #4
        Originally posted by Laharl
        This looks like C# to me, perhaps it belongs there?
        Nope, but it's not pure C++ either. It is MFC (Microsoft Foundation Class library), a class library that is rather old and rather bloated and mainly just a thin wrapper around Win32.


        CString will support Unicode if your code is compiled with Unicode support; otherwise you might want to use CStringW.

        CArchive should also support Unicode characters if the system is being built with Unicode support.


        • leejwen
          New Member
          • Jun 2007
          • 50

          #5
          Originally posted by Banfa
          Nope, but it's not pure C++ either. It is MFC (Microsoft Foundation Class library), a class library that is rather old and rather bloated and mainly just a thin wrapper around Win32.


          CString will support Unicode if your code is compiled with Unicode support; otherwise you might want to use CStringW.

          CArchive should also support Unicode characters if the system is being built with Unicode support.
          It does support Unicode when you compile it that way. But my problem is that the resulting exe doesn't seem able to support both. For example, I need this exe to be able to open a Unicode file as well as an ASCII file, since it's said to be transparent. So, do I have to resort to IsTextUnicode after all?


          • weaknessforcats
            Recognized Expert Expert
            • Mar 2007
            • 9214

            #6
            Originally posted by leejwen
            For example, I need this exe to be able to open Unicode file as well as ASCII
            You will need code to do that. The Unicode/ASCII setting compiles your code one way or the other but not both.

            I suggest you a) create a .exe for Unicode, b) create another .exe for ASCII and then in your main program call CreateProcess and start the correct .exe.
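
A possible alternative to two executables, offered only as a rough sketch: read the file as raw bytes and decode it at run time into one in-memory form. The helper name `decodeToUtf16` is hypothetical (it is not part of MFC or Win32), and the sketch assumes that a leading FF FE marks UTF-16 little-endian and that anything else is 7-bit ASCII.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a whole file's bytes into UTF-16 code units.
// Assumptions: a leading FF FE means UTF-16 little-endian; any other input
// is treated as 7-bit ASCII (bytes >= 0x80 are replaced with '?').
std::u16string decodeToUtf16(const std::vector<unsigned char>& bytes)
{
    std::u16string out;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        // Combine each byte pair (low byte first), skipping the 2-byte mark.
        for (std::size_t i = 2; i + 1 < bytes.size(); i += 2)
            out.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    } else {
        // Widen each ASCII byte to one UTF-16 code unit.
        for (unsigned char b : bytes)
            out.push_back(b < 0x80 ? static_cast<char16_t>(b) : u'?');
    }
    return out;
}
```

With this approach the build setting only decides how the program stores text internally; the file's encoding is decided per file when it is opened.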


            • leejwen
              New Member
              • Jun 2007
              • 50

              #7
              Originally posted by weaknessforcats
              You will need code to do that. The Unicode/ASCII setting compiles your code one way or the other but not both.

              I suggest you a) create a .exe for Unicode, b) create another .exe for ASCII and then in your main program call CreateProcess and start the correct .exe.
              Sorry for the late reply. Is this the way current software handles this problem? It seems I'll blame Microsoft again. I don't think it would be difficult to have ReadString() make that determination itself.


              • JosAH
                Recognized Expert MVP
                • Mar 2007
                • 11453

                #8
                Originally posted by leejwen
                Sorry for the late reply. Is this the way current software handles this problem? It seems I'll blame Microsoft again. I don't think it would be difficult to have ReadString() make that determination itself.
                This is one of the rare occasions where you can't blame Microsoft because it is
                extremely difficult or impossible to 'guess' the encoding of a character set.
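
To illustrate why guessing is fragile, here is a rough sketch of one possible heuristic. It is a made-up example (the name `looksLikeUtf16LE` and the threshold are assumptions) and is not the algorithm IsTextUnicode uses. It relies on the fact that UTF-16LE text made mostly of Latin characters has a zero in every second byte, while plain ASCII text has no zero bytes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical heuristic, NOT what IsTextUnicode does internally.
// Guesses UTF-16LE when many odd-indexed (high) bytes are zero.
bool looksLikeUtf16LE(const std::vector<unsigned char>& bytes)
{
    // UTF-16 data always has an even, non-zero length.
    if (bytes.size() < 2 || bytes.size() % 2 != 0)
        return false;

    std::size_t zeroHighBytes = 0;
    for (std::size_t i = 1; i < bytes.size(); i += 2)
        if (bytes[i] == 0x00)
            ++zeroHighBytes;

    const std::size_t pairs = bytes.size() / 2;
    // Arbitrary threshold: at least half of the pairs have a zero high byte.
    return zeroHighBytes * 2 >= pairs;
}
```

The guess breaks down easily: UTF-16 text in a non-Latin script has few zero high bytes, so it would be misread as a single-byte file.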

                We're in the middle of a transition phase, i.e. we still use bytes as the smallest
                addressable memory unit but we want to use Unicode characters that don't fit
                in a single byte. We have to encode those characters. Many encoding schemes
                exist (UTF-8 and UTF-16 being the most 'universal' for now) but we have to
                explicitly know the encoding that was used in order to properly decode those
                byte streams.

                A convention is to prepend the stream with a byte order mark (the bytes
                0xFF 0xFE or 0xFE 0xFF) that tells the endianness of the encoding, but that's
                about it, and even that convention isn't used consistently all over the place.
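
The byte order mark convention can be sketched as below. The `Bom` enum and `detectBom` function are hypothetical names for illustration, not part of MFC or Win32. The UTF-8 mark (EF BB BF) is included as an extra case; a file without any mark is reported as `None` and its encoding remains unknown.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a buffer and report which mark, if any, is present.
// Note: a UTF-32 little-endian mark (FF FE 00 00) would be reported as Utf16LE
// here; this sketch does not try to tell the two apart.
Bom detectBom(const std::vector<unsigned char>& bytes)
{
    const std::size_t n = bytes.size();
    if (n >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return Bom::Utf8;
    if (n >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return Bom::Utf16LE;   // U+FEFF stored low byte first
    if (n >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return Bom::Utf16BE;   // U+FEFF stored high byte first
    return Bom::None;
}
```

A caller would read the first few bytes of the file, call `detectBom`, and then pick a decoder accordingly, falling back to a default or a heuristic when the result is `Bom::None`.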

                kind regards,

                Jos


                • leejwen
                  New Member
                  • Jun 2007
                  • 50

                  #9
                  Originally posted by JosAH
                  This is one of the rare occasions where you can't blame Microsoft because it is
                  extremely difficult or impossible to 'guess' the encoding of a character set.

                  We're in the middle of a transition phase, i.e. we still use bytes as the smallest
                  addressable memory unit but we want to use Unicode characters that don't fit
                  in a single byte. We have to encode those characters. Many encoding schemes
                  exist (UTF-8 and UTF-16 being the most 'universal' for now) but we have to
                  explicitly know the encoding that was used in order to properly decode those
                  byte streams.

                  A convention is to prepend the stream with a byte order mark (the bytes
                  0xFF 0xFE or 0xFE 0xFF) that tells the endianness of the encoding, but that's
                  about it, and even that convention isn't used consistently all over the place.

                  kind regards,

                  Jos
                  Thank you! This is helpful.
