MSXML and UTF-8 chinese characters

  • K

    MSXML and UTF-8 chinese characters

    I have an XML file in UTF-8.
    It contains some Chinese characters (both simplified Chinese and
    traditional Chinese).

    When loading the XML file with the MSXML parser, I used the code below to
    retrieve the data in a node. The CString was then displayed in a CListCtrl.
    The traditional Chinese characters were shown correctly, but for the
    simplified characters I encountered many "?", although some characters were
    correct.

    if (MSXML::NODE_ELEMENT == pChild->nodeType)
    {
        MSXML::IXMLDOMNamedNodeMapPtr pAttrs = pChild->attributes;
        MSXML::IXMLDOMNodePtr pAttr;

        pAttr = pAttrs->getNamedItem(L"id");
        CString id = OLE2T(pAttr->text);

        MSXML::IXMLDOMNodePtr pWording = pChild->firstChild;
        CString wording = OLE2T(pWording->text);

        // add the wording to language
        pMessageLanguage->m_wordingList.insert(MessageWordingListPair(id, wording));
    }


  • Jochen Kalmbach

    #2
    Re: MSXML and UTF-8 chinese characters

    K wrote:
    > I have an XML file in UTF-8.
    > It contains some Chinese characters (both simplified Chinese and
    > traditional Chinese).
    >
    > When loading the XML file with the MSXML parser, I used the code below to
    > retrieve the data in a node. The CString was then displayed in a
    > CListCtrl. The traditional Chinese characters were shown
    > correctly, but for the simplified characters I encountered many "?",
    > although some characters were correct.

    You should compile with UNICODE and _UNICODE defined!
    Otherwise you have to convert the Unicode text to MBCS...

    --
    Greetings
    Jochen

    Do you need a memory-leak finder ?


    • MerkX Zyban

      #3
      Re: MSXML and UTF-8 chinese characters

      K,

      Does your XML file begin with the following line?

      <?xml version="1.0" encoding="UTF-8" ?>

      If not, add this line and see what happens. If you do have this line (or
      you add it) and still have problems, then you may be using characters that
      Windows cannot support or your fonts cannot display (e.g. traditional
      Chinese).

      Windows supports Unicode up to version 2.1 only. The XML parser converts
      your XML source to UTF-16 and parses it internally. When the XML parser sees
      the line above, it will convert your XML file from UTF-8 with no loss of
      information. However, without this line (specifically without the encoding
      clue) the system default ANSI code page will be used when converting to
      UTF-16.

      Even with this line, you may still have characters that your fonts can't
      display; however, no loss will occur in the conversion to/from UTF-8.
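
To make the UTF-8 to UTF-16 step concrete, here is a minimal sketch in portable C++. The helper name `utf8ToUtf16` is hypothetical and is not part of MSXML; the parser does this work internally. The sketch only shows that each UTF-8 sequence maps to exactly one code point, so nothing is lost at this stage. It assumes well-formed input and does not reject overlong encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only (not an MSXML function).
// Decodes UTF-8 bytes into UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t extra = 0;  // continuation bytes that must follow

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary plane: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the bytes E7 AE 80 (the simplified character U+7B80) decode to the single UTF-16 code unit 0x7B80. If the text is intact at this point, any "?" must come from a later conversion.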

      Hope this helps (and I hope I know what I'm talking about :-)

      -MerkX

      • K

        #4
        Re: MSXML and UTF-8 chinese characters

        My project is compiled as a UNICODE build, and my XML begins with the
        <?xml ... ?> line, but the problem still persists.

        After reading the node in MSXML, can I use the OLE2T macro and then assign
        the result to a CString?

        What does CString store internally? I'm using VS.NET to compile my
        projects.

        I can see and edit the XML file in Dreamweaver, so the fonts must be
        supported by my system. However, after loading the XML file with MSXML,
        getting the node, assigning it to a CString, and displaying it, the problem
        happens: some simplified Chinese characters become "?", but some are okay.

        • Mihai N.

          #5
          Re: MSXML and UTF-8 chinese characters

          > After reading in the node in MSXML, can I use the macro OLE2T then
          > assign it to a CString ??
          >
          > What does CString store internally ?? I'm using VS.NET to compile my
          > projects.

          CString stores ANSI in an ANSI application and Unicode in a UNICODE app.
          If your app is Unicode, there is no need to use

          But question marks are usually the result of bad code page conversions.
          Are you sure there are no conversions happening
          (maybe in m_wordingList.insert, or in MessageWordingListPair)?
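
To show how a code page conversion produces question marks, here is a small portable C++ sketch. It is an analogy, not the actual Windows conversion: ISO-8859-1 (Latin-1) stands in for whatever 8-bit or multi-byte code page a real conversion would target, and the function names `narrowToLatin1`, `widenFromLatin1`, and `survivesRoundTrip` are hypothetical. Any character with no slot in the target code page is replaced by "?", and the original character cannot be recovered afterwards.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: narrow UTF-16 text to Latin-1.
// Assumption: Latin-1 is only a stand-in for a real ANSI code page.
// Code units above 0xFF have no Latin-1 equivalent and become '?'.
std::string narrowToLatin1(const std::u16string& wide)
{
    std::string narrow;
    narrow.reserve(wide.size());
    for (char16_t unit : wide) {
        narrow.push_back(unit <= 0xFF ? static_cast<char>(unit) : '?');
    }
    return narrow;
}

// Hypothetical helper: widen Latin-1 bytes back to UTF-16.
// Every Latin-1 byte value equals its Unicode code point.
std::u16string widenFromLatin1(const std::string& narrow)
{
    std::u16string wide;
    wide.reserve(narrow.size());
    for (char c : narrow) {
        wide.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}

// True when the text passes through the narrow code page unchanged.
bool survivesRoundTrip(const std::u16string& wide)
{
    return widenFromLatin1(narrowToLatin1(wide)) == wide;
}
```

A round-trip check like `survivesRoundTrip` is one way to locate the lossy step: apply the equivalent test around each suspected conversion, and the first one that fails is where the "?" characters are introduced. That characters from one script survive while others do not is consistent with a conversion through a code page that covers only part of the text.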

          Mihai
