I'm on a Solaris 9 Japanese machine w/ an Ultra 5 SPARC CPU, using the Xerces-C++ 2.6 DOM.
I've got a document in UTF-8 format:
<?xml version="1.0" encoding="UTF-8"?>
<Name>ja_alert-\343\201\250\343\201\241\343\201\244\343\201\252\343\201\256\343\201\253</Name>
(I'm not sure whether the Japanese came through correctly here, but everything after
ja_alert- is UTF-8-encoded Japanese.)
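As a quick standalone check that 15 is the right count, here is a small sketch (C++11, standard library only, no Xerces; decode_utf8 and kName are illustrative names, not Xerces API) that decodes the bytes above. ja_alert- is 9 one-byte ASCII characters and the Japanese is six 3-byte sequences, so the 27 bytes decode to 15 code points. All of them are in the Basic Multilingual Plane, so that is also 15 UTF-16 code units, which is what an XMLCh buffer holds.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// The <Name> text from the document, written with the same octal escapes
// (9 ASCII bytes followed by six 3-byte UTF-8 sequences).
const std::string kName =
    "ja_alert-"
    "\343\201\250\343\201\241\343\201\244"
    "\343\201\252\343\201\256\343\201\253";

// Decode UTF-8 bytes into Unicode code points.
// Sketch only: rejects bad lead/continuation bytes and truncated sequences,
// but does not check for overlong forms or encoded surrogates.
std::vector<char32_t> decode_utf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        char32_t cp;
        if (b < 0x80)                { len = 1; cp = b; }         // ASCII
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }  // the Japanese here
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > bytes.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) throw std::runtime_error("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

The six Japanese code points come out as U+3068, U+3061, U+3064, U+306A, U+306E and U+306B.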
When I extract the text node I get an XMLCh* that claims to be 15 characters
long. However, when I transcode it to a char*, all the Japanese is dropped and
the result is only 9 characters long.
// transcode() allocates a new char* buffer each time; free it with XMLString::release()
char* value = XMLString::transcode(pNode->getNodeValue());
char* name  = XMLString::transcode(pNode->getNodeName());
cout << "original length is " << strlen(value) << endl;
cout << "Its a text named " << name
     << " value " << value
     << " size is " << XMLString::stringLen(pNode->getNodeValue())
     << endl;
XMLString::release(&name);
XMLString::release(&value);
I get back...
original length is 9
Its a text named #text value ja_alert- size is 15
(Notice that the Japanese is gone.)
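The two numbers measure different things: stringLen() counts UTF-16 code units in the XMLCh buffer, while strlen() counts bytes in whatever transcode() produced for the local code page. If the goal is simply to get UTF-8 out regardless of the locale, one option is to encode the UTF-16 data by hand. This is a sketch, not Xerces API: utf16_to_utf8 is a hypothetical helper, and std::u16string stands in for an XMLCh* buffer on the assumption that XMLCh holds UTF-16.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Encode UTF-16 code units (what an XMLCh buffer is assumed to hold) as UTF-8.
// Surrogate pairs are combined; unpaired surrogates are rejected.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a following low surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {                      // 1 byte: ASCII such as "ja_alert-"
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {              // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes: the Japanese in this document
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes: outside the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, utf16_to_utf8(u"ja_alert-\u3068") yields 12 bytes: ja_alert- followed by \343\201\250. To feed it from Xerces one would presumably copy the node value with something like std::u16string s(p, p + XMLString::stringLen(p)), where p is the XMLCh* from getNodeValue(); that glue is untested against Xerces 2.6.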
My locale looks like...
=> locale
LANG=ja
LC_CTYPE="ja"
LC_NUMERIC="ja"
LC_TIME="ja"
LC_COLLATE="ja"
LC_MONETARY="ja"
LC_MESSAGES="ja"
LC_ALL=
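One thing these variables do not do by themselves: a C or C++ program starts in the "C" locale and only adopts LANG/LC_* after it calls setlocale(LC_ALL, ""). Assuming this Xerces build's transcode() goes through the C library's multibyte conversion (an assumption; an ICU-based build may behave differently), a program that never calls setlocale would most likely be limited to ASCII output, which would fit the 9-character result. The sketch below (adopt_env_locale_codeset is a hypothetical helper; setlocale and nl_langinfo(CODESET) are standard C/POSIX) reports which code set the process actually ends up with. On Solaris the plain "ja" locale is normally EUC-JP rather than UTF-8.

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Adopt the locale named by LANG / LC_* and return its character code set.
// Until setlocale(LC_ALL, "") runs, the process stays in the "C" locale no
// matter what the shell's `locale` command reports.
// Returns an empty string if the environment names a locale that is not installed.
std::string adopt_env_locale_codeset() {
    if (std::setlocale(LC_ALL, "") == nullptr) {
        return std::string();
    }
    // nl_langinfo(CODESET) is POSIX and names the multibyte encoding of the current locale.
    return std::string(nl_langinfo(CODESET));
}
```

Calling setlocale(LC_ALL, "") near the top of main(), before XMLPlatformUtils::Initialize(), is the usual thing to try; whether it is enough depends on how the Xerces library was built.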
Do I need to do something to tell the transcoder which encoding to transcode
to?
-Robert