MS XML Parser error in CData section

  • !NoItAll
    Contributor
    • May 2006
    • 297

    The MSXML parser is choking on a single character that often appears in my data within a CDATA section.

    Try this:

    Code:
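    Dim bRet As Boolean
    Dim lRet As Long
    Dim xmlDoc As MSXML2.DOMDocument
    Set xmlDoc = New MSXML2.DOMDocument
    xmlDoc.async = False   'load synchronously so parseError is filled in when Load returns

    bRet = xmlDoc.Load("d:\test.xml")   'False when the parse fails

    lRet = xmlDoc.parseError.filepos   'returns the position of the A9 byte (copyright symbol)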
    The file I'm loading looks like this:

    <NRCS_2NEWARC RECORDNUMBER= "1844">
    <TreeStructure Data="19930121"/>
    <![CDATA[NEWSS © 1993 All Rights Reserved ]]>
    </NRCS_2NEWARC>

    I saved this as d:\test.xml. Note the A9 byte (the copyright symbol) inside the CDATA section. MSXML chokes on it every time; if I remove the copyright symbol, everything works as expected. Why? I thought a CDATA section was supposed to be passed through intact! The only thing you're not supposed to put into a CDATA section is ]]>, which terminates it.
    This is frustrating!
    I've tried it with MSXML 4, 5, and 6.
  • !NoItAll
    Contributor
    • May 2006
    • 297

    #2
    OK, I see the problem. The file has no encoding declaration, so the parser assumes UTF-8, and a lone A9 byte is not valid UTF-8. The correct thing to do is convert the data to UTF-8 first, but that seems stupid to me. Again, I thought CDATA was supposed to go completely uninterpreted so you could put any old garbage in there. Apparently not: it has to be proper garbage...
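
    For anyone hitting the same thing, here is a minimal sketch of one way around it, not necessarily the best one. It assumes the file was written in the machine's ANSI code page (for example Windows-1252) and has no encoding declaration of its own. Instead of letting Load guess the encoding from the raw bytes, it reads the file into a VB String (which VB converts to Unicode) and hands that to loadXML. The names iFile and sXml are just illustrative.

    ```vb
    'Requires a project reference to Microsoft XML (MSXML2).
    Dim xmlDoc As MSXML2.DOMDocument
    Dim iFile As Integer
    Dim sXml As String

    'Read the whole file into a String. VB converts the bytes from the
    'system ANSI code page to Unicode, so the A9 byte becomes a real
    'copyright character (assumption: the file really is in that code page).
    iFile = FreeFile
    Open "d:\test.xml" For Binary Access Read As #iFile
    sXml = Space$(LOF(iFile))
    Get #iFile, , sXml
    Close #iFile

    'loadXML takes an already-decoded Unicode string, so the parser
    'no longer has to guess the encoding of the raw bytes.
    Set xmlDoc = New MSXML2.DOMDocument
    xmlDoc.async = False
    If Not xmlDoc.loadXML(sXml) Then
        Debug.Print xmlDoc.parseError.reason
    End If
    ```

    If the files can be edited, another option might be to put a declaration such as <?xml version="1.0" encoding="windows-1252"?> on the first line, so the parser stops assuming UTF-8. Either way the point is the same: CDATA only switches off markup parsing, and the bytes still have to be valid in whatever encoding the parser thinks the file is in.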
