Xerces and .NET System.Xml

This topic is closed.
  • Ernesto Bascón Pantoja

    Xerces and .NET System.Xml

    Hi everybody:

    I do not know if this is the correct list for a very specific
    implementation problem, but if you can help me, it would be great! :)

    I have an application that builds an XML document containing some
    strange characters:

    std::string str = "Code = ";
    str += '\x04'; // strange character, ASCII 4 (displayed as ♦)

    I serialize the XML using Xerces, and Xerces writes something like the
    following (no matter which encoding I use; I tried ISO-8859-1, UTF-8,
    UTF-16, etc.):

    <XmlTest>
    Code = ♦
    </XmlTest>


    But when I try to load this XML using the Microsoft .NET
    System.Xml.XmlDocument, I get an

    "Invalid character found" exception and the XML cannot be loaded.

    What is wrong here? If I serialize the same string using the MS
    implementation, I get:

    <XmlTest>
    Code = &#x4;
    </XmlTest>



  • Martin Honnen

    #2
    Re: Xerces and .NET System.Xml

    Ernesto Bascón Pantoja wrote:
    > I have one application that builds a Xml that contains some strange
    > characters:
    >
    > std::string str = "Code = ";
    > str += '\x04'; // strange character, ASCII 4 (displayed as ♦)

    ASCII 4 is not allowed in XML 1.0 and, I think, is only allowed as a
    numeric character reference in XML 1.1.
    Why do you want to put such characters into your XML documents?

    > What is wrong here? If I try to serialize the same String using the MS
    > implementation, I get a:
    >
    > <XmlTest>
    > Code = &#x4;
    > </XmlTest>

    .NET does not necessarily comply with the XML 1.0 specification; it
    allows you to serialize such characters as numeric character references.
    You can turn that off by using an XmlWriter with XmlWriterSettings where
    CheckCharacters is set to true.
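
For reference, the XML 1.0 rule behind the "Invalid character" error can be written as a small predicate. This is only a sketch of the Char production from the XML 1.0 specification; the function name isXml10Char is made up for illustration and is not part of Xerces or .NET.

```cpp
#include <cstdint>

// XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// Hypothetical helper name; takes a Unicode code point, not an encoded byte.
bool isXml10Char(std::uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

With this rule, code point 4 is rejected while 'ß' (U+00DF) is accepted, which matches the behaviour described in this thread.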


    --

    Martin Honnen


    • Ernesto Bascón Pantoja

      #3
      Re: Xerces and .NET System.Xml

      On Apr 22, 1:29 pm, Martin Honnen <mahotr...@yahoo.de> wrote:
      > Ernesto Bascón Pantoja wrote:
      > > I have one application that builds a Xml that contains some strange
      > > characters:
      > >
      > > std::string str = "Code = ";
      > > str += '\x04'; // strange character, ASCII 4 (displayed as ♦)
      >
      > ASCII 4 is not allowed in XML 1.0 and only allowed as numeric character
      > reference in XML 1.1 I think.
      > Why do you want to put such characters into your XML documents?

      I am reading clear text from a database and serializing it into XML
      so that a .NET client can receive the information. The problem occurs
      when the "clear text" contains those characters or international
      characters. Xerces performs the serialization but does not transform
      the '♦' or the 'ß' in 'Straße'; it serializes them as they come.

      I do not know whether writing those characters directly with UTF-8
      encoding is valid.
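
On the last point: 'ß' (U+00DF) is a legal XML character and may appear literally, as long as the bytes in the file match the declared encoding; only the ASCII 4 character is the problem. A minimal sketch of getting the bytes right, assuming the database returns ISO-8859-1 text (an assumption, the thread does not say) and using a hypothetical helper named latin1ToUtf8:

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte maps to the Unicode code point of the same value,
// so bytes below 0x80 are copied and the rest become two-byte sequences.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

This only fixes the encoding of legal characters; it does nothing about control characters such as ASCII 4, which stay illegal in XML 1.0 whatever the encoding.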


      • Joseph J. Kesselman

        #4
        Re: Xerces and .NET System.Xml

        ASCII 4 is not a legal XML 1.0 character, no matter what your encoding
        is or how you try to escape it. I'd suggest introducing something like
        <mychar codepoint="4"/> and having your application code convert this
        appropriately. Or do a base-64 encoding on your block of binary code and
        have the application convert that appropriately.
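
The element-based escaping suggested above could look roughly like this on the writing side. It is a sketch only: the function name toXmlContent is hypothetical, it builds serialized markup as a plain string rather than going through the Xerces DOM, and it assumes the input bytes are already in the document's declared encoding.

```cpp
#include <string>

// Hypothetical helper: turn raw text into XML 1.0 element content.
// Control characters that XML 1.0 forbids become <mychar codepoint="N"/>
// elements; markup characters are escaped as usual; everything else,
// including bytes >= 0x80, is copied through unchanged.
std::string toXmlContent(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (c == '&') {
            out += "&amp;";
        } else if (c == '<') {
            out += "&lt;";
        } else if (c == '>') {
            out += "&gt;";
        } else if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
            out += "<mychar codepoint=\"" + std::to_string(c) + "\"/>";
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

When building the document through a DOM instead, the same idea applies, but the mychar element would have to be created as an element node between text nodes; putting this string into a single text node would just escape the angle brackets. The receiving application then has to map each mychar element back to the character it stands for.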

        XML 1.1 relaxes restrictions on characters somewhat. I'm not sure
        offhand whether it would let you get away with this one or not. But
        support for 1.1 is, alas, still extremely rare; you may have to beat up
        your XML library suppliers to get it, and having gotten it you may have
        trouble interchanging those files with other applications or users that
        haven't yet upgraded.


        • Ernesto Bascón Pantoja

          #5
          Re: Xerces and .NET System.Xml

          On Apr 22, 1:54 pm, "Joseph J. Kesselman" <keshlam-nos...@comcast.net>
          wrote:
          > ASCII 4 is not a legal XML 1.0 character, no matter what your encoding
          > is or how you try to escape it. I'd suggest introducing something like
          > <mychar codepoint="4"/> and having your application code convert this
          > appropriately. Or do a base-64 encoding on your block of binary code and
          > have the application convert that appropriately.

          So, how can I tell Xerces: "given this string, transcode the special
          characters to their numeric character references (e.g. &#4;)"?

          > XML 1.1 relaxes restrictions on characters somewhat. I'm not sure
          > offhand whether it would let you get away with this one or not. But
          > support for 1.1 is, alas, still extremely rare; you may have to beat up
          > your XML library suppliers to get it, and having gotten it you may have
          > trouble interchanging those files with other applications or users that
          > haven't yet upgraded.


          • Joseph J. Kesselman

            #6
            Re: Xerces and .NET System.Xml

            Ernesto Bascón Pantoja wrote:
            > So, how can I say to Xerces: "given this string, transcode the special
            > characters to their Unicode escape sequence (i.e. &#4;)

            Xerces deals with XML. The 0x04 character is not XML (or at least not
            XML 1.0), so it isn't Xerces' responsibility to deal with it.

            If you must represent this character in data that's expressed as XML,
            it's your application's responsibility to use some alternate escaping
            solution (such as the element I suggested, or base-64 encoding, or
            whatever).
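
For the base-64 route, the application encodes the raw bytes before handing them to the XML library and decodes them on the other side. A minimal sketch of the encoding half, assuming the standard alphabet with '=' padding; base64Encode is a hypothetical name, and in practice an existing, tested library routine would be the safer choice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: standard base-64 encoding with '=' padding.
// The output uses only A-Z, a-z, 0-9, '+', '/' and '=', all of which
// are legal XML characters, so any byte value can be carried safely.
std::string base64Encode(const std::string& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    auto byteAt = [&in](std::size_t k) {
        return static_cast<unsigned>(static_cast<unsigned char>(in[k]));
    };
    std::string out;
    out.reserve(((in.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters each.
    for (; i + 2 < in.size(); i += 3) {
        unsigned n = (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2);
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += tbl[n & 63];
    }
    // One or two leftover bytes are padded with '='.
    std::size_t rem = in.size() - i;
    if (rem == 1) {
        unsigned n = byteAt(i) << 16;
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += "==";
    } else if (rem == 2) {
        unsigned n = (byteAt(i) << 16) | (byteAt(i + 1) << 8);
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

The trade-off against the element approach is readability: base-64 hides all of the text, not just the offending character, so it fits best when the field is genuinely binary.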

            If you really want this character to appear as itself in the file...
            that isn't an XML file and you can't expect XML tools to either accept
            it or generate it.



            Take a long step back from this detail and look at the actual
            problem you're trying to solve. You haven't told us that, so we can't
            say more than that the specific solution you've proposed here doesn't work.
