MSXML leaves out encoding when using .NET

  • Jeroen

    MSXML leaves out encoding when using .NET

    We're using MSXML to transform the XML document we have to an XHTML
    file using an XSLT. Now the problem is that the dotnet implementation
    we made does something subtly different from the commandline call to
    MSXML. The problem is that the dotnet variant leaves out a piece of
    info on the charset, leading to the browser going to a default encoding
    instead of the wanted UTF-8.

    MSXML2.DOMDocument40Class stylesheet = new MSXML2.DOMDocument40Class();
    stylesheet.async = false;
    source.validateOnParse = false;
    stylesheet.load(xsls[i]);
    string s = source.transformNode(stylesheet);
    System.IO.TextWriter file = System.IO.File.CreateText("path.html");
    file.Write(s);


    Note that the xslt has a line:
    <xsl:output method="html" indent="yes" encoding="UTF-8" />

    This code creates a meta tag different from the commandline version:
    <META http-equiv="Content-Type" content="text/html">

    Whereas the commandline version of MSXML nicely outputs.
    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">

    Anyone have a clue how to do this? Do I need a
    CreateProcessingInstruction for the stylesheet?

  • sloan

    #2
    Re: MSXML leaves out encoding when using .NET


    I think you have another option in DotNet... rather than MSXML2 library.

    Here is some code I found as a starter:


    using System;
    using System.Xml.XPath;
    using System.Xml.Xsl;

    public class XMLtoXSLTransformWrapper
    {
        string debugMsg = null;

        public void DoTranslation(string xmlFile, string xslFile, string outputFile)
        {
            try
            {
                // Create a new XslTransform object.
                XslTransform xslt = new XslTransform();

                // Load the stylesheet.
                xslt.Load(xslFile);

                // Create a new XPathDocument and load the XML data to be transformed.
                // (Only needed for the console alternative below.)
                XPathDocument mydata = new XPathDocument(xmlFile);

                // Alternative: transform to the console through an XmlTextWriter.
                // XmlWriter writer = new XmlTextWriter(Console.Out);
                // xslt.Transform(mydata, null, writer, null);

                // Transform the input file and write the result directly to outputFile.
                xslt.Transform(xmlFile, outputFile);
            }
            catch (Exception ex)
            {
                debugMsg = ex.Message;
                Console.WriteLine(debugMsg);
            }
        }

        public XMLtoXSLTransformWrapper()
        {
            //
            // TODO: Add constructor logic here
            //
        }
    }


    • Martin Honnen

      #3
      Re: MSXML leaves out encoding when using .NET

      Jeroen wrote:
      > We're using MSXML to transform the XML document we have to an XHTML
      > file using an XSLT.

      Why do you use MSXML with a managed .NET application? With .NET 1.x you
      should use System.Xml.Xsl.XslTransform, with .NET 2.0 you should use
      System.Xml.Xsl.XslCompiledTransform for XSLT transformations.

      > string s = source.transformNode(stylesheet);

      You get a string result with transformNode,

      > Note that the xslt has a line:
      > <xsl:output method="html" indent="yes" encoding="UTF-8" />
      >
      > This code creates a meta tag different from the commandline version:
      > <META http-equiv="Content-Type" content="text/html">

      and a string is simply a sequence of Unicode characters that does not
      have an encoding. Encoding matters on the byte level. With a COM
      application using MSXML you could use transformNodeToObject and
      transform to a stream; that way MSXML writes out a charset parameter as
      needed. But with .NET you should not use MSXML at all; I doubt its
      transformNodeToObject will work with a .NET stream implementation. You
      can simply run the transformation with XslTransform or
      XslCompiledTransform, where the Transform method has various overloads
      directly writing to a file or stream.
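
      The suggestion above could be sketched roughly as follows, assuming
      .NET 2.0 and XslCompiledTransform. The inline XML and stylesheet are
      hypothetical stand-ins for the real files; only the xsl:output line is
      taken from the thread. Because the result goes to a stream rather than
      a string, the serializer knows the target encoding when it writes the
      output.

      ```csharp
      using System;
      using System.IO;
      using System.Text;
      using System.Xml;
      using System.Xml.Xsl;

      class TransformToStreamSketch
      {
          static void Main()
          {
              // Hypothetical minimal inputs, inlined so the sketch is self-contained.
              string xml = "<doc><title>Caf\u00e9</title></doc>";
              string xsl =
                  "<xsl:stylesheet version='1.0' " +
                  "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
                  "<xsl:output method='html' indent='yes' encoding='UTF-8'/>" +
                  "<xsl:template match='/'>" +
                  "<html><head><title><xsl:value-of select='doc/title'/></title></head>" +
                  "<body/></html>" +
                  "</xsl:template></xsl:stylesheet>";

              // Load the stylesheet from the in-memory string.
              XslCompiledTransform xslt = new XslCompiledTransform();
              using (XmlReader styleReader = XmlReader.Create(new StringReader(xsl)))
              {
                  xslt.Load(styleReader);
              }

              // Transform to a byte stream instead of a string, so the
              // encoding requested by xsl:output is applied on serialization.
              using (XmlReader input = XmlReader.Create(new StringReader(xml)))
              using (MemoryStream output = new MemoryStream())
              {
                  xslt.Transform(input, null, output);

                  // Decode the bytes only to display them on the console.
                  Console.WriteLine(Encoding.UTF8.GetString(output.ToArray()));
              }
          }
      }
      ```

      With real files, the overload taking two file names should be enough:
      xslt.Transform("data.xml", "path.html"), after loading the stylesheet
      with xslt.Load("stylesheet.xslt").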




      --

      Martin Honnen --- MVP XML


      • Jeroen

        #4
        Re: MSXML leaves out encoding when using .NET

        > Why do you use MSXML with a managed .NET application? With .NET 1.x you
        > should use System.Xml.Xsl.XslTransform, with .NET 2.0 ...

        (We use .NET 1.x.) Unfortunately, we had serious performance issues with
        the dotnet xslt processing libraries. When we encountered those
        problems we found through some searching that we could use MSXML
        instead. It has been working fine and fast; only the encoding problem
        remains.

        The weird thing is that MSXML called from the commandline to process the
        xslt does something different (better) than when called with the code
        posted above. The commandline call looks like this:

        msxsl.exe data.xml stylesheet.xslt


        • Marc Gravell

          #5
          Re: MSXML leaves out encoding when using .NET

          Yeah; the compiled transforms in 2.0 are quite a bit better...

          Without more info, I wouldn't presume to say for sure... but in a number of
          cases I *have* seen, the reported performance problems between 1.1, 2.0 and
          MSXML were actually more a case of a band-aid - meaning that the xslt itself
          simply wasn't written very well, and the different implementations just
          highlighted / exacerbated the problem - reworking some of the xslt to
          include e.g. Muenchian grouping can make a huge difference.

          Marc



          • Jeroen

            #6
            Re: MSXML leaves out encoding when using .NET [processing instructions?]

            Thanks Marc, that gives hope and more incentive to switch to
            studio2005/dotnet2.

            As a follow-up on the original problem: I have been trying some new ways
            to get msxml to include the charset option in one way or the other. My
            latest attempt was to add this line of code...

            stylesheet.createProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\"");

            ...which did not solve my problem but still seems the way to look. So
            here's a new (rather noob) subquestion, which might help me in my
            current quest:

            *Does anyone know of a good overview of these processing instructions?*
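
            One general DOM point about the attempt above:
            createProcessingInstruction only creates a detached node, which has
            no effect until it is inserted into the document. A minimal sketch
            of that behaviour, using .NET's System.Xml.XmlDocument rather than
            the MSXML COM classes (an assumption made for illustration), with a
            hypothetical document and href:

            ```csharp
            using System;
            using System.Xml;

            class ProcessingInstructionSketch
            {
                static void Main()
                {
                    // Hypothetical one-element document.
                    XmlDocument doc = new XmlDocument();
                    doc.LoadXml("<doc/>");

                    // Creating the node does not attach it to the document.
                    XmlProcessingInstruction pi = doc.CreateProcessingInstruction(
                        "xml-stylesheet",
                        "type=\"text/xsl\" href=\"stylesheet.xslt\"");
                    Console.WriteLine(doc.OuterXml); // the instruction is not in the output yet

                    // Only after insertion does it become part of the document.
                    doc.InsertBefore(pi, doc.DocumentElement);
                    Console.WriteLine(doc.OuterXml); // now the instruction precedes the root element
                }
            }
            ```

            Even when inserted, an XML declaration or processing instruction on
            the stylesheet describes the stylesheet document itself. The
            encoding of the transformation result is governed by xsl:output
            and by how the result is serialized.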
