Does XmlTextWriter encode HTML?

  • darrel

    Does XmlTextWriter encode HTML?

    I'm trying to get ASP.net to write out some XML including HTML from a DB:

    The HTML is stored in the DB as encoded HTML. I'm trying to decode it and
    write it to an XML node (The HTML is valid XML). I have this:

    objXMLWriter.WriteElementString("text", _
        Trim(System.Web.HttpContext.Current.Server.HtmlDecode(DS.Tables(0).Rows(rowCount)("itemText").ToString)))

    But in the XML file, it's still coming out encoded. It appears that the
    XmlTextWriter encodes HTML content by default, as I can write out
    System.Web.HttpContext.Current.Server.HtmlDecode(DS.Tables(0).Rows(rowCount)("itemText").ToString)
    to screen and it comes out fine (not encoded). Any way to turn this off?

    _Darrel




  • Oleg Tkachenko [MVP]

    #2
    Re: Does XmlTextWriter encode HTML?

    darrel wrote:
    > I'm trying to get ASP.net to write out some XML including HTML from a DB:
    >
    > The HTML is stored in the DB as encoded HTML. I'm trying to decode it and
    > write it to an XML node (The HTML is valid XML). I have this:
    >
    > objXMLWriter.WriteElementString("text", _
    >     Trim(System.Web.HttpContext.Current.Server.HtmlDecode(DS.Tables(0).Rows(rowCount)("itemText").ToString)))
    >
    > But in the XML file, it's still coming out encoded. It appears that the
    > XmlTextWriter encodes HTML content by default, as I can write out
    > System.Web.HttpContext.Current.Server.HtmlDecode(DS.Tables(0).Rows(rowCount)("itemText").ToString)
    > to screen and it comes out fine (not encoded). Any way to turn this off?
    That's because you are using the WriteElementString() method. It writes text
    to XML, and in XML text the characters & and < must be encoded.
    Use WriteRaw() instead to write your HTML as is. But then you need to
    make sure that your HTML is actually well-formed (XHTML); otherwise
    you'll end up with a malformed XML document.
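
To make the difference concrete, here is a minimal C# sketch that writes the same kind of value once with WriteElementString() and once with WriteRaw(). The class and method names (`EscapingDemo`, `AsText`, `AsRaw`) are made up for illustration, and it uses XmlWriter.Create with a StringWriter rather than the XmlTextWriter from the question, so treat it as a sketch under those assumptions.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of the original posts.
public static class EscapingDemo
{
    private static XmlWriterSettings Settings()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // keep the output to just the element
        return settings;
    }

    // Writes the value as element TEXT: the writer escapes & and <.
    public static string AsText(string value)
    {
        StringWriter output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, Settings()))
        {
            writer.WriteElementString("text", value);
        }
        return output.ToString();
    }

    // Writes the markup verbatim: no escaping and no well-formedness check.
    public static string AsRaw(string markup)
    {
        StringWriter output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, Settings()))
        {
            writer.WriteStartElement("text");
            writer.WriteRaw(markup);
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

Calling `EscapingDemo.AsText("<p>Fish & chips</p>")` gives an element whose content is escaped text, while `EscapingDemo.AsRaw("<p>Fish &amp; chips</p>")` passes the markup through untouched, so any error in that markup ends up in the document as is.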

    --
    Oleg Tkachenko [XML MVP, MCPD]
    http://blog.tkachenko.com | http://www.XmlLab.Net | http://www.XLinq.Net


    • darrel

      #3
      Re: Does XmlTextWriter encode HTML?

      > That's because you are using the WriteElementString() method. It writes text
      > to XML, and in XML text the characters & and < must be encoded.
      This seems to contradict what I've been told previously. I was under the
      impression that XHTML is just fine as-is in an XML file.
      > Use WriteRaw() instead to write your HTML as is. But then you need to make
      > sure that your HTML is actually well-formed (XHTML); otherwise you'll end
      > up with a malformed XML document.
      Ah. Gotcha. Thanks!

      -Darrel



      • Oleg Tkachenko [MVP]

        #4
        Re: Does XmlTextWriter encode HTML?

        darrel wrote:
        >> That's because you are using the WriteElementString() method. It writes text
        >> to XML, and in XML text the characters & and < must be encoded.
        >
        > This seems to contradict what I've been told previously. I was under the
        > impression that XHTML is just fine as-is in an XML file.
        XHTML is OK. XHTML is XML after all. But text in XML can't contain the & and
        < characters. So whenever you write text into XML using
        WriteElementString(), XmlWriter takes care of encoding & and <.

        --
        Oleg Tkachenko [XML MVP, MCPD]
        http://blog.tkachenko.com | http://www.XmlLab.Net | http://www.XLinq.Net


        • darrel

          #5
          Re: Does XmlTextWriter encode HTML?

          > XHTML is OK. XHTML is XML after all. But text in XML can't contain the & and <
          > characters. So whenever you write text into XML using
          > WriteElementString(), XmlWriter takes care of encoding & and <.
          So WriteElementString treats any HTML as just that...HTML...and doesn't care
          that it might be valid XHTML?

          -Darrel



          • darrel

            #6
            Re: Does XmlTextWriter encode HTML?

            > So WriteElementString treats any HTML as just that...HTML...and doesn't
            > care that it might be valid XHTML?
            Well, if I use WriteRaw() for that one element, I get plain XHTML. Alas, it
            completely destroys the entire XML document's indent formatting, making it
            kind of messy to read by hand. Not a huge deal, just another annoyance I
            guess.

            -Darrel



            • Oleg Tkachenko [MVP]

              #7
              Re: Does XmlTextWriter encode HTML?

              darrel wrote:
              >> XHTML is OK. XHTML is XML after all. But text in XML can't contain the & and <
              >> characters. So whenever you write text into XML using
              >> WriteElementString(), XmlWriter takes care of encoding & and <.
              >
              > So WriteElementString treats any HTML as just that...HTML...and doesn't care
              > that it might be valid XHTML?
              The WriteElementString() method knows nothing about HTML or XHTML. It
              accepts an element name and a string value and writes out an element with
              that string as its text value. It is the method's responsibility to encode
              the string value properly as per the XML encoding rules.
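
A small C# sketch of why that encoding is harmless for plain text: the escaping is undone again by any XML reader, so the string that comes back is the string that went in. The names `RoundTripDemo` and `WriteThenRead` are hypothetical, and the sketch uses XmlWriter.Create and XmlReader.Create rather than XmlTextWriter.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of the original posts.
public static class RoundTripDemo
{
    // Writes the value as element text, then reads the element back.
    public static string WriteThenRead(string value)
    {
        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            // The writer escapes & and < while serializing the text.
            writer.WriteElementString("text", value);
        }

        using (XmlReader reader = XmlReader.Create(new StringReader(output.ToString())))
        {
            reader.MoveToContent(); // position on the <text> start tag
            // The reader resolves the escapes back to the original characters.
            return reader.ReadElementContentAsString();
        }
    }
}
```

So `RoundTripDemo.WriteThenRead("<p>Fish & chips</p>")` should hand back the same string. The catch for the case in this thread is that the value comes back as text, not as child elements, which is why markup that should become real elements has to be written another way.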

              --
              Oleg Tkachenko [XML MVP, MCPD]
              http://blog.tkachenko.com | http://www.XmlLab.Net | http://www.XLinq.Net


              • Oleg Tkachenko [MVP]

                #8
                Re: Does XmlTextWriter encode HTML?

                darrel wrote:
                >> So WriteElementString treats any HTML as just that...HTML...and doesn't
                >> care that it might be valid XHTML?
                >
                > Well, if I use WriteRaw() for that one element, I get plain XHTML. Alas, it
                > completely destroys the entire XML document's indent formatting, making it
                > kind of messy to read by hand. Not a huge deal, just another annoyance I
                > guess.
                The proper way would be to create an XmlReader over your XHTML snippet and
                call xmlWriter.WriteNode(xmlReader, false).


                --
                Oleg Tkachenko [XML MVP, MCPD]
                http://blog.tkachenko.com | http://www.XmlLab.Net | http://www.XLinq.Net


                • Martin Honnen

                  #9
                  Re: Does XmlTextWriter encode HTML?

                  darrel wrote:
                  > Well, if I use WriteRaw() for that one element, I get plain XHTML. Alas, it
                  > completely destroys the entire XML document's indent formatting, making it
                  > kind of messy to read by hand. Not a huge deal, just another annoyance I
                  > guess.
                  Have you tried to use WriteNode to feed your string of XHTML through an
                  XmlReader? That way you might be able to insert the string of XHTML into
                  the document your XmlWriter creates while maintaining the current
                  indentation of the complete document the XmlWriter writes, e.g.

                  // Needs: using System; using System.IO; using System.Xml;
                  XmlWriterSettings writerSettings = new XmlWriterSettings();
                  writerSettings.Indent = true;
                  using (XmlWriter xmlWriter = XmlWriter.Create(Console.Out, writerSettings)) {
                    xmlWriter.WriteStartDocument();
                    xmlWriter.WriteStartElement("root");
                    xmlWriter.WriteStartElement("child");
                    // Ignore the snippet's own whitespace so the writer can re-indent it.
                    XmlReaderSettings readerSettings = new XmlReaderSettings();
                    readerSettings.IgnoreWhitespace = true;
                    // Copy every node from the reader into the current element.
                    xmlWriter.WriteNode(XmlReader.Create(new StringReader(@"<html
                  xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"">
                  <head>
                  <title>Example</title>
                  </head>
                  <body>
                  <p>Kibology for all.</p>
                  </body>
                  </html>"), readerSettings), true);
                    // Closes the still-open child and root elements as well.
                    xmlWriter.WriteEndDocument();
                  }

                  outputs

                  <?xml version="1.0" encoding="utf-8"?>
                  <root>
                    <child>
                      <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
                        <head>
                          <title>Example</title>
                        </head>
                        <body>
                          <p>Kibology for all.</p>
                        </body>
                      </html>
                    </child>
                  </root>

                  where the XHTML inserted is indented according to the surrounding elements.




                  --

                  Martin Honnen --- MVP XML


                  • darrel

                    #10
                    Re: Does XmlTextWriter encode HTML?

                    > Have you tried to use WriteNode to feed your string of XHTML through an
                    > XmlReader?
                    Oleg/Martin:

                    Thanks! I haven't tried that.

                    How would I take a DataSet field and read that into an XmlReader?

                    I need to grab the XHTML from the database:

                    System.Web.HttpContext.Current.Server.HtmlDecode(DS.Tables(0).Rows(rowCount)("itemText").ToString)

                    I can't seem to just read that in with an XmlReader.

                    -Darrel
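
One way this might be done, sketched in C# along the lines of Martin's example: wrap the decoded string in a StringReader, create an XmlReader over it, and copy it with WriteNode. The names `FieldToXml`, `WriteXhtmlField` and `Demo` are hypothetical, and the sketch assumes the decoded field value is well-formed XHTML. It sets ConformanceLevel.Fragment on the reader because a stored snippet may have several top-level elements, which a reader in the default document mode rejects.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of the original posts.
public static class FieldToXml
{
    // Writes <elementName> and copies the XHTML string into it as real nodes.
    public static void WriteXhtmlField(XmlWriter writer, string elementName, string xhtml)
    {
        XmlReaderSettings readerSettings = new XmlReaderSettings();
        // Allow a snippet with more than one top-level element.
        readerSettings.ConformanceLevel = ConformanceLevel.Fragment;
        // Drop whitespace-only nodes so the writer's own indentation applies.
        readerSettings.IgnoreWhitespace = true;

        writer.WriteStartElement(elementName);
        using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), readerSettings))
        {
            writer.WriteNode(reader, false);
        }
        writer.WriteEndElement();
    }

    // Small driver: in the thread's scenario, xhtml would be the HtmlDecode'd
    // "itemText" value from the DataSet row.
    public static string Demo(string xhtml)
    {
        StringWriter output = new StringWriter();
        XmlWriterSettings writerSettings = new XmlWriterSettings();
        writerSettings.Indent = true;
        writerSettings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            writer.WriteStartElement("items");
            WriteXhtmlField(writer, "text", xhtml);
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

For instance, `FieldToXml.Demo("<p>One</p><p>Two &amp; three</p>")` should produce an `items` document with both paragraphs as child elements of `text`. If the stored markup is not well-formed, or uses named HTML entities such as `&nbsp;` that plain XML does not define, the reader throws an XmlException instead of writing a broken document.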

