XML to plain text

  • Big D

    XML to plain text

    I have a simple XML file that contains, in part, content that is HTML. I
    am enclosing that content in <![CDATA[ ]]> sections. This works fine.

    However, my application needs to output the XML file (from a strongly typed
    dataset) to plain text. I am doing

    theText = myDataset.GetXml()

    Which works, but it doesn't "remember" the portions that were in CDATA
    sections, so that content gets escaped, turning every HTML tag into
    &lt;b&gt;, etc...

    Is there a simple way to output that data without escaping it, and forcing
    certain nodes to use a CDATA section? The GetXml function accepts no
    parameters.

    Thanks!

    MCD


  • Richard T. Edwards@pwpsquared.net

    #2
    Re: XML to plain text

    Since I believe this is something which can't get fixed in five seconds, try:
    theText = Replace(theText, "&lt;", "<")
    theText = Replace(theText, "&gt;", ">")

    • Cor

      #3
      Re: XML to plain text

      Hi BigD,

      Did you mean this?

      \\\
      ' Start by making a sample dataset
      Dim ds As New DataSet
      Dim dt As New DataTable("parameters")
      For c As Integer = 1 To 10
          Dim dc As New DataColumn("elem" & c.ToString)
          dt.Columns.Add(dc)
      Next
      For r As Integer = 1 To 10
          Dim dr As DataRow = dt.NewRow
          For c As Integer = 1 To 10
              dr("elem" & c.ToString) = _
                  r.ToString & c.ToString ' or just dr(c - 1), but this shows the names
          Next
          dt.Rows.Add(dr) ' can also be done before filling, but I find this looks nicer
      Next
      ds.Tables.Add(dt)
      ' End of building the sample dataset

      ' Serialize the dataset into a memory stream and read it back as a string
      Dim ser As XmlSerializer = New XmlSerializer(GetType(DataSet))
      Dim ms As New IO.MemoryStream
      Dim sw As IO.TextWriter = New IO.StreamWriter(ms)
      ser.Serialize(sw, ds)
      sw.Flush() ' make sure buffered text has reached the memory stream
      Dim b As Long = ms.Length
      ms.Position = 0
      Dim sr As IO.TextReader = New IO.StreamReader(ms)
      Dim xmlstring As String = sr.ReadToEnd
      sw.Close()
      sr.Close()
      ms.Close()
      ///
      I hope this helps a little bit?

      Cor



      • Jeffrey Tan[MSFT]

        #4
        RE: XML to plain text


        Hi Big D,

        I have reviewed your issue. I will spend some time to do some research on
        this issue.

        I will reply to you ASAP. Thanks for your understanding.

        Best regards,
        Jeffrey Tan
        Microsoft Online Partner Support
        Get Secure! - www.microsoft.com/security
        This posting is provided "as is" with no warranties and confers no rights.


        • Cor

          #5
          Re: XML to plain text

          Hi Jeffrey.
          > I have reviewed your issue. I will spend some time to do some research on
          > this issue.
          >
          > I will reply to you ASAP. Thanks for your understanding.

          Meanwhile there are already two answers in this thread, and nobody has
          said yet whether they fit.

          Does MOPS treat the people who ask questions in this newsgroup differently
          from one another?

          If this is not an accident, I think it is embarrassing.

          Cor



          • Big D

            #6
            Re: XML to plain text

            Richard,

            Thanks for the reply.

            Yes, obviously I could do that. However, for one, I'm not confident that
            "<" and ">" are the only characters getting escaped. Secondly, the problem
            is that even if I could do a find and replace on all the escaped
            characters, this information would not sit inside a <![CDATA[ ]]> section,
            so it won't be a valid XML document. (The GetXml() function just places the
            escaped data between the tags without "knowing" that previously it was in
            CDATA.)

            Is this a part of the schema that I need to adjust to notify it to expect
            these characters? If so, how?

            Thanks for the input!

            -MCD

            • Big D

              #7
              Re: XML to plain text

              Hey Cor,

              Thanks for the reply. I haven't tried the code bit, but it doesn't seem
              like what I need. First off, it appears that you are programmatically
              building the dataset, not building it from the schema... and the schema is
              a necessity for my design. The cool part of how I have it working is that
              since it's a strongly typed dataset, it's super easy to work with, I don't
              have to know everything about the schema in order to operate on parts of
              it, and the GetXml() function is EXACTLY what I want to do, EXCEPT of
              course that it is escaping the "<" characters and such.

              To me it seems like a schema issue. Previously I have just manually entered
              the CDATA tag into fields where I knew that there would be HTML. It seems
              like VS should be able to know from a setting in the xsd that the element
              contains illegal characters. That way, when GetXml reads the schema to
              output the data in the dataset, it would know what to do.

              Maybe I'm dreaming.

              ;-)

              Thanks!

              MCD

              • Jeffrey Tan[MSFT]

                #8
                Re: XML to plain text


                Hi Big D,

                Sorry to have kept you waiting so long.

                After consulting the product team, I now know the cause of the problem.

                Actually, this behavior is by design.

                This is the way XML is supposed to be serialized to a string. The "<"
                character is not allowed to occur in text or attribute content because it
                marks the beginning of markup, so we escape it as &lt;. We also escape
                ">" for compatibility reasons. If you look at the dataset content
                though, you should see "<" and ">" in the value unescaped.
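
                A minimal sketch of that last point, assuming a small in-memory document
                in place of the real file (the element names root, item and body are made
                up for the illustration). The value held in the row should come back as
                the raw HTML, while GetXml() should write the same value as escaped text.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module EscapeDemo
    ' Loads a one-row document whose <body> element uses a CDATA section.
    ' The element names (root, item, body) are hypothetical.
    Function LoadSample() As DataSet
        Dim source As String = "<root><item><body><![CDATA[<b>Hi</b>]]></body></item></root>"
        Dim ds As New DataSet()
        ds.ReadXml(New StringReader(source))
        Return ds
    End Function

    Sub Main()
        Dim ds As DataSet = LoadSample()

        ' The value stored in the row is the raw text, with no escaping.
        Console.WriteLine(CStr(ds.Tables("item").Rows(0)("body")))

        ' GetXml() serializes that same value as ordinary text content,
        ' so "<" and ">" appear as &lt; and &gt; in the output.
        Console.WriteLine(ds.GetXml())
    End Sub
End Module
```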

                XML spec section 2.4:

                The ampersand character (&) and the left angle bracket (<) may appear in
                their literal form only when used as markup delimiters, or within a
                comment, a processing instruction, or a CDATA section. If they are needed
                elsewhere, they must be escaped using either numeric character references
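                or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>)
                may be represented using the string "&gt;", and must, for compatibility, be
                escaped using either "&gt;" or a character reference when it appears in the
                string "]]>" in content, when that string is not marking the end of a CDATA
                section.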

                So as a workaround, you may follow Richard's suggestion to parse the string
                yourself.
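
                One way to do that post-processing, sketched under assumptions: load the
                GetXml() output into an XmlDocument and re-wrap the text of chosen
                elements in CDATA sections. GetXmlWithCData and its cdataElements
                parameter are hypothetical names, not part of the DataSet API, and text
                that itself contains "]]>" is not handled here.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Xml

Module CDataDemo
    ' Returns ds.GetXml() with the text of every element named in
    ' cdataElements wrapped in a CDATA section instead of being escaped.
    ' Hypothetical helper; text containing "]]>" is not handled.
    Function GetXmlWithCData(ds As DataSet, ParamArray cdataElements() As String) As String
        Dim doc As New XmlDocument()
        doc.PreserveWhitespace = True   ' keep the indentation GetXml() produced
        doc.LoadXml(ds.GetXml())

        For Each elementName As String In cdataElements
            ' Copy the matches first so the node list is not enumerated
            ' while the tree is being edited.
            Dim targets As New List(Of XmlNode)
            For Each node As XmlNode In doc.GetElementsByTagName(elementName)
                targets.Add(node)
            Next

            For Each node As XmlNode In targets
                ' InnerText is the unescaped value, e.g. "<b>Hi</b>".
                Dim text As String = node.InnerText

                ' Drop the existing (escaped) text content.
                While node.HasChildNodes
                    node.RemoveChild(node.FirstChild)
                End While

                ' Put the same value back as a CDATA section.
                node.AppendChild(doc.CreateCDataSection(text))
            Next
        Next

        Return doc.OuterXml
    End Function
End Module
```

                For example, theText = GetXmlWithCData(myDataset, "body"), where "body"
                stands in for whichever column holds the HTML.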

                Best regards,
                Jeffrey Tan

                • Cor

                  #9
                  Re: XML to plain text

                  Hi Jeffrey,

                  Now I am curious: can you tell me why I do not get that by-design
                  behaviour with the sample I sent?

                  Cor


                  • Cor

                    #10
                    Re: XML to plain text

                    Hi,

                    Looking over my sample, I saw that I forgot to mention that it needs an
                    Imports of
                    System.Xml.Serialization
                    or else you have to put that namespace in front of XmlSerializer.

                    Cor



                    • Jeffrey Tan[MSFT]

                      #11
                      Re: XML to plain text


                      Hi Cor,

                      Sorry, but I cannot see how your solution succeeds.

                      In your solution, you build a dataset yourself, which contains no CDATA
                      section. Also, your self-produced dataset contains no "special"
                      characters (such as "<" or ">").

                      I have tested your solution against a real file in C#, but it does not
                      work either:
                      private void button1_Click(object sender, System.EventArgs e)
                      {
                          // Load the XML file that contains the CDATA section
                          DataSet ds = new DataSet();
                          ds.ReadXml(@"D:\newtest.xml");

                          // Serialize the dataset into a memory stream
                          XmlSerializer ser = new XmlSerializer(typeof(DataSet));
                          MemoryStream ms = new MemoryStream();
                          TextWriter sw = new StreamWriter(ms);
                          ser.Serialize(sw, ds);
                          sw.Flush(); // make sure buffered text has reached the stream

                          long b = ms.Length;
                          ms.Position = 0;

                          // Read the serialized XML back as a string
                          TextReader sr = new StreamReader(ms);
                          string xmlstring = sr.ReadToEnd();
                          sw.Close();
                          sr.Close();
                          ms.Close();
                      }

                      Then, in the debugger you will see that the CDATA section in my
                      "D:\newtest.xml" is also parsed (that is, "<" becomes "&lt;").


                      Best regards,
                      Jeffrey Tan

                      • Cor

                        #12
                        Re: XML to plain text

                        Hi Jeffrey,

                        Thank you for your message. It cleared up my confusion completely.

                        I was only turning the dataset into a string, while the problem is an
                        HTML portion of text saved as a string in a dataset.

                        Thinking it over, the answer for Big D is then of course very simple.

                        To get the portions in the dataset, read it with dataset.ReadXml(path)
                        and then just write the items as needed with a StreamWriter to disk, or
                        just use them.
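
                        A rough sketch of that idea, assuming the HTML lives in one known
                        column. WriteColumn is a hypothetical helper, and the table and column
                        names passed to it are placeholders.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module ExportDemo
    ' Writes every non-null value of one column to the given writer,
    ' one value per line. The values come straight from the rows, so
    ' markup such as <b> is written as-is, not as &lt;b&gt;.
    Sub WriteColumn(ds As DataSet, tableName As String, columnName As String, output As TextWriter)
        For Each row As DataRow In ds.Tables(tableName).Rows
            If Not row.IsNull(columnName) Then
                output.WriteLine(CStr(row(columnName)))
            End If
        Next
    End Sub
End Module
```

                        To write to disk, pass a StreamWriter opened on the target file as the
                        output argument; a StringWriter works the same way for in-memory use.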

                        (The answer may be "by design", but I think it should then be added
                        that what ds.WriteXml writes in this way is read back in its proper
                        form by ds.ReadXml.)

                        Just my thoughts

                        Cor

