Detecting CDATA sections with XSLT

  • Dave Matthews

    Detecting CDATA sections with XSLT

    Hi folks,

    I'm writing a web-page editing tool for my company which will allow
    staff (with no "technical" expertise) to maintain their own Intranet sites.
    The content for each webpage is stored in the form of XHTML in an XML
    document (which, in turn, is stored in an XML database). So far so good.
    However the editing tool must allow users to paste in the contents of MS
    Word documents. I soon discovered that Word does not generate
    properly-formed HTML, the main problem being that tags that should be nested
    are often "overlapped" (as my example below shows). My solution is to store
    this "bad" data as CDATA sections, thereby preventing the finished XML
    document from being invalidated. My finished XML document looks something
    like this:


    <page id="0001">
        <content>
            <p>
                <i>
                    <font face="Arial">Properly-formed HTML</font>
                </i>
            </p>
            <![CDATA[<p><i><font face="Arial">The 'i' and 'font' end-tags are
    wrong and there is no end-tag for 'p'</i></font>]]>
            <p>
                <i>
                    <font face="Arial">This is OK.</font>
                </i>
            </p>
        </content>
    </page>


    On retrieving a document for formatting and display within the client
    browser, my XSL template for the <content> nodes needs to be able to detect
    whether each of its children can be regarded as proper XML (and, therefore,
    to transform it into HTML) or a CDATA section whose contents will simply
    be passed straight to the browser. So my template needs to look something
    like this:


    <xsl:template match="content">
        <xsl:for-each select="*">
            <xsl:choose>
                <xsl:when test="nodetype(.)=cdata()">
                    <xsl:value-of select="."/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
    </xsl:template>


    Of course it's the fourth line of code - <xsl:when
    test="nodetype(.)=cdata()"> - that is giving me problems. Unfortunately I am
    stuck with a fairly basic XSLT engine that has none of the fancy additional
    functions MSXML, SAXON or Xalan offer. Try as I might, I can't find a way of
    getting XSLT to tell when it's dealing with a CDATA section.

    (I could simply hold everything as CDATA but in the future I am going to
    have to interface with other systems that will demand that as much content
    as possible be presented as proper XML/XHTML.)

    Any ideas would be very much appreciated!

    --

    Many thanks in advance!


    Dave Matthews

    'New Avengers' and 'Professionals' sites at:
    The authorised home of Avengers Mark 1 Productions, producers of The New Avengers and The Professionals TV series. Brian Clemens and Laurie Johnson



  • Rolf Magnus

    #2
    Re: Detecting CDATA sections with XSLT

    Dave Matthews wrote:
    > On retrieving a document for formatting and display within the
    > client browser, my XSL template for the <content> nodes needs to be
    > able to detect whether each of its children can be regarded as proper
    > XML (and, therefore, to transform it into HTML) or a CDATA section
    > whose contents will simply be passed straight to the browser.

    XSLT doesn't see CDATA sections. The parser converts them to ordinary
    text nodes before the stylesheet ever sees them, so there is nothing for
    a test to detect. But couldn't you just wrap an element around it?

    <page id="0001">
        <content>
            <p>
                <i>
                    <font face="Arial">Properly-formed HTML</font>
                </i>
            </p>
            <non-wellformed>
                <![CDATA[<p><i><font face="Arial">The 'i' and 'font' end-tags are
    wrong and there is no end-tag for 'p'</i></font>]]>
            </non-wellformed>
            <p>
                <i>
                    <font face="Arial">This is OK.</font>
                </i>
            </p>
        </content>
    </page>
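
    With the wrapper in place, the stylesheet can match on the element name
    instead of trying to detect CDATA. What follows is only a rough sketch
    under a few assumptions: the wrapper is called non-wellformed as above,
    the identity template is a stand-in for whatever real XHTML-to-HTML
    transformation is needed, and the processor honours
    disable-output-escaping. That attribute is an optional feature in
    XSLT 1.0, so a very basic engine may ignore it and emit escaped text.

    ```xml
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="html"/>

      <!-- Start at the page content; the <page> wrapper itself is not output. -->
      <xsl:template match="/">
        <xsl:apply-templates select="page/content"/>
      </xsl:template>

      <!-- Process every child of <content> in document order. -->
      <xsl:template match="content">
        <xsl:apply-templates/>
      </xsl:template>

      <!-- Wrapped fragment: write its text without escaping so the browser
           receives the raw markup. Optional feature; may be ignored. -->
      <xsl:template match="non-wellformed">
        <xsl:value-of select="." disable-output-escaping="yes"/>
      </xsl:template>

      <!-- Placeholder for the real transformation: copy well-formed
           markup through unchanged. -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

    </xsl:stylesheet>
    ```

    The templates for content and non-wellformed win over the identity
    template because a name test has a higher default priority than
    node(), so no explicit priority attributes should be needed.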


    • Dave Matthews

      #3
      Re: Detecting CDATA sections with XSLT

      Thanks for your help, guys. You've confirmed what I was rapidly coming to
      suspect!

      Rolf - your idea of using a "wrapping" element seems an ideal way around
      the problem - many thanks!

      --
      Cheers,

      Dave Matthews



