XSL and entities

  • Tjerk Wolterink

    XSL and entities

    I've a problem in an xsl transformation.
    My xml input:

    --- input.xml ---

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE xc:content [
    <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    %xhtml;
    ]>
    <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent" xmlns="http://www.w3.org/1999/xhtml" module="news">
    <xc:text type="html">
    leuk he jazeker ãôé<br/>
    </xc:text>
    </xc:xcontent>

    ----

    And an xsl file:

    -- style.xsl ---

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
    xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

    <xsl:output method="xml" indent="yes"/>

    <!--
    ! All html should remain html
    !-->
    <xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
    <xsl:copy>
    <xsl:for-each select="@*">
    <xsl:copy/>
    </xsl:for-each>
    <xsl:apply-templates select="./node()"/>
    </xsl:copy>
    </xsl:template>

    <xsl:template match="/xc:xcontent">
    <page:page type="module">
    <p>
    <xsl:apply-templates select="xc:text"/>
    </p>
    </page:page>
    </xsl:template>

    </xsl:stylesheet>

    ---



    The output here is:

    ---
    <page:page type="module">
    <p>
    leuk he jazeker<br/>
    </p>
    </page:page>

    ---


    But I expect this output:


    ---
    <page:page type="module">
    <p>
    leuk he jazeker ãôé<br/>
    </p>
    </page:page>
    ---


    How can that be? Why are the characters ãôé gone?
    Is there something wrong with my encoding?
    Note: I don't know if the files are really encoded in ISO-8859-1, but it did work for me.
    My editor says the encoding is ISO-8859-1, so I think that is right. Or did the editor get that
    information from the XML prolog?
  • Tjerk Wolterink

    #2
    Re: XSL and entities

    > cut

    Well, my subject line was not really a good choice: there are no entities involved.


    • David Carlisle

      #3
      Re: XSL and entities


      Are they really gone, or are you just looking at the file in some program
      that doesn't understand the encoding? They appear to be gone in your
      posted output, but that doesn't match what XSLT should have done.
      That output is also missing a namespace declaration for XHTML. Is it
      really the output you got from XSLT?

      If you want iso-8859-1 output add
      <xsl:output encoding="iso-8859-1"/>
      to your stylesheet.
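
      As a rough sketch of where that goes: the encoding attribute can be merged into the xsl:output element the posted stylesheet already has. With no encoding given, an XSLT 1.0 processor normally writes UTF-8 (or UTF-16), which a viewer expecting ISO-8859-1 will display incorrectly. The identity template is only a placeholder to make the example complete; it is not part of the original stylesheet.

      ```xml
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <!-- same method/indent settings as before, plus an explicit output encoding -->
        <xsl:output method="xml" indent="yes" encoding="ISO-8859-1"/>

        <!-- placeholder identity template (assumption, for illustration only) -->
        <xsl:template match="@*|node()">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:template>

      </xsl:stylesheet>
      ```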

      Incidentally, although you referred to entities in the subject line,
      there are no entity references in your input (except the parameter
      entity reference %xhtml;). If you enter all your characters directly as
      character data, there's no need to reference the XHTML DTD (which might
      have a very noticeable effect on parsing speed, especially if you really
      are fetching the DTD off the W3C site each time).

      David


      • Martin Honnen

        #4
        Re: XSL and entities



        Tjerk Wolterink wrote:
        > I've a problem in an xsl transformation.
        > My xml input:
        >
        > --- input.xml ---
        >
        > <?xml version="1.0" encoding="ISO-8859-1"?>
        > <!DOCTYPE xc:content [
                    ^^^^^^^^^^
        > <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
        > %xhtml;
        > ]>
        > <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"

        If the DOCTYPE declaration says the root element is xc:content then you
        should have that but you have xc:xcontent so one needs to be changed.
        > xmlns="http://www.w3.org/1999/xhtml" module="news">
        > <xc:text type="html">
        > leuk he jazeker ãôé<br/>
        > </xc:text>
        > </xc:xcontent>
        >
        > ----
        >
        > And an xsl file:
        >
        > -- style.xsl ---
        >
        > <?xml version="1.0" encoding="ISO-8859-1"?>
        > <xsl:stylesheet version="1.0"
        > xmlns="http://www.w3.org/1999/xhtml"
        > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        > xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
        > xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">
        >
        > <xsl:output method="xml" indent="yes"/>

        What output encoding do you want?
        > <!--
        > ! All html should remain html
        > !-->
        > <xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
        > <xsl:copy>
        > <xsl:for-each select="@*">
        > <xsl:copy/>
        > </xsl:for-each>
        > <xsl:apply-templates select="./node()"/>
        > </xsl:copy>
        > </xsl:template>

        This could be done more simply and more efficiently:

        <xsl:template match="xhtml:*">
        <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates select="node()"/>
        </xsl:copy>
        </xsl:template>

        where the prefix xhtml is bound to the namespace URI for XHTML earlier
        in the document.
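
        A minimal sketch of what that binding could look like on the stylesheet element. The prefix name xhtml is arbitrary; only the namespace URI has to match the one used in the input. The other namespace declarations and templates from the posted stylesheet are left out here for brevity.

        ```xml
        <?xml version="1.0" encoding="ISO-8859-1"?>
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:xhtml="http://www.w3.org/1999/xhtml">

          <!-- matches every element in the XHTML namespace, whatever prefix the input uses -->
          <xsl:template match="xhtml:*">
            <xsl:copy>
              <!-- copy all attributes in one step instead of looping over them -->
              <xsl:copy-of select="@*"/>
              <xsl:apply-templates select="node()"/>
            </xsl:copy>
          </xsl:template>

        </xsl:stylesheet>
        ```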

        > The output here is:
        >
        > ---
        > <page:page type="module">
        > <p>
        > leuk he jazeker<br/>
        > </p>
        > </page:page>
        >
        > ---
        >
        >
        > But i expect this as output
        >
        >
        > ---
        > <page:page type="module">
        > <p>
        > leuk he jazeker ãôé<br/>
        > </p>
        > </page:page>
        > ---

        What XSLT processor are you using, how exactly do you run the
        transformation?

        --

        Martin Honnen


        • Tjerk Wolterink

          #5
          Re: XSL and entities

          Martin Honnen wrote:
          >
          >
          > Tjerk Wolterink wrote:
          >
          >> I've a problem in an xsl transformation.
          >> My xml input:
          >>
          >> --- input.xml ---
          >>
          >> <?xml version="1.0" encoding="ISO-8859-1"?>
          >> <!DOCTYPE xc:content [
          >             ^^^^^^^^^^
          >
          >> <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          >> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
          >> %xhtml;
          >> ]>
          >> <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
          >
          >
          > If the DOCTYPE declaration says the root element is xc:content then you
          > should have that but you have xc:xcontent so one needs to be changed.
          >

          You're right, that was a typing error. The XML reader does not complain, so I did not notice it.
          >> xmlns="http://www.w3.org/1999/xhtml" module="news">
          >> <xc:text type="html">
          >> leuk he jazeker ãôé<br/>
          >> </xc:text>
          >> </xc:xcontent>
          >>
          >> ----
          >>
          >> And an xsl file:
          >>
          >> -- style.xsl ---
          >>
          >> <?xml version="1.0" encoding="ISO-8859-1"?>
          >> <xsl:stylesheet version="1.0"
          >> xmlns="http://www.w3.org/1999/xhtml"
          >> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          >> xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
          >> xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">
          >>
          >> <xsl:output method="xml" indent="yes"/>
          >
          >
          > What output encoding do you want?
          >
          >> <!--
          >> ! All html should remain html
          >> !-->
          >> <xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
          >> <xsl:copy>
          >> <xsl:for-each select="@*">
          >> <xsl:copy/>
          >> </xsl:for-each>
          >> <xsl:apply-templates select="./node()"/>
          >> </xsl:copy>
          >> </xsl:template>
          >
          >
          > This could be done more simply and more efficiently:
          >
          > <xsl:template match="xhtml:*">
          > <xsl:copy>
          > <xsl:copy-of select="@*"/>
          > <xsl:apply-templates select="node()"/>
          > </xsl:copy>
          > </xsl:template>
          >
          > where the prefix xhtml is bound to the namespace URI for XHTML earlier
          > in the document.
          >

          That is one solution, but both versions work.
          >
          >> The output here is:
          >>
          >> ---
          >> <page:page type="module">
          >> <p>
          >> leuk he jazeker<br/>
          >> </p>
          >> </page:page>
          >>
          >> ---
          >>
          >>
          >> But i expect this as output
          >>
          >>
          >> ---
          >> <page:page type="module">
          >> <p>
          >> leuk he jazeker ãôé<br/>
          >> </p>
          >> </page:page>
          >> ---
          >
          >
          > What XSLT processor are you using, how exactly do you run the
          > transformation?
          >

          I'm using Sablotron for XSL transformations.
          But I think the problem is more complex than I thought.


          • Tjerk Wolterink

            #6
            Re: XSL and entities

            > [cut]

            Well,

            The example I gave you was a bad one.
            The problem I have does not occur in it.

            Here is an example where the problem does occur:

            I have an xml document:
            ---
            <?xml version="1.0" encoding="ISO-8859-1"?>

            <!DOCTYPE xc:xcontent [
            <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
            %xhtml;
            ]>
            <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent" xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
            <xc:text type="html">
            <p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,
            </p> </xc:text>
            </xc:xcontent>
            ---


            And when i put this together with this xsl document:


            -- style.xsl ---

            <?xml version="1.0" encoding="ISO-8859-1"?>
            <xsl:stylesheet version="1.0"
            xmlns="http://www.w3.org/1999/xhtml"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
            xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

            <xsl:output method="xml" indent="yes"/>

            <!--
            ! All html should remain html
            !-->
            <xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
            <xsl:copy>
            <xsl:for-each select="@*">
            <xsl:copy/>
            </xsl:for-each>
            <xsl:apply-templates select="./node()"/>
            </xsl:copy>
            </xsl:template>

            <xsl:template match="/xc:xcontent">
            <page:page type="module">
            <xsl:apply-templates select="xc:text"/>
            </page:page>
            </xsl:template>

            </xsl:stylesheet>
            ---


            Then the output will be:

            --
            <page:page type="module">
            <p>
            Caf de Kletskop is gevestigd in een oud lichtenvoords pander,
            </p>
            </page:page>
            --


            My &eacute; in the xml is gone in the transformation output.

            Sorry that i gave a bad example, now the problem should be clear.

            How do you solve my problem?


            • David Carlisle

              #7
              Re: XSL and entities


              A non-validating parser is allowed by the XML Recommendation to _not_
              fetch external DTD files and just report entity references as undefined.

              However, the XPath model does not support undefined entities, so in this
              case I would expect you to get a parsing error on input saying that the
              entity reference cannot be resolved. Your system seems to be silently
              dropping the entities, which looks like a bug to me.

              I can't suggest anything other than raising it with the maintainers.

              David


              • Tjerk Wolterink

                #8
                Re: XSL and entities

                David Carlisle wrote:
                > A non-validating parser is allowed by the XML Recommendation to _not_
                > fetch external DTD files and just report entity references as undefined.
                >
                > However, the XPath model does not support undefined entities, so in this
                > case I would expect you to get a parsing error on input saying that the
                > entity reference cannot be resolved. Your system seems to be silently
                > dropping the entities, which looks like a bug to me.

                Is there no way to match entities in xsl?
                What is the default behavior of xsl systems when it comes to entities?
                > I can't suggest anything other than raising it with the maintainers.

                Raise it with the maintainers??
                You mean report it as a bug?

                >
                > David


                • David Carlisle

                  #9
                  Re: XSL and entities

                  Tjerk Wolterink <tjerk@wolterinkwebdesign.com> writes:

                  > David Carlisle wrote:
                  >
                  > > A non-validating parser is allowed by the XML Recommendation to _not_
                  > > fetch external DTD files and just report entity references as undefined.
                  > >
                  > > However, the XPath model does not support undefined entities, so in this
                  > > case I would expect you to get a parsing error on input saying that the
                  > > entity reference cannot be resolved. Your system seems to be silently
                  > > dropping the entities, which looks like a bug to me.
                  >
                  > Is there no way to match entities in xsl?

                  No, they are expanded by the XML parser before XSLT starts, so the
                  input tree has all entities expanded.

                  > What is the default behavior of xsl systems when it comes to entities?

                  If the parser expands them, they are not there as far as XSLT is
                  concerned; if it doesn't, it's a fatal error and nothing is transformed.
                  >
                  > > I can't suggest anything other than raising it with the maintainers.

                  Try a different XSLT engine?
                  >
                  > Raise it with the maintainers??
                  > You mean report it as a bug?

                  yes.
                  >
                  > >
                  > > David

                  David


                  • Tjerk Wolterink

                    #10
                    Re: XSL and entities

                    David Carlisle wrote:
                    > Tjerk Wolterink <tjerk@wolterinkwebdesign.com> writes:
                    >
                    > [cut]


                    I could set the following option of the XSLT processor:

                    XSLT_SABOPT_PARSE_PUBLIC_ENTITIES = on
                    Tell the processor to parse public entities. By default this has been turned off.

                    But now when I do the following XSL transformation:

                    xml:
                    --

                    <?xml version="1.0" encoding="ISO-8859-1"?>
                    <page:page xmlns:page="http://www.wolterinkwebdesign.com/xml/page">
                    <page:content>

                    <page:module
                    module="agenda"
                    stylesheet="agenda.xsl">

                    <page:multiple-settings multiple="agendapunt" max="30" order-by="datum" direction="desc"/>
                    </page:module>
                    </page:content>
                    </page:page>

                    --


                    xsl:
                    --
                    <?xml version="1.0" encoding="ISO-8859-1"?>
                    <!DOCTYPE xsl:stylesheet [
                    <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
                    %xhtml;
                    ]>

                    <xsl:stylesheet version="1.0"
                    xmlns="http://www.w3.org/1999/xhtml"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
                    xmlns:menu="http://www.wolterinkwebdesign.com/xml/menu"
                    xmlns:r="http://www.wolterinkwebdesign.com/xml/roles">


                    [rest does not matter]

                    </xsl:stylesheet>
                    --


                    Now i get the following error:

                    ["msgtype"]=> string(5) "error"
                    ["code"]=> string(1) "2"
                    ["module"]=> string(9) "Sablotron"
                    ["URI"]=> string(49) "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
                    ["line"]=> string(1) "1"
                    ["msg"]=> string(51) "XML parser error 4: not well-formed (invalid token)"


                    So the DTD on w3c.org is not valid??
                    How can I solve this?

                    What I want is for XHTML entities like &nbsp; to be converted to a numeric character reference like &#209;
                    (I don't know whether 209 is nbsp, but you know what I mean).

                    What should I do?


                    • David Carlisle

                      #11
                      Re: XSL and entities


                      > So the DTD on w3c.org is not valid??

                      I just tested the file you posted with rxp and it reported it as being
                      well formed.

                      > How can I solve this?

                      Report it as a bug to the parser maintainers?

                      You don't need to load the whole xhtml dtd, just the entity definitions,
                      eg the dtd you quoted uses
                      <!ENTITY % HTMLlat1 PUBLIC
                      "-//W3C//ENTITIES Latin 1 for XHTML//EN"
                      "xhtml-lat1.ent">
                      %HTMLlat1;

                      <!ENTITY % HTMLsymbol PUBLIC
                      "-//W3C//ENTITIES Symbols for XHTML//EN"
                      "xhtml-symbol.ent">
                      %HTMLsymbol;

                      <!ENTITY % HTMLspecial PUBLIC
                      "-//W3C//ENTITIES Special for XHTML//EN"
                      "xhtml-special.ent">
                      %HTMLspecial;


                      so
                      http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
                      for Latin-1, for example. So you might like to try just loading those, or
                      the versions of entity files you will find at
                      http://www.w3.org/2003/entities which I personally prefer (being
                      biased:-) instead of loading the xhtml dtd.
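
                      If only a handful of entities are actually used, a lighter variant is to declare just those in the internal subset, so nothing external has to be loaded. This is a sketch, assuming only &nbsp; and &eacute; are needed; the replacement values are the same ones xhtml-lat1.ent defines (&nbsp; is &#160;, &eacute; is &#233;). The shortened paragraph text is for illustration only.

                      ```xml
                      <?xml version="1.0" encoding="ISO-8859-1"?>
                      <!DOCTYPE xc:xcontent [
                        <!-- internal declarations: no external file is fetched -->
                        <!ENTITY nbsp   "&#160;">
                        <!ENTITY eacute "&#233;">
                      ]>
                      <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
                                   xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
                        <xc:text type="html">
                          <p>Caf&eacute;&nbsp;de Kletskop</p>
                        </xc:text>
                      </xc:xcontent>
                      ```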


                      But as I said at the beginning, not using a <!DOCTYPE and not using
                      entity references in your stylesheet really will make your life simpler.

                      At the very least you ought to make local copies of the files and
                      reference those. Referencing the W3C site to download the XHTML DTD
                      every time you do a transformation is going to slow your transformation
                      down dramatically.
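
                      A sketch of the local-copy approach, assuming xhtml-lat1.ent has been downloaded once and saved in the same directory as the document (that location is an assumption; a relative system identifier is resolved against the document that contains the declaration). The processor still has to be willing to read external entities, so with Sablotron the option mentioned in post #10 is presumably still required.

                      ```xml
                      <?xml version="1.0" encoding="ISO-8859-1"?>
                      <!DOCTYPE xc:xcontent [
                        <!-- local copy of http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent -->
                        <!ENTITY % HTMLlat1 PUBLIC
                          "-//W3C//ENTITIES Latin 1 for XHTML//EN"
                          "xhtml-lat1.ent">
                        %HTMLlat1;
                      ]>
                      <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
                                   xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
                        <xc:text type="html">
                          <p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,</p>
                        </xc:text>
                      </xc:xcontent>
                      ```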

                      David


                      • Tjerk Wolterink

                        #12
                        Re: XSL and entities

                        David Carlisle wrote:
                        > > So the DTD on w3c.org is not valid??
                        >
                        > I just tested the file you posted with rxp and it reported it as being
                        > well formed.
                        >
                        > > How can I solve this?
                        >
                        > Report it as a bug to the parser maintainers?
                        >
                        > You don't need to load the whole xhtml dtd, just the entity definitions,
                        > eg the dtd you quoted uses
                        > <!ENTITY % HTMLlat1 PUBLIC
                        > "-//W3C//ENTITIES Latin 1 for XHTML//EN"
                        > "xhtml-lat1.ent">
                        > %HTMLlat1;
                        >
                        > <!ENTITY % HTMLsymbol PUBLIC
                        > "-//W3C//ENTITIES Symbols for XHTML//EN"
                        > "xhtml-symbol.ent">
                        > %HTMLsymbol;
                        >
                        > <!ENTITY % HTMLspecial PUBLIC
                        > "-//W3C//ENTITIES Special for XHTML//EN"
                        > "xhtml-special.ent">
                        > %HTMLspecial;
                        >
                        >
                        > so
                        > http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
                        > for Latin-1, for example. So you might like to try just loading those, or
                        > the versions of entity files you will find at
                        > http://www.w3.org/2003/entities which I personally prefer (being
                        > biased:-) instead of loading the xhtml dtd.
                        >
                        >
                        > But as I said at the beginning, not using a <!DOCTYPE and not using
                        > entity references in your stylesheet really will make your life simpler.
                        >
                        > At the very least you ought to make local copies of the files and
                        > reference those. Referencing the W3C site to download the XHTML DTD
                        > every time you do a transformation is going to slow your transformation
                        > down dramatically.
                        >
                        > David


                        I think I solved the problem. The XSL engine was not able to load DTDs from other servers.

                        David,
                        thanks for your help!
