RFC: From XHTML to HTML via XSLT

  • Křištof Želechovski

    RFC: From XHTML to HTML via XSLT

    It is common knowledge that XHTML is better HTML and you can serve XHTML content as HTML.
    However, the second statement is incorrect, for various reasons;
    it is enough to say that the HTML validator does not tolerate XML-style empty tags.
    It seems serving XHTML to the browser is of no advantage and can cause serious problems if the browser does not understand the difference.
    This raises the question of downgrading XHTML to HTML.
    I could not find any relevant instruction at the WWW Corporation so I decided I have to roll my own with XSLT.
    I attach the XSLT code and I kindly ask for comments (because I am a novice in this area).
    Please note that all tags and attributes have to be copied stripping the namespace;
    <xsl:copy> does not work as expected because I get <br></br> instead of <br> only.
    I decided to copy the comments explicitly
    in order to be able to embed Internet Explorer conditional inclusion comments into the output.
    Chris

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="html"
        doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
        doctype-system="http://www.w3.org/TR/html4/loose.dtd"/>

    <xsl:template match="@*">
      <xsl:attribute name="{name()}"><xsl:value-of select="."/></xsl:attribute>
    </xsl:template>

    <xsl:template match="*">
      <xsl:element name="{name()}"><xsl:apply-templates select="@* | node()"/></xsl:element>
    </xsl:template>

    <xsl:template match="comment()"><xsl:copy/></xsl:template>

    </xsl:stylesheet>

  • Lars Eighner

    #2
    Re: RFC: From XHTML to HTML via XSLT

    In our last episode, <ett939$25rn$1@news2.ipartners.pl>, the lovely and
    talented Křištof Želechovski broadcast on
    comp.infosystems.www.authoring.html:
    > It is common knowledge that XHTML is better HTML
    No, it isn't.

    --
    Lars Eighner <http://larseighner.com/> <http://myspace.com/larseighner>
    Countdown: 670 days to go.


    • Kristof Zelechovski

      #3
      Re: RFC: From XHTML to HTML via XSLT


      "Lars Eighner" <usenet@larseighner.com> wrote in message news:slrnf04cg3.is2.usenet@goodwill.larseighner.com...
      > In our last episode, <ett939$25rn$1@news2.ipartners.pl>, the lovely and
      > talented Křištof Želechovski broadcast on
      > comp.infosystems.www.authoring.html:
      >> It is common knowledge that XHTML is better HTML
      > No, it isn't.
      It is not better HTML but it is a common opinion.
      Chris


      • Pierre Senellart

        #4
        Re: RFC: From XHTML to HTML via XSLT

        Křištof Želechovski, comp.infosystems.www.authoring.html:
        > doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
        > doctype-system="http://www.w3.org/TR/html4/loose.dtd"
        Ideally, you should keep the DTD kind (Transitional/Strict) from the
        XHTML document, but there is unfortunately no way to do this in XSLT.
        > <xsl:template match="@*"><xsl:attribute name="{name()}">
        > <xsl:value-of select="."/></xsl:attribute></xsl:template>
        I am not sure this is a good strategy: actually, you'd probably want to
        keep only attributes in the default namespace (so as to remove
        xml:lang/xml:space, for instance):

        <xsl:template match="@*[namespace-uri()='']">
        <xsl:copy />
        </xsl:template>

        <xsl:template match="@*" />
        > <xsl:template match="*">
        > <xsl:element name="{name()}"><xsl:apply-templates select="@* | node()"/></xsl:element>
        > </xsl:template>
        Same here, you probably want to keep only elements of the xhtml
        namespace; note also the use of local-name() instead of name(), in case
        your original XHTML document uses namespace prefixes:

        <xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
        <xsl:element name="{local-name()}">
        <xsl:apply-templates select="@*|node()" />
        </xsl:element>
        </xsl:template>

        Untested.
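
        For reference, the two templates above can be combined with the original output declaration and comment rule into one stylesheet. This is only a minimal sketch, assuming an XSLT 1.0 processor and input whose elements are all in the XHTML namespace; nothing in it comes from a W3C resource. Because the elements are re-created in no namespace, the html output method is expected to serialize empty elements such as br without an end tag.

        ```xml
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

          <xsl:output method="html"
              doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
              doctype-system="http://www.w3.org/TR/html4/loose.dtd"/>

          <!-- XHTML elements: re-create each one under its local name,
               which places it in no namespace in the result tree. -->
          <xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
            <xsl:element name="{local-name()}">
              <xsl:apply-templates select="@* | node()"/>
            </xsl:element>
          </xsl:template>

          <!-- Attributes in no namespace are copied unchanged. -->
          <xsl:template match="@*[namespace-uri()='']">
            <xsl:copy/>
          </xsl:template>

          <!-- All other attributes (xml:lang, xml:space, ...) are dropped.
               This rule has lower default priority than the one above. -->
          <xsl:template match="@*"/>

          <!-- Comments are copied so that Internet Explorer conditional
               comments reach the output. -->
          <xsl:template match="comment()">
            <xsl:copy/>
          </xsl:template>

        </xsl:stylesheet>
        ```

        Elements outside the XHTML namespace have no template here, so the built-in rule applies: the element itself disappears and only its content is processed.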


        • Jukka K. Korpela

          #5
          Re: RFC: From XHTML to HTML via XSLT

          Kristof Zelechovski wrote:
          >>> It is common knowledge that XHTML is better HTML
          >>
          >> No, it isn't.
          >
          > It is not better HTML but it is a common opinion.
          If you don't know the difference between knowledge and opinion, I suggest
          that you postpone further participation in public discussions until you do.
          That's just my opinion; all people have the right to ridicule themselves in
          public.

          --
          Jukka K. Korpela ("Yucca")



          • Kristof Zelechovski

            #6
            Re: RFC: From XHTML to HTML via XSLT


            "Jukka K. Korpela" <jkorpela@cs.tut.fi> wrote in message news:9ktMh.22127$mh2.11690@reader1.news.saunalahti.fi...
            > Kristof Zelechovski wrote:
            >>>> It is common knowledge that XHTML is better HTML
            >>>
            >>> No, it isn't.
            >>
            >> It is not better HTML but it is a common opinion.
            > If you don't know the difference between knowledge and opinion, I suggest
            > that you postpone further participation in public discussions until you do.
            > That's just my opinion; all people have the right to ridicule themselves in
            > public.
            I have not ridiculed myself. You are trying to ridicule me. Have fun.
            Chris


            • Křištof Želechovski

              #7
              Re: RFC: From XHTML to HTML via XSLT


              "Pierre Senellart" <invalid@invalid.invalid> wrote in message news:ettkis$1mgk$1@nef.ens.fr...
              > Křištof Želechovski, comp.infosystems.www.authoring.html:
              >> doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
              >> doctype-system="http://www.w3.org/TR/html4/loose.dtd"
              > Ideally, you should keep the DTD kind (Transitional/Strict) from the
              > XHTML document, but there is unfortunately no way to do this in XSLT.
              Since all my pages are transitional, I do not have such a problem.
              It seems uncommon to have some pages transitional and some pages strictly conformant;
              you usually decide one way or the other.
              >> <xsl:template match="@*"><xsl:attribute name="{name()}">
              >> <xsl:value-of select="."/></xsl:attribute></xsl:template>
              > I am not sure this is a good strategy: actually, you'd probably want to
              > keep only attributes in the default namespace (so as to remove
              > xml:lang/xml:space, for instance):
              Good point.
              > <xsl:template match="@*[namespace-uri()='']">
              > <xsl:copy />
              As I have already noticed, copy does not work because it leaves the namespace qualification,
              as in ‘xhtml:clear="none"’, and does not remove the default value, as in ‘xhtml:restricted="restricted"’.
              > </xsl:template>
              >
              > <xsl:template match="@*" />
              >> <xsl:template match="*">
              >> <xsl:element name="{name()}"><xsl:apply-templates select="@* | node()"/></xsl:element>
              >> </xsl:template>
              > Same here, you probably want to keep only elements of the xhtml
              > namespace; note also the use of local-name() instead of name(), in case
              > your original XHTML document uses namespace prefixes:
              It does not, but using the local name does no harm either.
              While custom elements should not make it to the output, you have to decide what to do with them if you use them.
              Skipping them altogether is one possibility, but I can imagine it may not be the best solution.
              On the other hand, if you let them pass through, the validation fails
              and at least you know you are losing information.
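
              The choices for elements outside the XHTML namespace can be sketched as below. This is an illustration only, assuming an XSLT 1.0 processor; the message text is made up for the example. The catch-all template stops the transformation so that lost information is noticed at once, and the comment names the two quieter alternatives.

              ```xml
              <xsl:stylesheet version="1.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

                <xsl:output method="html"
                    doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
                    doctype-system="http://www.w3.org/TR/html4/loose.dtd"/>

                <!-- XHTML elements are re-created without their namespace. -->
                <xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
                  <xsl:element name="{local-name()}">
                    <xsl:apply-templates select="@* | node()"/>
                  </xsl:element>
                </xsl:template>

                <!-- Any other element aborts the run with a message.
                     Alternative 1: leave this template empty to skip the
                     element together with its whole subtree.
                     Alternative 2: use <xsl:apply-templates select="node()"/>
                     as the body to keep the content and drop only the wrapper. -->
                <xsl:template match="*">
                  <xsl:message terminate="yes">
                    <xsl:text>Element outside the XHTML namespace: </xsl:text>
                    <xsl:value-of select="name()"/>
                  </xsl:message>
                </xsl:template>

                <!-- Attributes in no namespace are re-created by name and value. -->
                <xsl:template match="@*[namespace-uri()='']">
                  <xsl:attribute name="{local-name()}">
                    <xsl:value-of select="."/>
                  </xsl:attribute>
                </xsl:template>

                <!-- Namespaced attributes such as xml:lang are dropped. -->
                <xsl:template match="@*"/>

                <!-- Comments are copied through. -->
                <xsl:template match="comment()">
                  <xsl:copy/>
                </xsl:template>

              </xsl:stylesheet>
              ```

              The predicate on the first template gives it a higher default priority than the plain match="*" rule, so XHTML elements never reach the failing template.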
              > <xsl:template match="*[namespace-uri()='http://www.w3.org/1999/xhtml']">
              > <xsl:element name="{local-name()}">
              > <xsl:apply-templates select="@*|node()" />
              > </xsl:element>
              > </xsl:template>
              >
              > Untested.
              Thanks a lot.
              Can anyone comment on why no public resource for this transformation is available at the WWW Corp.?
              Chris


              • Nick Kew

                #8
                Re: RFC: From XHTML to HTML via XSLT

                On Thu, 22 Mar 2007 07:52:23 +0100
                Křištof Želechovski <giecrilj@stegny.2a.pl> wrote:
                > It is common knowledge that XHTML is better HTML
                Since we don't know you here, it's not so easy to judge whether
                that's intended to be ironic. But you've already been bitten
                by those who take what you said literally.
                > This raises the question
                > of downgrading XHTML to HTML.
                If you so wish.
                > I could not find any relevant
                > instruction at the WWW Corporation so I decided I have to roll my own
                > with XSLT.
                XSLT is an extremely inefficient way to do this (parsing an entire
                document to an in-memory tree is inherently very expensive).
                Far better to use SAX. There are modules for Apache that will let
                you transform between HTML and XHTML on the fly, going in
                whichever direction you please. Not that I'd recommend using them
                unless you have an existing need to parse the markup.

                --
                Nick Kew

                Application Development with Apache - the Apache Modules Book


                • Sherm Pendley

                  #9
                  Re: RFC: From XHTML to HTML via XSLT

                  Nick Kew <nick@grimnir.webthing.com> writes:
                  > On Thu, 22 Mar 2007 07:52:23 +0100
                  > Křištof Želechovski <giecrilj@stegny.2a.pl> wrote:
                  >
                  >> This raises the question
                  >> of downgrading XHTML to HTML.
                  >
                  > If you so wish.
                  >
                  >> I could not find any relevant
                  >> instruction at the WWW Corporation so I decided I have to roll my own
                  >> with XSLT.
                  >
                  > XSLT is an extremely inefficient way to do this (parsing an entire
                  > document to an in-memory tree is inherently very expensive).
                  > Far better to use SAX.
                  Even SAX seems an awfully convoluted way to do this... Why not just use tidy?

                  tidy -ashtml infile.xhtml > outfile.html

                  sherm--

                  --
                  Web Hosting by West Virginians, for West Virginians: http://wv-www.net
                  Cocoa programming in Perl: http://camelbones.sourceforge.net


                  • Rob

                    #10
                    Re: RFC: From XHTML to HTML via XSLT

                    Jukka K. Korpela wrote:
                    > Kristof Zelechovski wrote:
                    >
                    >>>> It is common knowledge that XHTML is better HTML
                    >>>
                    >>> No, it isn't.
                    >>
                    >> It is not better HTML but it is a common opinion.
                    >
                    > If you don't know the difference between knowledge and opinion, I
                    > suggest that you postpone further participation in public discussions
                    > until you do. That's just my opinion; all people have the right to
                    > ridicule themselves in public.
                    I think Kristof was very subtly telling the OP that it is more a matter
                    of opinion than of knowledge.

                    Rob


                    • Nick Kew

                      #11
                      Re: RFC: From XHTML to HTML via XSLT

                      On Thu, 22 Mar 2007 08:43:55 -0400
                      Sherm Pendley <spamtrap@dot-app.org> wrote:
                      >> XSLT is an extremely inefficient way to do this (parsing an entire
                      >> document to an in-memory tree is inherently very expensive).
                      >> Far better to use SAX.
                      >
                      > Even SAX seems an awfully convoluted way to do this... Why not just
                      > use tidy?
                      >
                      > tidy -ashtml infile.xhtml > outfile.html
                      Because tidy parses to an in-memory tree. Which is, as I said,
                      hugely expensive.

                      Yes of course, if all you need is a commandline tool for processing
                      static files, then that's fine: you have hundreds of trivial solutions
                      to choose from. But if you want to do anything more interesting
                      like process outgoing content in a server, you want something
                      more efficient. In the case of Apache, the lack of a parseChunk
                      API makes tidy even more expensive than XSLT for this.

                      --
                      Nick Kew

                      Application Development with Apache - the Apache Modules Book


                      • Andreas Prilop

                        #12
                        Re: RFC: From XHTML to HTML via XSLT

                        On Thu, 22 Mar 2007, Křištof Želechovski wrote:
                        > It is common knowledge that XHTML is better HTML and you can serve XHTML content as HTML.
                        It is common knowledge that vodka is better whisky and you can serve
                        vodka content in whisky glasses.

                        --
                        In memoriam Alan J. Flavell


                        • Kristof Zelechovski

                          #13
                          Re: RFC: From XHTML to HTML via XSLT


                          "Rob" <robwaaijenberg@hotmail.com> wrote in message news:46027d3c$0$321$e4fe514c@news.xs4all.nl...
                          > Jukka K. Korpela wrote:
                          >> Kristof Zelechovski wrote:
                          >>
                          >>>>> It is common knowledge that XHTML is better HTML
                          >>>>
                          >>>> No, it isn't.
                          >>>
                          >>> It is not better HTML but it is a common opinion.
                          >>
                          >> If you don't know the difference between knowledge and opinion, I
                          >> suggest that you postpone further participation in public discussions
                          >> until you do. That's just my opinion; all people have the right to
                          >> ridicule themselves in public.
                          >
                          > I think Kristof was very subtly telling the OP that it is more a matter
                          > of opinion than of knowledge.
                          That is, I was subtly telling it to myself, because I am the OP.
                          Chris


                          • Rob Waaijenberg

                            #14
                            Re: RFC: From XHTML to HTML via XSLT

                            Kristof Zelechovski wrote:
                            > "Rob" <robwaaijenberg@hotmail.com> wrote in message news:46027d3c$0$321$e4fe514c@news.xs4all.nl...
                            >> Jukka K. Korpela wrote:
                            >>> Kristof Zelechovski wrote:
                            >>>
                            >>>>>> It is common knowledge that XHTML is better HTML
                            >>>>> No, it isn't.
                            >>>> It is not better HTML but it is a common opinion.
                            >>> If you don't know the difference between knowledge and opinion, I
                            >>> suggest that you postpone further participation in public discussions
                            >>> until you do. That's just my opinion; all people have the right to
                            >>> ridicule themselves in public.
                            >>
                            >> I think Kristof was very subtly telling the OP that it is more a matter
                            >> of opinion than of knowledge.
                            >
                            > That is, I was subtly telling it to myself, because I am the OP.
                            Chris
                            Oops

                            --
                            Rob Waaijenberg


                            • Simon Brooke

                              #15
                              Re: RFC: From XHTML to HTML via XSLT

                              in message <ett939$25rn$1@news2.ipartners.pl>, Křištof Želechovski
                              ('giecrilj@stegny.2a.pl') wrote:
                              > It is common knowledge that XHTML is better HTML and you can serve XHTML
                              > content as HTML. However, the second statement is incorrect, for various
                              > reasons; it is enough to say that the HTML validator does not tolerate
                              > XML-style empty tags. It seems serving XHTML to the browser is of no
                              > advantage and can cause serious problems if the browser does not
                              > understand the difference. This raises the question of downgrading XHTML
                              > to HTML. I could not find any relevant instruction at the WWW Corporation
                              > so I decided I have to roll my own with XSLT. I attach the XSLT code and
                              > I kindly ask for comments (because I am a novice in this area). Please
                              > note that all tags and attributes have to be copied stripping the
                              > namespace; <xsl:copy> does not work as expected because I get <br></br>
                              > instead of <br> only. I decided to copy the comments explicitly in order
                              > to be able to embed Internet Explorer conditional inclusion comments into
                              > the output. Chris
                              My comments:

                              (1) there is no point in doing this, XHTML is not broken.
                              (2) if you did want to do this, the stylesheet you would need would be:

                              <?xml version="1.0" encoding="utf-8"?>

                              <xsl:stylesheet version="1.0"
                              xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:xhtml="http://www.w3.org/1999/xhtml">

                              <xsl:output indent="yes" method="html"
                              doctype-public="-//W3C//DTD HTML 4.01 Strict//EN"
                              doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>

                              <xsl:template match="@* | node()">
                              <xsl:copy>
                              <xsl:apply-templates select="@* | node()"/>
                              </xsl:copy>
                              </xsl:template>
                              </xsl:stylesheet>


                              --
                              simon@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

                              ;; It's dangerous to be right when the government is wrong.
                              ;; Voltaire RIP Dr David Kelly 1945-2004
