encoding of scripts

  • Andy Fish

    encoding of scripts

    Hi,

    using HTML 4.01 (not xhtml), I have recently discovered that this:

    <script>var x='</script>';</script>

    is not valid HTML - the fact that there is an end script tag in quotes
    causes the parser to stop recognising the script. initially my reaction was
    that this is not a surprise because I had failed to HTML encode the script
    contents, so my second attempt was this:

    <script>var x='&lt;/script&gt;';</script>

    however, this DOES NOT WORK - the variable ends up containing the text
    "&lt;/script&gt;"

    can someone point me at part of the w3c specification that states how script
    tags are parsed differently to other tags in HTML.

    interestingly i have also discovered that this:

    <script>if (3<5);</script>

    IS valid html (and seems even to be valid XHTML) even though it is not valid
    XML

    Andy


  • Erwin Moller

    #2
    Re: encoding of scripts

    Andy Fish wrote:
    > Hi,
    >
    > using HTML 4.01 (not xhtml), I have recently discovered that this:
    >
    > <script>var x='</script>';</script>
    >
    > is not valid HTML - the fact that there is an end script tag in quotes
    > causes the parser to stop recognising the script. initially my reaction was
    > that this is not a surprise because I had failed to HTML encode the script
    > contents, so my second attempt was this:
    >
    > <script>var x='&lt;/script&gt;';</script>
    >
    > however, this DOES NOT WORK - the variable ends up containing the text
    > "&lt;/script&gt;"
    >
    > can someone point me at part of the w3c specification that states how script
    > tags are parsed differently to other tags in HTML.
    >
    > interestingly i have also discovered that this:
    >
    > <script>if (3<5);</script>
    >
    > IS valid html (and seems even to be valid XHTML) even though it is not valid
    > XML
    >
    > Andy

    What about:

    <script>var x='<\/script>';</script>
    ?
    Mind the added \
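
A quick way to convince yourself that the backslash changes nothing on the JavaScript side (a minimal sketch; the variable names are only illustrative): inside a string literal, `\/` is simply `/`, so the value is the same, but the characters `<` and `/` are no longer adjacent in the HTML source.

```javascript
// In a JavaScript string literal, "\/" is a harmless escape for "/".
var escaped = '<\/script>';

// Build the same text without writing "<" directly in front of "/".
var expected = String.fromCharCode(60) + '/script>';

console.log(escaped === expected); // true
console.log(escaped.length);       // 9
```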

    Regards,
    Erwin Moller


    • viza

      #3
      Re: encoding of scripts

      On Jun 2, 12:41 pm, "Andy Fish" <ajf...@blueyonder.co.uk> wrote:
      > can someone point me at part of the w3c specification that states how script
      > tags are parsed differently to other tags in HTML.

      http://www.w3.org/TR/html4/sgml/dtd.html#Script :

      <!ENTITY % Script "CDATA" -- script expression -->

      <!ELEMENT SCRIPT - - %Script; -- script statements -->

      > interestingly i have also discovered that this:
      >
      > <script>if (3<5);</script>
      >
      > IS valid html

      Apart from the missing required "type" attribute, yes. The content
      type of the script element in HTML4 is CDATA, which means everything
      up to the first occurrence of </ is read as-is.
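
To make that rule concrete, here is a rough sketch (not a validator) of where an HTML 4.01 parser would consider script content to end. It assumes the wording in appendix B.3.2 of the HTML 4.01 spec as I read it: the data ends at the first `</` followed by a letter. The helper name `cdataEnd` is made up for this example.

```javascript
// Hypothetical helper: index at which HTML 4.01 CDATA content would end,
// i.e. the first "</" followed by a name start character, or -1 if none.
function cdataEnd(content) {
  var match = /<\/[A-Za-z]/.exec(content);
  return match ? match.index : -1;
}

console.log(cdataEnd("var x='</script>';"));       // 7  (content is cut off here)
console.log(cdataEnd("if (3<5);"));                // -1 (a bare "<" is just data)
console.log(cdataEnd("var x='<\\/script>';"));     // -1 (the backslash separates "<" and "/")
console.log(cdataEnd("var x='&lt;/script&gt;';")); // -1 (but the entities stay literal text)
```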
      > (and seems even to be valid XHTML) even though it is not valid XML

      This is not possible since XHTML is XML.

      The content type of the script element in XHTML1 is PCDATA, which means
      that your original idea of using
      var x='&lt;foo&gt;'

      means the same as
      var x='<foo>'

      in a raw javascript file. Note that this doesn't actually work "in
      the wild", because most users have broken browsers (eg: IE).

      The best thing to do is to never ever have anything in your script
      elements and only include scripts in separate files.

      HTH
      viza


      • Andreas Prilop

        #4
        Re: encoding of scripts

        On Mon, 2 Jun 2008, Andy Fish wrote:
        > Newsgroups: comp.infosystems.www.authoring.html

        In how many newsgroups did you multipost?


        • Jukka K. Korpela

          #5
          Re: encoding of scripts

          Andy Fish wrote:
          > using HTML 4.01 (not xhtml), I have recently discovered that this:
          >
          > <script>var x='</script>';</script>
          >
          > is not valid HTML - the fact that there is an end script tag in quotes
          > causes the parser to stop recognising the script.

          The fact that there is an end tag causes that. Quotes do not matter.
          They are just data characters in this context.

          > <script>var x='&lt;/script&gt;';</script>
          >
          > however, this DOES NOT WORK - the variable ends up containing the
          > text "&lt;/script&gt;"

          By HTML 4.01 rules, yes. There the content model is CDATA, which means
          that entity references are not recognized, and "&" is just a data
          character.

          > can someone point me at part of the w3c specification that states how
          > script tags are parsed differently to other tags in HTML.

          They aren't. The _content_ of the <script> _element_ is special. This
          can be found in the HTML 4.01 specs simply by looking at the description
          of that element; it points to

          which refers to an appendix that explains ways to overcome the "</"
          problem, such as prefixing "/" with "\" in JavaScript. In JavaScript,
          you could also write
          var x='<'+'/script>';
          but that looks a bit more hackish.
          > interestingly i have also discovered that this:
          >
          > <script>if (3<5);</script>
          >
          > IS valid html

          No it isn't, but that's due to the lack of the type="..." attribute. If
          you fix that, then it is valid. That's because the digit "5" isn't a
          name start character.

          > (and seems even to be valid XHTML)

          It isn't valid in XHTML, since by XHTML rules, "<" must not appear in
          any context as such except as the starting character of a tag.

          In XHTML, the content model of <script> is #PCDATA, so _there_ you could
          use &lt; to stand for "<". But it's not wise to use XHTML as the
          delivery format of a web page, because IE does not support XHTML.

          > even though it is not valid XML

          It would be impossible for a document to be non-valid XML if it is valid
          XHTML. This immediately follows from the _definition_ of validity.

          There is a simple way to get rid of such complexities: write your script
          into an external file and refer to it via <script type="text/javascript"
          src="foo.js"></script>.

          --
          Jukka K. Korpela ("Yucca")



          • Andy Fish

            #6
            Re: encoding of scripts

            thanks for all the replies - i understand it all now

            unfortunately i can't write all my scripts in separate js files because this
            is all javascript that i'm generating on the fly on the server, but i have
            amended my quoting/encoding functions to detect '</' and split it into 2
            concatenated strings

            :-)
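
For anyone landing here later, a quoting function along those lines might look roughly like this. It is only a sketch under assumptions: the thread does not say which server-side language is in use, so it is written in JavaScript, the name `toInlineScriptLiteral` is invented, and it handles only the cases discussed in this thread (backslashes, single quotes, line breaks and `</`).

```javascript
// Hypothetical helper: turn a string value into JavaScript source for a
// single-quoted string expression that never contains the sequence "</".
function toInlineScriptLiteral(value) {
  var body = String(value)
    .replace(/\\/g, '\\\\')   // escape backslashes first
    .replace(/'/g, "\\'")     // escape the quote character used below
    .replace(/\r/g, '\\r')    // keep the literal on one line
    .replace(/\n/g, '\\n')
    .replace(/<\//g, "<'+'/"); // split "</" into two concatenated strings
  return "'" + body + "'";
}

console.log(toInlineScriptLiteral('</script>')); // '<'+'/script>'
```

The generated text can then be written into the page as `var x=` followed by the returned expression and a semicolon; evaluating it yields the original value.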

