Validating against a higher standard

  • Michael Stemper

    Validating against a higher standard

    The W3C Validator is a great help, as far as it goes. However, I'm looking
    for something stricter. My coding style does not allow for implicit
    termination of an element; my intention and desire is to explicitly
    terminate every element.

    Just last night, it took me two hours to track down some funny behavior. It
    turned out to be caused by entries in a definition list that looked like:
    <dt>Term<dt>
    <dd>Definition</dd>

    A few years back, I was bit by Mosaic's behavior when <td> elements
    weren't explicitly terminated with a </td>.

    It takes good eyesight and good concentration to find things like this,
    and since these are apparently valid html, the W3C validator doesn't
    help as much as I'd like.

    Is there a tool available someplace that will flag failure to explicitly
    terminate elements that aren't required by the standard to be explicitly
    terminated?

    --
    Michael F. Stemper
    #include <Standard_Disclaimer>
    If it's "tourist season", where do I get my license?
  • Thelma Roslyn Lubkin

    #2
    Re: Validating against a higher standard

    Michael Stemper <mstemper@walkabout.empros.com> wrote:
    : A few years back, I was bit by Mosaic's behavior when <td> elements
    : weren't explicitly terminated with a </td>.

    The W3C Validator has been catching those for me.
    --thelma

    : Michael F. Stemper


    • Harlan Messinger

      #3
      Re: Validating against a higher standard

      Thelma Roslyn Lubkin wrote:
      > Michael Stemper <mstemper@walkabout.empros.com> wrote:
      > : A few years back, I was bit by Mosaic's behavior when <td> elements
      > : weren't explicitly terminated with a </td>.
      >
      > The W3C Validator has been catching those for me.
      > --thelma

      You must be validating your code as XHTML. XHTML, like all XML
      languages, requires explicit start and end tags. Under non-XML HTML, the
      missing </td> is legal, so there isn't an error for the validator to catch.


      • Lars Eighner

        #4
        Re: Validating against a higher standard

        In our last episode, <g9rqlk$rio$1@aioe.org>, the lovely and talented
        Michael Stemper broadcast on comp.infosystems.www.authoring.html:
        > The W3C Validator is a great help, as far as it goes. However, I'm looking
        > for something stricter. My coding style does not allow for implicit
        > termination of an element; my intention and desire is to explicitly
        > terminate every element.

        Install nsgmls from James Clark's SP or the OpenSP package. Get a copy of
        whatever DTD you are using. Edit it so it requires closing tags. With a
        little supervision, you can do this with search-and-replace to change
        <space>O<space> to <space>-<space> except where O is followed by EMPTY. If
        you cannot read a DTD by eyeball, you may have to brush up to assure this
        gets done correctly. Use switches in nsgmls to make it use the local (doped)
        copy of the DTD instead of the one specified in the DOCTYPE (it would fetch
        the non-doped from the web if it could). Throw away the output and look at
        the errors. (There won't be any if all is well.)
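
For anyone who would rather script that search-and-replace than do it by hand, here is a minimal sketch in Python. It is a slightly narrower variation, not the exact recipe above: it only touches <!ELEMENT ...> declarations and only flips the second minimization flag (the end-tag one), leaving EMPTY elements alone. The function name require_end_tags and the sample declarations (written in the style of the HTML 4 DTD) are illustrative assumptions, and element names or name groups are assumed to contain no whitespace. Eyeball the result before trusting it.

```python
import re

# Matches an ELEMENT declaration up to its end-tag minimization flag "O".
# Group 1 keeps everything before that flag; the lookaheads require that
# the flag is followed by whitespace and that the content model is not EMPTY.
END_TAG_OMISSIBLE = re.compile(
    r"(<!ELEMENT\s+\S+\s+[-O]\s+)O(?=\s)(?!\s+EMPTY\b)"
)


def require_end_tags(dtd_text):
    """Return dtd_text with omissible end tags made mandatory ("O" -> "-")."""
    return END_TAG_OMISSIBLE.sub(r"\1-", dtd_text)


if __name__ == "__main__":
    # Hypothetical excerpt, in the style of an HTML 4 DTD.
    sample = (
        "<!ELEMENT P - O (%inline;)*>\n"
        "<!ELEMENT BR - O EMPTY>\n"
        "<!ELEMENT EM - - (%inline;)*>\n"
    )
    print(require_end_tags(sample))
```

Start-tag flags are deliberately left as they are, so elements such as HTML, HEAD and BODY keep their optional start tags.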

        You can use Tidy to add closing tags, but Tidy is not a validator and
        does some other stuff to markup you may not want.

        > Just last night, it took me two hours to track down some funny behavior. It
        > turned out to be caused by entries in a definition list that looked like:
        > <dt>Term<dt>
        > <dd>Definition</dd>
        >
        > A few years back, I was bit by Mosaic's behavior when <td> elements
        > weren't explicitly terminated with a </td>.
        >
        > It takes good eyesight and good concentration to find things like this,
        > and since these are apparently valid html, the W3C validator doesn't
        > help as much as I'd like.
        >
        > Is there a tool available someplace that will flag failure to explicitly
        > terminate elements that aren't required by the standard to be explicitly
        > terminated?

        --
        Lars Eighner <http://larseighner.com/> usenet@larseighner.com
        War on Terrorism: Treat Readers like Mushrooms
        "If the story needs rewriting to play down the civilian casualties, DO IT."
        -Memo, _Panama City_ (FL) _News Herald_


        • Thelma Roslyn Lubkin

          #5
          Re: Validating against a higher standard

          Harlan Messinger <hmessinger.removethis@comcast.net> wrote:
          : Thelma Roslyn Lubkin wrote:
          :Michael Stemper <mstemper@walkabout.empros.com> wrote:
          :: A few years back, I was bit by Mosaic's behavior when <td> elements
          :: weren't explicitly terminated with a </td>.
          :>
          : The W3C Validator has been catching those for me.
          : --thelma

          : You must be validating your code as XHTML. XHTML, like all XML
          : languages, requires explicit start and end tags. Under non-XML HTML, the
          : missing </td> is legal, so there isn't an error for the validator to catch.

          Yes, you're right. --thelma


          • dorayme

            #6
            Re: Validating against a higher standard

            In article <g9rqlk$rio$1@aioe.org>,
            mstemper@walkabout.empros.com (Michael Stemper) wrote:
            > The W3C Validator is a great help, as far as it goes. However, I'm looking
            > for something stricter. My coding style does not allow for implicit
            > termination of an element; my intention and desire is to explicitly
            > terminate every element.
            >
            > Just last night, it took me two hours to track down some funny behavior. It
            > turned out to be caused by entries in a definition list that looked like:
            > <dt>Term<dt>
            > <dd>Definition</dd>
            >
            > A few years back, I was bit by Mosaic's behavior when <td> elements
            > weren't explicitly terminated with a </td>.
            >
            > It takes good eyesight and good concentration to find things like this,
            > and since these are apparently valid html, the W3C validator doesn't
            > help as much as I'd like.
            >
            > Is there a tool available someplace that will flag failure to explicitly
            > terminate elements that aren't required by the standard to be explicitly
            > terminated?

            There are a number of ways of going about this one. One way is to use a
            text editor that can be set to give you "warnings" of missing closing
            tags (even though omitting them is, strictly, not illegal). My BBEdit
            (on a Mac) does this, and most usefully!

            The other way is to create your own doctype using standard ones with
            whatever enhancements you want.

            Perhaps you might care to browse through:

            <http://validator.w3.org/docs/help.html>

            --
            dorayme


            • Chris F.A. Johnson

              #7
              Re: Validating against a higher standard

              On 2008-09-05, Michael Stemper wrote:
              > The W3C Validator is a great help, as far as it goes. However, I'm looking
              > for something stricter. My coding style does not allow for implicit
              > termination of an element; my intention and desire is to explicitly
              > terminate every element.
              >
              > Just last night, it took me two hours to track down some funny behavior. It
              > turned out to be caused by entries in a definition list that looked like:
              > <dt>Term<dt>
              > <dd>Definition</dd>
              >
              > A few years back, I was bit by Mosaic's behavior when <td> elements
              > weren't explicitly terminated with a </td>.
              >
              > It takes good eyesight and good concentration to find things like this,
              > and since these are apparently valid html, the W3C validator doesn't
              > help as much as I'd like.
              >
              > Is there a tool available someplace that will flag failure to explicitly
              > terminate elements that aren't required by the standard to be explicitly
              > terminated?

              I find that GNU emacs's HTML mode indentation works well for this.
              If tags are not closed, the indentation doesn't work properly. It
              still requires some searching, but at least you know when there's
              a missing tag.

              --
              Chris F.A. Johnson <http://cfaj.freeshell.org>
              ===================================================================
              Author:
              Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)


              • Jukka K. Korpela

                #8
                Re: Validating against a higher standard

                Lars Eighner wrote:
                >> The W3C Validator is a great help, as far as it goes. However, I'm
                >> looking for something stricter. My coding style does not allow for
                >> implicit termination of an element; my intention and desire is to
                >> explicitly terminate every element.
                >
                > Install nsgmls from James Clark's SP or the OpenSP package.

                You don't need that. Any validator will do, such as the W3C validator. (Of
                course, I'm not referring to products other than validators sold as
                validators.)

                > Get a copy of whatever DTD you are using. Edit it so it requires closing
                > tags.

                Well, that's the simple way, though not trivial, but good instructions for
                this special case were given. And then you just upload the DTD file on a web
                server and refer to it in your doctype declaration. More on this:


                Yucca


                • Peter J Ross

                  #9
                  Re: Validating against a higher standard

                  In comp.infosystems.www.authoring.html on Fri, 5 Sep 2008 19:35:48
                  +0200 (CEST), Michael Stemper <mstemper@walkabout.empros.com> wrote:
                  > The W3C Validator is a great help, as far as it goes. However, I'm looking
                  > for something stricter. My coding style does not allow for implicit
                  > termination of an element; my intention and desire is to explicitly
                  > terminate every element.
                  >
                  > Just last night, it took me two hours to track down some funny behavior. It
                  > turned out to be caused by entries in a definition list that looked like:
                  > <dt>Term<dt>
                  > <dd>Definition</dd>
                  >
                  > A few years back, I was bit by Mosaic's behavior when <td> elements
                  > weren't explicitly terminated with a </td>.
                  >
                  > It takes good eyesight and good concentration to find things like this,
                  > and since these are apparently valid html, the W3C validator doesn't
                  > help as much as I'd like.
                  >
                  > Is there a tool available someplace that will flag failure to explicitly
                  > terminate elements that aren't required by the standard to be explicitly
                  > terminated?

                  pcregrep -M '(<[^<>]+>)[^<>]+\1' <filename>

                  This will catch simple examples such as the one you've provided. It
                  won't catch nested examples, such as "<p><em>test</em><p>" and it will
                  catch false positives such as "<br><br>", but perhaps it's a start.

                  Any other variety of grep that offers multi-line matching will work
                  as well.
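
If no multi-line grep is at hand, the same pattern can be tried from Python. This is only a sketch of the idea above using the standard re module; the function name find_repeated_tags is made up for illustration. Because [^<>] also matches newlines, no special multi-line flag is needed.

```python
import re

# Same pattern as the pcregrep command: a tag, some text that contains
# no angle brackets, then the identical tag again.
REPEATED_TAG = re.compile(r"(<[^<>]+>)[^<>]+\1")


def find_repeated_tags(html):
    """Return every 'tag, text, same tag' run found in html."""
    return [m.group(0) for m in REPEATED_TAG.finditer(html)]


if __name__ == "__main__":
    print(find_repeated_tags("<dt>Term<dt>\n<dd>Definition</dd>"))
    print(find_repeated_tags("<p><em>test</em><p>"))
    print(find_repeated_tags("line one<br>\n<br>line two"))
```

One detail worth knowing: as written, the + demands at least one character between the two tags, so "<br><br>" with nothing in between is not reported, while two <br> tags separated by a newline or a space are.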


                  --
                  PJR :-)
                  slrn newsreader (v0.9.9): http://slrn.sourceforge.net/
                  extra slrn documentation: http://slrn-doc.sourceforge.net/
                  newsgroup name validator: http://pjr.lasnobberia.net/usenet/validator


                  • Ben C

                    #10
                    Re: Validating against a higher standard

                    On 2008-09-06, Peter J Ross <pjr@example.invalid> wrote:
                    > In comp.infosystems.www.authoring.html on Fri, 5 Sep 2008 19:35:48
                    > +0200 (CEST), Michael Stemper <mstemper@walkabout.empros.com> wrote:
                    > [...]
                    >> Is there a tool available someplace that will flag failure to explicitly
                    >> terminate elements that aren't required by the standard to be explicitly
                    >> terminated?
                    >
                    > pcregrep -M '(<[^<>]+>)[^<>]+\1' <filename>
                    >
                    > This will catch simple examples such as the one you've provided. It
                    > won't catch nested examples, such as "<p><em>test</em><p>" and it will
                    > catch false positives such as "<br><br>", but perhaps it's a start.
                    >
                    > Any other variety of grep that offers multi-line matching will work
                    > as well.

                    Using a custom DTD seems to me like the best idea.

                    But if you were to use something like grep, this sounds like a job for
                    sgrep.

                    e.g.:

                    $ sgrep -o "Unclosed <p> at %f:%i" '"<p>" not in inner("<p>" .. "</p>")' *.html

                    %i prints out a character position not a line-number, so in vim for
                    example you'd use the :go command to find them.

                    This will work reliably, but not for <p> elements with attributes, such as
                    <p class="foo">.

                    Sgrep would be improved in my opinion if you could use regular
                    expressions for what it calls "phrases" in its grammar.

                    Maybe you could build up expressions with "chars" but I get this error:

                    Parse error in command line expression column 1 :
                    'chars' disabled until I figure out how to fix it (JJ)

                    Probably better to do a hybrid approach: pre-process the input first to
                    remove all attributes from all elements with a regex and then look for
                    the unclosed <p>s, <td>s etc. with sgrep.
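
Here is a rough single-pass sketch of that idea in Python, with a plain scan standing in for sgrep. Instead of removing attributes first, the tag regex simply skips over them, so reported line numbers refer to the original file. The function name unclosed_starts and the default element list are assumptions for illustration. It also assumes the listed elements never nest inside themselves (true of p and dt, not of td or li when tables or lists are nested), that attribute values contain no ">", and that there is no markup hidden in comments or scripts.

```python
import re

# A start or end tag; anything between the element name and ">" is ignored.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


def unclosed_starts(html, names=("p", "dt", "dd")):
    """Return sorted (line, element) pairs for start tags never explicitly closed."""
    pending = {}    # element name -> line of a start tag still awaiting its end tag
    problems = []
    for match in TAG.finditer(html):
        closing, name = match.group(1), match.group(2).lower()
        if name not in names:
            continue
        line = html.count("\n", 0, match.start()) + 1
        if closing:
            pending.pop(name, None)
        else:
            if name in pending:
                # A second start tag arrived before the first one was closed.
                problems.append((pending[name], name))
            pending[name] = line
    # Whatever is still pending at the end of the input was never closed.
    problems.extend((line, name) for name, line in pending.items())
    return sorted(problems)


if __name__ == "__main__":
    sample = '<dl>\n<dt>Term<dt>\n<dd>Definition</dd>\n</dl>\n<p class="foo">One\n<p>Two</p>\n'
    for line, name in unclosed_starts(sample):
        print("Unclosed <%s> at line %d" % (name, line))
```

Note that a mistyped end tag such as the second <dt> in "<dt>Term<dt>" is reported twice, once for each start tag left open.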


                    • Ben C

                      #11
                      Re: Validating against a higher standard

                      On 2008-09-06, Peter J Ross <pjr@example.invalid> wrote:
                      [...]
                      >Is there a tool available someplace that will flag failure to explicitly
                      >terminate elements that aren't required by the standard to be explicitly
                      >terminated?
                      There is now: http://www.tidraso.co.uk/code/checkUnclosed.py.gz


                      • Jim Moe

                        #12
                        Re: Validating against a higher standard

                        On 09/05/08 10:35 am, Michael Stemper wrote:
                        > The W3C Validator is a great help, as far as it goes. However, I'm looking
                        > for something stricter. My coding style does not allow for implicit
                        > termination of an element; my intention and desire is to explicitly
                        > terminate every element.
                        >
                        > Just last night, it took me two hours to track down some funny behavior. It
                        > turned out to be caused by entries in a definition list that looked like:
                        > <dt>Term<dt>
                        > <dd>Definition</dd>
                        >
                        Have it validate to XHTML Strict. The W3C validator allows you to select
                        which DTD to test the markup against.
                        XHTML requires closure for all elements. It also requires all tag and
                        attribute names in lower case, and attribute values in quotes.
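
As a quick local approximation of that, markup that is already written XHTML-style can be fed to any XML parser, which will refuse unclosed elements. The sketch below uses Python's standard xml.etree.ElementTree; note that this is only a well-formedness check, not validation against the XHTML Strict DTD, and named entities such as &nbsp; will be rejected because no DTD is loaded. The function name well_formedness_error is made up for illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return the parser's complaint for markup, or None if it is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    good = "<dl><dt>Term</dt><dd>Definition</dd></dl>"
    bad = "<dl><dt>Term<dt><dd>Definition</dd></dl>"
    print(well_formedness_error(good))
    print(well_formedness_error(bad))
```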

                        --
                        jmm (hyphen) list (at) sohnen-moe (dot) com
                        (Remove .AXSPAMGN for email)


                        • Dr J R Stockton

                          #13
                          Re: Validating against a higher standard

                          In comp.infosystems.www.authoring.html message <g9rqlk$rio$1@aioe.org>,
                          Fri, 5 Sep 2008 19:35:48, Michael Stemper <mstemper@walkabout.empros.com>
                          posted:
                          >Is there a tool available someplace that will flag failure to explicitly
                          >terminate elements that aren't required by the standard to be explicitly
                          >terminated?
                          >
                          It should be easy enough to write a RegExp (or RegExps) to find for
                          removal all known elements that don't use termination (e.g. <br>).
                          Having done that, it should be easy enough to write a RegExp that finds
                          for removal all instances of (ignore whitespace)
                          < word anything anything-without-< < / same-word >
                          and to use it repeatedly until there are no more matches.

                          Assuming that you put spaces around relational operators, and encode
                          them in strings, in embedded JavaScript, then it seems at first thought
                          that a satisfactory document will yield an empty string, and a failing
                          one will yield a string from which at least one error can easily be
                          spotted.
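
A minimal Python sketch of that reduction, for anyone who wants to try it. The list of elements without end tags is deliberately short and is an assumption, as is the function name residue. Comments, the doctype declaration and script content are not handled here and would have to be stripped first.

```python
import re

# Elements that never take an end tag (partial list, extend as needed).
VOID = re.compile(r"<(?:br|hr|img|input|meta|link)\b[^<>]*>", re.IGNORECASE)

# <word anything> text-without-"<" </same-word>
PAIR = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*>[^<]*</\1\s*>", re.IGNORECASE)


def residue(html):
    """Strip properly closed innermost elements until nothing more matches."""
    text = VOID.sub("", html)
    while True:
        text, count = PAIR.subn("", text)
        if count == 0:
            return text.strip()


if __name__ == "__main__":
    print(repr(residue("<dl><dt>Term</dt><dd>Definition<br></dd></dl>")))
    print(repr(residue("<dl><dt>Term<dt><dd>Definition</dd></dl>")))
```

A document in which every element is explicitly closed reduces to an empty string; otherwise the leftover text starts near the first offending tag.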

                          Alternatively, a program could be written in a general language.

                          I might try it; but don't hold your breath waiting.


                          Since TIDY inserts terminations but alters whitespace, one might write a
                          tool to take a document and reduce all whitespaces to single spaces.
                          Apply that tool to the original and tidied documents. Then do a binary
                          comparison (DOS COMP) to find the first difference. UNTESTED.
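
The whitespace-reducing tool and the comparison step might look like the sketch below, in Python. Running Tidy itself is left out; the tidied text is simply passed in as a string, and the function names are made up for illustration. Like the suggestion above, it is a rough aid only: any whitespace Tidy inserts where there was none before (for example a line break between two adjacent tags), or any other rewriting it does, will also show up as a difference.

```python
import re


def collapse_whitespace(text):
    """Reduce every run of whitespace to a single space."""
    return re.sub(r"\s+", " ", text).strip()


def first_difference(original, tidied):
    """Return the index of the first differing character, or None if equal.

    The index refers to the whitespace-collapsed form of the inputs.
    """
    a = collapse_whitespace(original)
    b = collapse_whitespace(tidied)
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


if __name__ == "__main__":
    original = "<p>One\n<p>Two</p>"
    tidied = "<p>One</p>\n<p>Two</p>"   # hypothetical Tidy output
    print(first_difference(original, tidied))
```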

                          I use TIDY very frequently; but (from a batch file) as a checker only,
                          since I don't want layout changed.



                          One might also look, though maybe not needed in this context, for
                          elements which illegally contain, directly or indirectly, others of
                          themselves.

                          --
                          (c) John Stockton, nr London, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
                          Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
                          Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
                          Do not Mail News to me. Before a reply, quote with ">" or "" (SonOfRFC1036)


                          • Lars Eighner

                            #14
                            Re: Validating against a higher standard

                            In our last episode, <Iduwk.61451$_03.43398@reader1.news.saunalahti.fi>, the
                            lovely and talented Jukka K. Korpela broadcast on
                            comp.infosystems.www.authoring.html:
                            > Lars Eighner wrote:
                            >>> The W3C Validator is a great help, as far as it goes. However, I'm
                            >>> looking for something stricter. My coding style does not allow for
                            >>> implicit termination of an element; my intention and desire is to
                            >>> explicitly terminate every element.
                            >>
                            >> Install nsgmls from James Clark's SP or the OpenSP package.
                            >
                            > You don't need that. Any validator will do, such as the W3C validator. (Of
                            > course, I'm not referring to products other than validators sold as
                            > validators.)

                            I never figured out how to stream the document in my editor through
                            W3C and send the errors to another buffer in the editor. Installing an SP
                            package is much simpler and easier. Of course it cannot generate warnings
                            for HTML requirements that cannot be expressed as SGML. But it
                            more than makes up for that in being simple to incorporate in make files,
                            other batch processing, and even in the open and save keystrokes of
                            an editor, so documents are checked whenever they are opened, whenever they
                            are saved, or at any time they are open in the editor.

                            >> Get a copy of whatever DTD you are using. Edit it so it requires closing
                            >> tags.
                            >
                            > Well, that's the simple way, though not trivial, but good instructions for
                            > this special case were given. And then you just upload the DTD file on a web
                            > server and refer to it in your doctype declaration.

                            Using a local doped DTD it is unnecessary to mess with the DOCTYPE.

                            > Yucca
                            --
                            Lars Eighner <http://larseighner.com/> usenet@larseighner.com
                            War on Terrorism: Camp Follower
                            "I am ... a total sucker for the guys ... with all the ribbons on and stuff,
                            and they say it's true and I'm ready to believe it." -Cokie Roberts, _ABC_


                            • Jukka K. Korpela

                              #15
                              Re: Validating against a higher standard

                              Lars Eighner wrote:
                              > I never figured out how to stream the document in my editor through
                              > W3C and send the errors to another buffer in the editor. Installing
                              > an SP package is much simpler and easier.

                              It may suit your needs, but for an ordinary HTML author, that's
                              unnecessarily complicated. For all that we can know, he might even lack the
                              privileges required for installing any software in the computer he uses (at
                              work).

                              > Using a local doped DTD it is unnecessary to mess with the DOCTYPE.

                              The doctype declaration is the normal SGML (and XML) way of specifying the
                              syntax being used. It works globally, so that you can pass the document to
                              _any_ validating parser.

                              You need a doctype declaration anyway, to keep browsers away from Quirks
                              Mode, so why not make it declare the syntax you are trying to use? This
                              approach also lets you use _different_ syntax definitions, e.g. normally
                              using some "stricter than Strict" but using something else when really
                              needed (say, when you need to use a Transitional feature for some special
                              reason, or markup like <wbr>).

                              Yucca

