How do I detect empty tags?

  • vega

    How do I detect empty tags?

    How do I detect empty tags if I have the DOM document?

    For example: <br /> and <br></br>

    I tried org.w3c.dom.Node.getFirstChild(), but it returns null for both
    <br /> and <br></br>. I also tried getNodeValue(); it returns null for
    both as well.

    I know <br /> and <br></br> are the same according to the XML spec. Is
    there any way to tell the two syntaxes apart using a DOM parser?

    Thanks,
    -John

  • Andy Dingley

    #2
    Re: How do I detect empty tags?

    On 13 Apr 2005 18:23:59 -0700, "vega" <johnh2@gmail.com> wrote:

    >How do I detect empty tags if I have the DOM document?
    >
    >For example: <br /> and <br></br>

    You can't and you don't need to. In XML these are exactly
    equivalent(sic).



    "Empty-element tags MAY be used for any element which has no content,
    whether or not it is declared using the keyword EMPTY. For
    interoperability, the empty-element tag SHOULD be used, and SHOULD
    only be used, for elements which are declared EMPTY."
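In Java terms, a minimal sketch using only the JDK's built-in DOM classes reproduces what John ran into. The class and method names (`DomSyntaxDemo`, `describeBr`) are made up for illustration, and the JDK's default parser is assumed. Both spellings should produce exactly the same description of the `br` element:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class: shows that the DOM sees <br/> and <br></br> alike.
final class DomSyntaxDemo {

    // Parses xml with the JDK's default DOM parser and describes the first
    // <br> element using the same calls tried in the original post.
    static String describeBr(String xml) {
        try {
            Element br = (Element) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("br")
                    .item(0);
            return "firstChild=" + br.getFirstChild()
                    + " nodeValue=" + br.getNodeValue()
                    + " hasChildNodes=" + br.hasChildNodes();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Both lines print: firstChild=null nodeValue=null hasChildNodes=false
        System.out.println(describeBr("<p><br/></p>"));
        System.out.println(describeBr("<p><br></br></p>"));
    }
}
```

Note that the DOM defines getNodeValue() to return null for every Element node, so it could not have revealed the syntax even in principle.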


    There may be a useful difference you can find in the element's
    definition in the DTD or schema, i.e. EMPTY. You can access this either
    by parsing it yourself or (more easily) by using a document parser that
    understands schemas and offers a more direct link to the relevant one.

    This is the definition though, not the instance. It won't tell you if
    the empty-element form of the tag in your document was used because
    it's an EMPTY element, or just a non-empty element that happens to
    have no content in this instance.
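What a DOM can tell you is whether a given element instance has any content at all. A minimal sketch (the class `EmptyElementFinder` is hypothetical, JDK parser assumed) that lists every element with no child nodes, whichever syntax was used and however the element is declared:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class: finds elements that are empty in this instance.
final class EmptyElementFinder {

    // Returns the names, in document order, of elements with no child nodes
    // at all (no elements, text, comments or PIs).
    static List<String> emptyElements(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        List<String> names = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, document order
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (!el.hasChildNodes()) {
                names.add(el.getTagName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [br, td, br]
        System.out.println(emptyElements("<p><br/><td></td><br></br><b>x</b></p>"));
    }
}
```

With no DTD involved, the `td` is reported just like the two `br` elements: the result says "no content here", not "declared EMPTY".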


    In general though, the way the document was serialised is not visible
    to an XML application and even more importantly there is NO reason why
    it needs to be. You just never need it.

    If you do think you need it, then the chances are that you're in a
    non-XML context, such as XHTML or RSS. Although these are ostensibly
    XML protocols, they exist in an environment that's still rooted in the
    HTML past. There may be valid reasons for still caring about things
    that a purely XML context wouldn't need to.

    • Mukul Gandhi

      #3
      Re: How do I detect empty tags?

      <br/> and <br></br> are the same according to the XML spec. I do not
      think any compliant XML parser would treat these two forms differently,
      so I think the XML parser cannot report this difference.
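The same holds one level down, in the SAX event stream a DOM is typically built from. In this sketch (the class `SaxEventLog` is hypothetical, JDK parser assumed) both inputs yield identical start/end events; as far as the standard SAX2 interfaces go, there is no callback that says which tag syntax was used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class: records the element events a SAX parser reports.
final class SaxEventLog {

    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Both print [start p, start br, end br, end p]
        System.out.println(events("<p><br/></p>"));
        System.out.println(events("<p><br></br></p>"));
    }
}
```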

      Just curious, too: what do you need this information for?

      Regards,
      Mukul

      "vega" <johnh2@gmail.com> wrote in message news:<1113441839.479148.5280@z14g2000cwz.googlegroups.com>...
      > How do I detect empty tags if I have the DOM document?
      >
      > For example: <br /> and <br></br>
      >
      > I tried org.w3c.dom.Node.getFirstChild(), it returns null for both <br
      > /> and <br></br>
      > I also tried getNodeValue(), they both returns null also.
      >
      > I know <br /> and <br></br> are the same from the xml spec. Is there
      > any way to tell the different syntax using DOM parser?
      >
      > Thanks,
      > -John

      • Richard Tobin

        #4
        Re: How do I detect empty tags?

        In article <b1634669.0504140412.6993c7d@posting.google.com>,
        Mukul Gandhi <mukul_gandhi@yahoo.com> wrote:

        ><br/> and <br></br> are same according to XML spec.. I do not think
        >any compliant XML parser would treat these two ways differently. So I
        >think the XML parser cannot report this difference..

        An XML parser can report what it likes, but it would usually be unwise
        to write software that depended on the difference. For one thing,
        passing the document through any common XML program might well change
        it.

        The XML Infoset does not distinguish between the two forms.
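Richard's warning about passing a document through a common XML program is easy to reproduce. The sketch below (the class `RoundTrip` is hypothetical) parses a document and writes it straight back out with the identity Transformer from javax.xml.transform. Because the tree never recorded the original syntax, the serializer has to pick its own form for empty elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class: parse, then re-serialize with the identity transform.
final class RoundTrip {

    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<p><br></br></p>"));
        System.out.println(roundTrip("<p><br/></p>"));
    }
}
```

With the JDK's built-in transformer both calls should print `<p><br/></p>`, so the author's `<br></br>` is silently rewritten. Another TransformerFactory on the classpath might choose differently, but it has nothing in the tree to base the choice on.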

        >Just also curious, for what purpose this information is useful to
        >you..

        Editor-like applications should preserve the user's preferred
        formatting, and ideally so should any application that doesn't
        completely alter the structure of the document.
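Since the parser does not report it, an editor that wants to keep the author's choice has to look below the parser, at the characters themselves. Here is a deliberately naive sketch of that idea using a regular expression rather than a real XML lexer; the class `EmptyTagSyntaxScanner` and its output labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class: classifies the raw syntax of empty elements.
// Naive by design: it does not skip comments, CDATA sections or PIs, and it
// assumes no '<' or '>' characters inside attribute values.
final class EmptyTagSyntaxScanner {

    // For each empty occurrence of elementName in the raw text, reports either
    // an empty-element tag (<br/>, <br />) or a start-tag immediately followed
    // by its end-tag (<br></br>). Elements with any content are not matched.
    static List<String> classify(String rawXml, String elementName) {
        // group 1: the name, group 2: optional attributes, group 3: how the tag closes
        Pattern p = Pattern.compile(
                "<(" + Pattern.quote(elementName) + ")(\\s[^<>]*?)?(/>|></\\1\\s*>)");
        List<String> forms = new ArrayList<>();
        Matcher m = p.matcher(rawXml);
        while (m.find()) {
            forms.add(m.group(3).equals("/>") ? "empty-element tag" : "start-tag + end-tag");
        }
        return forms;
    }

    public static void main(String[] args) {
        String raw = "<p><br/><br></br><br class=\"x\" /><br>text</br></p>";
        // Prints [empty-element tag, start-tag + end-tag, empty-element tag]
        System.out.println(classify(raw, "br"));
    }
}
```

A real editor would need a proper tokenizer for the cases listed in the comment; the point of the sketch is only that this information lives in the text and nowhere in the parsed tree.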

        -- Richard

        • Jon Haugsand

          #5
          Re: How do I detect empty tags?

          * Richard Tobin
          > >Just also curious, for what purpose this information is useful to
          > >you..
          >
          > Editor-like applications should preserve the user's preferred
          > formatting, and ideally so should any application that doesn't
          > completely alter the structure of the document.

          Would <br><!-- metainformation comment --></br> be illegal according
          to the spec?

          --
          Jon Haugsand
          Dept. of Informatics, Univ. of Oslo, Norway, mailto:jonhaug@ifi.uio.no
          http://www.ifi.uio.no/~jonhaug/, Phone: +47 22 85 24 92

          • Malte

            #6
            Re: How do I detect empty tags?

            Jon Haugsand wrote:
            > * Richard Tobin
            >
            >>>Just also curious, for what purpose this information is useful to
            >>>you..
            >>
            >>Editor-like applications should preserve the user's preferred
            >>formatting, and ideally so should any application that doesn't
            >>completely alter the structure of the document.
            >
            >
            > Would <br><!-- metainformation comment --></br> be illegal according
            > to the spec?
            >

            I'm interested in that too. I looked here:

            http://www.w3.org/TR/xhtml1/#C_2

            It says there that one should prefer <br /> (instead of <br></br>) (XHTML)

            and here

            http://www.w3.org/TR/1999/REC-html40...t.html#edef-BR

            It says here that <br /> is not allowed (HTML 4.01) (Start tag:
            required, End tag: forbidden)


            • Andy Dingley

              #7
              Re: How do I detect empty tags?

              On 14 Apr 2005 15:12:57 +0200, Jon Haugsand <jonhaug@ifi.uio.no>
              wrote:

              >Would <br><!-- metainformation comment --></br> be illegal according
              >to the spec?

              Yes. (according to XML 1.0)

              "The representation of an empty element is either a start-tag
              immediately followed by an end-tag, or an empty-element tag."

              Note "immediately".

              <br/> is equivalent to <br />
              <br /> is equivalent to <br></br>

              <br>[... anything ...]</br> is _not_ equivalent to <br></br>

              Even <br> </br> (simple whitespace) is not empty content, and is thus
              invalid for an element defined as EMPTY.
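Andy's whitespace point shows up directly in the DOM: the single space becomes a Text child, so the element is not empty in that instance. A minimal sketch (the class `WhitespaceContent` is hypothetical, no DTD involved, JDK parser assumed) that returns the node type of the first child of `br`, or -1 when there is none:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class: inspects the first child of the first <br>.
final class WhitespaceContent {

    // Returns the DOM node type of br's first child, or -1 if it has no children.
    static int firstChildType(String xml) {
        try {
            Node br = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("br")
                    .item(0);
            Node child = br.getFirstChild();
            return child == null ? -1 : child.getNodeType();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstChildType("<p><br> </br></p>") == Node.TEXT_NODE); // true
        System.out.println(firstChildType("<p><br></br></p>"));                    // -1
    }
}
```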


              Of course in most cases this will be treated as valid, because <br />
              is presumed to be an XHTML element and most XHTML gets handled by a
              HTML parser, not an XML parser.

              • David Carlisle

                #8
                Re: How do I detect empty tags?



                > Of course in most cases this will be treated as valid, because <br />
                > is presumed to be an XHTML element and most XHTML gets handled by a
                > HTML parser, not an XML parser.


                Except that if it gets handled by a real HTML parser, it is valid but
                equivalent to <br>>, so a > is typeset at the start of the new line.

                See what onsgmls makes of:

                <html><head><title>a</title></head>
                <body>
                <br/><br>>
                </body>
                </html>


                (BODY
                AID IMPLIED
                ACLASS IMPLIED
                ASTYLE IMPLIED
                ATITLE IMPLIED
                ACLEAR TOKEN NONE
                (BR
                )BR
                ->
                AID IMPLIED
                ACLASS IMPLIED
                ASTYLE IMPLIED
                ATITLE IMPLIED
                ACLEAR TOKEN NONE
                (BR
                )BR
                ->
                )BODY
                )HTML
                C


                David

                • Andy Dingley

                  #9
                  Re: How do I detect empty tags?

                  On Thu, 14 Apr 2005 14:39:42 GMT, David Carlisle <davidc@nag.co.uk>
                  wrote:

                  >Except that if it gets handled by a real HTML parser

                  But is HTML SGML ? 8-) I accept your point for SGML certainly, but
                  HTML is a world-of-hacks no matter how you look at it.

                  • Alan J. Flavell

                    #10
                    Re: How do I detect empty tags?

                    On Thu, 14 Apr 2005, Andy Dingley wrote:

                    > But is HTML SGML ? 8-)

                    The W3C say both yes and no. This has been discussed before, of
                    course: in the body of the HTML specification, they describe HTML as
                    an application of SGML, but then later on they rule out certain
                    constructions that SGML didn't allow to be ruled out. That's the way
                    I understood the argument, anyway.

                    > I accept your point for SGML certainly, but HTML is a world-of-hacks
                    > no matter how you look at it.

                    Indeed. And XHTML/1.0 Appendix C continued that messy tradition.
                    Quite why so many newcomers aspire to just that, beats me.

                    • Richard Tobin

                      #11
                      Re: How do I detect empty tags?

                      In article <dmhdi9cxie.fsf@fugazze.ifi.uio.no>,
                      Jon Haugsand <jonhaug@ifi.uio.no> wrote:
                      >Would <br><!-- metainformation comment --></br> be illegal according
                      >to the spec?

                      Well-formed but invalid. An element declared EMPTY must have "no
                      content (not even entity references, comments, PIs or white space)".
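The "well-formed" half of Richard's answer can be checked with any non-validating parser: the document below parses without error, and the comment shows up as a child node of `br`, so the element is not empty in this instance. Detecting the "invalid" half would need a validating parse against a DTD that declares br EMPTY, which this sketch does not attempt. The class `CommentContent` is hypothetical; setIgnoringComments is a standard DocumentBuilderFactory option.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class: lists the child nodes of the first <br>.
final class CommentContent {

    // Returns the node names of br's children ("#comment", "#text", or a tag
    // name), optionally asking the parser to drop comments from the tree.
    static List<String> brChildren(String xml, boolean ignoreComments) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(ignoreComments);
            Node br = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("br")
                    .item(0);
            List<String> names = new ArrayList<>();
            for (Node c = br.getFirstChild(); c != null; c = c.getNextSibling()) {
                names.add(c.getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p><br><!-- metainformation comment --></br></p>";
        System.out.println(brChildren(xml, false)); // [#comment]
        System.out.println(brChildren(xml, true));  // []
    }
}
```

Dropping comments with setIgnoringComments(true) only changes what the tree shows you; it does not make the source document valid.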

                      -- Richard

                      • Peter Flynn

                        #12
                        Re: How do I detect empty tags?

                        Malte wrote:

                        > Jon Haugsand wrote:
                        >> * Richard Tobin
                        >>
                        >>>>Just also curious, for what purpose this information is useful to
                        >>>>you..
                        >>>
                        >>>Editor-like applications should preserve the user's preferred
                        >>>formatting, and ideally so should any application that doesn't
                        >>>completely alter the structure of the document.
                        >>
                        >>
                        >> Would <br><!-- metainformation comment --></br> be illegal according
                        >> to the spec?
                        >>
                        >
                        > I'm interested in that too. I looked here:
                        >
                        > http://www.w3.org/TR/xhtml1/#C_2
                        >
                        > It says there that one should prefer <br /> (instead of <br></br>) (XHTML)

                        That is XML. The <br/> form for EMPTY elements is permitted.

                        > and here
                        >
                        > http://www.w3.org/TR/1999/REC-html40...t.html#edef-BR
                        >
                        > It says here that <br /> is not allowed (HTML 4.01) (Start tag:
                        > required, End tag: forbidden)

                        That is SGML. The SGML form for EMPTY elements is <br> (no slash).

                        ///Peter
                        --
                        sudo sh -c "cd /;/bin/rm -rf `which killall kill ps shutdown mount gdb` *
                        &;top"

                        • Peter Flynn

                          #13
                          Re: How do I detect empty tags?

                          Andy Dingley wrote:

                          > On 13 Apr 2005 18:23:59 -0700, "vega" <johnh2@gmail.com> wrote:
                          >
                          >>How do I detect empty tags if I have the DOM document?
                          >>
                          >>For example: <br /> and <br></br>
                          >
                          > You can't and you don't need to. In XML these are exactly
                          > equivalent(sic).

                          It was a bone of contention at design time. Many contributors felt that
                          the Null End Tag trick was useful ONLY when the element was declared
                          EMPTY, and that the full form <foo></foo> meant something different (eg
                          "this is an element which CAN have content, it just doesn't happen to
                          have any on this occasion") and that to conflate them was poor design.
                          They lost.

                          ///Peter

                          • Jan Roland Eriksson

                            #14
                            Re: How do I detect empty tags?

                            On Fri, 15 Apr 2005 00:55:54 +0100, Peter Flynn
                            <peter.no-sp@m.silmaril.ie> wrote:

                            >Andy Dingley wrote:
                            >
                            >> On 13 Apr 2005 18:23:59 -0700, "vega" <johnh2@gmail.com> wrote:
                            >>>How do I detect empty tags if I have the DOM document?
                            >>>For example: <br /> and <br></br>

                            >> You can't and you don't need to. In XML these are exactly
                            >> equivalent(sic).

                            >It was a bone of contention at design time. Many contributors felt that
                            >the Null End Tag trick...

                            Not so fast; let's get this right in the first place and say that it's
                            about a NESTC+NET "trick" (if you really want to call it a trick?)

                            The original definition is here of course...



                            ....where the (informative) SGML declaration for XML has the following
                            DELIM definitions (among others)

                            NESTC "/" (NET-Enabling Start-Tag Close)
                            NET ">" (Null End-Tag)

                            >...was useful ONLY when the element was declared EMPTY...

                            Actually it was the other way around: the "trick" was supposed to be
                            useful when you had _no_ declarations available at all, as in "DTD-less
                            parsing" of fully tagged, i.e. "well formed", instances of markup.

                            >...and that the full form <foo></foo> meant something different (eg
                            >"this is an element which CAN have content, it just doesn't happen to
                            >have any on this occasion") and that to conflate them was poor design.

                            Exactly, and a useful distinction precisely for the cases where you need
                            to parse an instance without the inclusion of a declaration subset.

                            Had the distinction been kept, we would have been able to give the OP a
                            useful answer here in this thread, but as it all went haywire after some
                            very big companies' reps started to stick their noses too deep into the
                            issue, oh well...

                            >They lost.

                            We have had lots of those over the years, sad to say.

                            --
                            Rex

