character to HTML ampersand escape sequence converter

  • Nick Kew

    #16
    Re: character to HTML ampersand escape sequence converter

    In article <hsivonen-BAB9D4.01493919122004@news.dnainternet.net>,
    Henri Sivonen <hsivonen@iki.fi> writes:

    >> In article <hsivonen-5BCFB2.12592918122004@news.dnainternet.net>,
    >> Henri Sivonen <hsivonen@iki.fi> writes:
    >>
    >> >> Indeed. I was on the point of suggesting AN XML processor until I saw
    >> >> that (libxml2 accepts HTML as well as XML input).
    >
    >> The HTML parser gives you either SAX or DOM, and will process either
    >> HTML or XHTML input without distinction.
    >
    > Are the elements in the XHTML namespace or in no namespace?

    They're not namespaced, at least not in the SAX parse mode, which is
    where I've investigated the issue. My preliminary experiments
    trying to use the HTML parser in SAX2 mode were not successful, which
    is not to say I won't return to the issue.

    > The good
    > thing about TagSoup is that it allows the app internals to be written
    > for XHTML, so the same app internals work for HTML, XHTML *and*
    > XHTML+FooML (using an XML parser). That is, the HTML/XHTML difference is
    > left on the parsing level and not carried over to higher levels as in
    > browsers.

    Watch this space. That's what I'd like mod_publisher to do. OTOH,
    how many people mix HTML (no X) with other namespaces in real life?
    The full capability is at best a pathological edge-case.

    BTW, if you're interested in namespace processing on the Web,
    may I refer you to my recently-published article at
    http://www.xml.com/pub/a/2004/12/15/...amespaces.html
    ("XML Namespace Processing in Apache", XML.com)


    --
    Nick Kew


    • Henri Sivonen

      #17
      Re: character to HTML ampersand escape sequence converter

      In article <cq2g92-mu.ln1@hugin.webthing.com>,
      nick@hugin.webthing.com (Nick Kew) wrote:

      > In article <hsivonen-BAB9D4.01493919122004@news.dnainternet.net>,
      > Henri Sivonen <hsivonen@iki.fi> writes:
      >
      > >> In article <hsivonen-5BCFB2.12592918122004@news.dnainternet.net>,
      > >> Henri Sivonen <hsivonen@iki.fi> writes:
      > >>
      > >> >> Indeed. I was on the point of suggesting AN XML processor until I saw
      > >> >> that (libxml2 accepts HTML as well as XML input).
      > >
      > >> The HTML parser gives you either SAX or DOM, and will process either
      > >> HTML or XHTML input without distinction.
      > >
      > > Are the elements in the XHTML namespace or in no namespace?
      >
      > They're not namespaced.

      That's a pity. Of course, it's possible to write a filter that takes
      SAX1 events, adds the namespacing and emits SAX2 events, but it is
      uncool to have to implement stuff that a library should be able to do
      out of the box.
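
A filter of the kind described here could be sketched roughly as follows. This is only an illustration in Python's `xml.sax`, not libxml2 or TagSoup code: the class names `NamespacingFilter` and `Recorder` and the helper `namespaced_events` are hypothetical, and Python's XML parser in its default non-namespace mode stands in for whatever parser delivers the namespace-unaware events. It also assumes every element belongs in the XHTML namespace and that names should be lowercased.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl

XHTML_NS = "http://www.w3.org/1999/xhtml"


class NamespacingFilter(ContentHandler):
    """Hypothetical filter: takes namespace-unaware events (startElement)
    and forwards namespace-aware ones (startElementNS) downstream."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream

    def startDocument(self):
        self.downstream.startDocument()
        # Declare XHTML as the default namespace for the whole document.
        self.downstream.startPrefixMapping(None, XHTML_NS)

    def endDocument(self):
        self.downstream.endPrefixMapping(None)
        self.downstream.endDocument()

    def startElement(self, name, attrs):
        local = name.lower()  # assumption: HTML names are case-insensitive
        # Attributes stay in no namespace, as in XHTML.
        values = {(None, k.lower()): v for k, v in attrs.items()}
        qnames = {(None, k.lower()): k.lower() for k in attrs.keys()}
        self.downstream.startElementNS(
            (XHTML_NS, local), local, AttributesNSImpl(values, qnames))

    def endElement(self, name):
        local = name.lower()
        self.downstream.endElementNS((XHTML_NS, local), local)

    def characters(self, content):
        self.downstream.characters(content)


class Recorder(ContentHandler):
    """Hypothetical downstream handler that records what it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        self.events.append(("start", name))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))


def namespaced_events(markup):
    """Parse well-formed markup without namespace processing and return
    the namespaced events the filter emitted."""
    recorder = Recorder()
    xml.sax.parseString(markup.encode("utf-8"), NamespacingFilter(recorder))
    return recorder.events
```

The application-level handler (`Recorder` here) only ever sees `(namespace, local name)` pairs, so it would not need to change if the events later came from a real namespace-aware XML parser.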

      > > The good
      > > thing about TagSoup is that it allows the app internals to be written
      > > for XHTML, so the same app internals work for HTML, XHTML *and*
      > > XHTML+FooML (using an XML parser). That is, the HTML/XHTML difference is
      > > left on the parsing level and not carried over to higher levels as in
      > > browsers.
      >
      > Watch this space. That's what I'd like mod_publisher to do. OTOH,
      > how many people mix HTML (no X) with other namespaces in real life?

      The people who export from MS Office?

      I was not suggesting that namespaces in HTML should be supported. How
      that would work isn't even defined.

      However, I think it doesn't make sense to write the app internals for
      namespaceless HTML so that massive rework is needed for XHTML+FooML. It
      makes more sense to write the app internals for namespaced compound
      documents and to convert HTML to XHTML at parse time. Using an XML
      parser is the right way to go for XHTML and XHTML+FooML.
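
As a rough illustration of converting HTML to XHTML at parse time, the sketch below uses Python's standard `html.parser` to build an `ElementTree` whose elements are all in the XHTML namespace. It is not TagSoup and does far less error recovery: it only handles void elements and end tags that close still-open descendants. The names `XhtmlTreeParser`, `finish` and `html_to_xhtml_tree` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Elements that never have content or an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


def qname(tag):
    """ElementTree spelling of a name in the XHTML namespace."""
    return "{%s}%s" % (XHTML_NS, tag)


class XhtmlTreeParser(HTMLParser):
    """Hypothetical parser: HTML in, XHTML-namespaced tree out."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open = []  # stack of currently open (non-void) element names

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        # Valueless attributes (e.g. "checked") get their own name as value.
        attrib = {k: (v if v is not None else k) for k, v in attrs}
        self.builder.start(qname(tag), attrib)
        if tag in VOID:
            self.builder.end(qname(tag))  # close void elements at once
        else:
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # stray end tag, or end tag of a void element: ignore
        # Close any unclosed descendants, then the element itself.
        while self.open:
            closed = self.open.pop()
            self.builder.end(qname(closed))
            if closed == tag:
                break

    def handle_data(self, data):
        if self.open:  # text outside the root element is dropped
            self.builder.data(data)

    def finish(self):
        """Close whatever is still open and return the root element."""
        while self.open:
            self.builder.end(qname(self.open.pop()))
        return self.builder.close()


def html_to_xhtml_tree(markup):
    parser = XhtmlTreeParser()
    parser.feed(markup)
    parser.close()  # flush anything html.parser has buffered
    return parser.finish()
```

Code written against the resulting tree looks up elements by namespaced name, for example `root.find(".//{http://www.w3.org/1999/xhtml}p")`, which is the same lookup it would use on a tree produced by an XML parser from real XHTML.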

      > BTW, if you're interested in namespace processing on the Web,
      > may I refer you to my recently-published article at
      > http://www.xml.com/pub/a/2004/12/15/...amespaces.html

      Interesting.

      BTW, how do you reconcile the GPL and the Apache license?

      --
      Henri Sivonen
      hsivonen@iki.fi

      Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html


      • Nick Kew

        #18
        Re: character to HTML ampersand escape sequence converter

        In article <hsivonen-38EBA0.19382419122004@news.dnainternet.net>,
        Henri Sivonen <hsivonen@iki.fi> writes:

        >> Watch this space. That's what I'd like mod_publisher to do. OTOH,
        >> how many people mix HTML (no X) with other namespaces in real life?
        >
        > The people who export from MS Office?

        Good catch. I'd forgotten that one. Don't they try/claim to be XHTML?

        > I was not suggesting that namespaces in HTML should be supported. How
        > that would work isn't even defined.

        It would presumably work by treating it as XHTML. Like XPath, XSLT,
        etc, which do work fine with HTML and the libxml2 parser.

        >> BTW, if you're interested in namespace processing on the Web,
        >> may I refer you to my recently-published article at
        >> http://www.xml.com/pub/a/2004/12/15/...amespaces.html
        >
        > Interesting.
        >
        > BTW, how do you reconcile the GPL and the Apache license?

        Why is that a problem? My work is GPL (if you want it free - dual
        licensing available otherwise). Apache is ASF license. They are
        distributed separately. Those Linux distros (and FreeBSD) that
        package my GPL modules offer them to users as separate packages,
        and don't have a problem with it. Even the fundamentalists at
        Debian don't have a problem with it. Any more than they have a
        problem distributing non-GPL apps like Apache to run on Linux itself.

        --
        Nick Kew


        • Henri Sivonen

          #19
          Re: character to HTML ampersand escape sequence converter

          In article <l8vg92-j41.ln1@hugin.webthing.com>,
          nick@hugin.webthing.com (Nick Kew) wrote:

          > In article <hsivonen-38EBA0.19382419122004@news.dnainternet.net>,
          > Henri Sivonen <hsivonen@iki.fi> writes:
          >
          > >> Watch this space. That's what I'd like mod_publisher to do. OTOH,
          > >> how many people mix HTML (no X) with other namespaces in real life?
          > >
          > > The people who export from MS Office?
          >
          > Good catch. I'd forgotten that one. Don't they try/claim to be XHTML?

          I don't think so. It's more like HTML tag soup spiced up with colonified
          names and XML "data islands".

          > > I was not suggesting that namespaces in HTML should be supported. How
          > > that would work isn't even defined.
          >
          > It would presumably work by treating it as XHTML.

          With namespaces in HTML I meant this kind of Microsoftism:

          <HTML xmlns:k='urn:kewl-schema-urn'>
          <HEAD>
          <TITLE>Test</TITLE>
          <xml>
          <k:foo>
          <k:bar/>
          </k:foo>
          </xml>
          </HEAD>
          <BODY>
          ....
          </BODY>
          </HTML>

          (I suppose Microsoft has defined how that is supposed to work. So saying
          it isn't defined was not entirely accurate.)
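
To see why this is not namespace processing in the XML sense, it may help to look at what a plain HTML parser reports for such markup. The sketch below uses Python's standard `html.parser`; the class name `TagLister` is hypothetical and the URN `urn:example:k` is a stand-in, not the one from the example above. The parser treats `k:foo` as an ordinary tag name that happens to contain a colon, and `xmlns:k` as an ordinary attribute with no special meaning.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Hypothetical helper: records start-tag names and the attributes
    found on the root <html> element."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.root_attrs = {}

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "html":
            self.root_attrs = dict(attrs)


# Same shape as the markup above, with a placeholder URN.
SAMPLE = """<HTML xmlns:k='urn:example:k'>
<HEAD><TITLE>Test</TITLE>
<xml><k:foo><k:bar/></k:foo></xml>
</HEAD><BODY></BODY></HTML>"""

lister = TagLister()
lister.feed(SAMPLE)
lister.close()
```

After this runs, `lister.tags` contains the literal names `k:foo` and `k:bar`, and `lister.root_attrs` holds `xmlns:k` as a plain key. Nothing has bound the `k` prefix to the URN; an application would have to do that itself.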

          > Why is that a problem?

          The FSF lists the Apache licenses 1.0, 1.1 and 2.0 as GPL-incompatible
          free software licenses.


          > Even the fundamentalists at Debian don't have a problem with it.

          That's surprising. :-)

          > Any more than they have a
          > problem distributing non-GPL apps like Apache to run on Linux itself.

          IIRC, Linus Torvalds declared an exception when the subject came up.

          --
          Henri Sivonen
          hsivonen@iki.fi

          Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
