Choice of format for web publishing

  • Haines Brown

    Choice of format for web publishing

    I'd like to raise an issue that is somewhat outside the focus of this
    newsgroup although related, which is the ideal document format for web
    publication.

    In terms of likely future trends, what is the ideal format for the
    publication of technical documents (by "technical", I mean documents
    that are paginated and have a bibliography and footnotes)?

    The reason for my question is that I've become involved in a project to
    develop an on-line journal in the humanities. The publisher intends to
    solicit manuscripts in Word and convert them to PDF (using Chicago
    Style Sheet, which is another matter).

    My instinct is to suggest to him that PDF has disadvantages (including
    accessibility and not being machine readable), and that he consider
    (X)HTML instead. I'd like to know reasons for choosing one over the
    other.

    --

    Haines Brown, KB1GRM



  • Garmt de Vries

    #2
    Re: Choice of format for web publishing

    On Jun 19, 3:00 pm, Haines Brown <bro...@teufel.hartford-hwp.com>
    wrote:
    > The reason for my question is that I've become involved in a project to
    > develop an on-line journal in the humanities. The publisher intends to
    > solicit manuscripts in Word and convert them to PDF (using Chicago
    > Style Sheet, which is another matter).
    >
    > My instinct is to suggest to him that PDF has disadvantages (including
    > accessibility and not being machine readable), and that he consider
    > (X)HTML instead. I'd like to know reasons for choosing one over the
    > other.

    I'm involved in an online journal that uses the Open Journal System.
    Articles are stored on the server as OpenOffice documents, and HTML or
    PDF versions are generated on the fly, according to the user's choice.
    This method seems to offer the best of both: PDF allows you to
    download and store just one file, and makes for better printing; HTML
    is (I believe) more accessible, as you say.
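    A minimal Python sketch of the "one stored source, several generated
    formats" idea described above. The renderer functions and the format
    names are hypothetical placeholders, not the Open Journal System's actual
    API; a real setup would call OpenOffice or another converter where the
    stubs are.

```python
from html import escape
from typing import Callable, Dict, Tuple


# Hypothetical renderers: each turns one stored source document into bytes.
# Real code would call an actual converter; these are placeholders.
def render_html(source: str) -> bytes:
    return f"<html><body><p>{escape(source)}</p></body></html>".encode("utf-8")


def render_pdf(source: str) -> bytes:
    return b"%PDF-placeholder\n" + source.encode("utf-8")


# Map the reader's choice to (Content-Type header value, renderer).
RENDERERS: Dict[str, Tuple[str, Callable[[str], bytes]]] = {
    "html": ("text/html; charset=utf-8", render_html),
    "pdf": ("application/pdf", render_pdf),
}


def serve(source: str, choice: str = "html") -> Tuple[str, bytes]:
    """Return (content type, body) for the chosen format; unknown choices fall back to HTML."""
    content_type, render = RENDERERS.get(choice.lower(), RENDERERS["html"])
    return content_type, render(source)


if __name__ == "__main__":
    ctype, body = serve("Article text", "pdf")
    print(ctype, len(body))
```

    Because only the source is stored, adding another output format later
    means adding a renderer, not converting the archive.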


    • Haines Brown

      #3
      Re: Choice of format for web publishing

      What I was hoping to see was someone suggest TEI/XML, with an
      appropriate schema and style sheet, but since it was not mentioned, I
      wonder if there's a problem going in that direction.
      --

      Haines Brown, KB1GRM




      • Andy Dingley

        #4
        Re: Choice of format for web publishing

        On 19 Jun, 14:00, Haines Brown <bro...@teufel.hartford-hwp.com> wrote:
        > The reason for my question is that I've become involved in a project to
        > develop an on-line journal in the humanities. The publisher intends to
        > solicit manuscripts in Word and convert them to PDF (using Chicago
        > Style Sheet, which is another matter).
        >
        > My instinct is to suggest to him that PDF has disadvantages (including
        > accessibility and not being machine readable), and that he consider
        > (X)HTML instead. I'd like to know reasons for choosing one over the
        > other.

        What do you mean by "publish" here?

        By all means offer PDFs as one final format that your CMS can offer to
        readers.

        Don't _store_ your content as PDFs though. Use something else
        (anything!) and generate PDFs on demand (with caching and maybe pre-
        generation).
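        A sketch of "generate on demand, with caching and maybe
        pre-generation", assuming the stored source lives in a file and that
        rebuilding whenever its modification time changes is good enough.
        `RenderCache` and `fake_pdf_renderer` are hypothetical names; the
        renderer is a stub standing in for a real PDF toolchain.

```python
import os
from typing import Callable, Dict, Iterable, Tuple


class RenderCache:
    """Regenerate an output only when the stored source file changes.

    The cache key is the source path; the cached value remembers the
    source's modification time so stale output is rebuilt.
    """

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self.render = render
        self._cache: Dict[str, Tuple[float, bytes]] = {}
        self.renders = 0  # how many times render() actually ran

    def get(self, source_path: str) -> bytes:
        mtime = os.path.getmtime(source_path)
        cached = self._cache.get(source_path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # source unchanged: reuse earlier output
        output = self.render(source_path)
        self.renders += 1
        self._cache[source_path] = (mtime, output)
        return output

    def pregenerate(self, paths: Iterable[str]) -> None:
        """Warm the cache before any reader asks (the "pre-generation" option)."""
        for path in paths:
            self.get(path)


def fake_pdf_renderer(path: str) -> bytes:
    # Hypothetical stand-in for a real PDF toolchain: it just wraps the source bytes.
    with open(path, "rb") as f:
        return b"%PDF-placeholder\n" + f.read()
```

        Each request calls `cache.get(path)`; repeated requests reuse the stored
        bytes until the source's timestamp changes. Filesystem timestamps can be
        coarse, so a hash of the source is a safer key if edits come in rapid
        succession.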

        As a storage format, XHTML is one choice, as would be DocBook or
        TEI.

        I wouldn't use HTML, although I'd publish my XHTML to readers as HTML
        (for web-design reasons oft discussed hereabouts). The reason for this
        is that whatever XML-based format you choose for internal storage,
        it's likely to involve lots of namespacing and composition of overall
        schema by importing snippets from both DocBook and Dublin Core (etc.
        etc.) You really need namespace and processing features that XHTML
        gives you easily but HTML won't. XML tools will be of far more use
        than SGML ones.
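        To illustrate the namespacing point, here is a small made-up fragment
        that mixes the DocBook 5 and Dublin Core vocabularies, read with
        Python's standard `xml.etree.ElementTree`. The namespace URIs are the
        published ones for those vocabularies; the fragment itself is only
        illustrative and is not claimed to validate against the official
        DocBook schema. The `metadata` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used when querying; the prefixes need not match the document's.
NS = {
    "db": "http://docbook.org/ns/docbook",      # DocBook 5
    "dc": "http://purl.org/dc/elements/1.1/",   # Dublin Core elements 1.1
}

ARTICLE = """\
<article xmlns="http://docbook.org/ns/docbook"
         xmlns:dc="http://purl.org/dc/elements/1.1/" version="5.0">
  <info>
    <dc:title>On Footnotes</dc:title>
    <dc:creator>A. Author</dc:creator>
  </info>
  <para>Body text.</para>
</article>
"""


def metadata(xml_text: str) -> dict:
    """Pull Dublin Core fields out of a DocBook <info> block."""
    root = ET.fromstring(xml_text)
    info = root.find("db:info", NS)
    if info is None:
        return {}
    return {
        "title": info.findtext("dc:title", namespaces=NS),
        "creator": info.findtext("dc:creator", namespaces=NS),
    }


if __name__ == "__main__":
    print(metadata(ARTICLE))
```

        An XML parser keeps the two vocabularies apart by namespace URI, which
        plain HTML tooling has no way to express.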

        I would favour DocBook over HTML for any "long" document that needs
        structure at a scope greater than heading / para. Neither has much
        semantic markup, and neither has any advantage in the quality of its
        inline markup. DocBook does win out, though, for section, chapter and
        book-level structure.
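        As a rough illustration of what book/chapter/section structure buys
        you, this sketch walks a hypothetical DocBook 5 document and builds a
        table of contents from it. The sample book and the `toc` helper are
        assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"  # DocBook 5 namespace in ElementTree's {uri} form

BOOK = """\
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Collected Essays</title>
  <chapter>
    <title>Sources</title>
    <section><title>Archives</title><para>...</para></section>
    <section><title>Oral history</title><para>...</para></section>
  </chapter>
  <chapter>
    <title>Method</title>
    <para>...</para>
  </chapter>
</book>
"""


def toc(elem, depth=0):
    """Yield (depth, title) for every chapter/section, in document order."""
    for child in elem:
        if child.tag in (DB + "chapter", DB + "section"):
            yield depth, child.findtext(DB + "title")
            yield from toc(child, depth + 1)


if __name__ == "__main__":
    for depth, title in toc(ET.fromstring(BOOK)):
        print("  " * depth + title)
```

        With only headings and paragraphs, the same outline would have to be
        inferred from heading levels rather than read off the nesting.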


        • Michael Wojcik

          #5
          Re: Choice of format for web publishing

          Haines Brown wrote:
          > What I was hoping to see was someone suggest TEI/XML, with an
          > appropriate schema and style sheet, but since it was not mentioned, I
          > wonder if there's a problem going in that direction.

          Ask and ye shall receive - Andy Dingley just suggested TEI, though he
          proposed (and I concur) that you store internally in TEI or DocBook
          but serve HTML. I'm not sure whether that's what you were proposing
          above, or whether you were thinking of serving XML + schema + style
          sheet to user agents. The latter won't be handled properly by many
          UAs, and will confuse non-technical users if they try to save content,
          etc.

          You might want to take a look at /Kairos/ [1]. They've been in the
          online-humanities-journal biz for a while (about 12 years), so they
          have a lot of experience with what works well for their authors and
          readers.

          They publish most content as HTML, but they also run multimedia
          articles and the like. One factor to consider with an online
          humanities journal is that authors will want to use the affordances of
          the readers' systems, and that means accommodating things like video
          and interactive applets. (Obviously not all readers will be able to,
          or choose to, view that kind of content; but enough will.)

          You can get some nice innovative work if you allow for things like
          Karl Stolley's "Lo-Fi Manifesto" [2], for example.

          (The current /Kairos/ design is ... aging, shall we say; but they have
          a much nicer redesign coming out with the next issue that is prettier,
          standards-compliant, and amply supplied with features that degrade
          gracefully, like hCard markup on author information.)
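          hCard marks up contact details with ordinary class names (`vcard`,
          `fn`, `org`, `url`), so a browser that knows nothing about
          microformats still shows plain text; that is the graceful degradation
          mentioned above. A minimal generator sketch, assuming the author
          metadata is already available as strings; the `hcard` function and the
          example URL are hypothetical.

```python
from html import escape


def hcard(name: str, org: str = "", url: str = "") -> str:
    """Render author info as hCard microformat markup."""
    if url:
        # "url fn" on one link marks it as both the person's URL and formatted name.
        name_html = f'<a class="url fn" href="{escape(url)}">{escape(name)}</a>'
    else:
        name_html = f'<span class="fn">{escape(name)}</span>'
    parts = [name_html]
    if org:
        parts.append(f'<span class="org">{escape(org)}</span>')
    return '<div class="vcard">' + ", ".join(parts) + "</div>"


if __name__ == "__main__":
    print(hcard("Michael Wojcik", "Michigan State University"))
    print(hcard("A. Author", url="http://example.org/"))
```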


          [1] http://kairos.technorhetoric.net/
          [2]


          --
          Michael Wojcik
          Micro Focus
          Rhetoric & Writing, Michigan State University


          • Haines Brown

            #6
            Re: Choice of format for web publishing

            Michael Wojcik <mwojcik@newsguy.com> writes:
            > Andy Dingley just suggested TEI, though he proposed (and I concur)
            > that you store internally in TEI or DocBook but serve HTML. I'm not
            > sure whether that's what you were proposing above, or whether you were
            > thinking of serving XML + schema + style sheet to user agents. The
            > latter won't be handled properly by many UAs, and will confuse
            > non-technical users if they try to save content, etc.

            Well, I _was_ toying with the idea of serving XML+schema+stylesheet. By
            "UA" I presume you mean the average browser (IE). However, I didn't
            realize that browsers have problems with XML + public schema +
            stylesheet. Would you be more specific about the kinds of problems and
            the likelihood of their occurring? And why would a non-technical user
            be confused? Wouldn't the user see the same thing in his browser if the
            document were instead served as HTML?

            I'm unclear about just what is implied by "store internally". Do you
            mean placing TEI or DocBook documents in a database on the server and
            then processing them for display as HTML/XHTML for the user?

            > You might want to take a look at /Kairos/ [1]. They've been in the
            > online-humanities-journal biz for a while (about 12 years), so they
            > have a lot of experience with what works well for their authors and
            > readers.

            I don't understand why you offered this as an example, and probably miss
            your point. The document I looked at from the Kairos site is just some
            JavaScript that defines a framework and inserts into it an old-fashioned
            (using table for format, for example) document. If I were to do this I'd
            use SSI, XHTML, and CSS, but in any case, at least for the document I
            viewed, the internally stored document is only HTML, not TEI or DocBook.

            --

            Haines Brown, KB1GRM




            • Michael Wojcik

              #7
              Re: Choice of format for web publishing

              Haines Brown wrote:
              > Michael Wojcik <mwojcik@newsguy.com> writes:
              >
              >> Andy Dingley just suggested TEI, though he proposed (and I concur)
              >> that you store internally in TEI or DocBook but serve HTML. I'm not
              >> sure whether that's what you were proposing above, or whether you were
              >> thinking of serving XML + schema + style sheet to user agents. The
              >> latter won't be handled properly by many UAs, and will confuse
              >> non-technical users if they try to save content, etc.
              >
              > Well, I _was_ toying with the idea of serving XML+schema+stylesheet. By
              > "UA" I presume you mean the average browser (IE).

              I mean user agent: whatever is processing the data you send. (That's
              standard terminology in the W3C specs, the HTTP RFCs, etc.) It doesn't
              particularly matter to me whether it's "average" or exotic, though of
              course you may decide not to worry about supporting less-common UAs.
              (Do you expect people to read your journal on their iPhones? On other
              mobile devices? On browsers embedded in appliances?)

              > However, I didn't
              > realize that browsers have problems with XML + public schema +
              > stylesheet. Would you be more specific about the kinds of problems and
              > the likelihood of their occurring?

              I was over-hasty with that comment. I assumed that there were many UAs
              that won't handle XML + schema + style sheet. (IE, for example,
              doesn't even handle XHTML properly.) And I believe I've read more
              substantial claims to that effect. But I realized when I read your
              response that I had not actually verified that suspicion.

              Personally, if I were building this application, I'd be reluctant to
              serve XML + schema + style sheet, simply because I'd rather not do the
              interoperability testing (or limit my content to a handful of common
              UAs), when it's not at all difficult to serve HTML 4.01 Strict instead.

              > And why would a non-technical user
              > be confused? Wouldn't the user see on his browser the same thing if the
              > document were instead served as HTML?

              Suppose you are a non-technical user. Suppose you are viewing a page
              of this journal and decide to save a copy. You know, from prior
              experience, that a saved web page is a file with an extension like
              ".htm" and possibly a folder containing some images and the like.
              What's a ".xml" file? What's a ".xsd" file?

              And whether the user sees "the same thing" is hard to say. Browsers
              have built-in styles for HTML, which they will fall back on in various
              circumstances. Some users have user style sheets, which select HTML
              elements; such rules may not apply at all to a document in another
              XML vocabulary.

              > I'm unclear about just what is implied by "store internally". Do you
              > mean placing TEI or DocBook documents in a database on the server and
              > then processing them for display as HTML/XHTML for the user?

              You have to store content, and you have to serve it. Sometimes content
              is static - that is, the server simply sends the stored representation
              (often just by reading a file from a local filesystem). Often it's
              dynamic: server-side includes, ASP and JSP and PHP and other sorts of
              scriptable pages, CGI scripts, server extensions that execute
              application code, etc.

              I don't care (well, for these purposes) how you store content. I'm
              suggesting that you store it in a form that works well for your
              production toolchain and for the applications that use it - so TEI or
              DocBook might well be a good choice. And I'm suggesting that you serve
              it in a form that the UA is likely to handle well; I'd suggest HTML
              4.01 Strict with external CSS 2.1 style sheets.

              To go from the stored representation to the presentation
              representation, XSLT looks like the obvious mechanism. The server
              could do that on the fly, if it has sufficient resources; or it could
              cache the generated HTML; or the HTML could be generated whenever the
              XML is updated and served statically.
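              Python's standard library has no XSLT processor, so the sketch below
              swaps in a hand-written ElementTree mapping to show the same
              stored-to-served step: a tiny TEI document becomes HTML 4.01 Strict
              that links an external style sheet. The sample document, the
              `tei_to_html` name and the `journal.css` file name are assumptions; a
              production setup would more likely run real XSLT (for example with
              xsltproc) and would handle notes and bibliography, which this sketch
              ignores.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace in ElementTree's {uri} form

# HTML 4.01 Strict document type declaration
DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           '"http://www.w3.org/TR/html4/strict.dtd">')

SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>On Footnotes</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>First paragraph.</p>
    <p>Second &amp; last.</p>
  </body></text>
</TEI>
"""


def tei_to_html(xml_text: str, css_href: str = "journal.css") -> str:
    """Map a very small subset of TEI to HTML 4.01 Strict (notes, bibliography ignored)."""
    root = ET.fromstring(xml_text)
    title = root.findtext(f".//{TEI}titleStmt/{TEI}title", default="Untitled")
    body = root.find(f"{TEI}text/{TEI}body")
    paras = [] if body is None else [
        "<p>" + escape("".join(p.itertext())) + "</p>" for p in body.iter(f"{TEI}p")
    ]
    return "\n".join([
        DOCTYPE,
        "<html><head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
        f"<title>{escape(title)}</title>",
        # Presentation lives in an external CSS file, not in the markup.
        f'<link rel="stylesheet" type="text/css" href="{escape(css_href)}">',
        "</head><body>",
        f"<h1>{escape(title)}</h1>",
        *paras,
        "</body></html>",
    ])


if __name__ == "__main__":
    print(tei_to_html(SAMPLE))
```

              Whether this runs per request, behind a cache, or as a batch step
              after each XML update is the deployment choice described above; the
              mapping itself is the same in all three cases.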
              >> You might want to take a look at /Kairos/ [1]. They've been in the
              >> online-humanities-journal biz for a while (about 12 years), so they
              >> have a lot of experience with what works well for their authors and
              >> readers.
              >
              > I don't understand why you offered this as an example, and probably miss
              > your point. The document I looked at from the Kairos site is just some
              > JavaScript that defines a framework and inserts into it an old-fashioned
              > (using table for format, for example) document.

              I was unclear. I didn't mean /Kairos/ as an example of an implementation.

              I suggested it because it's an online humanities journal of long
              standing, relatively wide readership, and good reputation; because
              they've had to deal with all of these issues, and these are the
              compromises they arrived at; and because it demonstrates my other
              point, which is that people writing for an online journal will want to
              be able to use all the possible facilities. That means people will
              want to submit articles with multimedia components, so you need to
              think about how you'll handle non-text materials in your toolchain.
              People will want to submit articles with dynamic content and scripting
              - even applications, with any luck - so you'll need to handle that.

              > If I were to do this I'd
              > use SSI, XHTML, and CSS, but in any case, at least for the document I
              > viewed, the internally stored document is only HTML, not TEI or DocBook.

              How can you tell how the document is stored internally? What you see
              is what the server sent you. You don't know what it did in producing
              that content.

              --
              Michael Wojcik
              Micro Focus
              Rhetoric & Writing, Michigan State University


              • Haines Brown

                #8
                Re: Choice of format for web publishing

                Michael, thank you for your wise comments and clarifications.

                My translation of your "user agent" into the browsers I see
                was too restrictive. You are right; I do have to consider iPhones,
                etc. Yes, that would be exotic today, but tomorrow perhaps less so. On
                the other hand, there is perhaps reason to assume that "exotic" UAs will
                at the same time learn to deal with XML.

                I know that IE does not do HTML well, and I have to make the appropriate
                accommodations. I'm too ignorant about the matter to say whether it would
                do any worse with XML.

                Your point about the user possibly defining the presentation style is
                understood. That suggests serving the pages with a clear separation of
                format and marked-up content, which can be either XML or HTML.

                Thanks again, you were very helpful.
                --

                Haines Brown, KB1GRM


