Conversion of historical material to HTML

  • Robert AH Prins

    Conversion of historical material to HTML

    Hi all,

    A number of questions:

    1) I'm in the process of converting some historical type-written
    newsletters to HTML and one of the things that came up while doing so
    was the fact that some headings underline every other letter. I can
    just go ahead and bold them instead, but that feels a bit 1984'ish, or
    I can emulate this using <span> tags,
    but this leads to horribly (too) long lines. Is there an easier
    method?

    2) Given the limitations of browsers, is there an easy way of getting
    (as close as possible to) the original fontsizes. I guess using
    appropriately set up span.xxx classes is the only way to influence
    line-spacing?

    3) I'm currently using CSS2 & <div>'s to get the paging right. This
    works OK, but is there an easier way?

    4) Does anyone have/know where to find some simple examples of pages
    (probably using tables) using different font-sizes for the various
    cells - I need to use them to emulate the small-size output of thermal
    printers.

    5) OT(ish) My scanner came with Paperport Deluxe V8 and ReadIris was
    on the PC. Both are OK(ish) for this type of material, but choke on
    more complicated (and I have heaps of it) stuff, can anyone recommend
    some really good OCR software?

    6) Would UltraEdit be a good choice to edit HTML/CSS or are there more
    suitable alternatives? FWIW, I'm now using an old DOS editor that I
    can use almost blindfolded, but that doesn't have any fancy features
    (other than being tiny and fast) and has a line-length limit of 255
    characters (see 1) above...

    Reply here, but if you can cc: any reply to <prino@onetel.net.uk> I
    would appreciate it.

    Thanks,

    Robert
    --
    Robert AH Prins
    prino@onetel.net.uk
  • Mark Tranchant

    #2
    Re: Conversion of historical material to HTML

    Robert AH Prins wrote:
    > A number of questions:

    Not really answering any of the questions: but what are you trying to
    achieve here? I think you're going to put in lots of work for an
    unsatisfactory result. My advice would be:

    Convert the letters to the plainest HTML you can, with no attempt at
    presentation or replicating the original look. The goal here is to convert
    the information.

    Next, if appropriate, prepare high-resolution scans of the originals,
    either as individual images or as multi-page PDFs. This is to convert the
    visual appearance, if that is an important part of the history.

    --
    Mark.

    • Neal

      #3
      Re: Conversion of historical material to HTML

      On 20 Apr 2004 07:59:20 -0700, Robert AH Prins <prino@bigfoot.com> wrote:
      > Hi all,
      >
      > A number of questions:
      >
      > 1) I'm in the process of converting some historical type-written
      > newsletters to HTML and one of the things that came up while doing so
      > was the fact that some headings underline every other letter. I can
      > just go ahead and bold them instead, but that feels a bit 1984'ish, or
      > I can emulate this using <span> tags,
      > but this leads to horribly (too) long lines. Is there an easier
      > method?

      What's your goal? To make the web page look just like the document? That
      can't really be done, though you can get close. Or to present the factual
      information in the documents? Then mark up headings as headings.

      > 2) Given the limitations of browsers, is there an easy way of getting
      > (as close as possible to) the original fontsizes. I guess using
      > appropriately set up span.xxx classes is the only way to influence
      > line-spacing?

      Monitor resolution will be an issue here - you won't actually know the end
      product, will you?

      > 3) I'm currently using CSS2 & <div>'s to get the paging right. This
      > works OK, but is there an easier way?

      The page layout? This is appropriate.
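
      For reference, a minimal sketch of what CSS2 paging with <div>s might look
      like, assuming each original page is wrapped in a div with the hypothetical
      class name "page". Where the breaks actually fall still depends on the
      reader's browser, paper size and printer settings:

      ```css
      /* Assumption: every original page sits in <div class="page">.
         The class name is hypothetical, not taken from the newsletters. */
      @media print {
        div.page {
          page-break-after: always; /* CSS2 paged-media property */
        }
      }
      ```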

      > 4) Does anyone have/know where to find some simple examples of pages
      > (probably using tables) using different font-sizes for the various
      > cells - I need to use them to emulate the small-size output of thermal
      > printers.

      Even if you're adopting the look and feel of these old documents, remember
      usability and accessibility. Don't make those fonts too small. You could
      apply font-size to the specific element.
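
      As a rough sketch of that, assuming the thermal-printer output goes into
      table cells marked with a hypothetical class "thermal" (the 85% figure is
      only illustrative, and a relative size still follows the reader's own
      base font setting):

      ```css
      /* "thermal" is a hypothetical class name; 85% is an illustrative value. */
      td.thermal {
        font-family: monospace; /* thermal printouts are usually fixed-pitch */
        font-size: 85%;         /* relative to the surrounding text */
      }
      ```

      A cell would then be written as <td class="thermal">...</td>, and the
      other cells keep the default size.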

      Bottom line - if you are looking to present historical documents, worry
      about the accessibility of the content first, and the nuance of the
      presentation later.

      • Robert AH Prins

        #4
        Re: Conversion of historical material to HTML

        "Mark Tranchant" <mark@tranchant.plus.com> wrote in message
        news:40853C44.6060405@tranchant.plus.com...
        > Robert AH Prins wrote:
        >
        > > A number of questions:
        >
        > Not really answering any of the questions: but what are you trying to
        > achieve here? I think you're going to put in lots of work for an
        > unsatisfactory result. My advice would be:

        My stepdaughter is doing most of the work to earn some extra pocket
        money, and there are no deadlines. It's a labour of love to keep this
        material available for posterity.

        > Convert the letters to the plainest HTML you can, with no attempt at
        > presentation or replicating the original look. The goal here is to convert
        > the information.

        I agree, but having an index and some cross-linking is extremely useful.

        > Next, if appropriate, prepare high-resolution scans of the originals,
        > either as individual images or as multi-page PDFs. This is to convert the
        > visual appearance, if that is an important part of the history.

        There are already PDFs available, but creating multi-megabyte files (the
        largest is in excess of 3 MB) for a mere six A4 pages (text content about
        24 KB) is utter madness, even more so because it is just a picture and
        cannot be searched.

        Robert
        --
        Robert AH Prins
        prino@onetel.net.uk


        • Jukka K. Korpela

          #5
          Re: Conversion of historical material to HTML

          prino@bigfoot.com (Robert AH Prins) wrote:
          > 1) I'm in the process of converting some historical type-written
          > newsletters to HTML and one of the things that came up while doing so
          > was the fact that some headings underline every other letter.

          I would suggest avoiding any attempt to reproduce that in HTML. Any
          underlining, even broken underline, would easily be misunderstood as
          denoting a link in HTML. So it's better to use some other styling for a
          heading, if needed. I would simply use a suitable element, like h2, and
          add as much CSS as reasonable to make the appearance resemble the
          original - e.g. in fonts, bolding, etc., but not issues like underlining.

          The look & feel might matter, so some styling, even detailed, might be
          nice. But don't overdo it.

          But should you wish to imitate underlining, then CSS code like
          h2 { border-bottom: dashed black thin; }
          might be a reasonable compromise. It's not really underline but bottom
          border, and that's one reason why it wouldn't be taken as link underline
          so easily. Other options include setting suitable background image
          containing just a short vertical line at the level of underlining and
          some transparent stuff on the right, and repeating in x direction.
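
          A minimal sketch of the background-image variant, assuming a small
          image (hypothetically named dash.gif) with a dark stroke on the left
          and transparent pixels on the right. The stroke spacing is fixed by the
          image, so it will not necessarily line up with individual letters:

          ```css
          /* dash.gif is a hypothetical file name; the padding value is illustrative. */
          h2 {
            background-image: url(dash.gif);
            background-repeat: repeat-x;      /* tile along the heading only */
            background-position: left bottom; /* sit at the underline level */
            padding-bottom: 0.2em;            /* room below the text for the strokes */
          }
          ```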

          > I can
          > just go ahead and bold them instead, but that feels a bit 1984'ish, or
          > I can emulate this using <span> tags,
          > but this leads to horribly (too) long lines.

          It gets rather awkward. But you could put every other character between
          <u> and </u>. I wouldn't be too puristic here.

          You could also use increased font size instead of bolding, e.g.
          h2 { font-weight: normal; font-size: 115%; }

          > 2) Given the limitations of browsers, is there an easy way of getting
          > (as close as possible to) the original fontsizes.

          Why would you do _that_? Don't fight against the strengths of the Web.
          Just as the Web is a way to make the data technically accessible
          worldwide over the network, not setting any font sizes (except relatively
          e.g. for headings) is part of the way of making it humanly accessible to
          people with different properties and preferences.

          > I guess using
          > appropriately set up span.xxx classes is the only way to influence
          > line-spacing?

          You cannot influence line spacing in HTML - though you could create very
          coarse simulations. Just use line-height in CSS.
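
          For instance, a sketch with an illustrative value (1.5 is an
          assumption, not a measurement of the originals):

          ```css
          /* A unitless line-height is a multiple of each element's own font
             size, so it keeps working when the reader changes the text size. */
          body {
            line-height: 1.5;
          }
          ```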

          > 3) I'm currently using CSS2 & <div>'s to get the paging right. This
          > works OK, but is there an easier way?

          Paging, too, is an area where you should utilize the strengths of the Web
          and not fight against them. Do you know what paper sizes people have, or
          their printer settings?

          --
          Yucca, http://www.cs.tut.fi/~jkorpela/
          Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

          • Robert AH Prins

            #6
            Re: Conversion of historical material to HTML

            "Jukka K. Korpela" <jkorpela@cs.tut.fi> wrote in message
            news:<Xns94D1DE56C78Fjkorpelacstutfi@193.229.0.31>...
            > prino@bigfoot.com (Robert AH Prins) wrote:
            >
            > > 1) I'm in the process of converting some historical type-written
            > > newsletters to HTML and one of the things that came up while doing so
            > > was the fact that some headings underline every other letter.
            >
            > I would suggest avoiding any attempt to reproduce that in HTML. Any
            > underlining, even broken underline, would easily be misunderstood as
            > denoting a link in HTML.

            That's something I didn't consider. I'll stick to bold.

            > So it's better to use some other styling for a
            > heading, if needed. I would simply use a suitable element, like h2, and

            As I mentioned in my original post, these were type-written newsletters;
            the only text decorations they use are the alternating underline and
            ALL CAPS.

            > add as much CSS as reasonable to make the appearance resemble the
            > original - e.g. in fonts, bolding, etc., but not issues like underlining.
            >
            > The look & feel might matter, so some styling, even detailed, might be
            > nice. But don't overdo it.

            I have no plans to go all the way. My stepdaughter is scanning/correcting
            the material, I'm just adding very basic HTML.

            > But should you wish to imitate underlining, then CSS code like
            > h2 { border-bottom: dashed black thin; }

            Everything is enclosed in <pre> ... </pre> tags, I don't even use <p>'s.

            <snip>

            > I wouldn't be too puristic here.

            For these newsletters, which you can find at
            <http://www.rskey.org/tippc.htm> under the heading 52-Notes, I will
            stick to basic text; for others I will have to add some fancier
            stuff (two columns and graphics).

            > > 2) Given the limitations of browsers, is there an easy way of getting
            > > (as close as possible to) the original fontsizes.
            >
            > Why would you do _that_? Don't fight against the strengths of the Web.
            >
            > Just as the Web is a way to make the data technically accessible
            > worldwide over the network, not setting any font sizes (except relatively
            > e.g. for headings) is part of the way of making it humanly accessible to
            > people with different properties and preferences.

            OK, try again: These things use two font sizes, one for the body,
            another slightly smaller for the colophon on page 1. I'd like them to
            look like the originals (assuming a medium font in the browser).

            > > I guess using
            > > appropriately set up span.xxx classes is the only way to influence
            > > line-spacing?
            >
            > You cannot influence line spacing in HTML - though you could create very
            > coarse simulations. Just use line-height in CSS.

            Sorry, that's what I meant, and I'm using only full, 80% and 50%, which
            I guess should be OK in most modern browsers.

            > > 3) I'm currently using CSS2 & <div>'s to get the paging right. This
            > > works OK, but is there an easier way?
            >
            > Paging, too, is an area where you should utilize the strengths of the Web
            > and not fight against them. Do you know what paper sizes people have, or
            > their printer settings?

            The originals were on US 8.5x11", with enough margins to fit on A4. Using
            medium font size in IE, they print OK on either format.

            Robert
            --
            Robert AH Prins
            prino@onetel.net.uk
