Multiple coding systems, and filesystems

  • gentsquash@gmail.com

    Multiple coding systems, and filesystems

    On some of my course pages, I quote (with attribution)
    small sections of Wikipedia and the like. E.g., the top
    of


    has "entropia" in Greek font,



    has the o-umlaut from German, and



    has a Japanese font. What is the correct --maybe "coding
    system" is the term?-- so that I could quote all three of
    these on the same HTML page?

    And can the HTML-page be set up so that it will validate?
    ====================================================

    Actually, I'm ahead of myself. In the past I've cut&pasted
    a snippet from, say, wiki/entropy, into an Emacs buffer,
    adjoined a "From Wiktionary http://..." and attempted to
    save the buffer. Sometimes Emacs asked me for what coding
    system to use --and I don't know how to placate it.

    If I'm using multiple coding systems on the same webpage,
    do I have to save the different snippets in different files
    stored with different coding systems, and then

    <!--#include ... -->

    each of them into one webpage? Or can the file system
    permit a file that simultaneously has Greek, German and
    Japanese characters?

    FWIW, my home OS is MacOSX and I need to upload my webpages
    to school. The math dept. server is probably running
    Unix; when I manipulate the html files (when at work), I'm
    using Emacs running on a Solaris (unix) system.

    Sincerely,
    Prof. Jonathan King (gentsquash)
    Mathematics dept, Univ. of Florida
  • Stanimir Stamenkov

    #2
    Re: Multiple coding systems, and filesystems

    Tue, 3 Jun 2008 14:08:25 -0700 (PDT), /gentsquash@gmail.com/:
    > Or can the file system
    > permit a file that simultaneously has Greek, German and
    > Japanese characters?

    Files generally store bytes. How these bytes are interpreted is
    up to the application reading them. Characters are encoded into
    bytes using different encoding schemes, each of which can usually
    represent only the characters of one specific character set. The
    Unicode character set aims to cover the characters of all writing
    systems in use, so if you use a UTF (Unicode Transformation Format)
    variant you can have all the characters you need encoded in a
    single file. So make sure your text editor supports reading and
    saving files in UTF-8, for example.
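
As a minimal sketch of the point above (the three sample words and the file name are illustrative assumptions, not taken from the pages in question), one UTF-8 file can hold Greek, German and Japanese text at once, and the file itself stores only bytes:

```python
import os
import tempfile

# Sample words standing in for the three quotations (illustrative only).
snippets = ["ἐντροπία", "Größe", "エントロピー"]
text = "\n".join(snippets)

# Encoding turns characters into bytes; the file stores only the bytes.
data = text.encode("utf-8")

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "quotes.txt")
with open(path, "wb") as fh:
    fh.write(data)

# Reading back: the application decides how to interpret the bytes.
with open(path, "rb") as fh:
    raw = fh.read()

roundtrip = raw.decode("utf-8")
print(roundtrip == text)
print(len(text), "characters stored as", len(raw), "bytes")
```

The byte count is larger than the character count because UTF-8 uses more than one byte for each non-ASCII character.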

    --
    Stanimir


    • Jukka K. Korpela

      #3
      Re: Multiple coding systems, and filesystems

      Scripsit gentsquash@gmail.com:
      > On some of my course pages, I quote (with attribution)
      > small sections of Wikipedia and the like. E.g., the top
      > of
      >
      > has "entropia" in Greek font,

      Technically, it has the word in Greek _characters_ (letters). This is
      the key issue; fonts are secondary. The page has a style sheet that
      makes special suggestions on the font of such words, in a most confusing
      and tricky way.

      > What is the correct --maybe "coding
      > system" is the term?-- so that I could quote all three of
      > these on the same HTML page?

      The proper _character encoding_ is UTF-8 in such cases. As soon as you
      have Japanese, Greek, and umlaut Latin letters on one page, that's
      definitely the best option. If there were just a few "special"
      characters, you could present them using entity references like &ouml;
      or character references like &#261; (for ą), but this gets clumsy (or requires
      suitable software for generating them) if you have full sentences that
      consist of "special" characters.
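
One possible form of such generating software, as a hedged sketch: Python's built-in `xmlcharrefreplace` error handler replaces every non-ASCII character with a decimal character reference. The helper name `to_char_refs` is made up for this illustration.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


encoded = to_char_refs("Schrödinger")
print(encoded)                 # Schr&#246;dinger
print(html.unescape(encoded))  # Schrödinger
```

The result is pure ASCII, so it survives any encoding the server declares, at the cost of an unreadable source for longer passages.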

      It's not possible (in practice on web pages) to switch the character
      encoding in the middle of an HTML document.

      > In the past I've cut&pasted
      > a snippet from, say, wiki/entropy, into an Emacs buffer,
      > adjoined a "From Wiktionary http://..." and attempted to
      > save the buffer. Sometimes Emacs asked me for what coding
      > system to use --and I don't know how to placate it.

      UTF-8, if Emacs can really produce it. The version of Emacs I've been
      using does not deal with "special" characters, but I recently looked at
      the newest version of Emacs for Windows, and it seems to have
      impressive support for "special" characters.

      Note that the server should be configured to send an appropriate HTTP
      header. You normally do this by adding something to your .htaccess file,
      and in practice you need to use the same encoding for all ".html" files
      in a directory (folder), though you could use, for example, ISO-8859-1
      for ".html" and UTF-8 for ".htm" files.
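
Why the declared encoding has to match the file can be seen in a small sketch: the same two bytes read correctly as UTF-8 but turn into two unrelated characters when they are interpreted as ISO-8859-1.

```python
raw = "ö".encode("utf-8")  # the two bytes 0xC3 0xB6

as_utf8 = raw.decode("utf-8")         # declared charset matches the file
as_latin1 = raw.decode("iso-8859-1")  # declared charset is wrong

print(as_utf8)    # ö
print(as_latin1)  # Ã¶
```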

      > If I'm using multiple coding systems on the same webpage,
      > do I have to save the different snippets in different files
      > stored with different coding systems, and then
      >
      > <!--#include ... -->
      >
      > each of them into one webpage?

      No, it won't work that way, even if your server supports SSI includes.
      They result in a single document, which can have one encoding only. (I
      won't mention <iframe>, because it's really a poor hack for things like
      this, but it performs a sort of include where the included document is
      displayed "autonomously" inside the main canvas and may have a different
      encoding.)
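
A rough illustration of why the include approach fails, assuming the snippets were saved in ISO-8859-7, ISO-8859-1 and Shift_JIS (these encodings and the sample words are assumptions for the sketch): an include simply joins the bytes, and the joined bytes are not valid in any single encoding the page could declare.

```python
# Each snippet saved in a different legacy encoding (assumed for illustration).
greek = "εντροπία".encode("iso8859_7")
german = "Größe".encode("iso8859_1")
japanese = "エントロピー".encode("shift_jis")

# An include concatenates the raw bytes into one document.
combined = greek + b"\n" + german + b"\n" + japanese

try:
    combined.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)  # False

# The same three snippets saved as UTF-8 concatenate without trouble.
unified = "\n".join(["εντροπία", "Größe", "エントロピー"]).encode("utf-8")
print(unified.decode("utf-8"))
```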

      > FWIW, my home OS is MacOSX and I need to upload my webpages
      > to school. The math dept. server is probably running
      > Unix; when I manipulate the html files (when at work), I'm
      > using Emacs running on a Solaris (unix) system.

      A nice mess :-) but it should be manageable when using UTF-8. When
      uploading with FTP, use binary (not Ascii) mode, since no character
      conversion shall be performed - the data is already in a
      system-independent encoding.
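
For a scripted upload, a binary-mode transfer could look like the sketch below. It uses Python's standard `ftplib`; the function name and all of its arguments (host, login, file names) are hypothetical placeholders, and nothing here is specific to the server in question.

```python
from ftplib import FTP


def upload_binary(host, user, password, local_path, remote_name):
    """Upload a file byte for byte; storbinary uses binary (image) mode."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        # Open the local file in binary mode so no newline or character
        # translation happens on the way out either.
        with open(local_path, "rb") as fh:
            ftp.storbinary(f"STOR {remote_name}", fh)
```

A call would look like `upload_binary("ftp.example.edu", "me", "secret", "page.html", "page.html")`, with every value replaced by real ones.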

      --
      Jukka K. Korpela ("Yucca")



      • Andreas Prilop

        #4
        Re: Multiple coding systems, and filesystems

        On Tue, 3 Jun 2008, gentsquash@gmail.com wrote:
        > Greek
        > German
        > Japanese
        > What is the correct --maybe "coding
        > system" is the term?-- so that I could quote all three of
        > these on the same HTML page?

        Use Unicode in the encoding ("charset") UTF-8:

        > Sometimes Emacs asked me for what coding
        > system to use --and I don't know how to placate it.

        Choose UTF-8 for the web.

        > Or can the file system
        > permit a file that simultaneously has Greek, German and
        > Japanese characters?

        Yes - with Unicode.

        > when I manipulate the html files (when at work), I'm
        > using Emacs running on a Solaris (unix) system.

        Either use a UTF-8 locale such as

        export LC_ALL="en_US.UTF-8"
        export LANG="en_US.UTF-8"

        or write all non-ASCII characters as character references
        &#number;


        --
        In memoriam Alan J. Flavell


        • Andreas Prilop

          #5
          Re: Multiple coding systems, and filesystems

          On Wed, 4 Jun 2008, Jukka K. Korpela wrote:
          > though you could use, for example, ISO-8859-1
          > for ".html" and UTF-8 for ".htm" files.

          A better idea is to separate content-type and charset.
          For example, use "utf8" for UTF-8 and "iso1" for ISO-8859-1.
          On Apache, you can write into your .htaccess file:

          Options +Multiviews
          DefaultType text/html
          AddCharset iso-8859-1 iso1
          AddCharset utf-8 utf8

          Name the files as "mypage.html.iso1" and "anotherpage.html.utf8"
          or simply as "mypage.iso1" and "anotherpage.utf8";
          and don't forget "stylesheet.css.utf8".

          In the URLs, omit ".iso1" and ".utf8" of course:

          <a href="mypage.html">
          <a href="anotherpage.html">
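
The naming convention can be summarised with a small sketch. This only mimics the file-name-to-charset mapping described above; it is not Apache's content negotiation, and the helper `split_charset_suffix` is invented for the illustration.

```python
# Suffix-to-charset table mirroring the two AddCharset lines (illustrative).
CHARSET_SUFFIXES = {"iso1": "iso-8859-1", "utf8": "utf-8"}


def split_charset_suffix(filename):
    """Return (name used in URLs, charset) for a file on disk."""
    base, _, suffix = filename.rpartition(".")
    if base and suffix in CHARSET_SUFFIXES:
        return base, CHARSET_SUFFIXES[suffix]
    # No recognised charset suffix: the name is unchanged, charset unknown.
    return filename, None


for name in ["mypage.html.iso1", "anotherpage.html.utf8", "stylesheet.css.utf8"]:
    print(name, "->", split_charset_suffix(name))
```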


          /* One wonders if you need ISO-8859-1 at all
          when you can have documents in UTF-8. */

          --
          Solipsists of the world - unite!
