Lang attribute values

  • Alan J. Flavell

    #76
    Re: Lang attribute values

    On Mon, 26 Jan 2004, Andreas Prilop wrote:

    > <http://www.unics.uni-hannover.de/nhtcapri/urdu-alphabet.html>

    Uh-uh, you're using space and zero-width joiner to exhibit the
    initial, final and medial forms. I guess I could do that in my Arabic
    Unicode page also?


    • Andreas Prilop

      #77
      Re: Lang attribute values

      On Mon, 26 Jan 2004, Alan J. Flavell wrote:

      >> <http://www.unics.uni-hannover.de/nhtcapri/urdu-alphabet.html>
      >
      > Uh-uh, you're using space and zero-width joiner to exhibit the
      > initial, final and medial forms. I guess I could do that in my Arabic
      > Unicode page also?

      I was surprised that the zero-width joiner (U+200D) does work with
      Mozilla 1.3 and later. However, an earlier version of Netscape (7.0?)
      didn't recognize it, IIRC.

      Does anyone still have Netscape 7.0 on Windows [2000 or XP]?
      Can you tell me whether it shows different glyphs in the third column
      of <http://www.unics.uni-hannover.de/nhtcapri/arabic-alphabet.html> ?

      [ ـ does work with Netscape 7.0. ]
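
For anyone following along, the joiner trick discussed above can be sketched roughly like this. It is only an illustration, not the markup actually used on the pages linked above; the function name `contextual_forms` and the choice of ARABIC LETTER BEH as the sample letter are my own assumptions.

```python
# Sketch: ask a renderer for the contextual forms of an Arabic letter
# by placing a joiner next to it. Names here are illustrative only.
ZWJ = "\u200D"      # ZERO WIDTH JOINER (invisible)
TATWEEL = "\u0640"  # ARABIC TATWEEL (a visible joining stroke)


def contextual_forms(letter: str, joiner: str = ZWJ) -> dict[str, str]:
    """Return strings that request each joining form of `letter`.

    In logical order, a joiner after the letter makes it connect to what
    follows (initial form), a joiner before it makes it connect to what
    precedes (final form), and joiners on both sides give the medial form.
    """
    return {
        "isolated": letter,
        "initial": letter + joiner,
        "medial": joiner + letter + joiner,
        "final": joiner + letter,
    }


if __name__ == "__main__":
    beh = "\u0628"  # ARABIC LETTER BEH, a dual-joining letter
    for name, text in contextual_forms(beh).items():
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(f"{name:9} {code_points}")
```

Passing `TATWEEL` as the joiner gives the variant that draws a visible connecting stroke, which may be the safer choice where a browser ignores U+200D.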


      • Henri Sivonen

        #78
        Re: Lang attribute values

        In article <Xns947BB0D7575jkorpelacstutfi@193.229.0.31>,
        "Jukka K. Korpela" <jkorpela@cs.tut.fi> wrote:

        > Henri Sivonen <hsivonen@iki.fi> wrote:
        >
        > > Choosing a font is only one problem. There are others including
        > > line breaking.
        >
        > Of course the _quality_ of rendering on screen or paper can be affected
        > by such processes.

        I think it is worthwhile to try to improve the quality.

        > My point was that browsers have been able to present
        > documents without knowing the language, and they keep doing so (even
        > now,

        The Mozilla feature can improve the quality of rendering in very
        realistic cases. It happens to degrade the quality in your rather
        theoretical example. But still, all the characters are rendered, so it
        is "just" a matter of quality. Isn't it appropriate to optimize the
        quality in cases that can plausibly occur on countless pages even if you
        can come up with a rare counter-example where the optimization degrades
        the quality?

        > and they always had the option of recognizing language from
        > actual content

        Browsers--being interactive applications that aim for incremental
        display--have never really had that option.

        > > When you write <span lang="ru">Dostoyevsky</span>, what would you
        > > want recipients to do with the language data?
        >
        > Nothing particular. I'm just giving (meta)information.

        I no longer appreciate the inclusion of metadata when the inclusion is
        not motivated by a realistic use case and is done just for the sake of
        providing metadata.

        > In a sense, here
        > I'm intentionally more papal than the pope - I am applying an
        > unconditional Priority 1 WAI guideline that the WAI itself violates.
        ....

        > And as I wrote, I don't recommend doing that in practice - but not
        > because the idea would be wrong. It's the Mozilla misbehavior that
        > makes it currently impractical.

        If you want to make a point about hypocrisy in WAI, why do you point at
        Mozilla as something that is misbehaving?

        > > That is, is it
        > > actually useful for transliterated text to come with language data
        > > in any existing or realistic client implementation for any of the
        > > purposes you list in
        > > http://www.cs.tut.fi/~jkorpela/kielimerkkaus/1.html ?
        ....

        > In any existing implementation, most probably not.

        Doesn't that make the inclusion of language metadata on transliterated
        names about as useful as migrating from HTML 4.01 to XHTML 1.0 served as
        text/html? :-)

        > In a realistic implementation, why not?

        Because isolated transliterated Russian words marked as Russian are so
        rare and there are so many bugs and so little developer time.

        > Of course they would need to
        > know or guess the transliteration method, but there's nothing that
        > prevents them from making educated guesses, except that it means quite
        > some work.

        So guessing the transliteration method would be OK, but making educated
        font guesses based on explicit language information is not OK?

        > I guess we should use lang="und" then.)

        Or, rather lang="".

        > > Let's suppose I'm writing a content management system and I choose
        > > to use UTF-8 for all output - -
        > > What advice should I provide authors who want to use the system for
        > > publishing Polish or Chinese text? How should they make their
        > > suggestions?
        >
        > You mean for fonts?

        Yes.

        > By using font properties in CSS.

        The usual advice from Prilop and Flavell is not to do that.

        > I don't see how lang attributes would help in practice, though it would
        > be OK to declare the language as a preparation for the future.

        For example, on X11 platforms where there are separate fonts for various
        8-bit repertoires, Mozilla can choose a Central European (ISO-8859-2)
        font instead of a "Western" (ISO-8859-1) one for unaccented Latin
        characters as well as the accented ones if the content is marked up as
        Polish.

        For example, on Mac OS X, which comes with a variety of CJK fonts,
        Mozilla can choose a Simplified Chinese font or a Traditional Chinese
        font instead of a Japanese font if the content is marked up as zh-CN or
        zh-TW respectively.

        In both cases the lang attributes do help in *practice*.
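
To make the idea concrete, here is a minimal sketch of how a renderer might map a lang value to a font group before picking a font. This is not Mozilla's actual code or data: the table, the group names and the function `font_group` are all hypothetical, and the fallback rule (strip subtags from the right until something matches) is just one plausible choice.

```python
# Hypothetical lookup table; the group names are illustrative only
# and do not come from any real browser.
FONT_GROUP_BY_LANG = {
    "pl": "central-european",
    "zh-cn": "simplified-chinese",
    "zh-tw": "traditional-chinese",
    "ja": "japanese",
}
DEFAULT_GROUP = "western"


def font_group(lang: str) -> str:
    """Pick a font group for a lang attribute value.

    Tries the full tag first, then drops trailing subtags one at a time
    (e.g. "pl-PL" -> "pl"). An empty or unknown tag gets the default.
    """
    tag = lang.strip().lower()
    while tag:
        if tag in FONT_GROUP_BY_LANG:
            return FONT_GROUP_BY_LANG[tag]
        tag = tag.rpartition("-")[0]  # "" once no hyphen is left
    return DEFAULT_GROUP


if __name__ == "__main__":
    for value in ("pl", "pl-PL", "zh-CN", "zh-TW", ""):
        print(repr(value), "->", font_group(value))
```

With lang="" or no usable language information the sketch falls back to the default group, which is the situation the language markup is meant to avoid.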

        Test: http://iki.fi/hsivonen/test/lang.htm8

        --
        Henri Sivonen
        hsivonen@iki.fi

        Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html


        • Henri Sivonen

          #79
          Re: Lang attribute values

          In article <Pine.LNX.4.53.0401221912070.17353@ppepc56.ph.gla.ac.uk>,
          "Alan J. Flavell" <flavell@ph.gla.ac.uk> wrote:

          > > How do you suggest the font heuristics should work with UTF-8
          >
          > What's wrong with displaying Latin characters using the selected Latin
          > font? And so on.

          Sometimes there are different fonts for different subsets of the Latin
          Unicode repertoire.
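
A rough way to see this is to check which 8-bit repertoire covers a given word, using the repertoires as stand-ins for the coverage of separate fonts. The helper name `encodable` and the sample word are my own assumptions; the codec names are the standard Python ones for ISO-8859-1 and ISO-8859-2.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    word = "\u017c\u00f3\u0142ty"  # Polish "żółty" (yellow)
    for codec in ("latin_1", "iso8859_2"):
        print(codec, encodable(word, codec))
    # Per character: which of the two repertoires contain it?
    for ch in word:
        covered = [c for c in ("latin_1", "iso8859_2") if encodable(ch, c)]
        print(f"U+{ord(ch):04X}", covered)
```

The plain letters and the o with acute sit in both repertoires, so a renderer that only switches fonts for the characters missing from the "Western" set would mix two fonts within one Polish word.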

          --
          Henri Sivonen
          hsivonen@iki.fi

          Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html


          • Alan J. Flavell

            #80
            Re: Lang attribute values

            On Tue, 27 Jan 2004, Henri Sivonen wrote:

            > > > publishing Polish or Chinese text?

            [J.Korpela:]
            > > By using font properties in CSS.
            >
            > The usual advice from Prilop and Flavell is not to do that.

            Could I stress, though, that CJK is _not_ my field: I'm aware that
            they'd probably want to disambiguate Han unified characters. I'd have
            to take advice as to whether that's better achieved by language
            markup, font suggestions, or both, though.

            But regarding Polish, I think the bottom line is the one I gave
            before, although I added a few more words to it yesterday:

            It may be that you have the best of intentions, and you could indeed
            help some proportion of readers whose browsers are not set up
            optimally, but you also risk causing real harm to some other
            proportion of readers. Conversely, readers who are having problems
            with displaying what is otherwise a properly-made i18n document on
            their browsers, provided of course that those browsers have been set
            up well for the writing systems in question, might be advised to try
            telling the browser to ignore document-specified font selection.

            Even in two major browser/families (IE, and Mozilla+relatives), on
            Windows, there are considerations which would lead to contradictory
            choices here, as far as _font_ proposals are concerned. Other
            browsers, unknown to us, may well suffer from other shortcomings. I'm
            suggesting it would be better not to take that risk.

            [examples snipped]

            > In both cases the lang attributes do help in *practice*.

            I'm quite willing to believe that, in practice.

            As you well know, that's not the same as proposing a named font (I'm
            just stressing that point in case any reader might have got confused
            as we switched the discussion from one to another).

            > Test: http://iki.fi/hsivonen/test/lang.htm8

            Sorry, I don't have any systems conveniently at hand where that shows
            the distinction that you're aiming to prove. I'll try to remember to
            try it when I can.


            • Philip Newton

              #81
              Re: Lang attribute values

              On Mon, 26 Jan 2004 08:53:22 +0000 (UTC), "Jukka K. Korpela"
              <jkorpela@cs.tut.fi> wrote:

              > Philip Newton <pne-news-200401@newton.digitalspace.net> wrote:
              >
              > > Or somebody writing Romanian who'd prefer to have his LATIN SMALL
              > > LETTER S WITH CEDILLAs display with comma below instead, since
              > > he's not writing Turkish.
              >
              > This particular issue is somewhat different, and - not surprisingly
              > - confused in its own way. According to a statement by the Romanian
              > standards institute, Romanian uses s with comma, not s with cedilla,
              > so they see this as a character difference, not glyph difference,
              > and s with comma has been added into Unicode for this reason, with
              > quite some handwaving.

              Similarly, Polish could claim that Polish uses, say, o with kreska, not
              o with acute - yet as far as I know, Unicode treats this as a glyph
              difference.


              says, for example,

              You might have heard that the acute accent is used in Polish
              language. Wrong! The Polish kreska in a 8 point face seems similar
              to acute but if you look closer, you'll discover that a Polish
              kreska, when designed according to the requirements of Polish
              typography, is differently shaped and placed than the usual acute.

              So who can tell, really? I suppose it often boils down to the pecking
              order. (That document also talks about language-specific glyph
              substituting.)
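
The character-versus-glyph distinction can be checked against the Unicode character database, for instance with Python's standard `unicodedata` module. This is only a small sketch; the helper name `describe` is my own.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # Two separate characters: s with cedilla and s with comma below.
    print(describe("\u015F"))
    print(describe("\u0219"))
    # Only one o-with-acute character; a Polish kreska has no code
    # point of its own, so any difference has to come from the glyph.
    print(describe("\u00F3"))
    # The canonical decompositions differ in the combining mark as well.
    for ch in ("\u015F", "\u0219"):
        marks = unicodedata.normalize("NFD", ch)[1:]
        print(describe(ch), "->", [unicodedata.name(m) for m in marks])
```

So the Romanian case is settled at the character level, while the Polish case is left to fonts and, where available, language-specific glyph selection.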

              Cheers,
              Philip
              --
              Philip Newton <nospam.newton@gmx.li>
              That really is my address; no need to remove anything to reply.
              If you're not part of the solution, you're part of the precipitate.
