Lang attribute values

  • Jukka K. Korpela

    #61
    Re: Lang attribute values

    Bertilo Wennergren <bertilow@gmx.net> wrote:

    > You should be aware that "Arial Unicode MS" can be installed on
    > Linux systems, but that on many such systems it will fail to render
    > any italics. So suggesting that font might disable italics for some
    > users.

    Sounds bad. But I would classify it as a browser error, no matter what
    the actual causes are. Such a situation will create problems without my
    help too, since if someone installs the font, he probably intends to
    use it at least casually, and he can himself tell his browser to use it
    as a default font.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

    Comment

    • Jukka K. Korpela

      #62
      Re: Lang attribute values

      Philip Newton <pne-news-200401@newton.digitalspace.net> wrote:

      > Or if you want to have d-with-caron; you often can't use U+010F
      > LATIN SMALL LETTER D WITH CARON since this will typically have a
      > glyph with apostrophe after rather than caron above due to Czech
      > and Slovak typesetting habits (if I interpret the comment in the
      > Unicode standard correctly).

      If you have d with caron, then U+010F is the correct character.
      It is true that the appearance of the character usually has an
      apostrophe on the right of it rather than a caron above it, but this is
      glyph variation, which does not change the identity of a character.
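
A minimal sketch of the character-versus-glyph point, assuming Python 3 and its standard unicodedata module (neither is mentioned in the thread; the helper name describe is hypothetical). It only shows that the identity of U+010F is "d plus combining caron" in the character data, whatever shape a particular font draws for it.

```python
import unicodedata

D_CARON = "\u010f"  # the character under discussion


def describe(ch: str) -> dict:
    """Return identity data for one character, independent of any font."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # Canonical decomposition: base letter followed by combining mark(s).
        "decomposed": [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)],
    }


info = describe(D_CARON)
```

The apostrophe-like rendering is therefore a property of the font's glyph, not of the data stored in the document.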

      > But what if I'm not typesetting Czech
      > or Slovak, but a language which uses d-with-caron?

      You mean "which uses d-with-caron that should be displayed in a manner
      different from the usual one"?

      > (This is a real
      > example, though the language in question is not a natlang.)

      If it's a conlang, it'll probably lack a registered language code, so
      the use of a lang attribute would be somewhat pointless. Besides, the
      language should have been designed to use a different character, if the
      distinction is essential.

      On the practical side, setting fonts directly surely has a much
      better chance of producing the desired appearance than using
      lang="x-fictitional-martian" in the hope of encountering browsers
      that think "oh, so this is not Czech or Slovak but some unknown
      language, maybe I should find a font where the diacritic really
      looks like a caron". :-)

      --
      Yucca, http://www.cs.tut.fi/~jkorpela/
      Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

      Comment

      • Bertilo Wennergren

        #63
        Re: Lang attribute values

        Jukka K. Korpela:

        > Bertilo Wennergren <bertilow@gmx.net> wrote:

        >> You should be aware that "Arial Unicode MS" can be installed on
        >> Linux systems, but that on many such systems it will fail to render
        >> any italics. So suggesting that font might disable italics for some
        >> users.

        > Sounds bad. But I would classify it as a browser error, no matter what
        > the actual causes are.

        It's a problem with the operating system, or rather with its font
        handling. TTF fonts get italics only if there is a separate italic font
        variant.

        > Such a situation will create problems without my
        > help too, since if someone installs the font, he probably intends to
        > use it at least casually, and he can himself tell his browser to use it
        > as a default font.

        He might have it around for some special uses, intending it to be used
        only when he himself chooses it. That's the case for me. I have
        obviously not chosen it as a font to be used by default for any encoding
        in my browser.

        --
        Bertilo Wennergren <bertilow@gmx.net> <http://www.bertilow.com>

        Comment

        • Philip Newton

          #64
          Re: Lang attribute values

          On Sun, 25 Jan 2004 20:35:27 +0000 (UTC), "Jukka K. Korpela"
          <jkorpela@cs.tut.fi> wrote:

          > Philip Newton <pne-news-200401@newton.digitalspace.net> wrote:
          >
          > > Or if you want to have d-with-caron; you often can't use U+010F
          > > LATIN SMALL LETTER D WITH CARON since this will typically have a
          > > glyph with apostrophe after rather than caron above due to Czech
          > > and Slovak typesetting habits (if I interpret the comment in the
          > > Unicode standard correctly).
          >
          > If you have d with caron, then U+010F is the correct character.

          Indeed.

          > It is true that the appearance of the character usually has an
          > apostrophe on the right of it rather than a caron above it, but this
          > is glyph variation, which does not change the identity of a
          > character.

          True.

          I was reacting to Henri Sivonen saying

          : I can't find a politically correct way of saying this, but there's
          : are pecking orders of language groups within scripts in terms of
          : font availability and quality. It's unfortunate.
          :
          : For example Polish looks ugly if some glyphs come from a "Western"
          : font and others come from a "Central European" font.

          to point out that Czech and Slovak appear to be higher on the pecking
          order here as well, and tend to impose their typographical preferences
          on others (not directly, but by choice of those who design the fonts).

          Similarly to how somebody writing Polish and who'd prefer a more acutely
          sloped accent on his ó will have difficulties due to other languages'
          preferences.

          Or somebody writing Romanian who'd prefer to have his LATIN SMALL LETTER
          S WITH CEDILLAs display with comma below instead, since he's not writing
          Turkish. (Some fonts I have display s with cedilla but t with comma
          below, which probably looks extra weird in Romanian: I can imagine it'd
          be better or more consistent to have both characters appear similar [if
          wrong] than to have them appear different.)

          > > But what if I'm not typesetting Czech or Slovak, but a language
          > > which uses d-with-caron?
          >
          > You mean "which uses d-with-caron that should be displayed in a
          > manner different from the usual one"?

          I mean "which uses d-with-caron that should be displayed in a manner
          different from the Czech and Slovak one". Using "usual" rather depends
          on the context.

          But I suppose that's quibbling. So yes, I suppose I agree. Yes, a
          language which uses d-with-caron that can be displayed as d-with-caron
          or d-with-circumflex (e.g. in some handwriting styles), but not as
          d-with-apostrophe.

          > Besides, the language should have been designed to use a different
          > character, if the distinction is essential.

          Hm? d-with-caron is the correct character. I'm saying that it's
          difficult to get a font showing an appropriate glyph due to pecking
          order constraints that determine which language decides what "the"
          reference glyph looks like. But the character is unambiguously LATIN
          SMALL LETTER D WITH CARON, alongside several other letters with caron
          which display correctly (e.g. C, R, or S).

          Cheers,
          Philip
          --
          Philip Newton <nospam.newton@gmx.li>
          That really is my address; no need to remove anything to reply.
          If you're not part of the solution, you're part of the precipitate.

          Comment

          • Jukka K. Korpela

            #65
            Re: Lang attribute values

            Philip Newton <pne-news-200401@newton.digitalspace.net> wrote:

            > Similarly to how somebody writing Polish and who'd prefer a more
            > acutely sloped accent on his ó will have difficulties due to other
            > languages' preferences.

            I think we agree on the principle that language information could be
            relevant to optimal selection of fonts - and this is among the
            officially listed benefits of lang markup. But I see this as rather
            marginal, mostly on practical grounds, since there's very little in the
            direction of supporting this idea, and browsers' attempts at using lang
            markup in font selection are basically wrong.

            By the way, we can't really blame browser vendors too much. How many
            people actually use lang markup? How many do it _right_? (I'm afraid
            there are page editors that routinely add lang="en" without telling
            their user or anyone else.) Besides, the specifications are vague.
            And at the top of the foolishness, HTML 4 has lang, XHTML 1 adds
            xml:lang, so maybe we should use both, except that lang appears to be
            getting deprecated. Yet, if any software actually utilizes language
            markup, I would expect it to know lang more probably than xml:lang.

            > Or somebody writing Romanian who'd prefer to have his LATIN SMALL
            > LETTER S WITH CEDILLAs display with comma below instead, since he's
            > not writing Turkish.

            This particular issue is somewhat different, and - not surprisingly -
            confused in its own way. According to a statement by the Romanian
            standards institute, Romanian uses s with comma, not s with cedilla, so
            they see this as a character difference, not glyph difference, and
            s with comma has been added into Unicode for this reason, with quite
            some handwaving.
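
A minimal sketch of what the character-level distinction means in practice, assuming Python 3 (the table and the function name to_comma_below are hypothetical, not anything proposed in the thread). It maps the cedilla forms to the separately encoded comma-below forms, which would only be appropriate for text already known to be Romanian, since Turkish really does use s with cedilla.

```python
import unicodedata

# Cedilla forms mapped to the separately encoded comma-below forms.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # capital S with cedilla -> capital S with comma below
    "\u015f": "\u0219",  # small s with cedilla   -> small s with comma below
    "\u0162": "\u021a",  # capital T with cedilla -> capital T with comma below
    "\u0163": "\u021b",  # small t with cedilla   -> small t with comma below
})


def to_comma_below(text: str) -> str:
    """Replace cedilla s/t with comma-below s/t; all other characters pass through."""
    return text.translate(CEDILLA_TO_COMMA)


converted = to_comma_below("\u015fi \u0163ar\u0103")
```

Unlike the d-with-caron case, no font choice is involved here: the two forms are different code points, so the fix happens in the text itself.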

            --
            Yucca, http://www.cs.tut.fi/~jkorpela/
            Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

            Comment

            • Neal

              #66
              Re: Lang attribute values

              On Mon, 26 Jan 2004 08:53:22 +0000 (UTC), Jukka K. Korpela
              <jkorpela@cs.tut.fi> wrote:

              > And at the top of the foolishness, HTML 4 has lang, XHTML 1 adds
              > xml:lang, so maybe we should use both, except that lang appears to be
              > getting deprecated.

              The XHTML spec says to use both, and xml:lang takes precedence. But I
              can't see why they changed the attribute.

              Is there a difference in the syntax of lang and xml:lang? Is there a
              reason lang could not also be used in XHTML?

              Comment

              • Jukka K. Korpela

                #67
                Re: Lang attribute values

                Neal <neal413@spamrcn.com> wrote:

                >> And at the top of the foolishness, HTML 4 has lang, XHTML 1 adds
                >> xml:lang, so maybe we should use both, except that lang appears to
                >> be getting deprecated.
                >
                > The XHTML spec says to use both, and xml:lang takes precedence.

                The XHTML 1.0 spec says so, but XHTML 1.1 has removed lang. On the
                other hand, XHTML 1.0 is mostly an exercise in futility, and XHTML 1.1
                is at least 1.1 times that. But the XHTML 2.0 draft, too, has xml:lang
                only.

                > But I can't see why they changed the attribute.

                To make the world safe for XML. Someone invented the idea that many XML
                based markup systems should have an attribute for specifying the
                language, so they defined xml:lang. Don't ask me why it needs to be
                prefixed. If they wanted to make it a reserved attribute, so that no
                XML based system should ever define a lang attribute except for a
                particular purpose with a particular syntax and meaning, they could
                have said that. But they were lost in namespace and couldn't say it
                without invoking "namespaces".

                > Is there a difference in the syntax of lang and xml:lang?

                No.

                > Is there a reason lang could not also be used in XHTML?

                Well it _can_ be used in XHTML. There is no formal prohibition in XML
                against using any attribute name you like for language information, but
                if you read between the lines,

                effectively tells that if you have an attribute for language, you had
                better use xml:lang. There's no particular reason for the XML spec to
                contain that part otherwise, since it does _not_ automatically make
                xml:lang part of XML itself. It says: "In valid documents, this
                attribute, like any other, must be declared if it is used."
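
A minimal sketch of the "use both, xml:lang wins" rule discussed above, assuming Python 3 and its standard xml.etree.ElementTree parser (the function name effective_lang is hypothetical). It looks at a single element only and does not implement inheritance of the language from ancestor elements.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix is always bound to this namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(elem):
    """Language declared on one element: xml:lang takes precedence over lang."""
    return elem.get(XML_LANG) or elem.get("lang")


both = ET.fromstring('<p lang="en" xml:lang="fi">Hei</p>')
only_lang = ET.fromstring('<p lang="en">Hello</p>')
neither = ET.fromstring("<p>?</p>")
```

The parser exposes xml:lang under the namespace-qualified key even though the document never declares the prefix, which is the "reserved attribute" behaviour the post describes.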

                --
                Yucca, http://www.cs.tut.fi/~jkorpela/
                Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

                Comment

                • Andreas Prilop

                  #68
                  Re: Lang attribute values

                  Philip Newton <pne-news-200401@newton.digitalspace.net> wrote:

                  > Is "thuluth" what is sometimes called "sülüs"?

                  Zat's ze Turkish vay of spelling Arabic vords. ;-)

                  Comment

                  • Andreas Prilop

                    #69
                    Re: Lang attribute values

                    "Alan J. Flavell" <flavell@ph.gla .ac.uk> wrote:

                    >> The language attribute in HTML also has an influence: some examples
                    >> are shown on my page.
                    >
                    > Please accept my apologies on this particular point. I now realise I
                    > was misremembering _that_ specific behaviour: it was in fact seen in
                    > Mozilla, not MSIE.

                    To compensate for this short-coming, I can offer you a dependency
                    of Internet Explorer on the DIR attribute. ;-)
                    <http://www.unics.uni-hannover.de/nhtcapri/temp/percent.html>
                    <http://www.unics.uni-hannover.de/nhtcapri/temp/percent.html6>
                    I have not yet fully understood this bug. Can you reproduce it?
                    You need to define some typeface without Arabic glyphs (such as
                    Verdana) as "Latin-preferred typeface". Then IE fails to display
                    the Arabic percent sign in some instances.

                    Comment

                    • Andreas Prilop

                      #70
                      Re: Lang attribute values

                      "Jukka K. Korpela" <jkorpela@cs.tu t.fi> wrote:

                      > Does someone really think that a new version of the German language has
                      > been or is being created by the orthography reform that was officially
                      > started in 1998?

                      The differences are negligible compared with the differences
                      between en-GB-Oxford and en-US-Usenet. ;-)

                      Comment

                      • Andreas Prilop

                        #71
                        Re: Lang attribute values

                        "Jukka K. Korpela" <jkorpela@cs.tu t.fi> wrote:

                        >> You should be aware that "Arial Unicode MS" can be installed on
                        >> Linux systems, but that on many such systems it will fail to render
                        >> any italics. So suggesting that font might disable italics for some
                        >> users.
                        >
                        > Sounds bad. But I would classify it as a browser error, no matter what
                        > the actual causes are.

                        That is subject to debate. Many people consider it an error of a
                        word-processing/layout/drawing program when it fakes an italic style
                        where no true italic font is available. Some (Mac & Windows) programs
                        actually don't let you choose "bold" or "italic" if the typeface has
                        no bold or italic font.

                        Comment

                        • Andreas Prilop

                          #72
                          Re: Lang attribute values

                          "Jukka K. Korpela" <jkorpela@cs.tu t.fi> wrote:

                          > (I'm afraid
                          > there are page editors that routinely add lang="en" without telling
                          > their user or anyone else.)

                          MS Word and other Microsoft programs do this on the basis of the
                          current keyboard layout. When I type English text using a German
                          keyboard layout, MS Word includes "\lang1031", i.e. language=German.
                          (You can check this by saving your documents in RTF.)
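
A minimal sketch of that check, assuming Python 3 (the lookup table, the function name rtf_languages and the sample string are hypothetical illustrations, and the table is a tiny subset of the Windows language identifiers). It pulls the numeric argument out of each \lang control word in RTF source and maps it to a language tag.

```python
import re

# Illustrative subset of Windows language identifiers (LCIDs).
LCID_NAMES = {1031: "de-DE", 1033: "en-US", 2057: "en-GB"}

# Matches the control word \lang followed directly by its numeric argument.
LANG_CONTROL = re.compile(r"\\lang(\d+)")


def rtf_languages(rtf: str) -> list:
    """Return the language tags of all \\langN control words, in document order."""
    return [LCID_NAMES.get(int(n), f"LCID {n}") for n in LANG_CONTROL.findall(rtf)]


sample = r"{\rtf1\ansi\deflang1033 {\lang1031 Hello world}}"
found = rtf_languages(sample)
```

In the sample, the English words are tagged as German, which is the mismatch described above; \deflang is a different control word and is deliberately not matched.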

                          Comment

                          • Alan J. Flavell

                            #73
                            Re: Lang attribute values

                            On Mon, 26 Jan 2004, Andreas Prilop wrote:

                            > To compensate for this short-coming, I can offer you a dependency
                            > of Internet Explorer on the DIR attribute. ;-)
                            > <http://www.unics.uni-hannover.de/nhtcapri/temp/percent.html>
                            > <http://www.unics.uni-hannover.de/nhtcapri/temp/percent.html6>
                            > I have not yet fully understood this bug. Can you reproduce it?

                            It seems so. To start the test, I picked, as Latin font, the first
                            font on the alphabetical list that I had on this win2k system, which
                            happens to be "Albertus Extra Bold", whose properties are reported to
                            be:

                            Supported Unicode Ranges:
                            ^ (yup, this area is completely empty!)

                            Supported code pages:

                            1252 Latin 1
                            1250 Latin 2:East Europe
                            1254 Turkish
                            1257 Windows Baltic.

                            > You need to define some typeface without Arabic glyphs (such as
                            > Verdana) as "Latin-preferred typeface".

                            I guess that meets your criteria. No Arabic unicode ranges nor "code
                            pages".

                            > Then IE fails to display the Arabic percent sign in some instances.

                            Indeed: I'm looking at the utf-8 version...

                            On the left-aligned lines, it's shown as an empty box on
                            line 3. On the right-aligned lines, it's shown as an empty box on
                            lines 2 and 3.

                            How odd. Mine is IE6 version 6.0.2800.1106 with SP1 and a couple of
                            Q-numbers, for the record.

                            On the 8859-6 version, on the other hand, they all show as
                            (Arabic-looking) percent signs.

                            OK then, some other font choices:

                            * Verdana: same results as above.

                            * Lucida Sans Unicode: same (it doesn't have Arabic)

                            * Palatino Linotype: WOOPS!!! Instead of empty boxes, it displays
                            fleurs-de-lys in place of the missing percent signs!!!


                            And then fonts which contain Arabic:

                            * Arial Unicode MS: all percent signs are shown (you knew that!).

                            * Code2000: same.


                            Btw, just for the record, let's see what the other fonts were at
                            the time, working down the registry list in numerical order till
                            we get to Arabic:

                            Greek: Arial Unicode MS
                            Cyrillic: ditto
                            Armenian: Code2000
                            Hebrew: Lucida Sans Unicode

                            Nothing deliberate - just the relics of earlier tests.

                            Comment

                            • Andreas Prilop

                              #74
                              Re: Lang attribute values

                              Mad Bad Rabbit <madbadrabbit@yahoo.com> wrote:

                              >> body { font-family: "Arial Unicode MS"; }
                              >
                              > Wouldn't it be safer to leave <body> alone,

                              Yes!

                              > and only suggest
                              > an alternate font-family for parts of the document known to
                              > contain the problematic characters?

                              Perhaps.
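
A minimal sketch of marking up only the problematic parts, assuming Python 3 (the function name mark_greek is hypothetical, and the class name "polytonic" is borrowed from the example quoted in this post). It wraps each run of characters from the Greek and Greek Extended blocks in a span, so that a stylesheet can suggest a font for those runs alone.

```python
import html
import re

# Runs of characters from Greek and Coptic (U+0370..U+03FF)
# or Greek Extended (U+1F00..U+1FFF).
GREEK_RUN = re.compile(r"[\u0370-\u03FF\u1F00-\u1FFF]+")


def mark_greek(text: str, css_class: str = "polytonic") -> str:
    """Escape text for HTML, then wrap each run of Greek characters in a span."""
    escaped = html.escape(text)
    return GREEK_RUN.sub(
        lambda m: f'<span class="{css_class}">{m.group(0)}</span>', escaped
    )


marked = mark_greek("In the beginning: \u1f10\u03bd \u1f00\u03c1\u03c7\u1fc7")
```

Only precomposed letters are matched; text that spells accents with combining marks would need the pattern extended.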

                              > For example, if I'm composing a Bible-study page that has a
                              > few scattered Greek words, oughtn't it just use:
                              > span.polytonic { font-family: "Palatino Linotype" }

                              I used to object to _any_ typeface specification in HTML, thus following
                              <http://ppewww.ph.gla.ac.uk/~flavell/charset/browsers-fonts.html#dont>
                              But now I've done it myself for the poor souls using Internet Explorer.
                              <http://www.unics.uni-hannover.de/nhtcapri/urdu-alphabet.html>

                              Comment

                              • Jukka K. Korpela

                                #75
                                Re: Lang attribute values

                                Andreas Prilop <nhtcapri@rrzn-user.uni-hannover.de> wrote:

                                >>> You should be aware that "Arial Unicode MS" can be installed on
                                >>> Linux systems, but that on many such systems it will fail to
                                >>> render any italics. So suggesting that font might disable italics
                                >>> for some users.
                                >>
                                >> Sounds bad. But I would classify it as a browser error, no matter
                                >> what the actual causes are.
                                >
                                > That is subject to debate. Many people consider it an error of a
                                > word-processing/layout/drawing program when it fakes an italic
                                > style where no true italic font is available.

                                It's theoretically subject to debate, now that HTML 2.0 is just
                                history. The good old spec _required_ that browsers render <em> and
                                <strong> as distinct from each other and from normal text. But that's
                                still pretty much the idea, is it not? So if a browser simply decides
                                not to italicize or slant text in <em> because the font in use is
                                e.g. Arial Unicode MS, then it's its responsibility to figure out
                                something else to make the difference.

                                --
                                Yucca, http://www.cs.tut.fi/~jkorpela/
                                Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

                                Comment
