Arbitrary definition of class names by user agents

  • Steven Simpson

    Arbitrary definition of class names by user agents

    Stefan Ram wrote (in "More than one language in a page"):
    > In this case, one might even use Google's new attribute value:
    >
    > <p lang="en">The word
    > <q><span lang="fr" class="notranslate">chef</span></q>
    > is of French origin.</p>
    >
    > See
    >
    > http://googlewebmastercentral.blogsp...ge-barrier.htm

    Is this a new trend of user-agent writers (Microformats, and now Google)
    staking claims on the @class namespace? I'm surely not the only one
    disturbed by this. Somehow, an author publishing on the web, with no
    control over which user agents will access his page, has to avoid
    clashes with the union of all names deemed special by all those user
    agents, now and in the future?

    I suppose the proponents justify this practice by a line in the HTML
    spec (HTML4.01 §7.5.2), that class names are also for "general purpose
    processing by user agents" as well as stylesheet selectors. It doesn't
    go into any further detail, but I don't think it was the intention that
    applications which the author has no control over (e.g. once a page is
    published) should define class names willy-nilly. More likely, the
    author would have opted in to some scheme, such as a company's internal
    robot to do some advanced indexing on all its own pages.

    Here are some ideas for external interpretation, i.e. by some 'third
    party' such as Google:

    * Opt in to a third party's scheme. Register one's URIs with Google,
    so they know that 'notranslate' means what they think on those
    pages. I don't fancy doing that with a lot of third parties, though.
    * Third parties register class names with an authority (e.g. W3C).
    But still, authors have to watch out for future uses of names.
    And third parties shouldn't have to register with W3C when they've
    already registered (for example) DNS names.
    * Define a sub-namespace not used by CSS to form DNS-like names,
    e.g. ':com:google:notranslate'. Okay, but potentially verbose if
    used a lot. And it doesn't generally sidestep non-CSS mechanisms
    of defining class names.
    * Use head/@profile with a URI owned by the third party. This is
    what Microformats seem to be doing, but I don't think it is
    adequate. Independent microformats used in the same page still
    have to avoid clashing with each other, which means going back to
    some authority's third-party register. Plus, the author doesn't
    have control over the class names - it's all or nothing for a
    particular format.
    * Extend CSS with properties not related to style. There's nothing
    in the framework of CSS that limits it to just style (right?). I
    favour this, and shall elaborate on it...

    Google could define a CSS property which turns translation on or off,
    and the author could associate any class he chooses (indeed, any CSS
    selector) with that property:

    .notranslate { /* Okay, so he chose the same one after all! ;-) */
      -google-translation: disable;
    }

    Then, to avoid Google having to scan his stylesheets just to find this
    rule, the author links it in with:

    <link rel="stylesheet" media="translator" href="...">

    Other user agents won't touch it, because they don't recognise
    "translator". Google won't touch other stylesheets because they're not
    labelled with "translator".
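
For illustration only, here is a rough Python sketch of the agent side of this idea: pick out the stylesheets an author has labelled for a translator and ignore the rest. The "translator" media value is the hypothetical one proposed above, not a registered media type, and the class and function names (`TranslatorSheetFinder`, `translator_sheets`) are made up for this sketch.

```python
from html.parser import HTMLParser


class TranslatorSheetFinder(HTMLParser):
    """Collect hrefs of <link rel="stylesheet"> elements labelled for a given medium."""

    def __init__(self, medium="translator"):  # "translator" is a hypothetical media value
        super().__init__()
        self.medium = medium
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # rel is a space-separated token list; media is a comma-separated list.
        rels = (a.get("rel") or "").lower().split()
        media = [m.strip().lower() for m in (a.get("media") or "").split(",")]
        if "stylesheet" in rels and self.medium in media and a.get("href"):
            self.hrefs.append(a["href"])


def translator_sheets(html):
    """Return the hrefs of stylesheets whose media list names the translator medium."""
    finder = TranslatorSheetFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs
```

A stylesheet labelled media="screen" is skipped, so the agent only ever fetches what the author explicitly addressed to it. Interpreting the rules inside that sheet (selectors, cascade) is not covered here.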

    A few issues raised by this approach are:

    * It's not style/presentation, which is what CSS was designed for.
    But I think this is a superficial problem - just regard the name
    "CSS" and rel="stylesheet" as historical accidents, and CSS
    becomes an application of arbitrary properties, that happens to
    include ones related to style.
    * It's now invading the CSS-property and media-type namespaces. But
    both of these could go the same way as XML namespaces and
    link/@rel schemas, if necessary.

    To summarise: Rather than user agents stomping over the heretofore
    author-defined namespace of class names, they should fit into it in the
    same way that CSS properties do. This would scale better, and would be
    less intrusive on the author's ability to choose.
  • Jukka K. Korpela

    #2
    Re: Arbitrary definition of class names by user agents

    Steven Simpson wrote:
    > Is this a new trend of user-agent writers (Microformats, and now
    > Google) staking claims on the @class namespace?

    It surely is, and all the warnings seem to get ignored. The idea of
    assigning fixed meanings to class names sounds _so_ cool and useful, and you
    don't need anybody's permission or time-wasting discussions!

    And it probably looks obvious that "notranslate" won't accidentally be used
    for something else by someone else, so it looks safe to define it as you
    like. It might be different with shorter and more vague class names like
    "date" - does it refer to date notations, or dating, or something else? You
    cannot possibly know what the string "date" might intuitively mean to
    billions of people speaking hundreds of different languages. So by
    declaring, say, "date" as predefined, you would assign arbitrary meanings to
    an unknown number of constructs in documents, meanings that need not have
    anything to do with the intentions of their authors.

    In fact, "notranslate" is potentially very risky too. It is true that in any
    existing document, it probably relates to someone's intentions of not having
    something translated. But it might also mean that something _has not_ been
    translated. Or it might mean 'do not translate (the content)' in a very
    specific and limited technical meaning, _not_ a universal declaration that
    the content should not be translated. For example, in some bilingual site
    maintenance approach, it might be an instruction to human translators to
    leave the content untranslated, since it shall be the same in both
    languages - without meaning that it should be the same in _all_ languages.

    The only sensible approach in using class attributes for purposes like
    "notranslate" in the Google technique would have been to use a class name
    that is syntactically malformed by existing specifications. That way, no
    legitimate existing usage of the string as class attribute would have been
    affected.

    Even better, a new attribute (or element) should have been introduced.

    Someone might say that from the viewpoint of generalized markup, a
    processing instruction might have been the most adequate approach. But
    generalized markup is water under the bridge, and we live with tag sets that
    everyone can use as he likes and sees fit.

    And on the realistic side, translation instructions should not really be
    merged into markup. They are process-oriented, not data-oriented or
    structure-oriented. You typically have words or phrases that should not be
    translated, and would you really like to be forced to add
    non-translatability markup into each and every occurrence in each document,
    instead of having e.g. a site-wide glossary of terms that specifies them,
    among other things?

    Besides, the most common case for non-translatability that I can imagine
    right now is English words and phrases in non-English text. For them, common
    sense might say that it should suffice to declare their language as English.
    When translating, say, some text from Dutch to French, you are normally not
    supposed to translate any English words and phrases in them. If they are OK
    in the original, they're usually the right choice in the translation as
    well. So the only thing needed would be language markup.

    > Google could define a CSS property which turns translation on or off,

    That would be even more wrong than using "predefined" class names, since
    translation issues are not presentational in the sense that CSS is supposed
    to be.

    > * It's not style/presentation, which is what CSS was designed for.
    > But I think this is a superficial problem - just regard the name
    > "CSS" and rel="stylesheet" as historical accidents, and CSS
    > becomes an application of arbitrary properties, that happens to
    > include ones related to style.

    Excuse me while I fall into despair.

    > To summarise: Rather than user agents stomping over the heretofore
    > author-defined namespace of class names, they should fit into it in
    > the same way that CSS properties do.

    I cannot recognize parody any more, sorry.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/

    • Ben Bacarisse

      #3
      Re: Arbitrary definition of class names by user agents

      "Jukka K. Korpela" <jkorpela@cs.tut.fi> writes:
      > Steven Simpson wrote:
      >
      >> Is this a new trend of user-agent writers (Microformats, and now
      >> Google) staking claims on the @class namespace?

      <snip>

      > In fact, "notranslate" is potentially very risky too. It is true that
      > in any existing document, it probably relates to someone's intentions
      > of not having something translated. But it might also mean that
      > something _has not_ been translated. Or it might mean 'do not
      > translate (the content)' in a very specific and limited technical
      > meaning, _not_ a universal declaration that the content should not be
      > translated. For example, in some bilingual site maintenance approach,
      > it might be an instruction to human translators to leave the content
      > untranslated, since it shall be the same in both languages - without
      > meaning that it should be the same in _all_ languages.

      Agreed. It could also relate to the other meaning of "translate" --
      the geometric one. A paragraph which is to be left in its normal
      position, not translated in any direction, might well be marked
      "notranslate".

      --
      Ben.

      • Steven Simpson

        #4
        Re: Arbitrary definition of class names by user agents

        Jukka K. Korpela wrote:
        > Steven Simpson wrote:
        >> Google could define a CSS property which turns translation on or off,
        >
        > That would be even more wrong than using "predefined" class names,
        > since translation issues are not presentational in the sense that CSS
        > is supposed to be.
        >
        >> * It's not style/presentation, which is what CSS was designed for.
        >> But I think this is a superficial problem - just regard the name
        >> "CSS" and rel="stylesheet" as historical accidents, and CSS
        >> becomes an application of arbitrary properties, that happens to
        >> include ones related to style.
        >
        > Excuse me while I fall into despair.

        What's wrong? I'm not suggesting that we abandon the distinction
        between content and presentation, merely recognising that only two
        things constrain CSS technically to presentation:

        * the set of properties defined by various specs,
        * the media type/query filter,

        ...and by extending these together, you get a framework still capable of
        separating presentation from content, but also capable of separating
        other kinds of (erm) "interpretation" from content.

        Looking at it another way, if you wanted to devise a framework for the
        latter separation, you could easily come up with one identical to that
        used for the former, except that:

        * the file format's property set would differ from CSS's,
        * you'd have a different set of @media,
        * you wouldn't call the format CSS,
        * your @rel type wouldn't mention 'style'.

        It would be technically sufficient to continue using @rel="stylesheet",
        and rely on @media to distinguish between presentation and 'other kinds
        of interpretation'. But if that really is a problem, just use
        @rel="propertysheet".

        • Harlan Messinger

          #5
          Re: Arbitrary definition of class names by user agents

          Jukka K. Korpela wrote:
          > Steven Simpson wrote:
          >
          >> Is this a new trend of user-agent writers (Microformats, and now
          >> Google) staking claims on the @class namespace?
          >
          > It surely is, and all the warnings seem to get ignored. The idea of
          > assigning fixed meanings to class names sounds _so_ cool and useful, and
          > you don't need anybody's permission or time-wasting discussions!
          >
          > And it probably looks obvious that "notranslate" won't accidentally be
          > used for something else by someone else, so it looks safe to define it
          > as you like. It might be different with shorter and more vague class
          > names like "date" - does it refer to date notations, or dating, or
          > something else? You cannot possibly know what the string "date" might
          > intuitively mean to billions of people speaking hundreds of different
          > languages. So by declaring, say, "date" as predefined, you would assign
          > arbitrary meanings to an unknown number of constructs in documents,
          > meanings that need not have anything to do with the intentions of their
          > authors.
          >
          > In fact, "notranslate" is potentially very risky too. It is true that in
          > any existing document, it probably relates to someone's intentions of
          > not having something translated. But it might also mean that something
          > _has not_ been translated. Or it might mean 'do not translate (the
          > content)' in a very specific and limited technical meaning, _not_ a
          > universal declaration that the content should not be translated. For
          > example, in some bilingual site maintenance approach, it might be an
          > instruction to human translators to leave the content untranslated,
          > since it shall be the same in both languages - without meaning that it
          > should be the same in _all_ languages.
          >
          > The only sensible approach in using class attributes for purposes like
          > "notranslate" in the Google technique would have been to use a class
          > name that is syntactically malformed by existing specifications. That
          > way, no legitimate existing usage of the string as class attribute would
          > have been affected.

          If Google had specified class="google:notranslate" in place of
          class="notranslate", then despite the lack of any intrinsic significance
          of an "x:" prefix in class names, it would have gone a long way toward
          eliminating potential conflict.
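
As a small sketch of what a prefixed convention might look like on the agent side (Python, with a made-up helper name `has_vendor_class`; nothing here reflects what Google actually implements), the agent would react only to the exact prefixed token and leave a bare "notranslate" to the author:

```python
def has_vendor_class(class_attr, vendor, name):
    """True if the space-separated class attribute contains the token 'vendor:name'."""
    return f"{vendor}:{name}" in class_attr.split()
```

With this check, class="quote google:notranslate" opts in, while class="notranslate" on its own is left with whatever meaning the author gave it.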
