non-breaking hyphen

**Jukka K. Korpela** · Jul 23 '05, 10:41 PM

Re: non-breaking hyphen

Lachlan Hunt <spam.my.gspot@gmail.com> wrote:

>> <nobr>a/b</nobr> says that a/b is a unit of information where all
>> characters belong together.
>
> That sounds like you're just trying to apply semantics to an element
> that is defined as purely presentational.

That's because you have already decided so. Think about
rmdir /foo
versus
rmdir / foo
Is the difference purely presentational? That's what the Unicode
consortium thinks, when it allows the first expression to be divided as
rmdir /
foo

>> It's surely _more_ semantic than the W3C approach which moves us
>> down to the character level.
>
> It depends. Some situations may be more appropriately marked up
> using elements, and others may be better left at the character level.

Moving it to character level means that presentational features have been
wired into the document's textual content. Isn't this worse than
wiring it into markup around the content? Things may change, of
course, if we regard line breaking issues as potentially belonging to
logical structure or semantics.

> There are also a huge number of situations where I might want bold
> text.

Not really. Ignoring headings, table cells and things like that and
considering inline emphasis only, the odds are that the reason for
bolding text is strong emphasis. Whether this is too coarse a concept is
an interesting question, but it corresponds to the <strong> element.
Except for a small number of special cases, <b> is just the vulgar way of
writing <strong> (and the original designers of HTML should be blamed for
this - _they_ decided to make the logical alternative's name five times
as long as the physical alternative's name).

> I find the news: URIs more useful since clicking on one will
> automatically launch my newsreader for me

It won't launch any newsreader unless the browser has been configured to
use one - and this is normally _not_ handled in any default settings.

>>> In this case, <code> seems most appropriate.
>>
>> Is the name computer code? I think it's a borderline case, and I
>> think you are just interpreting the semantics of <code> very freely
>
> Yes, it was a very loose interpretation, however <code> is very
> loosely defined in the spec.

We agree on that, though maybe for different values of "very". But the
reason for your choosing it was that you felt that you _needed_ _any_
element that you can regard as logical. That is, in an attempt to avoid
 and , you would have picked up virtually anything, even an
element that you wouldn't have dreamt of otherwise.

But it's not necessarily a bad choice.

> It just states that it is a fragment of
> computer code, and I interpreted that very loosely as content that
> can be processed in some meaningful way by a computer.

That would mean that anything is <code>, wouldn't it? Surely you can feed
any text into a computer and process it in some meaningful way.

But a newsgroup name could be marked up as <code> because it is "computer
code" in the sense of having been _defined_ separately for use as input
to computer software, as an identifier of a group. This becomes more
obvious, perhaps, if you think how newsgroup names often have to be
distorted from the natural language expressions that they have been
derived from, e.g. by dropping accents away.

On the practical side, some automatic translation software (BabelFish)
treats text inside <code> as a literal string that remains invariant in
translation. And this is very natural and very desirable, since if we
have, say, some text about Unix, mentioning the <code>cat</code> command,
then we don't want that "cat" to become "chat" when translating into
French.

>> in order to avoid the inevitable conclusion: in the great majority
>> of cases, the real alternative to <b> is <span>, which by
>> definition lacks _all_ semantics.
>
> As does <b>, so in a sense you are correct.

No it doesn't. Even if you regard <b> as purely presentational,
marking something with <b> says _more_ than marking it with <span>.
Just as says more than . The
former says, loosely speaking, 'here we have an element with undefined
meaning, but the preferred visual rendering is bold'. It does not say
what the meaning is, but it may give a hint.

> Classes can be used to give author defined semantics, even to
> semantically empty elements.

What author defined semantics? The class name has no meaning; it is
simply a string. The author may have something in his mind, and someone
reading the source code might get a hint if he happens to know the
natural language from which the name had been taken. But this is
different from the hint given by <strong> (or by <b>, even if you regard it
as presentational only), as defined by the _markup language_.
Would you understand the author defined semantics of
class="lauseke" or class="korostus"?

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

**Henri Sivonen** · Jul 23 '05, 10:41 PM

Re: non-breaking hyphen

In article <Xns95A0E6BA54379jkorpelacstutfi@193.229.0.31>,
"Jukka K. Korpela" <jkorpela@cs.tut.fi> wrote:

> Similar considerations might even apply to
> the use of &nbsp; - which is universally supported by browsers but which
> still might cause problems in cut & paste operations for example, since
> it is by definition a character distinct from the space character.
>
> Using <nobr> avoids the problem.

If having a non-breaking space is important, isn't it important to copy
it as well?

--
Henri Sivonen
hsivonen@iki.fi

Henri Sivonen's pages

http://iki.fi/hsivonen/

Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html

**David Ross** · Jul 23 '05, 10:41 PM

Re: non-breaking hyphen

The Bicycling Guitarist wrote:
>
> Hi. I found the following when trying to learn if there is such a thing as a
> non-breaking hyphen. Apparently Unicode has a &#8209; (U+2011) but that is not
> well-supported, especially in older browsers. Somebody somewhere said:
>
> Alternately, you can use CSS to declare a class having:
>
> .nowrap { white-space:nowrap }
>
> ... and then wrap the compound word in a <span class="nowrap"> tag (or
> any other suitable inline tag). You can also try { white-space:pre } ...
>
> I wasn't sure where to post this, because part of the question is about the
> character entity that apparently is NOT defined in html? However, what about
> the CSS idea for non-wrapping? On one of my pages
> www.TheBicyclingGuitarist.net/newstuff.htm I give credit to some folks at
> comp.infosystems.www.authoring.site-design. I want the hyphen in between
> site and design to be a non-breaking one.

I don't understand. In Mozilla, hyphens (&minus;, ISO 8859-1
&#45;, 0x2D) are non-breaking.

--

David E. Ross
<http://www.rossde.com/>

I use Mozilla as my Web browser because I want a browser that
complies with Web standards. See <http://www.mozilla.org/>.

**Jukka K. Korpela** · Jul 23 '05, 10:41 PM

Re: non-breaking hyphen

David Ross <nobody@nowhere.not> wrote:

> I don't understand. In Mozilla, hyphens (&minus;, ISO 8859-1
> &#45;, 0x2D) are non-breaking.

The entity reference &minus; denotes the minus sign, which is not a
hyphen at all.

ISO 8859-1 is irrelevant here.

The character reference &#45; denotes the hyphen-minus character
(ASCII hyphen), and 0x2D is a common way of mentioning its hexadecimal
code in several standards.
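
The distinction between these characters can be checked mechanically. What follows is only a minimal sketch, assuming Python's standard `html` and `unicodedata` modules; the helper name `describe` is made up for illustration and is not part of any library.

```python
import html
import unicodedata

# Character references as they might appear in HTML source.
references = ["&minus;", "&#45;", "&#8209;"]

def describe(reference: str) -> tuple[str, str]:
    """Resolve one HTML character reference to (code point, Unicode name)."""
    char = html.unescape(reference)
    return f"U+{ord(char):04X}", unicodedata.name(char)

for ref in references:
    code_point, name = describe(ref)
    print(ref, code_point, name)
```

The names reported come from the Unicode character database bundled with the Python build in use, so this shows only what the characters are called, not how any browser breaks lines around them.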

The Unicode line breaking rules allow a line break after a hyphen-minus
character, and IE (and Opera) applies this principle. The problem we are
discussing is that such breaks are often undesirable.

The rules don't imply that a program _must_ break a line after a
hyphen-minus character on any particular occasion. But IE (and Opera)
rather mechanically break after it.
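
For illustration only, here is a rough sketch of the two workarounds mentioned earlier in this thread: substituting U+2011 for the hyphen-minus, or wrapping the text in an element that carries the `.nowrap` class. Both function names are hypothetical, and the sketch takes no position on which approach browsers handle better.

```python
import html

NON_BREAKING_HYPHEN = "\u2011"  # U+2011 NON-BREAKING HYPHEN

def with_unicode_hyphen(text: str) -> str:
    """Replace every hyphen-minus (U+002D) with U+2011."""
    return text.replace("-", NON_BREAKING_HYPHEN)

def with_nowrap_span(text: str, css_class: str = "nowrap") -> str:
    """Wrap HTML-escaped text in a span; assumes a CSS rule such as
    .nowrap { white-space: nowrap } exists in the stylesheet."""
    return f'<span class="{css_class}">{html.escape(text)}</span>'

print(with_unicode_hyphen("site-design"))
print(with_nowrap_span("site-design"))
```

The first variant changes the text content itself, so copying it yields a different character; the second leaves the characters alone and moves the hint into markup, which is the trade-off being argued over here.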

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

**Eric B. Bednarz** · Jul 23 '05, 10:41 PM

Re: non-breaking hyphen

Lachlan Hunt <spam.my.gspot@gmail.com> writes:

> Using <nobr> is tag soup, so that's just being hypocritical.

Your favourite in-itself-hypocritical spec-of-the-week-club aside, it's
no more tag soup than PI substitutes like BR or HR or mystery-meat
attributes like WIDTH and HEIGHT.

--
| ) Più Cabernet,
-( meno Internet.
| ) http://bednarz.nl/

**Malcolm Dew-Jones** · Jul 23 '05, 10:42 PM

Re: non-breaking hyphen

Jukka K. Korpela (jkorpela@cs.tut.fi) wrote:
: Lachlan Hunt <spam.my.gspot@gmail.com> wrote:

: >> It's surely _more_ semantic than the W3C approach which moves us
: >> down to the character level.
: >
: > It depends. Some situations may be more appropriately marked up
: > using elements, and others may be better left at the character level.

: Moving it to character level means that presentational features have been
: wired into the document's textual content. Isn't this worse than
: wiring it into markup around the content?

Thank you for those words. That was what I was trying to get at.

No matter what the desirable semantics of html might be, it still seems
backwards to me that the "low level logic" of the individual symbols used
in the document could have more control over the presentation than the
"high level logic" of the markup language used by a tool that is very much
concerned with the presentation details of the document.

In another part of this thread I said

>> Surely they don't have any applicability in any text except as the
>> application chooses them to have applicability.

and Alan J. Flavell responded

> That looks like a tautology to me!

What I meant was that the unicode standard, in my opinion, should not
define anything but the mapping of character values to the character's name
and an acceptable glyph for it. Everything else should be handled via
higher level logic. The exact details appropriate to that higher logic
would depend on the technologies being used, not on unicode. At least
that is my opinion after seeing how complex the whole issue of unicode
seems to have become, compared to the simple simple simple original idea
of solving character set problems by defining a new standard character set
that simply defined far more characters than the old standard ascii
character set and enabled this by simply requiring computers and software
to use more bits per character.

Well ok, for various reasons the characters appear to need to be able to
indicate certain things such as line breaks, but even that level of
formatting information in the character set should be de-supported, except
to define sets of reserved values available to applications to use as they
see fit (and obviously some of those values would end up having "well
known semantics").

Somehow, the original discussion seemed to touch on those ideas, that's
all.

**Lachlan Hunt** · Jul 23 '05, 10:42 PM

Re: non-breaking hyphen

Eric B. Bednarz wrote:
> Lachlan Hunt <spam.my.gspot@gmail.com> writes:
>> Using <nobr> is tag soup, so that's just being hypocritical.
>
> Your favourite in-itself-hypocritical spec-of-the-week-club aside,

I have no idea which "club" you're talking about. I try to avoid being
hypocritical, and if I have, could you please explain so I can correct
myself?

> it's no more tag soup than PI substitutes like BR or HR or mystery-meat
> attributes like WIDTH and HEIGHT.

Unlike <nobr>, those elements and attributes do actually exist in the
HTML specifications. Although, if your point is that they are also
presentational, then I would somewhat agree.

It is true that in some cases, those elements and attributes can be
considered presentational, especially given the poorly structured design
of the <br> and <hr> elements, which I'm sure has been discussed many
times before. However, if they are used correctly, they can be
reasonably semantic and their use is certainly not as bad as <nobr>.

--
Lachlan Hunt

Lachlan Hunt: Web Development Guru

http://lachy.id.au/

http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web

**Alan J. Flavell** · Jul 23 '05, 10:42 PM

Re: non-breaking hyphen

On Mon, 14 Nov 2004, Malcolm Dew-Jones wrote:

> What I meant was that the unicode standard, in my opinion, should
> not define anything but the mapping of character values to the
> character's name and an acceptable glyph for it.

I think that's what the iso-10646 part aims to do. You'll recall that
there were originally two separate pushes trying to address the i18n
problem: iso-10646, and Unicode, and that they sort-of spliced
themselves together. This would have been in the early 1990's,
roughly, IIRC. But the joins still show in a number of places.

The Unicode specification goes quite some way beyond merely assigning
code points to characters. It not only codifies characters (in terms
of case mapping, directionality, combining properties etc.) but also
defines a number of characters meant to exercise control functions
without corresponding to any displayable glyph (zero-width joiner and
non-joiner, directionality-control etc.). These are defined primarily
for their use in plain-text data; their applicability in particular
applications such as a markup language is less obvious, and needs to
be codified by the markup language (as indeed occurs to some extent in
HTML).
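
As a rough illustration of the per-character properties mentioned above, the sketch below reads a few of them from Python's standard `unicodedata` module. The `properties` helper and the choice of sample characters are assumptions made for this example; the values reflect the Unicode database shipped with the Python build in use.

```python
import unicodedata

def properties(char: str) -> dict[str, object]:
    """Collect a few Unicode character properties for one character."""
    return {
        "name": unicodedata.name(char),
        "category": unicodedata.category(char),            # e.g. Lu, Mn, Cf
        "bidirectional": unicodedata.bidirectional(char),  # directionality class
        "combining": unicodedata.combining(char),          # canonical combining class
        "lowercase": char.lower(),                         # simple case mapping
    }

# A letter, a combining accent, and two invisible format controls.
for ch in ("A", "\u0301", "\u200D", "\u202C"):
    print(f"U+{ord(ch):04X}", properties(ch))
```

The last two samples are the zero-width joiner and the pop directional format character: both carry properties and behaviour but correspond to no displayable glyph.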

You might be of the opinion that that was inadvisable - was being done
at the wrong protocol level etc. - and for sure you'd have quite a few
arguments in your favour, but I'm afraid we have to take Unicode as it
is now, whatever we might think about such details.

> Everything else should be handled via higher level logic. The exact
> details appropriate to that higher logic would depend on the
> technologies being used, not on unicode.

Same answer, I guess. HTML -could- have said that characters like
zero-width joiner, pop directional format, etc. had no business being
in an HTML source document, and that such matters had to be resolved
at the markup or presentation level; but HTML didn't say that - quite
the contrary, in fact. For better or for worse.

**Spartanicus** · Jul 23 '05, 10:42 PM

Re: non-breaking hyphen

Lachlan Hunt <spam.my.gspot@gmail.com> wrote:

>Unlike <nobr>, those elements and attributes do actually exist in the
>HTML specifications. Although, if your point is that they are also
>presentational, then I would somewhat agree.
>
>It is true that in some cases, those elements and attributes can be
>considered presentational, especially given the poorly structured design
>of the <br> and <hr> elements, which I'm sure has been discussed many
>times before. However, if they are used correctly, they can be
>reasonably semantic and their use is certainly not as bad as <nobr>.

UAs don't give a flying monkey if the markup is valid; proper use of
<nobr> causes no problems, it prevents several UAs from applying
ludicrous unicode breaking rules, and a custom DTD solves the errors on
validation.

So what is your argument on why it is "bad"?

--
Spartanicus

**Eric B. Bednarz** · Jul 23 '05, 10:42 PM

Re: non-breaking hyphen

Lachlan Hunt <spam.my.gspot@gmail.com> writes:

> Eric B. Bednarz wrote:

>> Your favourite in-itself-hypocritical spec-of-the-week-club aside,
>
> I have no idea which "club" you're talking about.

I was forward-guessing that the real upshot about 'tag soup' was the
mere absence of NOBR in W3C specs.

>> it's no more tag soup than PI substitutes like BR or HR or mystery-meat
>> attributes like WIDTH and HEIGHT.
>
> Unlike <nobr>, those elements and attributes do actually exist in the
> HTML specifications.

Well, who cares a rat's private parts.

> It is true that in some cases, those elements and attributes can be
> considered presentational, especially given the poorly structured
> design of the <br> and <hr> elements,

Well, the vocabulary of HTML being so blunt a tool is the reason that
sometimes presentational markup is better than nothing (e.g. for
anything that is denoted with italics in conventional typography but
falls short of a corresponding element type in HTML, <i> is still richer
than SPAN -- or nothing). You can argue about the virtue of such issues
until the cows come home.

BR, however, is not about *descriptive markup* (in SGML: tags) at all.
It tells the application to *do* something (e.g. explode, play some
music, render a new line; in SGML: processing instructions -- though one
could probably also argue that a character reference should do the
trick: the parser collapses the HTML whitespace chars, and resolved
character references for CR/LF are passed to the application for literal
rendering). This -- like anything SGML related in HTML -- doesn't have
anything to do with web browsers, or real life in general, of course.

> However, if they are used correctly, they can be reasonably semantic
> and their use is certainly not as bad as <nobr>.

I still do not see what is bad about NOBR (or WBR, for that matter). In
the worst case scenario nothing happens.

Let's look at a slightly modified version of your earlier statement, and
pretend the double hyphen/minus was an em dash.

| Those elements and attributes--unlike NOBR--do actually exist in the
| HTML specifications.

If you observe UA behaviour and Unicode line breaking rules, you'll
realise that you need some presentational markup to the rescue:

| Those elements and attributes--unlike
| NOBR--do actually exist in the
| HTML specifications.

Neat, no? :)

--
| ) Più Cabernet,
-( meno Internet.
| ) http://bednarz.nl/

**Shmuel (Seymour J.) Metz** · Jul 23 '05, 10:43 PM

Re: non-breaking hyphen

In <41985667@news.victoria.tc.ca>, on 11/14/2004
at 11:10 PM, yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones) said:

>No matter what the desirable semantics of html might be, it still
>seems backwards to me that the "low level logic" of the individual
>symbols used in the document could have more control over the
>presentation than the "high level logic" of the markup language used
>by a tool that is very much concerned with the presentation details
>of the document.

Actually, it is backwards for the HTML to be very much concerned with
the presentation details of the document. That's not what HTML was
intended for.

>What I meant was that the unicode standard, in my opinion, should not
>define anything but the mapping of character values to the character's
>name and an acceptable glyph for it.

I strongly disagree. That might be acceptable for character data in
your HTML, but it breaks entry of data into forms.

>At least that is my opinion after seeing how complex the whole issue
>of unicode seems to have become,

It isn't just Unicode, and it didn't "become" complex; it was always
complex. You're looking at it from the perspective of an Indo-European
language, and aren't seeing all of the issues.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org

**Malcolm Dew-Jones** · Jul 23 '05, 10:44 PM

Re: non-breaking hyphen

Shmuel (Seymour J.) Metz (spamtrap@library.lspace.org.invalid) wrote:
: In <41985667@news.victoria.tc.ca>, on 11/14/2004
: at 11:10 PM, yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones) said:

: >No matter what the desirable semantics of html might be, it still
: >seems backwards to me that the "low level logic" of the individual
: >symbols used in the document could have more control over the
: >presentation than the "high level logic" of the markup language used
: >by a tool that is very much concerned with the presentation details
: >of the document.

: Actually, it is backwards for the HTML to be very much concerned with
: the presentation details of the document. That's not what HTML was
: intended for.

I said that HTML was used by a _tool_ that is concerned with presentation.

Such tools make extensive presentation decisions based on html, and the
ability of the tools to make correct presentation decisions in a variety
of environments has always been a prime purpose for html.

: >What I meant was that the unicode standard, in my opinion, should not
: >define anything but the mapping of character values to the character's
: >name and an acceptable glyph for it.

: I strongly disagree. That might be acceptable for character data in
: your HTML, but it breaks entry of data into forms.

How does it break the entry of data into forms?

: >At least that is my opinion after seeing how complex the whole issue
: >of unicode seems to have become,

: It isn't just Unicode, and it didn't "become" complex; it was always
: complex. You're looking at it from the perspective of an Indo-European
: language, and aren't seeing all of the issues.

That's right, I am. Various language issues should not be dealt with at
the level of character data exactly because some human language issues do
not easily map to character data. Trying to do so just makes everything
unnecessarily complicated for the languages that do map reasonably well to
such a simple system.

Those other issues should be dealt with by a higher level protocol.

If unicode had been kept simple then we all would have been using it for
all western-style languages many years ago, and all the effort currently
being spent would instead be used working on systems that work more
naturally for non-western-languages.

However, as has been pointed out, the decisions have been made and we have
to live with them.

**Shmuel (Seymour J.) Metz** · Jul 23 '05, 10:44 PM

Re: non-breaking hyphen

In <419af24c@news.victoria.tc.ca>, on 11/16/2004
at 10:40 PM, yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones) said:

>I said that HTML was used by a _tool_ that is concerned with
>presentation.

And water is wet. Your point?

>How does it break the entry of data into forms?

Because it fails to present the data as the user expects when the
user enters Unicode data that are intended to have an effect on
presentation.

>That's right, I am. Various language issues should not be dealt
>with at the level of character data exactly because some human
>language issues do not easily map to character data.

The issues dealt with by Unicode do map easily.

>Those other issues should be dealt with by a higher level protocol.

No. What you want would destroy interoperability between applications.

>If unicode had been kept simple then we all would have been using it
>for all western-style languages many years ago,

Why? And why would its adoption matter if it didn't include
true internationalization?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
