Print formatted Strings with Umlauts

**Amy G** · Jul 18 '05, 08:22 AM

Re: Print formatted Strings with Umlauts

Upgrading to 2.3 will probably solve this problem. I am using 2.3, and here
is what I get when I try it:

>>> a = 'äöü'
>>> len(a)
3
>>> b = '123'
>>> print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
äöü   äöü
123   123

"Joerg Lehmann" <joerg.lehmann@mail.com> wrote in message
news:91317660.0402111249.4ccc6e24@posting.google.com...
> I am using Python 2.2.3 (Fedora Core 1). The problem is, that strings containing
> umlauts do not work as I would expect. Here is my example:
>
> >>> a = 'äöü'
> >>> b = '123'
> >>> print "%-5s %-5s\n%-5s %-5s" % (a,a,b,b)
> äöü äöü
> 123   123
>
> I would expect, that the displayed width of a or b is the same: 5 characters.
> I also see, that len(a) is 6 (2 bytes per umlaut), whereas len(b) is 3:
>
> >>> print len(a), len(b)
> 6 3
>
> I have tried to set the encoding in site.py to 'latin-1', but it did not change
> my results. Is there no way to store umlauts in 1 byte??? What is the right way
> to print strings containing umlauts in a tabular way (same field width)?
>
> Thanks!
> --
> Joerg Lehmann

**Jeff Epler** · Jul 18 '05, 08:22 AM

Re: Print formatted Strings with Umlauts

If you work with Unicode strings instead of byte strings in the UTF-8
encoding, you'll get the desired results for characters in the German
character set:

>>> b = '123'
>>> a = u'\344\366\374'
>>> print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü   äöü
123   123
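
For readers on Python 3, where `str` is already a Unicode string, the same idea might look roughly like this. It is a minimal sketch, not part of the original post: the helper name `two_columns` is hypothetical, and the bytes are assumed to arrive UTF-8 encoded, as they would from a UTF-8 terminal.

```python
# Python 3 sketch: decode bytes to text first, then let %-formatting pad.
raw = "äöü".encode("utf-8")      # assumed input: UTF-8 bytes from the terminal
text = raw.decode("utf-8")       # three code points, one per umlaut

def two_columns(left, right, width=5):
    """Left-justify both values to `width` code points (hypothetical helper)."""
    return "%-*s %-*s" % (width, left, width, right)

row_text = two_columns(text, text)
row_digits = two_columns("123", "123")
print(row_text)
print(row_digits)
```

Padding here counts code points, so both rows come out the same length for plain precomposed umlauts.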

However, this isn't good enough in general. For instance, in the
presence of Unicode combining characters, you won't get what you want:

>>> u = u'\N{COMBINING DIAERESIS}'
>>> a = 'a%so%su%s' % (u,u,u)
>>> print a.encode("utf-8")
äöü
>>> print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
äöü äöü
123   123
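
One partial workaround for the combining-character case, sketched here for Python 3 under the assumption that every sequence in the data has a precomposed equivalent, is to normalize to NFC before formatting. This is a swapped-in technique, not something proposed in the post above.

```python
import unicodedata

# Base letters each followed by U+0308 COMBINING DIAERESIS: six code points.
decomposed = "a\u0308o\u0308u\u0308"

# NFC composes each base letter + mark pair into one precomposed code point.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))   # 6 3
print("[%-5s]" % decomposed)            # no padding added: already 6 code points
print("[%-5s]" % composed)              # padded with two spaces
```

Normalization only helps where a precomposed form exists; arbitrary base and mark combinations stay decomposed, so it does not solve the general problem.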

You'll also run into problems with characters that have "Wide" or
"Ambiguous" East Asian Width properties in Unicode. For example:

>>> a = u'\N{FULLWIDTH LATIN SMALL LETTER U}' * 3
>>> print (u"%-5s %-5s\n%-5s %-5s" % (a, a, b, b)).encode("utf-8")
ｕｕｕ   ｕｕｕ
123   123
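
To see which width class a character belongs to, the standard `unicodedata` module exposes the East Asian Width property. The short Python 3 sketch below just prints the property for a few sample characters; the choice of samples is mine, not from the post.

```python
import unicodedata

# Sample characters chosen for illustration only.
samples = [
    ("u", "LATIN SMALL LETTER U"),
    ("\uff55", "FULLWIDTH LATIN SMALL LETTER U"),
    ("\u4e2d", "a CJK ideograph"),
    ("\u00fc", "LATIN SMALL LETTER U WITH DIAERESIS"),
]

for ch, label in samples:
    # east_asian_width returns a short code such as 'Na', 'N', 'A', 'W', 'F' or 'H'.
    print("U+%04X  %-2s  %s" % (ord(ch), unicodedata.east_asian_width(ch), label))
```

Note that the fullwidth letter reports `F` (Fullwidth) rather than `W` (Wide); terminals typically draw both classes two columns wide.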

Jeff

**Martin v. Löwis** · Jul 18 '05, 08:23 AM

Re: Print formatted Strings with Umlauts

Joerg Lehmann wrote:
> I am using Python 2.2.3 (Fedora Core 1). ...
> I have tried to set the encoding in site.py to 'latin-1', but it did not change
> my results. Is there no way to store umlauts in 1 byte???

There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.
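
The byte counts behind this can be checked directly. A small Python 3 sketch, assuming the source file itself is saved as UTF-8:

```python
text = "äöü"

for encoding in ("utf-8", "latin-1"):
    data = text.encode(encoding)
    # Show the encoding name, the number of bytes, and the raw bytes.
    print(encoding, len(data), data)

# utf-8 6 b'\xc3\xa4\xc3\xb6\xc3\xbc'
# latin-1 3 b'\xe4\xf6\xfc'
```

The text is the same three characters in both cases; only the byte representation chosen by the system differs.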

> What is the right way
> to print strings containing umlauts in a tabular way (same field width)?

As Jeff explains: In the specific case, using Unicode strings would
help. He is also right that, in general, it is very difficult to find
out how many columns a single character uses, as some characters have
width 0, and other characters have width 2 (in a mono-spaced terminal;
for variable-spaced output, adding space characters to achieve
formatting will never work reliably).
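
A rough approximation of column-aware padding for a mono-spaced terminal can be sketched in Python 3 as follows. The helpers `display_width` and `ljust_columns` are hypothetical names, and the width rules are simplifying assumptions: combining marks count as 0 columns, Wide and Fullwidth characters as 2, and everything else (including Ambiguous) as 1.

```python
import unicodedata

def display_width(text):
    """Estimate terminal columns for `text` (hypothetical helper, approximate)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                  # combining mark: assumed zero columns
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                # Wide / Fullwidth: assumed two columns
        else:
            width += 1                # assumption: Ambiguous treated as narrow
    return width

def ljust_columns(text, columns):
    """Pad with spaces until `text` fills `columns` estimated terminal columns."""
    return text + " " * max(0, columns - display_width(text))

for cell in ("123", "a\u0308o\u0308u\u0308", "\uff55\uff55"):
    print(ljust_columns(cell, 5) + "|")
```

This does not cover every zero-width character (for example, format controls with combining class 0), and actual rendering still depends on the terminal and font.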

Regards,
Martin

**Joerg Lehmann** · Jul 18 '05, 08:24 AM

Re: Print formatted Strings with Umlauts

"Martin v. Löwis" <martin@v.loewis.de> wrote in message news:<c0f8tb$5iu$05$1@news.t-online.com>...
> Joerg Lehmann wrote:
> > I am using Python 2.2.3 (Fedora Core 1). ...
> > I have tried to set the encoding in site.py to 'latin-1', but it did not change
> > my results. Is there no way to store umlauts in 1 byte???
>
> There is, but Fedora Core 1 does not use it. Instead, it uses an
> encoding where an umlaut character needs two bytes (namely, UTF-8).
> Changing site.py does not change the way your system represents
> these characters.
>
> > What is the right way
> > to print strings containing umlauts in a tabular way (same field width)?
>
> As Jeff explains: In the specific case, using Unicode strings would
> help. He is also right that, in general, it is very difficult to find
> out how many columns a single character uses, as some characters have
> width 0, and other characters have width 2 (in a mono-spaced terminal;
> for variable-spaced output, adding space characters to achieve
> formatting will never work reliably).
>
> Regards,
> Martin

I have found a fix myself. I'm not sure whether this is "the right way",
but it solves my problem:

I changed the settings in /etc/sysconfig/i18n from UTF-8 to
ISO-8859-1:

LANG="en_US.ISO-8859-1"
SUPPORTED="en_US.ISO-8859-1:en_US:en"
SYSFONT="latarcyrheb-sun16"

This fixed my problem: umlauts are now stored in one byte.
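
To check which encoding the locale settings actually hand to Python, something like the following Python 3 sketch can be run before and after such a change. It only reports what the interpreter sees; the output depends entirely on the local system, so none is shown here.

```python
import locale
import sys

# Encoding implied by the current locale settings (e.g. LANG), without
# changing the process-wide locale.
preferred = locale.getpreferredencoding(False)

print("locale encoding:", preferred)
print("stdout encoding:", sys.stdout.encoding)

# Number of bytes one umlaut needs in that encoding; errors="replace" keeps
# the call from raising if the encoding cannot represent the character.
print("bytes for one umlaut:", len("ä".encode(preferred, errors="replace")))
```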

Thanks for your suggestions.

PS: Installing Python 2.3 (rpm for Fedora from www.python.org) did not
help.
--
Joerg Lehmann
