Unicode troubles

Rodrigo Benenson
#1

Jul 18 '05, 03:32 AM

Hi!
I'm finishing a multiplatform collaborative real-time text editor (something
like SubEthaEdit, but multiplatform and open source) developed using
Python+Twisted as a plugin for Leo.

Of course, as the software runs on different platforms in different places,
text encoding compatibility is an issue.
So the obvious choice was Tkencoding for the client GUI, Unicode for the system
internals and UTF-8 for web output.
But I'm running into serious trouble combining Tk with Unicode internals.

Being a text editor, the system uses string lengths and positions in the text
widget as parameters of most of its critical algorithms.
Unfortunately, I recently discovered that some encodings do not preserve the
equivalence between
num_of_chars / length_of_string / position_in_text_widget. As a result, each time
someone presses a non-ASCII key, the references are lost and the other clients
receive a soup of letters.
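A minimal sketch of why an encoded length cannot stand in for a character position, assuming Python 3 (the original project targeted Python 2): in UTF-8 the precomposed "ó" takes two bytes, so the byte length of an encoded string differs from its character count, while in Latin-1 the two happen to agree.

```python
# Sketch (Python 3): character count of a decoded string vs. byte count
# of the same text in two different encodings.
text = "el\u00f3"  # "eló" with a precomposed "ó" (U+00F3)

char_count = len(text)                       # counts code points
utf8_bytes = len(text.encode("utf-8"))       # "ó" needs two bytes in UTF-8
latin1_bytes = len(text.encode("latin-1"))   # one byte per character here

print(char_count, utf8_bytes, latin1_bytes)  # 3 4 3
```

Keeping offsets in characters of decoded Unicode strings, and encoding only at the edges (network, files, web output), avoids mixing the two units.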

I had read on the internet that Unicode was supposed to keep the relation
num_of_chars / string_length (and thus the relation
string_length / num_of_chars / position_in_text_widget). But this relation does
not hold on all my machines.

Sometimes I get len(u"eló") = 3 (the good result) and other times
len(u"eló") = 4 (wrong result). This seems independent of the OS.
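A minimal sketch that reproduces both results, assuming Python 3: the same visible text "eló" can be stored with a precomposed "ó" or with a plain "o" followed by a combining acute accent, and len() counts code points, not visible characters.

```python
import unicodedata

composed = "el\u00f3"     # "ó" as one code point (LATIN SMALL LETTER O WITH ACUTE)
decomposed = "elo\u0301"  # "o" followed by COMBINING ACUTE ACCENT

# Both render as "eló", but the code point counts differ.
print(len(composed), len(decomposed))  # 3 4

# List the code points of the decomposed form to see the extra one.
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```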

Could someone explain this issue to me? How am I supposed to handle this
problem? Do I have to compile Python with special parameters to get one length
unit per Unicode character?

Thanks.
Rodrigo Benenson.
Michael Radziej
#2

Jul 18 '05, 03:34 AM

Re: Unicode troubles

Rodrigo Benenson wrote:
> Sometimes I get len(u"eló") = 3 (the good result) and other times
> len(u"eló") = 4 (wrong result). This seems independent of the OS.

There are different ways to express "special" characters.
E.g. you can encode "ó" as a single precomposed character,
or as "o" followed by a combining accent.
What you want is the "canonical form".
Take a look at unicodedata.normalize (it is new in Python 2.3).
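A minimal sketch of the suggested fix, assuming Python 3: normalize every string to NFC (canonical composition) before measuring it or using it as an offset. The helper name to_nfc is hypothetical, not part of Leo, Twisted or the standard library.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Hypothetical helper: return text in NFC form, so precomposed
    characters are used wherever Unicode defines them
    (e.g. "o" + U+0301 becomes U+00F3)."""
    return unicodedata.normalize("NFC", text)

composed = "el\u00f3"     # precomposed "ó"
decomposed = "elo\u0301"  # "o" + combining acute accent

# Different code point sequences, identical after normalization.
print(composed == decomposed, to_nfc(composed) == to_nfc(decomposed))  # False True
print(len(to_nfc(decomposed)))  # 3
```

Applying such a helper at every boundary (keyboard input, incoming network messages) would let all clients count the same number of code points for the same text. Note that NFC only composes where a precomposed character exists; a base letter with a combining mark that has no precomposed form still counts as several code points.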

http://www.python.org/doc/current/lib/module-unicodedata.html

Hope this helps,

Michael Radziej