I've been trying to stay blissfully unaware of Unicode, but now it seems to be my turn. From the outside it looks like a rather massive subject, so any pointers as to where I should _start_ reading would be appreciated. The use case is a class:
class path(object):
    ...
    def __str__(self):
        return self.pathstr.encode(???)
The question is: what should go at ??? to be most useful to programmers and end users?
Background: I've got a directory with a file called 'bæ', B-AE, (which everyone knows is what Norwegian sheep say :-). If I type u'bæ' at the Python command prompt, it returns:
u'b\x91'
(WinXP, 2.3.2, regular command window). With a little trial and error I found:
>>> print u'b\x91'.encode('latin1')
bæ
although
>>> print u'b\x91'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "e:\python23\lib\encodings\cp437.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u91' in position 1: character maps to <undefined>
I'm not sure I understand this, and when I call:
os.listdir(os.getcwdu())
I get back
u'b\xe6'
which isn't making much sense either...
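One way to poke at the byte values involved is a short sketch like the one below. It is only a sketch under assumptions: it uses Python 3 syntax rather than the 2.3.2 session above, and it assumes the console is on code page 437, as the cp437.py frame in the traceback suggests. The variable names are made up for illustration.

```python
# Assumption: Python 3 syntax; the session in the post is Python 2.3.2.
ae = '\xe6'  # U+00E6, LATIN SMALL LETTER AE

# In code page 437 this character is stored as the single byte 0x91.
cp437_bytes = ae.encode('cp437')

# Decoding byte 0x91 as cp437 gives U+00E6, the value os.listdir() reported.
round_trip = b'\x91'.decode('cp437')

# U+0091 is a different character (a C1 control code) with no cp437 mapping,
# which is the situation the traceback describes.
try:
    '\x91'.encode('cp437')
    control_encodable = True
except UnicodeEncodeError:
    control_encodable = False

# Latin-1 maps U+0091 straight to byte 0x91, which a cp437 console draws as ae.
latin1_bytes = '\x91'.encode('latin1')

print(cp437_bytes, ascii(round_trip), control_encodable, latin1_bytes)
```

If those assumptions hold, the u'b\x91' at the prompt would be the console's raw cp437 byte taken as a code point, while u'b\xe6' from os.listdir() would be the real Unicode value of the same letter.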
help?!
-- bjorn