I'm struggling with what should be a trivial problem, but I can't seem to
come up with a proper solution. I am working on a CGI that takes UTF-8
input from a browser. The input is percent-encoded, so you get something
like this:
firstname=t%C3%A9st
where %C3%A9 is a single character in UTF-8 encoding. Passing this
through urllib.unquote does not help:
>>> urllib.unquote(u't%C3%A9st')
u't%C3%A9st'
The problem turned out to be that urllib.unquote() processes its input
character by character, which breaks when it calls chr() for a single
escape: the result is neither valid ASCII (it is outside the legal
range) nor valid Unicode (it is only half of a UTF-8 character), and as
a result it fails:
>>> chr(195) + u""
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
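To show the "half a character" point in isolation, here is a minimal sketch. The helper name decodes_as_utf8 is made up for this example and is not part of urllib. It only checks that the bytes behind %C3 and %A9 decode as UTF-8 when taken together and not when taken one at a time. It is not meant as a fix for unquote().

```python
# 0xC3 and 0xA9 are the raw bytes behind the escapes %C3 and %A9.
pair = bytes(bytearray([0xC3, 0xA9]))


def decodes_as_utf8(data):
    """Return True if the byte string is complete, valid UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


first_alone = decodes_as_utf8(pair[:1])   # only 0xC3: a lead byte with no continuation
second_alone = decodes_as_utf8(pair[1:])  # only 0xA9: a continuation byte with no lead
together = decodes_as_utf8(pair)          # both bytes: one complete character
char = pair.decode('utf-8')               # the single character U+00E9
```

If that holds, any conversion to Unicode presumably has to happen after all the escapes have been turned back into bytes, not one escape at a time.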
I can't seem to find a working method to do this conversion correctly.
Can someone point me in the right direction? (and please cc me on
replies since I'm not currently subscribed to this list/newsgroup).
Wichert.
--
Wichert Akkerman <wichert@wiggy.net>    It is simple to make things.
http://www.wiggy.net/                   It is hard to make things simple.