Using Python 2.6, I am trying to pickle a dictionary (for Chinese pinyin) which contains both Unicode characters in the range 128-255 and 4-byte Unicode characters. I get allergic reactions from pickle.dump() under all protocols.
Here’s a simple test program:
Code:
# -*- coding: utf-8 -*-
# Program 1 (protocol 0), program 2 (protocol 2)
import codecs
import pickle

PickleFile = codecs.open('PFile.utf', 'w', 'utf-8')  # file wrapped in a UTF-8 writer
Str1 = u'lǘelü'
pickle.dump(Str1, PickleFile, protocol=0) # Error here!
PickleFile.close()
1. Attempting to run this gives the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 10: ordinal not in range(128)
This is understandable, since protocol 0 is strictly ASCII and 0xfc is the character 'ü'.
2. With protocol=2 (or -1) I get a different, more mysterious error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
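One way to see where the bytes 0xfc and 0x80 come from is to look at the output of pickle.dumps() directly. The sketch below rests on an assumption that the tracebacks do not confirm: that the codecs writer tries to decode pickle's byte string as ASCII before re-encoding it as UTF-8. The variable names are illustrative only.

```python
import pickle

# Same text as Str1, spelled with escapes so this file stays pure ASCII.
text = u'l\u01d8el\xfc'

# Protocol 0 writes unicode in "raw-unicode-escape" form: code points
# above 255 become \uXXXX, but U+00FC is emitted as the single byte 0xfc.
p0 = pickle.dumps(text, 0)
print(repr(p0))
print(p0.find(b'\xfc'))   # offset of the first non-ASCII byte

# Protocol 2 is a binary format; its stream opens with the PROTO opcode 0x80.
p2 = pickle.dumps(text, 2)
print(repr(p2[:2]))
```

If that assumption holds, the positions reported in the two tracebacks (10 and 0) should line up with these byte offsets, which would mean the failure happens in the file wrapper rather than inside pickle itself.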
Well, let's try to use pickle.dumps() (which DOES work) and store the resulting string in a file.
Code:
# -*- coding: utf-8 -*-
# Program 3. Using pickle.dumps()
import codecs
import pickle

Str1 = u'lǘelü'
PickleStr1 = pickle.dumps(Str1) # So far so good! (result is a byte string)
SPickleFile = codecs.open('SpFile.utf', 'w', 'utf-8')
SPickleFile.write(PickleStr1) # Error here!
SPickleFile.close()
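3. Running this program, I get the same error as in program 1: "can't decode byte 0xfc in position 10".
Isn’t this horribly, and uselessly, frustrating? The pickle module has been around long enough not to stub its toes on this dinky example. Or is there something I have missed?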
There is a long discussion of this in Python tracker Issue 2980, "Pickle stream for unicode object may contain non-ASCII characters", which seems to address this problem but, as far as I can see, does not solve it.
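For comparison, here is a minimal sketch of the same round trip with the file opened in binary mode through the built-in open() instead of codecs.open(). It assumes the UTF-8 writer is what triggers the ASCII decode, which I have not verified. The scratch directory and the file name PFile.pkl are hypothetical.

```python
import os
import pickle
import tempfile

# Same text as Str1, spelled with escapes so this file stays pure ASCII.
text = u'l\u01d8el\xfc'

# Hypothetical scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'PFile.pkl')

# 'wb': pickle's bytes go to disk untouched, with no text codec in between.
with open(path, 'wb') as f:
    pickle.dump(text, f, 2)

# 'rb': read the same bytes back and rebuild the unicode object.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == text)
```

The pickle file is then an opaque byte stream rather than a UTF-8 text file, so it would not be readable in a text editor, but the unicode string itself should survive the trip.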
Thank you all for your help & understanding.