raw_input encoding different from code encoding problem

**al san** · Dec 10 '10, 06:52 PM

Hello, everyone, any help will be greatly appreciated.

I'm writing code for a Japanese learning app in Python,
so all my Japanese strings are preceded by 'u' (e.g. u'にほんご'). Everything works like a charm, except when I get input from the user.

Code:

>>> internal=[u'\u306f\u3057\u3063\u305f', u'\u306f\u3057\u3063\u3066', u'\u306f\u3057\u3089\u306a\u3044', u'\u306f\u3057\u308a\u307e\u3059', u'\u306f\u3057\u308b', u'\u306f\u3057\u308c', u'\u306f\u3057\u308d\u3046']
>>> user_input=['\x82\xcd\x82\xb5\x82\xc1\x82\xbd', '\x82\xcd\x82\xb5\x82\xc1\x82\xc4', '\x82\xcd\x82\xb5\x82\xe7\x82\xc8\x82\xa2', '\x82\xcd\x82\xb5\x82\xe8\x82\xdc\x82\xb7', '\x82\xcd\x82\xb5\x82\xe9', '\x82\xcd\x82\xb5\x82\xea', '\x82\xcd\x82\xb5\x82\xeb\x82\xa4']
>>> for i in range(len(internal)):
	print "code data: %s" % internal[i]
	print "raw_input: %s" % user_input[i]

code data: はしった
raw_input: はしった
code data: はしって
raw_input: はしって
code data: はしらない
raw_input: はしらない
code data: はしります
raw_input: はしります
code data: はしる
raw_input: はしる
code data: はしれ
raw_input: はしれ
code data: はしろう
raw_input: はしろう
>>>

As you can see, the text looks the same on the screen, but behind the scenes the two are totally different from each other: the strings in my code are unicode objects, while the strings from raw_input are encoded byte strings.
Why is this an issue?
Well, because I need to compare the two texts (raw_input vs. program data), and if they
look the same on the screen, then I need the result to be True, not False...
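
To make the goal concrete, here is a minimal sketch of the kind of comparison I'm after. It rests on an assumption: that the console hands over Shift-JIS bytes. The \x82\xcd... values above are consistent with that encoding, but the name ASSUMED_CONSOLE_ENCODING and the helper same_text are purely illustrative, and on another machine the right value may differ (sys.stdin.encoding usually reports what the console uses).

```python
# -*- coding: utf-8 -*-
# Sketch only: the encoding below is an assumption inferred from the byte
# values in the session above, not something confirmed for every console.
ASSUMED_CONSOLE_ENCODING = 'shift_jis'

# Two of the pairs from the session above: unicode data vs. raw bytes.
internal = [u'\u306f\u3057\u3063\u305f', u'\u306f\u3057\u308b']
user_input = [b'\x82\xcd\x82\xb5\x82\xc1\x82\xbd', b'\x82\xcd\x82\xb5\x82\xe9']

def same_text(raw_bytes, text, encoding=ASSUMED_CONSOLE_ENCODING):
    """Decode the raw bytes to unicode first, then compare the two strings."""
    return raw_bytes.decode(encoding) == text

for raw, text in zip(user_input, internal):
    print(same_text(raw, text))  # prints True for both pairs
```

The idea is to decode the input once, right after reading it, so that everything inside the program is unicode and plain == behaves as expected.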

Thank you for your time.
Cheers