raw_input encoding different from code encoding problem

  • al san
    New Member
    • Dec 2010
    • 2

    Hello everyone, any help will be greatly appreciated.

    I'm writing code for a Japanese learning app in Python,
    so all my Japanese strings are preceded by 'u' (e.g. u'にほんご'). Everything works like a charm, except when I get input from the user.
    Code:
    >>> internal=[u'\u306f\u3057\u3063\u305f', u'\u306f\u3057\u3063\u3066', u'\u306f\u3057\u3089\u306a\u3044', u'\u306f\u3057\u308a\u307e\u3059', u'\u306f\u3057\u308b', u'\u306f\u3057\u308c', u'\u306f\u3057\u308d\u3046']
    >>> user_input=['\x82\xcd\x82\xb5\x82\xc1\x82\xbd', '\x82\xcd\x82\xb5\x82\xc1\x82\xc4', '\x82\xcd\x82\xb5\x82\xe7\x82\xc8\x82\xa2', '\x82\xcd\x82\xb5\x82\xe8\x82\xdc\x82\xb7', '\x82\xcd\x82\xb5\x82\xe9', '\x82\xcd\x82\xb5\x82\xea', '\x82\xcd\x82\xb5\x82\xeb\x82\xa4']
    >>> for i in range (len(internal)):
    	print "code data: %s" % internal[i]
    	print "raw_input: %s" % user_input[i]
    
    code data: はしった
    raw_input: はしった
    code data: はしって
    raw_input: はしって
    code data: はしらない
    raw_input: はしらない
    code data: はしります
    raw_input: はしります
    code data: はしる
    raw_input: はしる
    code data: はしれ
    raw_input: はしれ
    code data: はしろう
    raw_input: はしろう
    >>>
    As you can see, the text looks the same on the screen, but behind the scenes the two encodings are totally different from each other.
    Why is this an issue?
    Because I need to compare the two texts (raw_input vs. program data), and if they
    look the same on the screen, I need the result to be True, not False.

    Thank you for your time.
    Cheers
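
One possible way to make the comparison come out True is to decode the user's bytes to Unicode before comparing. The sketch below assumes the console delivers Shift-JIS bytes, which is what the `\x82\xcd...` values in the post look like. The helper name `to_text` is hypothetical, and in a real Python 2 program the encoding would more likely be taken from `sys.stdin.encoding` than hard-coded.

```python
# Minimal sketch: normalize input to Unicode, then compare.
# Assumption: the raw input bytes are Shift-JIS encoded (inferred from the
# byte values shown in the post, not confirmed by the author).

def to_text(raw, encoding="shift_jis"):
    """Decode a byte string to Unicode; return Unicode input unchanged."""
    if isinstance(raw, bytes):  # on Python 2, bytes is an alias for str
        return raw.decode(encoding)
    return raw

# Two of the pairs from the post: Unicode program data vs. raw input bytes.
internal = [u'\u306f\u3057\u3063\u305f', u'\u306f\u3057\u308b']
user_input = [b'\x82\xcd\x82\xb5\x82\xc1\x82\xbd', b'\x82\xcd\x82\xb5\x82\xe9']

# Compare each decoded input against the corresponding program string.
matches = [to_text(raw) == expected
           for raw, expected in zip(user_input, internal)]
print(matches)
```

With real input on Python 2, the call would look something like `to_text(raw_input(), sys.stdin.encoding)`, so the comparison is always made between two Unicode strings rather than between a Unicode string and a byte string.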