Utilizing unicode strings

  • nihilium
    New Member
    • Mar 2008
    • 16

    OK, I'm having some trouble handling Unicode strings. Say the file example.txt contains the word Ångström. When I put u in front of '%s', I get the error below; without the u, the text doesn't display properly.

    [CODE=python]
    import re
    inputfile = file('C:/example.txt', 'r')
    inputfile = inputfile.read()
    patt = re.compile(r'(.*)')
    m = patt.search(inputfile)
    print u'%s' % m.group(1)

    Traceback (most recent call last):
      File "<pyshell#8>", line 1, in <module>
        print u'%s' % m.group(1)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 2: ordinal not in range(128)
    [/CODE]
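A minimal sketch of one possible fix, assuming example.txt is saved as UTF-8 (the thread does not say which encoding the file uses; a file from a Windows editor might be cp1252 instead, in which case pass that name). Opening the file with io.open and an explicit encoding makes read() return Unicode text, so the regex match is already Unicode and u'%s' has nothing left to decode. io.open exists in Python 2.6+ and is the same as the built-in open in Python 3. The sketch uses a hypothetical temporary file in place of C:/example.txt.

```python
import io
import os
import re
import tempfile

# Hypothetical stand-in for C:/example.txt, written as UTF-8 (an assumption).
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"\u00c5ngstr\u00f6m\n")  # "Ångström"

# Open in text mode with an explicit encoding: read() returns decoded text,
# not raw bytes, so nothing has to be decoded implicitly later.
with io.open(path, "r", encoding="utf-8") as inputfile:
    text = inputfile.read()

patt = re.compile(r"(.*)")  # "." does not match "\n", so this stops at the first line end
m = patt.search(text)
word = u"%s" % m.group(1)   # group(1) is already Unicode text, so no ASCII decode happens
```

If the encoding argument is wrong for the file, the error moves to read(), which at least points at the real cause.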
  • Laharl
    Recognized Expert Contributor
    • Sep 2007
    • 849

    #2
    u%s does not mean a Unicode string. To print Unicode, read this article, which, while old, should still work.
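A hedged illustration of what the traceback above means, written in Python 3 syntax where bytes and text are separate types. In Python 2, m.group(1) in the first post returns a byte string, and formatting it into u'%s' makes Python decode those bytes implicitly with the default ASCII codec, which fails on any byte above 127. The sample bytes below are the UTF-8 encoding of "Ångström", assumed for illustration; the thread does not state the file's actual encoding.

```python
# Assumed file contents: "Ångström" encoded as UTF-8.
raw = b"\xc3\x85ngstr\xc3\xb6m"

# What Python 2 does implicitly for u'%s' % raw: decode with ASCII.
try:
    raw.decode("ascii")
    ascii_ok = True
    error_message = ""
except UnicodeDecodeError as exc:
    ascii_ok = False
    error_message = str(exc)  # "'ascii' codec can't decode byte 0xc3 in position 0: ..."

# Decoding explicitly with the (assumed) real encoding succeeds.
text = raw.decode("utf-8")
formatted = u"%s" % text
```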

    • nihilium
      New Member
      • Mar 2008
      • 16

      #3
      Originally posted by Laharl
      u%s does not mean a Unicode string. To print Unicode, read this article, which, while old, should still work.
      So if I use

      [CODE=python]outputFile = codecs.open('outputFile.txt', 'w', 'utf-8')[/CODE]

      how do I then write a newline (\n)?
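A hedged sketch of one answer: the newline is simply part of the Unicode string you write, e.g. write(line + u'\n'). One caveat from the Python documentation: codecs.open always opens the underlying file in binary mode, so a '\n' written through it is not translated to '\r\n' on Windows. io.open (Python 2.6+, the built-in open in Python 3) opens in text mode and does translate it, so the sketch uses io.open instead. The file name and sample lines are placeholders.

```python
import io
import os
import tempfile

# Hypothetical output location standing in for 'outputFile.txt'.
out_path = os.path.join(tempfile.mkdtemp(), "outputFile.txt")
lines = [u"\u00c5ngstr\u00f6m", u"second line"]  # placeholder content

# Text mode + explicit encoding: each u"\n" becomes the platform line
# ending (\r\n on Windows) and the text is written as UTF-8 bytes.
with io.open(out_path, "w", encoding="utf-8") as output_file:
    for line in lines:
        output_file.write(line + u"\n")

# Read back to check: the raw bytes start with the UTF-8 encoding of "Å",
# and decoding as UTF-8 recovers the original lines.
with io.open(out_path, "rb") as f:
    raw = f.read()
with io.open(out_path, "r", encoding="utf-8") as f:
    read_back = f.read().splitlines()
```

If you stay with codecs.open and need Windows line endings, write u'\r\n' explicitly instead of u'\n'.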
