Re: Unicode confusion

  • Jerry Hill

    Re: Unicode confusion

    On Mon, Jul 14, 2008 at 12:40 PM, Tim Cook <timothywayne.cook@gmail.com> wrote:
    > if I say units=unicode("°"). I get
    > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
    > ordinal not in range(128)
    >
    > If I try x=unicode.decode(x,'utf-8'). I get
    > TypeError: descriptor 'decode' requires a 'unicode' object but received
    > a 'str'
    >
    > What is the correct way to interpret these symbols that come to me as a
    > string?

    Part of it depends on where you're getting them from. If they are in
    your source code, just define them like this:
    >>> units = u"°"
    >>> print units
    °
    >>> print repr(units)
    u'\xb0'

    If they're coming from an external source, you have to know the
    encoding they're being sent in. Then you can decode them into
    unicode, like this:
    >>> units = "°"
    >>> unicode_units = units.decode('Latin-1')
    >>> print repr(unicode_units)
    u'\xb0'
    >>> print unicode_units
    °
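
For readers on Python 3, where byte strings are `bytes` and text is `str`, the same rule can be sketched roughly as follows. This is only an illustration: it assumes the external data arrives as the UTF-8 bytes 0xC2 0xB0 (which is what the original error message suggests), and the variable names are made up for the example.

```python
# Python 3 sketch: bytes from an external source must be decoded with
# the encoding they were actually written in.
raw_utf8 = b"\xc2\xb0"    # degree sign encoded as UTF-8 (assumed input)
raw_latin1 = b"\xb0"      # the same character encoded as Latin-1

units_from_utf8 = raw_utf8.decode("utf-8")
units_from_latin1 = raw_latin1.decode("latin-1")

# Both byte strings decode to the single code point U+00B0.
print(repr(units_from_utf8))
print(units_from_utf8 == units_from_latin1)

# Decoding as ASCII fails, much like the implicit ASCII decode
# that unicode("°") performs in Python 2.
ascii_error = None
try:
    raw_utf8.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc
    print("ascii decode failed:", exc.reason)
```

The point of the sketch is that the bytes alone do not say which encoding they use; the caller has to supply it.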

    --
    Jerry
  • Mark Tolonen

    #2
    Re: Unicode confusion


    "Jerry Hill" <malaclypse2@gmail.com> wrote in message
    news:mailman.14.1216054283.922.python-list@python.org...
    > On Mon, Jul 14, 2008 at 12:40 PM, Tim Cook <timothywayne.cook@gmail.com>
    > wrote:
    >> if I say units=unicode("°"). I get
    >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
    >> ordinal not in range(128)
    >>
    >> If I try x=unicode.decode(x,'utf-8'). I get
    >> TypeError: descriptor 'decode' requires a 'unicode' object but received
    >> a 'str'
    >>
    >> What is the correct way to interpret these symbols that come to me as a
    >> string?
    >
    > Part of it depends on where you're getting them from. If they are in
    > your source code, just define them like this:
    >
    > >>> units = u"°"
    > >>> print units
    > °
    > >>> print repr(units)
    > u'\xb0'
    >
    > If they're coming from an external source, you have to know the
    > encoding they're being sent in. Then you can decode them into
    > unicode, like this:
    >
    > >>> units = "°"
    > >>> unicode_units = units.decode('Latin-1')
    > >>> print repr(unicode_units)
    > u'\xb0'
    > >>> print unicode_units
    > °
    >
    > --
    > Jerry
    >
    Even with source code you have to know the encoding. Before 3.x, Python
    assumes ASCII for source files unless an encoding is declared:

    test.py contains:
    units = u"°"
    >>> import test
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "test.py", line 1
    SyntaxError: Non-ASCII character '\xb0' in file test.py on line 1, but no
    encoding declared; see http://www.python.org/peps/pep-0263.html for details

    The encoding of the source file can be declared:

    # coding: latin-1
    units = u"°"
    >>> import test
    >>> test.units
    u'\xb0'
    >>> print test.units
    °
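
As a rough Python 3 illustration of how the declaration is picked up, the sketch below feeds the bytes of a hypothetical `test.py` (saved as Latin-1, with the declaration on line 1) to the standard library. The file contents and the `namespace` dict are assumptions made for the example, not part of the session above.

```python
import io
import tokenize

# Bytes of a hypothetical test.py saved as Latin-1 with a PEP 263 declaration.
# The \xb0 escape puts the raw Latin-1 byte for the degree sign in the source.
source_bytes = b'# coding: latin-1\nunits = u"\xb0"\n'

# detect_encoding reads the declaration from the first two lines of the source.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
print(encoding)

# Compiling the raw bytes lets Python apply the declared encoding itself.
namespace = {}
exec(compile(source_bytes, "test.py", "exec"), namespace)
print(repr(namespace["units"]))
```

Note that `tokenize.detect_encoding` reports a normalized codec name, so `latin-1` comes back as `iso-8859-1`.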

    Make sure the declared encoding matches the one the file was actually
    saved in! Here the file was saved in latin-1 but declared as utf8:

    # coding: utf8
    units = u"°"
    >>> import test
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 0:
    unexpected code byte
    >>>
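
The same mismatch can be reproduced on plain byte strings. The following is a minimal Python 3 sketch with made-up variable names; it also shows the opposite mistake (UTF-8 data read as Latin-1), which raises no error and is therefore easier to miss.

```python
degree = "\u00b0"                        # the degree sign as text
latin1_bytes = degree.encode("latin-1")  # one byte: 0xb0
utf8_bytes = degree.encode("utf-8")      # two bytes: 0xc2 0xb0

# Latin-1 data read as UTF-8: 0xb0 cannot start a UTF-8 sequence,
# so decoding raises an error.
try:
    latin1_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)

# UTF-8 data read as Latin-1: no error, but the result is two
# wrong characters instead of one degree sign.
mojibake = utf8_bytes.decode("latin-1")
print(repr(mojibake))
```

An exception is the better outcome here, because the silent case only shows up later as garbled output.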
    --
    Mark
