Python 3.0 and repr

  • Mark Tolonen

    Python 3.0 and repr

    I don't understand the behavior of the interpreter in Python 3.0. I am
    working at a command prompt in Windows (US English), which has a terminal
    encoding of cp437.

    In Python 2.5:

    Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
    (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x=u'\u5000'
    >>> x
    u'\u5000'

    In Python 3.0:

    Python 3.0rc1 (r30rc1:66507, Sep 18 2008, 14:47:08) [MSC v.1500 32 bit
    (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x='\u5000'
    >>> x
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\dev\python30\lib\io.py", line 1486, in write
        b = encoder.encode(s)
      File "c:\dev\python30\lib\encodings\cp437.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u5000' in
    position 1: character maps to <undefined>

    Where I would have expected:

    >>> x
    '\u5000'

    Shouldn't a repr() of x work regardless of output encoding? Another test:

    Python 3.0rc1 (r30rc1:66507, Sep 18 2008, 14:47:08) [MSC v.1500 32 bit
    (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> bytes(range(256)).decode('cp437')
    '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14
    \x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABC
    DEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7fÇüéâäàåçêëèïîìÄÅ
    ÉæÆôöòûùÿÖÜ¢£¥₧ƒáíóúñÑªº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀
    αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\xa0'
    >>> bytes(range(256)).decode('cp437')[255]
    '\xa0'

    Characters that cannot be displayed in cp437 are being escaped, such as
    0x00-0x1F, 0x7F, and 0xA0. Even if I incorrectly decode a value, if the
    character exists in cp437, it is displayed:
    >>> bytes(range(256)).decode('latin-1')[255]
    'ÿ'

    However, for a character that isn't supported by cp437, incorrectly decoded:
    >>> bytes(range(256)).decode('cp437')[254]
    '■'
    >>> bytes(range(256)).decode('latin-1')[254]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\dev\python30\lib\io.py", line 1486, in write
        b = encoder.encode(s)
      File "c:\dev\python30\lib\encodings\cp437.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in
    position 1: character maps to <undefined>

    Why not display '\xfe' here? It seems like this inconsistency would make it
    difficult to write things like doctests that weren't dependent on the
    tester's terminal. It also makes it difficult to inspect variables without
    hex(ord(n)) on a character-by-character basis. Maybe repr() should always
    display the ASCII representation with escapes for all other characters,
    especially considering the "repr() should produce output suitable for eval()
    when possible" rule.

    What are others' opinions? Any insight to this design decision?

    -Mark


  • Martin v. Löwis

    #2
    Re: Python 3.0 and repr

    > What are others' opinions? Any insight to this design decision?

    The intention is that all printable characters in a string get displayed
    in repr. This was in particular requested by Japanese users (but also by
    other users of non-ASCII characters) who complained that repr() is
    fairly useless if your strings actually contain *no* ASCII characters
    (but all of them are printable).
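
A minimal sketch of the "printable characters are shown as-is" rule, assuming a current Python 3 release where str.isprintable() reports which characters repr() leaves unescaped. The helper name shown_literally is made up for this illustration, and ascii() is used for the report so that it prints on any terminal encoding:

```python
def shown_literally(ch):
    """Return True if repr() leaves the character as-is instead of escaping it."""
    return repr(ch) == "'" + ch + "'"


# U+5000 (CJK ideograph), U+00FF, U+00A0 (no-break space), U+007F (delete)
samples = ["\u5000", "\xff", "\xa0", "\x7f"]

for ch in samples:
    # ascii() keeps this report safe to print regardless of terminal encoding.
    print(ascii(ch), "printable:", ch.isprintable(),
          "literal in repr:", shown_literally(ch))
```

The first two samples are printable and appear literally in the repr; the no-break space and the delete character are not, so repr() escapes them, which matches the '\x7f' and '\xa0' escapes in the cp437 dump earlier in the thread.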

    Notice that repr() of the string actually succeeds; try
    >>> x='\u5000'
    >>> z=repr(x)
    It is the printing of the repr that fails.
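
The distinction can be reproduced without a cp437 console. This sketch assumes that writing to such a console amounts to encoding the text with the cp437 codec, which is what the traceback in the original post suggests:

```python
x = "\u5000"

# repr() itself succeeds and returns a 3-character str: quote, U+5000, quote.
z = repr(x)
assert len(z) == 3

# Encoding that str for a cp437 stream is the step that fails.
try:
    z.encode("cp437")
    encode_failed = False
except UnicodeEncodeError as exc:
    encode_failed = True
    # exc.reason is plain ASCII, so this print is safe on any terminal.
    print("encode failed:", exc.reason)
```

The interactive prompt prints the repr of the expression's value, so the error surfaces there even though repr() raised nothing.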
    > Maybe repr() should always display the ASCII representation with
    > escapes for all other characters
    You can use the ascii() builtin if you want that.
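
A short sketch of the ascii() route. The second half shows one possible alternative, not something proposed in this thread: encoding with the standard "backslashreplace" error handler, which escapes only the characters the target codec cannot represent:

```python
x = "\u5000"

# ascii() behaves like repr() but escapes every non-ASCII character.
safe = ascii(x)
print(safe)  # prints '\u5000' using only ASCII characters

# The escaped form still round-trips through eval().
assert eval(safe) == x

# Alternative: keep repr(), but escape whatever cp437 cannot encode.
escaped = repr(x).encode("cp437", errors="backslashreplace").decode("cp437")
print(escaped)
```

For doctests that must not depend on the tester's terminal, comparing against ascii(value) rather than the bare repr is one way to keep the expected output pure ASCII.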
    > especially considering the "repr() should produce output suitable for
    > eval() when possible" rule.
    But that is preserved under the new behavior, also! Just try

    py> x='\u5000'
    py> eval(repr(x)) == x
    True

    Regards,
    Martin

    P.S. How did you manage to get U+5000 into your data, on a system where
    the terminal encoding is cp437? Google translates it as "Rash"; the
    Unihan database also has "bewildered", "wildly".


    • Mark Tolonen

      #3
      Re: Python 3.0 and repr

      "Martin v. Löwis" <martin@v.loewis.de> wrote in message
      news:48dffb54$0$1082$9b622d9e@news.freenet.de...
      >> What are others' opinions? Any insight to this design decision?
      >
      > The intention is that all printable characters in a string get displayed
      > in repr. This was in particular requested by Japanese users (but also by
      > other users of non-ASCII characters) who complained that repr() is
      > fairly useless if your strings actually contain *no* ASCII characters
      > (but all of them are printable).
      >
      > Notice that repr() of the string actually succeeds; try
      >
      > >>> x='\u5000'
      > >>> z=repr(x)
      >
      > It is the printing of the repr that fails.
      >
      >> Maybe repr() should always display the ASCII representation with
      >> escapes for all other characters
      >
      > You can use the ascii() builtin if you want that.
      >
      >> especially considering the "repr() should produce output suitable for
      >> eval() when possible" rule.
      >
      > But that is preserved under the new behavior, also! Just try
      >
      > py> x='\u5000'
      > py> eval(repr(x)) == x
      > True
      >
      > Regards,
      > Martin

      Thanks Martin, it's clear now. I just read about the new ascii() function
      before seeing your reply.

      > P.S. How did you manage to get U+5000 into your data, on a system where
      > the terminal encoding is cp437? Google translates it as "Rash"; the
      > Unihan database also has "bewildered", "wildly".

      I just picked that example out of the air. I study Chinese and knew it was
      a character in that area of the Unicode map. My usual editors (PythonWin
      and PyAlaMode from wxPython) don't work with Python 3, which was why I was
      using the Windows cmd prompt.

      Thanks,
      Mark
