Very strange unicode behaviour

  • Syver Enstad

    Very strange unicode behaviour


    Here's the interactive session

    Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> ord('\xe5')
    229
    >>> '\xe5'.find(u'')
    -1
    >>> 'p\xe5'.find(u' ')
    UnicodeError: ASCII decoding error: ordinal not in range(128)
    >>> 'p\xe4'.find(u' ')
    -1
    >>> 'p\xe5'.find(u' ')
    UnicodeError: ASCII decoding error: ordinal not in range(128)
    >>> print '\xe5'
    Õ
    >>> print 'p\xe5'
    pÕ
    >>> 'p\xe5'
    'p\xe5'
    >>> def func():
    ...     try:
    ...         '\xe5'.find(u'')
    ...     except UnicodeError:
    ...         pass
    ...
    >>> func()
    >>> for each in range(1):
    ...     func()
    ...
    UnicodeError: ASCII decoding error: ordinal not in range(128)
    >>>

    It's weird that \xe5 throws and not \xe4 but even weirder that the
    exception is not cleared so that the loop reports it.

    Is this behaviour the same on Python 2.3?


  • Thomas Heller

    #2
    Re: Very strange unicode behaviour

    Syver Enstad <syver@inout.no> writes:

    > Here's the interactive session
    >
    > Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> ord('\xe5')
    > 229
    > >>> '\xe5'.find(u'')
    > -1
    > >>> 'p\xe5'.find(u' ')
    > UnicodeError: ASCII decoding error: ordinal not in range(128)
    > >>> 'p\xe4'.find(u' ')
    > -1
    > >>> 'p\xe5'.find(u' ')
    > UnicodeError: ASCII decoding error: ordinal not in range(128)
    > >>> print '\xe5'
    > Õ
    > >>> print 'p\xe5'
    > pÕ
    > >>> 'p\xe5'
    > 'p\xe5'
    > >>> def func():
    > ...     try:
    > ...         '\xe5'.find(u'')
    > ...     except UnicodeError:
    > ...         pass
    > ...
    > >>> func()
    > >>> for each in range(1):
    > ...     func()
    > ...
    > UnicodeError: ASCII decoding error: ordinal not in range(128)
    > >>>
    >
    > It's weird that \xe5 throws and not \xe4 but even weirder that the
    > exception is not cleared so that the loop reports it.
    >
    > Is this behaviour the same on Python 2.3?

    No, it behaves correctly as it seems:

    Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> ord('\xe5')
    229
    >>> '\xe5'.find(u'')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
    >>> '\xe4'.find(u'')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
    >>>

    Thomas


    • Syver Enstad

      #3
      Re: Very strange unicode behaviour


      I seem to have isolated the behaviour:

      >>> chr(127).find(u'')
      0
      >>>
      >>> chr(128).find(u'')
      -1
      >>>
      UnicodeError: ASCII decoding error: ordinal not in range(128)
      >>>

      Observe that the exception is only raised after executing a blank
      line that follows a call to find on a non-ASCII string.

      I'll test this on 2.3 when I get home from work.
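
The 127/128 split in the session above matches the upper bound of the ASCII range. As a rough illustration only, here is a sketch in current Python 3 syntax (not the Python 2.2 interpreter used in this thread); the helper name `ascii_decodes` is made up for this example. Python 3 does not decode byte strings implicitly, so the decode is spelled out explicitly:

```python
def ascii_decodes(byte_value):
    """Return True if the single byte decodes as ASCII, else False."""
    try:
        bytes([byte_value]).decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


# 127 is the last ASCII code point; 128 and above cannot be decoded.
print(ascii_decodes(127))
print(ascii_decodes(128))
# Both bytes from the original report fall outside the ASCII range.
print(ascii_decodes(0xE4), ascii_decodes(0xE5))
```

By this measure \xe4 and \xe5 should both fail to decode, which is consistent with the Python 2.3.3 session earlier in the thread where both raise.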


      • Syver Enstad

        #4
        Re: Very strange unicode behaviour

        Thomas Heller <theller@python.net> writes:

        > > Is this behaviour the same on Python 2.3?
        >
        > No, it behaves correctly as it seems:

        That's a relief. See my later post to see why the first report I gave
        was *very* confusing.



        • Thomas Heller

          #5
          Re: Very strange unicode behaviour

          Syver Enstad <syver@inout.no> writes:

          > Thomas Heller <theller@python.net> writes:
          >
          >> > Is this behaviour the same on Python 2.3?
          >>
          >> No, it behaves correctly as it seems:
          >
          > That's a relief. See my later post to see why the first report I gave
          > was *very* confusing.

          It looks like an ignored exception somewhere without calling
          PyErr_Clear().

          Thomas
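
The pattern suspected above can be imitated with a toy model. This is only a sketch of the general idea (a failure is recorded in an error indicator, the caller returns a normal-looking value without clearing it, and the stale error surfaces at the next statement). `ToyInterpreter` and all of its methods are hypothetical names invented for this illustration; they are not CPython internals, and the thread does not confirm where the real bug was.

```python
class ToyInterpreter:
    """Toy stand-in for an interpreter with a C-style error indicator."""

    def __init__(self):
        self.pending_error = None  # imitates the "current exception" slot

    def decode_ascii(self, data):
        """Return decoded text, or None after setting the error indicator."""
        try:
            return data.decode("ascii")
        except UnicodeDecodeError as exc:
            self.pending_error = exc
            return None

    def find_buggy(self, data, needle):
        """Treats a failed decode as 'not found' but never clears the error."""
        text = self.decode_ascii(data)
        if text is None:
            return -1  # bug: pending_error is still set
        return text.find(needle)

    def find_fixed(self, data, needle):
        """Propagates the failure to the caller instead of hiding it."""
        text = self.decode_ascii(data)
        if text is None:
            exc, self.pending_error = self.pending_error, None
            raise exc
        return text.find(needle)

    def run_statement(self, func):
        """Run one 'statement'; a stale error from earlier surfaces first."""
        if self.pending_error is not None:
            exc, self.pending_error = self.pending_error, None
            raise exc
        return func()


interp = ToyInterpreter()
print(interp.run_statement(lambda: interp.find_buggy(b"\x7f", "")))
print(interp.run_statement(lambda: interp.find_buggy(b"\x80", "")))
try:
    interp.run_statement(lambda: None)  # an "empty" statement
except UnicodeDecodeError as exc:
    print("late error:", type(exc).__name__)
```

In this model the second call returns -1 and the error only appears on the following empty statement, which resembles the 2.2 sessions in this thread; `find_fixed` raises immediately, which resembles the 2.3 behaviour.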


          • Erik Max Francis

            #6
            Re: Very strange unicode behaviour

            Syver Enstad wrote:

            > I seem to have isolated the behaviour:
            >
            > >>> chr(127).find(u'')
            > 0
            > >>>
            > >>> chr(128).find(u'')
            > -1
            > >>>
            > UnicodeError: ASCII decoding error: ordinal not in range(128)
            > >>>
            >
            > Observe that the exception is only raised after executing a blank
            > line that follows a call to find on a non-ASCII string.
            >
            > I'll test this on 2.3 when I get home from work.

            I can indeed reproduce this bizarre behavior on 2.2.3. The problem does
            not occur in 2.3.3:

            max@oxygen:~% python
            Python 2.3.3 (#1, Dec 22 2003, 23:44:26)
            [GCC 3.2.3] on linux2
            Type "help", "copyright", "credits" or "license" for more information.
            >>> chr(127).find(u'')
            0
            >>> chr(128).find(u'')
            Traceback (most recent call last):
              File "<stdin>", line 1, in ?
            UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
            >>>
            max@oxygen:~% python2.2
            Python 2.2.3 (#1, Jan 8 2004, 22:40:34)
            [GCC 3.2.3] on linux2
            Type "help", "copyright", "credits" or "license" for more information.
            >>> chr(127).find(u'')
            0
            >>> chr(128).find(u'')
            -1
            >>>
            UnicodeError: ASCII decoding error: ordinal not in range(128)


            --
            __ Erik Max Francis && max@alcyone.com && http://www.alcyone.com/max/
            / \ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
            \__/ The doors of Heaven and Hell are adjacent and identical.
            -- Nikos Kazantzakis
