Unicode and exception strings

  • Rune Froysa

    Unicode and exception strings

    Assuming an exception like:

    x = ValueError(u'\xf8')

    AFAIK the common way to get a string representation of the exception
    as a message is simply to cast it to a string: str(x). This will
    result in a "UnicodeError: ASCII encoding error: ordinal not in
    range(128)".

    The common way to fix this is with something like
    u'\xf8'.encode("ascii", 'replace'). However, I can't find any way to
    tell ValueError's __str__ method which encoding to use.

    Is it possible to solve this without using sys.setdefaultencoding()
    from sitecustomize?

    Regards,
    Rune Frøysa
  • Terry Carroll

    #2
    Re: Unicode and exception strings

    On 09 Jan 2004 13:18:39 +0100, Rune Froysa <rune.froysa@usit.uio.no>
    wrote:

    >Assuming an exception like:
    >
    > x = ValueError(u'\xf8')
    >
    >AFAIK the common way to get a string representation of the exception
    >as a message is to simply cast it to a string: str(x). This will
    >result in an "UnicodeError: ASCII encoding error: ordinal not in
    >range(128)".
    >
    >The common way to fix this is with something like
    >u'\xf8'.encode("ascii", 'replace'). However I can't find any way to
    >tell ValueErrors __str__ method which encoding to use.

    Rune, I'm not understanding what your problem is.

    Is there any reason you're not using, for example, just repr(u'\xf8')?

    In one of my programs, which occasionally runs into a line that includes
    some UTF-8-encoded Chinese characters, I have something like this:

    try:
        _display_text = _display_text + "%s\n" % line
    except UnicodeDecodeError:
        try:
            # decode those UTF8 nasties
            _display_text = _display_text + "%s\n" % line.decode('utf-8'))
        except UnicodeDecodeError:
            # if that still doesn't work, punt
            # (I don't think we'll ever reach this, but just in case)
            _display_text = _display_text + "%s\n" % repr(line)

    I don't know if this will help you or not.
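A note on the repr() suggestion: in the Python 2 used in this thread, repr(u'\xf8') is pure ASCII. The thread predates Python 3, so the following is only a sketch for that later version, where repr() no longer escapes printable non-ASCII characters and the builtin ascii() takes over the ASCII-safe role:

```python
message = "\xf8"  # the character from the question, LATIN SMALL LETTER O WITH STROKE

# Python 3 repr() only escapes non-printable characters, so the result
# still contains the non-ASCII character itself.
as_repr = repr(message)

# ascii() escapes every character outside 7-bit ASCII.
as_ascii = ascii(message)

print(as_ascii)  # prints '\xf8' (quotes included), safe on an ASCII-only terminal
```

So code that relied on repr() for log-safe output would want ascii() under Python 3.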


    • Terry Carroll

      #3
      Re: Unicode and exception strings

      On Fri, 09 Jan 2004 19:44:21 GMT, Terry Carroll <carroll@tjc.com> wrote:

      >In one program I have that occasionally runs into a line that includes
      >some (UTF-8) Unicode-encoded Chinese characters, I have something like
      >this:

      Sorry, a stray parenthesis crept in here (this is a pared-down
      version of my actual code). It should read:


      try:
          _display_text = _display_text + "%s\n" % line
      except UnicodeDecodeError:
          try:
              # decode those UTF8 nasties
              _display_text = _display_text + "%s\n" % line.decode('utf-8')
          except UnicodeDecodeError:
              # if that still doesn't work, punt
              # (I don't think we'll ever reach this, but just in case)
              _display_text = _display_text + "%s\n" % repr(line)

      > I don't know if this will help you or not.
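The fallback order in the code above (use the line as-is, else decode it as UTF-8, else show its repr) can be sketched for Python 3 as follows. Because Python 3 never decodes bytes implicitly, the first step becomes a type check rather than a caught exception; the helper name `to_display_text` and the sample lines are hypothetical:

```python
def to_display_text(line):
    """Turn one input line (str or bytes) into text for display.

    Hypothetical helper mirroring the fallback order above: text passes
    through, bytes are decoded as UTF-8, and anything that is not valid
    UTF-8 falls back to its repr, which for bytes is always ASCII.
    """
    if isinstance(line, str):
        return line
    try:
        return line.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: show the escaped bytes instead of failing.
        return repr(line)


# Sample input: plain ASCII, UTF-8-encoded Chinese, and invalid UTF-8.
lines = [b"plain ascii", "\u4e2d\u6587".encode("utf-8"), b"\xff\xfe"]
display_text = "".join("%s\n" % to_display_text(line) for line in lines)
```

Building the result with join() rather than repeated `+` also avoids re-copying the accumulated text on every line.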


      • Rune Froysa

        #4
        Re: Unicode and exception strings

        Terry Carroll <carroll@tjc.com> writes:

        > On 09 Jan 2004 13:18:39 +0100, Rune Froysa <rune.froysa@usit.uio.no>
        > wrote:
        >
        > >Assuming an exception like:
        > >
        > > x = ValueError(u'\xf8')
        > >
        > >AFAIK the common way to get a string representation of the exception
        > >as a message is to simply cast it to a string: str(x). This will
        > >result in an "UnicodeError: ASCII encoding error: ordinal not in
        > >range(128)".
        > >
        > >The common way to fix this is with something like
        > >u'\xf8'.encode("ascii", 'replace'). However I can't find any way to
        > >tell ValueErrors __str__ method which encoding to use.
        >
        > Rune, I'm not understanding what your problem is.
        >
        > Is there any reason you're not using, for example, just repr(u'\xf8')?

        The problem is that I have little control over the message string that
        is passed to ValueError(). All my program knows is that it has caught
        one such error and that its message string is a unicode object. I
        need to access the message string (for logging, etc.).

        > _display_text = _display_text + "%s\n" % line.decode('utf-8'))

        This does not work, as I'm unable to get at the 'line', which is
        stored inside the ValueError instance (and turned into a string by
        its __str__ method).

        Regards,
        Rune Frøysa
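Although __str__ cannot be told which encoding to use, the message passed to the constructor is kept in the exception's args tuple, so it can be encoded explicitly instead of going through the default encoding. In the Python 2 of this thread that is simply x.args[0].encode('ascii', 'replace'). A minimal sketch of the same idea for Python 3 follows; the helper name `exception_message` and its defaults are hypothetical:

```python
def exception_message(exc, encoding="ascii", errors="replace"):
    """Return an exception's message as bytes in an explicit encoding.

    Hypothetical helper: it reads the first constructor argument from
    exc.args instead of calling str(exc), so the caller chooses the
    codec and the error handler.
    """
    if not exc.args:
        return b""
    message = exc.args[0]
    if isinstance(message, bytes):
        return message  # already encoded; pass it through unchanged
    return str(message).encode(encoding, errors)


try:
    raise ValueError("\xf8")
except ValueError as exc:
    logged = exception_message(exc)  # b'?' with the default 'replace' handler
```

Passing `encoding="utf-8"` or `errors="xmlcharrefreplace"` keeps the information that "replace" throws away.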


        • Terry Carroll

          #5
          Re: Unicode and exception strings

          On Wed, 14 Jan 2004 01:32:36 GMT, Terry Carroll <carroll@tjc.com> wrote:

          >You can try to extract it as above, and then decode it with the codecs
          >module, but if it's only the first byte, it won't decode correctly:
          >
          >>>> import codecs
          >>>> d = codecs.getdecoder('utf-8')
          >>>> x.args[0]
          >u'\xf8'
          >>>> d.decode(x.args[0])
          >Traceback (most recent call last):
          >  File "<stdin>", line 1, in ?
          >AttributeError: 'builtin_function_or_method' object has no attribute
          >'decode'
          >>>>

          Oops. Copy-and-pasted the wrong line here. Let's try that again:

          >>> x = ValueError(u'\xf8')
          >>> import codecs
          >>> d = codecs.getdecoder('utf-8')
          >>> d(x.args[0])
          Traceback (most recent call last):
            File "<stdin>", line 1, in ?
          UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in
          position 0: ordinal not in range(128)
          >>>

          *That's* the exception I was trying to show, not the AttributeError you
          get when you use the decoder wrongly!
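In Python 2, handing a unicode object to a decoder makes Python first encode it with the default ASCII codec, which is where the UnicodeEncodeError above comes from: the message is already decoded text. A minimal sketch of the distinction, written for Python 3 (an assumption, since the thread predates it), where decoders accept only bytes:

```python
import codecs

decode_utf8 = codecs.getdecoder("utf-8")

# A decoder goes from bytes to text and returns (text, bytes_consumed).
text, consumed = decode_utf8(b"\xc3\xb8")  # ("\xf8", 2)

x = ValueError("\xf8")
message = x.args[0]  # already text: there is nothing left to decode

# Going from text to bytes is encoding, and that is the step where the
# codec and the error handler are chosen explicitly.
utf8_bytes = message.encode("utf-8")              # b'\xc3\xb8'
ascii_bytes = message.encode("ascii", "replace")  # b'?'

# Handing text to the decoder is the wrong direction; Python 3 raises
# TypeError here instead of attempting an implicit ASCII encode.
try:
    decode_utf8(message)
    rejected = False
except TypeError:
    rejected = True
```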
