Problems with unicode

  • James Laamnna

    Problems with unicode

    I'm trying to write out an XML document using a StringIO object, but
    I always run into the following error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 4: ordinal not in range(128)

    Apparently, in the batch I'm encoding there is one string with
    non-ASCII characters in it.
    Is there any way to just have it encode everything as Unicode
    rather than ASCII?
    Or should I just strip out the non-ASCII characters? That is a last
    resort I would rather avoid.
    Thanks.
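The error is easy to reproduce. This is a minimal sketch assuming Python 3, where the bytes-to-text step is explicit; in Python 2 code like the original poster's, the same ASCII decode usually happened implicitly, when a byte string containing 0x92 was mixed with unicode strings. The sample bytes and the helper name `ascii_error` are hypothetical.

```python
import io

# Hypothetical sample: raw bytes as they might arrive from a file.
# Byte 0x92 sits at index 4, right after "John".
raw = b"John\x92s notes"


def ascii_error(data: bytes) -> str:
    """Decode data as ASCII and return the error message, if any."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


message = ascii_error(raw)
print(message)

# StringIO itself has no trouble with non-ASCII text: it stores str
# (Unicode) objects. Only the bytes -> str step needs the right codec.
buf = io.StringIO()
buf.write("<note>John\u2019s notes</note>")
```

The point of the second half is that the buffer is not the problem; the question is which codec turns the incoming bytes into text.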
  • David Opstad

    #2
    Re: Problems with unicode

    In article <a091da2f.0403131345.5e82b07e@posting.google.com>,
    jamesl@appliedminds.com (James Laamnna) wrote:
    > Apparently in the batch that I'm encoding there is one string with
    > non-ascii characters in it. Is there any way to just have it encode
    > everything as unicode and not ascii?

    A better question to ask is this: where did the supposed ASCII data come
    from in the first place? If, for instance, it came from a Windows
    machine, then there's a chance it's actually ISO-8859-1, in which case
    you can preserve the 0x92 by decoding with that codec instead of the
    'ascii' one. Similarly, if the original text came from a Mac, it's
    likely in Mac Roman, so if you use the 'mac-roman' codec you'll be able
    to preserve the correct character in the resulting Unicode.

    Dave
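Dave's suggestion amounts to decoding the bytes with the codec of the platform they came from before they reach the XML writer. A minimal Python 3 sketch, assuming the data arrives as bytes and the source encoding is known; the helper `bytes_to_xml`, the `<note>` element and the sample bytes are hypothetical, and `xml.etree.ElementTree` stands in for whatever XML library the original code used.

```python
import io
import xml.etree.ElementTree as ET


def bytes_to_xml(raw: bytes, source_encoding: str) -> str:
    """Decode raw with the codec of its source platform, then serialize
    a one-element XML document into an in-memory text buffer."""
    text = raw.decode(source_encoding)  # bytes -> str, no ASCII step
    root = ET.Element("note")
    root.text = text
    buf = io.StringIO()
    # encoding="unicode" makes ElementTree write str objects, which is
    # what a StringIO buffer expects (and it omits the XML declaration).
    ET.ElementTree(root).write(buf, encoding="unicode")
    return buf.getvalue()


raw = b"John\x92s notes"  # hypothetical sample bytes

# The same byte means different things under different codecs:
mac = bytes_to_xml(raw, "mac-roman")  # 0x92 -> U+00ED (i with acute)
latin = bytes_to_xml(raw, "latin-1")  # 0x92 -> U+0092 (a control code)
print(ascii(mac))
print(ascii(latin))
```

Neither call fails, which is exactly why the source platform matters: a wrong guess produces valid but wrong text instead of an error.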

    • Jarek Zgoda

      #3
      Re: Problems with unicode

      David Opstad <opstad@batnet.com> wrote:
      > If, for instance, it came from a Windows
      > machine, then there's a chance it's actually ISO-8859-1 encoding

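      If it came from Windows, it's actually CP-1252, not Latin-1. In CP-1252,
      byte 0x92 is the right single quotation mark (a curly apostrophe),
      while in Latin-1 it is an invisible C1 control character.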



      --
      Jarek Zgoda
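Jarek's correction is easy to check. Windows-1252 and Latin-1 agree on bytes 0x00-0x7F and 0xA0-0xFF but diverge in 0x80-0x9F, where Windows-1252 places printable characters such as curly quotes and Latin-1 has control codes. A minimal Python 3 sketch using only the standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata

BYTE = b"\x92"


def describe(encoding: str) -> str:
    """Decode the single byte 0x92 and describe the resulting character."""
    ch = BYTE.decode(encoding)
    # Control characters have no Unicode name, so fall back to the
    # general category ("Cc" for control codes).
    name = unicodedata.name(ch, unicodedata.category(ch))
    return f"U+{ord(ch):04X} {name}"


for enc in ("latin-1", "cp1252"):
    print(enc, describe(enc))
# latin-1 U+0092 Cc
# cp1252 U+2019 RIGHT SINGLE QUOTATION MARK
```

Only the cp1252 result is a character a person would actually type, which is what supports the Windows-1252 reading of the original data.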
