Interpreting string containing \u000a

  • Francis Girard

    Interpreting string containing \u000a

    Hi,

    I have an ISO-8859-1 file containing things like
    "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
    character 'u' and then '0', etc.

    What is the easiest way to automatically translate these codes into
    unicode characters ?

    Thank you

    Francis Girard
  • Duncan Booth

    #2
    Re: Interpreting string containing \u000a

    "Francis Girard" <francis.girard07@gmail.com> wrote:
    > I have an ISO-8859-1 file containing things like
    > "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
    > character 'u' and then '0', etc.
    >
    > What is the easiest way to automatically translate these codes into
    > unicode characters ?

    >>> s = r"Hello\u000d\u000aWorld"
    >>> print s
    Hello\u000d\u000aWorld
    >>> s.decode('iso-8859-1').decode('unicode-escape')
    u'Hello\r\nWorld'
    >>>
    --
    Duncan Booth http://kupuguy.blogspot.com
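
The transcript above is Python 2. As a minimal sketch of how the same idea might look in Python 3, assuming the file contents have already been read as bytes: the "unicode_escape" codec decodes bytes directly, treating anything outside an escape sequence as Latin-1, so a single decode step should be enough. The helper name decode_escaped is hypothetical and only used for illustration.

```python
def decode_escaped(data: bytes) -> str:
    # Hypothetical helper, not part of the thread.
    # "unicode_escape" interprets backslash escapes and treats every
    # other byte as Latin-1 (ISO-8859-1), so accented bytes survive.
    return data.decode("unicode_escape")


# Literal backslash, 'u', and four hex digits, as described in the question.
sample = b"Hello\\u000d\\u000aWorld"
decoded = decode_escaped(sample)
print(repr(decoded))
```

Note that this codec also interprets other backslash escapes such as \n, \t and \x41, so it is only a good fit if every backslash in the file is meant to start an escape.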


    • Peter Otten

      #3
      Re: Interpreting string containing \u000a

      Francis Girard wrote:
      > I have an ISO-8859-1 file containing things like
      > "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
      > character 'u' and then '0', etc.
      >
      > What is the easiest way to automatically translate these codes into
      > unicode characters ?

      If the file really contains the escape sequences use "unicode-escape" as the
      encoding:
      >>> "Hello\\u000d\\u000aWorld".decode("unicode-escape")
      u'Hello\r\nWorld'

      If it contains the raw bytes use "iso-8859-1":
      >>> "Hello\x0d\x0aWorld".decode("iso-8859-1")
      u'Hello\r\nWorld'
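
The difference between the two cases is easiest to see at the byte level. The following is a small sketch in Python 3 syntax (an assumption; the transcripts above are Python 2), with variable names chosen only for illustration. An escaped line break takes six bytes per character in the file, while a raw one takes a single byte, yet both decode to the same text once the matching codec is chosen.

```python
# Case 1: the file holds a literal backslash, 'u', and four hex digits.
escaped = b"Hello\\u000d\\u000aWorld"

# Case 2: the file holds the actual carriage return and line feed bytes.
raw = b"Hello\x0d\x0aWorld"

print(len(escaped), len(raw))

# Each case needs its own codec to arrive at the same text.
text_from_escaped = escaped.decode("unicode_escape")
text_from_raw = raw.decode("iso-8859-1")

print(text_from_escaped == text_from_raw)
```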

      Open the file with

      codecs.open(filename, encoding=encoding_as_determined_above)

      instead of the builtin open().

      Peter
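
For readers on Python 3, one possible way to apply this advice to a whole file is sketched below. It reads the file in binary mode and decodes it in one step rather than passing an encoding to the file opener, which avoids any question of an escape sequence being split across read chunks. It assumes the file fits in memory and really contains escape sequences. The helper name read_escaped_file and the temporary demo file are hypothetical.

```python
import os
import tempfile


def read_escaped_file(path):
    # Hypothetical helper: read the raw bytes, then decode Latin-1 text
    # with backslash escapes in a single step.
    with open(path, "rb") as handle:
        data = handle.read()
    return data.decode("unicode_escape")


# Demo with a throwaway file that holds the bytes from the question.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"Hello\\u000d\\u000aWorld")
    demo_path = tmp.name

try:
    demo_text = read_escaped_file(demo_path)
finally:
    os.remove(demo_path)

print(repr(demo_text))
```

If the file turns out to contain raw bytes instead, replacing "unicode_escape" with "iso-8859-1" in the helper would cover the second case.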
