Interpreting string containing \u000a

**Duncan Booth** · Jun 27 '08, 04:29 PM

Re: Interpreting string containing \u000a

"Francis Girard" <francis.girard07@gmail.com> wrote:

> I have an ISO-8859-1 file containing things like
> "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
> character 'u' and then '0', etc.
>
> What is the easiest way to automatically translate these codes into
> Unicode characters?

>>> s = r"Hello\u000d\u000aWorld"
>>> print s
Hello\u000d\u000aWorld
>>> s.decode('iso-8859-1').decode('unicode-escape')
u'Hello\r\nWorld'
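The session above is Python 2, where byte strings have a `.decode()` method. For readers on Python 3, here is a minimal sketch of the same idea, assuming the data is available as `bytes`. The helper name `decode_escaped` is made up for illustration. It relies on the `unicode_escape` codec treating any non-escape bytes as Latin-1 (ISO-8859-1), so one decode call stands in for the two chained calls.

```python
def decode_escaped(raw: bytes) -> str:
    """Turn literal backslash escapes in ISO-8859-1 bytes into characters.

    Hypothetical helper, not part of the original thread.
    """
    # unicode_escape decodes non-escape bytes as Latin-1, which matches
    # the ISO-8859-1 file described in the question.
    return raw.decode("unicode_escape")


# A raw bytes literal keeps the backslashes as literal characters.
raw = rb"Hello\u000d\u000aWorld"
text = decode_escaped(raw)
print(repr(text))  # 'Hello\r\nWorld'
```

Note that this interprets every backslash sequence the codec knows, not only `\uXXXX`, so a file with literal backslashes meant as plain text would need different handling.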

--
Duncan Booth http://kupuguy.blogspot.com

**Peter Otten** · Jun 27 '08, 04:29 PM

Re: Interpreting string containing \u000a

Francis Girard wrote:

> I have an ISO-8859-1 file containing things like
> "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
> character 'u' and then '0', etc.
>
> What is the easiest way to automatically translate these codes into
> Unicode characters?

If the file really contains the escape sequences, use "unicode-escape" as the
encoding:

>>> "Hello\\u000d\\u000aWorld".decode("unicode-escape")
u'Hello\r\nWorld'

If it contains the raw bytes, use "iso-8859-1":

>>> "Hello\x0d\x0aWorld".decode("iso-8859-1")
u'Hello\r\nWorld'

Open the file with

codecs.open(filename, encoding=encoding_as_determined_above)

instead of the builtin open().
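For Python 3 readers: the built-in open() accepts an encoding argument itself, so the same choice between the two encodings can be sketched as below. This is only an illustration under stated assumptions. The helper `read_text` and the temporary demo files are invented for the example, and `newline=""` is passed so that a `\r\n` produced by the escapes is not folded into `\n` by text mode.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a whole file as text with the given encoding (hypothetical helper)."""
    # newline="" disables newline translation, so \r\n survives as-is.
    with open(path, encoding=encoding, newline="") as f:
        return f.read()


def write_demo_file(data):
    """Write bytes to a temporary file and return its path (demo only)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        return tmp.name


# Case 1: the file holds literal backslash-u sequences.
escaped_path = write_demo_file(rb"Hello\u000d\u000aWorld")
escaped_text = read_text(escaped_path, "unicode_escape")
os.remove(escaped_path)

# Case 2: the file holds the raw CR and LF bytes.
raw_path = write_demo_file(b"Hello\x0d\x0aWorld")
raw_text = read_text(raw_path, "iso-8859-1")
os.remove(raw_path)

print(repr(escaped_text), repr(raw_text))
```

Both calls should produce the same string, which mirrors the two interpreter examples above.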

Peter
