locale.CODESET / different in python shell and scripts

**Martin v. Löwis** · Jul 18 '05, 10:34 AM

Re: locale.CODESET / different in python shell and scripts

Nuff Said wrote:
> When I type the following code in the interactive python shell,
> I get 'UTF-8'; but if I put the code into a Python script and
> run the script - in the same terminal on my Linux box in which
> I opened the python shell before -, I get 'ANSI_X3.4-1968'.
>
> How does that come?

Because, for some reason, locale.setlocale() is called in your
interactive startup, but not in the normal startup.

It is uncertain why this happens - setlocale is not normally
called automatically; not even in interactive mode. Perhaps
you have created your own startup file?

Regards,
Martin

**Michael Hudson** · Jul 18 '05, 10:35 AM

Re: locale.CODESET / different in python shell and scripts

"Martin v. Löwis" <martin@v.loewis.de> writes:

> Nuff Said wrote:
> > When I type the following code in the interactive python shell,
> > I get 'UTF-8'; but if I put the code into a Python script and
> > run the script - in the same terminal on my Linux box in which
> > I opened the python shell before -, I get 'ANSI_X3.4-1968'.
> > How does that come?
>
> Because, for some reason, locale.setlocale() is called in your
> interactive startup, but not in the normal startup.
>
> It is uncertain why this happens - setlocale is not normally
> called automatically; not even in interactive mode. Perhaps
> you have created your own startup file?

readline calls setlocale() iirc.

Cheers,
mwh

--
Not only does the English Language borrow words from other
languages, it sometimes chases them down dark alleys, hits
them over the head, and goes through their pockets. -- Eddy Peters

**Martin v. Löwis** · Jul 18 '05, 10:36 AM

Re: locale.CODESET / different in python shell and scripts

Michael Hudson wrote:
>>It is uncertain why this happens - setlocale is not normally
>>called automatically; not even in interactive mode. Perhaps
>>you have created your own startup file?
>
>
> readline calls setlocale() iirc.

Sure. However, we restore the locale to what it was before
readline initialization messes with the locale.

Regards,
Martin

**Nuff Said** · Jul 18 '05, 10:36 AM

Re: locale.CODESET / different in python shell and scripts

On Tue, 27 Apr 2004 22:29:59 +0200, Martin v. Löwis wrote:
> Because, for some reason, locale.setlocale() is called in your
> interactive startup, but not in the normal startup.
>
> It is uncertain why this happens - setlocale is not normally
> called automatically; not even in interactive mode. Perhaps
> you have created your own startup file?

I use two Python versions on my Linux box (Fedora Core 1):
the Python 2.2 which came with Fedora and a Python 2.3 which
I compiled myself. (I didn't tinker with the last one;
Fedora's Python is a (well known) mess.)

Both Python versions give me 'ANSI_X3.4-1968' when I run a script
with 'print locale.nl_langinfo(locale.CODESET)'.
When I execute the same command in an interactive Python shell,
I get the (correct) 'UTF-8'.

(By 'correct', I mean that the bash command 'locale' gives me
'LANG=en_US.UTF-8, LC_CTYPE="en_US.UTF-8", ...'. This seems to
be correct, because e.g. the 'less ...' command shows files which
are UTF-8 encoded in the correct way; files which are e.g.
'ISO-8859-1' encoded are not shown in the correct way.)

Things are getting even worse:

I write a Python script which uses Unicode strings; now I want
to 'print ...' one of those strings (containing non-ASCII characters;
e.g. German umlauts).
With Fedora's Python 2.2 I have to use 'print s.encode('ISO-8859-1')'
or something similar.
With my self-compiled Python 2.3, I have to use (the expected)
'print s.encode('UTF-8')' (though it shows me 'ANSI_X3.4-1968' when
using 'print locale.nl_langinfo(locale.CODESET)' in the same file).

???

Any ideas what's going wrong here?

(I tried 'python -S ...'; doesn't make a difference.)

**Martin v. Löwis** · Jul 18 '05, 10:38 AM

Re: locale.CODESET / different in python shell and scripts

Nuff Said wrote:
> Both Python versions give me 'ANSI_X3.4-1968' when I run a script
> with 'print locale.nl_langinfo(locale.CODESET)'.
> When I execute the same command in an interactive Python shell,
> I get the (correct) 'UTF-8'.

PLEASE invoke

locale.setlocale(locale.LC_ALL, "")

before invoking nl_langinfo. Different C libraries behave differently
in their nl_langinfo responses if setlocale hasn't been called.
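
A minimal sketch of this advice, written in Python 3 syntax (the scripts in this thread are Python 2). The helper name `current_codeset` is made up for illustration, and ignoring `locale.Error` is an assumption about how one might want to handle an environment that names a locale which is not installed:

```python
import locale


def current_codeset():
    """Return the codeset the C library reports for the user's locale."""
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, LC_CTYPE, ...).
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # keep whatever locale is currently active.
        pass
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(current_codeset())
```

Note that `locale.nl_langinfo` is only available on Unix-like platforms.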

Regards,
Martin

**Nuff Said** · Jul 18 '05, 10:38 AM

Re: locale.CODESET / different in python shell and scripts

On Thu, 29 Apr 2004 22:14:23 +0200, Martin v. Löwis wrote:
> PLEASE invoke
>
> locale.setlocale(locale.LC_ALL, "")
>
> before invoking nl_langinfo. Different C libraries behave differently
> in their nl_langinfo responses if setlocale hasn't been called.

Thanks a lot for your help!

That solved (part of) the problem; now I get 'UTF-8' (which is correct)
when running the following script (with either my self-compiled Python
2.3 or Fedora's Python 2.2):

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import locale

locale.setlocale(locale.LC_ALL, "")
encoding = locale.nl_langinfo(locale.CODESET)
print encoding

Still, one problem remains:

When I add the following line to the above script

print u"schönes Mädchen".encode(encoding)

the result is:

schönes Mädchen (with my self-compiled Python 2.3)
schÃ¶nes MÃ¤dchen (with Fedora's Python 2.2)

I observed that my Python gives me (the correct value) 15 for
len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
for each German umlaut, i.e. the len of the UTF-8 representation of
the string; note that the file uses the coding cookie for UTF-8).
Maybe Fedora's Python was compiled without Unicode support?

(Is that even possible? I recall something about a UCS2 resp.
UCS4 switch when compiling Python; but without Unicode support?
And if it would be possible, shouldn't a Python without Unicode
support disallow strings of the form u"..." resp. show a warning???)

This really drives me nuts because I thought the above approach
should be the correct way to ensure that Python scripts can print
non-ASCII characters on any terminal (which is able to display
those characters in some encoding such as UTF-8, ISO-8859-x, ...).
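
The approach described above can be sketched as follows, in Python 3 syntax rather than the Python 2 of the scripts in this thread. The helper names `terminal_encoding` and `encode_for_terminal` are hypothetical, and substituting `?` for characters the terminal encoding cannot represent (`errors="replace"`) is an assumption, not something the thread prescribes:

```python
import locale
import sys


def terminal_encoding():
    """Ask the C library for the locale's codeset after adopting the user's locale."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # locale not installed; keep the current one
    return locale.nl_langinfo(locale.CODESET)


def encode_for_terminal(text, encoding, errors="replace"):
    """Encode text for output; unencodable characters become '?' by default."""
    return text.encode(encoding, errors)


if __name__ == "__main__":
    # "\xf6" is o-umlaut and "\xe4" is a-umlaut, written as escapes so that
    # the example does not depend on the encoding of this source file.
    data = encode_for_terminal("sch\xf6nes M\xe4dchen\n", terminal_encoding())
    sys.stdout.buffer.write(data)
```

The same 15-character string becomes 17 bytes in UTF-8 and 15 bytes in ISO-8859-1, which is why the encoding has to match what the terminal expects.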

Is there something I do utterly wrong here?
Python can't be that complicated?

Nuff.

**Martin v. Löwis** · Jul 18 '05, 10:38 AM

Re: locale.CODESET / different in python shell and scripts

Nuff Said wrote:
> When I add the following line to the above script
>
> print u"schönes Mädchen".encode(encoding)
>
> the result is:
>
> schönes Mädchen (with my self-compiled Python 2.3)
> schÃ¶nes MÃ¤dchen (with Fedora's Python 2.2)
>
> I observed that my Python gives me (the correct value) 15 for
> len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
> for each German umlaut, i.e. the len of the UTF-8 representation of
> the string; note that the file uses the coding cookie for UTF-8).
> Maybe Fedora's Python was compiled without Unicode support?

Certainly not: It would not support u"" literals without Unicode.

Please understand that you cannot use non-ASCII characters in source
code unless you also use the facilities described in

http://www.python.org/peps/pep-0263.html

So instead of "ö", you should write "\xf6".
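
A short illustration of the escape form, in Python 3 syntax, where every string literal is already a Unicode string (in the Python 2 of this thread the literal would be written u"sch\xf6nes M\xe4dchen"). The variable name `escaped` is only for illustration:

```python
import unicodedata

# Pure-ASCII source text: the umlauts are spelled as code point escapes,
# so the result does not depend on how the source file is encoded.
escaped = "sch\xf6nes M\xe4dchen"

for ch in escaped:
    if ord(ch) > 127:
        print(hex(ord(ch)), unicodedata.name(ch))
# 0xf6 LATIN SMALL LETTER O WITH DIAERESIS
# 0xe4 LATIN SMALL LETTER A WITH DIAERESIS
```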

> Is there something I do utterly wrong here?

Yes, you are.

> Python can't be that complicated?

Python is not. Encodings are.

Regards,
Martin

**Nuff Said** · Jul 18 '05, 10:39 AM

Re: locale.CODESET / different in python shell and scripts

On Fri, 30 Apr 2004 04:30:34 +0200, Martin v. Löwis wrote:
> Nuff Said wrote:
>> When I add the following line to the above script
>>
>> print u"schönes Mädchen".encode(encoding)
>>
>> the result is:
>>
>> schönes Mädchen (with my self-compiled Python 2.3)
>> schÃ¶nes MÃ¤dchen (with Fedora's Python 2.2)
>>
>> I observed that my Python gives me (the correct value) 15 for
>> len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
>> for each German umlaut, i.e. the len of the UTF-8 representation of
>> the string; note that the file uses the coding cookie for UTF-8).
>> Maybe Fedora's Python was compiled without Unicode support?
>
> Certainly not: It would not support u"" literals without Unicode.

That's what I thought.

> Please understand that you cannot use non-ASCII characters in source
> code unless you also use the facilities described in
>
> http://www.python.org/peps/pep-0263.html
>
> So instead of "ö", you should write "\xf6".

But *I do use* the line

# -*- coding: UTF-8 -*-

from your PEP (directly after the shebang-line; s. the full source
code in my earlier posting). I thought, that allows me to write u"ö"
(which - as described above - works in one of my two Pythons).

??? Nuff.

**Nuff Said** · Jul 18 '05, 10:39 AM

Re: locale.CODESET / different in python shell and scripts

On Fri, 30 Apr 2004 11:56:19 +0200, Nuff Said wrote:
> But *I do use* the line
>
> # -*- coding: UTF-8 -*-
>
> from your PEP (directly after the shebang-line; s. the full source
> code in my earlier posting). I thought, that allows me to write u"ö"
> (which - as described above - works in one of my two Pythons).

Follow up to myself:

Arrgh!!! Think I got it now. Your PEP 263: 'Source Code Encodings' was
incorporated into Python 2.3 (i.e. my self-compiled Python) but not
into Python 2.2 (Fedora's Python).
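
The length of 17 and the garbled output can be reproduced with a small sketch, in Python 3 syntax. It assumes that an interpreter without PEP 263 support takes each byte of the UTF-8 source file as one Latin-1 character, which is consistent with the numbers reported earlier in the thread; the variable names are only for illustration:

```python
text = "sch\xf6nes M\xe4dchen"        # the intended 15-character string
raw = text.encode("utf-8")            # the bytes as stored in the UTF-8 source file
misread = raw.decode("iso-8859-1")    # each byte taken as one Latin-1 character

print(len(text), len(raw), len(misread))
# 15 17 17

# Printed to a UTF-8 terminal, these bytes are what shows up as the garbled
# two-character sequences in place of each umlaut.
double_encoded = misread.encode("utf-8")
print(len(double_encoded))
# 21
```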

Thanks for your help!
