locale.CODESET / different in python shell and scripts

  • Nuff Said

    When I type the following code in the interactive Python shell,
    I get 'UTF-8'; but if I put the code into a Python script and
    run the script (in the same terminal on my Linux box in which
    I opened the Python shell before), I get 'ANSI_X3.4-1968'.

    Why is that?

    Thanks in advance for your answers! Nuff.


    The Code:

    import locale
    print locale.nl_langinfo(locale.CODESET)

  • Martin v. Löwis

    #2
    Re: locale.CODESET / different in python shell and scripts

    Nuff Said wrote:
    > When I type the following code in the interactive Python shell,
    > I get 'UTF-8'; but if I put the code into a Python script and
    > run the script (in the same terminal on my Linux box in which
    > I opened the Python shell before), I get 'ANSI_X3.4-1968'.
    >
    > Why is that?

    Because, for some reason, locale.setlocale() is called in your
    interactive startup, but not in the normal startup.

    It is uncertain why this happens - setlocale is not normally
    called automatically; not even in interactive mode. Perhaps
    you have created your own startup file?

    Regards,
    Martin


    • Michael Hudson

      #3
      Re: locale.CODESET / different in python shell and scripts

      "Martin v. Löwis" <martin@v.loewis.de> writes:

      > Nuff Said wrote:
      > > When I type the following code in the interactive Python shell,
      > > I get 'UTF-8'; but if I put the code into a Python script and
      > > run the script (in the same terminal on my Linux box in which
      > > I opened the Python shell before), I get 'ANSI_X3.4-1968'.
      > > Why is that?
      >
      > Because, for some reason, locale.setlocale() is called in your
      > interactive startup, but not in the normal startup.
      >
      > It is uncertain why this happens - setlocale is not normally
      > called automatically; not even in interactive mode. Perhaps
      > you have created your own startup file?

      readline calls setlocale() iirc.

      Cheers,
      mwh

      --
      Not only does the English Language borrow words from other
      languages, it sometimes chases them down dark alleys, hits
      them over the head, and goes through their pockets. -- Eddy Peters


      • Martin v. Löwis

        #4
        Re: locale.CODESET / different in python shell and scripts

        Michael Hudson wrote:
        >> It is uncertain why this happens - setlocale is not normally
        >> called automatically; not even in interactive mode. Perhaps
        >> you have created your own startup file?
        >
        > readline calls setlocale() iirc.

        Sure. However, we restore the locale to what it was before
        readline initialization messes with the locale.
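The save-and-restore idea described here can be sketched at the Python level. This is only an illustration: the interpreter does its own restoring internally, and `run_with_ctype_restored` and `library_init_that_changes_locale` are hypothetical names made up for this example. The sketch uses Python 3 syntax rather than the Python 2 `print` statement seen elsewhere in the thread.

```python
import locale


def run_with_ctype_restored(action):
    """Call action() and then put LC_CTYPE back the way it was."""
    # Calling setlocale() without a second argument only queries the setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        return action()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def library_init_that_changes_locale():
    """Hypothetical stand-in for a library that switches to the user's locale."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale this system does not provide.
        pass
    return locale.setlocale(locale.LC_CTYPE)


before = locale.setlocale(locale.LC_CTYPE)
seen_inside = run_with_ctype_restored(library_init_that_changes_locale)
after = locale.setlocale(locale.LC_CTYPE)
print(before, seen_inside, after)
```

Whatever the helper changed, `before` and `after` should name the same locale, which is why readline alone would not be expected to explain the difference reported above.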

        Regards,
        Martin


        • Nuff Said

          #5
          Re: locale.CODESET / different in python shell and scripts

          On Tue, 27 Apr 2004 22:29:59 +0200, Martin v. Löwis wrote:
          > Because, for some reason, locale.setlocale() is called in your
          > interactive startup, but not in the normal startup.
          >
          > It is uncertain why this happens - setlocale is not normally
          > called automatically; not even in interactive mode. Perhaps
          > you have created your own startup file?

          I use two Python versions on my Linux box (Fedora Core 1):
          the Python 2.2 which came with Fedora and a Python 2.3 which
          I compiled myself. (I didn't tinker with the last one;
          Fedora's Python is a (well known) mess.)

          Both Python versions give me 'ANSI_X3.4-1968' when I run a script
          with 'print locale.nl_langinfo(locale.CODESET)'.
          When I execute the same command in an interactive Python shell,
          I get the (correct) 'UTF-8'.

          (By 'correct', I mean that the bash command 'locale' gives me
          'LANG=en_US.UTF-8, LC_CTYPE="en_US.UTF-8", ...'. This seems to
          be correct, because e.g. the 'less ...' command shows files which
          are UTF-8 encoded in the correct way; files which are e.g.
          'ISO-8859-1' encoded are not shown in the correct way.)


          Things are getting even worse:

          I write a Python script which uses Unicode strings; now I want
          to 'print ...' one of those strings (containing non-ASCII characters;
          e.g. German umlauts).
          With Fedora's Python 2.2 I have to use 'print s.encode('ISO-8859-1')'
          or something similar.
          With my self-compiled Python 2.3, I have to use (the expected)
          'print s.encode('UTF-8')' (though it shows me 'ANSI_X3.4-1968' when
          using 'print locale.nl_langinfo(locale.CODESET)' in the same file).

          ???

          Any ideas what's going wrong here?

          (I tried 'python -S ...'; doesn't make a difference.)


          • Martin v. Löwis

            #6
            Re: locale.CODESET / different in python shell and scripts

            Nuff Said wrote:
            > Both Python versions give me 'ANSI_X3.4-1968' when I run a script
            > with 'print locale.nl_langinfo(locale.CODESET)'.
            > When I execute the same command in an interactive Python shell,
            > I get the (correct) 'UTF-8'.

            PLEASE invoke

            locale.setlocale(locale.LC_ALL, "")

            before invoking nl_langinfo. Different C libraries behave differently
            in their nl_langinfo responses if setlocale hasn't been called.
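A minimal sketch of this advice, written in Python 3 syntax and assuming a Unix-like system (`locale.nl_langinfo` is not available everywhere). The function name `codeset_before_and_after_setlocale` is made up for this example. On a current interpreter the two values may well be identical; the point is only that the second query is the one to rely on.

```python
import locale


def codeset_before_and_after_setlocale():
    """Return nl_langinfo(CODESET) before and after adopting the user's locale."""
    before = locale.nl_langinfo(locale.CODESET)
    try:
        # An empty string means: take the locale from the environment
        # (LANG, LC_ALL, LC_CTYPE, ...).
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale this system does not provide;
        # the process then simply stays in its current locale.
        pass
    after = locale.nl_langinfo(locale.CODESET)
    return before, after


before, after = codeset_before_and_after_setlocale()
print("before setlocale:", before)
print("after setlocale: ", after)
```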

            Regards,
            Martin


            • Nuff Said

              #7
              Re: locale.CODESET / different in python shell and scripts

              On Thu, 29 Apr 2004 22:14:23 +0200, Martin v. Löwis wrote:
              > PLEASE invoke
              >
              > locale.setlocale(locale.LC_ALL, "")
              >
              > before invoking nl_langinfo. Different C libraries behave differently
              > in their nl_langinfo responses if setlocale hasn't been called.

              Thanks a lot for your help!

              That solved (part of) the problem; now I get 'UTF-8' (which is correct)
              when running the following script (with either my self-compiled Python
              2.3 or Fedora's Python 2.2):

              #!/usr/bin/env python
              # -*- coding: UTF-8 -*-

              import locale

              locale.setlocale(locale.LC_ALL, "")
              encoding = locale.nl_langinfo(locale.CODESET)
              print encoding


              Still, one problem remains:

              When I add the following line to the above script

              print u"schönes Mädchen".encode(encoding)

              the result is:

              schönes Mädchen (with my self-compiled Python 2.3)
              schönes Mädchen (with Fedora's Python 2.2)

              I observed that my Python gives me (the correct value) 15 for
              len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
              for each German umlaut, i.e. the length of the UTF-8 representation
              of the string; note that the file uses the coding cookie for UTF-8).
              Maybe Fedora's Python was compiled without Unicode support?

              (Is that even possible? I recall something about a UCS2 or
              UCS4 switch when compiling Python; but without Unicode support?
              And if it were possible, shouldn't a Python without Unicode
              support disallow strings of the form u"..." or at least show a warning?)


              This really drives me nuts because I thought the above approach
              should be the correct way to ensure that Python scripts can print
              non-ASCII characters on any terminal (which is able to display
              those characters in some encoding such as UTF-8, ISO-8859-x, ...).
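The approach described in this post can be sketched as follows, in Python 3 syntax and assuming a Unix-like system. The helper names `terminal_encoding` and `encode_for_terminal` are made up for this example, and falling back to ASCII with `errors="replace"` is one possible design choice, not something the thread prescribes.

```python
import codecs
import locale


def terminal_encoding():
    """Best guess at the terminal's encoding, taken from the user's locale."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # Unsupported locale in the environment: stay in the current one.
        pass
    name = locale.nl_langinfo(locale.CODESET) or "ascii"
    try:
        codecs.lookup(name)
    except LookupError:
        # Python has no codec under this name; ASCII is the safe fallback.
        name = "ascii"
    return name


def encode_for_terminal(text):
    """Encode text for the terminal, substituting '?' for unencodable characters."""
    return text.encode(terminal_encoding(), errors="replace")


# Escapes keep this example independent of the source file's own encoding.
encoded = encode_for_terminal("sch\u00f6nes M\u00e4dchen")
print(repr(encoded))
```

With `errors="replace"` a terminal that only handles ASCII gets question marks in place of the umlauts instead of an exception.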

              Am I doing something utterly wrong here?
              Python can't be that complicated?

              Nuff.


              • Martin v. Löwis

                #8
                Re: locale.CODESET / different in python shell and scripts

                Nuff Said wrote:
                > When I add the following line to the above script
                >
                > print u"schönes Mädchen".encode(encoding)
                >
                > the result is:
                >
                > schönes Mädchen (with my self-compiled Python 2.3)
                > schönes Mädchen (with Fedora's Python 2.2)
                >
                > I observed, that my Python gives me (the correct value) 15 for
                > len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
                > for each German umlaut, i.e. the len of the UTF-8 representation of
                > the string; observe, that the file uses the coding cookie for UTF-8).
                > Maybe Fedora's Python was compiled without Unicode support?

                Certainly not: It would not support u"" literals without Unicode.

                Please understand that you cannot use non-ASCII characters in source
                code unless you also use the facilities described in

                http://www.python.org/peps/pep-0263.html


                So instead of "ö", you should write "\xf6".
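A small illustration of why escapes are the safe choice: they denote the same characters no matter how the source file itself is stored or interpreted. The sketch uses Python 3 syntax, where every string literal is a Unicode string; in the Python 2 versions discussed in this thread the literal would carry a `u` prefix.

```python
# "\xf6" is U+00F6 (ö) and "\xe4" is U+00E4 (ä), written as escapes.
escaped = "sch\xf6nes M\xe4dchen"

# The \u form names the same code points.
assert escaped == "sch\u00f6nes M\u00e4dchen"

print(len(escaped))               # 15
print(escaped.encode("utf-8"))    # b'sch\xc3\xb6nes M\xc3\xa4dchen'
print(escaped.encode("latin-1"))  # b'sch\xf6nes M\xe4dchen'
```

Each umlaut is one character in the string and becomes two bytes in UTF-8 or one byte in Latin-1 only at encoding time.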
                > Am I doing something utterly wrong here?

                Yes, you are.

                > Python can't be that complicated?

                Python is not. Encodings are.

                Regards,
                Martin


                • Nuff Said

                  #9
                  Re: locale.CODESET / different in python shell and scripts

                  On Fri, 30 Apr 2004 04:30:34 +0200, Martin v. Löwis wrote:

                  > Nuff Said wrote:
                  >> When I add the following line to the above script
                  >>
                  >> print u"schönes Mädchen".encode(encoding)
                  >>
                  >> the result is:
                  >>
                  >> schönes Mädchen (with my self-compiled Python 2.3)
                  >> schönes Mädchen (with Fedora's Python 2.2)
                  >>
                  >> I observed, that my Python gives me (the correct value) 15 for
                  >> len(u"schönes Mädchen") whereas Fedora's Python says 17 (one more
                  >> for each German umlaut, i.e. the len of the UTF-8 representation of
                  >> the string; observe, that the file uses the coding cookie for UTF-8).
                  >> Maybe Fedora's Python was compiled without Unicode support?
                  >
                  > Certainly not: It would not support u"" literals without Unicode.

                  That's what I thought.

                  > Please understand that you cannot use non-ASCII characters in source
                  > code unless you also use the facilities described in
                  >
                  > http://www.python.org/peps/pep-0263.html
                  >
                  > So instead of "ö", you should write "\xf6".

                  But *I do use* the line

                  # -*- coding: UTF-8 -*-

                  from your PEP (directly after the shebang line; see the full source
                  code in my earlier posting). I thought that allows me to write u"ö"
                  (which - as described above - works in one of my two Pythons).

                  ??? Nuff.



                  • Nuff Said

                    #10
                    Re: locale.CODESET / different in python shell and scripts

                    On Fri, 30 Apr 2004 11:56:19 +0200, Nuff Said wrote:
                    > But *I do use* the line
                    >
                    > # -*- coding: UTF-8 -*-
                    >
                    > from your PEP (directly after the shebang line; see the full source
                    > code in my earlier posting). I thought that allows me to write u"ö"
                    > (which - as described above - works in one of my two Pythons).

                    Follow up to myself:

                    Arrgh!!! Think I got it now. Your PEP 263: 'Source Code Encodings' was
                    incorporated into Python 2.3 (i.e. my self-compiled Python) but not
                    into Python 2.2 (Fedora's Python).

                    Thanks for your help!
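The lengths 15 and 17 reported earlier in the thread fit this explanation. The sketch below reproduces the effect in Python 3 syntax, under the assumption that an interpreter without PEP 263 support reads the UTF-8 bytes of the literal as if they were Latin-1; the variable names are made up for this example.

```python
# The intended text, written with escapes so the example does not depend
# on how this file itself is encoded.
text = "sch\u00f6nes M\u00e4dchen"

# What a UTF-8 source file actually contains for that literal.
utf8_bytes = text.encode("utf-8")

# Assumed behaviour without the coding cookie: every byte becomes one character.
misread = utf8_bytes.decode("latin-1")

# Encoding the misread string as UTF-8 again double-encodes the umlauts.
double_encoded = misread.encode("utf-8")

print(len(text), len(utf8_bytes), len(misread), len(double_encoded))  # 15 17 17 21
```

Here `misread` is the 17-character string "schÃ¶nes MÃ¤dchen", the familiar pattern of UTF-8 text read as Latin-1.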
