PEP-0263 and default encoding

  • Klaus Alexander Seistrup

    PEP-0263 and default encoding

    Hi,

    After upgrading my Python interpreter to 2.3.1 I constantly get
    warnings like this:

    DeprecationWarning: Non-ASCII character '\xe6' in file
    mumble.py on line 2, but no encoding declared;
    see http://www.python.org/peps/pep-0263.html for details

    And while I understand the problem, I cannot fathom why Python
    doesn't simply rely on the encoding I have specified in site.py,
    which then calls sys.setdefaultencoding().

    Would anyone care to explain why the community has chosen to
    inconvenience the user for each python script with non-ASCII
    characters, rather than using the default encoding given in the
    site.py configuration file?

    Cheers,

    // Klaus

    --
    ><> o mordo tua nuora, o aro un autodromo
  • News M Claveau /Hamster-P

    #2
    Re: PEP-0263 and default encoding

    Yes! Why?

    And I (being French) add:
    # -*- coding: cp1252 -*-
    at the beginning of my scripts. But what if I want to write a script
    for two, three, or more languages?

    Or:

    How can I have ASCII and non-ASCII characters in the same script?


    * Sorry for my bad English *


    @-salutations
    --
    Michel Claveau





    • John Roth

      #3
      Re: PEP-0263 and default encoding


      "News M Claveau /Hamster-P" <essai1@mci.local> wrote in message
      news:blaau5$b0r$1@news-reader3.wanadoo.fr...
      > Yes! Why?
      >
      > And I (being French) add:
      > # -*- coding: cp1252 -*-
      > at the beginning of my scripts. But what if I want to write a script
      > for two, three, or more languages?
      >
      > Or:
      >
      > How can I have ASCII and non-ASCII characters in the same script?
      >
      > * Sorry for my bad English *

      Use UTF-8. That's what it's there for.

      Remember that the actual Python program has to be in
      ASCII - only the text in string literals can be in different
      character sets. I'm not sure about comments.
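
To make the advice above concrete, here is a minimal sketch of how a PEP 263 declaration on the first line determines how the rest of a source file is decoded. It is written for Python 3 and uses the standard `tokenize.detect_encoding` function, which did not exist in the Python 2.3 discussed in this thread. The sample source strings and the `declared_encoding` helper are invented for illustration.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python would use to decode these source bytes."""
    # detect_encoding() reads at most two lines, looking for a BOM or a
    # "coding:" declaration as described in PEP 263.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# One UTF-8 file can carry French, Danish and Greek string literals at once.
utf8_source = (
    "# -*- coding: utf-8 -*-\n"
    "greetings = ['ça va', 'blåbærgrød', 'καλημέρα']\n"
).encode("utf-8")

# The same declaration mechanism with a single-byte Windows code page.
cp1252_source = (
    "# -*- coding: cp1252 -*-\n"
    "greeting = 'ça va'\n"
).encode("cp1252")

results = {}
for label, raw in (("utf8", utf8_source), ("cp1252", cp1252_source)):
    namespace = {}
    # compile() also honours the declaration when it is handed raw bytes.
    exec(compile(raw, "<sample>", "exec"), namespace)
    results[label] = (declared_encoding(raw), namespace)
    print(label, declared_encoding(raw))
```

Both files produce the same text for `'ça va'` even though the bytes on disk differ, because each file states its own encoding.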

      John Roth

      > @-salutations
      > --
      > Michel Claveau



      • Klaus Alexander Seistrup

        #4
        Re: PEP-0263 and default encoding

        John Roth wrote:

        > Remember that the actual Python program has to be in ASCII -
        > only the text in string literals can be in different character
        > sets. I'm not sure about comments.

        Python barfs if there are non-ASCII characters in comments and there
        is no coding-line.

        Still beats me why it doesn't use the sys.getdefaultencoding() instead
        of inconveniencing me.


        // Klaus

        --
        ><> unselfish actions pay back better


        • Duncan Booth

          #5
          Re: PEP-0263 and default encoding

          Klaus Alexander Seistrup <spam@magnetic-ink.dk> wrote in
          news:3f79171c-d4c587aa-d4ac-44b9-97da-5e0024d4d268@news.szn.dk:
          >
          > Still beats me why it doesn't use the sys.getdefaultencoding() instead
          > of inconveniencing me.
          >

          I think the reasoning was that you might give your scripts to someone else
          who has a different default encoding and it would then fail obscurely. A
          script should be portable, and that means it can't depend on things like
          the default encoding.

          I.e., it's an attempt to satisfy both of these:
          Explicit is better than implicit.
          Errors should never pass silently.
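
The portability concern can be illustrated with the very byte from the warning at the top of the thread. This is only a sketch in Python 3; the list of code pages is an arbitrary selection chosen for the example, not something named in the discussion.

```python
raw = b"\xe6"  # the byte reported in the DeprecationWarning

# The same byte read under several plausible "default" encodings.
interpretations = {
    name: raw.decode(name)
    for name in ("iso-8859-1", "cp1252", "cp437", "cp1251")
}
for name, text in interpretations.items():
    print(f"{name:>10}: {text!r}")

# UTF-8 cannot decode the lone byte at all.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("decodes as UTF-8:", utf8_ok)
```

A script that silently inherited the reader's default encoding would therefore show different text on different machines, or fail outright.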

          --
          Duncan Booth duncan@rcp.co.uk
          int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
          "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?


          • Martin v. Löwis

            #6
            Re: PEP-0263 and default encoding

            Klaus Alexander Seistrup <rasmus.klump@myorange.dk> writes:

            > And while I understand the problem, I cannot fathom why Python
            > doesn't simply rely on the encoding I have specified in site.py,
            > which then calls sys.setdefaultencoding().

            There are several reasons. Procedurally, there was no suggestion to do
            so while the PEP was being discussed, and it was posted both to
            comp.lang.python and python-dev several times. At the time, the most
            common comment was that Python should just reject any user-defined
            encoding, and declare that source code files are always UTF-8,
            period. The PEP gives some more flexibility over that position.

            Methodically, requiring the encoding to be declared in the source code
            is a good thing, as it allows code to be moved across systems,
            which would not be that easy if the source encoding was part of the
            Python installation. Explicit is better than implicit.

            > Would anyone care to explain why the community has chosen to
            > inconvenience the user for each python script with non-ASCII
            > characters, rather than using the default encoding given in the
            > site.py configuration file?

            It is not clear to me why you want that. There are several
            possible rationales:

            1. You have a problem with existing code, and you are annoyed
            by the warning. Just silence the warning in site.py.

            2. You are writing new code, and you are annoyed by the encoding
            declaration. Just save your code as UTF-8, using the UTF-8 BOM.

            In neither case, relying on the system default encoding is necessary.

            Regards,
            Martin


            • Martin v. Löwis

              #7
              Re: PEP-0263 and default encoding

              Klaus Alexander Seistrup <spam@magnetic-ink.dk> writes:

              > Python barfs if there are non-ASCII characters in comments and there
              > is no coding-line.

              Not necessarily. A UTF-8 BOM would do just as well.

              > Still beats me why it doesn't use the sys.getdefaultencoding() instead
              > of inconveniencing me.

              See my other message.

              Regards,
              Martin


              • John Roth

                #8
                Re: PEP-0263 and default encoding


                "Martin v. Löwis" <martin@v.loewis.de> wrote in message
                news:m3k77qdkey.fsf@mira.informatik.hu-berlin.de...
                > Klaus Alexander Seistrup <rasmus.klump@myorange.dk> writes:
                >
                >
                > 2. You are writing new code, and you are annoyed by the encoding
                > declaration. Just save your code as UTF-8, using the UTF-8 BOM.

                The problem with the UTF-8 BOM is that it precludes using the #! header
                line under Linux/Unix. Otherwise, it's a great solution.
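
A rough Python 3 model of the problem, assuming (as is the case on Linux) that the kernel recognises a script by comparing the first two bytes of the file with `#!`. The `starts_with_shebang` helper and the sample script bytes are hypothetical and only mimic that check.

```python
import codecs

SHEBANG = b"#!/usr/bin/env python\n"

plain = SHEBANG + b"print('hello')\n"
with_bom = codecs.BOM_UTF8 + plain  # same script saved "with signature"


def starts_with_shebang(data):
    # Mimics the loader's test: the file must begin with the bytes "#!".
    return data[:2] == b"#!"


print(starts_with_shebang(plain))
print(starts_with_shebang(with_bom))
print(with_bom[:5])
```

With the BOM in front, the first two bytes are `EF BB` rather than `#!`, so the file is no longer recognised as an interpreter script.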

                John Roth

                >
                > Regards,
                > Martin



                • Martin v. Löwis

                  #9
                  Re: PEP-0263 and default encoding

                  "John Roth" <newsgroups@jhrothjr.com> writes:

                  > > 2. You are writing new code, and you are annoyed by the encoding
                  > > declaration. Just save your code as UTF-8, using the UTF-8 BOM.
                  >
                  > The problem with the UTF-8 BOM is that it precludes using the #! header
                  > line under Linux/Unix. Otherwise, it's a great solution.

                  Indeed, in an executable script, you would use the encoding
                  declaration - or you would restrict yourself to ASCII only in the
                  script file (which might, as its only action, invoke a function from a
                  library, in which case there really isn't much need for non-ASCII
                  characters).

                  OTOH, I do hope that Unix, some day, recognizes UTF-8-BOM-#-! as
                  an executable file.

                  Regards,
                  Martin


                  • Klaus Alexander Seistrup

                    #10
                    Re: PEP-0263 and default encoding

                    Martin v. Löwis wrote:

                    > Just save your code as UTF-8, using the UTF-8 BOM.

                    Please, could you explain what you mean by "the UTF-8 BOM"?

                    > In neither case, relying on the system default encoding is
                    > necessary.

                    I'd still prefer Python to rely on the system default encoding.
                    I can't see why it's there if Python ignores it.


                    // Klaus

                    --
                    ><> unselfish actions pay back better


                    • Erik Max Francis

                      #11
                      Re: PEP-0263 and default encoding

                      Klaus Alexander Seistrup wrote:

                      > Please, could you explain what you mean by "the UTF-8 BOM"?

                      Byte order marker. It's a clever gimmick Unicode uses, where a few
                      valid Unicode characters are set aside for being used in sequence to
                      help determine whether an encoded Unicode stream is little-endian or
                      big-endian.
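
As a small Python 3 sketch of the description above: the mark is the single character U+FEFF, and its encoded bytes reveal the encoding in use. UTF-8 has no byte order, so there the three bytes serve only as a signature identifying the file as UTF-8. The sample source line is invented for the example.

```python
import codecs
import io
import tokenize

bom = "\ufeff"
print(bom.encode("utf-16-be"))  # b'\xfe\xff'
print(bom.encode("utf-16-le"))  # b'\xff\xfe'
print(bom.encode("utf-8"))      # b'\xef\xbb\xbf'

# A source file saved as "UTF-8 with BOM" needs no coding line.
source = codecs.BOM_UTF8 + "word = 'blåbær'\n".encode("utf-8")
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)

# The "utf-8-sig" codec strips the signature while decoding.
text = source.decode("utf-8-sig")
print(repr(text))
```

Here `tokenize.detect_encoding` stands in for the interpreter's own detection. It reports `utf-8-sig` when it finds the signature and no declaration.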

                      --
                      Erik Max Francis && max@alcyone.com && http://www.alcyone.com/max/
                      __ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
                      / \ People say that life is the thing, but I prefer reading.
                      \__/ Logan Pearsall Smith


                      • Klaus Alexander Seistrup

                        #12
                        Re: PEP-0263 and default encoding

                        Duncan Booth skrev:

                        >> Still beats me why it doesn't use the sys.getdefaultencoding()
                        >> instead of inconveniencing me.
                        >
                        > I think the reasoning was that you might give your scripts to
                        > someone else who has a different default encoding and it would
                        > then fail obscurely.

                        You're probably right that's part of the reason.


                        // Klaus

                        --
                        ><> unselfish actions pay back better


                        • Klaus Alexander Seistrup

                          #13
                          Re: PEP-0263 and default encoding

                          Martin v. Löwis skrev:

                          >> I cannot fathom why Python doesn't simply rely on the encoding
                          >> I have specified in site.py, which then calls setdefaultencoding().
                          >
                          > There are several reasons. Procedurally, there was no suggestion
                          > to do so while the PEP was being discussed, and it was posted both
                          > to comp.lang.python and python-dev several times.

                          It's a pity I didn't read c.l.python at that time, or I would have
                          protested.

                          > It is not clear to me why you want that. There are several
                          > possible rationales:
                          >
                          > 1. You have a problem with existing code, and you are annoyed
                          > by the warning.

                          Yes, I literally have hundreds of scripts with non-ASCII characters
                          in them - even if it's just in the comments. Scripts that ran
                          silently, e.g. from crond. Now I have to manually correct each and
                          every script, even if I have stated in site.py that the default
                          encoding is iso-8859-1.

                          > Just silence the warning in site.py.

                          Where and how in site.py can I do that?

                          > 2. You are writing new code, and you are annoyed by the encoding
                          > declaration. Just save your code as UTF-8, using the UTF-8 BOM.

                          Would that help? I put a BOM in a python script just to test your
                          suggestion, and I got an unrelated exception: "SyntaxError: EOL
                          while scanning single-quoted string". That's even worse than a
                          DeprecationWarning.

                          > In neither case, relying on the system default encoding is necessary.

                          I'd prefer Python to rely on the system default encoding unless I
                          have explicitly stated that the script is written using another
                          encoding.


                          // Klaus

                          --
                          ><> unselfish actions pay back better


                          • Klaus Alexander Seistrup

                            #14
                            Re: PEP-0263 and default encoding

                            Erik Max Francis skrev:

                            >> Please, could you explain what you mean by "the UTF-8 BOM"?
                            >
                            > Byte order marker. It's a clever gimmick Unicode uses, where a few
                            > valid Unicode characters are set aside for being used in sequence to
                            > help determine whether an encoded Unicode stream is little-endian or
                            > big-endian.

                            Thanks, I also found a reference on unicode.org¹ that was useful.


                            // Klaus

                            ¹) <http://www.unicode.org/unicode/faq/utf_bom.html>
                            --
                            ><> unselfish actions pay back better


                            • Alex Martelli

                              #15
                              Re: PEP-0263 and default encoding

                              Klaus Alexander Seistrup wrote:
                              ...
                              >> Just silence the warning in site.py.
                              >
                              > Where and how in site.py can I do that?

                              The warnings module is well worth studying. You can insert (just about
                              anywhere you prefer in your site.py or sitecustomize.py) the lines:

                              import warnings
                              warnings.filterwarnings('ignore', 'Non-ASCII character .*/peps/pep-0263',
                                                      DeprecationWarning)

                              This tells Python to ignore all warning messages of class DeprecationWarning
                              which match (case-insensitively) the regular expression given as the second
                              parameter of warnings.filterwarnings (choose the RE you prefer, of course --
                              here, I'm asking that the warning message to be ignored start with
                              "Non-ASCII character " and contain "/peps/pep-0263" anywhere afterwards,
                              but you may easily choose to be either more or less permissive than this).
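
The effect of that filter can be checked in isolation. The following Python 3 sketch raises the warnings by hand, since current interpreters no longer emit this particular one. `SAMPLE` is the message text quoted at the top of the thread, and the second warning is a made-up control case.

```python
import warnings

PATTERN = "Non-ASCII character .*/peps/pep-0263"
SAMPLE = (
    "Non-ASCII character '\\xe6' in file mumble.py on line 2, "
    "but no encoding declared; "
    "see http://www.python.org/peps/pep-0263.html for details"
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from "report everything"
    # filterwarnings() puts the new rule in front, so it is consulted first.
    warnings.filterwarnings("ignore", PATTERN, DeprecationWarning)

    warnings.warn(SAMPLE, DeprecationWarning)            # matches: suppressed
    warnings.warn("something else", DeprecationWarning)  # no match: recorded

messages = [str(w.message) for w in caught]
print(messages)
```

Only the unrelated warning is recorded, which shows that the filter is narrow enough to leave other deprecation warnings visible.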


                              Alex


