Re: str(bytes) in Python 3.0

  • Christian Heimes

    Re: str(bytes) in Python 3.0

    Gabriel Genellina wrote:
    On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
    above. But I get the same as repr(x) - is this on purpose?
    Yes, it's on purpose but it's a bug in your application to call str() on
    a bytes object or to compare bytes and unicode directly. Several months
    ago I added a bytes warning option to Python. Start Python as "python
    -bb" and try it again. ;)

    Christian
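
For readers following along, here is a minimal sketch of the behaviour being discussed, assuming a current Python 3 interpreter started without -b. The variable name x follows the original question; the other names are only illustrative.

```python
x = b"abc"

# One-argument str() does not decode; it returns the repr() of the bytes object.
as_repr = str(x)

# Passing an encoding (or calling .decode()) performs a real conversion to text.
as_text = str(x, "ascii")
also_text = x.decode("ascii")

print(as_repr)
print(as_text)
```

Started with -b the first call additionally emits a BytesWarning, and with -bb that warning becomes an error.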

  • Kay Schluehr

    #2
    Re: str(bytes) in Python 3.0

    On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
    Gabriel Genellina wrote:
    >
    On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
    above. But I get the same as repr(x) - is this on purpose?
    >
    Yes, it's on purpose but it's a bug in your application to call str() on
    a bytes object or to compare bytes and unicode directly. Several months
    ago I added a bytes warning option to Python. Start Python as "python
    -bb" and try it again. ;)
    >
    Christian
    And making an utf-8 encoding default is not possible without writing a
    new function?


    • "Martin v. Löwis"

      #3
      Re: str(bytes) in Python 3.0

      And making an utf-8 encoding default is not possible without writing a
      new function?
      There is no default encoding anymore in Python 3. This is by design,
      learning from the problems in Python 2.x.

      Regards,
      Martin
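
A small sketch of what "no default encoding" means in practice, assuming current Python 3 behaviour: mixing str and bytes is refused rather than silently decoded. The variable names are made up for the example.

```python
text = "abc"
data = b"def"

# Python 3 never converts between str and bytes implicitly.
try:
    combined = text + data
except TypeError:
    combined = None

# The conversion has to be spelled out, with the encoding named by the caller.
explicit = text + data.decode("ascii")

print(combined)
print(explicit)
```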


      • Carl Banks

        #4
        Re: str(bytes) in Python 3.0

        On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
        On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
        >
        Gabriel Genellina wrote:
        >
        On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
        above. But I get the same as repr(x) - is this on purpose?
        >
        Yes, it's on purpose but it's a bug in your application to call str() on
        a bytes object or to compare bytes and unicode directly. Several months
        ago I added a bytes warning option to Python. Start Python as "python
        -bb" and try it again. ;)
        >
        Christian
        >
        And making an utf-8 encoding default is not possible without writing a
        new function?
        I believe the Zen in effect here is, "In the face of ambiguity, refuse
        the temptation to guess." How do you know if the bytes are utf-8
        encoded?

        I'm not sure if str() returning the repr() of a bytes object (when not
        passed an encoding) is the right thing, but it's probably better than
        throwing an exception. The problem is, str can't decide whether it's
        a type conversion operator or a formatted printing function--if it
        were strongly one or the other it would be a lot more obvious what to
        do.


        Carl Banks
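
To make the ambiguity concrete, here is a hedged sketch: one and the same byte decodes to different characters depending on which encoding is assumed. The three codec names are just examples picked from the standard library.

```python
raw = b"\xe9"

# The same byte maps to a different character under each of these encodings.
decoded = {name: raw.decode(name) for name in ("latin-1", "cp437", "koi8-r")}

# As UTF-8 the byte is not valid at all: it starts a multi-byte sequence
# that never completes.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

for name, char in decoded.items():
    print(name, repr(char))
print("valid UTF-8:", valid_utf8)
```

Nothing in the bytes themselves says which of these readings was intended, which is the guess that str() declines to make.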


        • John J. Lee

          #5
          Re: str(bytes) in Python 3.0

          Christian Heimes <lists@cheimes.de> writes:
          Gabriel Genellina wrote:
          >On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
          >above. But I get the same as repr(x) - is this on purpose?
          >
          Yes, it's on purpose but it's a bug in your application to call str() on
          a bytes object or to compare bytes and unicode directly. Several months
          ago I added a bytes warning option to Python. Start Python as "python
          -bb" and try it again. ;)
          Why hasn't the one-argument str(bytes_obj) been designed to raise an
          exception in Python 3?


          John


          • Christian Heimes

            #6
            Re: str(bytes) in Python 3.0

            Carl Banks wrote:
            I believe the Zen in effect here is, "In the face of ambiguity, refuse
            the temptation to guess." How do you know if the bytes are utf-8
            encoded?
            Indeed
            I'm not sure if str() returning the repr() of a bytes object (when not
            passed an encoding) is the right thing, but it's probably better than
            throwing an exception. The problem is, str can't decide whether it's
            a type conversion operator or a formatted printing function--if it
            were strongly one or the other it would be a lot more obvious what to
            do.
            I was against it and I also wanted to have 'egg' == b'egg' raise an
            exception but I was overruled by Guido. At least I was allowed to
            implement the byte warning feature (-b and -bb arguments). I *highly*
            recommend that everybody runs her unit tests with the -bb option.

            Christian
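
A rough sketch of the -bb behaviour mentioned above, assuming Python 3.7 or later. It runs the comparison in a child interpreter with and without the flag; using subprocess here is just a convenient way to show both in one script, not something the post prescribes.

```python
import subprocess
import sys

snippet = "print('egg' == b'egg')"

# Without the flag the comparison quietly evaluates to False.
plain = subprocess.run([sys.executable, "-c", snippet],
                       capture_output=True, text=True)

# With -bb the same comparison raises BytesWarning and the process fails.
strict = subprocess.run([sys.executable, "-bb", "-c", snippet],
                        capture_output=True, text=True)

print(plain.stdout.strip(), plain.returncode)
print(strict.returncode)
```

For a test suite the equivalent would be something like `python -bb -m unittest`, so that any accidental str/bytes mixing fails loudly.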



              • Christian Heimes

                #8
                Re: str(bytes) in Python 3.0

                John J. Lee wrote:
                Why hasn't the one-argument str(bytes_obj) been designed to raise an
                exception in Python 3?
                See for yourself:

                $ ./python
                Python 3.0a4+ (py3k:0, Apr 11 2008, 15:31:31)
                [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
                Type "help", "copyright", "credits" or "license" for more information.
                >>> str(b'')
                "b''"
                [38544 refs]
                >>> bytes("")
                Traceback (most recent call last):
                  File "<stdin>", line 1, in <module>
                TypeError: string argument without an encoding
                [38585 refs]

                $ ./python -b
                Python 3.0a4+ (py3k:0, Apr 11 2008, 15:31:31)
                [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
                Type "help", "copyright", "credits" or "license" for more information.
                >>> str(b'')
                __main__:1: BytesWarning: str() on a bytes instance
                "b''"
                [38649 refs]

                Christian


                • Kay Schluehr

                  #9
                  Re: str(bytes) in Python 3.0

                  On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
                  And making an utf-8 encoding default is not possible without writing a
                  new function?
                  >
                  I believe the Zen in effect here is, "In the face of ambiguity, refuse
                  the temptation to guess." How do you know if the bytes are utf-8
                  encoded?
                  How many "encodings" would you define for a Rectangle constructor?

                  Making things infinitely configurable is very nice and shows that the
                  programmer has worked hard. Sometimes however it suffices to provide a
                  mandatory default and some supplementary conversion methods. This
                  still won't exhaust all possible cases but provides a reasonable
                  coverage.


                  • Lorenzo Gatti

                    #10
                    Re: str(bytes) in Python 3.0

                    On Apr 12, 5:51 pm, Kay Schluehr <kay.schlu...@gmx.net> wrote:
                    On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
                    >
                    And making an utf-8 encoding default is not possible without writing a
                    new function?
                    >
                    I believe the Zen in effect here is, "In the face of ambiguity, refuse
                    the temptation to guess." How do you know if the bytes are utf-8
                    encoded?
                    >
                    How many "encodings" would you define for a Rectangle constructor?
                    >
                    Making things infinitely configurable is very nice and shows that the
                    programmer has worked hard. Sometimes however it suffices to provide a
                    mandatory default and some supplementary conversion methods. This
                    still won't exhaust all possible cases but provides a reasonable
                    coverage.
                    There is no sensible default because many incompatible encodings are
                    in common use; programmers need to take responsibility for tracking or
                    guessing string encodings according to their needs, in ways that
                    depend on application architecture, characteristics of users and data,
                    and various risk and quality trade-offs.

                    In languages that, like Java, have a default encoding for convenience,
                    documents are routinely mangled by sloppy programmers who think that
                    they live in an ASCII or UTF-8 fairy land and that they don't need
                    tight control of the encoding of all text that enters and leaves the
                    system.
                    Ceasing to support this obsolete attitude with lenient APIs is the
                    only way forward; being forced to learn that encodings are important
                    is better than, say, discovering unrecoverable data corruption in a
                    working system.

                    Regards,
                    Lorenzo Gatti
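
One way to read "tight control of the encoding of all text that enters and leaves the system", sketched with a temporary file. The file name and the choice of UTF-8 are assumptions made for the example; the point is only that the encoding is named at every boundary.

```python
import os
import tempfile

text = "¿Cómo estás?"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greeting.txt")

    # Name the encoding when text leaves the program...
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # ...and again when it comes back in, rather than relying on a platform default.
    with open(path, "r", encoding="utf-8") as f:
        round_tripped = f.read()

    # The bytes on disk are exactly what the chosen encoding produced.
    with open(path, "rb") as f:
        on_disk = f.read()

print(round_tripped == text)
print(on_disk)
```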



                    • John Roth

                      #11
                      Re: str(bytes) in Python 3.0

                      On Apr 12, 8:52 am, j...@pobox.com (John J. Lee) wrote:
                      Christian Heimes <li...@cheimes.de> writes:
                      Gabriel Genellina wrote:
                      On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
                      above. But I get the same as repr(x) - is this on purpose?
                      >
                      Yes, it's on purpose but it's a bug in your application to call str() on
                      a bytes object or to compare bytes and unicode directly. Several months
                      ago I added a bytes warning option to Python. Start Python as "python
                      -bb" and try it again. ;)
                      >
                      Why hasn't the one-argument str(bytes_obj) been designed to raise an
                      exception in Python 3?
                      >
                      John
                      Because it's a fundamental rule that you should be able to call str()
                      on any object and get a sensible result.

                      The reason that calling str() on a bytes object returns a bytes
                      literal rather than an unadorned character string is that there are no
                      default encodings or decodings: there is no way of determining what
                      the corresponding string should be.

                      John Roth
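
A short sketch of where this usually bites, assuming current Python 3 without -b: string formatting calls str() on its arguments, so a bytes value shows up with its b'...' wrapper instead of raising. The name used is hypothetical.

```python
name = b"Guido"

# Formatting calls str() on the argument, so the b'...' wrapper leaks into the output.
leaky = "Hello, %s!" % name

# Decoding first, with an encoding the caller vouches for, gives the intended text.
fixed = "Hello, %s!" % name.decode("ascii")

print(leaky)
print(fixed)
```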


                      • Dan Bishop

                        #12
                        Re: str(bytes) in Python 3.0

                        On Apr 12, 9:29 am, Carl Banks <pavlovevide...@gmail.com> wrote:
                        On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
                        >
                        On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
                        >
                        Gabriel Genellina wrote:
                        >
                        On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
                        above. But I get the same as repr(x) - is this on purpose?
                        >
                        Yes, it's on purpose but it's a bug in your application to call str() on
                        a bytes object or to compare bytes and unicode directly. Several months
                        ago I added a bytes warning option to Python. Start Python as "python
                        -bb" and try it again. ;)
                        >
                        And making an utf-8 encoding default is not possible without writing a
                        new function?
                        >
                        I believe the Zen in effect here is, "In the face of ambiguity, refuse
                        the temptation to guess." How do you know if the bytes are utf-8
                        encoded?
                        True, you can't KNOW that. Maybe the author of those bytes actually
                        MEANT to say 'Â¿CÃ³mo estÃ¡s?' instead of '¿Cómo estás?'. However,
                        it's statistically unlikely for a non-UTF-8-encoded string to just
                        happen to be valid UTF-8.
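
The asymmetry described here can be sketched as follows, assuming Latin-1 as the competing encoding: UTF-8 bytes misread as Latin-1 always decode (to mojibake), whereas Latin-1 bytes for the same text are rejected by the UTF-8 decoder.

```python
intended = "¿Cómo estás?"

# UTF-8 bytes read back as Latin-1 always "succeed", but produce mojibake.
mojibake = intended.encode("utf-8").decode("latin-1")

# Latin-1 bytes read back as UTF-8 are rejected, because 0xBF, 0xF3 and 0xE1
# do not form valid UTF-8 sequences here.
try:
    intended.encode("latin-1").decode("utf-8")
    accepted = True
except UnicodeDecodeError:
    accepted = False

print(mojibake)
print(accepted)
```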


                        • Steve Holden

                          #13
                          Re: str(bytes) in Python 3.0

                          Dan Bishop wrote:
                          On Apr 12, 9:29 am, Carl Banks <pavlovevide...@gmail.com> wrote:
                          >On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
                          >>
                          >>On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
                          >>>Gabriel Genellina wrote:
                          >>>>On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
                          >>>>above. But I get the same as repr(x) - is this on purpose?
                          >>>Yes, it's on purpose but it's a bug in your application to call str() on
                          >>>a bytes object or to compare bytes and unicode directly. Several months
                          >>>ago I added a bytes warning option to Python. Start Python as "python
                          >>>-bb" and try it again. ;)
                          >>And making an utf-8 encoding default is not possible without writing a
                          >>new function?
                          >I believe the Zen in effect here is, "In the face of ambiguity, refuse
                          >the temptation to guess." How do you know if the bytes are utf-8
                          >encoded?
                          >
                          True, you can't KNOW that. Maybe the author of those bytes actually
                          MEANT to say 'Â¿CÃ³mo estÃ¡s?' instead of '¿Cómo estás?'. However,
                          it's statistically unlikely for a non-UTF-8-encoded string to just
                          happen to be valid UTF-8.
                          So you propose to perform a statistical analysis on your input to
                          determine whether it's UTF-8 or some other encoding?

                          regards
                          Steve
                          --
                          Steve Holden +1 571 484 6266 +1 800 494 3119
                          Holden Web LLC http://www.holdenweb.com/


                          • Gabriel Genellina

                            #14
                            Re: str(bytes) in Python 3.0

                            On Sat, 12 Apr 2008 11:25:59 -0300, Martin v. Löwis <martin@v.loewis.de>
                            wrote:
                            >And making an utf-8 encoding default is not possible without writing a
                            >new function?
                            >
                            There is no default encoding anymore in Python 3. This is by design,
                            learning from the problems in Python 2.x.
                            So sys.getdefaultencoding() will disappear? Currently it returns "utf-8".
                            In case it stays, what is it used for?

                            --
                            Gabriel Genellina
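
For reference, a small sketch of how this looks in current CPython 3 releases (the thread itself concerns 3.0a4, so treat this as an assumption about later versions): the function still exists and reports "utf-8", and explicit encode/decode calls fall back to UTF-8, but nothing decodes implicitly.

```python
import sys

# In current CPython 3 releases this reports "utf-8".
default = sys.getdefaultencoding()

# str.encode() and bytes.decode() fall back to UTF-8 when no encoding is given...
encoded = "é".encode()
decoded = encoded.decode()

# ...but that default is never applied implicitly; str() still returns the repr.
shown = str(encoded)

print(default)
print(encoded, decoded, shown)
```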


                            • Terry Reedy

                              #15
                              Re: str(bytes) in Python 3.0


                              "John Roth" <johnroth1@gmail.com> wrote in message
                              news:29f280cc-4b33-4863-beee-d231df3d9a61@u3g2000hsc.googlegroups.com...
                              | On Apr 12, 8:52 am, j...@pobox.com (John J. Lee) wrote:
                              | Christian Heimes <li...@cheimes.de> writes:
                              | Gabriel Genellina wrote:
                              | On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
                              | above. But I get the same as repr(x) - is this on purpose?
                              | >
                              | Yes, it's on purpose but it's a bug in your application to call str() on
                              | a bytes object or to compare bytes and unicode directly. Several months
                              | ago I added a bytes warning option to Python. Start Python as "python
                              | -bb" and try it again. ;)
                              | >
                              | Why hasn't the one-argument str(bytes_obj) been designed to raise an
                              | exception in Python 3?
                              | >
                              | John
                              |
                              | Because it's a fundamental rule that you should be able to call str()
                              | on any object and get a sensible result.
                              |
                              | The reason that calling str() on a bytes object returns a bytes
                              | literal rather than an unadorned character string is that there are no
                              | default encodings or decodings: there is no way of determining what
                              | the corresponding string should be.

                              In having a double meaning, str is much like type. type(obj) echoes the
                              existing class of the object. type(o,p,q) attempts to construct a new
                              class. Similarly, str(obj) gives a string representing the obj (which, for
                              a string, is the string;-). str(obj,obj2) attempts to construct a new
                              string.

                              tjr
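
The parallel drawn here can be sketched side by side. The class name Point and its attribute are hypothetical, chosen only to show the three-argument form of type().

```python
# One argument: report something about an existing object.
kind = type(42)            # the existing class, int
shown = str(b"abc")        # a string representing the object

# More arguments: construct something new.
Point = type("Point", (object,), {"dims": 2})   # a brand-new class
text = str(b"abc", "ascii")                      # a new string decoded from the bytes

print(kind, shown)
print(Point, text)
```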


