Re: str(bytes) in Python 3.0

**Kay Schluehr** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:

> Gabriel Genellina wrote:
>
>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>> above. But I get the same as repr(x) - is this on purpose?
>
> Yes, it's on purpose but it's a bug in your application to call str() on
> a bytes object or to compare bytes and unicode directly. Several months
> ago I added a bytes warning option to Python. Start Python as "python
> -bb" and try it again. ;)
>
> Christian

And making an utf-8 encoding default is not possible without writing a
new function?

**Martin v. Löwis** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

> And making an utf-8 encoding default is not possible without writing a
> new function?

There is no default encoding anymore in Python 3. This is by design,
learning from the problems in Python 2.x.

Regards,
Martin

**Carl Banks** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:

> On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
>
>> Gabriel Genellina wrote:
>>
>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>> above. But I get the same as repr(x) - is this on purpose?
>>
>> Yes, it's on purpose but it's a bug in your application to call str() on
>> a bytes object or to compare bytes and unicode directly. Several months
>> ago I added a bytes warning option to Python. Start Python as "python
>> -bb" and try it again. ;)
>>
>> Christian
>
> And making an utf-8 encoding default is not possible without writing a
> new function?

I believe the Zen in effect here is, "In the face of ambiguity, refuse
the temptation to guess." How do you know if the bytes are utf-8
encoded?

I'm not sure if str() returning the repr() of a bytes object (when not
passed an encoding) is the right thing, but it's probably better than
throwing an exception. The problem is, str can't decide whether it's
a type conversion operator or a formatted printing function--if it
were strongly one or the other it would be a lot more obvious what to
do.
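
For reference, a minimal sketch of the two behaviours being contrasted here, assuming a current CPython 3 interpreter started without the -b flags. The variable `x` follows Gabriel's original example; the other names are only illustrative.

```python
x = b"abc"

# One-argument form: no encoding is given, so str() falls back to the
# same text that repr() produces (a bytes literal, prefix and quotes included).
as_repr = str(x)

# Two-argument form: an explicit encoding turns the call into a real decode.
as_text = str(x, "ascii")

# bytes.decode() is the equivalent, and arguably clearer, spelling.
also_text = x.decode("ascii")

print(as_repr)    # b'abc'
print(as_text)    # abc
print(as_repr == repr(x), as_text == also_text)  # True True
```

So the one-argument call behaves like a formatting function and the two-argument call like a conversion, which is the split described above.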

Carl Banks

**John J. Lee** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

Christian Heimes <lists@cheimes.de> writes:

> Gabriel Genellina wrote:
>
>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>> above. But I get the same as repr(x) - is this on purpose?
>
> Yes, it's on purpose but it's a bug in your application to call str() on
> a bytes object or to compare bytes and unicode directly. Several months
> ago I added a bytes warning option to Python. Start Python as "python
> -bb" and try it again. ;)

Why hasn't the one-argument str(bytes_obj) been designed to raise an
exception in Python 3?

John

**Christian Heimes** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

Carl Banks wrote:

> I believe the Zen in effect here is, "In the face of ambiguity, refuse
> the temptation to guess." How do you know if the bytes are utf-8
> encoded?

Indeed.

> I'm not sure if str() returning the repr() of a bytes object (when not
> passed an encoding) is the right thing, but it's probably better than
> throwing an exception. The problem is, str can't decide whether it's
> a type conversion operator or a formatted printing function--if it
> were strongly one or the other it would be a lot more obvious what to
> do.

I was against it and I also wanted to have 'egg' == b'egg' raise an
exception but I was overruled by Guido. At least I was allowed to
implement the byte warning feature (-b and -bb arguments). I *highly*
recommend that everybody runs her unit tests with the -bb option.
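
A rough sketch of what the -b and -bb switches do, assuming a CPython 3.7+ interpreter is available as `sys.executable`. The helper `run_with_flags` is a made-up name for this illustration, not part of any library.

```python
import subprocess
import sys


def run_with_flags(flags, snippet):
    """Run `snippet` in a fresh interpreter started with the given flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", snippet],
        capture_output=True,
        text=True,
    )


snippet = "print(str(b'abc'))"

plain = run_with_flags([], snippet)        # no warning, prints the repr
warned = run_with_flags(["-b"], snippet)   # BytesWarning on stderr, still prints
strict = run_with_flags(["-bb"], snippet)  # BytesWarning raised as an error

print(plain.stdout.strip())
print("BytesWarning" in warned.stderr)
print(strict.returncode != 0)
```

Running a test suite the same way (for example `python -bb -m unittest`) should turn every accidental str(bytes) call or bytes/str comparison into a failure instead of a silent repr.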

Christian

**Christian Heimes** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

John J. Lee wrote:

> Why hasn't the one-argument str(bytes_obj) been designed to raise an
> exception in Python 3?

See for yourself:

$ ./python
Python 3.0a4+ (py3k:0, Apr 11 2008, 15:31:31)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b'')
"b''"
[38544 refs]
>>> bytes("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
[38585 refs]

$ ./python -b
Python 3.0a4+ (py3k:0, Apr 11 2008, 15:31:31)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b'')
__main__:1: BytesWarning: str() on a bytes instance
"b''"
[38649 refs]

Christian

**Kay Schluehr** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:

>> And making an utf-8 encoding default is not possible without writing a
>> new function?
>
> I believe the Zen in effect here is, "In the face of ambiguity, refuse
> the temptation to guess." How do you know if the bytes are utf-8
> encoded?

How many "encodings" would you define for a Rectangle constructor?

Making things infinitely configurable is very nice and shows that the
programmer has worked hard. Sometimes however it suffices to provide a
mandatory default and some supplementary conversion methods. This
still won't exhaust all possible cases but provides a reasonable
coverage.

**Lorenzo Gatti** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

On Apr 12, 5:51 pm, Kay Schluehr <kay.schlu...@gmx.net> wrote:

> On 12 Apr., 16:29, Carl Banks <pavlovevide...@gmail.com> wrote:
>
>>> And making an utf-8 encoding default is not possible without writing a
>>> new function?
>>
>> I believe the Zen in effect here is, "In the face of ambiguity, refuse
>> the temptation to guess." How do you know if the bytes are utf-8
>> encoded?
>
> How many "encodings" would you define for a Rectangle constructor?
>
> Making things infinitely configurable is very nice and shows that the
> programmer has worked hard. Sometimes however it suffices to provide a
> mandatory default and some supplementary conversion methods. This
> still won't exhaust all possible cases but provides a reasonable
> coverage.

There is no sensible default because many incompatible encodings are
in common use; programmers need to take responsibility for tracking or
guessing string encodings according to their needs, in ways that
depend on application architecture, characteristics of users and data,
and various risk and quality trade-offs.

In languages that, like Java, have a default encoding for convenience,
documents are routinely mangled by sloppy programmers who think that
they live in an ASCII or UTF-8 fairy land and that they don't need
tight control of the encoding of all text that enters and leaves the
system.
Ceasing to support this obsolete attitude with lenient APIs is the
only way forward; being forced to learn that encodings are important
is better than, say, discovering unrecoverable data corruption in a
working system.
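
One way to read "tight control" in Python 3 terms is to name the encoding at every point where text enters or leaves the program. The following is only a sketch under that assumption; the scratch file `greeting.txt` is hypothetical and exists purely for the illustration.

```python
import os
import tempfile

text = "¿Cómo estás?"

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")

# Name the encoding on the way out ...
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# ... and again on the way in, rather than relying on a platform default.
with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()

# Reading the same bytes with a different encoding mangles the text
# without raising any error.
with open(path, "r", encoding="latin-1") as f:
    mangled = f.read()

print(round_tripped == text)  # True
print(mangled == text)        # False
```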

Regards,
Lorenzo Gatti

**John Roth** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

On Apr 12, 8:52 am, j...@pobox.com (John J. Lee) wrote:

> Christian Heimes <li...@cheimes.de> writes:
>
>> Gabriel Genellina wrote:
>>
>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>> above. But I get the same as repr(x) - is this on purpose?
>>
>> Yes, it's on purpose but it's a bug in your application to call str() on
>> a bytes object or to compare bytes and unicode directly. Several months
>> ago I added a bytes warning option to Python. Start Python as "python
>> -bb" and try it again. ;)
>
> Why hasn't the one-argument str(bytes_obj) been designed to raise an
> exception in Python 3?
>
> John

Because it's a fundamental rule that you should be able to call str()
on any object and get a sensible result.

The reason that calling str() on a bytes object returns a bytes
literal rather than an unadorned character string is that there are no
default encodings or decodings: there is no way of determining what
the corresponding string should be.
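
To make the "no way of determining" point concrete, here is a small sketch, assuming only the standard codecs that ship with CPython. The byte value and the three codec names are arbitrary picks for illustration.

```python
raw = b"\xe9"

# The same single byte is a different character under each of these
# standard codecs; nothing in the byte itself says which one was meant.
candidates = {name: raw.decode(name) for name in ("latin-1", "cp437", "koi8-r")}

for name, char in candidates.items():
    print(name, repr(char))

# UTF-8 rejects it outright: 0xE9 announces a multi-byte sequence
# whose continuation bytes never arrive.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8:", exc.reason)
```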

John Roth

**Dan Bishop** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

On Apr 12, 9:29 am, Carl Banks <pavlovevide...@gmail.com> wrote:

> On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
>
>> On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
>>
>>> Gabriel Genellina wrote:
>>>
>>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>>> above. But I get the same as repr(x) - is this on purpose?
>>>
>>> Yes, it's on purpose but it's a bug in your application to call str() on
>>> a bytes object or to compare bytes and unicode directly. Several months
>>> ago I added a bytes warning option to Python. Start Python as "python
>>> -bb" and try it again. ;)
>>
>> And making an utf-8 encoding default is not possible without writing a
>> new function?
>
> I believe the Zen in effect here is, "In the face of ambiguity, refuse
> the temptation to guess." How do you know if the bytes are utf-8
> encoded?

True, you can't KNOW that. Maybe the author of those bytes actually
MEANT to say 'Â¿CÃ³mo estÃ¡s?' instead of '¿Cómo estás?'. However,
it's statistically unlikely for a non-UTF-8-encoded string to just
happen to be valid UTF-8.
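
The asymmetry can be checked directly. This is a sketch for the one phrase used above, not a statistical argument, and the variable names are only illustrative.

```python
text = "¿Cómo estás?"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# UTF-8 bytes misread as Latin-1 never fail: every byte maps to some
# character, so the result is silent mojibake.
print(utf8_bytes.decode("latin-1"))  # Â¿CÃ³mo estÃ¡s?

# Latin-1 bytes misread as UTF-8 are rejected here, because the lone
# high bytes do not form valid UTF-8 sequences.
try:
    latin1_bytes.decode("utf-8")
    looks_like_utf8 = True
except UnicodeDecodeError:
    looks_like_utf8 = False

print(looks_like_utf8)  # False
```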

**Steve Holden** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

Dan Bishop wrote:

> On Apr 12, 9:29 am, Carl Banks <pavlovevide...@gmail.com> wrote:
>
>> On Apr 12, 10:06 am, Kay Schluehr <kay.schlu...@gmx.net> wrote:
>>
>>> On 12 Apr., 14:44, Christian Heimes <li...@cheimes.de> wrote:
>>>
>>>> Gabriel Genellina wrote:
>>>>
>>>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>>>> above. But I get the same as repr(x) - is this on purpose?
>>>>
>>>> Yes, it's on purpose but it's a bug in your application to call str() on
>>>> a bytes object or to compare bytes and unicode directly. Several months
>>>> ago I added a bytes warning option to Python. Start Python as "python
>>>> -bb" and try it again. ;)
>>>
>>> And making an utf-8 encoding default is not possible without writing a
>>> new function?
>>
>> I believe the Zen in effect here is, "In the face of ambiguity, refuse
>> the temptation to guess." How do you know if the bytes are utf-8
>> encoded?
>
> True, you can't KNOW that. Maybe the author of those bytes actually
> MEANT to say 'Â¿CÃ³mo estÃ¡s?' instead of '¿Cómo estás?'. However,
> it's statistically unlikely for a non-UTF-8-encoded string to just
> happen to be valid UTF-8.

So you propose to perform a statistical analysis on your input to
determine whether it's UTF-8 or some other encoding?

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

**Gabriel Genellina** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

On Sat, 12 Apr 2008 11:25:59 -0300, Martin v. Löwis <martin@v.loewis.de>
wrote:

>> And making an utf-8 encoding default is not possible without writing a
>> new function?
>
> There is no default encoding anymore in Python 3. This is by design,
> learning from the problems in Python 2.x.

So sys.getdefaultencoding() will disappear? Currently it returns "utf-8".
In case it stays, what is it used for?

--
Gabriel Genellina

**Terry Reedy** · Jun 27 '08, 04:17 PM

Re: str(bytes) in Python 3.0

"John Roth" <johnroth1@gmail.com> wrote in message
news:29f280cc-4b33-4863-beee-d231df3d9a61@u3g2000hsc.googlegroups.com...
| On Apr 12, 8:52 am, j...@pobox.com (John J. Lee) wrote:
| > Christian Heimes <li...@cheimes.de> writes:
| >> Gabriel Genellina wrote:
| >>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
| >>> above. But I get the same as repr(x) - is this on purpose?
| >>
| >> Yes, it's on purpose but it's a bug in your application to call str() on
| >> a bytes object or to compare bytes and unicode directly. Several months
| >> ago I added a bytes warning option to Python. Start Python as "python
| >> -bb" and try it again. ;)
| >
| > Why hasn't the one-argument str(bytes_obj) been designed to raise an
| > exception in Python 3?
| >
| > John
|
| Because it's a fundamental rule that you should be able to call str()
| on any object and get a sensible result.
|
| The reason that calling str() on a bytes object returns a bytes
| literal rather than an unadorned character string is that there are no
| default encodings or decodings: there is no way of determining what
| the corresponding string should be.

In having a double meaning, str is much like type. type(obj) echoes the
existing class of the object. type(o, p, q) attempts to construct a new
class. Similarly, str(obj) gives a string representing the obj (which, for
a string, is the string ;-). str(obj, obj2) attempts to construct a new
string.
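
A short sketch of the parallel, assuming current CPython 3 behaviour. The class name `Point` and its `dims` attribute are invented for the example.

```python
# One argument: report something about an existing object.
print(type(3))   # <class 'int'>
print(str(3))    # 3

# Three arguments to type(): build a new class from (name, bases, namespace).
Point = type("Point", (object,), {"dims": 2})  # `Point` is a made-up example
print(Point.__name__, Point.dims)              # Point 2

# Two arguments to str(): build a new string by decoding bytes.
print(str(b"abc", "ascii"))                    # abc
```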

tjr
