Re: getc and "large" bytes
My case for distinguishability was in the part you snipped,
labeled "1a)". It derives from the Standard's requirement that
bytes read back from a binary stream must compare equal to those
written to it (on the same implementation, not counting trailing
zeroes, et cetera). If there are fewer `int' values than there
are `unsigned char' values, then by the pigeonhole principle there
must be at least one collision where two distinct `unsigned char'
values V1 and V2 convert to the same `int' value. Then this
code fragment

    putc(V1, stream);
    putc(V2, stream);
    rewind(stream);
    assert(getc(stream) == V1);
    assert(getc(stream) == V2);

... cannot succeed. (Yes, I know, it's very bad to generate
side-effects in an assert(), but this is just for illustration.)
"Upon further review," as they say in American football, I
guess an implementation could choose to report an I/O error if
it ever encountered V2, say, on input. (If "helpful," it would
also report an error for any attempt to write V2.) That would
give an extremely low QoI, but the Standard does not forbid I/O
operations from failing "predictably." (Indeed, on many systems
fopen("/", "w") will fail predictably.) So perhaps a sufficiently
bad implementation could in fact claim conformance even if unable
to read and write all `unsigned char' values, and this would allow
signed magnitude and ones' complement (and two's complement with
one trap representation).
And, of course, no argument based on the behavior of getc()
has any force for freestanding implementations.
--
Eric.Sosman@sun.com
Keith Thompson wrote:
Eric Sosman <Eric.Sosman@sun.com> writes:
[...]
> It seems to me that the behavior required of getc() places
> far-reaching requirements on implementations where `int' and
> `char' have the same width. Here are a few:
>
> 1) Since `unsigned char' can represent 2**N distinct values
> and all of these must be distinguishable when converted to `int',
> it follows that `int' must also have 2**N distinct values. Thus,
> signed-magnitude and ones' complement representations are ruled
> out, and INT_MIN must have its most negative possible value
> (that is, INT_MIN == -INT_MAX - 1, all-bits-set cannot be a trap
> representation).
>
How do you conclude that all 2**N distinct values of type unsigned
char must be distinguishable when converted to int? The result of the
conversion is implementation-defined. If, for example, int has the
range -32768 .. +32767, and unsigned char has the range 0 .. 65535, I
see nothing in the standard that forbids converting all unsigned char
values greater than 32767 to 32767 (saturation). It would break
stdio, but I'm not convinced that that would make it non-conforming
(particularly for a freestanding implementation that needn't provide
stdio).