Re: C Text/Binary Files
On Mon, 23 Jun 2008 14:43:50 -0700, Keith Thompson wrote:
> Harald van Dĳk <truedfx@gmail.com> writes:
> [...]
>> On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
>>> What is "\w"? It's not a standard escape sequence; its value is
>>> implementation-defined.
>>
>> "\w" does not match the syntax of a string literal, so by the rule of
>> the longest match this is tokenised as {"}{\}{w}{"}. The behaviour is
>> undefined if a double quote character occurs as a single token. There
>> need not be any value given to "\w", and if there is, it need not be
>> documented.
>
> "\w" is split into 4 preprocessor tokens:
>     " \ w "
> The " is not a punctuator; it's in the category "each non-white-space
> character that cannot be one of the above" (C99 6.4), which means the
> behavior is undefined.

Yes. This would normally cause nothing more than a constraint violation
(as you pointed out below) or syntax error, but in the special case of '
or ", the behaviour is explicitly undefined.

> In addition, though, this preprocessor token cannot be converted to a
> token. The constraint in 6.4p2 is:
>
>     Each preprocessing token that is converted to a token shall have the
>     lexical form of a keyword, an identifier, a constant, a string
>     literal, or a punctuator.
>
> So, assuming that "\w" isn't surrounded by something like "#if 0" ...
> "#endif", it would seem to be a constraint violation. By C99 5.1.1.3,
> this requires a diagnostic even if the behavior is also undefined.

That's a fair point, though I'm not sure this is intended. As I understand
it, the point of making a stray " undefined was (in part) to allow
implementations to support multi-line string literals as an extension. An
example similar to what I've posted on c.l.c before:

#define IGNORE(arg) /* nothing */
int main(void) {
 IGNORE(")
 void *p = 1;
 IGNORE(")
}

Strictly by the standard, the two identical lines are tokenised as
{IGNORE}{(}{"}{)}, which expands to nothing. So after preprocessing, a
non-zero integer constant is used to initialise a pointer, which violates
a constraint. Some implementations, however, are unable to diagnose this,
because they take the undefined behaviour of a stray " as permission to
tokenise the body of main as

{IGNORE}
{(}
{")\n void *p = 1;\n IGNORE("}
{)}

I believe that since the behaviour is undefined in translation phase 3,
any constraint violations in later phases should not require a diagnostic.
I cannot back this up with wording from the standard, only explain with
examples.

> Note that, by the same reasoning, "abcd\w" should be split into 5
> preprocessing tokens:
>
>     " abcd \ w "

Yes, and then by my interpretation, the behaviour is undefined, so an
implementation may choose to make this a single string literal, with or
without a diagnostic, without any requirement on generated code (if any).

> which just seems confusing. But since such cases require a diagnostic
> anyway, a compiler doesn't actually have to pp-tokenize it that way; as
> long as it prints a warning or error message, its job is done.
>
> Still, I think the description would have been simpler if a \ followed
> by any character in a character or string literal were allowed
> syntactically, with a constraint limiting the following character to the
> ones that are specified. Then "\w" would be a single pp-token and a
> single token (a string literal), with a diagnostic required because of
> the constraint violation.

Agreed.
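To make the longest-match splitting discussed in this thread concrete, here is a toy preprocessing tokenizer. It is a hypothetical sketch, not code from any real compiler: the names `pp_token`, `match_string_literal`, `pp_tokenize`, `token_is` and the `allow_newlines` flag are all made up for illustration. It only recognises string literals (no `L` prefix, no universal character names), identifiers, and single characters; pp-numbers, character constants and multi-character punctuators are left out. Setting `allow_newlines` models the kind of multi-line string extension described above for the IGNORE example, under the assumption that such an implementation simply lets a literal run across newlines.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A preprocessing token, recorded as a slice of the source text. */
struct pp_token {
    const char *start;
    size_t len;
};

/* Returns the length of the string literal starting at s (s[0] == '"'), or 0
   if the text does not match the string-literal syntax. Simplified: no L
   prefix and no universal character names (\u, \U). If allow_newlines is
   nonzero, a literal may span lines (a hypothetical extension). */
static size_t match_string_literal(const char *s, int allow_newlines)
{
    const char *p = s + 1;                      /* skip the opening quote */

    while (*p != '"') {
        if (*p == '\0' || (*p == '\n' && !allow_newlines))
            return 0;                           /* no closing quote */
        if (*p != '\\') {
            p++;                                /* ordinary character */
        } else if (p[1] != '\0' && strchr("'\"?\\abfnrtv", p[1]) != NULL) {
            p += 2;                             /* simple escape sequence */
        } else if (p[1] >= '0' && p[1] <= '7') {
            int digits = 0;                     /* octal: one to three digits */
            p++;
            while (digits < 3 && *p >= '0' && *p <= '7') {
                p++;
                digits++;
            }
        } else if (p[1] == 'x' && isxdigit((unsigned char)p[2])) {
            p += 2;                             /* hexadecimal escape */
            while (isxdigit((unsigned char)*p))
                p++;
        } else {
            return 0;                           /* e.g. \w matches no escape */
        }
    }
    return (size_t)(p - s) + 1;                 /* include the closing quote */
}

/* Splits src into preprocessing tokens by longest match, storing at most max
   of them in out, and returns how many were stored. Simplified: recognises
   string literals and identifiers; any other non-space character (including
   a stray " or \) becomes a one-character token. */
static size_t pp_tokenize(const char *src, struct pp_token *out, size_t max,
                          int allow_newlines)
{
    size_t count = 0;
    const char *p = src;

    assert(src != NULL && out != NULL);
    while (*p != '\0' && count < max) {
        size_t len;

        if (isspace((unsigned char)*p)) {
            p++;                                /* whitespace separates tokens */
            continue;
        }
        if (*p == '"' && (len = match_string_literal(p, allow_newlines)) > 0) {
            /* the whole literal is a single token */
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            len = 1;                            /* identifier */
            while (isalnum((unsigned char)p[len]) || p[len] == '_')
                len++;
        } else {
            len = 1;                            /* single character */
        }
        out[count].start = p;
        out[count].len = len;
        count++;
        p += len;
    }
    return count;
}

/* Returns nonzero if token t spells exactly text. */
static int token_is(struct pp_token t, const char *text)
{
    return t.len == strlen(text) && strncmp(t.start, text, t.len) == 0;
}
```

With `allow_newlines` set to 0, `"\w"` comes out as the four tokens `"`, `\`, `w`, `"`, and `"abcd\w"` as five, matching the splits quoted above. Feeding it the body of the IGNORE example gives fourteen tokens in strict mode, but only four (`IGNORE`, `(`, one long literal, `)`) with `allow_newlines` set to 1, which is the tokenisation that hides the `void *p = 1;` constraint violation.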
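The alternative suggested in the thread (accept a \ followed by any character syntactically, then enforce the permitted characters with a constraint) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `match_permissive_literal`, `escape_is_valid` and `diagnose_escapes` are hypothetical, only the simple, octal and hexadecimal escapes of C99 are checked, and universal character names are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Permissive syntax: inside the quotes, a backslash may be followed by any
   character except a newline. Returns the literal's length including both
   quotes, or 0 if it is unterminated. */
static size_t match_permissive_literal(const char *s)
{
    const char *p = s + 1;                      /* skip the opening quote */

    assert(s[0] == '"');
    while (*p != '"') {
        if (*p == '\0' || *p == '\n')
            return 0;
        if (*p == '\\') {
            if (p[1] == '\0' || p[1] == '\n')
                return 0;
            p += 2;                             /* backslash plus any character */
        } else {
            p++;
        }
    }
    return (size_t)(p - s) + 1;
}

/* The separate constraint: is the text after a backslash one of the escapes
   C99 specifies (simple, octal, or \x followed by a hex digit)? */
static int escape_is_valid(const char *after_backslash)
{
    char c = after_backslash[0];

    if (c != '\0' && strchr("'\"?\\abfnrtv", c) != NULL)
        return 1;
    if (c >= '0' && c <= '7')
        return 1;
    return c == 'x' && isxdigit((unsigned char)after_backslash[1]);
}

/* Checks the literal of length len starting at s against the constraint,
   printing one diagnostic per unknown escape. Returns the number found. */
static int diagnose_escapes(const char *s, size_t len)
{
    int errors = 0;

    for (size_t i = 1; i + 1 < len; i++) {
        if (s[i] != '\\')
            continue;
        if (!escape_is_valid(s + i + 1)) {
            fprintf(stderr, "diagnostic: unknown escape sequence '\\%c'\n", s[i + 1]);
            errors++;
        }
        i++;                                    /* skip the escaped character */
    }
    return errors;
}
```

Under this scheme `"\w"` is matched as one literal of length 4, and the constraint check then reports exactly one violation for it, so the diagnostic comes from a constraint rather than from undefined behaviour in tokenisation.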