C Text/Binary Files

  • Harald van Dĳk

    #16
    Re: C Text/Binary Files

    On Mon, 23 Jun 2008 14:43:50 -0700, Keith Thompson wrote:
    > Harald van Dþÿ3k <truedfx@gmail.com> writes:
    >> On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
    >>> What is "\w"? It's not a standard escape sequence; its value is
    >>> implementation-defined.
    >>
    >> "\w" does not match the syntax of a string literal, so by the rule of
    >> the longest match this is tokenised as {"}{\}{w}{"}. The behaviour is
    >> undefined if a double quote character occurs as a single token. There
    >> need not be any value given to "\w", and if there is, it need not be
    >> documented.
    > [...]
    > "\w" is split into 4 preprocessor tokens:
    > " \ w "
    > The " is not a punctuator; it's in the category "each non-white-space
    > character that cannot be one of the above" (C99 6.4), which means the
    > behavior is undefined.

    Yes. This would normally cause nothing more than a constraint violation
    (as you pointed out below) or syntax error, but in the special case of '
    or ", the behaviour is explicitly undefined.

    > In addition, though, this preprocessor token cannot be converted to a
    > token. The constraint in 6.4p2 is:
    >
    > Each preprocessing token that is converted to a token shall have the
    > lexical form of a keyword, an identifier, a constant, a string
    > literal, or a punctuator.
    >
    > So, assuming that "\w" isn't surrounded by something like "#if 0" ...
    > "#endif", it would seem to be a constraint violation. By C99 5.1.1.3,
    > this requires a diagnostic even if the behavior is also undefined.

    That's a fair point, though I'm not sure this is intended. As I understand
    it, the point of making a stray " undefined was (in part) to allow for
    implementations to support multi-line string literals as an extension. An
    example similar to what I've posted on c.l.c before:

    #define IGNORE(arg) /* nothing */
    int main(void) {
    IGNORE(")
    void *p = 1;
    IGNORE(")
    }

    Strictly by the standard, the two identical lines are tokenised as
    {IGNORE}{(}{"}{)}, which expands to nothing. So after preprocessing, a
    non-zero integer constant is used to initialise a pointer, which violates
    a constraint. Some implementations, however, are unable to diagnose this,
    because they take the undefined behaviour of a stray " as permission to
    tokenise the body of main as

    {IGNORE}
    {(}
    {")\n void *p = 1;\n IGNORE("}
    {)}

    I believe that since the behaviour is undefined in translation phase 3,
    any constraint violations in later phases should not require a diagnostic.
    I cannot back this up with wording from the standard, only explain with
    examples.

    > Note that, by the same reasoning, "abcd\w" should be split into 5
    > preprocessing tokens:
    >
    > " abcd \ w "

    Yes, and then by my interpretation, the behaviour is undefined, so an
    implementation may choose to make this a single string literal, with or
    without a diagnostic, without any requirement on generated code (if any).

    > which just seems confusing. But since such cases require a diagnostic
    > anyway, a compiler doesn't actually have to pp-tokenize it that way; as
    > long as it prints a warning or error message, its job is done.
    >
    > Still, I think the description would have been simpler if a \ followed
    > by any character in a character or string literal were allowed
    > syntactically, with a constraint limiting the following character to the
    > ones that are specified. Then "\w" would be a single pp-token and a
    > single token (a string literal), with a diagnostic required because of
    > the constraint violation.

    Agreed.


    • CBFalconer

      #17
      Re: C Text/Binary Files

      Ali Karaali wrote:
      >
      >> See section 7.19.5.4 of the standard for details.
      >
      > <snip>
      >
      > Anyway, How can I find out standard's documents?

      Some useful references about C:
      <http://www.ungerhu.com/jxh/clc.welcome.txt>
      <http://c-faq.com/> (C-faq)
      <http://benpfaff.org/writings/clc/off-topic.html>
      <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf> (C99)
      <http://cbfalconer.home.att.net/download/n869_txt.bz2> (C99, txt)
      <http://www.dinkumware.com/c99.aspx> (C-library)
      <http://gcc.gnu.org/onlinedocs/> (GNU docs)
      <http://clc-wiki.net/wiki/C_community:comp.lang.c:Introduction>

      --
      [mail]: Chuck F (cbfalconer at maineline dot net)
      [page]: <http://cbfalconer.home.att.net>
      Try the download section.


      ** Posted from http://www.teranews.com **


      • Keith Thompson

        #18
        Re: C Text/Binary Files

        [Apologies for the binary garbage I posted earlier. I'm having
        multiple system problems, and the system I'm now using apparently
        didn't like the non-ASCII character in Harald's last name. My "From:"
        address has also been incorrect in most of today's postings; the
        "kst@cts.com" address hasn't existed for several years.]

        Harald van D?k <truedfx@gmail.com> writes:
        > On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
        >> What is "\w"? It's not a standard escape sequence; its value is
        >> implementation-defined.
        >
        > "\w" does not match the syntax of a string literal, so by the rule
        > of the longest match this is tokenised as {"}{\}{w}{"}. The
        > behaviour is undefined if a double quote character occurs as a
        > single token. There need not be any value given to "\w", and if
        > there is, it need not be documented.

        I believe you're mostly or entirely right, and I was wrong.

        I misinterpreted the second clause of C99 6.4.4.4p10:

        The value of an integer character constant containing more than
        one character (e.g., 'ab'), or containing a character or escape
        sequence that does not map to a single-byte execution character,
        is implementation-defined.

        as applying to things like '\w'; instead, it applies to things like
        '\xffffffff'.

        "\w" is split into 4 preprocessor tokens:
        " \ w "
        The " is not a punctuator; it's in the category "each non-white-space
        character that cannot be one of the above" (C99 6.4), which means the
        behavior is undefined.

        In addition, though, this preprocessor token cannot be converted to a
        token. The constraint in 6.4p2 is:

        Each preprocessing token that is converted to a token shall have
        the lexical form of a keyword, an identifier, a constant, a string
        literal, or a punctuator.

        So, assuming that "\w" isn't surrounded by something like "#if 0"
        ... "#endif", it would seem to be a constraint violation. By C99
        5.1.1.3, this requires a diagnostic even if the behavior is also
        undefined.

        Note that, by the same reasoning, "abcd\w" should be split into 5
        preprocessing tokens:

        " abcd \ w "

        which just seems confusing. But since such cases require a diagnostic
        anyway, a compiler doesn't actually have to pp-tokenize it that way;
        as long as it prints a warning or error message, its job is done.

        Still, I think the description would have been simpler if a \ followed
        by any character in a character or string literal were allowed
        syntactically, with a constraint limiting the following character to
        the ones that are specified. Then "\w" would be a single pp-token and
        a single token (a string literal), with a diagnostic required because
        of the constraint violation.
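
The longest-match reasoning in this subthread can be illustrated with a toy tokenizer. The following is only a sketch under stated assumptions, not how any real compiler is structured: `match_string_literal` and `count_pp_tokens` are hypothetical names, and the tokenizer recognises only string literals, identifiers, and single characters (no pp-numbers, character constants, header names, multi-character punctuators, or universal character names). When the text at a `"` does not match the string-literal syntax, the `"` falls through as a single-character token, which is the case C99 6.4 leaves undefined.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a well-formed string literal starting at s,
   or 0 if the text at s does not match the string-literal syntax.
   Simplification: universal character names and the L prefix are ignored. */
static size_t match_string_literal(const char *s)
{
    size_t i = 1;

    if (s[0] != '"')
        return 0;
    for (;;) {
        unsigned char c = (unsigned char)s[i];

        if (c == '\0' || c == '\n')
            return 0;                   /* no closing quote on this line */
        if (c == '"')
            return i + 1;               /* closing quote found */
        if (c != '\\') {
            i++;                        /* ordinary s-char */
            continue;
        }
        c = (unsigned char)s[i + 1];    /* character after the backslash */
        if (c != '\0' && strchr("'\"?\\abfnrtv", c) != NULL) {
            i += 2;                     /* simple escape sequence */
        } else if (c >= '0' && c <= '7') {
            int n = 0;                  /* octal escape: 1 to 3 digits */
            i++;
            while (n < 3 && s[i] >= '0' && s[i] <= '7') {
                i++;
                n++;
            }
        } else if (c == 'x' && isxdigit((unsigned char)s[i + 2])) {
            i += 2;                     /* hex escape: 1 or more digits */
            while (isxdigit((unsigned char)s[i]))
                i++;
        } else {
            return 0;                   /* e.g. \w: not an escape sequence */
        }
    }
}

/* Hypothetical helper: count pp-tokens in s using the longest-match rule. */
size_t count_pp_tokens(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        size_t len;

        if (isspace((unsigned char)*s)) {
            s++;
            continue;
        }
        len = match_string_literal(s);
        if (len == 0 && (isalpha((unsigned char)*s) || *s == '_')) {
            len = 1;                    /* identifier */
            while (isalnum((unsigned char)s[len]) || s[len] == '_')
                len++;
        }
        if (len == 0)
            len = 1;                    /* single character, e.g. a stray " or \ */
        s += len;
        count++;
    }
    return count;
}
```

Under these assumptions, `count_pp_tokens` applied to the four characters `"\w"` yields 4 and applied to `"abcd\w"` yields 5, matching the counts given above, while `"abcd\n"` is a single token.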

        --
        Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
        Nokia
        "We must do something. This is something. Therefore, we must do this."
        -- Antony Jay and Jonathan Lynn, "Yes Minister"


        • Joachim Schmitz

          #19
          Re: C Text/Binary Files

          Bartc wrote:
          > "Keith Thompson" <kst@cts.com> wrote in message
          > news:lzk5gfkjne.fsf@stalkings.ghoti.net...
          >> "Bartc" <bc@freeuk.com> writes:
          >>> "Bartc" <bc@freeuk.com> wrote in message
          >>> news:LCA7k.14088$E41.12364@text.news.virginmedia.com...
          >>>> The stdin/stdout files of C seem to be always in Text mode.
          >>>
          >>> Thanks for the replies.
          >>>
          >>> I think if I use exclusively "\w" for newlines (ie. "\r\n") in
          >>> strings and
          >>> internal functions that generate newlines, then this will work for
          >>> binary files.
          >> [...]
          >>
          >> What is "\w"? It's not a standard escape sequence; its value is
          >> implementation-defined.
          >
          > Sorry. In my original post I'd indicated (not very clearly) that \w
          > was a new escape in a language I was creating to wrap around C.
          >
          > So it's not a C escape but is translated to "\r\n". It represents
          > 'windows newline'; (or more generally, the full newline sequence used
          > in the target OS).

          So where then does your '\w' differ from C's '\n'? In Windows '\n' results
          in CR LF, in UNIX in LF, in MacOS in CR (or the other way round?), on other
          platforms in whatever that platform uses to separate lines.

          Bye, Jojo



          • Bartc

            #20
            Re: C Text/Binary Files

            "Joachim Schmitz" <nospam.jojo@schmitz-digital.de> wrote in message
            news:g3qc5m$qic$1@online.de...
            > Bartc wrote:
            >> "Keith Thompson" <kst@cts.com> wrote in message
            >> news:lzk5gfkjne.fsf@stalkings.ghoti.net...
            >>> What is "\w"? It's not a standard escape sequence; its value is
            >>> implementation-defined.
            >>
            >> Sorry. In my original post I'd indicated (not very clearly) that \w
            >> was a new escape in a language I was creating to wrap around C.
            >>
            >> So it's not a C escape but is translated to "\r\n". It represents
            >> 'windows newline'; (or more generally, the full newline sequence used
            >> in the target OS).
            >
            > So where then does your '\w' differ from C's '\n'? In Windows '\n' results
            > in CR LF, in UNIX in LF, in MacOS in CR (or the other way round?), on
            > other platforms in whatever that platform uses to separate lines.

            \w expands to \r\n (eg. CR,LF) at compile-time (in the other language).
            \n stays as \n (typically LF) at compile-time.

            \n only expands to all those other combinations at runtime, and only for
            text modes.
            At runtime, \w would result in \r followed by the expansion of \n, for text
            modes.

            Actual code:
            printf("Hello World\w")

            After translating to C:
            printf("Hello World\r\n");

            At runtime (using printf, stdout directed to a file):
            150C:0100 48 65 6C 6C 6F 20 57 6F-72 6C 64 0D 0D 0A 30 3A Hello World...0:
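
The difference between compile-time escapes and run-time text-mode translation described here can be checked with a small experiment. This is a sketch: `bytes_written` is a hypothetical helper name and the file name is chosen by the caller. It writes a string through a stream opened in the given mode, then reopens the file in binary mode and counts the bytes actually stored.

```c
#include <stdio.h>

/* Hypothetical helper: write text to path using the given fopen mode
   ("w" for text, "wb" for binary), then reopen the file in binary mode
   and return the number of bytes stored, or -1 on error. */
long bytes_written(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    long n = 0;

    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");             /* binary mode: no translation on read */
    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Calling it with "Hello World\r\n" and mode "wb" should report 13 bytes on common systems. With mode "w" the result depends on the platform: an implementation that expands '\n' to CR LF in text mode would report 14 (the 0D 0D 0A seen in the dump above), while a Unix-like system would report 13.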

            --
            Bartc



            • pete

              #21
              Re: C Text/Binary Files

              Joachim Schmitz wrote:
              > Bartc wrote:
              >> "Keith Thompson" <kst@cts.com> wrote in message
              >> news:lzk5gfkjne.fsf@stalkings.ghoti.net...
              >>> "Bartc" <bc@freeuk.com> writes:
              >>>> "Bartc" <bc@freeuk.com> wrote in message
              >>>> news:LCA7k.14088$E41.12364@text.news.virginmedia.com...
              >>>>> The stdin/stdout files of C seem to be always in Text mode.
              >>>>
              >>>> Thanks for the replies.
              >>>>
              >>>> I think if I use exclusively "\w" for newlines (ie. "\r\n") in
              >>>> strings and
              >>>> internal functions that generate newlines, then this will work for
              >>>> binary files.
              >>> [...]
              >>>
              >>> What is "\w"? It's not a standard escape sequence; its value is
              >>> implementation-defined.
              >>
              >> Sorry. In my original post I'd indicated (not very clearly) that \w
              >> was a new escape in a language I was creating to wrap around C.
              >>
              >> So it's not a C escape but is translated to "\r\n". It represents
              >> 'windows newline'; (or more generally, the full newline sequence used
              >> in the target OS).
              >
              > So where then does your '\w' differ from C's '\n'? In Windows '\n' results
              > in CR LF, in UNIX in LF, in MacOS in CR (or the other way round?), on other
              > platforms in whatever that platform uses to separate lines.

              I think Bartc just doesn't grok text mode.

              --
              pete


              • Richard Tobin

                #22
                Re: C Text/Binary Files

                In article <tD38k.14756$E41.11751@text.news.virginmedia.com>,
                Bartc <bc@freeuk.com> wrote:
                >\w expands to \r\n (eg. CR,LF) at compile-time (in the other language).
                >\n stays as \n (typically LF) at compile-time.
                I can see this might be useful for writing to binary files in the
                system's native text format.

                It's limited to systems where the line break is represented by a
                sequence of characters: it doesn't make sense on systems with lines
                implemented in some other way (e.g. with a count). Of course, you may
                not consider that important nowadays.

                For a purely C solution you could just define a macro; e.g. for
                Windows

                #define LINEEND "\015\012"

                and you can use it easily in constant strings

                "hello" LINEEND "world" LINEEND

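A minimal sketch of how the macro might be used when writing a binary file follows. `write_lines_binary` is a hypothetical helper, not something from the post; it relies only on adjacent string literal concatenation and on `fwrite` to a stream opened with "wb", so no newline translation takes place.

```c
#include <stdio.h>

#define LINEEND "\015\012"   /* CR LF, as defined in the post above */

/* Hypothetical helper: write two LINEEND-terminated lines to path in
   binary mode. Returns the number of bytes written, or -1 on error. */
int write_lines_binary(const char *path)
{
    /* Adjacent string literals are concatenated by the compiler. */
    static const char text[] = "hello" LINEEND "world" LINEEND;
    const size_t len = sizeof text - 1;     /* exclude the terminating '\0' */
    size_t written;
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    written = fwrite(text, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return (int)len;
}
```

The concatenated literal is 14 characters long (5 + 2 + 5 + 2), so a successful call returns 14.
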
                -- Richard
                --
                In the selection of the two characters immediately succeeding the numeral 9,
                consideration shall be given to their replacement by the graphics 10 and 11 to
                facilitate the adoption of the code in the sterling monetary area. (X3.4-1963)


                • BigRelax

                  #23
                  Re: C Text/Binary Files

                  Hello ``
                  I am a student from china.
                  I like c.

                  If you make a friend with me, I am very happy.
                  My MSN ID is bigrelax@live.cn

                  --
                  Message posted using http://www.talkaboutprogramming.com/group/comp.lang.c/
                  More information at http://www.talkaboutprogramming.com/faq.html


                  • santosh

                    #24
                    Re: C Text/Binary Files

                    BigRelax wrote:
                    > Hello ``
                    > I am a student from china.
                    > I like c.
                    >
                    > If you make a friend with me, I am very happy.
                    > My MSN ID is bigrelax@live.cn

                    This is not a group for "making friends" or idle chit-chat. If you have
                    questions or problems about standard C, post them here.

                    Complain to the maintainer of the above forum that the signature
                    separator that they add is broken.


                    • CBFalconer

                      #25
                      Re: C Text/Binary Files

                      santosh wrote:
                      > BigRelax wrote:
                      >
                      >> ... snip ...
                      >
                      > Complain to the maintainer of the above forum that the signature
                      > separator that they add is broken.

                      Your comment would be much more useful if you pointed out how it
                      was broken. It requires a line containing exactly "-- ". Note the
                      terminal space.
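
A check for this convention might look like the following sketch. `is_sig_separator` is a hypothetical helper name; it assumes the line is passed as a C string that may still carry its line terminator, which is stripped before comparing against dash, dash, space.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if line is exactly "-- " (dash, dash,
   space), ignoring a trailing "\n" or "\r\n"; returns 0 otherwise. */
int is_sig_separator(const char *line)
{
    size_t len = strlen(line);

    /* Strip the line terminator, if any. */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;
    return len == 3 && memcmp(line, "-- ", 3) == 0;
}
```

With this definition, "-- " and "-- \n" are accepted, while "--" (no trailing space) and "--  " (two spaces) are rejected.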

                      --
                      [mail]: Chuck F (cbfalconer at maineline dot net)
                      [page]: <http://cbfalconer.home.att.net>
                      Try the download section.


                      ** Posted from http://www.teranews.com **
