fgets() replacement

  • Paul D. Boyle

    fgets() replacement

    Hi all,

    There was a recent thread in this group which talked about the
    shortcomings of fgets(). I decided to try my hand at writing a
    replacement for fgets() using fgetc() and realloc() to read a line of
    arbitrary length. I know that the better programmers in this group could
    write a more robust function, but here is my shot at it anyway.
    I would appreciate people's comments on my fget_line() code below
    (usage example included). Any constructive criticism welcome regarding
    logic, design, style, etc. Thanks.

    Paul

    /* fget_line(): a function to read a line of input of arbitrary length.
    *
    * Arguments:
    * 'in' -- the input stream from which data is wanted.
    * 'buf' -- the address of a pointer to char. The read in results
    * will be contained in this buffer after the fget_line returns.
    * *** THE CALLER MUST FREE THIS POINTER ***
    * 'sz' -- the caller can supply an estimate of the length of line to be
    * read in. If this argument is 0, then fget_line() uses a
    * default.
    * 'validate' -- a user supplied callback function which is used to validate
    * each input character. This argument may be NULL in which
    * case no input validation is done.
    *
    * RETURN values:
    * fget_line() on success: returns the number of bytes read
    * realloc() related failure: returns -1 (#define'd below as ERROR_MEMORY)
    * illegal input: returns -2 (#define'd below as ERROR_ILLEGAL_CHAR)
    */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <errno.h>

    #ifndef USER_DIALOG_H /* if I am not using this as part of my
    * 'user_dialog' utilities.
    */
    #define LINE_LEN 80
    #define DELIMITER '\n'
    #define ERROR_MEMORY (-1)
    #define ERROR_ILLEGAL_CHAR (-2)
    #else
    #include "user_dialog.h"
    #endif

    int fget_line( FILE *in, char **buf, size_t sz, int (*validate)(int ) )
    {
        int n_bytes = 0;    /* total number of bytes read or error flag */
        size_t n_alloc = 0; /* number of bytes allocated */

        unsigned int mult = 1;

        char *tmp, *local_buffer = NULL;
        int read_in;
        *buf = NULL;

        if( 0 == sz ) sz = LINE_LEN;

        while( (read_in = fgetc( in )) != DELIMITER && read_in != EOF ) {
            if( 0 == n_alloc ) {
                n_alloc = sz * mult + n_bytes + 1;
                tmp = realloc( local_buffer, n_alloc );
                if ( NULL != tmp ) {
                    local_buffer = tmp;
                    mult++;
                }
                else {
                    local_buffer[n_bytes] = '\0';
                    *buf = local_buffer;
                    return ERROR_MEMORY;
                }

            }
            if( NULL != validate ) {
                if( 0 != validate( read_in ) ) {
                    local_buffer[n_bytes++] = read_in;
                    n_alloc--;
                }
                else {
                    local_buffer[n_bytes] = '\0';
                    *buf = local_buffer;
                    return ERROR_ILLEGAL_CHAR;
                }
            }

        }

        local_buffer[n_bytes] = '\0';

        /* trim excess memory if any */
        if( n_alloc > (size_t)n_bytes ) {
            tmp = realloc( local_buffer, n_bytes );
            if( NULL != tmp ) {
                local_buffer = tmp;
            }
        }

        local_buffer[n_bytes] = '\0';
        *buf = local_buffer;
        return n_bytes;
    }

    /* usage example */
    int main( void )
    {
        char *line = NULL;
        int ret_value;
        size_t len;

        fputs( "Enter a string: ", stdout );
        fflush( stdout );

        ret_value = fget_line( stdin, &line, 0, isalnum );
        len = strlen( line );

        fprintf( stdout, "fget_line( ) returned %d\nstrlen() returns %lu bytes\n",
                 ret_value, (unsigned long)len );
        fprintf( stdout, "String is: \"%s\"\n", line );
        free( line );
        exit( EXIT_SUCCESS );
    }




    --
    Paul D. Boyle
    boyle@laue.chem.ncsu.edu
    North Carolina State University

  • Malcolm

    #2
    Re: fgets() replacement


    "Paul D. Boyle" <boyle@laue.chem.ncsu.edu> wrote
    > Hi all,
    >
    > There was a recent thread in this group which talked about the
    > shortcomings of fgets(). I decided to try my hand at writing a
    > replacement for fgets() using fgetc() and realloc() to read a line of
    > arbitrary length.
    >
    Why not look at CB Falconer's "ggets()"?

    <http://cbfalconer.home.att.net>

    or email him on

    Chuck F (cbfalconer@yahoo.com)

    Do you think your way of doing things is better?



    • Paul D. Boyle

      #3
      Re: fgets() replacement

      Malcolm <malcolm@55bank.freeserve.co.uk> wrote:

      : "Paul D. Boyle" <boyle@laue.chem.ncsu.edu> wrote
      :> Hi all,
      :>
      :> There was a recent thread in this group which talked about the
      :> shortcomings of fgets(). I decided to try my hand at writing a
      :> replacement for fgets() using fgetc() and realloc() to read a line of
      :> arbitrary length.
      :>
      : Why not look at CB Falconer's "ggets()"?

      I did, but I wanted to try doing it (mostly for the heck of it) using
      fgetc() and realloc().

      : Do you think your way of doing things is better?

      I don't see it as a matter of better. I wrote a function which did the
      things I thought would make a safe and useful function, and I wanted
      other people's opinion of what I had done. In particular, in fget_line(),
      I provided a way to do some input validation. Was that a good and useful
      design decision?

      Paul

      --
      Paul D. Boyle
      boyle@laue.chem.ncsu.edu
      North Carolina State University


      • Paul D. Boyle

        #4
        Re: fgets() replacement

        Paul D. Boyle <boyle@laue.chem.ncsu.edu> wrote:
        : if( NULL != validate ) {
        : if( 0 != validate( read_in ) ) {
        : local_buffer[n_bytes++] = read_in;
        : n_alloc--;
        : }
        : else {
        : local_buffer[n_bytes] = '\0';
        : *buf = local_buffer;
        : return ERROR_ILLEGAL_CHAR;
        : }
        : }
        :
        : }

        : local_buffer[n_bytes] = '\0';

        Naturally, I have to discover a little(?) bug *after* I post to
        comp.lang.c (grrr). I am missing an 'else' block to cover the case
        where the 'validate' function pointer is NULL. The above code should be:

            if( NULL != validate ) {
                if( 0 != validate( read_in ) ) {
                    local_buffer[n_bytes++] = read_in;
                    n_alloc--;
                }
                else {
                    local_buffer[n_bytes] = '\0';
                    *buf = local_buffer;
                    return ERROR_ILLEGAL_CHAR;
                }
            }
            else {
                local_buffer[n_bytes++] = read_in;
                n_alloc--;
            }

        }

        local_buffer[n_bytes] = '\0';

        /* and so on ... */

        Paul


        --
        Paul D. Boyle
        boyle@laue.chem.ncsu.edu
        North Carolina State University


        • Eric Sosman

          #5
          Re: fgets() replacement

          Paul D. Boyle wrote:
          > Hi all,
          >
          > There was a recent thread in this group which talked about the
          > shortcomings of fgets(). I decided to try my hand at writing a
          > replacement for fgets() using fgetc() and realloc() to read a line of
          > arbitrary length. I know that the better programmers in this group could
          > write a more robust function, but here is my shot at it anyway.
          > I would appreciate people's comments on my fget_line() code below
          > (usage example included). Any constructive criticism welcome regarding
          > logic, design, style, etc. Thanks.
          >
          > Paul
          >
          > /* fget_line(): a function to read a line of input of arbitrary length.
          > *
          > * Arguments:
          > * 'in' -- the input stream from which data is wanted.
          > * 'buf' -- the address of a pointer to char. The read in results
          > * will be contained in this buffer after the fget_line returns.
          > * *** THE CALLER MUST FREE THIS POINTER ***
          > * 'sz' -- the caller can supply an estimate of the length of line to be
          > * read in. If this argument is 0, then fget_line() uses a
          > * default.
          > * 'validate' -- a user supplied callback function which is used to validate
          > * each input character. This argument may be NULL in which
          > * case no input validation is done.
          > *
          > * RETURN values:
          > * fget_line() on success: returns the number of bytes read
          > * realloc() related failure: returns -1 (#define'd below as ERROR_MEMORY)
          > * illegal input: returns -2 (#define'd below as ERROR_ILLEGAL_CHAR)
          > */

          First criticism: The function does too much. This is, of
          course, a matter of taste, but if the goal is "a replacement
          for fgets()" I think the validate() business is extraneous.
          (Even the `sz' parameter raises my eyebrows a little, albeit
          not a lot.)

          IMHO, a low-level library function should do one thing,
          do it well, and do it in a manner that facilitates combining
          it with other functions to create grander structures. Or as
          my old fencing coach used to admonish me when I got overenthused
          with tricky multiple-feint combinations: "Keep It Simple, Stupid."

          By the way, you've described the purpose of validate() but
          not how it is supposed to operate. What value(s) should it
          return to cause fget_line() to take this or that action? To
          find this out one must read the code of fget_line() -- and that,
          I think, is a poor form for documentation.
          > #include <stdio.h>
          > #include <stdlib.h>
          > #include <string.h>
          > #include <ctype.h>
          > #include <errno.h>
          >
          > #ifndef USER_DIALOG_H /* if I am not using this as part of my
          > * 'user_dialog' utilities.
          > */

          Here, I think, is another clue that the design leans too
          far towards the baroque. In effect, USER_DIALOG_H and the
          associated macros become part of the function's specification.
          That specification now encompasses one return value encoding
          three distinguishable states, four function arguments (two
          with usage rules not expressible to the compiler), and five
          macros. That strikes me as *way* too much for "a replacement
          for fgets()."

          ("All right, Smarty Pants, how would *you* do it?")

          Fair enough. Everybody, it seems, writes an fgets()
          replacement eventually, and here's the .h text for mine:

          char *getline(FILE *stream);
          /*
          * Reads a complete line from an input stream, stores
          * it and a NUL terminator in an internal buffer, and
          * returns a pointer to the start of the buffer. Returns
          * NULL if end-of-file occurs before any characters are
          * read, or if an I/O error occurs at any time, or if
          * unable to allocate buffer memory (in the latter cases,
          * any characters read before the I/O error or allocation
          * failure are lost). If the argument is NULL, frees the
          * internal buffer and returns NULL.
          *
          * A "complete line" consists of zero or more non-newline
          * characters followed by a newline, or one or more non-
          * newline characters followed by EOF.
          */

          Now, I'm not saying that this is the only way to replace fgets().
          I'm not even claiming it's the "best" way; some choices have been
          made that could just as well have been made differently. The
          point is to show that the specification can be a whole lot sparer
          than for fget_line() and still be useful. (In fact, my original
          getline() was sparer still: that "discard the buffer on a NULL
          argument" gadget was warted on afterwards. It's ugly and very
          little used; I may decide to go back to the simpler form.)

          Now, on to the implementation itself.
          > #define LINE_LEN 80
          > #define DELIMITER '\n'
          > #define ERROR_MEMORY (-1)
          > #define ERROR_ILLEGAL_CHAR (-2)

          Pointless parentheses.
          > #else
          > #include "user_dialog.h"
          > #endif
          >
          > int fget_line( FILE *in, char **buf, size_t sz, int (*validate)(int ) )
          > {
          > int n_bytes = 0; /* total number of bytes read or error flag */
          > size_t n_alloc = 0; /* number of bytes allocated */

          This stands out as an Odd Thing: You're using a `size_t' to
          keep track of the allocated buffer's size, but a mere `int' to
          count the characters therein. Ah, yes: You also want to return
          negative values to indicate errors! But that doesn't excuse the
          type of `n_bytes', because the error codes are never stored in
          it; they're always transmitted as part of a `return' statement.

          ... in connection with which, I wonder about the wisdom of
          using an `int' as the function's value. Maybe a `long' would
          be better? At any rate, if you feel you must use `int' you
          should at least guard against lines longer than INT_MAX.
          > unsigned int mult = 1;
          >
          > char *tmp, *local_buffer = NULL;
          > int read_in;
          > *buf = NULL;
          >
          > if( 0 == sz ) sz = LINE_LEN;
          >
          > while( (read_in = fgetc( in )) != DELIMITER && read_in != EOF ) {

          Why fgetc() instead of getc()? In this instance they're
          functionally equivalent, but getc() is likely to have less
          overhead.
          > if( 0 == n_alloc ) {
          > n_alloc = sz * mult + n_bytes + 1;
          > tmp = realloc( local_buffer, n_alloc );
          > if ( NULL != tmp ) {
          > local_buffer = tmp;
          > mult++;
          > }
          > else {
          > local_buffer[n_bytes] = '\0';

          Undefined behavior if the very first realloc() fails,
          because `local_buffer' will still be NULL.
          > *buf = local_buffer;
          > return ERROR_MEMORY;
          > }
          >
          > }
          > if( NULL != validate ) {
          > if( 0 != validate( read_in ) ) {
          > local_buffer[n_bytes++] = read_in;
          > n_alloc--;
          > }
          > else {
          > local_buffer[n_bytes] = '\0';
          > *buf = local_buffer;
          > return ERROR_ILLEGAL_CHAR;
          > }
          > }

          You mentioned that `validate' could be given as NULL,
          but somehow you didn't mention that doing so would suppress
          *all* the input ...
          > }
          >
          > local_buffer[n_bytes] = '\0';

          Undefined behavior if you get EOF or DELIMITER on the
          very first fgetc(), because `local_buffer' will still be
          NULL.
          > /* trim excess memory if any */
          > if( n_alloc > (size_t)n_bytes ) {

          I think this test is wrong: You've been decrementing
          `n_alloc' with each character stored, so it is no longer
          the size of the allocated area. (The record-keeping of
          sizes in this function seems to involve a lot more work
          than is really necessary. Two variables should suffice
          for the job; fget_line() uses four.)
          > tmp = realloc( local_buffer, n_bytes );
          > if( NULL != tmp ) {
          > local_buffer = tmp;
          > }
          > }
          >
          > local_buffer[n_bytes] = '\0';

          What, again? Didn't we already do this, just a few
          lines ago? Oh, wait, it's different this time:

          Undefined behavior if the "trim excess" realloc()
          *succeeds*, because it writes beyond the end of the memory
          pointed to by `local_buffer'.
          > *buf = local_buffer;
          > return n_bytes;
          > }

          General impressions: The design is overcomplicated, the
          implementation is more intricate than even the complex
          design requires, insufficient attention has been paid to
          boundary conditions, and insufficient testing has been done.

          "Simplify! Simplify!" -- H.D. Thoreau

          --
          Eric.Sosman@sun.com


          • CBFalconer

            #6
            Re: fgets() replacement

            "Paul D. Boyle" wrote:
            > Malcolm <malcolm@55bank.freeserve.co.uk> wrote:
            >: "Paul D. Boyle" <boyle@laue.chem.ncsu.edu> wrote
            >:>
            >:> There was a recent thread in this group which talked about the
            >:> shortcomings of fgets(). I decided to try my hand at writing a
            >:> replacement for fgets() using fgetc() and realloc() to read a
            >:> line of arbitrary length.
            >:>
            >: Why not look at CB Falconer's "ggets()"?
            >
            > I did, but I wanted to try doing it (mostly for the heck of it)
            > using fgetc() and realloc().
            >
            >: Do you think your way of doing things is better?
            >
            > I don't see it as a matter of better. I wrote a function which
            > did the things I thought would make a safe and useful function,
            > and I wanted other people's opinion of what I had done. In
            > particular, in fget_line(), I provided a way to do some input
            > validation. Was that a good and useful design decision?

            Thanks, Malcolm, for the kind words. The mail address you gave
            will reach my spam trap. The URL is good.

            My objective was to simplify the calling sequence as far as
            possible. So I didn't look at yours in detail. Your validation
            idea may well be useful in some areas. I don't think the
            preliminary size estimate is worthwhile, but that is just an
            opinion.

            If I were creating a routine with such input validation, I would
            probably simply pass it a routine to input a char, say "rdchar()",
            returning EOF on error or invalid. I have grave doubts that such
            will be useful in string input. For stream conversion to integer,
            real, etc. the tests belong in the conversion function. Again,
            IMO.

            --
            fix (vb.): 1. to paper over, obscure, hide from public view; 2.
            to work around, in a way that produces unintended consequences
            that are worse than the original problem. Usage: "Windows ME
            fixes many of the shortcomings of Windows 98 SE". - Hutchison


            • Dan Pop

              #7
              Re: fgets() replacement

              In <c97ugq$9h4$1@news8.svr.pol.co.uk> "Malcolm" <malcolm@55bank.freeserve.co.uk> writes:

              >"Paul D. Boyle" <boyle@laue.chem.ncsu.edu> wrote
              >> Hi all,
              >>
              >> There was a recent thread in this group which talked about the
              >> shortcomings of fgets(). I decided to try my hand at writing a
              >> replacement for fgets() using fgetc() and realloc() to read a line of
              >> arbitrary length.
              >>
              >Why not look at CB Falconer's "ggets()"?

              Because this is the kind of wheel most programmers prefer to reinvent
              on their own. I'm still using scanf and friends, since I have yet to
              write an application where getting arbitrarily long input lines makes
              sense.

              Dan
              --
              Dan Pop
              DESY Zeuthen, RZ group
              Email: Dan.Pop@ifh.de


              • Eric Sosman

                #8
                Re: fgets() replacement

                Dan Pop wrote:
                > In <c97ugq$9h4$1@news8.svr.pol.co.uk> "Malcolm" <malcolm@55bank.freeserve.co.uk> writes:
                >
                >>"Paul D. Boyle" <boyle@laue.chem.ncsu.edu> wrote
                >>
                >>>Hi all,
                >>>
                >>>There was a recent thread in this group which talked about the
                >>>shortcomings of fgets(). I decided to try my hand at writing a
                >>>replacement for fgets() using fgetc() and realloc() to read a line of
                >>>arbitrary length.
                >>>
                >>
                >>Why not look at CB Falconer's "ggets()"?
                >
                > Because this is the kind of wheel most programmers prefer to reinvent
                > on their own. I'm still using scanf and friends, since I have yet to
                > write an application where getting arbitrarily long input lines makes
                > sense.

                Arbitrarily long input lines are quite likely senseless.
                But the problem's really the other side of the issue: Arbitrarily
                *short* input lines -- meaning, "Input lines artificially truncated
                to a length J. Random Programmer chose at compile time" -- are
                not very sensible, either. (Off-topic aside: look up "curtation"
                in "The Computer Contradictionary," or Google for "MOZDONG." These
                are limitations on output rather than input, but the idea is similar.)

                The utility of an fgets() substitute/wrapper/whatever isn't
                that one is now free to read "lines" of umpty-skillion gigabytes,
                but that one can stop worrying about the line length altogether.

                --
                Eric.Sosman@sun.com


                • Dan Pop

                  #9
                  Re: fgets() replacement

                  In <40BCA5FF.6000401@sun.com> Eric Sosman <Eric.Sosman@sun.com> writes:

                  >Dan Pop wrote:
                  >> In <c97ugq$9h4$1@news8.svr.pol.co.uk> "Malcolm" <malcolm@55bank.freeserve.co.uk> writes:
                  >>
                  >>>"Paul D. Boyle" <boyle@laue.chem.ncsu.edu> wrote
                  >>>
                  >>>>Hi all,
                  >>>>
                  >>>>There was a recent thread in this group which talked about the
                  >>>>shortcomings of fgets(). I decided to try my hand at writing a
                  >>>>replacement for fgets() using fgetc() and realloc() to read a line of
                  >>>>arbitrary length.
                  >>>
                  >>>Why not look at CB Falconer's "ggets()"?
                  >>
                  >> Because this is the kind of wheel most programmers prefer to reinvent
                  >> on their own. I'm still using scanf and friends, since I have yet to
                  >> write an application where getting arbitrarily long input lines makes
                  >> sense.
                  >
                  > Arbitrarily long input lines are quite likely senseless.
                  >But the problem's really the other side of the issue: Arbitrarily
                  >*short* input lines -- meaning, "Input lines artificially truncated
                  >to a length J. Random Programmer chose at compile time" -- are
                  >not very sensible, either.

                  Most of the time, there are perfectly sensible limits that can be imposed
                  on the user input. As a matter of fact, I have yet to see a
                  counterexample. And when the user input is obtained interactively, the
                  user can be warned of those limits, in the text prompting him to
                  provide the input.

                  Of course, there is always the option of treating a line longer than the
                  limit as erroneous input and completely rejecting it, rather than
                  truncating it. It's up to the programmer to decide what makes more sense
                  in the presence of nonsensical input...
                  > The utility of an fgets() substitute/wrapper/whatever isn't
                  >that one is now free to read "lines" of umpty-skillion gigabytes,
                  >but that one can stop worrying about the line length altogether.

                  But it can be trivially abused into reading umpty-skillion gigabytes,
                  unless it imposes a limit ;-)

                  Dan
                  --
                  Dan Pop
                  DESY Zeuthen, RZ group
                  Email: Dan.Pop@ifh.de


                  • Malcolm

                    #10
                    Re: fgets() replacement


                    "Dan Pop" <Dan.Pop@cern.ch> wrote in message
                    > Most of the time, there are perfectly sensible limits that can be
                    > imposed on the user input. As a matter of fact, I have yet to see a
                    > counterexample. And when the user input is obtained interactively, the
                    > user can be warned of those limits, in the text prompting him to
                    > provide the input.
                    >
                    What limit should be imposed on a line of BASIC? Most human-readable code is
                    under 100 characters, but some code might be machine-generated, and someone
                    might add a long string as a single line.
                    >
                    > But it can be trivially abused into reading umpty-skillion gigabytes,
                    > unless it imposes a limit ;-)
                    >
                    My BASIC interpreter uses a recursive-descent parser, so very long
                    expressions could overflow the stack. There is actually a case for imposing
                    a line limit, though of course stack size / usage is hard to determine.



                    • James Kanze

                      #11
                      Re: fgets() replacement

                      "Malcolm" <malcolm@55bank.freeserve.co.uk> writes:

                      |> "Dan Pop" <Dan.Pop@cern.ch> wrote in message

                      |> > Most of the time, there are perfectly sensible limits that can be
                      |> > imposed on the user input. As a matter of fact, I have yet to see
                      |> > a counterexample. And when the user input is obtained
                      |> > interactively, the user can be warned of those limits, in the text
                      |> > prompting him to provide the input.

                      |> What limit should be imposed on a line of BASIC? Most human-readable
                      |> code is under 100 characters, but some code might be
                      |> machine-generated, and someone might add a long string as a single
                      |> line.

                      Take a look at the sources of any web page sometime. Many of the web
                      page editors put an entire paragraph on a single line.

                      --
                      James Kanze
                      Conseils en informatique orientée objet/
                      Beratung in objektorientierter Datenverarbeitung
                      9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34


                      • red floyd

                        #12
                        Re: fgets() replacement

                        Eric Sosman <Eric.Sosman@sun.com> wrote in message news:<40B79100.8030500@sun.com>...
                        > [extremely long and well reasoned post redacted]
                        > > #define ERROR_MEMORY (-1)
                        > > #define ERROR_ILLEGAL_CHAR (-2)
                        >
                        > Pointless parentheses.
                        Actually, no. It avoids errors in the case of (admittedly bad)

                        ch=ERROR_MEMORY;

                        Which would expand to

                        ch=-1;

                        I don't know if the standard formally made "=-" illegal, or kept the
                        K&R deprecation, but with older compilers, this might be an issue.

                        In general, I tend to make sure that any macros that include a - or a
                        ~ are parenthesized, to avoid issues with macro expansion and operator
                        precedence.


                        • Eric Sosman

                          #13
                          Re: fgets() replacement

                          red floyd wrote:
                          > Eric Sosman <Eric.Sosman@sun.com> wrote in message news:<40B79100.8030500@sun.com>...
                          >
                          >>[extremely long and well reasoned post redacted]
                          >>
                          >>>#define ERROR_MEMORY (-1)
                          >>>#define ERROR_ILLEGAL_CHAR (-2)
                          >>
                          >> Pointless parentheses.
                          >
                          > Actually, no. It avoids errors in the case of (admittedly bad)
                          >
                          > ch=ERROR_MEMORY;
                          >
                          > Which would expand to
                          >
                          > ch=-1;

                          Not in a Standard-conforming compiler. The preprocessor
                          sees the original line as containing four preprocessing tokens

                          ch
                          =
                          ERROR_MEMORY
                          ;

                          After macro substitution there are five (if the parentheses
                          are removed from the definition):

                          ch
                          =
                          -
                          1
                          ;

                          The two preprocessing tokens `=' and `-' are not magically
                          joined together into a `=-' preprocessing token. It is often
                          said that the preprocessor performs textual substitution, but
                          this is just loose talk: The preprocessor operates on text
                          that has already been tokenized. You can test this by
                          trying to construct a "modern" two-character operator:

                          #define PLUS +
                          int i = 0;
                          i PLUS= 42;

                          ... and scrutinizing the error messages you get.

                          Of course, your concern is with pre-Standard compilers,
                          some of whose preprocessors did in fact work with text even
                          though (IIRC) K&R said they shouldn't. The resulting bugs
                          were occasionally useful, as in the much-abused

                          #define PASTE(x,y) x/**/y

                          ... for which the C89 Standard had to invent a new notation.
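For comparison, here is a minimal sketch of that C89 notation, the `##` token-pasting operator. Unlike `i PLUS= 42;`, pasting `+` and `=` with `##` really does produce the single `+=` token. The macro name `PASTE` is reused from the example above, and the function `paste_demo()` is hypothetical.

```c
/* C89 token pasting: ## joins two preprocessing tokens into one,
 * taking over the job of the old empty-comment trick. */
#define PASTE(x, y) x ## y

int paste_demo(void)
{
    int PASTE(count, 1) = 0;   /* pastes to the identifier count1 */
    count1 PASTE(+, =) 42;     /* pastes + and = into the single += token */
    return count1;
}
```

`paste_demo()` returns 42. The result of a paste must itself be a valid preprocessing token, which `count1` and `+=` both are.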

                          > I don't know if the standard formally made "=-" illegal, or kept the
                          > K&R deprecation, but with older compilers, this might be an issue.

                          When I first encountered C in the late 1970's, these
                          operators had already been respelled to their current forms.
                          The K&R of that vintage mentioned that some "older" compilers
                          might still be found in the wild somewhere. Now, thirty-plus
                          years further along, such compilers deserve a stronger word
                          than just "older."

                          Let's put it this way: What other accommodations do you
                          make on behalf of these "older than older" compilers? Do you
                          avoid using prototypes? Do you avoid `unsigned char'? Do you
                          avoid `long double', or `long'? Do you cast the result of
                          malloc()? Do you refrain from using `size_t' and `time_t'?
                          Do you steer clear of <stdarg.h>? Do you ... well, never mind:
                          The list is already long, and the point is already made.

                          > In general, I tend to make sure that any macros that include a - or a
                          > ~ are parenthesized, to avoid issues with macro expansion and operator
                          > precedence.

                          Well, it's your choice. It's certainly harmless, and it
                          may help keep strong the habit of parenthesizing value-producing
                          macros in general. But in this case it's not necessary and hasn't
                          been necessary for going on three decades.
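To illustrate the distinction, here is a sketch using hypothetical macros. A bare `-1` behaves exactly like `(-1)` because unary minus binds tighter than any binary operator, whereas an unparenthesized compound body such as `1 + 1` changes meaning next to `*`. The latter case is where the habit of parenthesizing value-producing macros actually pays off.

```c
/* Hypothetical macros comparing bare and parenthesized bodies. */
#define NEG_ONE_BARE  -1
#define NEG_ONE_PAREN (-1)
#define TWO_BARE      1 + 1
#define TWO_PAREN     (1 + 1)

/* Unary minus binds tighter than '*', so both of these give -k. */
int neg_bare(int k)  { return NEG_ONE_BARE * k; }
int neg_paren(int k) { return NEG_ONE_PAREN * k; }

/* TWO_BARE * k expands to 1 + 1 * k, i.e. 1 + k, not 2 * k. */
int two_bare(int k)  { return TWO_BARE * k; }
int two_paren(int k) { return TWO_PAREN * k; }
```

With k = 3 the four functions return -3, -3, 4 and 6 respectively.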

                          --
                          Eric.Sosman@sun.com


                          • Dan Pop

                            #14
                            Re: fgets() replacement

                            In <c9l3e9$r3o$1@news5.svr.pol.co.uk> "Malcolm" <malcolm@55bank.freeserve.co.uk> writes:

                            >"Dan Pop" <Dan.Pop@cern.ch> wrote in message
                            >> Most of the time, there are perfectly sensible limits that can be
                            >> imposed on the user input. As a matter of fact, I have yet to see a
                            >> counterexample. And when the user input is obtained interactively, the
                            >> user can be warned of those limits, in the text prompting him to
                            >> provide the input.
                            >>
                            >What limit should be imposed on a line of BASIC? Most human-readable code is
                            >under 100 characters, but some code might be machine-generated, and someone
                            >might add a long string as a single line.

                            The only reason I can imagine for reading a whole line of BASIC code
                            into a buffer is to implement the interactive line editing capability
                            of a simple-minded BASIC interpreter.

                            If this is a small system, like the typical ones using such BASIC
                            interpreters, I'd use all the memory available in the system and
                            give an error if the line still doesn't fit (a common situation when the
                            system is running out of memory: my Spectrum used to beep when I wanted
                            to edit a line too large to be copied into the remaining free memory).

                            On a larger system, I'd use something like a 10..100k buffer and give an
                            error if the line doesn't fit. This is not a context where silent
                            truncation makes any sense.
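A minimal sketch of the "fixed limit, explicit error" approach described above, assuming the caller supplies the buffer (of whatever size, e.g. somewhere in the 10k to 100k range). The function `read_line_limited()` and its `LINE_*` return codes are hypothetical. A line that does not fit is consumed up to its newline and reported as an error instead of being silently truncated.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical return codes for this sketch. */
#define LINE_OK         0  /* a complete line is in buf (newline removed) */
#define LINE_EOF      (-1) /* end of file or read error */
#define LINE_TOO_LONG (-2) /* line exceeded the buffer and was discarded */

/* Read one line from 'in' into 'buf', which holds 'cap' bytes
 * (2 <= cap <= INT_MAX). Over-long lines are reported, not truncated. */
int read_line_limited(FILE *in, char *buf, size_t cap)
{
    size_t len;
    int c;

    if (fgets(buf, (int)cap, in) == NULL)
        return LINE_EOF;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';          /* whole line fit, newline included */
        return LINE_OK;
    }

    /* No newline: either the buffer filled up or the input ended. Peek at
     * the next character to tell an exactly-full line from an over-long one. */
    c = getc(in);
    if (c == '\n')
        return LINE_OK;               /* line fit exactly; newline consumed */
    if (c == EOF)
        return ferror(in) ? LINE_EOF : LINE_OK;  /* last line, no newline */

    /* Over-long line: discard the rest of it so the next call starts
     * on a fresh line, and report the error instead of truncating. */
    while ((c = getc(in)) != EOF && c != '\n')
        ;
    buf[0] = '\0';
    return LINE_TOO_LONG;
}
```

Called repeatedly with a 6-byte buffer on a stream containing "hi\nhello\ntoolongline\nend", it returns LINE_OK ("hi"), LINE_OK ("hello"), LINE_TOO_LONG, LINE_OK ("end"), and then LINE_EOF.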

                            Dan
                            --
                            Dan Pop
                            DESY Zeuthen, RZ group
                            Email: Dan.Pop@ifh.de


                            • Dan Pop

                              #15
                              Re: fgets() replacement

                              In <40BE49B4.2020605@sun.com> Eric Sosman <Eric.Sosman@sun.com> writes:
                              > When I first encountered C in the late 1970's, these
                              >operators had already been respelled to their current forms.
                              >The K&R of that vintage mentioned that some "older" compilers
                              >might still be found in the wild somewhere.

                              OTOH, when I first encountered VAX C, in the late 1980's, it was still
                              supporting the anachronistic operators mentioned in K&R1 as things of the
                              past... It was after a long debugging session, trying to understand why
                              the compiler generated the "wrong" code for "i=-1;" (I was fluent in VAX
                              assembly and the assembly output of the compiler simply didn't make any
                              sense), that I realised that white space is really my friend when not
                              coerced to the rigours of fixed-form FORTRAN.
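A small sketch of the two readings of "i=-1;", assuming a conforming modern compiler. The old `=-` operator meant what is now spelled `-=`, so a compiler that still honoured it would compute the second function's result where a modern one computes the first. The function names are hypothetical. Writing `i = -1;` with a space after the `=` avoids relying on either reading.

```c
/* A conforming compiler tokenizes "i=-1;" as  i = - 1 ;  so i becomes -1. */
int assign_minus_one(void)
{
    int i = 5;
    i=-1;
    return i;
}

/* The anachronistic "=-" meant today's "-=", so a compiler that still
 * accepted it would have treated "i=-1;" like this, leaving i equal to 4. */
int legacy_reading(void)
{
    int i = 5;
    i -= 1;
    return i;
}
```

Starting from i = 5, the modern reading yields -1 while the anachronistic one yields 4, which is the kind of discrepancy that shows up only in the generated code.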

                              >Now, thirty-plus
                              >years further along, such compilers deserve a stronger word
                              >than just "older."

                              Yet, the only qualifier we can apply to them is pre-ANSI...

                              Dan
                              --
                              Dan Pop
                              DESY Zeuthen, RZ group
                              Email: Dan.Pop@ifh.de
