Line input and implementation-defined behaviour

  • Enrico `Trippo' Porreca

    Line input and implementation-defined behaviour

    Both the K&R book and Steve Summit's tutorial define a getline() function
    that correctly tests the return value of getchar() against EOF.

    I know that getchar() returns either EOF or the character read, as an
    unsigned char converted to int.

    Since char may be signed (and if so, the return value of getchar() could
    be outside its range), doesn't the commented line in the following code
    produce implementation-defined behaviour?

    char s[SIZE];
    int c;
    size_t i = 0;

    while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
        s[i] = c; /* ??? */
        i++;
    }

    s[i] = '\0';

    If this is indeed implementation defined, is there any solution?

    --
    Enrico `Trippo' Porreca

  • Simon Biber

    #2
    Re: Line input and implementation-defined behaviour

    "Enrico `Trippo' Porreca" <trippo@lombardiacom.it> wrote:
    > Since char may be signed (and if so, the return value of getchar()
    > would be outside its range), doesn't the commented line in the
    > following code produce implementation-defined behaviour?

    Almost. If a character is read whose code is out of the range of
    signed char, it produces an implementation-defined result, or an
    implementation-defined signal is raised. This is not quite as bad
    as implementation-defined behaviour, but almost.

    > char s[SIZE];
    > int c;
    > size_t i = 0;
    >
    > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    > s[i] = c; /* ??? */
    > i++;
    > }
    >
    > s[i] = '\0';
    >
    > If this is indeed implementation defined, is there any solution?

    If char is signed, and the value of the character is outside the
    range of signed char, then you have an out-of-range conversion to
    a signed integer type, so: "either the result is implementation-defined
    or an implementation-defined signal is raised." (C99 6.3.1.3#3)

    However, because this is such an incredibly common operation in
    existing C code, an implementor would be absolutely idiotic to
    define this to have any undesired effects.
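
The out-of-range case described above can be detected before the assignment. A minimal sketch, assuming only <limits.h>; the helper name fits_in_char is hypothetical and does not come from the thread:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 when c, a non-EOF value returned by
   getchar(), is representable in plain char, so that `s[i] = c` is an
   ordinary in-range conversion. Returns 0 when C99 6.3.1.3#3 applies. */
int fits_in_char(int c)
{
    /* A non-EOF getchar() result is always an unsigned char value. */
    assert(c >= 0 && c <= UCHAR_MAX);
    return c >= CHAR_MIN && c <= CHAR_MAX;
}
```

Where plain char is unsigned, every getchar() result fits. Where it is signed, the values above CHAR_MAX are the ones whose conversion is implementation-defined.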

    --
    Simon.


    • Enrico `Trippo' Porreca

      #3
      Re: Line input and implementation-defined behaviour

      Simon Biber wrote:
      >>char s[SIZE];
      >>int c;
      >>size_t i = 0;
      >>
      >>while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
      >> s[i] = c; /* ??? */
      >> i++;
      >>}
      >>
      >>s[i] = '\0';
      >>
      >>If this is indeed implementation defined, is there any solution?
      >
      > If char is signed, and the value of the character is outside the
      > range of signed char, then you have an out-of-range conversion to
      > a signed integer type, so: "either the result is implementation-defined
      > or an implementation-defined signal is raised." (C99 6.3.1.3#3)
      >
      > However, because this is such an incredibly common operation in
      > existing C code, an implementor would be absolutely idiotic to
      > define this to have any undesired effects.

      I agree, but AFAIK the implementor is allowed to be an idiot...
      Am I right?

      Is the following a plausible solution (i.e. without any trap
      representation or type conversion or something-defined behaviour problem)?

      char s[SIZE];
      unsigned char *t = (unsigned char *) s;
      int c;
      size_t i = 0;

      while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
          t[i] = c; /* ??? */
          i++;
      }

      s[i] = '\0';

      --
      Enrico `Trippo' Porreca

      • Simon Biber

        #4
        Re: Line input and implementation-defined behaviour

        "Enrico `Trippo' Porreca" <trippo@lombardiacom.it> wrote:
        > I agree, but AFAIK the implementor is allowed to be idiot...
        > Am I right?

        Yes, but trust me, anyone who fouled up the char<->int conversion
        would break a large proportion of existing code that is considered
        to be completely portable. Therefore their implementation would
        not sell.

        Consider the <ctype.h> functions, which require that the input is
        an int whose value is within the range of unsigned char. That is
        why we suggest that people cast to unsigned char like this:
        char *p, s[] = "hello";
        for(p = s; *p; p++)
            *p = toupper((unsigned char)*p);
        If the value of *p is negative, converting it to unsigned char
        yields a positive value outside the range of signed char. That
        value could theoretically also be outside the range of int, if int
        and signed char have the same range. So you have the same situation
        in reverse: the unsigned char to int conversion is not guaranteed
        to be within range.
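
The cast to unsigned char recommended here can be wrapped in a small helper. A sketch, assuming the default "C" locale; the function name upcase_in_place is made up for illustration:

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: upper-cases a string in place. The cast to
   unsigned char keeps the argument of toupper() within the range
   that <ctype.h> requires, even where plain char is signed. */
void upcase_in_place(char *s)
{
    char *p;

    assert(s != 0);
    for (p = s; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
}
```

Storing the int result of toupper() back into *p is the same int-to-char conversion this thread is about, so the helper is only as portable as that conversion.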
        
        > Is the following a plausible solution (i.e. without any trap
        > representation or type conversion or something-defined behaviour
        > problem)?
        >
        > char s[SIZE];
        > unsigned char *t = (unsigned char *) s;
        > int c;
        > size_t i = 0;
        >
        > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
        > t[i] = c; /* ??? */

        The assignment itself is safe, but since it places an arbitrary
        representation into the elements of the array s, which are char
        objects and possibly signed, it might generate a trap
        representation. That is if signed char can have trap
        representations. I'm not completely sure.
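
The store through an unsigned char lvalue proposed in the quoted code can be isolated in a helper. This is a sketch only; store_byte is a hypothetical name, and whether the char object may afterwards hold a trap representation is exactly the open question here:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: writes c (a non-EOF getchar() result) into the
   char object at dst through an unsigned char lvalue, so no conversion
   to a possibly signed type takes place. */
void store_byte(char *dst, int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);
    *(unsigned char *)dst = (unsigned char)c;
}
```

Reading the byte back through an unsigned char lvalue always yields the value that was stored; what a read through plain char yields is the part that depends on the implementation.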
        
        > i++;
        > }
        >
        > s[i] = '\0';

        --
        Simon.


        • Malcolm

          #5
          Re: Line input and implementation-defined behaviour

          "Simon Biber" <news@ralminNOSPAM.cc> wrote in message
          >
          > > char s[SIZE];
          > > unsigned char *t = (unsigned char *) s;
          > > int c;
          > > size_t i = 0;
          > >
          > > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
          > > t[i] = c; /* ??? */

          s[i] = 0;
          >
          > The assignment itself is safe, but since it places an arbitrary
          > representation into the elements of the array s, which are char
          > objects and possibly signed, it might generate a trap
          > representation. That is if signed char can have trap
          > representations. I'm not completely sure.
          >
          Signed chars can trap. Unsigned chars are guaranteed to be able to hold
          arbitrary data, so they cannot.
          You would have to be desperately unlucky for the implementation to allow
          non-chars to be read in from stdin, and then for the function to trap. The
          most likely place for the trap to trigger would be the assignment s[i] = 0,
          since the compiler probably won't realise that pointer t actually points to
          a buffer declared as straight char.


          • Peter Nilsson

            #6
            Re: Line input and implementation-defined behaviour

            "Malcolm" <malcolm@55bank.freeserve.co.uk> wrote in message
            news:bl52k9$ure$1@news6.svr.pol.co.uk...
            >
            > "Simon Biber" <news@ralminNOSPAM.cc> wrote in message
            > >
            > > > char s[SIZE];
            > > > unsigned char *t = (unsigned char *) s;
            > > > int c;
            > > > size_t i = 0;
            > > >
            > > > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
            > > > t[i] = c; /* ??? */
            >
            > s[i] = 0;
            > >
            > > The assignment itself is safe, but since it places an arbitrary
            > > representation into the elements of the array s, which are char
            > > objects and possibly signed, it might generate a trap
            > > representation. That is if signed char can have trap
            > > representations. I'm not completely sure.
            > >
            > signed chars can trap. unsigned chars are guaranteed to be able to hold
            > arbitrary data so cannot.
            > You would have to be desperately unlucky for the implementation to allow
            > non-chars to be read in from stdin, and then for the function to trap. The
            > most likely place for the trap to trigger would be the assignment s[i] = 0,

            0 is a value in the range of signed char, so it is not possible for a
            conforming compiler to replace the contents of object s[i] with a trap
            representation.

            [You can always initialise an uninitialised automatic variable, for instance,
            even if its uninitialised state is a trap representation.]

            > since the compiler probably won't realise that pointer t actually points to
            > a buffer declared as straight char.

            You seem to be confusing 'trap representations' with 'traps'. The latter term
            is commonly used for raised exceptions on many architectures. A trap
            representation, in and of itself, need not raise an exception.

            Indeed, whilst the standards allow signed char to have trap representations,
            sections like 6.2.6.1p5 effectively say that all reads via character lvalues
            are privileged. So at worst, it would seem, reading a character trap
            representation will only yield an unspecified value. [Non-trapping trap
            representations!]
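
As an illustration of what reading through a character lvalue looks like in practice, here is a small sketch. The helper name count_nonzero_bytes is hypothetical; it inspects an object's representation only through unsigned char, which cannot hold a trap representation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the non-zero bytes in the object
   representation of *obj, reading it only through unsigned char
   lvalues. */
size_t count_nonzero_bytes(const void *obj, size_t n)
{
    const unsigned char *p = obj;
    size_t i, count = 0;

    assert(obj != NULL);
    for (i = 0; i < n; i++)
        if (p[i] != 0)
            count++;
    return count;
}
```

The same pattern works for a char buffer filled from input: the bytes can be examined as unsigned char regardless of how plain char would interpret them.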

            --
            Peter


            • Malcolm

              #7
              Re: Line input and implementation-defined behaviour

              "Peter Nilsson" <airia@acay.com.au> wrote in message
              >
              > > The most likely place for the trap to trigger would be the assignment
              > > s[i] = 0,
              >
              > 0 is a value in the range of signed char, so it is not possible for a
              > conforming compiler to replace the contents of object s[i] with a trap
              > representation.
              >
              What I meant was that the assignment may trigger the trap, if illegal
              characters are stored into the array s. This is because values from s may be
              loaded into registers as chars.
              >
              > Indeed, whilst the standards allow signed char to have trap
              > representations, sections like 6.2.6.1p5 effectively say that all reads via
              > character lvalues are privileged. So at worst, it would seem, reading a
              > character trap representation will only yield an unspecified value.
              > [Non-trapping trap representations!]
              >
              It seems it would be unacceptable for the line

              fgets(line, sizeof line, fp);

              to cause a program abort if fed an illegal character, with nothing the
              programmer can do to stop it. OTOH reads are the most likely way for corrupt
              data to get into the program, and the whole point of trap representations is to
              close down any program that is malfunctioning.



              • Enrico `Trippo' Porreca

                #8
                Re: Line input and implementation-defined behaviour

                Simon Biber wrote:
                >> I agree, but AFAIK the implementor is allowed to be idiot...
                >> Am I right?
                >
                > Yes, but trust me, anyone who fouled up the char<->int conversion
                > would break a large proportion of existing code that is considered
                > to be completely portable. Therefore their implementation would
                > not sell.

                Uhm... So I think I should use K&R's getline(), without being too
                paranoid about it...

                Thanks.

                --
                Enrico `Trippo' Porreca

                • CBFalconer

                  #9
                  Re: Line input and implementation-defined behaviour

                  Enrico `Trippo' Porreca wrote:
                  > Simon Biber wrote:
                  >
                  > >> I agree, but AFAIK the implementor is allowed to be idiot...
                  > >> Am I right?
                  > >
                  > > Yes, but trust me, anyone who fouled up the char<->int conversion
                  > > would break a large proportion of existing code that is considered
                  > > to be completely portable. Therefore their implementation would
                  > > not sell.
                  >
                  > Uhm... So I think I should use K&R's getline(), without being too
                  > paranoid about it...

                  Consider ggets, available at:

                  <http://cbfalconer.home.att.net/download/>

                  which has the convenience of gets without the insecurities.

                  --
                  Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
                  Available for consulting/temporary embedded and systems.
                  <http://cbfalconer.home.att.net> USE worldnet address!


                  • Dan Pop

                    #10
                    Re: Line input and implementation-defined behaviour

                    In <3f75cf48$0$4189$afc38c87@news.optusnet.com.au> "Simon Biber" <news@ralminNOSPAM.cc> writes:

                    >"Enrico `Trippo' Porreca" <trippo@lombardiacom.it> wrote:
                    >> Since char may be signed (and if so, the return value of getchar()
                    >> would be outside its range), doesn't the commented line in the
                    >> following code produce implementation-defined behaviour?
                    >
                    >Almost. If a character is read whose code is out of the range of
                    >signed char, it produces an implementation-defined result, or an
                    >implementation-defined signal is raised. This is not quite as bad
                    >as implementation-defined behaviour, but almost.

                    No implementation-defined signal is raised in C89 and I strongly doubt
                    that any *real* C99 implementation would do that, breaking existing C89
                    code.

                    Dan
                    --
                    Dan Pop
                    DESY Zeuthen, RZ group
                    Email: Dan.Pop@ifh.de

                    • Simon Biber

                      #11
                      Re: Line input and implementation-defined behaviour

                      Added comp.std.c - we are discussing the effect of conversion of
                      an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
                      "either the result is implementation-defined or an
                      implementation-defined signal is raised."

                      "Dan Pop" <Dan.Pop@cern.ch> wrote:
                      > "Simon Biber" <news@ralminNOSPAM.cc> writes:
                      > >Almost. If a character is read whose code is out of the range of
                      > >signed char, it produces an implementation-defined result, or an
                      > >implementation-defined signal is raised. This is not quite as bad
                      > >as implementation-defined behaviour, but almost.
                      >
                      > No implementation-defined signal is raised in C89 and I strongly doubt
                      > that any *real* C99 implementation would do that, breaking existing C89
                      > code.

                      Why was the 'implementation-defined signal' for signed integer
                      conversions added in C99? Was there some implementation that
                      required it, in order to be conforming?

                      --
                      Simon.


                      • Clive D. W. Feather

                        #12
                        Re: Line input and implementation-defined behaviour

                        In article <3f789d28$0$26924$afc38c87@news.optusnet.com.au>, Simon Biber
                        <news@ralminNOSPAM.cc> writes
                        >Added comp.std.c - we are discussing the effect of conversion of
                        >an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
                        > "either the result is implementation-defined or an
                        > implementation-defined signal is raised."
                        [...]
                        >Why was the 'implementation-defined signal' for signed integer
                        >conversions added in C99? Was there some implementation that
                        >required it, in order to be conforming?

                        No.

                        However, the point was raised - and many of us considered it a good one
                        - that the C89 Standard *requires* the silent generation of a nonsense
                        value with no easy way to detect that fact. In some programming
                        situations ("mission-critical code"), you'd much rather the compiler
                        generated code to trap this case and alert you in some way - a panic is
                        far better than a bad value slipping into a later calculation.

                        So we decided to offer this option to the compiler writer. There's no
                        requirement to take it, but it's available.

                        --
                        Clive D.W. Feather, writing for himself | Home: <clive@davros.org>
                        Tel: +44 20 8371 1138 (work) | Web: <http://www.davros.org>
                        Fax: +44 870 051 9937 | Work: <clive@demon.net>
                        Written on my laptop; please observe the Reply-To address

                        • Douglas A. Gwyn

                          #13
                          Re: Line input and implementation-defined behaviour

                          "Clive D. W. Feather" wrote:
                          > Simon Biber <news@ralminNOSPAM.cc> writes
                          > >Added comp.std.c - we are discussing the effect of conversion of
                          > >an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
                          > > "either the result is implementation-defined or an
                          > > implementation-defined signal is raised."
                          > >Why was the 'implementation-defined signal' for signed integer
                          > >conversions added in C99? Was there some implementation that
                          > >required it, in order to be conforming?
                          > However, the point was raised - and many of us considered it a good one
                          > - that the C89 Standard *requires* the silent generation of a nonsense
                          > value with no easy way to detect that fact. In some programming
                          > situations ("mission-critical code"), you'd much rather the compiler
                          > generated code to trap this case and alert you in some way - a panic is
                          > far better than a bad value slipping into a later calculation.

                          Note that not everybody involved agrees with that reasoning.
                          In fact it is fundamentally flawed, since such conversions
                          can occur at translation time (within the #if
                          constant-expression) but the signal is an execution-time notion.

                          • Dan Pop

                            #14
                            Re: Line input and implementation-defined behaviour

                            In <iSuee7DwZTe$EwgC@romana.davros.org> "Clive D. W. Feather" <clive@on-the-train.demon.co.uk> writes:

                            >In article <3f789d28$0$26924$afc38c87@news.optusnet.com.au>, Simon Biber
                            ><news@ralminNOSPAM.cc> writes
                            >>Added comp.std.c - we are discussing the effect of conversion of
                            >>an out-of-range value to a signed integer type, as in C99 6.3.1.3#3
                            >> "either the result is implementation-defined or an
                            >> implementation-defined signal is raised."
                            >[...]
                            >>Why was the 'implementation-defined signal' for signed integer
                            >>conversions added in C99? Was there some implementation that
                            >>required it, in order to be conforming?
                            >
                            >No.
                            >
                            >However, the point was raised - and many of us considered it a good one
                            >- that the C89 Standard *requires* the silent generation of a nonsense
                            >value with no easy way to detect that fact.

                            C89 offers a very easy way of detecting it, where it actually matters:
                            compare the value before the conversion to the limits of the target type.

                            It also allows the detection of these limits, when they are not known at
                            compile time (see below).
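
The check described here, comparing the value against the limits of the target type before converting, might look like the following. A sketch with signed char as the target; the name to_schar_checked and its 0/1 return convention are assumptions made for illustration:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: stores v in *out and returns 1 when v is
   representable in signed char; otherwise leaves *out untouched and
   returns 0, so no out-of-range conversion is ever performed. */
int to_schar_checked(long v, signed char *out)
{
    assert(out != 0);
    if (v < SCHAR_MIN || v > SCHAR_MAX)
        return 0;
    *out = (signed char)v;
    return 1;
}
```

Because the conversion only happens for in-range values, the result does not depend on what the implementation does for out-of-range ones.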
                            
                            >In some programming
                            >situations ("mission-critical code"), you'd much rather the compiler
                            >generated code to trap this case and alert you in some way - a panic is
                            >far better than a bad value slipping into a later calculation.

                            A panic is seldom desirable in mission-critical code and there is no way
                            to recover after the generation of such a signal without invoking
                            undefined behaviour. Therefore, mission-critical code has to do it the
                            C89 way, anyway.

                            >So we decided to offer this option to the compiler writer. There's no
                            >requirement to take it, but it's available.

                            It breaks portable C89 code that attempts to find the maximum value
                            that can be represented in an unknown signed integer type, say type_t:

                            unsigned long max = -1;

                            while ((type_t)max < 0 || (type_t)max != max) max >>= 1;

                            So, it is perfectly possible to write C89 code that is immune to
                            nonsensical values resulting from the conversion. There is NO way
                            to rewrite this code in *portable* C99.
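
For reference, the loop above can be turned into a complete function. A sketch, with type_t chosen as short purely for illustration (the post leaves it unknown); the wrapper name find_type_max is hypothetical, and the inner comparison spells out the conversion to unsigned long that the original leaves implicit. It relies on the out-of-range conversion yielding some value rather than raising a signal:

```c
#include <assert.h>
#include <limits.h>

typedef short type_t; /* assumption: stands in for the unknown signed type */

/* Hypothetical wrapper around the loop from the post: shifts an
   all-ones unsigned long right until it converts to type_t unchanged. */
unsigned long find_type_max(void)
{
    unsigned long max = -1;

    while ((type_t)max < 0 || (unsigned long)(type_t)max != max)
        max >>= 1;

    /* Every signed integer type can represent at least SCHAR_MAX. */
    assert(max >= SCHAR_MAX);
    return max;
}
```

Where the conversion yields a value, the function returns the largest value of type_t. Under C99 an implementation may raise a signal instead, which is the portability concern raised in this post.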

                            Dan
                            --
                            Dan Pop
                            DESY Zeuthen, RZ group
                            Email: Dan.Pop@ifh.de

                            • lawrence.jones@eds.com

                              #15
                              Re: Line input and implementation-defined behaviour

                              In comp.std.c Simon Biber <news@ralminnospam.cc> wrote:
                              >
                              > Why was the 'implementation-defined signal' for signed integer
                              > conversions added in C99? Was there some implementation that
                              > required it, in order to be conforming?

                              Because raising an "overflow" signal is an entirely reasonable thing to
                              do in that situation. In C89, it wasn't entirely clear whether
                              "implementation-defined behavior" allowed that or not, but in C99 it's
                              perfectly clear that it does not, so the explicit license was added.

                              -Larry Jones

                              This sounds suspiciously like one of Dad's plots to build my character.
                              -- Calvin
