integer overflow in scanf functions

  • vid512@gmail.com

    integer overflow in scanf functions

    hi.

    i wanted to know why the scanf functions don't check for overflow
    when reading a number. For example, scanf("%d") on a 32-bit machine
    treats "1" and "4294967297" as the same.

    I traced the code to where the conversion itself happens. The code in
    the scanf functions just ignores the return value from the conversion
    procedures.

    More info in case of glibc posted here:


    AFAIK, the implementation doesn't define the behavior in case of
    overflow, so glibc could treat this as an error and set errno=ERANGE
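
For anyone wondering why "1" and "4294967297" can come out the same: 4294967297 is 2^32 + 1, so keeping only the low 32 bits leaves 1. A minimal sketch of that arithmetic follows. It uses unsigned types so the wraparound is well defined, and the helper name `wrap_to_32_bits` is made up for illustration. It only shows the arithmetic; it does not claim anything about what a particular scanf does internally.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 32 bits of a value.
 * Conversion to an unsigned type is defined as reduction modulo 2^32,
 * so this is well defined, unlike the scanf("%d") overflow case. */
static uint32_t wrap_to_32_bits(unsigned long long value)
{
    return (uint32_t)value;
}
```

Calling `wrap_to_32_bits(4294967297ULL)` gives 1, the same as `wrap_to_32_bits(1ULL)`.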

  • Walter Roberson

    #2
    Re: integer overflow in scanf functions

    In article <1166211027.569204.109900@79g2000cws.googlegroups.com>,
    vid512@gmail.com <vid512@gmail.com> wrote:
    >i wanted to know why doesn't the scanf functions check for overflow
    >when reading number. For example scanf("%d" on 32bit machine considers
    >"1" and "4294967297" to be the same.

    Because that's how it is spec'd.

    "An input item is defined as the longest matching sequence of
    characters, unless that exceeds a specified field width, in
    which case it is the initial subsequence of that length in
    the sequence." [...]

    "Except in the case of a % specifier, the input item (or, in the
    case of a %n directive, the count of input characters) is
    converted to a type appropriate for the conversion specifier. [...]
    Unless assignment suppression was indicated by a *, the result
    of the conversion is placed in the object pointed to by the first
    argument following the format argument that has not
    already received a conversion result. If this object does not
    have an appropriate type, or if the result of the conversion cannot
    be represented in the space provided, the behaviour is undefined."


    So there you have it: if you didn't put in a field width, then
    the %d is *required* to pull in all the decimal digits there, and
    if that's too big for an int, then the result is officially undefined.
    This is how fscanf (and hence scanf) are -required- to work according
    to the standard.
    --
    I was very young in those days, but I was also rather dim.
    -- Christopher Priest


    • Random832

      #3
      Re: integer overflow in scanf functions

      2006-12-15 <elutkf$8mk$1@canopus.cc.umanitoba.ca>,
      Walter Roberson wrote:
      >In article <1166211027.569204.109900@79g2000cws.googlegroups.com>,
      >vid512@gmail.com <vid512@gmail.com> wrote:
      >
      >>i wanted to know why doesn't the scanf functions check for overflow
      >>when reading number. For example scanf("%d" on 32bit machine considers
      >>"1" and "4294967297" to be the same.
      >
      >Because that's how it is spec'd.
      >
      >"An input item is defined as the longest matching sequence of
      >characters,

      And in what way is "429496729" a matching sequence of characters, if
      there is no such integer value?

      >unless that exceeds a specified field width, in
      >which case it is the initial subsequence of that length in
      >the sequence." [...]
      >
      >"Except in the case of a % specifier, the input item (or, in the
      >case of a %n directive, the count of input characters) is
      >converted to a type appropriate for the conversion specifier. [...]
      >Unless assignment suppression was indicated by a *, the result
      >of the conversion is placed in the object pointed to by the first
      >argument following the format argument that has not
      >already received a conversion result. If this object does not
      >have an appropriate type, or if the result of the conversion cannot
      >be represented in the space provided, the behaviour is undefined."

      It's undefined. Which means there _are_ no requirements. An
      implementation is free to treat it as 1, or as 429496729 with 7 still on
      the stream, or as such with 7 _not_ still on the stream, or as
      4294967295 (saturation), etc., etc.

      Anyway, I found a possible situation in which my scanf is
      non-conformant:

      Numerical strings are truncated to 512 characters; for example, %f
      and %d are implicitly %512f and %512d.

      So, if I send %f

      1.0000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000000000000
      000000000000000000000000000000000000000000000000000000000000000
      e1

      it converts to 1 instead of 10. Does the standard allow this?


      • jacob navia

        #4
        Re: integer overflow in scanf functions

        Walter Roberson wrote:
        >In article <1166211027.569204.109900@79g2000cws.googlegroups.com>,
        >vid512@gmail.com <vid512@gmail.com> wrote:
        >
        >>i wanted to know why doesn't the scanf functions check for overflow
        >>when reading number. For example scanf("%d" on 32bit machine considers
        >>"1" and "4294967297" to be the same.
        >
        >Because that's how it is spec'd.
        >
        >"An input item is defined as the longest matching sequence of
        >characters, unless that exceeds a specified field width, in
        >which case it is the initial subsequence of that length in
        >the sequence." [...]
        >
        >"Except in the case of a % specifier, the input item (or, in the
        >case of a %n directive, the count of input characters) is
        >converted to a type appropriate for the conversion specifier. [...]
        >Unless assignment suppression was indicated by a *, the result
        >of the conversion is placed in the object pointed to by the first
        >argument following the format argument that has not
        >already received a conversion result. If this object does not
        >have an appropriate type, or if the result of the conversion cannot
        >be represented in the space provided, the behaviour is undefined."
        >
        >So there you have it: if you didn't put in a field width, then
        >the %d is *required* to pull in all the decimal digits there, and
        >if that's too big for an int, then the result is officially undefined.
        >This is how fscanf (and hence scanf) are -required- to work according
        >to the standard.

        In general, functions like scanf are unusable. They are so
        problematic that it is a wonder they work at all.

        Use strtol, or a similar function that will give reasonable
        error returns...
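
A minimal sketch of the strtol approach suggested above, assuming the text is already in memory (for example a line read with fgets). The helper name `parse_int` and its return codes are invented for illustration; the range check against INT_MIN and INT_MAX is needed because long can be wider than int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal int from text.
 * Returns 0 on success, -1 if no digits were found,
 * -2 if the value does not fit in an int. */
static int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text)
        return -1;                       /* nothing was converted */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -2;                       /* out of range for int */
    *out = (int)value;
    return 0;
}
```

With this, "1" parses to 1 while "4294967297" is reported as out of range instead of quietly becoming some other value.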


        • Walter Roberson

          #5
          Re: integer overflow in scanf functions

          In article <slrneo60qq.478.random@rlaptop.random.yi.org>,
          Random832 <random832@gmail.com> wrote:
          >2006-12-15 <elutkf$8mk$1@canopus.cc.umanitoba.ca>,
          >Walter Roberson wrote:
          >>In article <1166211027.569204.109900@79g2000cws.googlegroups.com>,
          >>vid512@gmail.com <vid512@gmail.com> wrote:
          >>>i wanted to know why doesn't the scanf functions check for overflow
          >>"An input item is defined as the longest matching sequence of
          >>characters,
          >And in what way is "429496729" a matching sequence of characters, if
          >there is no such integer value?

          The match is based upon the lexical grammar, and the lexical
          grammar does not put limitations on the number or content of the
          decimal digits.
          --
          Okay, buzzwords only. Two syllables, tops. -- Laurie Anderson


          • Random832

            #6
            Re: integer overflow in scanf functions

            2006-12-15 <elv3os$gmt$1@canopus.cc.umanitoba.ca>,
            Walter Roberson wrote:
            >In article <slrneo60qq.478.random@rlaptop.random.yi.org>,
            >Random832 <random832@gmail.com> wrote:
            >>2006-12-15 <elutkf$8mk$1@canopus.cc.umanitoba.ca>,
            >>Walter Roberson wrote:
            >>>In article <1166211027.569204.109900@79g2000cws.googlegroups.com>,
            >>>vid512@gmail.com <vid512@gmail.com> wrote:
            >>>>i wanted to know why doesn't the scanf functions check for overflow
            >>>"An input item is defined as the longest matching sequence of
            >>>characters,
            >>And in what way is "429496729" a matching sequence of characters, if
            >>there is no such integer value?
            >
            >The match is based upon the lexical grammar, and the lexical
            >grammar does not put limitations on the number or content of the
            >decimal digits.

            OK. The rest of my post stands. undefined is undefined, it's not
            "required" to do anything in such a case.


            • vid512@gmail.com

              #7
              Re: integer overflow in scanf functions

              so, we agree, it's undefined.

              wouldn't it be better to report this overflow as an error? 10 digits would
              be read off the file/stream/whatever, and the function would return as if
              the number format was invalid, with errno=ERANGE.

              i don't think the current behavior is what people expect. and the scanf
              functions already do a lot of "smart" stuff, just because people
              expect such behavior.


              • Walter Roberson

                #8
                Re: integer overflow in scanf functions

                In article <slrneo60qq.478.random@rlaptop.random.yi.org>,
                Random832 <random832@gmail.com> wrote:
                >>"Except in the case of a % specifier, the input item (or, in the
                >>case of a %n directive, the count of input characters) is
                >>converted to a type appropriate for the conversion specifier. [...]
                >>Unless assignment suppression was indicated by a *, the result
                >>of the conversion is placed in the object pointed to by the first
                >>argument following the format argument that has not
                >>already received a conversion result. If this object does not
                >>have an appropriate type, or if the result of the conversion cannot
                >>be represented in the space provided, the behaviour is undefined."
                >
                >It's undefined. Which means there _are_ no requirements. An
                >implementation is free to treat it as 1, or as 429496729 with 7 still on
                >the stream, or as such with 7 _not_ still on the stream, or as
                >4294967295 (saturation), etc, etc

                No, consumption of the maximum characters is -required-. It cannot
                leave the other characters in the stream. The undefined part comes
                in the valuation and storage of the overly-long result, not in
                how many characters are consumed from input.
                --
                All is vanity. -- Ecclesiastes


                • Eric Sosman

                  #9
                  Re: integer overflow in scanf functions

                  Walter Roberson wrote:
                  >In article <slrneo60qq.478.random@rlaptop.random.yi.org>,
                  >Random832 <random832@gmail.com> wrote:
                  >>
                  >>It's undefined. Which means there _are_ no requirements. An
                  >>implementation is free to treat it as 1, or as 429496729 with 7 still on
                  >>the stream, or as such with 7 _not_ still on the stream, or as
                  >>4294967295 (saturation), etc, etc
                  >
                  >No, consumption of the maximum characters is -required-. It cannot
                  >leave the other characters in the stream. The undefined part comes
                  >in the valuation and storage of the overly-long result, not in
                  >how many characters are consumed from input.

                  Once undefined behavior strikes, the program has no way
                  to tell how many characters were or were not consumed. All
                  requirements lose their force in the face of U.B.

                  --
                  Eric Sosman
                  esosman@acm-dot-org.invalid


                  • Random832

                    #10
                    Re: integer overflow in scanf functions

                    2006-12-15 <elv7ne$lvr$1@canopus.cc.umanitoba.ca>,
                    Walter Roberson wrote:
                    >In article <slrneo60qq.478.random@rlaptop.random.yi.org>,
                    >Random832 <random832@gmail.com> wrote:
                    >>>"Except in the case of a % specifier, the input item (or, in the
                    >>>case of a %n directive, the count of input characters) is
                    >>>converted to a type appropriate for the conversion specifier. [...]
                    >>>Unless assignment suppression was indicated by a *, the result
                    >>>of the conversion is placed in the object pointed to by the first
                    >>>argument following the format argument that has not
                    >>>already received a conversion result. If this object does not
                    >>>have an appropriate type, or if the result of the conversion cannot
                    >>>be represented in the space provided, the behaviour is undefined."
                    >>
                    >>It's undefined. Which means there _are_ no requirements. An
                    >>implementation is free to treat it as 1, or as 429496729 with 7 still on
                    >>the stream, or as such with 7 _not_ still on the stream, or as
                    >>4294967295 (saturation), etc, etc
                    >
                    >No, consumption of the maximum characters is -required-. It cannot
                    >leave the other characters in the stream. The undefined part comes
                    >in the valuation and storage of the overly-long result, not in
                    >how many characters are consumed from input.

                    No, I don't think you get it.

                    In an undefined situation, the standard forbids nothing.

                    Meaning the implementation gets to do whatever the f*** it wants to,
                    regarding anything, once anything has happened that has been undefined.


                    • CBFalconer

                      #11
                      Re: integer overflow in scanf functions

                      jacob navia wrote:
                      >
                      .... snip ...
                      >
                      >In general functions like scanf are unusable. They are so
                      >problematic, that it is a wonder when they work at all.
                      >
                      >Use strtol, or a similar function that will give reasonable
                      >error returns...

                      No, that requires assigning a buffer of sufficient size, which is
                      unknown a priori. Instead take a look at:

                      <http://cbfalconer.home.att.net/download/txtio.zip>

                      (which has been revised, but not posted) for a method of reading
                      values from a text stream without any buffer assignment needed. In
                      particular see txtinput.c.
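
To make the "no buffer needed" idea concrete, here is a rough sketch of reading a number straight off a stream one character at a time, checking for overflow before each step. This is not the txtio code referred to above, which is not reproduced here; the name `read_uint` and its return codes are assumptions made up for this example.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper, not the txtio code: read an unsigned decimal
 * number from a stream one character at a time.
 * Returns 0 on success, -1 if no digit was found, -2 on overflow.
 * The whole digit run is consumed even when it overflows. */
static int read_uint(FILE *in, unsigned *out)
{
    unsigned value = 0;
    int ch, seen = 0, overflow = 0;

    while ((ch = getc(in)) != EOF && isspace(ch))
        ;                                   /* skip leading white space */
    while (ch != EOF && isdigit(ch)) {
        unsigned digit = (unsigned)(ch - '0');
        seen = 1;
        if (value > (UINT_MAX - digit) / 10)
            overflow = 1;                   /* value * 10 + digit would wrap */
        else
            value = value * 10 + digit;
        ch = getc(in);
    }
    if (ch != EOF)
        ungetc(ch, in);                     /* leave the terminator unread */
    if (!seen)
        return -1;
    if (overflow)
        return -2;
    *out = value;
    return 0;
}
```

Because the digits are folded into the value as they arrive, no line buffer is needed, and the caller always knows the stream is positioned just past the digit run, overflow or not.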

                      --
                      Chuck F (cbfalconer at maineline dot net)
                      Available for consulting/temporary embedded and systems.
                      <http://cbfalconer.home.att.net>



                      • Peter Nilsson

                        #12
                        Re: integer overflow in scanf functions

                        Eric Sosman wrote:
                        >Walter Roberson wrote:
                        >>In article <slrneo60qq.478.random@rlaptop.random.yi.org>,
                        >>Random832 <random832@gmail.com> wrote:
                        >>>
                        >>>It's undefined. Which means there _are_ no requirements. An
                        >>>implementation is free to treat it as 1, or as 429496729 with 7 still on
                        >>>the stream, or as such with 7 _not_ still on the stream, or as
                        >>>4294967295 (saturation), etc, etc
                        >>
                        >>No, consumption of the maximum characters is -required-. It cannot
                        >>leave the other characters in the stream. The undefined part comes
                        >>in the valuation and storage of the overly-long result, not in
                        >>how many characters are consumed from input.
                        >
                        >Once undefined behavior strikes, the program has no way
                        >to tell how many characters were or were not consumed.
                        >All requirements lose their force in the face of U.B.

                        True, but suppose an implementation defines the usual non-trapping
                        two's-complement overflow or strtoxxx-style behaviour for the %d
                        fscanf case; then the behaviour is no longer undefined and the
                        normal rules apply.

                        Of course, few implementations go so far as to actually define (i.e.
                        guarantee) such behaviour, let alone document it.

                        --
                        Peter


                        • Walter Roberson

                          #13
                          Re: integer overflow in scanf functions

                          In article <slrneo6dlm.4g3.random@rlaptop.random.yi.org>,
                          Random832 <random832@gmail.com> wrote:
                          >2006-12-15 <elv7ne$lvr$1@canopus.cc.umanitoba.ca>,
                          >Walter Roberson wrote:
                          >>In article <slrneo60qq.478.random@rlaptop.random.yi.org>,
                          >>Random832 <random832@gmail.com> wrote:
                          >>>>Unless assignment suppression was indicated by a *, the result
                          >>>>of the conversion is placed in the object pointed to by the first
                          >>>>argument following the format argument that has not
                          >>>>already received a conversion result. If this object does not
                          >>>>have an appropriate type, or if the result of the conversion cannot
                          >>>>be represented in the space provided, the behaviour is undefined."
                          >No, I don't think you get it.
                          >In an undefined situation, the standard forbids nothing.
                          >Meaning the implementation gets to do whatever the f*** it wants to,
                          >regarding anything, once anything has happened that has been undefined.

                          The C90 standard defines a three-part operation, first reading
                          the characters, then converting the type of the value, and then
                          attempting to store the received value. The first two parts
                          do not allow for undefined behaviour: only the storage aspect does.

                          Therefore, in a conforming C90 implementation, the complete sequence
                          of decimal digits is certain to be read. Stopping reading the stream
                          at the maximum usable int length (for %d) is not one of the options.
                          The "undefined behaviour" might then go through the trouble of
                          "putting back" the extra characters somehow, but read them first it
                          must.

                          Ah, there's a simple way to tell: use assignment suppression. Then no
                          actual storage attempt takes place, so whether the receiving variable
                          is the right size or type is not in question, and undefined behaviour
                          cannot take place. If you then have another format element to read a
                          value, or use %n to find the number of characters read, you can
                          determine where the %d scan left off. C90 tells you where you
                          should be (i.e., after the sequence of decimal characters); if
                          your system does leave you in the middle then your system is wrong.
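
A small sketch of the check described above, using sscanf on a string instead of a stream so it is easy to try. The helper name `chars_consumed_by_d` is made up for illustration. Whether the overflow case is really free of undefined behaviour under suppression is exactly what is being argued in this thread, so treat the result as a report of what a given implementation does, not as a guarantee.

```c
#include <stdio.h>

/* Hypothetical helper: report how many characters a suppressed %d
 * consumes from a string. Nothing is stored for %*d, and %n stores
 * the number of characters read so far.
 * Returns -1 if the %d directive did not match at all. */
static int chars_consumed_by_d(const char *text)
{
    int count = -1;

    /* The return value is not useful here: neither a suppressed
     * conversion nor %n adds to the assignment count. */
    sscanf(text, "%*d%n", &count);
    return count;
}
```

For "42 rest" this yields 2. For "4294967297 7", an implementation that reads the whole digit sequence yields 10; a smaller number would mean the scan stopped in the middle of the digits.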
                          --
                          There are some ideas so wrong that only a very intelligent person
                          could believe in them. -- George Orwell


                          • Chris Torek

                            #14
                            Re: integer overflow in scanf functions

                            In article <slrneo60qq.478.random@rlaptop.random.yi.org>
                            Random832 <random832@gmail.com> wrote:
                            >Anyway, I found a possible situation in which my scanf is
                            >non-conformant:
                            >
                            >Numerical strings are truncated to 512 characters; for example, %f
                            >and %d are implicitly %512f and %512d.
                            >
                            >So, if I send %f
                            >
                            >1.0000000000000000000000000000000000000000000000000000000000000
                            >000000000000000000000000000000000000000000000000000000000000000
                            >000000000000000000000000000000000000000000000000000000000000000
                            >000000000000000000000000000000000000000000000000000000000000000
                            >000000000000000000000000000000000000000000000000000000000000000
                            >000000000000000000000000000000000000000000000000000000000000000
                            >000000000000000000000000000000000000000000000000000000000000000
                            >000000000000000000000000000000000000000000000000000000000000000
                            >e1
                            >
                            >it converts to 1 instead of 10. Does the standard allow this?

                            Yes:

                            Environmental limits

                            [#7] An implementation shall support text files with lines
                            containing at least 254 characters, including the
                            terminating new-line character. The value of the macro
                            BUFSIZ shall be at least 256.

                            (under "7.13.2 Streams" in the draft .txt file I keep handy).

                            Most stdio implementations will have *some* convenient limit, as
                            they will read numerical input into a buffer and then use strtol(),
                            strtoll(), strtod(), etc., to perform the actual conversions. That
                            limit must be at least 254, but need not be as high as BUFSIZ (that
                            is, just because BUFSIZ is, say, 8192, does not mean that scanf()
                            must be able to eat 8192-digit numbers).
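
The truncation effect in question can be reproduced at small scale with an explicit field width, which cuts the input item short in the same way an internal limit would. A minimal sketch, with a width of 5 standing in for the implicit 512; the helper names are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helpers: convert text with %f, once with no field
 * width and once with the input item capped at 5 characters.
 * Both return -1.0f if nothing was converted. */
static float scan_unlimited(const char *text)
{
    float value = -1.0f;

    if (sscanf(text, "%f", &value) != 1)
        return -1.0f;
    return value;
}

static float scan_width_5(const char *text)
{
    float value = -1.0f;

    /* Only the first 5 characters form the input item, so an
     * exponent that starts later is never seen by the conversion. */
    if (sscanf(text, "%5f", &value) != 1)
        return -1.0f;
    return value;
}
```

Given "1.000e1", `scan_unlimited` sees the exponent and yields 10, while `scan_width_5` converts only "1.000" and yields 1.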
                            --
                            In-Real-Life: Chris Torek, Wind River Systems
                            Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
                            email: forget about it http://web.torek.net/torek/index.html
                            Reading email is like searching for food in the garbage, thanks to spammers.


                            • Random832

                              #15
                              Re: integer overflow in scanf functions

                              2006-12-16 <em01uf$ql9$1@canopus.cc.umanitoba.ca>,
                              Walter Roberson wrote:
                              >The C90 standard defines a three-part operation, first reading
                              >the characters, then converting the type of the value, and then
                              >attempting to store the received value. The first two parts
                              >do not allow for undefined behaviour: only the storage aspect does.

                              And once the storage aspect _does_ have undefined behavior, it can
                              then go backwards in time and change how the other two aspects operated
                              in the first place.

                              In an undefined situation, the C standard forbids nothing.

                              >Therefore, in a conforming C90 implementation, the complete sequence
                              >of decimal digits is certain to be read. Stopping reading the stream
                              >at the maximum usable int length (for %d) is not one of the options.
                              >The "undefined behaviour" might then go through the trouble of
                              >"putting back" the extra characters somehow, but read them first it
                              >must.

                              It's undefined, there's no rule against time paradoxes.

                              >Ah, there's a simple way to tell: use assignment suppression. Then no
                              >actual storage attempt takes place, so whether the receiving variable
                              >is the right size or type is not in question, and undefined behaviour
                              >cannot take place.

                              But since the behavior is undefined when assignment suppression is not
                              used, it's free to act differently than if it is used.
