unget vs. putback

  • Kevin Saff

    unget vs. putback

    Why are these two similar functions provided? (unget & putback)

    I'm working with a file format where the type of the next record to
    read is stored in the first two bytes of the record. From a design
    standpoint, I wanted the class general enough to handle longer/more
    complex keys, so I thought I should store this key value in the record
    class itself, and query the record types by a "canRead(std::istream&)"
    function. This would seem to require an extended peek capability, since
    if the record type cannot read the stream, it must return the stream to
    its previous location.

    I initially tried using tellg and seekg for this, and was rewarded by
    60% of the program's run time being spent within these functions.
    Replacing these by "unget"ing the proper number of characters reduced
    this to less than 1%.

    All I know is that unget() is much better on my PC, with my compiler,
    than seekg. Is this likely to be true in general for relatively small
    (<100) numbers of bytes? When would I want to use putback(char) instead?

    The only explanation I can think of is that the file may be buffered
    somehow; then ungetting might take you past the beginning of the buffer,
    whereas putback'ing will be able to expand the buffer in this case.




    [ See http://www.gotw.ca/resources/clcm.htm for info about ]
    [ comp.lang.c++.moderated. First time posters: Do this! ]
  • Dietmar Kuehl

    #2
    Re: unget vs. putback

    Kevin Saff wrote:
    > All I know that unget() is much better on my PC, with my compiler than
    > seekg. Is this likely to be true in general for relatively small (<100)
    > numbers of bytes? When would I want to use putback(char) instead?

    To best answer these questions, let's have a look at the underlying
    machinery. IOStreams are built on top of stream buffers (that is,
    objects of type 'std::basic_streambuf<cT, traits>'). As the name says,
    this class provides a concept of a buffer, although it is possible to
    create unbuffered stream buffers (that is, the name is somewhat
    misleading). File streams are very likely to use the internal buffer,
    except, maybe, when using some special files like a tty or a named
    pipe. If a buffer is set up for the stream buffer, most operations
    are simple pointer operations: check whether the pointers are in the
    allowed range and do something with the respective character.

    For 'sungetc()', the stream buffer function called by the input stream's
    'unget()', this basically means checking whether the current read
    pointer is at the beginning of the buffer and, if it is not, moving it
    one character back. This operation is very likely to be very fast. If
    the read pointer is at the beginning of the buffer, 'sungetc()' will
    call 'pbackfail(traits::eof())'.

    The operation of 'sputbackc()', as you have correctly guessed the
    stream buffer function called by the input stream's 'putback()'
    function, is a little bit more complex and slower: it starts by
    checking whether the current position is at the beginning of the
    buffer and, if it is not, it checks whether the previous character
    matches the one being put back. If either of these checks fails,
    'sputbackc()' calls 'pbackfail(c)' with the character being put back
    as argument (after converting it to 'int_type' using
    'traits::to_int_type()'). Otherwise the current read position is moved
    one character back.

    For the case that putting back a character does not hit a buffer
    boundary this explains that the 'unget()' should be fast. A few
    questions obviously remain:

    - How many characters can be safely put back? The answer is quite
    simple: none. If you are at a buffer boundary, put back can fail
    and there is no guarantee in the standard on the number of
    available put back positions. I would expect any reasonable
    standard library implementation to allow at least one character
    being put back but this is really a quality of implementation
    issue - and it is unclear what is better quality here: there is
    rarely a need for put back (e.g. none in the standard library I/O
    functions) and providing a put back buffer would incur unnecessary
    overhead. Also, it is easy to work around this problem by
    providing a filtering stream buffer which allows, e.g., a specified
    number of put back characters.
    - What does 'pbackfail()' do? Well, it obviously tries to back up
    one position in the stream. In case a wrong character was put back
    it can choose to accept it (i.e. using 'putback()' you might be
    able to put characters into the stream which have not been there).
    In case of hitting the beginning of the buffer it might read the
    previous page or simply put the character passed to 'pbackfail()'
    into the buffer after making room somehow, thereby assuming that
    the character was the right one (that is, 'putback()' might be
    successful when 'unget()' is not).
    - What happens when the END of a buffer is reached? Are characters
    retained for put back? When the end of the input buffer is
    reached, 'underflow()' is called. This function is supposed to
    make a new buffer with at least one character available. It can set
    up the new buffer in such a way that old characters are retained:
    The buffer is set up with the call 'setg(begin, current, end)'.
    The first argument is the beginning of the buffer, the second is
    the current read position (i.e. it points to the character made
    available by 'underflow()'), and the third is the end of the
    buffer. That is, the range [begin, current) is available for put
    back. A library can copy "n" characters from the end of the
    previous buffer to the beginning of the new buffer. Unfortunately,
    there is no guarantee that "n > 0" for file buffers.

    In practical terms, this means that you cannot rely on put back
    doing anything useful for the standard streams! There are a few ways
    to work around this problem:

    - Check the documentation of the standard library you are using: It
    might provide better guarantees for file streams. Of course, this
    way you become dependent on a particular implementation.

    - If the documentation does not tell you anything, you might be able
    to look at the implementation. Note, however, that this is a very
    dangerous path because the implementation may change in the next
    version.

    - The safest approach would be the creation of a simple filtering
    stream buffer: if you know that you are simply reading the stream
    from beginning to end, except for putting back a maximum of "n"
    characters, such a filtering stream buffer is simple to write. If
    you mix things with seeking within the stream, things become
    somewhat more complex...

    - Avoid put back in the first place. What is the point of processing
    read characters again? There is no problem with peeking at the
    current read position: this always works. ... and for many cases
    this is sufficient.

    > The only explanation I can think of is the file may be buffered somehow;

    There is no guarantee that files are buffered (and, in fact, you can
    turn off buffering by calling 'setbuf(0, 0)' on the stream buffer)
    but I would bet that buffered file streams are the default on all
    implementations: unbuffered file reading is just slow.

    > then ungetting might take you past the beginning of the buffer, whereas
    > putback'ing will be able to expand the buffer in this case.

    This is roughly the deal. Of course, you cannot count on it being
    the case...
    --
    <mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
    Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>


    • Siemel Naran

      #3
      Re: unget vs. putback

      "Kevin Saff" <google.com@kevin.saff.net> wrote in message

      > I'm working with a file format where the type of record to read in next
      > is stored in the first two bytes of the record. From a design standpoint,
      > I wanted the class general enough to handle longer/more complex keys, so I
      > thought I should store this key value in the record class itself, and
      > query the record types by a "canRead(std::istream&)" function. This would
      > seem to require an extended peek capability, since if the record type
      > cannot read the stream, it must return the stream to its previous location.

      How about this? In the generic read function read the first two bytes.
      Then iterate through all the object types in the registry and call the
      (possibly virtual) canRead function. Or perhaps you can set up a map of
      2-bytes to object (or a hashtable or direct address table), in order to
      quickly determine the object type to read. If you find a match then call
      the read function for that object and pass in a parameter to indicate that
      you already read the first 2-bytes. If no match, then use seekg to move the
      stream to the previous position, then throw an error.

      > I initially tried using tellg and seekg for this, and was rewarded by
      > 60% of the program being spent within these functions. Replacing these by
      > "unget"ing the proper number of characters reduced this to less than 1%.

      Do you have a large number of types that can be read? What implementation
      of streams are you using?

      > All I know that unget() is much better on my PC, with my compiler than
      > seekg. Is this likely to be true in general for relatively small (<100)
      > numbers of bytes? When would I want to use putback(char) instead?
      >
      > The only explanation I can think of is the file may be buffered somehow;
      > then ungetting might take you past the beginning of the buffer, whereas
      > putback'ing will be able to expand the buffer in this case.

      I don't know why they have putback(char). Though I wrote a stream buffer
      class that lets you call putback to put an unlimited number of characters
      into the stream, to simulate the user typing those characters. I also have
      a function putback(const char *).

      Both unget and putback may call the streambuf's virtual pbackfail if the
      input sequence cannot be backed up. Putback will also call pbackfail if the
      previous character in the input sequence is not equal to the character you
      are putting back. So putback may be slightly slower. As for pbackfail, I'm
      not exactly sure what it does. It may always return eof() which indicates
      failure, namely that the stream could not be backed up.

      --
      +++++++++++
      Siemel Naran



      • Hans Bos

        #4
        Re: unget vs. putback

        "Dietmar Kuehl" <dietmar_kuehl@yahoo.com> wrote in message
        news:blsjlp$g07vp$1@ID-86292.news.uni-berlin.de...
        >
        > For the case that putting back a character does not hit a buffer
        > boundary this explains that the 'unget()' should be fast. A few
        > questions obviously remain:
        >
        > - How many characters can be safely put back? The answer is quite
        > simple: none. If you are at a buffer boundary, put back can fail
        > and there is no guarantee in the standard one the number of
        > available put back positions. I would expect any reasonable
        > standard library implementation to allow at least one character
        > being put back but this is really a quality of implementation
        > issue - and it is unclear what is better quality here: there is
        > rarely a need for put back (eg. none in the standard library I/O
        > functions) and providing a put back buffer would incur unnecessary
        > overhead.

        When you read a character with snextc you can always put back that character.

        If the character to be read (with snextc) is already in the streambuf's buffer
        it can always be ungetted.

        Otherwise uflow() will be called, and it is required
        to transfer the returned character to the "backup sequence",
        which, when nonempty, is defined as the gptr() - eback() characters
        beginning at eback(). Then gptr() - eback() > 0.
        So in this case you can also put back the last character.

        You can also read characters with sgetc, which will call underflow().
        underflow() is also required to set the streambuf pointers, so that the
        character is in the buffer.

        In the standard library, basic_ostream::operator<<(basic_streambuf*) uses this
        behaviour.
        When it cannot put a character read from the streambuf into the ostream, the
        character is not extracted from the streambuf.

        This can only be done if the streambuf is required to buffer the last character
        read.

        Greetings,
        Hans.



        • Dietmar Kuehl

          #5
          Re: unget vs. putback

          "Hans Bos" <hans.bos@xelion.nl> wrote:
          > When you read a character with snextc you can always putback that character.

          I don't think so...

          > If the character to be read (with snextc) is already in the streambuf's buffer
          > it can always be ungetted.

          Yes, there is no problem with ungetting characters sitting in the current
          buffer. Things become interesting with unbuffered stream buffers and at
          buffer boundaries.

          > Otherwise uflow() will be called and it is required
          > to transfer the returned character to the "backup sequence".

          Which, according to 27.5.2.4.3 paragraph 11, can be empty. I don't see
          any requirement which states that the backup sequence is non-empty after
          the transfer - even though the requirement for 'uflow()' mentions transfer
          of a character to the backup sequence. Effectively, the whole purpose of
          the function 'uflow()' is to allow unbuffered stream buffers in the first
          place. Well, only the current character, as returned by 'sgetc()' has to
          be remembered.

          > Which when nonempty is defined as the gptr() - eback() beginning at eback().
          > Then gptr() - eback() > 0.
          > So in this case you can also putback the last character.

          I don't think you can rely on this! Maybe the standard is unclear on this
          issue: I seem to have a rather different interpretation of what happens
          when a character is transferred to an empty backup sequence than you. Can
          any of the other standard library implementers comment on this issue?

          > You can also read characters with sgetc, which will call underflow().
          > Underflow() is also required to set the streambuf pointers, so that the
          > character is in the buffer.

          'underflow()' is definitely *NOT* required to set up the pointers such that
          the character is in the buffer! Great pains are taken to avoid this
          particular requirement: there are several sequences defined and it is
          always said what the alternative would look like if there is explicitly no
          buffer at all. 'underflow()' has, however, to arrange for the stream buffer
          to remember the character being returned: multiple calls to 'underflow()'
          without intervening calls to 'setg()' or 'uflow()' are required to return
          the same character.

          > In the standard library basic_ostream::operator<<(basic_streambuf *) uses this
          > behaviour.

          If it relies on this behavior, I think the implementation is non-conforming.
          The buffer may, however, be used directly if available to improve the
          performance. If there is no buffer, the mentioned operator has to be
          careful not to extract a character before it is inserted.

          > When it cannot put a character read from the streambuf into the ostream, the
          > character is not extracted from the streambuf.

          Correct. But this does not at all require any form of put back: it merely
          requires that 'underflow()' does not extract the character. Character
          extraction is done either by adjusting the internal pointer or by calling
          'uflow()'.

          > This can only be done if the streambuf is required to buffer the last
          > character read.

          This is correct but a different issue: the stream buffer has to remember
          the last character returned by 'underflow()'. This is, however, not the
          character you can put back: it is the preceding character which you can
          put back and there is no requirement that this character is buffered!
          Actually, there is no point in putting back the character returned from
          'underflow()' because this character was not yet extracted in the first
          place.

          Even if 'uflow()' were required to retain a character in the backup
          sequence: a call to 'sbumpc()' at the end of the buffer just returns the
          character from the pending sequence and afterward the pending sequence
          is empty (i.e. 'gptr() == egptr()'). If the next operation is 'sgetc()',
          it is 'underflow()' that is getting called. There is no requirement for
          this function to set up a backup sequence.

          • kanze@gabi-soft.fr

            #6
            Re: unget vs. putback

            dietmar_kuehl@yahoo.com (Dietmar Kuehl) wrote in message
            news:<5b15f8fd.0310080253.1c158e70@posting.google.com>...

            > 'underflow()' has, however, to arrange
            > for the stream buffer to remember the character being returned:
            > multiple calls to 'underflow()' without intervening calls to 'setg()'
            > or 'uflow()' are required to return the same character.

            Do you have any reference in the standard for this? It would seem to be
            common sense, and is what I had always expected. It is not, however,
            what most implementations I've used do. In particular, with most of the
            implementations I've tested (g++ 3.3.1 is the exception), underflow()
            may return EOF on one call, and a legal character on the next call. I'd
            like for this to be illegal, but I can't find anything in the standard
            to back up my wishes.

            The problem is easy to demonstrate under Unix or Windows: try the
            following program:

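            #ifndef OLD
            #include <cstdio>       // EOF
            #include <iostream>
            #include <ios>
            #include <ostream>
            #include <streambuf>
            #else
            #include <stdio.h>      // EOF
            #include <iostream.h>
            #define std             // classic iostreams live in the global namespace
            #endif

            int
            main()
            {
                std::streambuf* sb = std::cin.rdbuf() ;
                // Consume everything up to the first end-of-file indication.
                while ( sb->sgetc() != EOF ) {
                    sb->sbumpc() ;
                }
                std::cout << "EOF seen" << std::endl ;
                // Look again without calling anything else on the streambuf.
                if ( sb->sgetc() != EOF ) {
                    std::cout << "BROKEN" << std::endl ;
                }
                return 0 ;
            }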

            Under Unix, run it, enter ^D, and when "EOF seen" appears, enter any
            character but ^D. Under Windows, run it, then enter ^Z Return, and when
            "EOF seen" appears, a line without a ^Z in it.

            (Obviously, this doesn't test underflow directly. But IMHO, the
            important aspect for the user is that sgetc() always returns the same
            thing, as long as there are no intervening calls to other functions of
            the streambuf.)

            For what it's worth, this didn't work with the original USL <iostream.h>
            either, so in some cases, especially when OLD is defined, it may be a
            case of intentional bug compatibility rather than an error. (IMHO, the
            argument doesn't hold when OLD is not defined, since the new streams
            aren't compatible with the old ones anyway.)

            --
            James Kanze GABI Software mailto:kanze@gabi-soft.fr
            Conseils en informatique orientée objet/ http://www.gabi-soft.fr
            Beratung in objektorientierter Datenverarbeitung
            11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16


            • Hans Bos

              #7
              Re: unget vs. putback

              "Dietmar Kuehl" <dietmar_kuehl@yahoo.com> wrote in message
              news:5b15f8fd.0310080253.1c158e70@posting.google.com...
              > "Hans Bos" <hans.bos@xelion.nl> wrote:
              > > When you read a character with snextc you can always putback that character.
              >
              > I don't think so...
              >
              > > Otherwise uflow() will be called and it is required
              > > to transfer the returned character to the "backup sequence".
              >
              > Which, according to 27.5.2.4.3 paragraph 11, can be empty. I don't see
              > any requirement which states that the backup sequence is non-empty after
              > the transfer - even though the requirement for 'uflow()' mentions transfer
              > of a character to the backup sequence. Effectively, the whole purpose of
              > the function 'uflow()' is to allow unbuffered stream buffers in the first
              > place. Well, only the current character, as returned by 'sgetc()' has to
              > be remembered.
              >
              > > Which when nonempty is defined as the gptr() - eback() beginning at eback().
              > > Then gptr() - eback() > 0.
              > > So in this case you can also putback the last character.
              >
              > I don't think you can rely on this! Maybe the standard is unclear on this
              > issue: I seem to have a rather different interpretation of what happens
              > when a character is transferred to an empty backup sequence than you. Can
              > any of the other standard library implementers comment on this issue?
              >

              If you transfer something it doesn't disappear.
              If I transfer money from one account to another I don't expect it to
              disappear, even though an account can be empty.

              Since the standard doesn't explicitly mention that the backup sequence may
              be empty after the character is transferred to it, I say the character must
              be in the backup sequence.
              And when the backup sequence is not empty, you can put back the character.

              > > You can also read characters with sgetc, which will call underflow().
              > > Underflow() is also required to set the streambuf pointers, so that the
              > > character is in the buffer.
              >
              > 'underflow()' is definitely *NOT* required to setup the pointer such that
              > the character is in the buffer! Great pains are taken to avoid this
              > particular requirement: there are several sequences defined and it is
              > always said how the alternative would look like if there is explicitly no
              > buffer at all. 'underflow()' has, however, to arrange for the stream buffer
              > to remember the character being returned: multiple calls to 'underflow()'
              > without intervening calls to 'setg()' or 'uflow()' are required to return
              > the same character.
              >

              27.5.2.4.3/12 says that after the call to underflow(), gptr() and egptr()
              satisfy one of:
              a) If the pending sequence is nonempty, egptr() is non-NULL and the
              egptr() - gptr() characters starting at gptr() are the characters in the
              pending sequence.
              b) If the pending sequence is empty, either gptr() is null or gptr() and
              egptr() are set to the same non-NULL pointer.

              And the pending sequence is defined as the concatenation of (27.5.2.4.3/10):
              a) If gptr() is non-NULL, then the egptr() - gptr() characters starting at
              gptr(), otherwise the empty sequence.
              b) Some sequence (possibly empty) of characters read from the input sequence.

              Now suppose gptr() is NULL when underflow() is called. Then the pending
              sequence is just b: the characters read from the input sequence.
              If underflow() returns a value (when gptr() was NULL), it means that the
              pending sequence was not empty.
              Therefore 27.5.2.4.3/12 a applies and egptr() must be non-NULL.

              So when a character is returned by underflow(), the pending sequence is
              nonempty and it contains at least the character returned. Therefore *gptr()
              must be equal to the character returned (provided it is not equal to eof())
              and egptr() - gptr() > 0.


              Note that this behaviour is trivially implemented by having a buffer of one
              character for "unbuffered" streams. I think that this is also used in FILE
              for unbuffered streams, where a one character putback is guaranteed for
              ungetc.

              Personally I see not much value in redefining uflow to save a one character
              buffer.

              Regards,
              Hans Bos.



              • Dietmar Kuehl

                #8
                Re: unget vs. putback

                kanze@gabi-soft.fr wrote:
                > dietmar_kuehl@yahoo.com (Dietmar Kuehl) wrote in message
                > news:<5b15f8fd.0310080253.1c158e70@posting.google.com>...
                >
                >> 'underflow()' has, however, to arrange
                >> for the stream buffer to remember the character being returned:
                >> multiple calls to 'underflow()' without intervening calls to 'setg()'
                >> or 'uflow()' are required to return the same character.
                >
                > Do you have any reference in the standard for this.

                I stand by what I have said with respect to the returned character:
                it is supposed to be always the same. However, this does not apply
                when 'underflow()' did not return a character, i.e. when it returns
                'eof()' - but I didn't claim anything for this case.

                Concerning the character being returned, it turns out that I was
                indeed wrong in claiming that the stream buffer is not supposed to
                set up a buffer: Following the logic spelled out by Hans Bos in his
                reply to my article, the pending sequence is non-empty when a
                character is returned. This is stated in 27.5.2.4.3 paragraph 8
                (the character being returned is the first character of the pending
                sequence). And in paragraph 12 it states that the sequence [gptr(),
                egptr()) forms the pending sequence if this sequence is non-empty.

                For repeated calls of 'underflow()' the pending sequence does not
                change (although the set of functions not allowed to be called is
                rather bigger: it also includes 'gbump()', the other virtual
                functions, etc...). Hence, the first character of the pending
                sequence, i.e. the one pointed to by 'gptr()', is returned.

                > It would seem to be common sense, and is what I had always
                > expected.

                Well, after a stream buffer returned 'eof()' once, it might still
                be able to provide characters later, e.g. after receiving a key
                press from the keyboard. This is consistent e.g. with UNIX's 'read()'
                function, which might return '0' at some point but a positive
                value later: at some point end of file is reached and later
                in time end of file has moved on, making new characters available.

                • Dietmar Kuehl

                  #9
                  Re: unget vs. putback

                  Hans Bos wrote:
                  >> I don't think you can rely on this! Maybe the standard is unclear on this
                  >> issue: I seem to have a rather different interpretation of what happens
                  >> when a character is transferred to an empty backup sequence than you. Can
                  >> any of the other standard library implementers comment on this issue?

                  > If you transfer something it doesn't disappear.
                  > If transfer money from one account to another I don't expect it to
                  > disappear, even though an account can be empty.

                  Well, it will be disappearing if you transfer it into my empty
                  account :-)

                  > Since the standard doesn't explicitly mention that the backup sequence may
                  > be empty after the character is transferred to it, I say the character must
                  > be in the backup sequence.
                  > And when the backup sequence is not empty, you can putback the character.

                  I think we should submit this for clarification to the standardization
                  committee: Since 'underflow()' is definitely allowed to leave the
                  backup sequence empty (see below) there is no guarantee that
                  'putback()' or 'unget()' will be successful. Thus, I think this
                  requirement would be an unnecessary restriction, assuming it is
                  there.

                  >> 'underflow()' is definitely *NOT* required to setup the pointer such that
                  >> the character is in the buffer!

                  My above statement about 'underflow()' not being required to set up
                  the internal stream buffer pointers to hold the returned character
                  is wrong as Hans correctly points out. Thank you!

                  However, the backup sequence is still allowed to be empty after a
                  call to 'underflow()', as is stated explicitly in 27.5.2.4.3 paragraph
                  13: "... the function is not constrained as to their contents...".
                  Thus, after a call to, e.g., 'sgetc()' you cannot safely put back a
                  character.
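
Since putting characters back is not guaranteed to succeed, code that reads a record key and then steps back over it should check each result. The following is a minimal sketch of that idea, not code from the thread: `peek_key` is a hypothetical helper name, and it works on a raw `std::streambuf` using only `sbumpc()` and `sungetc()`.

```cpp
#include <cstddef>
#include <sstream>    // std::stringbuf, used only in the usage described below
#include <streambuf>
#include <string>

// Hypothetical helper (an assumption for illustration, not from the thread):
// reads up to `n` characters from `sb` into `key`, then tries to step back
// over exactly the characters it read, using sungetc().
// Returns true only if every character could be put back.
bool peek_key(std::streambuf& sb, std::size_t n, std::string& key)
{
    typedef std::streambuf::traits_type traits;

    key.clear();
    for (std::size_t i = 0; i < n; ++i) {
        const traits::int_type c = sb.sbumpc();
        if (traits::eq_int_type(c, traits::eof()))
            break;                      // fewer than n characters available
        key.push_back(traits::to_char_type(c));
    }

    // Step back once per character read. sungetc() returns eof() when the
    // stream buffer cannot back up any further.
    for (std::size_t i = 0; i < key.size(); ++i) {
        if (traits::eq_int_type(sb.sungetc(), traits::eof()))
            return false;               // backup sequence exhausted
    }
    return true;
}
```

With a `std::stringbuf` holding "ABpayload", `peek_key(sb, 2, key)` is expected to return true, leave "AB" in `key`, and leave the read position at 'A'. For a file-based stream buffer the return value has to be checked, because the buffer may have been refilled between the reads and the attempt to back up.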
                  --
                  <mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
                  Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>


                  • James Kanze

                    #10
                    Re: unget vs. putback

                    Dietmar Kuehl <dietmar_kuehl@yahoo.com> writes:

                    |> kanze@gabi-soft.fr wrote:
                    |> > dietmar_kuehl@yahoo.com (Dietmar Kuehl) wrote in message
                    |> > news:<5b15f8fd.0310080253.1c158e70@posting.google.com>...

                    |> >> 'underflow()' has, however, to
                    |> >> arrange for the stream buffer to remember the character being
                    |> >> returned: multiple calls to 'underflow()' without intervening
                    |> >> calls to 'setg()' or 'uflow()' are required to return the same
                    |> >> character.

                    |> > Do you have any reference in the standard for this?

                    |> I stand by what I have said with respect to the returned character:
                    |> it is supposed to be always the same. However, this does not apply
                    |> when 'underflow()' did not return a character, i.e. when it returns
                    |> 'eof()' - but I didn't claim anything for this case.

                    |> Concerning the character being returned, it turns out that I was
                    |> indeed wrong in claiming that the stream buffer is not supposed to
                    |> set up a buffer: Following the logic spelled out by Hans Bos in his
                    |> reply to my article, the pending sequence is non-empty when a
                    |> character is returned. This is stated in 27.5.2.4.3 paragraph 8 (the
                    |> character being returned is the first character of the pending
                    |> sequence). And paragraph 12 states that the sequence [gptr(),
                    |> egptr()) forms the pending sequence if this sequence is non-empty.

                    |> For repeated calls of 'underflow()' the pending sequence does not
                    |> change (although the set of functions not allowed to be called is
                    |> rather bigger: it also includes 'gbump()', the other virtual
                    |> functions, etc...). Hence, the first character of the pending
                    |> sequence, i.e. the one pointed to by 'gptr()', is returned.

                    Good point. No guarantee with regard to underflow, but the character
                    is guaranteed to be in the buffer, so further calls to sgetc() shouldn't
                    invoke underflow.

                    Except, as you say, when underflow returns EOF (which cannot be
                    buffered).
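
The observable effect of this is that repeated calls to sgetc() keep returning the same character until something consumes it. A small sketch of that behaviour follows; `peek_repeatedly` is a hypothetical name introduced here for illustration, and the sketch does not show whether underflow() is invoked internally.

```cpp
#include <sstream>    // std::stringbuf, used only in the usage described below
#include <streambuf>
#include <string>

// Hypothetical helper (an assumption for illustration): calls sgetc() `times`
// times in a row and collects the characters it returns. sgetc() only peeks,
// so every call should see the same character.
std::string peek_repeatedly(std::streambuf& sb, int times)
{
    typedef std::streambuf::traits_type traits;

    std::string seen;
    for (int i = 0; i < times; ++i) {
        const traits::int_type c = sb.sgetc();
        if (traits::eq_int_type(c, traits::eof()))
            break;                      // nothing to peek at; EOF is not buffered
        seen.push_back(traits::to_char_type(c));
    }
    return seen;
}
```

For a `std::stringbuf` holding "xy", three peeks are expected to yield "xxx"; only after a consuming call such as sbumpc() does the next peek see 'y'.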

                    |> > It would seem to be common sense, and is what I had always
                    |> > expected.

                    |> Well, after a stream buffer returned 'eof()' once, it might still be
                    |> able to provide characters later, eg. after receiving a key press
                    |> from the keyboard.

                    I am aware of this phenomenon. Way back when, I wrote the equivalent of
                    tail -f (at a time when tail didn't have a -f option).

                    |> This is consistent e.g. with the UNIX 'read()' function, which might
                    |> return '0' at some point but a positive non-null value later: at
                    |> some point end of file is reached, and later the end of file
                    |> moves on, making new characters available.

                    The question isn't whether it is consistent with the behavior of some
                    low level function in a particular operating system. The question is,
                    in the end, what the standard should say about this case. Allowing EOF
                    not to be definitive makes a certain number of things more difficult.
                    And IMHO, the behavior is counter-intuitive. And I'm not sure that it
                    was really explicitly desired.
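
A non-definitive EOF can be observed without any operating system involvement. In the sketch below, `drain` is a hypothetical helper introduced for illustration. It assumes a `std::stringbuf` opened for both input and output, where characters written later become part of the input sequence.

```cpp
#include <sstream>    // std::stringbuf, used only in the usage described below
#include <streambuf>
#include <string>

// Hypothetical helper (an assumption for illustration): consumes characters
// from `sb` until it reports eof() and returns what was read.
std::string drain(std::streambuf& sb)
{
    typedef std::streambuf::traits_type traits;

    std::string out;
    for (;;) {
        const traits::int_type c = sb.sbumpc();
        if (traits::eq_int_type(c, traits::eof()))
            break;                      // end of file, at least for now
        out.push_back(traits::to_char_type(c));
    }
    return out;
}
```

With a `std::stringbuf` constructed with `std::ios_base::in | std::ios_base::out`, writing "ab" and draining returns "ab", and a second drain returns an empty string. After writing "cd", another drain is expected to return "cd", even though the buffer had already reported EOF. This is roughly the situation a tail -f style reader relies on.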

                    --
                    James Kanze mailto:kanze@gabi-soft.fr
                    Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
                    11 rue de Rambouillet, 78460 Chevreuse, France +33 1 41 89 80 93
