Reading from a stream til EOF

  • Hendrik Schober

    Reading from a stream til EOF

    Hi,

    I have a 'std::istream' and need to read
    its whole contents into a string. How can
    I do this?

    TIA;

    Schobi

    --
    SpamTrap@gmx.de is never read
    I'm Schobi at suespammers dot org

    "Sometimes compilers are so much more reasonable than people."
    Scott Meyers


  • Rodrigo Dominguez

    #2
    Re: Reading from a stream til EOF

    Hendrik Schober wrote:
    > Hi,
    >
    > I have a 'std::istream' and need to read
    > its whole contents into a string. How can
    > I do this?
    >
    > TIA;
    >
    > Schobi
    >
    Well, I'm not an expert on STL, but here are some examples:

    example 1:

    char c;
    while (your_istream.get(c))            // get() keeps whitespace
        your_string.push_back(c);

    example 2:

    char c;
    while (your_istream >> c)              // note: >> skips whitespace
        your_string.push_back(c);

    example 3:

    std::string your_string;
    while (your_istream >> your_string)    // reads one whitespace-delimited word at a time
        foo();



    --
    Rodrigo Dominguez
    <rorra@rorra.com.ar>
    Powered Hosting
    (www.powered-design.com)


    • Hendrik Schober

      #3
      Re: Reading from a stream til EOF

      Rodrigo Dominguez <rorra@rorra.com.ar> wrote:
      > Hendrik Schober wrote:
      >
      > > Hi,
      > >
      > > I have a 'std::istream' and need to read
      > > its whole contents into a string. How can
      > > I do this?
      > >
      > > TIA;
      > >
      > > Schobi
      > >
      > well, I'm not an expert on STL, but here are some examples
      > [...]

      Actually, I was hoping for something
      that would promise more performance.

      Schobi

      --
      SpamTrap@gmx.de is never read
      I'm Schobi at suespammers dot org

      "Sometimes compilers are so much more reasonable than people."
      Scott Meyers



      • Jonathan Turkanis

        #4
        Re: Reading from a stream til EOF

        "Hendrik Schober" <SpamTrap@gmx.de> wrote in message
        news:c1j5uf$o2g$1@news1.transmedia.de...
        > Hi,
        >
        > I have a 'std::istream' and need to read
        > its whole contents into a string. How can
        > I do this?

        I'm afraid making a copy at some point is unavoidable. I wish you
        could call reserve() and then write directly into the underlying
        storage, as with vector -- at least if the string had never been
        copied.

        Jonathan



        • Hendrik Schober

          #5
          Re: Reading from a stream til EOF

          Jonathan Turkanis <technews@kangaroologic.com> wrote:
          > "Hendrik Schober" <SpamTrap@gmx.de> wrote in message
          > news:c1j5uf$o2g$1@news1.transmedia.de...
          > > Hi,
          > >
          > > I have a 'std::istream' and need to read
          > > its whole contents into a string. How can
          > > I do this?
          >
          > I'm afraid making a copy at some point is unavoidable. I wish you
          > could call reserve() and then write directly into the underlying
          > storage, as with vector -- at least if the string had never been
          > copied.

          I suppose you mean 'resize()', where you
          say 'reserve()'? The problem is, I don't
          see how I can find out how much there is
          to read from the stream in advance.
          What I'm doing right now is this:

          std::string f(std::istream& is)
          {
              return std::string( std::istream_iterator<char>(is)
                                , std::istream_iterator<char>() );
          }

          However, I suppose this goes through all
          the sentries etc. for each and every char?
          One other thing I was thinking about is
          that 'operator>>' seems to be overloaded
          for a stream buffer on the RHS. So should
          this

          std::stringstream ss;
          is >> ss.rdbuf();
          return ss.str();

          do what I think? And if so, can I expect
          better performance from this compared to
          copying the char myself?
          > Jonathan

          Schobi

          --
          SpamTrap@gmx.de is never read
          I'm Schobi at suespammers dot org

          "Sometimes compilers are so much more reasonable than people."
          Scott Meyers



          • Jonathan Turkanis

            #6
            Re: Reading from a stream til EOF


            "Hendrik Schober" <SpamTrap@gmx.de> wrote in message
            news:c1j9ol$p8h$1@news1.transmedia.de...
            > Jonathan Turkanis <technews@kangaroologic.com> wrote:
            > > "Hendrik Schober" <SpamTrap@gmx.de> wrote in message
            > > news:c1j5uf$o2g$1@news1.transmedia.de...
            > > > Hi,
            > > >
            > > > I have a 'std::istream' and need to read
            > > > its whole contents into a string. How can
            > > > I do this?
            > >
            > > I'm afraid making a copy at some point is unavoidable. I wish you
            > > could call reserve() and then write directly into the underlying
            > > storage, as with vector -- at least if the string had never been
            > > copied.
            >
            > I suppose you mean 'resize()', where you

            Yes.
            > say 'reserve()'? The problem is, I don't
            > see how I can find out how much there is
            > to read from the stream in advance.

            Right. That's unavoidable. An exponential growth strategy is the way
            to go. You should get this automatically with string, or you can do it
            yourself.
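A minimal sketch of the exponential growth strategy, combined with Jonathan's wish to write straight into the string's storage. It assumes C++11 or later, where the characters of a std::string are guaranteed to be stored contiguously, so &s[used] can be handed to sgetn(); the name slurp_doubling and the 4096-character starting size are illustrative choices, not anything from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: read all of 'is' into a string, growing the string
// geometrically and letting sgetn() write straight into its storage.
// Assumes C++11 (contiguous std::string storage) and that is.rdbuf() != 0.
std::string slurp_doubling(std::istream& is)
{
    std::streambuf* sb = is.rdbuf();
    std::string s(4096, '\0');                // arbitrary starting size
    std::size_t used = 0;
    for (;;) {
        if (used == s.size())
            s.resize(s.size() * 2);           // exponential growth
        std::streamsize want = static_cast<std::streamsize>(s.size() - used);
        std::streamsize got = sb->sgetn(&s[used], want);
        used += static_cast<std::size_t>(got);
        if (got < want)                       // short read: input exhausted
            break;
    }
    s.resize(used);                           // drop the unused tail
    return s;
}
```

Because the stream buffer is read directly, the istream's own state flags (such as eofbit) are not updated by this helper.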
            > What I'm doing right now is this:
            >
            > std::string f(std::istream& is)
            > {
            > return std::string( std::istream_iterator<char>(is)
            > , std::istream_iterator<char>() );
            > }

            You definitely don't want to do this if you're concerned with
            efficiency. At the very least, you should extract the underlying
            streambuf using is.rdbuf(), and read into a char array using sgetn.
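Sketched under the assumption that the istream has a stream buffer attached, the suggestion above might look like this: fetch the buffer with is.rdbuf() and copy fixed-size chunks with sgetn(), leaving the growth to std::string. The helper name read_all_sgetn and the 8192-byte chunk size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: skip the istream layer (no sentry per character) and
// copy the input from its stream buffer in fixed-size chunks.
// Assumes is.rdbuf() is not null.
std::string read_all_sgetn(std::istream& is)
{
    std::streambuf* sb = is.rdbuf();
    std::string s;
    char buf[8192];                                   // chunk size is arbitrary
    std::streamsize n;
    while ((n = sb->sgetn(buf, static_cast<std::streamsize>(sizeof buf))) > 0)
        s.append(buf, static_cast<std::size_t>(n));   // string grows as needed
    return s;
}
```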
            > However, I suppose this goes through all
            > the sentries etc. for each and every char?
            > One other thing I was thinking about is
            > that 'operator>>' seems to be overloaded
            > for a stream buffer on the RHS. So should
            > this
            >
            > std::stringstream ss;
            > is >> ss.rdbuf();
            > return ss.str();
            >

            I would have guessed that a good implementation would implement this
            as I described above, but I checked Dinkumware and it does a
            character-by-character extraction. So I would use a char buffer.

            (In my first response, I thought you were mainly interested in avoiding
            the final copy when you call ss.str())

            Jonathan




            • Hendrik Schober

              #7
              Re: Reading from a stream til EOF

              Jonathan Turkanis <technews@kangaroologic.com> wrote:
              > [...]
              > > say 'reserve()'? The problem is, I don't
              > > see how I can find out how much there is
              > > to read from the stream in advance.
              >
              > Right. That's unavoidable. An exponential growth strategy is the way
              > to go. You should get this automatically with string, or you can do it
              > yourself.

              I planned to let 'std::string' take care
              of this. :)
              > > What I'm doing right now is this:
              > >
              > > std::string f(std::istream& is)
              > > {
              > > return std::string( std::istream_iterator<char>(is)
              > > , std::istream_iterator<char>() );
              > > }
              >
              > You definitely don't want to do this if you're concerned with
              > efficiency.

              I see. I was expecting this. I suppose
              using streambuf iterators wouldn't help
              much with this?
              > At the very least, you should extract the underlying
              > streambuf using is.rdbuf(), and read into a char array using sgetn.

              As this avoids creating/destroying any
              sentries and all the formatting?
              > > However, I suppose this goes through all
              > > the sentries etc. for each and every char?
              > > One other thing I was thinking about is
              > > that 'operator>>' seems to be overloaded
              > > for a stream buffer on the RHS. So should
              > > this
              > >
              > > std::stringstream ss;
              > > is >> ss.rdbuf();
              > > return ss.str();
              > >
              >
              > I would have guessed that a good implementation would implement this
              > as I described above, but I checked Dinkumware and it does a
              > character-by-character extraction.

              Thanks for checking. We are indeed using
              Dinkumware on two platforms, so this would
              not help much. I should probably ask about
              this in MS' std lib newsgroup, as PJP and PB
              are reading and posting there.
              > So I would use a char buffer.

              I am not sure what you mean here. Can you
              elaborate?
              > (In my first response, I thought you were mainly interested in avoiding
              > the final copy when you call ss.str())

              Well, actually, I would need to read the
              content through an istream later anyway.
              However, first I need its size. (The real
              task is to parse the data, which is a rather
              lengthy process. OTOH the raw data itself
              usually is not very big. So I thought it
              would be better to lose some performance
              on copying to get the size, as this would
              give me a real progress bar for visual
              feedback to the users.)
              > Jonathan

              Schobi

              --
              SpamTrap@gmx.de is never read
              I'm Schobi at suespammers dot org

              "Sometimes compilers are so much more reasonable than people."
              Scott Meyers



              • Dietmar Kuehl

                #8
                Re: Reading from a stream til EOF

                "Hendrik Schober" <SpamTrap@gmx.de> wrote:
                > What I'm doing right now is this:
                >
                > std::string f(std::istream& is)
                > {
                > return std::string( std::istream_iterator<char>(is)
                > , std::istream_iterator<char>() );
                > }

                This is not at all what you want to do, I guess: among other things, this will
                strip all white space from the input before putting it into the string!
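The whitespace effect is easy to check. This sketch (all three function names are hypothetical) reads the same text three ways: with std::istream_iterator<char> under the default skipws flag, with skipws turned off via std::noskipws, and with std::istreambuf_iterator<char>, which reads the stream buffer without any formatting.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Formatted extraction: istream_iterator<char> uses operator>>, which skips
// whitespace while the skipws flag is set (the default).
std::string via_istream_iterator(std::istream& is)
{
    return std::string(std::istream_iterator<char>(is),
                       std::istream_iterator<char>());
}

// Same iterator with skipws cleared: whitespace is kept, but every character
// still goes through a sentry. Note: this changes the caller's stream flags.
std::string via_istream_iterator_noskipws(std::istream& is)
{
    is >> std::noskipws;
    return std::string(std::istream_iterator<char>(is),
                       std::istream_iterator<char>());
}

// Unformatted access through the stream buffer: no sentries, no skipping.
std::string via_istreambuf_iterator(std::istream& is)
{
    return std::string(std::istreambuf_iterator<char>(is),
                       std::istreambuf_iterator<char>());
}
```

For the input "a b\nc", the first function returns "abc", while the other two return the text unchanged.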
                > However, I suppose this goes through all
                > the sentries etc. for each and every char?

                Yes, this goes through the sentries and the preparation etc. What you
                probably want to do is this:

                std::string f(std::istream& is) {
                    return std::string( std::istreambuf_iterator<char>(is),
                                        std::istreambuf_iterator<char>() );
                }

                This does not go through the sentries. However, for this to be efficient,
                the library has either to implement the general segmented iterator
                optimization or it has to special-case this particular use in some form.
                My implementation has a special case (which is pretty close to the general
                optimization but is not quite there) and this is the fastest method to
                read a string, especially for a file with the "C" facet: in this case it
                essentially amounts to a memcpy() from a memory mapped file to the string.
                > One other thing I was thinking about is
                > that 'operator>>' seems to be overloaded
                > for a stream buffer on the RHS. So should
                > this
                >
                > std::stringstream ss;
                > is >> ss.rdbuf();
                > return ss.str();

                I would expect this to be the fastest approach with typical implementations:
                this may bypass certain internal buffers, etc. For buffered input streams
                this should at the very least process blocks of characters from buffers
                directly.
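A sketch of the 'is >> ss.rdbuf()' form as a complete function (the name read_all_extract is hypothetical). Two details worth knowing: if no characters are extracted, as with an empty stream, the extractor sets failbit on 'is'; and, as a defensive assumption, skipws is switched off first in case a given library skips leading whitespace in this extractor.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: let the library copy from one stream buffer to another.
std::string read_all_extract(std::istream& is)
{
    std::stringstream ss;
    // Defensive: in case this extractor skips leading whitespace in a given
    // library, turn skipws off first (this changes the caller's stream flags).
    is.unsetf(std::ios_base::skipws);
    is >> ss.rdbuf();     // extracts until end of input
    // If 'is' had nothing left, nothing is inserted and 'is' gets failbit;
    // the result is then simply an empty string.
    return ss.str();
}
```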
                > do what I think? And if so, can I expect
                > better performance from this compared to
                > copying the char myself?

                Go measure... I would expect the 'rdbuf()' to be significantly faster than
                processing individual characters. Here is something which should also be
                faster than processing individual characters:

                enum { bufsize = 8192 };
                char buf[bufsize];
                std::string s;
                // read() returns the stream; gcount() tells how many chars it got
                for (std::streamsize size = 0; (size = is.read(buf, bufsize).gcount()) > 0; )
                    s.append(buf, size);

                (this code is untested and I'm somewhat humble with respect to the string
                interface...).
                --
                <mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
                Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>


                • Hendrik Schober

                  #9
                  Re: Reading from a stream til EOF

                  Dietmar Kuehl <dietmar_kuehl@yahoo.com> wrote:
                  > "Hendrik Schober" <SpamTrap@gmx.de> wrote:
                  > > What I'm doing right now is this:
                  > >
                  > > std::string f(std::istream& is)
                  > > {
                  > > return std::string( std::istream_iterator<char>(is)
                  > > , std::istream_iterator<char>() );
                  > > }
                  >
                  > This is not at all what you want to do, I guess: among other things, this will
                  > strip all white space from the input before putting it into the string!

                  Yes, I found this out by now. :o>
                  > > However, I suppose this goes through all
                  > > the sentries etc. for each and every char?
                  >
                  > Yes, this goes through the sentries and the preparation etc. What you
                  > probably want to do is this:
                  >
                  > std::string f(std::istream& is) {
                  > return std::string( std::istreambuf_iterator<char>(is),
                  > std::istreambuf_iterator<char>() );
                  > }
                  >
                  > This does not go through the sentries.

                  This was the next thing I was about to try.
                  > However, for this to be efficient,
                  > the library has either to implement the general segmented iterator
                  > optimization [...]

                  ???
                  > [...]
                  > > std::stringstream ss;
                  > > is >> ss.rdbuf();
                  > > return ss.str();
                  >
                  > I would expect this to be the fastest approach with typical implementations:
                  > this may bypass certain internal buffers, etc. For buffered input streams
                  > this should at the very least process blocks of characters from buffers
                  > directly.

                  Could I do this the other way around, too?

                  std::stringstream ss;
                  ss << is.rdbuf();
                  return ss.str();

                  And if so, is there anything different in
                  principle or is it just down to the
                  particular library?
                  > [...]
                  > Go measure...

                  The problem is, I need to find a way to do
                  this which most likely is fast on a couple
                  of platforms without being able to profile
                  it on each one.
                  > I would expect the 'rdbuf()' to be significantly faster than
                  > processing individual characters.

                  I see.
                  > Here is something which should also be
                  > faster than processing individual characters:
                  >
                  > enum { bufsize = 8192 };
                  > char buf[bufsize];
                  > std::string s;
                  > for (std::streamsize size = 0; (size = is.read(buf, bufsize).gcount()) > 0; )
                  > s.append(buf, size);
                  >
                  > (this code is untested and I'm somewhat humble with respect to the string
                  > interface...).

                  The good old char buf read functions. I
                  wonder why it is so hard to do something
                  efficiently without having to go back to
                  C-ish ways.

                  Schobi

                  --
                  SpamTrap@gmx.de is never read
                  I'm Schobi at suespammers dot org

                  "Sometimes compilers are so much more reasonable than people."
                  Scott Meyers



                  • tom_usenet

                    #10
                    Re: Reading from a stream til EOF

                    On Wed, 25 Feb 2004 23:05:55 +0100, "Hendrik Schober"
                    <SpamTrap@gmx.de> wrote:

                    >Hi,
                    >
                    >I have a 'std::istream' and need to read
                    >its whole contents into a string. How can
                    >I do this?

                    I've posted a few solutions to this in the past:

                    http://www.google.com/groups?selm=3d....easynet.co.uk

                    There are lots more ways, and the most efficient somewhat depends on
                    the library implementation in question.

                    Tom
                    --
                    C++ FAQ: http://www.parashift.com/c++-faq-lite/
                    C FAQ: http://www.eskimo.com/~scs/C-faq/top.html


                    • Hendrik Schober

                      #11
                      Re: Reading from a stream til EOF

                      tom_usenet <tom_usenet@hotmail.com> wrote:
                      > [...]
                      > I've posted a few solutions to this in the past:
                      >
                      > http://www.google.com/groups?selm=3d....easynet.co.uk

                      I didn't think of seeking through a
                      stream to get its size! Of all the
                      reasons I wanted to do this I did
                      manage to eliminate all except that
                      I need the size of the data to be
                      read from the stream. Since you just
                      showed me how to get this, I won't
                      even need to read the whole thing
                      into a string anymore!
                      > There are lots more ways, and the most efficient somewhat depends on
                      > the library implementation in question.

                      Yes. What I wanted was a solution
                      that has good performance on most
                      platforms. However, I think I don't
                      need it anymore. :)
                      > Tom

                      Schobi

                      --
                      SpamTrap@gmx.de is never read
                      I'm Schobi at suespammers dot org

                      "Sometimes compilers are so much more reasonable than people."
                      Scott Meyers



                      • tom_usenet

                        #12
                        Re: Reading from a stream til EOF

                        On Thu, 26 Feb 2004 16:33:55 +0100, "Hendrik Schober"
                        <SpamTrap@gmx.de> wrote:

                        >tom_usenet <tom_usenet@hotmail.com> wrote:
                        >> [...]
                        >> I've posted a few solutions to this in the past:
                        >>
                        >> http://www.google.com/groups?selm=3d....easynet.co.uk
                        >
                        > I didn't think of seeking through a
                        > stream to get its size! Of all the
                        > reasons I wanted to do this I did
                        > manage to eliminate all except that
                        > I need the size of the data to be
                        > read from the stream. Since you just
                        > showed me how to get this, I won't
                        > even need to read the whole thing
                        > into a string anymore!

                        There are a couple of provisos.

                        Firstly, opening the stream in binary mode is likely to give you a
                        better result (e.g. the number of bytes in the file) - text mode
                        sometimes has funny ideas about where a file ends on some OSes.

                        Secondly, it won't work for files whose length won't fit in a
                        std::streamoff (e.g. bigger than, say, 2GB).

                        Finally, don't forget you can just use a std::filebuf and cut out the
                        fstream entirely.
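tom's linked post is not reproduced here, so this is an independent sketch of the seek-to-the-end approach that follows his provisos: a bare std::filebuf (no fstream object) opened in binary mode, with the size taken from pubseekoff(). The function name file_size is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <ios>

// Hypothetical helper: size of a file in bytes via a bare std::filebuf
// opened in binary mode. Returns -1 on failure. Files whose length does
// not fit in std::streamoff are not handled.
std::streamoff file_size(const char* path)
{
    std::filebuf fb;
    if (!fb.open(path, std::ios_base::in | std::ios_base::binary))
        return -1;
    // Seeking to the end reports the position relative to the start.
    std::streamoff size = fb.pubseekoff(0, std::ios_base::end, std::ios_base::in);
    return size;   // -1 if the seek failed
}
```

If the same filebuf is to be read afterwards, seek back to the start first, e.g. with fb.pubseekpos(0, std::ios_base::in).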

                        Tom
                        --
                        C++ FAQ: http://www.parashift.com/c++-faq-lite/
                        C FAQ: http://www.eskimo.com/~scs/C-faq/top.html


                        • Hendrik Schober

                          #13
                          Re: Reading from a stream til EOF

                          tom_usenet <tom_usenet@hotmail.com> wrote:
                          > [...]
                          > Firstly, opening the stream in binary mode is likely to give you a
                          > better result (e.g. the number of bytes in the file) - text mode
                          > sometimes has funny ideas about where a file ends on some OSes.

                          Is there anything worse to be expected than
                          the "\r\n" problem? As this is just for
                          progress indication for the users, accuracy
                          is not as important.
                          > Secondly, it won't work for files whose length won't fit in a
                          > std::streamoff (e.g. bigger than, say, 2GB).

                          Yes. But I wouldn't have thought of loading
                          these into a string anyway. :)
                          > Finally, don't forget you can just use a std::filebuf and cut out the
                          > fstream entirely.

                          How do I read a line from a streambuf?
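One possible answer, sketched under the assumption that a newline-terminated line is wanted: pull characters with sbumpc() until '\n' or end of input. The helper read_line is hypothetical. Alternatively, an std::istream can be constructed on top of the buffer (std::istream in(&fb);) and std::getline used as usual, which still avoids a separate fstream object.

```cpp
#include <cassert>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: read characters up to (not including) '\n' straight
// from a stream buffer. Returns false only if nothing at all could be read.
bool read_line(std::streambuf& sb, std::string& line)
{
    typedef std::char_traits<char> traits;
    line.clear();
    traits::int_type c = sb.sbumpc();
    if (traits::eq_int_type(c, traits::eof()))
        return false;                          // already at end of input
    while (!traits::eq_int_type(c, traits::eof())
           && !traits::eq_int_type(c, traits::to_int_type('\n'))) {
        line.push_back(traits::to_char_type(c));
        c = sb.sbumpc();
    }
    return true;                               // '\n' (if any) was consumed
}
```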
                          > Tom

                          Schobi

                          --
                          SpamTrap@gmx.de is never read
                          I'm Schobi at suespammers dot org

                          "Sometimes compilers are so much more reasonable than people."
                          Scott Meyers



                          • Dietmar Kuehl

                            #14
                            Re: Reading from a stream til EOF

                            Hendrik Schober wrote:
                            > Dietmar Kuehl <dietmar_kuehl@yahoo.com> wrote:
                            >> std::string f(std::istream& is) {
                            >> return std::string( std::istreambuf_iterator<char>(is),
                            >> std::istreambuf_iterator<char>() );
                            >> }
                            >> However, for this to be efficient,
                            >> the library has either to implement the general segmented iterator
                            >> optimization [...]

                            Well, essentially, a streambuf iterator iterates over buffers of
                            characters. Sure, it is always the same buffer, but just envision each
                            fill of the buffer as a separate one. Now, each of these buffers can be
                            processed in a chunk making up a segment of the overall sequence.
                            Taking advantage of this view results in faster code because rather
                            than making two checks in each iteration, there is just one. Also, it
                            is possible to unroll the loop even further because the sizes of the
                            segments are known in advance, allowing a check to be made only for
                            something like every 100th character. Without this optimization,
                            processing whole stream buffers directly will work more efficiently,
                            because that processing does exactly this, just more naturally (at
                            least, I would expect it to in most implementations).
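The library-internal optimization described above cannot be written from outside the stream buffer, since its get area pointers are protected. As a user-level illustration of the same "segment" view only, this sketch (the name copy_by_segments is hypothetical) treats each buffer fill as one segment: in_avail() reports how many characters are already buffered, and that many are copied in one call, so the end-of-input test runs once per segment rather than once per character.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper illustrating the segment view: each refill of the
// stream buffer is copied as one block.
std::string copy_by_segments(std::streambuf& sb)
{
    typedef std::char_traits<char> traits;
    std::string s;
    char chunk[4096];
    // sgetc() refills the get area if it is empty, without consuming a char.
    while (!traits::eq_int_type(sb.sgetc(), traits::eof())) {
        // Characters available without another refill: the current segment.
        std::streamsize avail = sb.in_avail();
        if (avail <= 0)
            avail = 1;                 // unbuffered source: one char at a time
        std::streamsize want = std::min(avail,
            static_cast<std::streamsize>(sizeof chunk));
        std::streamsize n = sb.sgetn(chunk, want);
        s.append(chunk, static_cast<std::size_t>(n));
    }
    return s;
}
```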

                            The general principle can also be applied to other kinds of sequences
                            which are similarly segmented. 'std::deque's and hash tables using a list
                            for each bucket come to mind.

                            [color=blue]
                            > Could I do this the other way around, too?
                            >
                            > std::stringstre am ss;[/color]
                            std::ostringstr eam ss;[color=blue]
                            > ss << is.rdbuf();
                            > return ss.str();[/color]

                            This is how I'm normally writing it. The direction should not really
                            matter and the same function should be used underneath.
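Written as a function in the form described above, a sketch might look like this (the name read_all_insert is hypothetical). If nothing can be copied, for example from an empty stream, the inserter sets failbit on the output stream 'ss' rather than on 'is'; the returned string is then simply empty.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: insert the whole input stream buffer into an
// output string stream and return the collected text.
std::string read_all_insert(std::istream& is)
{
    std::ostringstream ss;
    // Copies characters from is.rdbuf() until end of input. On empty input
    // nothing is inserted and failbit is set on 'ss', not on 'is'.
    ss << is.rdbuf();
    return ss.str();
}
```

Since the input buffer is accessed directly, 'is' itself never sees end-of-file, so its eofbit stays clear afterwards.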
                            > The problem is, I need to find a way to do
                            > this which most likely is fast on a couple
                            > of platforms without being able to profile
                            > it on each one.

                            But you should get a general feeling for which things work fast and which
                            don't by trying out a couple. Actually, I'm aware of only five
                            different libraries in wider use:
                            - Dinkumware (e.g. shipping with MSVC++)
                            - libstdc++ (shipping with gcc)
                            - Metrowerks' library, shipping with their compiler
                            - Rogue Wave (used to ship e.g. with Sun CC)
                            - STLport (a free drop-in replacement library)

                            I'm unaware of any other standard C++ library shipping with a commercial
                            compiler (ObjectSpace dropped their library and mine never shipped
                            with anything; is there any other reasonably complete standard library
                            implementation still in use?)
                            > The good old char buf read functions. I
                            > wonder why it is so hard to do something
                            > efficiently without having to go back to
                            > C-ish ways.

                            Well, the segmented iterator optimization requires quite a bit of
                            machinery to work. It gives a nice abstract interface to an efficient
                            implementation. It's just that nobody does it, because the library implementers are
                            kept busy with all kinds of other stuff and optimizations. The low-level
                            stuff is some wiring you can apply yourself...
                            --
                            <mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
                            Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>


                            • Hendrik Schober

                              #15
                              Re: Reading from a stream til EOF

                              Dietmar Kuehl <dietmar_kuehl@yahoo.com> wrote:
                              > [...]
                              > Well, essentially, a streambuf iterator [...]

                              Thanks for the enlightenment!
                              > > Could I do this the other way around, too?
                              > >
                              > > std::stringstream ss;
                              > std::ostringstream ss;
                              > > ss << is.rdbuf();
                              > > return ss.str();
                              >
                              > This is how I'm normally writing it. The direction should not really
                              > matter and the same function should be used underneath.

                              I see.
                              > > The problem is, I need to find a way to do
                              > > this which most likely is fast on a couple
                              > > of platforms without being able to profile
                              > > it on each one.
                              >
                              > But you should get a general feeling for which things work fast and which
                              > don't by trying out a couple. Actually, I'm aware of only five
                              > different libraries in wider use:
                              > - Dinkumware (e.g. shipping with MSVC++)
                              > - libstdc++ (shipping with gcc)
                              > - Metrowerks' library, shipping with their compiler
                              > - Rogue Wave (used to ship e.g. with Sun CC)
                              > - STLport (a free drop-in replacement library)

                              Yes, but then there are all the different
                              versions of these libraries. And once a
                              piece of code works, nobody will go into
                              it and check whether with the newest
                              version this or that could be optimized
                              using another technique...
                              > I'm unaware of any other standard C++ library shipping with a commercial
                              > compiler (ObjectSpace dropped their library and mine never shipped
                              > with anything;

                              Why is that, actually?
                              > is there any other reasonably complete standard library
                              > implementation still in use?)
                              >
                              > > The good old char buf read functions. I
                              > > wonder why it is so hard to do something
                              > > efficiently without having to go back to
                              > > C-ish ways.
                              >
                              > Well, the segmented iterator optimization requires quite a bit of
                              > machinery to work. It gives a nice abstract interface to an efficient
                              > implementation. It's just that nobody does it, because the library implementers are
                              > kept busy with all kinds of other stuff and optimizations. The low-level
                              > stuff is some wiring you can apply yourself...


                              But I wonder whether it is a flaw in the
                              design if something like reading into a
                              string cannot easily be done fast with
                              the recommended approach.

                              Schobi

                              --
                              SpamTrap@gmx.de is never read
                              I'm Schobi at suespammers dot org

                              "Sometimes compilers are so much more reasonable than people."
                              Scott Meyers

