getline() and newlines

  • barcaroller

    getline() and newlines


    I have a text file with mixed carriage returns ('\n' and '\r\n').

    On Linux, both the std::string getline() global function and the
    std::iostream getline() member function are keeping some of the newlines in
    the result (I suspect they look only for the '\n').

    * Is there a quick way I can tell either function to gobble up both
    Windows-style and Unix-style newlines?

    * If not, what would be an efficient way of getting rid of them? Currently
    I use string::find_last_of("\n\r") + string::erase() but this is not very
    efficient.



  • Erik Wikström

    #2
    Re: getline() and newlines

    On 2008-04-05 15:25, barcaroller wrote:
    > I have a text file with mixed carriage returns ('\n' and '\r\n').
    >
    > On Linux, both the std::string getline() global function and the
    > std::iostream getline() member function are keeping some of the newlines in
    > the result (I suspect they look only for the '\n').
    >
    > * Is there a quick way I can tell either function to gobble up both
    > Windows-style and Unix-style newlines?

    While you can specify the delimiting character you can only specify one
    character.

    > * If not, what would be an efficient way of getting rid of them? Currently
    > I use string::find_last_of("\n\r") + string::erase() but this is not very
    > efficient.

    Since the Windows sequence is \r\n and getline() uses \n as delimiter
    any line with a Windows linebreak will end with \r. Use this knowledge
    to reduce the work required:

    std::string str;
    std::getline(file, str);

    if (str[str.size() - 1] == '\r')
        str.resize(str.size() - 1);

    --
    Erik Wikström


    • Obnoxious User

      #3
      Re: getline() and newlines

      On Sat, 05 Apr 2008 09:25:11 -0400, barcaroller wrote:
      > I have a text file with mixed carriage returns ('\n' and '\r\n').
      >
      > On Linux, both the std::string getline() global function and the
      > std::iostream getline() member function are keeping some of the newlines
      > in the result (I suspect they look only for the '\n').
      >
      > * Is there a quick way I can tell either function to gobble up both
      > Windows-style and Unix-style newlines?
      >
      > * If not, what would be an efficient way of getting rid of them?
      > Currently I use string::find_last_of("\n\r") + string::erase() but this
      > is not very efficient.

      A simple and quick solution, adjust it to your own needs:

      #include <iostream>
      #include <sstream>

      std::istream & getline(std::istream & in, std::string & out) {
          char c;
          while(in.get(c).good()) {
              if(c == '\n') {
                  c = in.peek();
                  if(in.good()) {
                      if(c == '\r') {
                          in.ignore();
                      }
                  }
                  break;
              }
              out.append(1,c);
          }
          return in;
      }

      int main() {
          std::istringstream strm("alpha\nbeta\n\r...\n\romega\n\n");
          for(int i = 0; strm.good(); ++i) {
              std::string line;
              getline(strm,line);
              std::cout<<i<<"\t"<<line<<std::endl;
          }
          return 0;
      }

      --
      OU


      • Obnoxious User

        #4
        Re: getline() and newlines

        On Sat, 05 Apr 2008 13:45:39 +0000, Obnoxious User wrote:
        > On Sat, 05 Apr 2008 09:25:11 -0400, barcaroller wrote:
        >
        >> I have a text file with mixed carriage returns ('\n' and '\r\n').
        >>
        >> On Linux, both the std::string getline() global function and the
        >> std::iostream getline() member function are keeping some of the
        >> newlines in the result (I suspect they look only for the '\n').
        >>
        >> * Is there a quick way I can tell either function to gobble up both
        >> Windows-style and Unix-style newlines?
        >>
        >> * If not, what would be an efficient way of getting rid of them?
        >> Currently I use string::find_last_of("\n\r") + string::erase() but
        >> this is not very efficient.
        >
        > A simple and quick solution, adjust it to your own needs:
        >
        > #include <iostream>
        > #include <sstream>
        >
        > std::istream & getline(std::istream & in, std::string & out) {
        >     char c;
        >     while(in.get(c).good()) {
        >         if(c == '\n') {
        >             c = in.peek();
        >             if(in.good()) {
        >                 if(c == '\r') {
        >                     in.ignore();
        >                 }
        >             }
        >             break;
        >         }
        >         out.append(1,c);
        >     }
        >     return in;
        > }
        >
        > int main() {
        >     std::istringstream strm("alpha\nbeta\n\r...\n\romega\n\n");
        >     for(int i = 0; strm.good(); ++i) {
        >         std::string line;
        >         getline(strm,line);
        >         std::cout<<i<<"\t"<<line<<std::endl;
        >     }
        >     return 0;
        > }

        I realized after posting that I reversed the sequence (the code looks
        for "\n\r" rather than "\r\n"), so it is flawed for your needs, although
        easily fixed. Ignore it.

        --
        OU


        • James Kanze

          #5
          Re: getline() and newlines

          On 5 Apr, 15:25, "barcaroller" <barcarol...@music.net> wrote:
          > I have a text file with mixed carriage returns ('\n' and '\r\n').
          > On Linux, both the std::string getline() global function and
          > the std::iostream getline() member function are keeping some
          > of the newlines in the result (I suspect they look only for
          > the '\n').

          Technically, it's implementation defined. Typically, however,
          yes: Unix implementations treat a single 0x0A in the stream as a
          newline; Windows implementations treat either a single 0x0A or
          the sequence 0x0D, 0x0A as a newline.

          Most of the time, this should not be a problem. In all of the
          usual encodings (at least outside of the mainframe world), the
          0x0D will result in a '\r' under Unix (and probably also under
          Windows, if it isn't immediately followed by a 0x0A). In the
          "C" locale, and probably in all other locales, '\r' is
          whitespace. So it ends up ignored with the rest of the trailing
          whitespace. (The one exception is C and C++ source code; for
          some reason, the standard doesn't consider '\r' as whitespace in
          source code.)

          > * Is there a quick way I can tell either function to gobble
          > up both Windows-style and Unix-style newlines?

          Is there ever a need to?

          > * If not, what would be an efficient way of getting rid of
          > them? Currently I use string::find_last_of("\n\r") +
          > string::erase() but this is not very efficient.

          I'd use an external program (e.g. tr). In practice, if a file
          is on a shared file system, and thus being read by both Windows
          and Unix, it's generally best (pragmatically, at least) to stick
          with the Unix conventions.

          --
          James Kanze (GABI Software) email:james.kanze@gmail.com
          Conseils en informatique orientée objet/
          Beratung in objektorientierter Datenverarbeitung
          9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
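The point about '\r' being whitespace can be illustrated with a small sketch. It assumes the line is parsed with formatted extraction (operator>>) in the default "C" locale. The helper name split_fields is hypothetical and not from the thread.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a line into whitespace-separated fields.
// operator>> skips leading whitespace and stops at the next whitespace
// character, so a trailing '\r' left behind by getline() never ends up
// inside a field.
std::vector<std::string> split_fields(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string field;
    while (in >> field) {
        fields.push_back(field);
    }
    return fields;
}
```

This only helps when the line is parsed further in this way. If the line is used as a whole string (compared, printed, stored), the trailing '\r' is still there and has to be removed explicitly.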


          • Antoine Mathys

            #6
            Re: getline() and newlines

            > std::string str;
            > std::getline(file, str);
            >
            > if (str[str.size() - 1] == '\r')
            >     str.resize(str.size() - 1);

            With an empty line and a Unix end of line, str is empty, so
            str[str.size() - 1] indexes out of bounds: SEGFAULT.

            The code fragment should be:

            if ((str.size() > 0) && (str[str.size() - 1] == '\r'))
                str.resize(str.size() - 1);
