How to read tsv file?

This topic is closed.
  • BCC

    How to read tsv file?

    Hi,

    I have a tab separated value table like this:
    header1 header2 header3
    13.455 55.3 A string
    4.55 5.66 Another string

    I want to load this guy into a vector of vectors, since I do not know how
    long it may be. I think I have to have a vector of vectors of strings, and
    then extract the doubles later(?):
    std::vector<std::vector<std::string> > m_data_vec;

    I started off with this skeletal function, but I'm not sure how to parse the
    line for tabs and newlines, and stuff the elements into the vector. Is it
    better to read in the whole line then parse it? Can I parse it on the fly?
    How?

    void MyClass::ReadTSV(const char* filename)
    {
        using namespace std;

        ifstream infile(filename);
        if (!infile) {
            cout << "unable to load file" << endl;
            return;
        }

        // Now what?
    }

    Thanks,
    Bryan


  • Victor Bazarov

    #2
    Re: How to read tsv file?

    "BCC" <a@b.c> wrote...
    > I have a tab separated value table like this:
    > header1 header2 header3
    > 13.455 55.3 A string
    > 4.55 5.66 Another string
    >
    > I want to load this guy into a vector of vectors, since I do not know how
    > long it may be. I think I have to have a vector of vectors of strings, and
    > then extract the doubles later(?):
    > std::vector<std::vector<std::string> > m_data_vec;
    >
    > I started off with this skeletal function, but I'm not sure how to parse the
    > line for tabs and newlines, and stuff the elements into the vector. Is it
    > better to read in the whole line then parse it?

    Oh, so much better...

    > Can I parse it on the fly?

    I don't know. Can you?

    > How?
    >
    > void MyClass::ReadTSV(const char* filename)
    > {
    > using namespace std;
    >
    > ifstream infile(filename);
    > if (!infile) {
    > cout << "unable to load file" << endl;
    > }
    >
    > // Now what?

    If you know how many fields to expect, you could use getline( ... , '\t') N-1
    times and then getline( ... , '\n'), and then again and again. (Plain
    get( ... , '\t') also works, but it leaves the delimiter in the stream, so
    you would have to skip it yourself.)

    Easier still: read one character at a time and watch for '\t' and '\n'. But
    I would still do the "get the whole line and then parse it" thing.

    > }

    V



    • Sharad Kala

      #3
      Re: How to read tsv file?

      Maybe this gives you the basic idea.
      I haven't tested it, and there are no checks for errors etc.

      <UNTESTED CODE>

      #include <fstream>
      #include <iostream>
      #include <string>
      #include <vector>
      using namespace std;

      void ReadTSV(const char* filename)
      {
          ifstream infile(filename);
          if (!infile) {
              cout << "unable to load file" << endl;
              return;
          }
          string str;

          vector<vector<string> > vvStr;
          string::size_type pos1, pos2;
          while (getline(infile, str))
          {
              vector<string> vStr;  // fields of the current line
              pos1 = 0;
              // look for the next tab, starting at pos1
              while ((pos2 = str.find('\t', pos1)) != string::npos)
              {
                  // substr takes (start, length)
                  vStr.push_back(str.substr(pos1, pos2 - pos1));
                  pos1 = pos2 + 1;  // skip past the tab
              }
              vStr.push_back(str.substr(pos1));  // last field on the line
              vvStr.push_back(vStr);
          }
      }

      </UNTESTED CODE>

      Best wishes,
      Sharad



      • Jonathan Turkanis

        #4
        Re: How to read tsv file?

        Here's some code I wrote some time ago for splitting sequences of
        characters and adding them to lists. I have used it a lot with Visual
        C++. I don't guarantee its portability or efficiency, but it looks
        generally okay.

        Usage:

        struct is_tab {
            bool operator()(char c) const { return c == '\t'; }
        };

        // Split s using tab as a separator character,
        // adding segments to the end of a vector.
        string s;
        vector<string> vec;
        Utility::split(s.begin(), s.end(), back_inserter(vec), is_tab(), false);

        Here you could use any input iterators for the first and second
        arguments; in particular, you should be able to use istream_iterators
        or istreambuf_iterators.

        Jonathan


        ---------------------
        //
        // File name: split.h
        //
        // Description: Contains template functions for splitting a string into
        // a list.
        //
        // Author: Jonathan Turkanis
        //
        // Copyright: Jonathan Turkanis, July 29, 2002. See Readme.txt for
        // license information.
        //

        #ifndef UT_SPLIT_H_INCLUDED
        #define UT_SPLIT_H_INCLUDED

        #include <iterator>
        #include <locale>
        #include <string>
        #include <boost/bind.hpp>
        #include <boost/ref.hpp>

        namespace Utility {

        //
        // Function name: split.
        //
        // Description: Splits the given string into components.
        //
        // Template parameters:
        // InIt - An input iterator type with any value type Elem.
        // OutIt - An output iterator type with value type equal to
        // std::basic_string<Elem>.
        // Pred - A predicate with argument type Elem.
        // Parameters:
        // first - The beginning of the input sequence.
        // last - The end of the input sequence.
        // dest - Receives the terms in the generated list.
        // sep - Determines where to split the input sequence.
        // coalesce - true if sequences of consecutive elements satisfying sep
        // should be treated as one. Defaults to true.
        //
        template<class InIt, class OutIt, class Pred>
        void split(InIt first, InIt last, OutIt dest, Pred sep, bool coalesce = true);

        //
        // Function name: split_by_whitespace.
        //
        // Description: Splits the given string into components.
        //
        // Template parameters:
        // InIt - An input iterator type with any value type Elem.
        // OutIt - An output iterator type with value type equal to
        // std::basic_string<Elem>.
        // Parameters:
        // first - The beginning of the input sequence.
        // last - The end of the input sequence.
        // dest - Receives the terms in the generated list.
        //
        template<class InIt, class OutIt>
        void split_by_whitespace(InIt first, InIt last, OutIt dest)
        {
            using namespace std;
            typedef typename iterator_traits<InIt>::value_type char_type;
            locale loc;
            split(first, last, dest,
                  boost::bind(isspace<char_type>, _1, boost::ref(loc)));
        }

        template<class InIt, class OutIt, class Pred>
        void split(InIt first, InIt last, OutIt dest, Pred sep, bool coalesce)
        {
            using namespace std;
            typedef typename iterator_traits<InIt>::value_type char_type;
            typedef basic_string<char_type> string_type;

            bool prev = true; // True if prev char was a separator.
            string_type term;
            while (first != last) {
                char_type c = *first++;
                bool is_sep = sep(c);
                if (is_sep && (!coalesce || (coalesce && !prev))) {
                    *dest++ = term;
                    term.clear();
                }
                if (!is_sep)
                    term += c;
                prev = is_sep;
            }
            if ((!term.empty() && !coalesce) || (coalesce && !prev))
                *dest++ = term;
        }

        } // namespace Utility

        #endif // #ifndef UT_SPLIT_H_INCLUDED



        • Sharad Kala

          #5
          Re: How to read tsv file?


          "Sharad Kala" <no.spam_sharadk_ind@yahoo.com> wrote in message
          news:bvcrv5$qomb8$1@ID-221354.news.uni-berlin.de...
          [SNIP]
          > while((pos2 = str.find('\t')) != string::npos)
          > {
          > vStr.push_back(str.substr(pos1, pos2));

          oops..second parameter should be a length, i.e. pos2-pos1 i guess.



            • Jon Bell

              #7
              Re: How to read tsv file?

              In article <p1lSb.7633$uM.3791@newssvr29.news.prodigy.com>, BCC <a@b.c> wrote:
              >Hi,
              >
              >I have a tab separated value table like this:
              >header1 header2 header3
              >13.455 55.3 A string
              >4.55 5.66 Another string
              >
              >I want to load this guy into a vector of vectors,

              Use getline() to read one line at a time, then use a stringstream to split
              the line into tokens. Note that you can specify a delimiter other than
              '\n' for getline().

              std::vector<std::vector<std::string> > m_data_vec;
              std::string line;
              while (std::getline(infile, line))
              {
                  std::istringstream linestream(line);
                  std::string token;
                  std::vector<std::string> row;
                  while (std::getline(linestream, token, '\t'))
                  {
                      row.push_back(token);
                  }
                  m_data_vec.push_back(row);
              }
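
The rows built this way hold strings, so the numeric columns still have to be converted afterwards, as the original question anticipated. One possible sketch of that step uses a string stream per field; the name `to_double` is made up for illustration and is not part of the code above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: converts one field to double. Returns false if the
// field is empty or is not entirely a number (e.g. "A string").
bool to_double(const std::string& field, double& value)
{
    std::istringstream iss(field);
    iss >> value;
    if (iss.fail())
        return false;
    // Reject trailing non-whitespace such as the "abc" in "4.55abc".
    char extra;
    return !(iss >> extra);
}
```

With the sample table, this would be applied to the first two elements of each row after the header row has been skipped.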

              Actually, your example is easy to parse without a stringstream, if you
              use a struct to represent a line, with appropriate member data types:

              struct data_rec
              {
                  double foo, bar;
                  std::string baz;
              };

              std::vector<data_rec> m_data_vec;
              data_rec linedata;
              while ((infile >> linedata.foo >> linedata.bar)
                     && std::getline(infile, linedata.baz))
              {
                  m_data_vec.push_back(linedata);
              }

              --
              Jon Bell <jtbellm4h@presby.edu> Presbyterian College
              Dept. of Physics and Computer Science Clinton, South Carolina USA


              • Chris Theis

                #8
                Re: How to read tsv file?


                "Sharad Kala" <no.spam_sharadk_ind@yahoo.com> wrote in message
                news:bvct2d$r15ue$1@ID-221354.news.uni-berlin.de...
                [SNIP]
                > > vStr.push_back(str.substr(pos1, pos2));
                [SNIP]

                There is an even easier way to obtain the vStr vector using stringstreams:


                template <class T>
                std::vector<T> StringToVector( const std::string& Str )
                {
                    std::istringstream iss( Str );
                    return std::vector<T>( std::istream_iterator<T>(iss),
                                           std::istream_iterator<T>() );
                }

                [OT]
                Using VC++ 6.0 this solution has to be altered a little bit using copy and a
                back_inserter 'cause the appropriate ctor of vector is not yet available in
                that compiler version.
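
For reference, the copy/back_inserter variant mentioned here could be sketched as follows. The name `StringToVectorCopy` is made up to keep it apart from the function above, and the sketch has not been checked against VC++ 6.0 itself.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Same idea as StringToVector, but the vector is filled with std::copy
// instead of the iterator-range constructor. Tokens are split on whitespace.
template <class T>
std::vector<T> StringToVectorCopy(const std::string& Str)
{
    std::istringstream iss(Str);
    std::vector<T> result;
    std::copy(std::istream_iterator<T>(iss),
              std::istream_iterator<T>(),
              std::back_inserter(result));
    return result;
}
```

Calling `StringToVectorCopy<double>` on a line that holds only numbers converts the fields in the same step.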

                Regards
                Chris



                • Sharad Kala

                  #9
                  Re: How to read tsv file?


                  "Chris Theis" <Christian.Theis@nospam.cern.ch> wrote in message
                  news:bvd698$1k9$1@sunnews.cern.ch...
                  [SNIP]
                  > There is even an easier way to obtain the vStr vector using stringstreams:
                  >
                  >
                  > template <class T>
                  > std::vector<T> StringToVector( const std::string& Str )
                  > {
                  > std::istringstream iss( Str );
                  > return std::vector<T>( std::istream_iterator<T>(iss),
                  > std::istream_iterator<T>() );
                  > }

                  How do you take care of the '\t' in the string?




                  • David Harmon

                    #10
                    Re: How to read tsv file?

                    On Fri, 30 Jan 2004 16:20:30 +0530 in comp.lang.c++, "Sharad Kala"
                    <no.spam_sharadk_ind@yahoo.com> was alleged to have written:
                    >> template <class T>
                    >> std::vector<T> StringToVector( const std::string& Str )
                    >> {
                    >> std::istringstream iss( Str );
                    >> return std::vector<T>( std::istream_iterator<T>(iss),
                    >> std::istream_iterator<T>() );
                    >> }
                    >
                    >How do you take care of the '\t' in the string?

                    istream_iterator<T> uses T's operator>> which in turn recognizes any
                    kind of whitespace as a delimiter.
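
That behaviour matters for the sample data, because the third column contains spaces. The sketch below contrasts the two ways of splitting for a line such as `13.455<TAB>55.3<TAB>A string`: extraction with operator>> produces four tokens ("A" and "string" become separate), while getline with '\t' as the delimiter produces three. Both helper names are made up for illustration.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Splits on any whitespace, as istream_iterator<std::string> does.
std::vector<std::string> split_whitespace(const std::string& line)
{
    std::istringstream iss(line);
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}

// Splits on tabs only, so spaces inside a field are preserved.
std::vector<std::string> split_tabs(const std::string& line)
{
    std::istringstream iss(line);
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(iss, field, '\t'))
        fields.push_back(field);
    return fields;
}
```

So the istream_iterator version is a good fit when no field contains a space, and the tab-delimited getline is the safer choice otherwise.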


                    • Chris Theis

                      #11
                      Re: How to read tsv file?


                      "Sharad Kala" <no.spam_sharadk_ind@yahoo.com> wrote in message
                      news:bvdcko$r7lmj$1@ID-221354.news.uni-berlin.de...
                      [SNIP]
                      > > There is even an easier way to obtain the vStr vector using stringstreams:
                      > >
                      > >
                      > > template <class T>
                      > > std::vector<T> StringToVector( const std::string& Str )
                      > > {
                      > > std::istringstream iss( Str );
                      > > return std::vector<T>( std::istream_iterator<T>(iss),
                      > > std::istream_iterator<T>() );
                      > > }
                      >
                      > How do you take care of the '\t' in the string?

                      This should be done by the istream_iterators (at least in the Dinkumware
                      implementation used under VC++). However, I have not yet tried it under
                      another compiler like g++.

                      Cheers
                      Chris

