How to Parse a CSV formatted text file

  • Ram Laxman

    How to Parse a CSV formatted text file

    Hi all,
    I have a text file which has data in CSV format:
    "empno","phonenumber","wardnumber"
    12345,2234353,1000202
    12326,2243653,1000098
    I am a beginner in C/C++ programming.
    I don't know how to tokenize the comma-separated values. I used the strtok
    function, reading line by line with fgets, but it gives some weird
    behavior: it does not strip out the "" fully. Could anybody post sample
    code for this, so that I can use it as a reference?

    Ram Laxman



  • Phlip

    #2
    Re: How to Parse a CSV formatted text file

    Ram Laxman wrote:

    > I have a text file which has data in CSV format:
    > "empno","phonenumber","wardnumber"
    > 12345,2234353,1000202
    > 12326,2243653,1000098
    > I am a beginner in C/C++ programming.
    > I don't know how to tokenize the comma-separated values. I used the strtok
    > function, reading line by line with fgets, but it gives some weird
    > behavior: it does not strip out the "" fully. Could anybody post sample
    > code for this, so that I can use it as a reference?

    Parsing is tricky. Consider these rules:

    - \n is absolute: a newline always ends a record, so every record must sit on one unbroken line
    - "" takes precedence over , - so commas inside quoted strings are text, not delimiters
    - quotes inside "" need an escape, either \" or ""
    - escapes need escapes - \\ is \
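Here is a minimal sketch of how those rules might look in code. It works on a single line that has already been read, so the first rule is left to the caller (for example std::getline). The function name `split_line` is made up for illustration. Treating any other backslash pair as the bare character is a simplifying assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> strings_t;

// Hypothetical helper: split one already-read line into fields.
// Rule 1 (a newline ends the record) is assumed to be handled by the caller.
strings_t split_line(std::string const & line)
{
    strings_t fields;
    std::string field;
    bool in_quotes = false;

    for (std::string::size_type i = 0; i < line.size(); ++i)
    {
        char c = line[i];
        if (in_quotes)
        {
            if (c == '\\' && i + 1 < line.size())
                field += line[++i];   // backslash escape: keep the next char (\" and \\, rules 3 and 4)
            else if (c == '"' && i + 1 < line.size() && line[i + 1] == '"')
                field += line[++i];   // doubled quote "" stands for one quote (rule 3)
            else if (c == '"')
                in_quotes = false;    // closing quote
            else
                field += c;           // commas in here are text (rule 2)
        }
        else if (c == '"')
            in_quotes = true;         // opening quote
        else if (c == ',')
        {
            fields.push_back(field);  // a bare comma ends the field
            field.clear();
        }
        else
            field += c;
    }
    fields.push_back(field);          // the last field has no trailing comma
    return fields;
}
```

Each rule maps to one branch. That makes it easy to add a test per rule, in the one-test-at-a-time style described below.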

    Try this project to learn more:



    First, we express those rules (one by one) as test cases:

    TEST_(TestCase, pullNextToken_comma)
    {
        Source aSource("a , b\nc, \n d");

        string
        token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("a", token);
        token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("b", token);
        token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("c", token);
        token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("d", token);
        token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("", token);
        // EOF!
    }

    struct
    TestTokens: TestCase
    {
        void
        test_a_b_d(string input)
        {
            Source aSource(input);
            string
            token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("a", token);
            token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("b", token);
            // token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("c", token);
            token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("d", token);
            token = aSource.pullNextToken(); CPPUNIT_ASSERT_EQUAL("", token);
            // EOF!
        }
    };

    TEST_(TestTokens, elideComments)
    {
        test_a_b_d("a b\n //c\n d");
        test_a_b_d("a b\n//c \n d");
        test_a_b_d("a b\n // c \"neither\" \n d");
        test_a_b_d("a b\n // c \"neither\" \n d//");
        test_a_b_d("//\na b\n // c \"neither\" \n d//");
        test_a_b_d("//c\na b\n // c \"neither\" \n d//");
        test_a_b_d("// c\na b\n // c \"neither\" \n d//");
        test_a_b_d("//c \na b\n // c \"neither\" \n d//");
        test_a_b_d("// \na b\n // c \"neither\" \n d//");
        test_a_b_d(" // \na b\n // c \"neither\" \n d//");
    }

    TEST_(TestTokens, elideStreamComments)
    {
        test_a_b_d("a b\n /*c*/\n d");
        test_a_b_d("a b\n/*c*/ \n d");
        test_a_b_d("a b\n /* c \"neither\" */\n d");
        test_a_b_d("a b\n /* c \"neither\" \n */ d//");
        test_a_b_d("//\na b\n /* c \"neither\" */ \n d/**/");
        test_a_b_d("//c\na b\n // c \"neither\" \n d/* */");
        test_a_b_d("/* c\n*/a b\n // c \"neither\" \n d//");
        test_a_b_d("//c \na b\n // c \"neither\" \n d//");
        test_a_b_d("// \na b\n // c \"neither\" \n d//");
        test_a_b_d(" // \na b\n // c \"neither\" \n d//");
    }

    Those tests reuse the fixture test_a_b_d() to check that every one of
    those strings parses into a, b, and d, skipping c (which always sits inside a comment).

    You will need tests that show slightly different behaviors. But write your
    tests one at a time. I wrote every single line you see here, essentially in
    order, and got it to work before adding the next line. Don't write all your
    tests at once: when programming, you should never go more than 1 to 10
    edits without passing all tests.

    Now here's the source of Source (which means "source of tokens"):

    #include <cctype>
    #include <cstdlib>
    #include <cstring>
    #include <string>

    // assumed declarations (not shown in the original post)
    using std::string;
    typedef string::size_type size_type;

    class
    Source
    {
    public:
        Source(string const & rc = ""):
            m_rc(rc),
            m_bot(0),
            m_eot(0)
        {}

        void setResource(string const & rc) { m_rc = rc; }
        size_type getBOT() { return m_bot; }
        string const & getPriorToken() { return m_priorToken; }
        string const & getCurrentToken() { return m_currentToken; }

        string const &
        pullNextToken()
        {
            m_priorToken = m_currentToken;
            extractNextToken();
            return m_currentToken;
        }

        size_type
        getLineNumber(size_type at)
        {
            size_type lineNumber = 1;

            for(size_type idx(0); idx < at; ++idx)
                if ('\n' == m_rc[idx])
                    ++lineNumber;

            return lineNumber;
        }

        string
        getLine(size_type at)
        {
            size_type bol = m_rc.rfind('\n', at);
            if (string::npos == bol) bol = 0; else ++bol;
            size_type eol = m_rc.find('\n', at);
            if (string::npos == eol) eol = m_rc.length(); else ++eol;
            return m_rc.substr(bol, eol - bol);
        }

    private:

        string const &
        extractNextToken()
        {
            char static const delims[] = " \t\n,";

            m_bot = m_rc.find_first_not_of(delims, m_eot);

            if (string::npos == m_bot)
                m_currentToken = "";
            else if (m_rc[m_bot] == '"')
                m_currentToken = parseString();
            else if (m_rc.substr(m_bot, 2) == "//")
            {
                if (skipUntil("\n"))
                    return extractNextToken();
            }
            else if (m_rc.substr(m_bot, 2) == "/*")
            {
                if (skipUntil("*/"))
                    return extractNextToken();
            }
            /* else if (m_rc.substr(m_bot, 1) == "#")
            {
                string line = getLine(m_bot);
                size_type at(0);
                while(isspace(line[at]) && at < line.size()) ++at;

                if ('#' == line[at])
                {
                    m_eot = m_bot + 1;
                    if (skipUntil("\n"))
                        return extractNextToken();
                }
            }*/
            else
            {
                m_eot = m_rc.find_first_of(" \n,/", m_bot);
                m_currentToken = m_rc.substr(m_bot, m_eot - m_bot);
            }

            if (!m_currentToken.empty() && '#' == m_currentToken[0])
            {
                // assert(m_rc.substr(m_bot, 1) == "#");
                string line = getLine(m_bot);
                size_type at(0);
                // check the bound before reading line[at]
                while(at < line.size() && isspace(static_cast<unsigned char>(line[at]))) ++at;

                if (at < line.size() && '#' == line[at])
                {
                    --m_eot;
                    if (skipUntil("\n"))
                        return extractNextToken();
                }
            }
            return m_currentToken;
        }

        bool
        skipUntil(char const * delimiter)
        {
            m_eot = m_rc.find(delimiter, m_eot + 1);

            if (string::npos == m_eot)
            {
                m_currentToken = "";
                return false;
            }
            m_eot += strlen(delimiter);
            return true;
        }

        char
        parseStringChar()
        {
            if (m_rc[m_eot] == '\\')
            {
                m_eot += 1;
                char escapee(m_rc[m_eot++]);

                switch (escapee)
                {
                case 'n' : return '\n';
                case 'r' : return '\r';
                case 't' : return '\t';
                case '0' : return '\0';
                case '\\': return '\\';
                case 'a' : return '\a';
                default : // TODO \x, \v \b, \f
                    if (isdigit(static_cast<unsigned char>(escapee)))
                    {
                        // octal escape such as \101: up to 3 digits
                        string slug = m_rc.substr(m_eot - 1, 3);
                        char * end = NULL;
                        long value = strtol(slug.c_str(), &end, 8);
                        size_type used = end - slug.c_str();
                        if (used > 0)
                        {
                            m_eot += used - 1; // skip the digits after the first
                            return char(value);
                        }
                    }
                    //assert(false);
                    return escapee;
                }
            }
            else if (m_rc[m_eot] == '"' && m_rc[m_eot+1] == '"')
                m_eot++;

            return m_rc[m_eot++];
        }

        string
        parseString()
        {
            m_eot = m_bot + 1;
            string z;

            while ( m_eot < m_rc.length() &&
                    ( m_rc[m_eot] != '"' ||
                      m_rc[m_eot + 1] == '"' ) )
                z += parseStringChar();

            if (m_eot < m_rc.length())
                m_eot += 1;

            return z;
        }

        string m_rc;
        size_type m_bot;
        size_type m_eot;
        string m_priorToken;
        string m_currentToken;
    };

    That looks really ugly and long, because it hides so much behind such a
    narrow interface. (I don't know if I copied all of it in, either.) But it
    demonstrates (possibly) correct usage of std::string; the starter below
    adds std::vector.

    Do not copy my source into your editor and try to run it. It will not parse
    CSV. Start your project like this:

    #include <assert.h>
    #include <string>
    #include <vector>
    typedef std::vector<std::string> strings_t;

    strings_t parse(std::string input)
    {
        strings_t result;
        return result;
    }

    int main()
    {
        assert("a" == parse("a,b")[0]);
    }

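    If that compiles, it will almost certainly crash when you run it: parse()
    returns an empty vector, and reading element [0] of it is undefined behavior.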

    Now fix parse() so that it _only_ does not crash, and passes this test. Make
    the implementation as stupid as you like.
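For instance, one deliberately stupid version is enough to make that first assertion pass. This is a sketch: it takes the parameter as `std::string const &` instead of by value and simply returns everything before the first comma.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> strings_t;

// Deliberately naive first step: return only the text before the first
// comma (or the whole input if there is no comma). It passes
// assert("a" == parse("a,b")[0]) and nothing more.
strings_t parse(std::string const & input)
{
    strings_t result;
    result.push_back(input.substr(0, input.find(',')));
    return result;
}
```

It fails as soon as a test asks for parse("a,b")[1], which is exactly what drives the next step.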

    Then add a test:

    assert("a" == parse("a,b")[0]);
    assert("b" == parse("a,b")[1]);

    Keep going. Make the implementation just a little better after each test.
    Write a set of tests for each of the parsing rules I listed. When the new
    parse() function is full-featured, put it to work in your program.
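As an illustration, the next increment after the second test might split at every comma. This is only a sketch of one possible step. It still knows nothing about quotes, so each quoting rule listed earlier should first arrive as a new failing test.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> strings_t;

// Second increment: split at every comma, keeping empty fields.
// Quotes are not handled yet; new tests should drive that in.
strings_t parse(std::string const & input)
{
    strings_t result;
    std::string::size_type start = 0;
    for (;;)
    {
        std::string::size_type comma = input.find(',', start);
        if (comma == std::string::npos)
        {
            result.push_back(input.substr(start));   // last field
            return result;
        }
        result.push_back(input.substr(start, comma - start));
        start = comma + 1;                           // skip the comma
    }
}
```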

    All programs should be written by generating long lists of simple tests like
    this. That keeps the bug count very low, and prevents wasting hours and
    hours with a debugger.

    --
    Phlip




    • Willem

      #3
      Re: How to Parse a CSV formatted text file

      Ram wrote:
      ) Hi all,
      ) I have a text file which has data in CSV format:
      ) "empno","phonenumber","wardnumber"
      ) 12345,2234353,1000202
      ) 12326,2243653,1000098
      ) I am a beginner in C/C++ programming.
      ) I don't know how to tokenize the comma-separated values. I used the strtok
      ) function, reading line by line with fgets, but it gives some weird
      ) behavior: it does not strip out the "" fully. Could anybody post sample
      ) code for this, so that I can use it as a reference?

      Here's a tip: Look for a library that scans CSV files.

      And if you really want to do it yourself, you really don't want to be using
      stuff like strtok. Assuming you have one complete line in memory, you're
      better off searching for the commas (and quotes) yourself, that's really
      not so hard. Just put NULs where the commas are, and point to the
      beginning of the strings (just after the comma). You can then pass these
      pointers as strings to another parsing routine that turns stuff without
      quotes into integers, and stuff with quotes into strings or whatever.
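Here is a rough C++ sketch of that approach. It assumes the whole line is already in a writable, NUL-terminated char buffer. The names `split_in_place`, `unquote` and `to_number` are hypothetical. A double quote only toggles an "inside a string" flag, so a comma between quotes is not treated as a separator. Doubled quotes inside a string are not handled.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Stage 1: overwrite each separating comma with a NUL and remember where
// each field starts. A comma between double quotes is left alone.
std::vector<char *> split_in_place(char * line)
{
    std::vector<char *> fields;
    fields.push_back(line);                 // first field starts the line
    bool in_quotes = false;
    for (char * p = line; *p != '\0'; ++p)
    {
        if (*p == '"')
            in_quotes = !in_quotes;
        else if (*p == ',' && !in_quotes)
        {
            *p = '\0';                      // terminate the current field
            fields.push_back(p + 1);        // next field starts after it
        }
    }
    return fields;
}

// Stage 2a: a quoted field becomes its text without the surrounding quotes.
std::string unquote(const char * field)
{
    std::string s(field);
    if (s.size() >= 2 && s[0] == '"' && s[s.size() - 1] == '"')
        return s.substr(1, s.size() - 2);
    return s;
}

// Stage 2b: an unquoted field is converted to an integer.
long to_number(const char * field)
{
    return std::strtol(field, 0, 10);
}
```

The returned pointers point into the caller's buffer, so they stay valid only as long as that buffer does.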


      SaSW, Willem
      --
      Disclaimer: I am in no way responsible for any of the statements
      made in the above text. For all I know I might be
      drugged or something..
      No I'm not paranoid. You all think I'm paranoid, don't you !
      #EOT


      • Phlip

        #4
        Re: How to Parse a CSV formatted text file

        Willem wrote:

        > Ram wrote:
        > ) Hi all,
        > ) I have a text file which has data in CSV format:
        > ) "empno","phonenumber","wardnumber"
        > ) 12345,2234353,1000202
        > ) 12326,2243653,1000098
        > ) I am a beginner in C/C++ programming.
        > ) I don't know how to tokenize the comma-separated values. I used the strtok
        > ) function, reading line by line with fgets, but it gives some weird
        > ) behavior: it does not strip out the "" fully. Could anybody post sample
        > ) code for this, so that I can use it as a reference?
        >
        > Here's a tip: Look for a library that scans CSV files.

        Hi Willem! Welcome to the first hard projects of this semester. So far, a
        professor somewhere has assumed their class was reading the right chapters
        in their tutorial, and has hit them with the first non-Hello World project.

        Someone just posted the same question to news:comp.programming.

        --
        Phlip




        • Mike Wahler

          #5
          Re: How to Parse a CSV formatted text file


          "Ram Laxman" <ram_laxman@india.com> wrote in message
          news:24812e22.0402070939.27b82bba@posting.google.com...
          > Hi all,
          > I have a text file which has data in CSV format:
          > "empno","phonenumber","wardnumber"
          > 12345,2234353,1000202
          > 12326,2243653,1000098
          > I am a beginner in C/C++ programming.
          > I don't know how to tokenize the comma-separated values. I used the strtok
          > function, reading line by line with fgets, but it gives some weird
          > behavior: it does not strip out the "" fully. Could anybody post sample
          > code for this, so that I can use it as a reference?
          >
          > Ram Laxman

          #include <cstdlib>
          #include <fstream>
          #include <ios>
          #include <iomanip>
          #include <iostream>
          #include <sstream>
          #include <string>

          int main()
          {
              std::ifstream ifs("csv.txt");
              if(!ifs)
              {
                  std::cerr << "Cannot open input\n";
                  return EXIT_FAILURE;
              }

              const std::streamsize width(15);
              std::cout << std::left;

              std::string line;
              while(std::getline(ifs, line))
              {
                  std::string tok1;
                  std::istringstream iss(line);
                  while(std::getline(iss, tok1, ','))
                  {
                      if(tok1.find('"') != std::string::npos)
                      {
                          std::string tok2;
                          std::istringstream iss(tok1);
                          while(std::getline(iss, tok2, '"'))
                          {
                              if(!tok2.empty())
                                  std::cout << std::setw(width) << tok2;
                          }
                      }
                      else
                          std::cout << std::setw(width) << tok1;

                      std::cout << ' ';
                  }
                  std::cout << "\n";
              }

              if(!ifs && !ifs.eof())
                  std::cerr << "Error reading input\n";

              return 0;
          }

          Input file:

          "empno","phonenumber","wardnumber"
          12345,2234353,1000202
          12326,2243653,1000098



          Output:

          empno           phonenumber     wardnumber
          12345           2234353         1000202
          12326           2243653         1000098



          -Mike



          • Jon Bell

            #6
            Re: How to Parse a CSV formatted text file

            In article <mgaVb.17848$uM2.163@newsread1.news.pas.earthlink.net>,
            Mike Wahler <mkwahler@mkwahler.net> wrote:

            [code snipped]

            >Input file:
            >
            >"empno","phonenumber","wardnumber"
            >12345,2234353,1000202
            >12326,2243653,1000098

            Try changing the first line so one of the tokens contains a comma, e.g.

            "empno","phone, number","wardnumber"

            ;-)

            I started to work on a solution, too, and then I thought about embedded
            commas, and went, "uh oh..."
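To make the problem concrete, here is a sketch of the split-at-every-comma idea that Mike's program relies on (std::getline with ',' as the delimiter). It is wrapped in a hypothetical `naive_split` function and applied to Jon's modified header line.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split at every comma, as std::getline(stream, token, ',') does.
// It has no notion of quotes, so a comma inside "..." also splits.
std::vector<std::string> naive_split(std::string const & line)
{
    std::vector<std::string> tokens;
    std::istringstream iss(line);
    std::string token;
    while (std::getline(iss, token, ','))
        tokens.push_back(token);
    return tokens;
}
```

Three header fields come back as four tokens. Stripping the quotes afterwards cannot glue `"phone` and ` number"` back together.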

            --
            Jon Bell <jtbellm4h@presby.edu> Presbyterian College
            Dept. of Physics and Computer Science Clinton, South Carolina USA


            • Phlip

              #7
              Re: How to Parse a CSV formatted text file

              Mike Wahler wrote:

              > #include <cstdlib>

              Hi Mike!

              I just wanted to be the first to remind you that the FAQ advises against
              doing others' homework - fun though it may be. (Advising the newbie to throw
              in a few Design Patterns is better sport, of course...)

              --
              Phlip




              • Mike Wahler

                #8
                Re: How to Parse a CSV formatted text file


                "Phlip" <phlip_cpp@yahoo.com> wrote in message
                news:YtaVb.19462$oo6.14119@newssvr16.news.prodigy.com...
                > Mike Wahler wrote:
                >
                > > #include <cstdlib>
                >
                > Hi Mike!
                >
                > I just wanted to be the first to remind you that the FAQ advises against
                > doing others' homework - fun though it may be.

                Yes, I realize that.

                > (Advising the newbie to throw
                > in a few Design Patterns is better sport, of course...)

                I very much doubt that the code would be accepted 'as is'
                by an instructor -- unless the student can explain it --
                in which case he would have actually studied and learned... :-)
                Anyway, it seems that OP isn't quite sure whether he's learning
                C or C++.

                -Mike



                • Mike Wahler

                  #9
                  Re: How to Parse a CSV formatted text file


                  "Jon Bell" <jtbellj3p@presby.edu> wrote in message
                  news:c03c7s$mq0$1@jtbell.presby.edu...
                  > In article <mgaVb.17848$uM2.163@newsread1.news.pas.earthlink.net>,
                  > Mike Wahler <mkwahler@mkwahler.net> wrote:
                  >
                  > [code snipped]
                  >
                  > >Input file:
                  > >
                  > >"empno","phonenumber","wardnumber"
                  > >12345,2234353,1000202
                  > >12326,2243653,1000098
                  >
                  > Try changing the first line so one of the tokens contains a comma, e.g.
                  >
                  > "empno","phone, number","wardnumber"
                  >
                  > ;-)
                  >
                  > I started to work on a solution, too, and then I thought about embedded
                  > commas, and went, "uh oh..."

                  Well, yes I did think about bad input, but thought I'd leave that
                  to the OP. IOW I gave a very 'literal' answer that only addressed
                  the exact input cited by the OP. :-)


                  -Mike



                  • Jon Bell

                    #10
                    Re: How to Parse a CSV formatted text file

                    In article <vGaVb.17879$uM2.5046@newsread1.news.pas.earthlink.net>,
                    Mike Wahler <mkwahler@mkwahler.net> wrote:
                    >
                    >Well, yes I did think about bad input, but thought I'd leave that
                    >to the OP. IOW I gave a very 'literal' answer that only addressed
                    >the exact input cited by the OP. :-)

                    It would be interesting to find out if the instructor actually intended
                    the students to go whole hog and deal with embedded commas, escaped
                    quotes, etc. If it's an introductory programming course, it's quite
                    possible they don't need to worry about those details for the purposes of
                    the assignment.

                    --
                    Jon Bell <jtbellm4h@presby.edu> Presbyterian College
                    Dept. of Physics and Computer Science Clinton, South Carolina USA


                    • Derk Gwen

                      #11
                      Re: How to Parse a CSV formatted text file

                      ram_laxman@india.com (Ram Laxman) wrote:
                      # Hi all,
                      # I have a text file which has data in CSV format:
                      # "empno","phonenumber","wardnumber"
                      # 12345,2234353,1000202
                      # 12326,2243653,1000098
                      # I am a beginner in C/C++ programming.
                      # I don't know how to tokenize the comma-separated values. I used the strtok
                      # function, reading line by line with fgets, but it gives some weird
                      # behavior: it does not strip out the "" fully. Could anybody post sample
                      # code for this, so that I can use it as a reference?

                      This is probably a type 3 (regular) language, so you can probably use a
                      finite state machine (FSM). If you're just beginning, that can be an
                      intimidating bit of jargon, but FSMs are actually easy to understand, and
                      if you want to be a programmer, you have to understand them. They pop up
                      all over the place.

                      You can use #defines to abstract the FSM with something like

                      #include <stdio.h>
                      #include <stdlib.h>

                      #define FSM(name) static int name(FILE *file) {int ch=0,m=0,n=0; char *s=0;
                      #define endFSM return -1;}

                      #define state(name) name: ch = fgetc(file); e_##name: switch (ch) {
                      #define endstate } return -1;

                      #define is(character) case character:
                      #define any default:
                      #define next(name) ;goto name;
                      #define emove(name) ;goto e_##name;
                      #define final(name,value) name: e_##name: free(s); return value;

                      /* append ch to the buffer s and keep it NUL-terminated */
                      #define shift ;if (n+1>=m) {m = 2*(n+1); s = realloc(s,m);} s[n++] = ch; s[n] = 0;
                      /* the got_* callback takes ownership of s (NULL for an empty field) */
                      #define discard ;m = n = 0; s = 0;
                      #define dispose ;free(s) discard



                      static void got_empno(char *s);
                      static void got_phonenumber(char *s);
                      static void got_wardnumber(char *s);
                      static void got_csventry(void);

                      FSM(csv_parser)
                        state(empno)
                          is('"') next(quoted_empno)
                          is(EOF) next(at_end)
                          is(',') got_empno(s) discard next(phonenumber)
                          any shift next(empno)
                        endstate
                        state(quoted_empno)
                          is('"') next(empno)
                          is(EOF) next(at_end_in_string)
                          any shift next(quoted_empno)
                        endstate
                        state(phonenumber)
                          is('"') next(quoted_phonenumber)
                          is(EOF) next(at_end_in_entry)
                          is(',') got_phonenumber(s) discard next(wardnumber)
                          any shift next(phonenumber)
                        endstate
                        state(quoted_phonenumber)
                          is('"') next(phonenumber)
                          is(EOF) next(at_end_in_string)
                          any shift next(quoted_phonenumber)
                        endstate
                        state(wardnumber)
                          is('"') next(quoted_wardnumber)
                          is(EOF)
                            got_wardnumber(s); got_csventry() discard
                            next(at_end)
                          is('\n')
                            got_wardnumber(s); got_csventry() discard
                            next(empno)
                          is(',') got_wardnumber(s) discard next(unexpected_field)
                          any shift next(wardnumber)
                        endstate
                        state(quoted_wardnumber)
                          is('"') next(wardnumber)
                          is(EOF) next(at_end_in_string)
                          any shift next(quoted_wardnumber)
                        endstate
                        final(at_end,0)
                        final(at_end_in_string,1)
                        final(unexpected_field,2)
                        final(at_end_in_entry,3)
                      endFSM

                      ....
                      int rc = csv_parser(stdin);
                      // calls
                      //   got_empno(empno-string)
                      //   got_phonenumber(phonenumber-string)
                      //   got_wardnumber(wardnumber-string)
                      //   got_csventry()
                      // for each entry
                      switch (rc) {
                      case -1: fputs("parser failure\n",stderr); break;
                      case 1: fputs("end of file in a string\n",stderr); break;
                      case 2: fputs("too many fields\n",stderr); break;
                      case 3: fputs("end of file in the middle of an entry\n",stderr); break;
                      }
                      ....
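For comparison, here is a sketch of the same finite-state-machine idea in C++ without macros. It works on one line at a time. It assumes the common convention that a doubled quote ("") inside a quoted field stands for one literal quote. The `State` values and `fsm_split` are invented names, not part of Derk's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// States of a small finite-state machine for one CSV line.
enum State
{
    Unquoted,        // reading an ordinary field (or the start of one)
    Quoted,          // inside "..."
    QuoteInQuoted    // just saw a '"' while inside "..."
};

std::vector<std::string> fsm_split(std::string const & line)
{
    std::vector<std::string> fields;
    std::string field;
    State state = Unquoted;

    for (std::string::size_type i = 0; i < line.size(); ++i)
    {
        char c = line[i];
        switch (state)
        {
        case Unquoted:
            if (c == ',')      { fields.push_back(field); field.clear(); }
            else if (c == '"') state = Quoted;   // lenient: a quote anywhere opens a string
            else               field += c;
            break;
        case Quoted:
            if (c == '"')      state = QuoteInQuoted;
            else               field += c;       // commas are text here
            break;
        case QuoteInQuoted:
            if (c == '"')      { field += '"'; state = Quoted; }   // "" means one quote
            else if (c == ',') { fields.push_back(field); field.clear(); state = Unquoted; }
            else               { field += c; state = Unquoted; }   // lenient: text after closing quote
            break;
        }
    }
    fields.push_back(field);
    return fields;
}
```

Each state only decides what the next character means. That is what keeps the quote and comma rules easy to follow in an FSM.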

                      --
                      Derk Gwen http://derkgwen.250free.com/html/index.html
                      I have no idea what you just said.
                      I get that alot.


                      • Jack Klein

                        #12
                        Re: How to Parse a CSV formatted text file

                        On Sat, 07 Feb 2004 18:38:10 GMT, "Mike Wahler"
                        <mkwahler@mkwahler.net> wrote in comp.lang.c:

                        >
                        > "Ram Laxman" <ram_laxman@india.com> wrote in message
                        > news:24812e22.0402070939.27b82bba@posting.google.com...
                        > > Hi all,
                        > > I have a text file which has data in CSV format:
                        > > "empno","phonenumber","wardnumber"
                        > > 12345,2234353,1000202
                        > > 12326,2243653,1000098
                        > > I am a beginner in C/C++ programming.
                        > > I don't know how to tokenize the comma-separated values. I used the strtok
                        > > function, reading line by line with fgets, but it gives some weird
                        > > behavior: it does not strip out the "" fully. Could anybody post sample
                        > > code for this, so that I can use it as a reference?
                        > >
                        > > Ram Laxman
                        >
                        > #include <cstdlib>
                        > #include <fstream>
                        > #include <ios>
                        > #include <iomanip>
                        > #include <iostream>
                        > #include <sstream>
                        > #include <string>

                        [snip]

                        Mike, please do NOT post C++ code to messages crossposted to
                        comp.lang.c.

                        Thanks

                        --
                        Jack Klein
                        Home: http://JK-Technology.Com
                        FAQs for
                        comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
                        comp.lang.c++ http://www.parashift.com/c++-faq-lite/
                        alt.comp.lang.learn.c-c++


                        • bartek

                          #13
                          Re: How to Parse a CSV formatted text file

                          ram_laxman@india.com (Ram Laxman) wrote in
                          news:24812e22.0402070939.27b82bba@posting.google.com:

                          > Hi all,
                          > I have a text file which has data in CSV format:
                          > "empno","phonenumber","wardnumber"
                          > 12345,2234353,1000202
                          > 12326,2243653,1000098
                          > I am a beginner in C/C++ programming.
                          > I don't know how to tokenize the comma-separated values. I used the strtok
                          > function, reading line by line with fgets, but it gives some weird
                          > behavior: it does not strip out the "" fully. Could anybody post sample
                          > code for this, so that I can use it as a reference?
                          >

                          Check out the amazing Spirit parser framework.
                          It's available as part of the Boost libraries: http://www.boost.org


                          • Mark McIntyre

                            #14
                            Re: How to Parse a CSV formatted text file

                            On 7 Feb 2004 09:39:14 -0800, in comp.lang.c , ram_laxman@india.com
                            (Ram Laxman) wrote:

                            >Hi all,
                            > I have a text file which has data in CSV format:
                            >"empno","phonenumber","wardnumber"
                            >12345,2234353,1000202
                            >12326,2243653,1000098
                            >I am a beginner in C/C++ programming.
                            >I don't know how to tokenize the comma-separated values. I used the strtok
                            >function, reading line by line with fgets, but it gives some weird
                            >behavior.

                            yes, you need to handle that sort of stuff yourself. Personally I'd
                            use strtok on this sort of data, since embedded commas should not
                            exist. Consider the 1st line a special case.
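Here is a sketch of that approach. It assumes, as Mark does, that the data never contains embedded commas or quoted fields, and it skips the header line as the special case. `parse_record` and `read_records` are made-up names. Two caveats apply. First, strtok modifies its argument, so the line is copied into a writable buffer first. Second, strtok treats a run of commas as a single separator, so an empty field silently disappears.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split one data line such as "12345,2234353,1000202" into numbers.
// Assumes no quoted fields and no embedded commas.
std::vector<long> parse_record(std::string const & line)
{
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');                  // strtok needs a writable C string

    std::vector<long> values;
    for (char * tok = std::strtok(&buf[0], ",\n"); tok != 0;
         tok = std::strtok(0, ",\n"))
        values.push_back(std::strtol(tok, 0, 10));
    return values;
}

// Read a whole stream: the first line is the header, so skip it.
std::vector<std::vector<long> > read_records(std::istream & in)
{
    std::vector<std::vector<long> > records;
    std::string line;
    std::getline(in, line);               // header: "empno","phonenumber",...
    while (std::getline(in, line))
        records.push_back(parse_record(line));
    return records;
}
```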

                            --
                            Mark McIntyre
                            CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
                            CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>




                            • Joe Wright

                              #15
                              Re: How to Parse a CSV formatted text file

                              Mark McIntyre wrote:
                              >
                              > On 7 Feb 2004 09:39:14 -0800, in comp.lang.c , ram_laxman@india.com
                              > (Ram Laxman) wrote:
                              >
                              > >Hi all,
                              > > I have a text file which has data in CSV format:
                              > >"empno","phonenumber","wardnumber"
                              > >12345,2234353,1000202
                              > >12326,2243653,1000098
                              > >I am a beginner in C/C++ programming.
                              > >I don't know how to tokenize the comma-separated values. I used the strtok
                              > >function, reading line by line with fgets, but it gives some weird
                              > >behavior.
                              >
                              > yes, you need to handle that sort of stuff yourself. Personally I'd
                              > use strtok on this sort of data, since embedded commas should not
                              > exist. Consider the 1st line a special case.
                              >
                              I don't know of a 'Standard' defining .csv, but this is normal output
                              from Visual FoxPro:

                              first,last
                              "Mac "The Knife" Peter","Boswell, Jr."

                              But strangely, Excel reads it back wrong. Go figure.
                              "Failure is not an option. With M$ it is bundled with every package."

                              The format started with dBASE, I think, and goes something like this.

                              Fields are alphanumerics separated by commas. Fields of type 'Character'
                              are further delimited with '"' so that they can contain commas and '"'
                              characters themselves. The rules are something like this:

                              The first field begins with the first character on the line.
                              Fields end at a naked ',' comma or '\n' newline.
                              Delimited fields begin with '"' and end with a '"' followed by a comma or newline.
                              The delimiters are not a literal part of the field. Any comma or '"'
                              within the delimiters are literals.
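Here is one possible reading of these rules, sketched in C++. A quote only closes a delimited field when it is followed by a comma or the end of the line, so the inner quotes in the Visual FoxPro example stay literal. This is an interpretation of the rules above rather than a specification. `split_delimited` is a hypothetical name, and the line is assumed to have had its '\n' removed already (for example by std::getline).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split one line following the rules above: a field that starts with '"'
// runs until a '"' that is followed by ',' or the end of the line; any
// other field runs until the next ','.
std::vector<std::string> split_delimited(std::string const & line)
{
    std::vector<std::string> fields;
    std::string::size_type pos = 0;
    for (;;)
    {
        std::string::size_type end;
        if (pos < line.size() && line[pos] == '"')
        {
            // look for a closing quote followed by a comma or end of line
            end = pos + 1;
            for (;;)
            {
                end = line.find('"', end);
                if (end == std::string::npos || end + 1 == line.size()
                    || line[end + 1] == ',')
                    break;
                ++end;                 // an inner quote is a literal
            }
            if (end == std::string::npos)
            {   // unterminated: keep the rest, minus the opening quote
                fields.push_back(line.substr(pos + 1));
                return fields;
            }
            fields.push_back(line.substr(pos + 1, end - pos - 1));
            pos = end + 1;             // now at ',' or the end of the line
        }
        else
        {
            end = line.find(',', pos);
            fields.push_back(line.substr(pos, end == std::string::npos
                                                  ? std::string::npos
                                                  : end - pos));
            pos = end;
        }
        if (pos == std::string::npos || pos >= line.size())
            return fields;
        ++pos;                         // skip the separating comma
    }
}
```

Under this reading, a delimited field whose text itself contains a quote immediately followed by a comma would still be cut short at that point.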

                              --
                              Joe Wright http://www.jw-wright.com
                              "Everything should be made as simple as possible, but not simpler."
                              --- Albert Einstein ---
