How to Parse a CSV formatted text file

  • Mark McIntyre

    #16
    Re: How to Parse a CSV formatted text file

    On Sun, 08 Feb 2004 14:43:05 GMT, in comp.lang.c, Joe Wright
    <joewwright@earthlink.net> wrote:

    >Mark McIntyre wrote:
    >>
    >>
    >> yes, you need to handle that sort of stuff yourself. Personally I'd
    >> use strtok on this sort of data, since embedded commas should not
    >> exist. Consider the 1st line a special case.
    >>
    >I don't know of a 'Standard' defining .csv but this is normal output
    >from Visual FoxPro..

    snip example w/ embedded commas.

    Interesting, but the OP's data was employee numbers, phone numbers and
    ward numbers. I find Occam's Razor to be efficient in such cases.
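
A minimal sketch of the strtok approach for data like the OP's, written in C++ with std::strtok. The helper names split_simple and unquote are made up for illustration. It assumes no field contains a comma, quote, or newline, and that no field is empty, since strtok treats a run of delimiters as a single delimiter.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split one line on commas with std::strtok. Assumes no field contains a
// comma, quote, or newline, and that no field is empty (strtok merges
// adjacent delimiters, so "a,,b" yields two tokens, not three).
std::vector<std::string> split_simple(const std::string& line) {
    std::vector<char> buf(line.begin(), line.end());  // strtok modifies its input
    buf.push_back('\0');
    std::vector<std::string> fields;
    for (char* tok = std::strtok(buf.data(), ",\r\n"); tok != nullptr;
         tok = std::strtok(nullptr, ",\r\n")) {
        fields.emplace_back(tok);
    }
    return fields;
}

// Strip one pair of surrounding double quotes, as needed for the header
// line, which is the "special case" here.
std::string unquote(const std::string& s) {
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"')
        return s.substr(1, s.size() - 2);
    return s;
}
```

Calling split_simple on the header line and passing each token through unquote gives the column names; the data lines need only split_simple. If empty fields can occur, strtok is the wrong tool and a character-by-character scan is safer.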

    --
    Mark McIntyre
    CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
    CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>




    • David Harmon

      #17
      Re: How to Parse a CSV formatted text file

      On Sun, 08 Feb 2004 14:43:05 GMT in comp.lang.c++, Joe Wright
      <joewwright@earthlink.net> was alleged to have written:
      >I don't know of a 'Standard' defining .csv but this is normal output
      >from Visual FoxPro..
      >
      >first,last
      >"Mac "The Knife" Peter","Boswell, Jr."
      >
      >But strangely, Excel reads it back wrong.

      Excel expects quotes within the field to be doubled. In fact, I would
      go so far as to say FoxPro is wrong.

      More lenient parsing would recognize a quote not followed by a comma or
      newline as contained within the field. This creates some ambiguities,
      since quoted fields can also contain newline.

      There is no standard, but see http://www.wotsit.org/download.asp?f=csv
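
For what it's worth, here is a rough sketch of a reader for the doubled-quote convention described above. The name parse_record is hypothetical, and the sketch assumes the whole record (including any newlines inside quoted fields) has already been read into one string. A quote that appears in the middle of an unquoted field is kept as ordinary data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Parse one CSV record in which a quoted field escapes an embedded quote by
// doubling it. Assumes `rec` holds the complete record.
std::vector<std::string> parse_record(const std::string& rec) {
    std::vector<std::string> fields;
    std::string cur;
    bool in_quotes = false;
    for (std::size_t i = 0; i < rec.size(); ++i) {
        char c = rec[i];
        if (in_quotes) {
            if (c != '"') {
                cur += c;                    // commas and newlines are data here
            } else if (i + 1 < rec.size() && rec[i + 1] == '"') {
                cur += '"';                  // "" inside quotes is one literal quote
                ++i;
            } else {
                in_quotes = false;           // closing quote
            }
        } else if (c == '"' && cur.empty()) {
            in_quotes = true;                // opening quote at start of field
        } else if (c == ',') {
            fields.push_back(cur);           // field separator
            cur.clear();
        } else {
            cur += c;
        }
    }
    fields.push_back(cur);                   // last field has no trailing comma
    return fields;
}
```

Fed the properly doubled form of the FoxPro line, "Mac ""The Knife"" Peter","Boswell, Jr.", this yields two fields with the inner quotes and the comma intact. Fed the undoubled FoxPro output, it treats the second quote as the end of the quoted part and the field comes back mangled, much as reported for Excel.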


      • Gordon Burditt

        #18
        Re: How to Parse a CSV formatted text file

        > I have a text file which have data in CSV format.

        What *IS* CSV format? The following "definition by example"
        isn't very complete.

        >"empno","phone number","wardnumber"
        >12345,2234353,1000202
        >12326,2243653,1000098

        Your examples do not handle the corner cases where a string
        contains commas, quotes, and/or newlines. If your definition
        introduces an "escape" character, also worry about strings
        consisting of several of those characters. Also, can single
        quotes be used in place of double quotes? Can a single quote
        match a double quote or vice versa?

        Also it isn't explained what isn't a valid CSV format. How
        about these:

        ,,,,,,,,,,,,,,,,,,,,
        ,""""""""""""""""""""""""""",
        ,"""""""""""""""""""""""""""",
        ,""""""""""""""""""""""""""""",
        "\\\\\\\\\\\\\\\\\"
        "\\\\\\\\\\\\\\\\\\"
        "\\\\\\\\\\\\\\\\\\\"
        """""""""""""""""""""""
        """"""""""""""""""""""""
        """""""""""""""""""""""""
        """"""""""""""""""""""""""

        Gordon L. Burditt
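
To make the "what isn't valid" question concrete, here is a sketch of a checker for one assumed rule set: a field is either unquoted and contains no double quote, or is wrapped in double quotes with every embedded double quote doubled. There is no escape character and no single-quote support. The name is_strict_csv and the rules themselves are assumptions for illustration, not a definition of CSV.

```cpp
#include <cstddef>
#include <string>

// Return true if `rec` is a well-formed record under the assumed strict rules:
// unquoted fields contain no '"'; quoted fields double every embedded '"'.
bool is_strict_csv(const std::string& rec) {
    std::size_t i = 0, n = rec.size();
    for (;;) {
        if (i < n && rec[i] == '"') {            // quoted field
            ++i;
            for (;;) {
                if (i >= n) return false;        // ran out before a closing quote
                if (rec[i] != '"') { ++i; continue; }
                if (i + 1 < n && rec[i + 1] == '"') { i += 2; continue; }
                ++i;                             // closing quote
                break;
            }
        } else {                                 // unquoted field
            while (i < n && rec[i] != ',') {
                if (rec[i] == '"') return false; // stray quote
                ++i;
            }
        }
        if (i == n) return true;                 // record ended right after a field
        if (rec[i] != ',') return false;         // junk after a closing quote
        ++i;                                     // skip the comma, check next field
    }
}
```

Under these rules a field made up of nothing but double quotes is valid only when the number of quotes is even, and a backslash is ordinary data. Switch to a backslash-escape convention and the answers for the backslash lines change with the parity of the backslashes, so the same input can be valid under one definition and invalid under another.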


        • Dietmar Kuehl

          #19
          Re: How to Parse a CSV formatted text file

          Since this is mostly about C++ I removed crossposting to non-C++ groups.

          bartek <bartekd@qwertyuiop.o2.pl> wrote:
          > ram_laxman@india.com (Ram Laxman) wrote in
          > news:24812e22.0402070939.27b82bba@posting.google.com:
          > > I am a beginner of C/C++ programming.

          > Check out the amazing Spirit framework.

          It took me something like half a day to make any sense at all out of the
          indeed amazing Spirit framework - and I personally don't consider myself
          really a beginner of C/C++ programming... Of course, we might consider
          providing a full-fledged solution using Spirit as this would certainly
          disqualify as a potential solution to a homework assignment.
          --
          <mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
          Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>


          • Dietmar Kuehl

            #20
            Re: How to Parse a CSV formatted text file

            Joe Wright <joewwright@earthlink.net> wrote:
            > I don't know of a 'Standard' defining .csv but this is normal output
            > from Visual FoxPro..
            >
            > first,last
            > "Mac "The Knife" Peter","Boswell, Jr."
            >
            > But strangely, Excel reads it back wrong. Go figure.
            > "Failure is not an option. With M$ it is bundled with every package."

            So, you are saying this is not at all a homework assignment but rather a
            request from a Microsoft engineer asking for correct code dealing with
            their files?
            --
            <mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
            Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>


            • Joe Wright

              #21
              Re: How to Parse a CSV formatted text file

              Dietmar Kuehl wrote:
              >
              > Joe Wright <joewwright@earthlink.net> wrote:
              > > I don't know of a 'Standard' defining .csv but this is normal output
              > > from Visual FoxPro..
              > >
              > > first,last
              > > "Mac "The Knife" Peter","Boswell, Jr."
              > >
              > > But strangely, Excel reads it back wrong. Go figure.
              > > "Failure is not an option. With M$ it is bundled with every package."
              >
              > So, you are saying this is not at all a homework assignment but rather a
              > request from a Microsoft engineer asking for correct code dealing with
              > their files?

              I'm not sure I follow you. It's certainly not my homework assignment.
              --
              Joe Wright http://www.jw-wright.com
              "Everything should be made as simple as possible, but not simpler."
              --- Albert Einstein ---


              • Phlip

                #22
                Re: How to Parse a CSV formatted text file

                Joe Wright wrote:

                > I'm not sure I follow you. It's certainly not my homework assignment.

                The question is not whether it's homework (or whether a very similar
                question arrived within a few hours).

                The question is whether the group is asked to do someone's learning for
                them.

                --
                Phlip




                • Programmer Dude

                  #23
                  Re: How to Parse a CSV formatted text file

                  David Harmon wrote:

                  >> "Mac "The Knife" Peter","Boswell, Jr."
                  >>
                  >> But strangely, Excel reads it back wrong.
                  >
                  > Excel expects quotes within the field to be doubled. In fact, I
                  > would go so far as to say FoxPro is wrong.

                  I would agree FoxPro is wrong. It appears to require (based on the
                  spec presented upthread) seeing the two-character sequence (",) or
                  the two-character sequence ("<newline>). That is, if the field
                  started with the (") character.

                  I might look for the three-character sequence (",") (or "<newline>),
                  but I still think this is a broken spec. Without being able to
                  escape the double-quote, you simply can't guarantee that there isn't
                  a valid delimiter sequence in the string.

                  Also, this spec requires control of the CSV *emitter* (which, to me,
                  lacks robustness). The spec requires the writer of the values be
                  sure to not include spaces--in this case, between the final quote
                  and the comma. I'd rather a CSV reader that can handle:

                  " " , foobar , 42 , "Hello, World!" ,, , "Jonas ""J"" Jamison",

                  Without worrying about padding spaces around the commas.
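
A rough sketch of such a reader, assuming the doubled-quote convention and treating only spaces and tabs as padding. The name parse_padded is hypothetical. Whitespace inside quotes is kept; whitespace around a field is dropped; anything between a closing quote and the next comma is silently discarded, which a stricter reader might prefer to reject.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a record on commas, ignoring spaces and tabs around each field.
// Quoted fields keep their inner whitespace and use "" for a literal quote.
std::vector<std::string> parse_padded(const std::string& rec) {
    std::vector<std::string> fields;
    std::size_t i = 0, n = rec.size();
    auto skip_blanks = [&] {
        while (i < n && (rec[i] == ' ' || rec[i] == '\t')) ++i;
    };
    for (;;) {
        std::string cur;
        skip_blanks();                           // leading padding
        if (i < n && rec[i] == '"') {
            ++i;
            while (i < n) {
                if (rec[i] != '"') { cur += rec[i++]; continue; }
                if (i + 1 < n && rec[i + 1] == '"') { cur += '"'; i += 2; continue; }
                ++i;                             // closing quote
                break;
            }
            while (i < n && rec[i] != ',') ++i;  // skip padding up to the comma
        } else {
            while (i < n && rec[i] != ',') cur += rec[i++];
            while (!cur.empty() && (cur.back() == ' ' || cur.back() == '\t'))
                cur.pop_back();                  // trailing padding
        }
        fields.push_back(cur);
        if (i >= n) break;
        ++i;                                     // step over the comma
    }
    return fields;
}
```

On the sample line above this produces eight fields: a single space, foobar, 42, Hello, World!, two empty fields, Jonas "J" Jamison, and a final empty field for the trailing comma.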

                  What's maybe more an issue is how quotes are escaped. One standard
                  (used by MS and others) is doubling the quote. The other common one
                  uses an "escape char", such as the backslash. A *really* good CSV
                  parser should, IMO, detect both *AND* allow for single-quoting as
                  well as double-quoting.
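
One way that might look, as a sketch only: accept either quote character as the field wrapper, require the closing quote to match the opening one, and inside a quoted field accept both a doubled quote and a backslash escape. The name parse_flexible is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a record whose fields may be wrapped in either ' or ". Inside a
// quoted field the quote character can be escaped by doubling it or by a
// preceding backslash. A field closes only on the quote that opened it.
std::vector<std::string> parse_flexible(const std::string& rec) {
    std::vector<std::string> fields;
    std::string cur;
    char quote = '\0';                           // '\0' means "not inside quotes"
    for (std::size_t i = 0; i < rec.size(); ++i) {
        char c = rec[i];
        bool has_next = i + 1 < rec.size();
        if (quote != '\0') {
            if (c == '\\' && has_next) {
                cur += rec[++i];                 // backslash: take next char literally
            } else if (c == quote && has_next && rec[i + 1] == quote) {
                cur += quote;                    // doubled quote
                ++i;
            } else if (c == quote) {
                quote = '\0';                    // closing quote
            } else {
                cur += c;                        // includes the other quote character
            }
        } else if ((c == '"' || c == '\'') && cur.empty()) {
            quote = c;                           // remember which quote opened the field
        } else if (c == ',') {
            fields.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    fields.push_back(cur);
    return fields;
}
```

Accepting both escape styles at once is a guess rather than a detection: a field written under the doubling convention that legitimately ends in a backslash, such as "C:\", is misread as continuing past its closing quote. A more careful reader would probably pick one convention per file, or let the caller choose.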

                  --
                  |_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
                  |_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
                  |_____________________________________________|_______________________|
