CSV module: incorrectly parsed file.

  • Christopher Barrington-Leigh

    CSV module: incorrectly parsed file.

    Here is a file, "test.csv":
    number,name,description,value
    1,"wer","tape 2"",5
    1,vvv,"hoohaa",2

    I want to convert it to tab-separated format without those silly
    quotes. Note that in the second line one field is 'tape 2"', i.e. two
    inches: there is a double quote inside the string.

    When I use csv module to read this:


    import sys
    import csv

    outf = open(sys.argv[1] + '.tsv', 'wt')
    # Python 2: the csv module expects the input file opened in binary mode
    reader = csv.reader(open(sys.argv[1], "rb"))
    for row in reader:
        outf.write('\t'.join([rr.strip() for rr in row]) + '\n')


    It mangles the data, mishandling the doubled double quote.
    Can anyone help me? How do I use the csv module to get this right?
    Thanks!
    c
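For reference: in standard CSV, which is what the csv module's default (excel) dialect expects, a double quote inside a quoted field is written as two double quotes, and the field is then closed with one more quote, so two inches would appear as "tape 2""". In the posted file the field ends with only two quotes, so the reader takes "" as an escaped quote and keeps reading past the end of the line. The sketch below assumes Python 3 and uses io.StringIO in place of the real file; it reproduces the merge and shows that the properly escaped line converts cleanly.

```python
import csv
import io

# The data as posted: "tape 2"" has the inch mark but no closing quote,
# so the reader treats "" as an escaped quote and the field never ends.
as_posted = 'number,name,description,value\n1,"wer","tape 2"",5\n1,vvv,"hoohaa",2\n'

# The same data with standard CSV escaping: the inch mark is doubled ("")
# and the field is then closed with one more quote.
escaped = 'number,name,description,value\n1,"wer","tape 2""",5\n1,vvv,"hoohaa",2\n'


def parse(text):
    """Parse CSV text with the csv module's default (excel) dialect."""
    return list(csv.reader(io.StringIO(text)))


for row in parse(as_posted):
    print(row)  # the two data lines come back merged into a single row

print()
for row in parse(escaped):
    print('\t'.join(row))  # tab-separated, no quotes
```

So the csv module is doing what the format says; the fix has to happen in the data (or in a custom, non-standard parser).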
  • 7stud

    #2
    Re: CSV module: incorrectly parsed file.

    On Feb 17, 7:09 pm, Christopher Barrington-Leigh
    <christophe...@gmail.com> wrote:

    Try this:

    infile = open('data.txt')
    outfile = open('outfile.txt', 'w')

    for line in infile:
        pieces = line.strip().split(',')

        data = []
        for piece in pieces:
            if piece[0] == '"':
                data.append(piece[1:-2])
            else:
                data.append(piece)

        out_line = '%s\n' % '\t'.join(data)
        outfile.write(out_line)


    • 7stud

      #3
      Re: CSV module: incorrectly parsed file.

      On Feb 17, 9:11 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
      Whoops. The line:

      data.append(piece[1:-2])

      should be:

      data.append(piece[1:-1])
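With that correction applied, the split-based approach can be sketched as a function. This is an illustrative Python 3 rewrite, not 7stud's exact code: the names csv_line_to_fields and convert are hypothetical, and it uses startswith('"') so that an empty field does not raise IndexError. Because it splits on every comma, it only works when no field contains a comma.

```python
def csv_line_to_fields(line):
    """Split one line on commas and strip the surrounding quotes.

    Every comma is treated as a separator, so a quoted field that
    contains a comma will be split in the wrong place.
    """
    fields = []
    for piece in line.strip().split(','):
        if piece.startswith('"'):
            # drop the first and last character; an inner " is kept
            fields.append(piece[1:-1])
        else:
            fields.append(piece)
    return fields


def convert(in_path, out_path):
    """Write a tab-separated copy of in_path to out_path."""
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        for line in infile:
            outfile.write('\t'.join(csv_line_to_fields(line)) + '\n')
```

For example, csv_line_to_fields('1,"wer","tape 2"",5') gives ['1', 'wer', 'tape 2"', '5'], while a line such as 2,abc,"Nails 2, box",7 comes back as five pieces instead of four.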


      • Steve Holden

        #4
        Re: CSV module: incorrectly parsed file.

        7stud wrote:
        > Whoops. The line:
        >
        > data.append(piece[1:-2])
        >
        > should be:
        >
        > data.append(piece[1:-1])

        Even when you have done all this you will still have problems. As Andrew
        pointed out, the format is ambiguous, and you'd just better hope none of
        your data items look like

        Nails 2", soldiers for the use of

        because then you will be completely screwed. So the data needs a
        certain amount of visual scrutiny: I would definitely write a
        validation program first that tries to read the data and flags any
        problems such as unmatched quotes or the wrong number of items on a
        line. If there aren't too many (and there usually aren't), just edit
        them out of your input data by hand.
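A minimal sketch of such a validation pass, assuming Python 3 and a comma-separated file whose first line is a header; find_bad_lines is a hypothetical name. Each line is parsed on its own with the csv module's strict=True option, so an unterminated quote cannot swallow the following lines, and any line whose field count differs from the header's is reported as well.

```python
import csv


def find_bad_lines(lines, expected_fields=None):
    """Return the 1-based numbers of lines that look malformed.

    Each line is parsed separately with strict=True, so a quote that is
    never closed raises csv.Error instead of running into later lines.
    A line is also reported if its field count differs from
    expected_fields; when that is None, the first line that parses
    (normally the header) sets the expected count.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        try:
            rows = list(csv.reader([line.rstrip('\r\n')], strict=True))
        except csv.Error:
            bad.append(lineno)  # e.g. an unmatched quote
            continue
        fields = rows[0] if rows else []
        if expected_fields is None:
            expected_fields = len(fields)
        elif len(fields) != expected_fields:
            bad.append(lineno)  # wrong number of items on the line
    return bad
```

It can be run over an open file, e.g. find_bad_lines(open('test.csv', newline='')), and the reported line numbers checked by eye. With the sample data it reports line 2, the "tape 2"" line.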

        If this is to be a regular task then you'll have to write a program to
        recognize and correct the common error cases.
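One way to handle the common case in this thread, an inch mark inside a quoted field, is a lenient splitter that treats a quote as closing a field only when it is immediately followed by a comma or the end of the line. This is an illustrative heuristic, not standard CSV and not something the csv module provides; split_lenient is a hypothetical name. It also demonstrates Steve's point: a data item like "Nails 2", soldiers for the use of" is still split in the wrong place.

```python
def split_lenient(line, delimiter=',', quote='"'):
    """Split a line, ending a quoted field only at a quote that is
    immediately followed by the delimiter or the end of the line.

    Any other quote inside a quoted field (such as an inch mark) is kept
    literally. This is a heuristic, not standard CSV.
    """
    line = line.rstrip('\r\n')
    n = len(line)
    fields = []
    i = 0
    while True:
        if i < n and line[i] == quote:
            # quoted field: find a quote followed by delimiter or end of line
            j = i + 1
            while True:
                k = line.find(quote, j)
                if k == -1:
                    raise ValueError('unterminated quoted field: %r' % line)
                if k + 1 == n or line[k + 1] == delimiter:
                    break
                j = k + 1  # inner quote: keep it and keep looking
            fields.append(line[i + 1:k])
            i = k + 1
        else:
            # unquoted field: runs up to the next delimiter
            k = line.find(delimiter, i)
            if k == -1:
                k = n
            fields.append(line[i:k])
            i = k
        # i now points at a delimiter or at the end of the line
        if i >= n:
            return fields
        i += 1  # skip the delimiter
```

With this, split_lenient('1,"wer","tape 2"",5') gives ['1', 'wer', 'tape 2"', '5'], but 2,abc,"Nails 2", soldiers for the use of",7 is cut after "Nails 2" and comes back as five fields, which a field-count check like the one above would catch.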

        regards
        Steve

        --
        Steve Holden +1 571 484 6266 +1 800 494 3119
        Holden Web LLC http://www.holdenweb.com/
