csv Parser Question - Handling of Double Quotes

  • jwbrown77@gmail.com

    csv Parser Question - Handling of Double Quotes

    Hello,

    I am trying to read a csv file. I have the following functioning
    code:

    ---- BEGIN ----
    import csv

    reader = csv.reader(open("test.csv", "rb"), delimiter=';')

    for row in reader:
        print row
    ---- END ----

    This code will successfully parse my csv file formatted as such:

    "this";"is";"a";"test"

    Resulting in an output of:

    ['this', 'is', 'a', 'test']

    However, if I modify the csv to:

    "t"h"is";"is";"a";"test"

    The output changes to:

    ['th"is"', 'is', 'a', 'test']

    My question is, can you change the behavior of the parser to only
    remove quotes when they are next to the delimiter? I would like both
    quotes around the h in the example above to remain, however it is
    instead removing only the first two instances of quotes it runs across
    and leaves the others.
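
One way to get the behavior asked for here, sketched under some assumptions: turn off the reader's quote handling with `quoting=csv.QUOTE_NONE`, then strip a single pair of outer quotes from each field by hand. The helper name `read_keep_inner_quotes` is made up for illustration, the syntax is Python 3 (unlike the code above), and the approach only works if a field never contains the delimiter itself, because quotes no longer protect it.

```python
import csv
import io


def read_keep_inner_quotes(text, delimiter=";"):
    """Split rows on the delimiter and strip only the outermost quote pair."""
    # QUOTE_NONE makes csv.reader treat '"' as an ordinary character.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        cleaned = []
        for field in row:
            # Remove one leading and one trailing quote, if both are present.
            if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
                field = field[1:-1]
            cleaned.append(field)
        rows.append(cleaned)
    return rows


print(read_keep_inner_quotes('"t"h"is";"is";"a";"test"\n'))
# → [['t"h"is', 'is', 'a', 'test']]
```

The quotes around the h survive because the reader never interprets them; only the pair touching the delimiters (or the line ends) is removed afterwards.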

    The closest solution I have found is to add to the reader command
    "escapechar='\\'" then manually add a single \ character before the
    quotes I'd like to keep. But instead of writing something to add
    those slashes before csv parsing I was wondering if the parser can
    handle it instead.
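
The escapechar workaround described above could be automated along these lines. This is only a sketch in Python 3: the names `escape_inner_quotes` and `parse_with_escapes` are hypothetical, the delimiter is hard-coded to `;`, and it assumes the data contains no backslashes of its own and no delimiter or newline inside a field. A quote is treated as data when it has a non-delimiter character on both sides.

```python
import csv
import io
import re

# A quote with a non-';' character directly before AND after it is data,
# not a field boundary. Quotes at the line start/end never match.
INNER_QUOTE = re.compile(r'(?<=[^;])"(?=[^;])')


def escape_inner_quotes(line):
    """Put a backslash in front of every quote that is not next to ';'."""
    line = line.rstrip("\r\n")  # keep the newline out of the lookahead
    return INNER_QUOTE.sub(r'\\"', line)


def parse_with_escapes(text):
    """Escape inner quotes line by line, then let csv.reader do the rest."""
    escaped = "\n".join(escape_inner_quotes(l) for l in text.splitlines())
    reader = csv.reader(io.StringIO(escaped), delimiter=";", escapechar="\\")
    return list(reader)


print(escape_inner_quotes('"t"h"is";"is";"a";"test"'))
print(parse_with_escapes('"t"h"is";"is";"a";"test"\n'))
```

The first call shows the rewritten line with a backslash before each inner quote; the second feeds it to the reader, which then keeps those quotes as part of the field.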

    Thanks in advance for the help.
  • Gabriel Genellina

    #2
    Re: csv Parser Question - Handling of Double Quotes

    On Thu, 27 Mar 2008 17:37:33 -0300, Aaron Watters
    <aaron.watters@gmail.com> wrote:
    >> "this";"is";"a";"test"
    >>
    >> Resulting in an output of:
    >>
    >> ['this', 'is', 'a', 'test']
    >>
    >> However, if I modify the csv to:
    >>
    >> "t"h"is";"is";"a";"test"
    >>
    >> The output changes to:
    >>
    >> ['th"is"', 'is', 'a', 'test']
    >
    > I'd be tempted to say that this is a bug,
    > except that I think the definition of "csv" is
    > informal, so the "bug/feature" distinction
    > cannot be exactly defined, unless I'm mistaken.

    AFAIK, the csv module tries to mimic Excel behavior as closely as possible.
    It has some test cases that look horrible, but that's what Excel does...
    I'd try actually using Excel to see what happens.
    Perhaps the behavior could be more configurable, like the codecs are.

    --
    Gabriel Genellina


    • Aaron Watters

      #3
      Re: csv Parser Question - Handling of Double Quotes

      On Mar 27, 6:00 pm, John Machin <sjmac...@lexicon.net> wrote:
      > ...The Python csv module emulates Excel in delivering garbage silently in
      > cases when the expected serialisation protocol has (detectably) not
      > been followed....
      Fine, but I'd say the heuristic adopted produces
      bizarre and surprising results in the illustrated case.
      It's a matter of taste of course...
      -- Aaron Watters
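
For readers who would rather get an error than a silently repaired row, the csv module's dialect parameter `strict=True` is worth trying. A minimal Python 3 sketch, with `parse_strict` as a made-up helper name; the exact error text is left out here because it may vary between versions:

```python
import csv
import io


def parse_strict(text):
    """Parse ';'-separated text, raising csv.Error on malformed quoting."""
    reader = csv.reader(io.StringIO(text), delimiter=";", strict=True)
    return list(reader)


# Well-formed input parses as usual.
print(parse_strict('"this";"is";"a";"test"\n'))

# The malformed line from this thread is rejected instead of being guessed at.
try:
    parse_strict('"t"h"is";"is";"a";"test"\n')
except csv.Error as exc:
    print("rejected:", exc)
```

With the default `strict=False`, the same malformed line yields the `th"is"` field reported at the top of the thread.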
