split CSV fields

  • robert

    split CSV fields

    What is the simplest expression for splitting a CSV line with "-protected fields?

    s='"123","a,b,\"c\"",5.640'
  • irfan.habib@gmail.com

    #2
    Re: split CSV fields


    s.split(',');
    robert wrote:
    > What is the simplest expression for splitting a CSV line with "-protected fields?
    >
    > s='"123","a,b,\"c\"",5.640'
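A quick check of why a plain `str.split(',')` is not enough: it also splits on the commas inside a quoted field and keeps the quote characters. A minimal sketch, using a simplified, hypothetical line rather than the OP's exact one:

```python
# Hypothetical sample line: the second field contains a comma inside quotes.
line = '"123","a,b",5.640'

fields = line.split(',')
print(fields)       # ['"123"', '"a', 'b"', '5.640']
print(len(fields))  # 4, although the line has only 3 fields
```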

    • Fredrik Lundh

      #3
      Re: split CSV fields

      robert wrote:
      > What is the simplest expression for splitting a CSV line
      > with "-protected fields?
      >
      > s='"123","a,b,\"c\"",5.640'

      import csv

      the preferred way is to read the file using that module. if you insist
      on processing a single line, you can do

      cols = list(csv.reader([string]))

      </F>
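`list(csv.reader([string]))` returns a list of rows (just one here), so the fields themselves are element `[0]`. In Python 3 the usual single-line idiom is `next()` on the reader. A minimal sketch; `split_csv_line` is a hypothetical helper name, and any keyword arguments are passed straight through to `csv.reader` as format parameters:

```python
import csv

def split_csv_line(line, **fmtparams):
    """Split one CSV-formatted line into a list of field strings.

    csv.reader accepts any iterable of lines, so a one-element list works;
    next() takes the first (and only) row.
    """
    return next(csv.reader([line], **fmtparams))

print(split_csv_line('"123","a,b",5.640'))  # ['123', 'a,b', '5.640']
```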

      • Diez B. Roggisch

        #4
        Re: split CSV fields

        robert wrote:
        > What is the simplest expression for splitting a CSV line with "-protected
        > fields?
        >
        > s='"123","a,b,\"c\"",5.640'

        Use the csv module. It should have a dialect for this, although I'm not 100%
        sure whether the escaping of the " is handled properly from the csv module's
        point of view. It might require the Excel standard.

        Diez
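To check whether one of the stock dialects already handles backslash escapes, you can inspect the registered dialects. A short Python 3 sketch that only reads their attributes; in a stock installation it lists `excel`, `excel-tab` and `unix`, all with `escapechar=None`, so a backslash escape has to be requested explicitly:

```python
import csv

# Print the escape/doublequote settings of every registered dialect.
for name in sorted(csv.list_dialects()):
    d = csv.get_dialect(name)
    print(name, 'escapechar=%r' % (d.escapechar,), 'doublequote=%r' % (d.doublequote,))
```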

        • Peter Otten

          #5
          Re: split CSV fields

          robert wrote:
          > What is the simplest expression for splitting a CSV line with "-protected
          > fields?
          >
          > s='"123","a,b,\"c\"",5.640'

          >>> import csv
          >>> class mydialect(csv.excel):
          ...     escapechar = "\\"
          ...
          >>> csv.reader(['"123","a,b,\\"c\\"",5.640'], dialect=mydialect).next()
          ['123', 'a,b,"c"', '5.640']

          Peter
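The session above is Python 2 (`reader.next()`); in Python 3 the reader has no `.next()` method, so the built-in `next()` is used instead. The same setting can also be passed as a keyword format parameter rather than through a dialect subclass. A sketch, with the input written as a raw string so the backslashes reach the parser; the class name `BackslashDialect` is only illustrative:

```python
import csv

class BackslashDialect(csv.excel):
    # Same as excel, but a backslash escapes the following character.
    escapechar = "\\"

line = r'"123","a,b,\"c\"",5.640'

via_dialect = next(csv.reader([line], dialect=BackslashDialect))
via_params = next(csv.reader([line], escapechar="\\"))

print(via_dialect)                # ['123', 'a,b,"c"', '5.640']
print(via_dialect == via_params)  # True
```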

          • John Machin

            #6
            Re: split CSV fields

            Fredrik Lundh wrote:
            > robert wrote:
            >> What is the simplest expression for splitting a CSV line
            >> with "-protected fields?
            >>
            >> s='"123","a,b,\"c\"",5.640'
            >
            > import csv
            >
            > the preferred way is to read the file using that module. if you insist
            > on processing a single line, you can do
            >
            > cols = list(csv.reader([string]))
            >
            > </F>

            Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
            (Intel)] on win32
            | >>> import csv
            | >>> s='"123","a,b,\"c\"",5.640'
            | >>> cols = list(csv.reader([s]))
            | >>> cols
            [['123', 'a,b,c""', '5.640']]
            # maybe we need a bit more:
            | >>> cols = list(csv.reader([s]))[0]
            | >>> cols
            ['123', 'a,b,c""', '5.640']

            I'd guess that the OP is expecting 'a,b,"c"' for the second field.

            Twiddling with the knobs doesn't appear to help:

            | >>> list(csv.reader([s], escapechar='\\'))[0]
            ['123', 'a,b,c""', '5.640']
            | >>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
            ['123', 'a,b,c""', '5.640']

            Looks like a bug to me; AFAICT from the docs, the last attempt should
            have worked.

            Cheers,
            John

            • John Machin

              #7
              Re: split CSV fields

              John Machin wrote:
              > Looks like a bug to me; AFAICT from the docs, the last attempt should
              > have worked.

              Given Peter Otten's post, looks like
              (1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
              escapechar in my first twiddle, which should give the same result as
              Peter's.
              (2)
              | >>> csv.excel.doublequote
              True
              According to my reading of the docs:
              """
              doublequote
              Controls how instances of quotechar appearing inside a field should be
              themselves be quoted. When True, the character is doubled. When False,
              the escapechar is used as a prefix to the quotechar. It defaults to
              True.
              """
              Peter's example should not have worked.
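The `doublequote` text quoted above describes how an embedded quote is represented, which is easiest to see on the writer side: by default the quote is doubled, while with `doublequote=False` and an `escapechar` it is prefixed instead. A Python 3 sketch using `io.StringIO` as the output file; `write_row` is a hypothetical helper:

```python
import csv
import io

row = ['123', 'a,b,"c"', '5.640']

def write_row(**fmtparams):
    # Write the sample row with the given format parameters, return the text.
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerow(row)
    return buf.getvalue()

doubled = write_row()  # excel defaults: doublequote=True
escaped = write_row(doublequote=False, escapechar='\\')

print(repr(doubled))  # '123,"a,b,""c""",5.640\r\n'
print(repr(escaped))  # '123,"a,b,\\"c\\"",5.640\r\n'
```

Reading each line back with the same format parameters returns the original row.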

              • Fredrik Lundh

                #8
                Re: split CSV fields

                John Machin wrote:
                > Given Peter Otten's post, looks like
                > (1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
                > escapechar in my first twiddle, which should give the same result as
                > Peter's.
                > (2)
                > | >>> csv.excel.doublequote
                > True
                > According to my reading of the docs:
                > """
                > doublequote
                > Controls how instances of quotechar appearing inside a field should be
                > themselves be quoted. When True, the character is doubled. When False,
                > the escapechar is used as a prefix to the quotechar. It defaults to
                > True.
                > """
                > Peter's example should not have worked.

                the documentation also mentions a "quoting" parameter that "controls
                when quotes should be generated by the writer and recognised by the
                reader.". not sure how that changes things.

                anyway, it's either unclear documentation or a bug in the code. better
                submit a bug report so someone can fix one of them.

                </F>

                • John Machin

                  #9
                  Re: split CSV fields


                  John Machin wrote:
                  > Given Peter Otten's post, looks like
                  > (1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
                  > escapechar in my first twiddle, which should give the same result as
                  > Peter's.
                  > (2)
                  > [...]
                  > Peter's example should not have worked.

                  Doh. The OP's string was meant to be a raw string. I need some sleep.
                  Scrap bug #1!

                  | >>> s=r'"123","a,b,\"c\"",5.640'
                  | >>> list(csv.reader([s]))[0]
                  ['123', 'a,b,\\c\\""', '5.640']
                  # What's that???
                  | >>> list(csv.reader([s], escapechar='\\'))[0]
                  ['123', 'a,b,"c"', '5.640']
                  | >>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
                  ['123', 'a,b,"c"', '5.640']

                  And there's still the problem with doublequote ....

                  Goodnight ...
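On the reading side, `escapechar` is honoured whether or not `doublequote` is set; `doublequote` decides whether a doubled `""` inside a quoted field also counts as one embedded quote. A Python 3 sketch checking both settings against two hypothetical one-field lines, one backslash-escaped and one Excel-style:

```python
import csv

# Two hypothetical one-field lines: backslash-escaped and Excel-style doubled.
escaped_line = r'"a,b,\"c\""'
doubled_line = '"a,b,""c"""'

results = {}
for dq in (True, False):
    opts = {'escapechar': '\\', 'doublequote': dq}
    results[dq] = (next(csv.reader([escaped_line], **opts)),
                   next(csv.reader([doubled_line], **opts)))
    print('doublequote=%s:' % dq, results[dq])
```

The backslash-escaped line parses to `['a,b,"c"']` in both cases; the doubled line only does so when `doublequote` is True.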

                  • Peter Otten

                    #10
                    Re: split CSV fields

                    John Machin wrote:
                    > | >>> s='"123","a,b,\"c\"",5.640'

                    Note how I fixed the input:

                    >>> '"123","a,b,\"c\"",5.640'
                    '"123","a,b,"c"",5.640'
                    >>> '"123","a,b,\\"c\\"",5.640'
                    '"123","a,b,\\"c\\"",5.640'

                    Peter
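The underlying issue is Python string-literal syntax rather than csv: in a normal literal `\"` is just `"`, so the backslashes never reach the parser, and the reader then sees an Excel-style line with unbalanced quotes. A sketch comparing the spellings:

```python
import csv

plain = '"123","a,b,\"c\"",5.640'      # \" collapses to " in a normal literal
raw = r'"123","a,b,\"c\"",5.640'       # raw literal keeps the backslashes
doubled = '"123","a,b,\\"c\\"",5.640'  # escaping the backslash itself

print('\\' in plain)   # False
print(raw == doubled)  # True

# With no backslashes left, escapechar has nothing to act on.
print(next(csv.reader([plain], escapechar='\\')))  # ['123', 'a,b,c""', '5.640']
```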

                    • John Machin

                      #11
                      Re: split CSV fields


                      Fredrik Lundh wrote:
                      > John Machin wrote:
                      >> [...]
                      >> Peter's example should not have worked.
                      >
                      > the documentation also mentions a "quoting" parameter that "controls
                      > when quotes should be generated by the writer and recognised by the
                      > reader.". not sure how that changes things.

                      Hi Fredrik, I read that carefully -- "quoting" appears to have no
                      effect in this situation.

                      > anyway, it's either unclear documentation or a bug in the code. better
                      > submit a bug report so someone can fix one of them.

                      Tomorrow :-)
                      Cheers,
                      John
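For reference, in current Python 3 the reader does respond to two `quoting` values: `QUOTE_NONE` turns off quote recognition altogether, and `QUOTE_NONNUMERIC` converts unquoted fields to `float`. With the default `QUOTE_MINIMAL` or with `QUOTE_ALL` the result is unchanged, which is consistent with the observation above. A sketch on a hypothetical line; `read` is an illustrative helper:

```python
import csv

line = '"123","a,b",5.640'

def read(quoting):
    # Parse the sample line with the given quoting constant.
    return next(csv.reader([line], quoting=quoting))

print(read(csv.QUOTE_MINIMAL))     # ['123', 'a,b', '5.640']
print(read(csv.QUOTE_ALL))         # ['123', 'a,b', '5.640']
print(read(csv.QUOTE_NONNUMERIC))  # ['123', 'a,b', 5.64]
print(read(csv.QUOTE_NONE))        # ['"123"', '"a', 'b"', '5.640']
```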
