finding out the number of rows in a CSV file

This topic is closed.
  • Fredrik Lundh

    #16
    Re: finding out the number of rows in a CSV file [Resolved]

    John S wrote:
    >>after reading the file through the csv.reader for the length I cannot
    >>iterate over the rows. How do I reset the row iterator?
    >
    > A CSV file is just a text file. Don't use csv.reader for counting rows
    > -- it's overkill. You can just read the file normally, counting lines
    > (lines == rows).

    $ more sample.csv
    "Except
    when it
    isn't."
    >>> import csv
    >>> len(list(csv.reader(open('sample.csv'))))
    1
    >>> len(list(open('sample.csv')))
    3

    </F>
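A minimal Python 3 sketch of the two counts being compared, plus one way to handle the original "reset" question. It assumes the file can be rewound (a real file or `io.StringIO`); the `SAMPLE` string and the helper names are hypothetical stand-ins, not code from the thread. A `csv.reader` has no reset of its own, so the sketch either rewinds the underlying file with `seek(0)` and builds a new reader, or parses once into a list.

```python
import csv
import io

# Hypothetical stand-in for sample.csv: one quoted field spanning three lines.
SAMPLE = '"Except\nwhen it\nisn\'t."\n'


def count_records(f):
    """Count rows as csv.reader parses them (quoted newlines stay inside a field)."""
    return sum(1 for _ in csv.reader(f))


def count_lines(f):
    """Count physical text lines, ignoring CSV quoting entirely."""
    return sum(1 for _ in f)


# newline="" is what the csv docs recommend for files handed to csv.reader.
f = io.StringIO(SAMPLE, newline="")
records = count_records(f)
f.seek(0)  # rewind the file; the exhausted reader cannot be reused, so a new one is made
lines = count_lines(f)
f.seek(0)
rows = list(csv.reader(f))  # alternative: parse once, use len(rows), iterate rows freely

print(records, lines, len(rows))  # 1 3 1
```

Keeping the parsed rows in a list costs memory proportional to the file, while rewinding costs a second pass over it; which is preferable depends on file size.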


    • norseman

      #17
      Re: finding out the number of rows in a CSV file [Resolved]

      Peter Otten wrote:
      > John S wrote:
      >
      >>[OP] Jon Clements wrote:
      >>>On Aug 27, 12:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
      >>>>after reading the file through the csv.reader for the length I cannot
      >>>>iterate over the rows. How do I reset the row iterator?
      >>A CSV file is just a text file. Don't use csv.reader for counting rows
      >>-- it's overkill. You can just read the file normally, counting lines
      >>(lines == rows).
      >
      > Wrong. A field may have embedded newlines:
      >
      > >>> import csv
      > >>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
      > >>> sum(1 for row in csv.reader(open("tmp.csv")))
      > 1
      > >>> sum(1 for line in open("tmp.csv"))
      > 11
      >
      > Peter
      > --

      =============================
      Well... it's a semantics problem here.


      A blank line is just an EOL by itself. Yes.
      I may want to count these. Could be indicative of a problem.
      Besides, sum(1 for line in ... if line.rstrip("\r\n")) handles the problem
      if I'm not counting blanks, and still avoids tossing, re-opening etc...
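The idea of counting (or skipping) blank lines in the same pass, without re-reading the file, can be sketched as follows. `line_stats` is a hypothetical helper, not code from the thread; it treats a line as blank when nothing remains after stripping its line ending, so a line of spaces still counts as non-blank.

```python
import io


def line_stats(f):
    """Return (total_lines, blank_lines) in a single pass over a text file.

    A line counts as blank when only its line ending remains,
    i.e. it is "just an EOL by itself".
    """
    total = blank = 0
    for line in f:
        total += 1
        if not line.rstrip("\r\n"):
            blank += 1
    return total, blank


# Hypothetical data: two records separated by an empty line.
data = io.StringIO("a,b\n\nc,d\n")
total, blank = line_stats(data)
print(total, blank, total - blank)  # 3 1 2
```

Like any physical-line count, this still miscounts records whose quoted fields contain embedded newlines.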

      Again - it's how you look at it, but I don't want EOLs in my dbase
      fields. CSV was designed to 'dump' database fields into text for those
      who can't afford a database program and/or to convert between database
      programs. By the way - has anyone seen a good spreadsheet dumper? One
      that dumps the underlying formulas and such along with the displayed
      value? That would greatly facilitate portability, wouldn't it? (Yeah -
      the receiving program would have to be able to read it. But it would be
      a start - yes?) Everyone got the point? Just because it gets abused
      doesn't mean .... Are we back on track? Number of lines equals number of
      reads - which is what was requested. No bytes magically disappearing. No
      sleight of hand, no one dictating how to or what with ....

      The good part is everyone who reads this now knows two ways to approach
      the problem and the pros/cons of each. No losers.



      Steve
      norseman@hughes.net


      • John Machin

        #18
        Re: finding out the number of rows in a CSV file [Resolved]

        On Aug 28, 7:51 am, norseman <norse...@hughes.net> wrote:
        > Peter Otten wrote:
        >> John S wrote:
        >>
        >>>[OP] Jon Clements wrote:
        >>>>On Aug 27, 12:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
        >>>>>after reading the file through the csv.reader for the length I cannot
        >>>>>iterate over the rows. How do I reset the row iterator?
        >>>A CSV file is just a text file. Don't use csv.reader for counting rows
        >>>-- it's overkill. You can just read the file normally, counting lines
        >>>(lines == rows).
        >>
        >> Wrong. A field may have embedded newlines:
        >>
        >> >>> import csv
        >> >>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
        >> >>> sum(1 for row in csv.reader(open("tmp.csv")))
        >> 1
        >> >>> sum(1 for line in open("tmp.csv"))
        >> 11
        >>
        > =============================
        > Well... it's a semantics problem here.
        >
        > A blank line is just an EOL by itself. Yes.
        Or a line containing blanks. Yes what?
        > I may want to count these. Could be indicative of a problem.
        If you use the csv module to read the file, a "blank line" will come
        out as a row with one field, the contents of which you can check.
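What the csv module actually yields for such lines can be checked with a quick experiment (Python 3 standard library; the input string is made up). In CPython's `csv` module a completely empty line comes back as an empty list, while a line of spaces comes back as a row with one field holding those spaces, which can then be inspected as described above.

```python
import csv
import io

# Hypothetical input: a normal row, an empty line, a line of spaces, another row.
text = "a,b\n\n   \nc,d\n"

rows = list(csv.reader(io.StringIO(text, newline="")))
for row in rows:
    print(repr(row))
# ['a', 'b']
# []
# ['   ']
# ['c', 'd']

# Flag rows that are empty or whose fields are all whitespace.
suspect = [i for i, row in enumerate(rows) if not any(field.strip() for field in row)]
print(suspect)  # [1, 2]
```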
        > Besides, sum(1 for line in ... if line.rstrip("\r\n")) handles the problem
        > if I'm not counting blanks, and still avoids tossing, re-opening etc...
        What is "tossing", apart from the English slang meaning?
        What re-opening?
        >
        > Again - it's how you look at it, but I don't want EOLs in my dbase
        > fields.
        <rant>
        Most people don't want them, but many do have them, as well as Ctrl-Zs
        and NBSPs and dial-up line noise (and umlauts/accents/suchlike
        inserted by the temporarily-employed backpacker to ensure that her
        compatriots' names and addresses were spelled properly) ... and the IT
        department fervently believes the content is ASCII even though they
        have done absolutely SFA to ensure that.
        </rant>
        > CSV was designed to 'dump' database fields into text for those
        > who can't afford a database program and/or to convert between database
        > programs. By the way - has anyone seen a good spreadsheet dumper? One
        > that dumps the underlying formulas and such along with the displayed
        > value? That would greatly facilitate portability, wouldn't it? (Yeah -
        > the receiving program would have to be able to read it. But it would be
        > a start - yes?) Everyone got the point? Just because it gets abused
        > doesn't mean .... Are we back on track? Number of lines equals number of
        > reads - which is what was requested. No bytes magically disappearing. No
        > sleight of hand, no one dictating how to or what with ....
        >
        > The good part is everyone who reads this now knows two ways to approach
        > the problem and the pros/cons of each. No losers.
        IMHO it is very hard to discern from all that ramble what the alleged
        problem is, let alone what are the ways to approach it.
