Pythonic use of CSV module to skip headers?

  • Ramon Felciano

    Pythonic use of CSV module to skip headers?

    Hi --

    I'm using the csv module to parse a tab-delimited file and wondered
    whether there was a more elegant way to skip a possible header line.
    I'm doing

    line = 0
    reader = csv.reader(file(filename))
    for row in reader:
        if (ignoreFirstLine & line == 0):
            continue
        line = line+1
        # do something with row

    The only thing I could think of was to specialize the default reader
    class with an extra skipHeaderLine constructor parameter so that its
    next() method can skip the first line when appropriate. Is there any
    other cleaner way to do it w/out subclassing the stdlib?

    Thanks!

    Ramon
  • Steve Holden

    #2
    Re: Pythonic use of CSV module to skip headers?

    Ramon Felciano wrote:
    > Hi --
    >
    > I'm using the csv module to parse a tab-delimited file and wondered
    > whether there was a more elegant way to skip a possible header line.
    > I'm doing
    >
    > line = 0
    > reader = csv.reader(file(filename))
    > for row in reader:
    >     if (ignoreFirstLine & line == 0):
    >         continue
    >     line = line+1
    >     # do something with row
    >
    > The only thing I could think of was to specialize the default reader
    > class with an extra skipHeaderLine constructor parameter so that its
    > next() method can skip the first line when appropriate. Is there any
    > other cleaner way to do it w/out subclassing the stdlib?
    >
    > Thanks!
    >
    > Ramon

    How about

    line = 0
    reader = csv.reader(file(filename))
    headerline = reader.next()
    for row in reader:
        line = line+1
        # do something with row
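
For readers on Python 3, where `reader.next()` is spelled `next(reader)`, here is a minimal sketch of the same idea. The helper name `read_rows`, the `ignore_first_line` flag (mirroring the original poster's `ignoreFirstLine`) and the sample data are assumptions made for illustration, not part of the thread.

```python
import csv
import io


def read_rows(f, ignore_first_line=True):
    """Return the data rows of a tab-delimited file object as lists of strings."""
    reader = csv.reader(f, delimiter="\t")
    if ignore_first_line:
        # Passing a default to next() avoids StopIteration on an empty file.
        next(reader, None)
    return list(reader)


# Hypothetical sample input standing in for a real file.
sample = io.StringIO("name\tqty\nspam\t3\neggs\t12\n")
rows = read_rows(sample)
print(rows)
```

With a real file, the csv documentation recommends opening it as `open(filename, newline="")` before handing it to `csv.reader`.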

    regards
    Steve
    --


    Holden Web LLC +1 800 494 3119

    • Marc 'BlackJack' Rintsch

      #3
      Re: Pythonic use of CSV module to skip headers?

      In <76c29906.0412021521.64ea904f@posting.google.com>, Ramon Felciano
      wrote:
      > Hi --
      >
      > I'm using the csv module to parse a tab-delimited file and wondered
      > whether there was a more elegant way to skip a possible header line.
      > I'm doing
      >
      > line = 0
      > reader = csv.reader(file(filename))
      > for row in reader:
      >     if (ignoreFirstLine & line == 0):
      >         continue
      >     line = line+1
      >     # do something with row

      What about:

      reader = csv.reader(file(filename))
      reader.next() # Skip header line.
      for row in reader:
          # do something with row
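
If the number of header rows varies, one possible variation swaps the explicit `next()` call for `itertools.islice`. This is a different technique from the one posted above, sketched for Python 3; the name `data_rows` and the `header_lines` parameter are hypothetical.

```python
import csv
import io
from itertools import islice


def data_rows(f, header_lines=1):
    """Return an iterator over CSV rows, skipping the first header_lines rows."""
    reader = csv.reader(f)
    # islice(reader, n, None) drops the first n rows; n == 0 skips nothing.
    return islice(reader, header_lines, None)


# Hypothetical sample input standing in for a real file.
sample = io.StringIO("x,y\n1,2\n3,4\n")
rows = list(data_rows(sample))
print(rows)
```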

      Ciao,
      Marc 'BlackJack' Rintsch

      • Peter Otten

        #4
        Re: Pythonic use of CSV module to skip headers?

        Ramon Felciano wrote:
        > I'm using the csv module to parse a tab-delimited file and wondered
        > whether there was a more elegant way to skip a possible header line.
        > I'm doing
        >
        > line = 0
        > reader = csv.reader(file(filename))
        > for row in reader:
        >     if (ignoreFirstLine & line == 0):
        >         continue
        >     line = line+1
        >     # do something with row
        >
        > The only thing I could think of was to specialize the default reader
        > class with an extra skipHeaderLine constructor parameter so that its
        > next() method can skip the first line when appropriate. Is there any
        > other cleaner way to do it w/out subclassing the stdlib?

        >>> import csv
        >>> f = file("tmp.csv")
        >>> f.next()
        '# header\n'
        >>> for row in csv.reader(f):
        ...     print row
        ...
        ['a', 'b', 'c']
        ['1', '2', '3']

        This way the reader need not mess with the header at all.
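
A rough Python 3 rendering of the session above, using an in-memory file so that it is self-contained. The function name `rows_after_header` is made up for this sketch. One assumption worth stating: discarding a raw line only works when the header occupies exactly one physical line, so a quoted header field containing a newline would break it.

```python
import csv
import io


def rows_after_header(f):
    """Discard one raw line from f, then parse the remainder as CSV."""
    # Consume the header straight from the file object; the csv reader never sees it.
    next(f, None)
    return list(csv.reader(f))


# Same contents as the tmp.csv used in the interactive session.
sample = io.StringIO("# header\na,b,c\n1,2,3\n")
rows = rows_after_header(sample)
print(rows)
```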

        Peter

          • Skip Montanaro

            #6
            Re: Pythonic use of CSV module to skip headers?


            Ramon> I'm using the csv module to parse a tab-delimited file and
            Ramon> wondered whether there was a more elegant way to skip a possible
            Ramon> header line.

            Assuming the header line has descriptive titles, I prefer the DictReader
            class. Unfortunately, it requires you to specify the titles in its
            constructor. My usual idiom is the following:

            f = open(filename, "rb") # don't forget the 'b'!
            reader = csv.reader(f)
            titles = reader.next()
            reader = csv.DictReader(f, titles)
            for row in reader:
                ...

            The advantage of the DictReader class is that you get dictionaries keyed by
            the titles instead of lists. The code to manipulate them is more readable
            and insensitive to changes in the order of the columns. On the down side,
            if the titles aren't always named the same you lose.
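
The same idiom could look roughly like this on Python 3, where files are opened in text mode (with `newline=""`) rather than `"rb"`. The helper name `dict_rows` and its `delimiter` parameter are assumptions for this sketch, not anything from the post.

```python
import csv
import io


def dict_rows(f, delimiter=","):
    """Read the title row with a plain reader, then return the rest as dicts."""
    reader = csv.reader(f, delimiter=delimiter)
    titles = next(reader)  # raises StopIteration if the file is empty
    # A second reader on the same file object continues after the title row.
    dict_reader = csv.DictReader(f, fieldnames=titles, delimiter=delimiter)
    return [dict(row) for row in dict_reader]


# Hypothetical tab-delimited sample standing in for a real file.
sample = io.StringIO("name\tqty\nspam\t3\neggs\t12\n")
rows = dict_rows(sample, delimiter="\t")
print(rows[0]["qty"])
```

Looking up `row["qty"]` by title keeps working if the columns are later reordered, which is the readability benefit described above.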

            Skip

            • Michael Hoffman

              #7
              Re: Pythonic use of CSV module to skip headers?

              Skip Montanaro wrote:
              > Assuming the header line has descriptive titles, I prefer the DictReader
              > class. Unfortunately, it requires you to specify the titles in its
              > constructor. My usual idiom is the following:

              I deal so much with tab-delimited CSV files that I found it useful to
              create a subclass of csv.DictReader to deal with this, so I can just write:

              for row in tabdelim.DictReader(file(filename)):
                  ...

              I think this is a lot easier than trying to remember this cumbersome
              idiom every single time.
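
The `tabdelim` module itself is not shown in the thread, so the following is only a guess at what such a subclass might look like on Python 3. The class name `TabDictReader` is hypothetical, and the sketch assumes a Python version in which `csv.DictReader` reads the titles from the first row when `fieldnames` is omitted.

```python
import csv
import io


class TabDictReader(csv.DictReader):
    """DictReader that defaults to tab as the delimiter (hypothetical helper)."""

    def __init__(self, f, *args, **kwds):
        # Callers can still override the delimiter explicitly.
        kwds.setdefault("delimiter", "\t")
        super().__init__(f, *args, **kwds)


# Hypothetical sample input standing in for a real file.
sample = io.StringIO("name\tqty\nspam\t3\n")
rows = [dict(row) for row in TabDictReader(sample)]
print(rows)
```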
              --
              Michael Hoffman

              • Nick Coghlan

                #8
                Re: Pythonic use of CSV module to skip headers?

                Michael Hoffman wrote:
                > I deal so much with tab-delimited CSV files that I found it useful to
                > create a subclass of csv.DictReader to deal with this, so I can just write:
                >
                > for row in tabdelim.DictReader(file(filename)):
                >     ...
                >
                > I think this is a lot easier than trying to remember this cumbersome
                > idiom every single time.

                Python 2.4 makes the fieldnames parameter optional:
                "If the fieldnames parameter is omitted, the values in the first row of the
                csvfile will be used as the fieldnames."

                i.e. the following should work fine in 2.4:

                for row in csv.DictReader(file(filename)):
                    print sorted(row.items())
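
The same behaviour carries over to Python 3. A small self-contained sketch follows; the sample data is invented for illustration.

```python
import csv
import io

# Hypothetical sample input standing in for a real file.
sample = io.StringIO("name,qty\nspam,3\neggs,12\n")

# With fieldnames omitted, DictReader takes the titles from the first row.
reader = csv.DictReader(sample)
rows = [sorted(row.items()) for row in reader]

print(reader.fieldnames)
print(rows)
```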

                Cheers,
                Nick.

                • Skip Montanaro

                  #9
                  Re: Pythonic use of CSV module to skip headers?

                  >> Assuming the header line has descriptive titles, I prefer the
                  >> DictReader class. Unfortunately, it requires you to specify the
                  >> titles in its constructor. My usual idiom is the following:

                  Michael> I deal so much with tab-delimited CSV files that I found it
                  Michael> useful to create a subclass of csv.DictReader to deal with
                  Michael> this, so I can just write:

                  Michael> for row in tabdelim.DictReader(file(filename)):
                  Michael>     ...

                  Michael> I think this is a lot easier than trying to remember this
                  Michael> cumbersome idiom every single time.

                  I'm not sure what the use of TABs as delimiters has to do with the OP's
                  problem. In my example I flubbed and failed to specify the delimiter to the
                  constructors (comma is the default delimiter).

                  You can create a subclass of DictReader that plucks the first line out as a
                  set of titles:

                  class SmartDictReader(csv.DictReader):
                      def __init__(self, f, *args, **kwds):
                          # Read the title row from f first, then pass the titles on.
                          rdr = csv.reader(f, *args, **kwds)
                          titles = rdr.next()
                          csv.DictReader.__init__(self, f, titles, *args, **kwds)

                  Is that what you were suggesting? I don't find the couple extra lines of
                  code in my original example all that cumbersome to type though.
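
For comparison, a Python 3 port of the subclass might look like the sketch below. It accepts only keyword arguments such as `delimiter`, an assumption made here because positional extras mean different things to `csv.reader` and `csv.DictReader`. On current Python versions the subclass is largely redundant, since `DictReader` reads the titles itself when `fieldnames` is omitted.

```python
import csv
import io


class SmartDictReader(csv.DictReader):
    """DictReader that plucks the first row out as the set of titles."""

    def __init__(self, f, **kwds):
        # Read the title row with a plain reader that shares the formatting options.
        titles = next(csv.reader(f, **kwds))  # StopIteration on an empty file
        super().__init__(f, fieldnames=titles, **kwds)


# Hypothetical tab-delimited sample standing in for a real file.
sample = io.StringIO("name\tqty\nspam\t3\neggs\t12\n")
rows = [dict(row) for row in SmartDictReader(sample, delimiter="\t")]
print(rows)
```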

                  Skip

                  • Michael Hoffman

                    #10
                    Re: Pythonic use of CSV module to skip headers?

                    Skip Montanaro wrote:
                    > I'm not sure what the use of TABs as delimiters has to do with the OP's
                    > problem.

                    Not much. :) I just happen to use tabs more often than commas, so my
                    subclass defaults to

                    > You can create a subclass of DictReader that plucks the first line out as a
                    > set of titles:
                    >
                    > class SmartDictReader(csv.DictReader):
                    >     def __init__(self, f, *args, **kwds):
                    >         rdr = csv.reader(f, *args, **kwds)
                    >         titles = rdr.next()
                    >         csv.DictReader.__init__(self, f, titles, *args, **kwds)
                    >
                    > Is that what you were suggesting?

                    Exactly.

                    > I don't find the couple extra lines of
                    > code in my original example all that cumbersome to type though.

                    If you started about half of the programs you write with those extra
                    lines, you might <wink>. I'm a strong believer in OnceAndOnlyOnce.

                    Thanks to Nick Coghlan for pointing out that I no longer need do this in
                    Python 2.4.
                    --
                    Michael Hoffman

                    • Skip Montanaro

                      #11
                      Re: Pythonic use of CSV module to skip headers?

                      >> I don't find the couple extra lines of code in my original example
                      >> all that cumbersome to type though.

                      Michael> If you started about half of the programs you write with those
                      Michael> extra lines, you might <wink>. I'm a strong believer in
                      Michael> OnceAndOnlyOnce.

                      You're right of course. I do use csv a lot, but only from a couple
                      specialized programs.

                      Skip
