repeated delimiters with csv.DictReader

  • monniewolf
    New Member
    • Mar 2010
    • 3

    repeated delimiters with csv.DictReader

    Hi All,
    Just a quick question. I have some text files that are space-delimited. Some of the columns in these files have been padded with spaces so that when you open a file in a simple text editor, the columns all line up and it is easy to read. I am using csv.DictReader to read in the files, so that it automatically generates a dictionary based on the first row. The problem is that when I read such a file in with csv.DictReader, I end up with a lot of blank columns.

    here is my read-in line:
    isc = csv.DictReader( open(inisc), delimiter=' ')

    Is there a way to specify that if Python encounters one or more spaces in a row, they should be treated as a single delimiter rather than multiple ones?

    I tried:
    isc = csv.DictReader( open(inisc), delimiter=' '+)
    and some similar variations (though I do not remember them all now) to no avail.

    If I alter the input text files to remove the padding and leave only one space between columns, this works great. I can stick with this method; it is just that the padding helps the readability of the text files, which can be quite big. These files are used for other purposes than just input to my code, so if there is a way to keep the padding and make csv.DictReader happy, that would be best.
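One way to keep the padding and still use csv.DictReader is to squeeze each line's whitespace before the reader sees it. The following is only a minimal sketch: the sample data and the helper name collapse_spaces are made up for illustration and are not taken from the actual files.

```python
import csv
import io

# Hypothetical sample standing in for the padded, space-delimited file.
sample = (
    "year mm dd    lat      lon\n"
    "2010 03 01  -12.50   130.25\n"
    "2010 03 02    4.75  -101.00\n"
)

def collapse_spaces(lines):
    """Yield each non-blank line with runs of whitespace squeezed to one space."""
    for line in lines:
        fields = line.split()
        if fields:
            yield " ".join(fields)

# DictReader accepts any iterable of strings, not only an open file.
reader = csv.DictReader(collapse_spaces(io.StringIO(sample)), delimiter=" ")
rows = list(reader)
print(rows[0]["lat"], rows[1]["lon"])
```

With a real file, the io.StringIO(sample) argument would be replaced by the open file object. The files on disk stay padded; only the text handed to the reader is normalized.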

    Thanks,
    Monica
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    Monica,

    Can you post a sample of the text file? You could parse the file without the csv module if the file is consistently formatted.

    BV


    • monniewolf
      New Member
      • Mar 2010
      • 3

      #3
      Hi BV,
      Attached is a small sample of one of the text files (I tried copying and pasting it below, but that messed up the format). The problem is that some of the columns contain position data (lat, long). In some cases the lat and long are negative, and in others they are positive. This means that not only does the number of characters in each entry change as the position changes, but some entries also have a '-' in front of them. These entries are padded so they remain aligned. Unfortunately, this means the format is not consistent throughout the file.

      You will also see that the month, day, hour, min, and sec columns are not consistent. I have since padded these with zeros so that they follow the more standard yyyy, mm, dd, hh, mm, ss.ss format, so these columns are no longer a problem. Padding the position fields with zeros as well solves my delimiter problem while keeping the fields aligned; however, it looks horrible and makes the positions hard to read.

      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        This code will create a dictionary with line 1 as the keys and the columnar data as the values:
        Code:
        fn = 'ex_text-2.txt'
        
        dd = {}
        
        # the with statement closes the file even if an error occurs
        with open(fn) as f:
            # line 1 holds the column labels
            labels = [item for item in f.readline().strip().split() if item]
        
            for line in f:
                # strip new line character
                line = line.strip()
                # in case a blank line is encountered
                if line:
                    lineList = [item for item in line.split() if item]
                    # zip stops at the shorter sequence, so an extra field cannot raise IndexError
                    for label, item in zip(labels, lineList):
                        dd.setdefault(label, []).append(item)
        To print the results:
        Code:
        for label in labels:
            print("%s: %s" % (label, dd[label]))
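        As you can see, it is very straightforward to read a formatted file into a list or dictionary. I used a list comprehension to eliminate any empty items, although split() with no arguments already treats a run of whitespace as a single separator, which is why the padding does not matter.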
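If one dictionary per row is more convenient than one list per column (closer to what csv.DictReader returns), the same split() idea can be sketched as below. The sample text and the function name read_rows are hypothetical, since the attached file is not reproduced in the thread.

```python
import io

# Hypothetical sample; the real attachment is not reproduced in the thread.
sample = (
    "id     lat      lon\n"
    "a1  -12.50   130.25\n"
    "\n"
    "b2    4.75  -101.00\n"
)

def read_rows(lines):
    """Return one dict per data line, keyed by the header labels."""
    lines = iter(lines)
    labels = next(lines).split()
    rows = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append(dict(zip(labels, fields)))
    return rows

rows = read_rows(io.StringIO(sample))
# Values are read as strings; convert them where numbers are needed.
lats = [float(row["lat"]) for row in rows]
print(lats)
```

Negative values need no special handling here, because the '-' is simply part of the field that split() returns.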


        • monniewolf
          New Member
          • Mar 2010
          • 3

          #5
          Thanks bvdet!
          This does seem to read in everything from the text files directly, regardless of the white-space padding.

          Now I just need to figure out how to integrate it into the rest of my code so that the data is written to the SQL database correctly. I am a newbie to Python, so it sometimes takes me a while to figure out how I need to proceed. If I can, though, I want to figure out how to tweak my code myself to use your solution, since that is how I will learn the most.
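For the database step, the thread does not say which SQL database or schema is involved, so the following is only a rough sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory database, and the table name positions and its columns are invented for illustration.

```python
import sqlite3

# Hypothetical rows, shaped like one dict per line of the text file.
rows = [
    {"id": "a1", "lat": "-12.50", "lon": "130.25"},
    {"id": "b2", "lat": "4.75", "lon": "-101.00"},
]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE positions (id TEXT, lat REAL, lon REAL)")

# Named placeholders (:id, :lat, :lon) are filled from each dict's keys.
conn.executemany(
    "INSERT INTO positions (id, lat, lon) VALUES (:id, :lat, :lon)",
    [{"id": r["id"], "lat": float(r["lat"]), "lon": float(r["lon"])} for r in rows],
)
conn.commit()

stored = conn.execute("SELECT id, lat, lon FROM positions ORDER BY id").fetchall()
print(stored)
conn.close()
```

Passing values through placeholders rather than building the SQL string by hand lets the driver handle quoting. Other database modules follow a similar pattern, though their placeholder syntax may differ.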
