CSV Problem using Python 2.6

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • tran0191
    New Member
    • Mar 2010
    • 7

    CSV Problem using Python 2.6

    Hi, I am fairly new to python and I am doing a project where i need to read a CSV file and compare the columns from each row of a file. If the columns from one row has the same value as the column from another row then write those row's to the same file, if they are different then write them into different files.. so essentially any rows with the same column gets written to the same CSV file.


    Here is what I have so far in terms of code


    import csv

    file = csv.reader(open ('C:/NEW.CSV','r'), delimiter=',')
    first = csv.writer(open ('C:/First.CSV','w') )
    second = csv.writer(open ('C:/fir.CSV','w'))

    header = file.next() #ignore the header

    >>> temp = 0
    >>> for row in file:
    ...
    ... if temp == row[4]: #this is the column i need to compare for each row
    ... first.writerow( row)
    ... else:
    ... second.writerow (row)
    ...
    ... temp = row[4]
    ...

    It writes to a CSV file fine but it doesnt write the same rows to the same CSV.
    I know the problem is with my assiging temp equal to row[4]. I believe my indentation is wrong, and i tried everything, but when i even tried a simple program for ex:

    >>> x = 5
    >>> if x ==5:
    ... print 'x is 5'
    ... else:
    ... print 'x is not 5'
    ...
    ... print ' this should get printed regardless if x is 5 or not'
    Traceback ( File "<interacti ve input>", line 6
    print ' this should get printed regardless if x is 5 or not'
    ^
    SyntaxError: invalid syntax
    >>>

    i get that error. Im not sure how to indent as their are no braces and i would think that my indentation is correct.
  • RedSon
    Recognized Expert Expert
    • Jan 2007
    • 4980

    #2
    Send some example csv files and post up your code.

    Comment

    • tran0191
      New Member
      • Mar 2010
      • 7

      #3
      I did post my code

      Code:
      import csv
      
      
      file = csv.reader(open('C/NEW.CSV','r'), delimiter=',')
      
      first = csv.writer(open('C/First.CSV','w'))
      
      second = csv.writer(open('C/fir.CSV','w'))
      
      
      header = file.next()
      
      >>> temp = 0
      >>> for row in file:
      ... 	
      ... 	if temp == row[4]:
      ... 		first.writerow(row)
      ... 	else:
      ... 		second.writerow(row)
      …
      ... 		temp = row[4]
      ...
      Last edited by bvdet; Mar 20 '10, 01:49 PM. Reason: Add code tags

      Comment

      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        tran0191,

        Please use code tags when posting code. See posting guidelines here.

        It would be much easier for us to offer a solution to your problem if you would post representative samples of your data and a sample of the required output.

        BV - Moderator

        Comment

        • Glenton
          Recognized Expert Contributor
          • Nov 2008
          • 391

          #5
          Remove one tab from your line 21. In other words it needs to be on the same level as the if and else statements. As it stands, you only set temp equal to row[4] in the else branch of the if statement. You want temp to be equal to row[4] either way.

          Good luck.

          Comment

          • tran0191
            New Member
            • Mar 2010
            • 7

            #6
            Ok so I removed the tab from the code and it works, but I have another problem...

            Here is what I have in terms of code...

            Code:
            >>> import csv
            
            >>> file = csv.reader(open('C/NEW.CSV','r'), delimiter=',')
            
            >>> first = csv.writer(open('C/First.CSV','w'))
            
            >>> second = csv.writer(open('C/fir.CSV','w'))
            
            >>> header = file.next()
            
            >>> for row in file:
            ... 	lon ="51"
            ...	lat =”45.9”
            ...	depth = “80”
            ... 	if row[3] == lat and row[4] == lon and row[5] == depth:
            ... 		first.writerow(row)
            ...		
            ... 	else:
            ... 		second.writerow(row)
            ...	lat = row[3]
            ... 	lon = row[4]
            ... 	depth = row[5]
            >>>
            This code works fine, but the thing is, I dont want to hard code the values in for lon, lat and depth because I will be opening a different CSV file everytime.. I want a way to retrieve the lon, lat, depth for the first row

            And another thing I dont understand is how come when I set the variables lon,lat and depth at the beginning of the for loop and then change them at the end of the for loop the values dont reset to what I initially set them to after each iteration? The values change? I would think the values for those variables would be constant since I set them at the beginning of the for loop...

            Any help would be appreciated.

            Comment

            • bvdet
              Recognized Expert Specialist
              • Oct 2006
              • 2851

              #7
              I can show you how if you would post a sample of your data and a sample of the required output.

              Comment

              • tran0191
                New Member
                • Mar 2010
                • 7

                #8
                Code:
                DATE	     LAT        LON	DEPTH
                12/17/1991 	45.9	51	80
                12/17/1991 	45.9	51	80
                12/17/1991 	45.9	51	80
                12/17/1991 	45.9	52	80
                12/17/1991 	45.9	52	80
                12/18/1991 	45.9	52	80
                12/18/1991 	45.9	53	80
                12/18/1991 	45.9	53	80
                12/18/1991 	45.9	53	80
                Here is the sample data. Now I have to seperate that data by LON. For all the rows with the same value of LON they should go in seperate files.
                So the output should be something like this.


                Code:
                First.CSV
                DATE	      LAT        LON	DEPTH
                12/17/1991 	45.9	51	80
                12/17/1991 	45.9	51	80
                12/17/1991	45.9	51	80
                Code:
                Second.CSV
                DATE	      LAT        LON	DEPTH
                12/17/1991 	45.9	52	80
                12/17/1991 	45.9	52	80
                12/18/1991 	45.9	52	80

                Code:
                Third.CSV
                DATE	      LAT        LON	DEPTH
                12/18/1991               45.9	53	80
                12/18/1991 	45.9	53	80
                12/18/1991 	45.9	53	80

                So there would be 3 different CSV files because there are 3 different values for LON in the initial file.

                Comment

                • bvdet
                  Recognized Expert Specialist
                  • Oct 2006
                  • 2851

                  #9
                  In your first post, you were specifying the delimiter to be a comma. Your sample data uses no commas. What's up with that?

                  I saved the contents of your sample data file in a file named "lon.txt". The code creates a file for each different value of the column # occupied by "LON".
                  Code:
                  fn = "lon.txt"
                  
                  kword = 'LON'
                  
                  f = open(fn)
                  labels = [item for item in f.readline().strip().split() if item]
                  lonidx = labels.index(kword)
                  dd = {}
                  for line in f:
                      lineList = [item for item in line.strip().split() if item]
                      dd.setdefault(lineList[lonidx], []).append(lineList)
                  
                  f.close()
                  for key in dd:
                      f = open("lon%s.txt" % (key), 'w')
                      f.write("".join(["%-12s" % item for item in labels])+"\n")
                      f.write("\n".join(["".join(["%-12s" % item for item in sublist]) for sublist in dd[key]]))
                      f.close()
                  A sample output file named "lon51.txt" :
                  Code:
                  DATE        LAT         LON         DEPTH       
                  12/17/1991  45.9        51          80          
                  12/17/1991  45.9        51          80          
                  12/17/1991  45.9        51          80
                  The formatting of the output may be another exercise.

                  Comment

                  • tran0191
                    New Member
                    • Mar 2010
                    • 7

                    #10
                    That is exactly what I needed, but is there way to do this to a CSV file? Because I am reading a CSV file and the output also needs to be in a CSV file.

                    Comment

                    • bvdet
                      Recognized Expert Specialist
                      • Oct 2006
                      • 2851

                      #11
                      In my previous post, I asked:
                      In your first post, you were specifying the delimiter to be a comma. Your sample data uses no commas. What's up with that?
                      To write out a CSV file, the code is almost the same (untested):
                      Code:
                          f.write(",".join(labels)+"\n")
                          f.write("\n".join([",".join(sublist) for sublist in dd[key]]))

                      Comment

                      • tran0191
                        New Member
                        • Mar 2010
                        • 7

                        #12
                        I tried running the code but I ran into an error. I think this code only works for .txt files?
                        Because when the list is populated and when I changed the kword variable to "longitude" which is in my CSV file it returns the error:

                        ValueError: list.index(x): x not in list


                        when i try to run
                        "lonidx = labels.index(kw ord)"



                        I thought in order to read a CSV file you would need to import CSV?

                        Comment

                        • tran0191
                          New Member
                          • Mar 2010
                          • 7

                          #13
                          I believe any files with a CSV extension is a comma seperated (Comma Seperated Values)

                          Comment

                          • bvdet
                            Recognized Expert Specialist
                            • Oct 2006
                            • 2851

                            #14
                            Add a print statement before the offending line to see what is in labels:
                            Code:
                            f = open(fn)
                            labels = [item for item in f.readline().strip().split() if item]
                            print labels
                            lonidx = labels.index(kword)
                            A CSV file is a text file. The items in each line are delimited by commas. For simple data, you really don't need the csv module. Example:
                            Code:
                            >>> line = "1,2,3,4,5,6,7\n"
                            >>> line.strip().split(",")
                            ['1', '2', '3', '4', '5', '6', '7']
                            >>>
                            The csv module comes in handy when the data is more complicated, as in items in quotes that have embedded commas.
                            Given the file contents:
                            Code:
                            0.0,0.0,0.0,0.0,0.0,0.0,5.0,10.0,15.0,20.0,25.0
                            Blues,Elwood,1060 W Addison,"Chicago, Illinois 60613",B263-1655-2187,116,56
                            The csv module will return an iterable that will produce:
                            Code:
                            [['0.0', '0.0', '0.0', '0.0', '0.0', '0.0', '5.0', '10.0', '15.0', '20.0', '25.0'], ['Blues', 'Elwood', '1060 W Addison', 'Chicago, Illinois 60613', 'B263-1655-2187', '116', '56']]

                            Comment

                            • bvdet
                              Recognized Expert Specialist
                              • Oct 2006
                              • 2851

                              #15
                              I believe any files with a CSV extension is a comma seperated (Comma Seperated Values)
                              A CSV is a text file for tabular data with the columns separated by commas and rows separated by newline characters.

                              Comment

                              Working...