I'm a beginner to Python. Could you tell me how I should proceed to remove duplicate rows in a CSV file?
Removing duplicate entries in a CSV file using a Python script
-
Originally posted by sathish119: I'm a beginner to Python. Could you tell me how I should proceed to remove duplicate rows in a CSV file?
[CODE=python]reader = open("file.csv", "r")
lines = reader.read().split("\n")
reader.close()
writer = open("file.csv", "w")
for line in set(lines):
    writer.write(line + "\n")
writer.close()[/CODE]
-
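A Python 3 sketch of the same set-based approach, using context managers so the file handles are closed automatically. The demo file and its contents here are hypothetical; note that set() does not preserve the original row order.

```python
# Create a small demo file with one duplicated row (hypothetical data).
with open("file.csv", "w") as f:
    f.write("a,1\nb,2\na,1\n")

# Set-based de-duplication, Python 3 style.
# Caveat: set() does NOT preserve the original row order.
with open("file.csv", "r") as reader:
    lines = reader.read().splitlines()

with open("file.csv", "w") as writer:
    for line in set(lines):
        writer.write(line + "\n")
```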
This code maintains the order of the data:[code=Python]>>> rows = open('data.txt').read().split('\n')
>>> newrows = []
>>> for row in rows:
...     if row not in newrows:
...         newrows.append(row)
...
>>> f = open('data1.txt', 'w')
>>> f.write('\n'.join(newrows))
>>> f.close()[/code]
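One note on the order-preserving loop above: `row not in newrows` scans a list, which makes the whole pass O(n**2). A common variation, sketched here with hypothetical in-memory rows, keeps an auxiliary set for O(1) membership tests while still appending to a list to preserve first-seen order:

```python
# Order-preserving de-duplication with an auxiliary set,
# avoiding the O(n**2) cost of "row not in newrows" on a list.
rows = ["a,1", "b,2", "a,1", "c,3"]  # hypothetical lines read from a file

seen = set()
newrows = []
for row in rows:
    if row not in seen:       # O(1) membership test against the set
        seen.add(row)
        newrows.append(row)   # first-appearance order is kept
```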
-
Here is another way to solve your problem using bvdet's method and the csv module.
[CODE=python]import csv
rows = csv.reader(open("file.csv", "rb"))
newrows = []
for row in rows:
    if row not in newrows:
        newrows.append(row)
writer = csv.writer(open("file.csv", "wb"))
writer.writerows(newrows)[/CODE]
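The "rb"/"wb" modes above are Python 2 conventions. A Python 3 sketch of the same csv-module approach (with a hypothetical demo file created first) uses text mode with newline="", and converts rows to tuples so they can be tracked in a set:

```python
import csv

# Hypothetical input file with one duplicated row.
with open("file.csv", "w", newline="") as f:
    csv.writer(f).writerows([["a", "1"], ["b", "2"], ["a", "1"]])

# Python 3 version of the csv-module approach: text mode with
# newline="" replaces the old "rb"/"wb" modes.
with open("file.csv", "r", newline="") as f:
    seen = set()
    newrows = []
    for row in csv.reader(f):
        key = tuple(row)          # lists are unhashable; tuples are not
        if key not in seen:
            seen.add(key)
            newrows.append(row)

with open("file.csv", "w", newline="") as f:
    csv.writer(f).writerows(newrows)
```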
-
Originally posted by KaezarRex: Here is another way to solve your problem using bvdet's method and the csv module.
[CODE=python]import csv
rows = csv.reader(open("file.csv", "rb"))
newrows = []
for row in rows:
    if row not in newrows:
        newrows.append(row)
writer = csv.writer(open("file.csv", "wb"))
writer.writerows(newrows)[/CODE]
With the above code, when I use set(rows) I get the error "list objects are unhashable". I thought lists were hashable (via hash()). Why am I getting this error? Please explain.
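The error happens because csv.reader yields *lists*, and lists are mutable, so Python refuses to hash them; hash() on a list raises the same TypeError. Converting each row to an immutable tuple makes it hashable. A small self-contained demonstration (using an in-memory file rather than file.csv):

```python
import csv
import io

# csv.reader yields lists; lists are mutable and therefore unhashable,
# so set(rows) raises TypeError: unhashable type: 'list'.
rows = csv.reader(io.StringIO("a,1\nb,2\na,1\n"))
try:
    unique = set(rows)
except TypeError as e:
    print(e)  # unhashable type: 'list'

# Converting each row to an (immutable, hashable) tuple works:
rows = csv.reader(io.StringIO("a,1\nb,2\na,1\n"))
unique = set(tuple(row) for row in rows)
```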
-
Originally posted by sathish119: Hey, I used this code and was able to remove the duplicate entries, thanks. Actually, this CSV file is generated by Java code; if that code is modified, the output should remain the same. To achieve this I found that the files should be sorted in some order before comparing (since the rows are selected by the Java code randomly). Could you tell me how to sort the contents, for example by priority: column 5, column 8, column 1? Is it possible to sort the newrows list before writing?
I am in Python 2.3. Define a comparison function to pass to the list sort method:[code=Python]def comp581(a, b):
    x = cmp(a[5], b[5])
    if not x:
        y = cmp(a[8], b[8])
        if not y:
            return cmp(a[1], b[1])
        return y
    return x

yourList.sort(comp581)[/code]In Python 2.4:[code=Python]yourList.sort(key=lambda i: (i[5], i[8], i[1]))[/code]
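For readers on Python 3: the cmp() built-in and the cmp= argument to sort() were removed, so the key= form shown for Python 2.4 is now the standard way to sort on several columns. A runnable sketch with hypothetical nine-column rows, sorted by columns 5, 8, then 1:

```python
# Multi-column sort in Python 3: a key function returning a tuple
# sorts by column 5 first, then column 8, then column 1.
# The rows below are hypothetical placeholders.
rows = [
    ["r1", "B", "", "", "", "2", "", "", "1"],
    ["r2", "A", "", "", "", "1", "", "", "3"],
    ["r3", "A", "", "", "", "1", "", "", "2"],
]
rows.sort(key=lambda r: (r[5], r[8], r[1]))
```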