How to parse specific numeric data from csv file using python

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • pikkukoira
    New Member
    • May 2010
    • 4

    How to parse specific numeric data from csv file using python

    Good day.

    I have series of data in cvs file like below,

    1,,,
    1,137.1,1198,1. 6
    2,159,300,0.4
    3,176,253,0.3
    4,197,231,0.3
    5,198,525,0.7
    6,199,326,0.4
    7,215,183,0.2
    8,217.1,178,0.2
    9,244.2,416,0.5
    10,245.1,316,0. 4

    I want to extract specific data from second column for example 217.1 and 245.1 and have them concatenated into a new file like,

    8,217.1,178,0.2
    10,245.1,316,0. 4

    I use cvs module to read my cvs file, but, I can't extract specific data as I desire. Could anyone kindly please help me. Thank you.
  • Glenton
    Recognized Expert Contributor
    • Nov 2008
    • 391

    #2
    Just open the cvs file, and open a second cvs file to write.

    Then loop through the first cvs file, (for line in fileobject), and check whether the data in the second column matches your criteria. If it does, write the line to the second cvs file.

    Personally, if your data is really as simple as you suggest, I wouldn't bother with the cvs file, and just use the built in open, read, write and close commands. Should only be a handful of lines. If you're struggling, post back with what you've tried, and I'll write out an example of the code. I'm rushing now, though, I'm afraid.

    Comment

    • woooee
      New Member
      • Mar 2008
      • 43

      #3
      +1 on opening it as a normal file and splitting on the comma. Also, check the length of each split record before doing anything else. You should not rely on their not being any empty or malformed records in the file.

      Comment

      • pikkukoira
        New Member
        • May 2010
        • 4

        #4
        Originally posted by Glenton
        Just open the cvs file, and open a second cvs file to write.

        Then loop through the first cvs file, (for line in fileobject), and check whether the data in the second column matches your criteria. If it does, write the line to the second cvs file.

        Personally, if your data is really as simple as you suggest, I wouldn't bother with the cvs file, and just use the built in open, read, write and close commands. Should only be a handful of lines. If you're struggling, post back with what you've tried, and I'll write out an example of the code. I'm rushing now, though, I'm afraid.
        Dear Glenton,

        Actually I'm dealing with 10K rows of data in my csv file. I will try your suggestion. Thanks for prompt replied.

        Comment

        • Glenton
          Recognized Expert Contributor
          • Nov 2008
          • 391

          #5
          10K should be trivial, ie comfortably less than 0.1s to run. But I was referring to the structure of the file, rather that the size of it anyway. Let us know how it goes, and if you can post back your solution - it will help future users!

          Comment

          • pikkukoira
            New Member
            • May 2010
            • 4

            #6
            Originally posted by Glenton
            10K should be trivial, ie comfortably less than 0.1s to run. But I was referring to the structure of the file, rather that the size of it anyway. Let us know how it goes, and if you can post back your solution - it will help future users!
            Thanks for suggestion.

            With help, I managed extract the reading I desire. Below are my codes,

            Code:
            import sys
            import re
            import csv
            
            filename = sys.argv [1]
            myfile = csv.reader(open(filename) , delimiter = ',')
            
            d_values = ['288', '305', '347', '389', '437', '483']
            
            for row in myfile:
                for val in d_values:
                    m = re.match(val +  '\.[0-9]*', row[1])
                    if m:
                       print row
            The output on the screen will be,

            ['22', '288.4', '11239', '14.7']
            ['31', '305.2', '2241', '2.9']
            ['56', '347.2', '76661', '100']
            ['86', '389.2', '48408', '63.1']
            ['118', '437.3', '1701', '2.2']
            ['158', '483.2', '11048', '14.4']
            ['192', '521.3', '8429', '11']
            ['233', '563.3', '9916', '12.9']
            ['281', '613.4', '327', '0.4']
            ['295', '627.3', '370', '0.5']
            ['337', '669.4', '1032', '1.3']
            ['362', '695.3', '4592', '6']
            ['401', '737.3', '6065', '7.9']
            ['422', '759.3', '300', '0.4']
            ['439', '779.3', '1775', '2.3']
            ['527', '869.3', '1640', '2.1']
            ['567', '911.4', '1598', '2.1']
            ['21', '288.4', '14775', '18.3']
            ['30', '305.2', '1979', '2.4']
            ['57', '347.2', '80888', '100']
            ['84', '389.2', '52990', '65.5']
            ['118', '437.3', '2052', '2.5']
            ['155', '483.2', '12031', '14.9']
            Now, I'm stuck with how to write the output into a flie. I tried convert the string list to numeric, but, I failed also.

            I'm sorry for asking many questions. Python is my first programming language, I'm still learning. Thanks for your time.
            Last edited by numberwhun; May 16 '10, 03:02 PM. Reason: Please use CODE TAGS!

            Comment

            • Glenton
              Recognized Expert Contributor
              • Nov 2008
              • 391

              #7
              Good job! Please use code tags for posting code.

              The easiest way to write to file is to do so line by line, much like you printed it! You can also simplify the matching!

              Code:
              import sys
              import re
              import csv
              
              filename = sys.argv [1]
              myfile = csv.reader(open(filename) , delimiter = ',')
              myfile2 = open(filename2,"w")
              
              d_values = ['288', '305', '347', '389', '437', '483']
              
              for row in myfile:
                  if row[1] in d_values:
                      myfile2.write(row)
              It might be better for you to create a csv writer object instead and use writerow(row), but I haven't had much experience with this. But what I gave you should work, since row is coming out of csv anyway.

              Comment

              • pikkukoira
                New Member
                • May 2010
                • 4

                #8
                Originally posted by Glenton
                Good job! Please use code tags for posting code.

                The easiest way to write to file is to do so line by line, much like you printed it! You can also simplify the matching!

                Code:
                import sys
                import re
                import csv
                
                filename = sys.argv [1]
                myfile = csv.reader(open(filename) , delimiter = ',')
                myfile2 = open(filename2,"w")
                
                d_values = ['288', '305', '347', '389', '437', '483']
                
                for row in myfile:
                    if row[1] in d_values:
                        myfile2.write(row)
                It might be better for you to create a csv writer object instead and use writerow(row), but I haven't had much experience with this. But what I gave you should work, since row is coming out of csv anyway.
                Thanks for your reminder. I'll use code tag to post code in future. Thanks.

                Comment

                • numberwhun
                  Recognized Expert Moderator Specialist
                  • May 2007
                  • 3467

                  #9
                  While I am no Python expert, have you looked at all at the csv module in the Python library? It may prove helpful in the process.

                  Regards,

                  Jeff

                  Comment

                  • Glenton
                    Recognized Expert Contributor
                    • Nov 2008
                    • 391

                    #10
                    Originally posted by numberwhun
                    While I am no Python expert, have you looked at all at the csv module in the Python library? It may prove helpful in the process.

                    Regards,

                    Jeff
                    Thanks Jeff, we have looked at the csv module.

                    Comment

                    • Glenton
                      Recognized Expert Contributor
                      • Nov 2008
                      • 391

                      #11
                      Oh, by the way, you should append the following to the end of your code:
                      Code:
                      myfile.close()
                      myfile2.close()

                      Comment

                      Working...