Python csv calculate percentage by group

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • larafaelivrin
    New Member
    • Jan 2015
    • 6

    Python csv calculate percentage by group

    I have a table (csv file) with three columns:

    Wood [m2] Polygon Area [m2]
    15 A 50
    10 A 50
    12 B 30
    10 C 30
    05 D 50
    10 D 50

    My aim is to calculate the percentage of wood for each Polygon. I want to print this result into a new csv table:

    Polygon Percentage of Wood (%)
    A 0.5 (=25/50)
    B
    C
    D

    I usually use Python through ArcGIS (arcpy module) but the modules are very slow for certain things. This is why I want to try to solve the question without this module. But I cannot figure out how to do this. Any help is greatly appreciated.
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    You would start by opening the file, reading the file, breaking up the file contents to individual parts and saving in a container object such as a list or dictionary, iterate on the container and perform your calculations, print the output or save to disk. Would not you have to do those steps in ArcGIS?

    Comment

    • larafaelivrin
      New Member
      • Jan 2015
      • 6

      #3
      no, there are arcpy tools which you can call and as I understand they simplify the steps. But the problem is that some of them take very long to run. This website shows me how to read a csv file (https://docs.python.org/2/library/csv.html) and I managed to do that but how can I group the variables? Is there a function?

      Comment

      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        Here's an example of manipulating the data after the file is read:
        Code:
        data = """Wood [m2],Polygon,Area [m2]
        15,A,50
        10,A,50
        12,B,30
        10,C,30
        05,D,50
        10,D,50"""
        
        dataLines = data.split("\n")
        for line in dataLines[1:]:
            items = line.split(",")
            print ("Polygon %s: \nPercentage: %0.0f%%" %
                   (items[1], float(items[0])/float(items[2])*100))
            print "========================"
        And the output:
        Code:
        >>> Polygon A: 
        Percentage: 30%
        ========================
        Polygon A: 
        Percentage: 20%
        ========================
        Polygon B: 
        Percentage: 40%
        ========================
        Polygon C: 
        Percentage: 33%
        ========================
        Polygon D: 
        Percentage: 10%
        ========================
        Polygon D: 
        Percentage: 20%
        ========================
        >>>

        Comment

        • larafaelivrin
          New Member
          • Jan 2015
          • 6

          #5
          ok but with this solution I get several output for Polygon A and D. I am interested in summarizing the wooden Areas for each Polygon which has the same name. For Polygon A for example this would be 15+20/50. Is the quickest way to sum up the outputs or to do this step beforehand? Thanks a lot!!

          Comment

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            I don't think you want 15+(20/50)(operator precedence). I think you want (15+20)/50.

            Here's where a dictionary comes in handy:
            Code:
            data = """Wood [m2],Polygon,Area [m2]
            15,A,50
            10,A,50
            12,B,30
            10,C,30
            05,D,50
            10,D,50"""
            
            dataLines = data.split("\n")
            dd = {}
            for line in dataLines[1:]:
                items = line.split(",")
                dd.setdefault(items[1], []).append((float(items[0]), float(items[2])))
            
            keys = sorted(dd.keys())
            for key in keys:
                print ("Polygon %s: \nPercentage: %0.0f%%" %
                       (key, sum((item[0] for item in dd[key]))/dd[key][0][1]*100))
                print "========================"

            Comment

            • larafaelivrin
              New Member
              • Jan 2015
              • 6

              #7
              I just copied your code and it works perfectly! Thank you so much!! I will try to understand what you did and maybe I can get back to you in case I do not understand something. Thanks!:)

              Comment

              • larafaelivrin
                New Member
                • Jan 2015
                • 6

                #8
                Another question (sry...): If I import my csv file I get the fallowing structure:

                ['15', 'A', '50']
                ['10', 'A', '50']
                ['12', 'B', '30']
                ['10', 'C', '30']
                ['5', 'D', '50']
                ['10', 'D', '50']

                How do you import your csv file without listing each row separately? I don“t seem to be able to figure out what I am doing wrong...

                Comment

                • larafaelivrin
                  New Member
                  • Jan 2015
                  • 6

                  #9
                  Aha, maybe I figured out how to do it:

                  data = open("Test.csv" , "r")
                  print data.read()

                  but now I get this error:
                  Traceback (most recent call last):
                  File "/home/katharina/Desktop/Test.py", line 14, in <module>
                  dataLines = data.split("\n" )
                  AttributeError: 'file' object has no attribute 'split'

                  and if I uncomment the dataLines line the fallowing error appears: Traceback (most recent call last):
                  File "/home/katharina/Desktop/Test.py", line 16, in <module>
                  for line in data[1:]:
                  TypeError: 'file' object has no attribute '__getitem__'

                  Any clue what I am doing wrong?

                  Comment

                  • bvdet
                    Recognized Expert Specialist
                    • Oct 2006
                    • 2851

                    #10
                    There are several ways of doing this. You don't have to create a file object.
                    Code:
                    data = open("Test.csv", "r").read()
                    OR
                    Code:
                    dataLines = [item.strip() for item in open("Test.csv", "r").readlines()

                    Comment

                    Working...