How to average column data in txt files?

  • Randall Benson
    New Member
    • Dec 2010
    • 11

    How to average column data in txt files?

    80409 1110 16.77 19.2 216.5 -9.97 -13.48

    My data resembles the line above. I'm trying to average all columns except the first two, which are date columns. I need to average when the value in column 1 (the 1110 value above) equals 1100, 1200, 1300, etc., and then write the averaged columns to an output file. Thanks. Code sample below.
    Code:
    >>> fileIN = open('c:\TestColData.txt','r')
    >>> line = fileIN.readlines()
    >>> i = 0
    >>> while line:
    	line = fileIN.readlines()
    	i = i + 1
    	third_col = float(line[i].split( )[2])
    	print line
    	print third_col


    Traceback (most recent call last):
      File "<pyshell#496>", line 4, in <module>
        third_col = float(line[i].split( )[2])
    IndexError: list index out of range
    Last edited by bvdet; Dec 16 '10, 12:19 AM. Reason: Add code tags
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    Create a file object as you did, but iterate on the file object instead of calling readlines(). Untested:
    Code:
    for line in fileObj:
        lineList = line.strip().split()
        if lineList[1] in ['1100', '1200', '1300']:
            list_to_average = [float(s) for s in lineList[2:]]
            average = sum(list_to_average)/len(list_to_average)
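
    The IndexError in the first post most likely comes from calling readlines() twice: the first call consumes the whole file, so the second call inside the loop returns an empty list and line[i] has nothing to index. A minimal sketch of that behaviour, assuming Python 3 and using io.StringIO with two made-up rows as a stand-in for the real file:

```python
import io

# Two made-up rows stand in for the real data file (an assumption for this demo).
fileIN = io.StringIO("80409 1100 16.77 19.2\n80409 1110 17.00 19.5\n")

first = fileIN.readlines()    # reads every remaining line
second = fileIN.readlines()   # nothing is left, so this is an empty list
print(len(first), len(second))

# Iterating over the file object yields one line at a time instead.
fileIN.seek(0)                # rewind only because the demo reuses the object
third_col = [float(line.split()[2]) for line in fileIN]
print(third_col)
```

    The second readlines() call returns an empty list, which is why indexing it fails, while the loop over the file object picks up the third field of every row.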

    • Randall Benson
      New Member
      • Dec 2010
      • 11

      #3
      I tried your solution and this is what I got:


      >>> fileObj = open('c:\TestColData.txt','r')
      >>> for line in fileObj:
              lineList = line.strip().split()
              if lineList[1] in ['1200', '1300']:
                  list_to_average = [float(s) for s in lineList[2:]]
                  average = sum(list_to_average)/len(list_to_average)
                  print average



      Traceback (most recent call last):
        File "<pyshell#506>", line 5, in <module>
          average = sum(list_to_average)/len(list_to_average)
      TypeError: 'float' object is not callable

      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        Let's test it:
        Code:
        >>> line = "80409 1100 16.77 19.2 216.5 -9.97 -13.48\n"
        >>> lineList = line.strip().split()
        >>> if lineList[1] in ['1100', '1200', '1300']:
        ... 	list_to_average = [float(s) for s in lineList[2:]]
        ... 	average = sum(list_to_average)/len(list_to_average)
        ... 	
        >>> print average
        45.804
        >>>
        Maybe you assigned a float to the name sum or len earlier in the session, which hides the built-in function.

        • Randall Benson
          New Member
          • Dec 2010
          • 11

          #5
          Hmmm... still getting an error, and I typed it in exactly as you suggested.

          Traceback (most recent call last):
            File "<pyshell#522>", line 3, in <module>
              average = sum(list_to_average)/len(list_to_average)
          TypeError: 'float' object is not callable

          Also, I need each of columns 2 through 6 averaged separately, not just one average of all the data in columns 2 through 6. Could you insert comments? Thanks!!!

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            This illustrates your error, I believe:
            Code:
            >>> len = 8.0
            >>> len([1,2,3,4,5])
            Traceback (most recent call last):
              File "<interactive input>", line 1, in ?
            TypeError: 'float' object is not callable
            >>> del len
            >>> len([1,2,3,4,5])
            5
            >>>
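
            If it is not obvious which name was overwritten, one way to check is to compare the names in the session against the real built-ins. This is only a sketch: it assumes Python 3, where the module is called builtins (Python 2 calls it __builtin__), and shadowed_builtins is a hypothetical helper name, not anything from the standard library.

```python
import builtins


def shadowed_builtins(namespace, names=("sum", "len")):
    """Return the names in `namespace` that no longer refer to the built-ins."""
    return [
        name for name in names
        if name in namespace and namespace[name] is not getattr(builtins, name)
    ]


# A plain dict stands in for globals() of a shell session where len was overwritten.
session = {"len": 8.0, "average": 45.804}
print(shadowed_builtins(session))
```

            In a live shell you would pass globals() instead of the stand-in dict, then del any name the helper reports.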

            • bvdet
              Recognized Expert Specialist
              • Oct 2006
              • 2851

              #7
              The code I posted averages the values within each row; it will not average a column of numbers. To do that, you will need to compile a list for each column (maybe a list of lists) while reading the file, then average each column list after reaching the end of the file.
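
              A rough sketch of that list-of-lists approach, with comments as requested. The function names average_columns and write_averages, the default set of times, and the two-decimal output format are all assumptions made for illustration; only the row layout comes from the sample line in the first post.

```python
def average_columns(lines, wanted_times=("1100", "1200", "1300")):
    """Return one average per data column (fields 2 onward) for matching rows."""
    columns = []  # list of lists: one list of floats per data column
    for line in lines:
        fields = line.split()
        # Skip blank or short lines, and rows whose time field is not wanted.
        if len(fields) < 3 or fields[1] not in wanted_times:
            continue
        values = [float(s) for s in fields[2:]]
        if not columns:
            # First matching row decides how many column lists are needed.
            columns = [[] for _ in values]
        # zip pairs each column list with the value from the same position.
        for column, value in zip(columns, values):
            column.append(value)
    # Average each column only after every line has been read.
    return [sum(column) / len(column) for column in columns]


def write_averages(in_path, out_path, wanted_times=("1100", "1200", "1300")):
    """Read in_path, average the matching rows, and write one line to out_path."""
    with open(in_path) as infile:
        averages = average_columns(infile, wanted_times)
    with open(out_path, "w") as outfile:
        outfile.write(" ".join("%.2f" % a for a in averages) + "\n")


# Made-up rows in the same layout as the sample line; the 1110 row is ignored.
sample = [
    "80409 1100 16.77 19.2 216.5 -9.97 -13.48\n",
    "80409 1110 10.00 10.0 100.0 -1.00 -1.00\n",
    "80409 1200 17.23 20.8 213.5 -10.03 -14.52\n",
]
averages = average_columns(sample)
print(" ".join("%.2f" % a for a in averages))
```

              average_columns accepts any iterable of lines, so an open file object can be passed directly, which is what write_averages does. If no row matches, it returns an empty list rather than raising an error.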

              • Randall Benson
                New Member
                • Dec 2010
                • 11

                #8
                Hi again. Would you be able to tell me what "s" is in your code:
                list_to_average = [float(s) for s in lineList[2:]]

                Thanks,
                Randall

                • bvdet
                  Recognized Expert Specialist
                  • Oct 2006
                  • 2851

                  #9
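                  Sure. "s" is a temporary variable used in the list comprehension. In the example, each element of lineList is in turn assigned to "s". (In Python 2, "s" keeps its last value after the comprehension finishes, as the session shows; Python 3 does not leak it.) Example: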
                  Code:
                  >>> lineList = ['1', '2', '3', '4', '5']
                  >>> [float(s) for s in lineList]
                  [1.0, 2.0, 3.0, 4.0, 5.0]
                  >>> s
                  '5'
                  >>>
                  An equivalent for loop:
                  Code:
                  >>> newList = []
                  >>> for s in lineList:
                  ... 	newList.append(float(s))
                  ... 	
                  >>> print newList
                  [1.0, 2.0, 3.0, 4.0, 5.0]
                  >>>
