How to insert data where it's missing?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Randall Benson
    New Member
    • Dec 2010
    • 11

    How to insert data where it's missing?

    Hello,

    I am trying to insert -9999 for any missing 10 minute data in a text file I am reading.

    Data format sample is as follows in the input text file:

    061201 1910 4.88 5.01
    061201 1920 4.54 4.71
    061201 1930 4.58 5.20
    061201 2010 4.65 4.91

    Required output:

    061201 1910 4.88 5.01
    061201 1920 4.54 4.71
    061201 1930 4.58 5.20
    061201 1940 -9999 -9999
    061201 1950 -9999 -9999
    061201 2000 -9999 -9999
    061201 2010 4.65 4.91

    I would appreciate any help. Thank you, RB
  • dwblas
    Recognized Expert Contributor
    • May 2008
    • 626

    #2
    Why aren't you including 1960-1990, and what happens if 1950 is the first record or the last record. Should the values always range from 1910-2010?
    Code:
    test_list = ["061201 1910 4.88 5.01", 
    "061201 1920 4.54 4.71",
    "061201 1930 4.58 5.20",
    "061201 2010 4.65 4.91" ]
    
    test_dict = {}
    first = ""
    for rec in test_list:
        substrs = rec.split()
        test_dict[int(substrs[1])] = rec
        first = substrs[0]
    
    years_list = [1910, 1920, 1930, 1940, 1950, 2000, 2010]
    for year in years_list:
        if year in test_dict:
            print test_dict[year]
        else:
            print "%s %d -9999 -9999" % (first, year)

    Comment

    • Randall Benson
      New Member
      • Dec 2010
      • 11

      #3
      Here's more detail: Col 0 is date (mmddyy), Col 1 is time (hhmm), cols 2,3 are wind speeds. Col 1 time and Col 0 date may start at any time whenever sensor starts measuring data. Once I fill in missing dates, times and values (-9999), then I'll go back and make hourly averages of the 10minute data - but I know how to do that.
      061201 1910 4.88 5.01
      061201 1920 4.54 4.71
      061201 1930 4.58 5.20
      061201 2010 4.65 4.91

      Comment

      • Rabbit
        Recognized Expert MVP
        • Jan 2007
        • 12517

        #4
        Why would you want to use -9999 for missing values if you're going to calculate averages? It's going to throw your averages way off and will also give you negative averages.

        While dwblas' code isn't exactly what you wanted, it contains the basic concept of what you're trying to accomplish. You should be able to use that as a starting point to move towards your end goal.

        Comment

        • Randall Benson
          New Member
          • Dec 2010
          • 11

          #5
          Thanks for the feedback - I'll deal with the -9999's in the average calculations loop.

          Comment

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            Here's my suggested code:
            Code:
            data = '''061201 1910  4.88  5.01
            061201 1920  4.54  4.71
            061201 1930  4.58  5.20
            061201 2010  4.65  4.91'''
            
            # parse the data string to simulate reading in the file
            dataList = [[s.strip() for s in item.split() if s.strip()] \
                        for item in data.split("\n")]
            
            # create time list of 10 minute intervals based on first and last time
            start = int(dataList[0][1])
            stop = int(dataList[-1][1])
            timeList = ["%s" % (start+i*10) for i in range((stop-start)/10+1)]
            
            # create dictionary from dataList, then add missing data
            dateStr = dataList[0][0]
            dd = {}
            for item in dataList:
                dd[item[1]] = [item[i] for i in [0,2,3]]
            for timeStr in timeList:
                dd.setdefault(timeStr, [dateStr, "-9999", "-9999"])
            
            # join data in dictionary for output
            output = "\n".join([" ".join([dd[key][0],
                                          key, dd[key][1],
                                          dd[key][2]]) for key in timeList])
            
            print output
            Output:
            Code:
            >>> 061201 1910 4.88 5.01
            061201 1920 4.54 4.71
            061201 1930 4.58 5.20
            061201 1940 -9999 -9999
            061201 1950 -9999 -9999
            061201 1960 -9999 -9999
            061201 1970 -9999 -9999
            061201 1980 -9999 -9999
            061201 1990 -9999 -9999
            061201 2000 -9999 -9999
            061201 2010 4.65 4.91
            >>>

            Comment

            • Randall Benson
              New Member
              • Dec 2010
              • 11

              #7
              Hello,
              I'm trying to follow your (dwblas) code solution above but I need some comments along the way. Could you briefly comment each line please? Thanks, RB

              Comment

              • Randall Benson
                New Member
                • Dec 2010
                • 11

                #8
                bvdet,

                I'm interested to know how you would change your solution above since column 2 is in the form of hhmm and so after 1950 the output needs to be 2000 and so on. And, after 2350, the date needs to change to 061202. How would you adjust for this? Thanks much! RB

                Comment

                • bvdet
                  Recognized Expert Specialist
                  • Oct 2006
                  • 2851

                  #9
                  In view of that requirement, the best option would be to go to the time module.

                  Convert the string "061201 1910" into a struct_time object, then convert to the number of seconds since the epoch. Add 10 minutes (600 seconds) and convert back to a string.
                  Code:
                  >>> import time
                  >>> stObj = time.strptime("061201 1910", "%y%m%d %H%M")
                  >>> stObj
                  (2006, 12, 1, 19, 10, 0, 4, 335, -1)
                  >>> seconds = time.mktime(stObj)
                  >>> seconds
                  1165021800.0
                  >>> newStr = time.strftime("%y%m%d %H%M", time.localtime(seconds+600))
                  >>> newStr
                  '061201 1920'
                  >>> stObj = time.strptime("061201 2350", "%y%m%d %H%M")
                  >>> time.strftime("%y%m%d %H%M", time.localtime(time.mktime(stObj)+600))
                  '061202 0000'
                  >>> stObj = time.strptime("061201 1950", "%y%m%d %H%M")
                  >>> time.strftime("%y%m%d %H%M", time.localtime(time.mktime(stObj)+600))
                  '061201 2000'
                  >>>

                  Comment

                  Working...