How to Remove Header info, add date, merge files

  • turnerca902
    New Member
    • Feb 2008
    • 1

    Hi Folks,

    I am working on a little project and hoping someone out there might offer me some info or insight. I have a folder containing about 50 HTML files whose contents look like this:

    Astronomical Applications Dept.
    U.S. Naval Observatory
    Washington, DC 20392-5420

    ST. PETER'S NOVA SCOTIA
    o , o ,
    W 60 52, N45 39

    Altitude and Azimuth of the Sun
    Jun 1, 2008
    Zone: 3h West of Greenwich

    Altitude Azimuth
    (E of N)

    h m o o
    03:55 -11.6 40.5
    04:00 -11.0 41.6
    04:05 -10.4 42.6
    04:10 -9.9 43.6
    04:15 -9.2 44.6
    04:20 -8.6 45.6
    04:25 -8.0 46.6
    04:30 -7.4 47.5
    04:35 -6.7 48.5
    04:40 -6.0 49.5
    04:45 -5.4 50.4
    04:50 -4.7 51.4
    04:55 -4.0 52.3
    05:00 -3.3 53.2
    05:05 -2.6 54.1
    05:10 -1.9 55.0
    05:15 -1.2 56.0
    05:20 0.1 56.9
    05:25 0.7 57.8
    05:30 1.4 58.6
    05:35 2.1 59.5
    05:40 2.8 60.4

    What I need to do is remove the header info from each file, and add a date in front of each remaining line so that it looks like this:

    2008-06-01 03:55 -11.6 40.5

    Finally, I need to merge all the info from these individual files into one master.dat file.

    If anyone can offer some advice on how I would be able to do this, that would be awesome.
    Thanks Very Much!
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    Something like this:[code=Python]import re, os

    dir_name = 'your_directory'

    def combine_files(dir_name, fn, prefix):
        # Collect every regular file in the directory
        fileList = []
        for file in os.listdir(dir_name):
            dirfile = os.path.join(dir_name, file)
            if os.path.isfile(dirfile):
                fileList.append(dirfile)
        # Data lines start with a time such as "03:55"; header lines do not
        patt = re.compile(r'\d{2}:')
        outList = []
        for f in fileList:
            print('Parsing file %s' % f)
            with open(f) as infile:
                for line in infile:
                    if patt.match(line):
                        outList.append(' '.join([prefix, line.strip()]))
        # Write all prefixed data lines to the combined output file
        with open(fn, 'w') as ff:
            ff.write('\n'.join(outList))

    combine_files(dir_name, 'combined.dat', '2008-06-01')[/code]
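
    The call above stamps every line with one fixed date. If each of the 50 files covers a different day, a possible variation is to read the date from the file's own header instead. This is only a sketch: it assumes the header always has a line containing nothing but a date in the form shown in the sample ("Jun 1, 2008"), that month abbreviations are English, and that data rows start with a time like "03:55". The names parse_lines and the patterns are made up for illustration.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed header date line, e.g. "Jun 1, 2008" (format taken from the sample file).
DATE_PATT = re.compile(r'^\s*([A-Z][a-z]{2} \d{1,2}, \d{4})\s*$')
# Data rows are assumed to start with a time such as "03:55".
ROW_PATT = re.compile(r'^\s*\d{2}:\d{2}\s')

def parse_lines(lines):
    """Return the data rows, each prefixed with the ISO date from the header."""
    date = None
    out = []
    for line in lines:
        m = DATE_PATT.match(line)
        if m and date is None:
            # %b relies on English month abbreviations (default C locale).
            date = datetime.strptime(m.group(1), '%b %d, %Y').strftime('%Y-%m-%d')
        elif ROW_PATT.match(line):
            if date is None:
                raise ValueError('data row found before any date line')
            # split() also normalises runs of spaces between columns.
            out.append(' '.join([date] + line.split()))
    return out

def combine_files(dir_name, out_name):
    """Merge the data rows of every regular file in dir_name into out_name."""
    merged = []
    for path in sorted(Path(dir_name).iterdir()):
        if path.is_file():
            merged.extend(parse_lines(path.read_text().splitlines()))
    Path(out_name).write_text('\n'.join(merged) + '\n')
```

    Usage would be something like combine_files('your_directory', 'master.dat'). Files are processed in sorted filename order, so the merged rows are chronological only if the filenames sort by date.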
