Hello,
I'm working on a project that involves parsing dates out of HTML files (text files, for all intents and purposes).
The problem is that the date format and its placement in the page aren't consistent. I managed to work around this for the first batch of files, but the second batch is in French and has even more formats.
For example:
Le 5 Janvier 2006
1er Fevrier 2009
10 Fevrier, 2009
Fevrier 10, 2009
And so on. There are accented characters, but I can replace them with their unaccented counterparts.
Efficiency doesn't matter, since the program won't be used again once the dates have been extracted from all the files.
Any suggestions?
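
To make the question concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes Python (the post doesn't name a language), assumes the HTML has already been reduced to decoded plain text, and covers only the four formats listed above. The names `MONTHS`, `PATTERNS`, `strip_accents` and `find_dates` are made up for illustration. It strips accents first, then tries a day-first and a month-first regular expression.

```python
import re
import unicodedata
from datetime import date

# French month names, lowercased and unaccented, mapped to month numbers.
MONTHS = {
    "janvier": 1, "fevrier": 2, "mars": 3, "avril": 4, "mai": 5, "juin": 6,
    "juillet": 7, "aout": 8, "septembre": 9, "octobre": 10,
    "novembre": 11, "decembre": 12,
}


def strip_accents(text):
    """Replace accented characters with their unaccented counterparts."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


_MONTH = "|".join(MONTHS)
_DAY = r"(?P<day>\d{1,2})(?:er)?"   # "5", "10", or "1er"
_YEAR = r"(?P<year>\d{4})"

PATTERNS = [
    # Day first: "Le 5 Janvier 2006", "1er Fevrier 2009", "10 Fevrier, 2009"
    re.compile(rf"\b{_DAY}\s+(?P<month>{_MONTH})\b,?\s+{_YEAR}\b", re.IGNORECASE),
    # Month first: "Fevrier 10, 2009"
    re.compile(rf"\b(?P<month>{_MONTH})\s+{_DAY}\b,?\s+{_YEAR}\b", re.IGNORECASE),
]


def find_dates(text):
    """Return every date found in text, in order of appearance."""
    clean = strip_accents(text)
    found = []
    for pattern in PATTERNS:
        for m in pattern.finditer(clean):
            try:
                d = date(int(m["year"]), MONTHS[m["month"].lower()], int(m["day"]))
            except ValueError:
                continue  # skip impossible dates such as "31 Fevrier 2009"
            found.append((m.start(), d))
    return [d for _, d in sorted(found)]


for sample in ["Le 5 Janvier 2006", "1er Février 2009",
               "10 Fevrier, 2009", "Fevrier 10, 2009"]:
    print(sample, "->", find_dates(sample))
```

Each additional format that turns up in the files would presumably need its own entry in `PATTERNS`. Since efficiency doesn't matter here, running every pattern over every file is fine.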