Hi guys. I am new to Perl and need some help with a problem.
PROBLEM: I have to parse an HTML file, get rid of all the HTML tags, and count the number of submissions each person has made across the dates found. The condition is that multiple submissions by the same person on the same date are counted as 1.
I have already gotten rid of the HTML tags using:
Code:
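#!/usr/bin/perl
use strict;
use warnings;

# Subclass HTML::Parser so that only the text between tags is printed
package HTMLStrip;
use base "HTML::Parser";

sub text {
    my ($self, $text) = @_;
    print $text;
}

package main;

my $p = HTMLStrip->new;

# parse line-by-line, rather than the whole file at once
while (<>) {
    # replace the &nbsp; and &gt; entities and the literal text "Remove" with spaces
    s/&nbsp;/ /g;
    s/&gt;/ /g;
    s/Remove/ /g;
    $p->parse($_);
}

# flush and parse remaining unparsed HTML
$p->eof;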
And now, after parsing the HTML file, my output looks like this (this is just part of the output):
Aneeka Bhalla Bhalla, Aneeka (abhalla7840)Received 01-24-2007 10:51
Andrew Johnson 1-24-07 Johnson, Andrew (aljohnson8711)Received 01-24-2007 10:51
Stephen Pennington - Jan 24, 06 Pennington, Stephen (sjpennington8423)Received 01-24-2007 10:51
Sarah Gatliff Gatliff, Sarah (sngatliff7093)Received 01-24-2007 10:51
Kyle McCracken McCracken, Kyle (krmccracken9032)Received 01-24-2007 10:51
Exercise 1 1/24/07 Monk, Megan (mjmonk7907)Received 01-24-2007 10:50
homework Ilieva, Mariya (mkilieva7030)Received 01-18-2007 15:15
Sarah Gatliff Gatliff, Sarah (sngatliff7093)Received 01-17-2007 10:48
William Shaun Greening Greening, William (wsgreening7657)Received 01-17-2007 10:48
Shearita Henderson Received 01-17-2007 10:48
pfe quotes 1-17-07 Monk, Megan (mjmonk7907)Received 01-17-2007 10:47
Sondra Denise Grissom Received 01-17-2007 10:47
Anthony Harris Harris, Anthony (adharris9208)Received 01-17-2007 10:47
Curtis Box Intro Worksheet Box, Curtis (cbox9827)Received 01-17-2007 10:47
Jason Hughes Hughes, Jason (jbhughes8891)Received 01-17-2007 10:47
charles christopherson Christopherson, Charles (cachristopherson9444)Received 01-17-2007 10:47
Darwin Moore Moore, Darwin (ddmoore7092)Received 01-17-2007 10:47
April Stephens Stephens, April (atstephens4498)Received 01-17-2007 10:47
Lyntisha Miller Miller, Lyntisha (lsmiller8647)Received 01-17-2007 10:47
Kyle McCracken McCracken, Kyle (krmccracken9032)Received 01-17-2007 10:47
Aneeka Bhalla Bhalla, Aneeka (abhalla7840)Received 01-17-2007 10:47
Format for your understanding:
<file name> <lastname>,<firstname> <userid> Received <Date and Time>
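A minimal sketch of how one line in this format could be split into its fields with a regular expression. The helper name parse_line is made up for illustration, and it assumes the name always appears as "Lastname, Firstname" directly before the parenthesised user ID. Lines without that part (such as the "Shearita Henderson" one) are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull (first name, last name, date) out of one line.
# Assumed layout: "Lastname, Firstname (userid)Received MM-DD-YYYY HH:MM".
# Returns an empty list when the line does not match that layout.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        ([A-Za-z'-]+) , \s* ([A-Za-z'-]+)   # lastname, firstname
        \s* \( [^)]* \)                     # user ID in parentheses, ignored
        \s* Received \s+
        (\d{2}-\d{2}-\d{4})                 # date only, the time is ignored
    }x;
    return ($2, $1, $3);
}

my ($first, $last, $date) =
    parse_line('Sarah Gatliff Gatliff, Sarah (sngatliff7093)Received 01-24-2007 10:51');
print "$first $last $date\n";    # prints "Sarah Gatliff 01-24-2007"
```

The file name at the start of each line is never captured, so titles such as "Exercise 1 1/24/07" should not get in the way.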
My output should be:
<firstname> <lastname> (<number of dates on which the user submitted>)
e.g.
Aneeka Bhalla (2)
Kyle McCracken (2)
....
I need help with the counting and comparing dates part.
Any help appreciated!
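For the counting part, one possible approach is a hash of hashes keyed by name and then by date, so that a second submission on the same date overwrites the first instead of adding to the count. This is only a sketch under assumptions: the helper name count_submission_dates is hypothetical, the sample lines are modelled on the output above except the last one, which is invented to show the same-date case, and lines with no "Lastname, Firstname (userid)" part are ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of "First Last" => number of distinct dates.
sub count_submission_dates {
    my @lines = @_;
    my %dates_for;    # name => { date => 1 }
    for my $line (@lines) {
        # Assumed layout: "Lastname, Firstname (userid)Received MM-DD-YYYY HH:MM"
        next unless $line =~
            /([A-Za-z'-]+),\s*([A-Za-z'-]+)\s*\([^)]*\)\s*Received\s+(\d{2}-\d{2}-\d{4})/;
        $dates_for{"$2 $1"}{$3} = 1;    # a repeated date just overwrites the same key
    }
    my %count;
    $count{$_} = keys %{ $dates_for{$_} } for keys %dates_for;
    return \%count;
}

my @sample = (
    'Aneeka Bhalla Bhalla, Aneeka (abhalla7840)Received 01-24-2007 10:51',
    'Sarah Gatliff Gatliff, Sarah (sngatliff7093)Received 01-17-2007 10:48',
    'Aneeka Bhalla Bhalla, Aneeka (abhalla7840)Received 01-17-2007 10:47',
    # Invented line: same person, same date, so it must not raise the count.
    'Aneeka Bhalla Bhalla, Aneeka (abhalla7840)Received 01-17-2007 10:40',
);

my $count = count_submission_dates(@sample);
print "$_ ($count->{$_})\n" for sort keys %$count;
# prints:
# Aneeka Bhalla (2)
# Sarah Gatliff (1)
```

Because the date string is only used as a hash key, the dates never have to be parsed or compared directly. To run this on real data, the lines could be read from a file or STDIN instead of the @sample array.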