How can I delete the contents between <SEC-HEADER> and </SEC-HEADER> in an .htm file?
Why doesn't my code work?
Thanks!
Code:
#!/usr/bin/perl
# This is a program which can process the Edgar 10-k html file into a plain text
# file without graphs and tables.
$filename="H:/Test Data/wmt2004.htm";
open IN, '<', $filename or die;
@contents = <IN>;
close IN;
@contents = grep !/<SEC-HEADER>.*</SEC-HEADER>/ @contents;
$filenameout="H:/Test Data/wmt2004-processed.htm";
open OUT, '>', $filenameout or die;
print OUT @contents;
close OUT;
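For what it's worth, three things in the grep line look like likely culprits. The `/` inside `</SEC-HEADER>` ends the pattern early, so it has to be escaped or the regex needs different delimiters. grep also needs a comma between the expression and the list. And since `@contents = <IN>` reads the file line by line, grep tests each line on its own, so a header that spans several lines can never match. Below is a minimal sketch of one way around this, assuming the tags appear literally and in upper case, and that the header spans multiple lines. The sub name `strip_sec_header` and the sample text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove every <SEC-HEADER>...</SEC-HEADER> block, tags included.
# s{...}{} uses braces as delimiters, so the "/" in the closing tag needs no escaping.
# /s lets "." match newlines; ".*?" is non-greedy, so each match stops at the
# nearest closing tag; /g removes all such blocks.
sub strip_sec_header {
    my ($text) = @_;
    $text =~ s{<SEC-HEADER>.*?</SEC-HEADER>}{}gs;
    return $text;
}

# In-memory demonstration with made-up sample text.
my $sample = "<SEC-DOCUMENT>\n<SEC-HEADER>\nheader line 1\nheader line 2\n</SEC-HEADER>\n<DOCUMENT>body</DOCUMENT>\n";
print strip_sec_header($sample);

# File version: runs only when an input and an output path are given.
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    open my $in_fh, '<', $in or die "Cannot open $in: $!";
    my $contents = do { local $/; <$in_fh> };   # slurp the whole file into one string
    close $in_fh;

    open my $out_fh, '>', $out or die "Cannot open $out: $!";
    print {$out_fh} strip_sec_header($contents);
    close $out_fh;
}
```

To run it on a real file, pass the two paths as arguments, e.g. `perl strip.pl "H:/Test Data/wmt2004.htm" "H:/Test Data/wmt2004-processed.htm"` (the script name `strip.pl` is just an example). Slurping the file into a single string is what allows the pattern to see the opening and closing tags together.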