urgent need help in parsing html tables

  • poisonedapple
    New Member
    • Jul 2008
    • 8

    urgent need help in parsing html tables

    I am trying to parse a simple table with two headings and get its rows, but I cannot work out how to give the module the link or path to the HTML file.

    The HTML file is on my desktop. I have its path, but I have no clue how to use it with HTML::TableExtract.

    Code:
    use HTML::TableExtract;
     $te = HTML::TableExtract->new( headers => [qw(Date Price Cost)] );
     $te->parse($html_string);
    
     # Examine all matching tables
     foreach $ts ($te->tables) {
       print "Table (", join(',', $ts->coords), "):\n";
    
       foreach $row ($ts->rows) {
          print join(',', @$row), "\n";
       }
     }
    Let's say I put my own headings, say heading1 and heading2, in place of Date Price Cost.
    Where should I put the path to the HTML file,
    which is something like /home/jack/desktop/sample.html?
    I tried $html_string="/home/jack/desktop/sample.html" but it does not work at all.

    What am I supposed to do? I would appreciate it if you could help me out with this.

    Thanks a lot.
    Last edited by eWish; Jul 1 '08, 11:39 PM. Reason: Please use code tags
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    If you use the better HTML::TableParser module, it can open the file for you. See the parse_file method.

    Basically:

    Code:
    $p->parse_file('c:/windows/desktop/foo.html');
    where $p is the parser object and the file path is the correct one for your computer and file. Note: you can use forward slashes in Windows file/directory paths.
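
Since HTML::TableExtract is a subclass of HTML::Parser, it inherits a parse_file method as well, so the same idea should work without switching modules. A minimal sketch, assuming HTML::TableExtract is installed; the temporary file, its Date/Price headings, and the cell values are made up for the illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TableExtract;

# Throwaway file standing in for the real HTML file (contents are made up).
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} join "\n",
    '<table>',
    '<tr><th>Date</th><th>Price</th></tr>',
    '<tr><td>2008-07-01</td><td>10</td></tr>',
    '<tr><td>2008-07-02</td><td>12</td></tr>',
    '</table>', '';
close $fh or die "Cannot close $path: $!";

my $te = HTML::TableExtract->new( headers => [qw(Date Price)] );
$te->parse_file($path);    # method inherited from HTML::Parser

# Collect and print the data rows of every table whose headers matched.
my @rows;
foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        push @rows, $row;
        print join(',', @$row), "\n";
    }
}
```

In a real script, $path would simply be the path to the file on disk, such as the /home/jack/desktop/sample.html mentioned in the question.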

    • poisonedapple
      New Member
      • Jul 2008
      • 8

      #3
      Thanks for the post, but that looks more complicated than the previous one.
      I just need to parse a table in an HTML file that is on my desktop.
      I do not want to use any kind of table ID or sizes, just the heading names.

      What would be the best way to use HTML::TableExtract?
      - I need to put the file path for the HTML somewhere
      (the problem I am facing is that throughout the examples on CPAN, $html_string is already there without being initialized; it is an incomplete program)

      - I need to put the headers

      Result: I need the table data, that's all. I am sorry, but I do not want to have to find out what ID my table has and all that.

      Please help me; this seems like a simple problem. I could not debug it because whenever I run the program I get no errors and nothing is printed. I am very irritated and more hopeless every day. I think I made a big mistake trying to use Perl for this project; the whole thing is so disorganized that I cannot find a single example that does just that.

      I would really appreciate it if someone could help me.

      Thanks in advance to everyone, and thanks for the reply.

      • KevinADC
        Recognized Expert Specialist
        • Jan 2007
        • 4092

        #4
        Here you go:

        Code:
        open (HTML, 'c:/path/to/foo.html') or die "$!";
        my $html = do {local $/; <HTML>}; # puts the entire file in a scalar variable
        close HTML;
        Now you can parse $html.
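
The slurp above uses a bareword filehandle and two-argument open, which works. A slightly more defensive variant, sketched here, uses a lexical filehandle and three-argument open and reports which file failed. The helper name slurp_file and the temporary file are made up for this illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole contents of $path as one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;                 # no record separator: one read returns the whole file
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Demonstration with a throwaway file standing in for the real HTML file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "<table><tr><td>x</td></tr></table>\n";
close $tmp_fh or die "Cannot close $tmp_path: $!";

my $html = slurp_file($tmp_path);
print length($html), " characters read\n";
```

The resulting $html string is what would then be handed to the parse method.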

        • poisonedapple
          New Member
          • Jul 2008
          • 8

          #5
          This is the program I wrote:

          Code:
          #!/usr/bin/perl
          use HTML::TableExtract;
          open (HTML, '/root/Desktop/test.html') or die "$!";
          my $html = do {local $/; <HTML>}; # puts the entire file in a scalar variable
          $te = HTML::TableExtract->new( headers => [qw(Heading Heading_2)] );
          $te->parse($HTML);
          # Examine all matching tables
          foreach $ts ($te->tables) {
            print "Table (", join(',', $ts->coords), "):\n";
            foreach $row ($ts->rows) {
              print join(',', @$row), "\n";
            }
          }

          But when I run perl program.pl it prints nothing at all; I just get the prompt back.
          Thanks for the reply. I would appreciate it if you could help solve this problem.
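
The thread never says what the eventual fix was, but one visible candidate is that the file is read into $html while $HTML is passed to parse; Perl variable names are case-sensitive, so these are two different variables, and without strict the undeclared one is silently empty. A small sketch of how "use strict" turns that kind of slip into a compile-time error (the snippet string below is a made-up reduction, not the original program):

```perl
use strict;
use warnings;

# Compile a tiny snippet that repeats the mix-up: $html is declared,
# but $HTML (different case, never declared) is the one that gets used.
my $snippet = q{
    use strict;
    my $html = '<table></table>';
    my $copy = $HTML;
    1;
};

my $compiled = eval $snippet;   # undef if the snippet fails to compile
my $error    = $@;

print $compiled ? "compiled cleanly\n" : "strict refused to compile: $error";
```

Putting "use strict;" and "use warnings;" at the top of the script, and declaring $te, $ts and $row with my, would surface this sort of problem immediately instead of producing no output.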

          • poisonedapple
            New Member
            • Jul 2008
            • 8

            #6
            OK, I think I got it; there was a minor problem. Thanks a lot for the help, I appreciate it.

            • poisonedapple
              New Member
              • Jul 2008
              • 8

              #7
              Hi,
              I got the tables extracted, and I have a huge document full of tables. Using the rows parsed by this module (HTML::TableExtract), I am trying to search for keywords (from user input) and print only the relevant data.
              I looked on CPAN but could not find out how to search the parsed tables for particular keywords.

              One way to do it would be to write the parsed tables out to a text file and parse that instead.
              That is the wrong approach for me, though: when a keyword is found in a table I also need the corresponding columns or other relevant data from that table, and parsing the text file would lose that.

              The aim and the problem: I cannot find any way to search through the resulting parsed tables and get the necessary data.

              Thanks for any reply, I appreciate it.
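
One possible approach, sketched here under assumptions: each row returned by $ts->rows is an array reference, so the rows can be searched directly in memory and a matching row still carries all of its columns. The helper name find_rows and the sample data are made up; with real data, the rows would come from looping over $te->tables and calling $ts->rows on each table.

```perl
use strict;
use warnings;

# Hypothetical helper: return every row in which some cell contains $keyword
# (case-insensitive). Rows are array references, the same shape that
# $ts->rows yields in the programs above. Undefined (empty) cells are skipped.
sub find_rows {
    my ($keyword, @rows) = @_;
    return grep {
        my $row = $_;
        grep { defined && /\Q$keyword\E/i } @$row;
    } @rows;
}

# Made-up sample data standing in for rows extracted from a table.
my @rows = (
    [ 'apple',     '1.20', 'fruit' ],
    [ 'carrot',    '0.80', 'vegetable' ],
    [ 'Apple pie', '3.50', 'dessert' ],
);

my @hits = find_rows('apple', @rows);
print join(',', @$_), "\n" for @hits;
```

Because each hit is the whole row, the neighbouring columns are available by index, for example $hits[0][1] for the second column of the first match.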
