Creating an array from an HTML table

  • axlq

    Creating an array from an HTML table

    Before I try to do this myself (I remember doing it in Java years ago
    and it was a pain)....

    Has anyone run across a function that will take a string parameter
    containing an HTML table, and return a 2-dimensional array with each
    element corresponding to the contents of a table cell?

    I see plenty of examples of doing the opposite: convert an array to
    an HTML table. I want to go the other way, from an HTML table to an
    array.

    -A
  • Rik

    #2
    Re: Creating an array from an HTML table

    On Fri, 20 Jul 2007 04:01:07 +0200, axlq <axlq@spamcop.net> wrote:
    >Before I try to do this myself (I remember doing it in Java years ago
    >and it was a pain)....
    >
    >Has anyone run across a function that will take a string parameter
    >containing an HTML table, and return a 2-dimensional array with each
    >element corresponding to the contents of a table cell?
    >
    >I see plenty of examples of doing the opposite: convert an array to
    >an HTML table. I want to go the other way, from an HTML table to an
    >array.

    Regex could be the way to go. Before I start on an elaborate pattern: any
    ideas on how you'd like to treat col-/rowspans?
    --
    Rik Wasmus
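    Rik's question matters because a cell with rowspan or colspan covers more
    than one grid position. One possible policy is to copy a spanning cell's
    text into every position it covers, so the result is always rectangular.
    This is a minimal sketch under that assumption; the function name
    expandSpans() and its input format (rows of 'text'/'rowspan'/'colspan'
    arrays, as some parser might produce) are hypothetical.

```php
<?php
// Hypothetical helper: expand rowspan/colspan by copying a spanning cell's
// text into every grid position it covers.
// Input (assumed format): $rows[r] is a list of cells, each cell being
// array('text' => string, 'rowspan' => int, 'colspan' => int); spans default to 1.
function expandSpans(array $rows)
{
    $grid = array();
    foreach ($rows as $r => $cells) {
        $c = 0;
        foreach ($cells as $cell) {
            // Skip positions already filled by a rowspan from a row above.
            while (isset($grid[$r][$c])) {
                $c++;
            }
            $rowspan = isset($cell['rowspan']) ? max(1, (int)$cell['rowspan']) : 1;
            $colspan = isset($cell['colspan']) ? max(1, (int)$cell['colspan']) : 1;
            // Copy the text into every covered position.
            for ($i = 0; $i < $rowspan; $i++) {
                for ($j = 0; $j < $colspan; $j++) {
                    $grid[$r + $i][$c + $j] = $cell['text'];
                }
            }
            $c += $colspan;
        }
    }
    // Sort keys so rows and columns come out in order 0, 1, 2, ...
    ksort($grid);
    foreach ($grid as &$row) {
        ksort($row);
    }
    unset($row);
    return $grid;
}

// 'A' spans two rows, so it also appears at the start of the second row.
$rows = array(
    array(array('text' => 'A', 'rowspan' => 2), array('text' => 'B'), array('text' => 'C')),
    array(array('text' => 'D'), array('text' => 'E')),
);
print_r(expandSpans($rows)); // rows: A, B, C and A, D, E
```

    Other policies (leaving spanned positions empty, or null) only change the
    inner loop.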


    • Michael Fesser

      #3
      Re: Creating an array from an HTML table

      ..oO(Rik)
      >On Fri, 20 Jul 2007 04:01:07 +0200, axlq <axlq@spamcop.net> wrote:
      >
      >Has anyone run across a function that will take a string parameter
      >containing an HTML table, and return a 2-dimensional array with each
      >element corresponding to the contents of a table cell?
      >[...]
      >
      >Regex could be the way to go.
      Or maybe an XML/DOM approach, if the structure is valid.

      Micha
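      A minimal sketch of the DOM approach Micha mentions, using PHP's
      built-in DOM extension (DOMDocument), assuming that extension is
      available on the host. The function name tableToArray() is
      hypothetical. It ignores rowspan/colspan, and because
      getElementsByTagName('tr') also finds rows of nested tables, it assumes
      a single, non-nested table.

```php
<?php
// Hypothetical helper: convert the rows of an HTML table into a 2-D array
// of cell texts using DOMDocument. Spans are not expanded.
function tableToArray($html)
{
    $doc = new DOMDocument();
    // loadHTML() is lenient; collect parse warnings internally instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $result = array();
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $row = array();
        foreach ($tr->childNodes as $cell) {
            // Only TD/TH element children count as cells; whitespace text nodes are skipped.
            if ($cell->nodeType === XML_ELEMENT_NODE
                && in_array(strtolower($cell->nodeName), array('td', 'th'), true)) {
                // textContent includes text inside nested inline tags such as <b>.
                $row[] = trim($cell->textContent);
            }
        }
        $result[] = $row;
    }
    return $result;
}

print_r(tableToArray('<table><tr><th>Name</th><th>Qty</th></tr>'
    . '<tr><td>Apple</td><td><b>3</b></td></tr></table>'));
```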


      • Toby A Inkster

        #4
        Re: Creating an array from an HTML table

        Rik wrote:
        >Regex could be the way to go.
        Argh! No! That way lies nightmares. Get the XML_HTMLSax3 class from PEAR
        and use that.

        Here's an example that should parse TR, TD and TH tags (ignoring others)
        including ROWSPAN and COLSPAN attributes. It creates an array of arrays
        representing rows of cells. It uses 0-based indices.

        <?php

        // Requires the XML_HTMLSax3 package from PEAR.
        require_once 'XML/HTMLSax3.php';

        class TableParser
        {
            private $currow = -1;     // current row index (0-based)
            private $curcol = -1;     // current column index (0-based)
            private $incell = false;  // true while inside a TD or TH

            private $shape = array(); // grid positions already occupied
            private $data  = array(); // cell contents, indexed [row][col]

            public function openHandler($parser, $tag, $attrs)
            {
                $tag = strtolower($tag);

                // Move to the correct cell co-ordinates.
                if ($tag == 'tr')
                {
                    $this->currow++;
                    $this->curcol = -1;
                    return;
                }
                if ($tag != 'td' && $tag != 'th')
                    return; // ignore all other tags, including tags inside cells

                $this->curcol++;
                $this->incell = true;

                // Skip positions already taken by an earlier rowspan/colspan.
                while (!empty($this->shape[$this->currow][$this->curcol]))
                    $this->curcol++;

                $rowspan = 1;
                $colspan = 1;
                foreach ($attrs as $k => $v)
                {
                    $k = strtolower($k);
                    if ($k == 'rowspan')
                        $rowspan = max(1, (int)$v);
                    elseif ($k == 'colspan')
                        $colspan = max(1, (int)$v);
                }

                // Mark every grid position this cell covers.
                for ($i = 0; $i < $rowspan; $i++)
                    for ($j = 0; $j < $colspan; $j++)
                    {
                        $x = $this->currow + $i;
                        $y = $this->curcol + $j;
                        if (!empty($this->shape[$x][$y]))
                            error_log('Overlap!');
                        $this->shape[$x][$y] = TRUE;
                    }

                $this->data[$this->currow][$this->curcol] = '';
            }

            public function closeHandler($parser, $tag)
            {
                $tag = strtolower($tag);
                if ($tag == 'td' || $tag == 'th')
                    $this->incell = false;
            }

            public function dataHandler($parser, $data)
            {
                // Keep only text inside a cell, so whitespace between tags is dropped.
                if ($this->incell)
                    $this->data[$this->currow][$this->curcol] .= $data;
            }

            public function getData()
            {
                // A cell seen before any TR would have been stored under row -1.
                unset($this->data[-1]);
                return $this->data;
            }
        }

        $sax = new XML_HTMLSax3();
        $hdlr = new TableParser();
        $sax->set_object($hdlr);
        $sax->set_element_handler('openHandler', 'closeHandler');
        $sax->set_data_handler('dataHandler');
        $sax->parse('
        <table>
        <tr>
        <td rowspan="2">Test table lalala</td>
        <td>123</td>
        <td>456</td>
        </tr>
        <tr>
        <td>789</td>
        <td>ABC</td>
        </tr>
        <tr>
        <td colspan="2" rowspan="2">123</td>
        <td>456</td>
        </tr>
        <tr>
        <td>789</td>
        </tr>
        </table>
        ');

        print_r($hdlr->getData());

        ?>


        --
        Toby A Inkster BSc (Hons) ARCS
        [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
        [OS: Linux 2.6.12-12mdksmp, up 29 days, 10:43.]

        PHP Domain Class


        • Rik

          #5
          Re: Creating an array from an HTML table

          On Fri, 20 Jul 2007 09:56:44 +0200, Toby A Inkster
          <usenet200707@tobyinkster.co.uk> wrote:
          >Rik wrote:
          >>
          >>Regex could be the way to go.
          >>
          >Argh! No! That way lies nightmares.

          Depends on how well both the regex(es) and the HTML are written.
          Although it could be a nightmare with nested tables, indeed.

          >Get the XML_HTMLSax3 class from PEAR
          >and use that.

          With a lot of overhead, but it would be the more robust solution, indeed.
          It somewhat depends on whether the OP wants a 'fits (almost) all'
          solution, or just one for a single known table.

          --
          Rik Wasmus
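          For the 'single known table' case Rik describes, a regex version can
          be short. This is a minimal sketch, assuming the table has no nested
          tables, no rowspan/colspan, explicit closing </tr> and </td> tags,
          and no '>' inside attribute values; it fails on nested tables for
          exactly the reason given above. The function name
          simpleTableToArray() is hypothetical.

```php
<?php
// Hypothetical helper for a simple, known table only (see assumptions above).
function simpleTableToArray($html)
{
    $result = array();
    // s: '.' also matches newlines; i: tag names are case-insensitive.
    // The lazy (.*?) stops at the first closing tag, which breaks on nested tables.
    preg_match_all('#<tr\b[^>]*>(.*?)</tr>#si', $html, $rows);
    foreach ($rows[1] as $rowHtml) {
        preg_match_all('#<t[dh]\b[^>]*>(.*?)</t[dh]>#si', $rowHtml, $cells);
        $row = array();
        foreach ($cells[1] as $cellHtml) {
            // Drop inner tags such as <b>, then decode entities such as &amp;.
            $row[] = trim(html_entity_decode(strip_tags($cellHtml), ENT_QUOTES));
        }
        $result[] = $row;
    }
    return $result;
}

print_r(simpleTableToArray("<table>\n<tr><td>x &amp; y</td><td><i>2</i></td></tr>\n</table>"));
```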


          • Toby A Inkster

            #6
            Re: Creating an array from an HTML table

            Toby A Inkster wrote:
            >class TableParser
            I've now published this class on my blog under the LGPL.

            This means that the class itself is Open Source -- and any improvements
            you make should be shared with the rest of us -- but you may use it
            within closed source software if desired.

            --
            Toby A Inkster BSc (Hons) ARCS
            [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
            [OS: Linux 2.6.12-12mdksmp, up 29 days, 14:30.]

            Parsing an HTML Table with PEAR's XML_HTTPSax3


            • Rik

              #7
              Re: Creating an array from an HTML table

              On Fri, 20 Jul 2007 12:53:29 +0200, Toby A Inkster
              <usenet200707@tobyinkster.co.uk> wrote:
              >Toby A Inkster wrote:
              >>
              >>class TableParser
              >>
              >I've now published this class on my blog under the LGPL.
              >
              >This means that the class itself is Open Source -- and any improvements
              >you make should be shared with the rest of us -- but you may use it
              >within closed source software if desired.
              >
              Hehe, that's a nice way of making sure we all stay informed about any
              progress :-)
              --
              Rik Wasmus


              • axlq

                #8
                Re: Creating an array from an HTML table

                Thanks everyone for the replies. I found exactly what I need at
                http://realyshine.com - a class called tableExtractor.php.class.

                It works very well.

                -A

                In article <f7p513$ngn$1@blue.rahul.net>, axlq <axlq@spamcop.net> wrote:
                >Before I try to do this myself (I remember doing it in Java years ago
                >and it was a pain)....
                >
                >Has anyone run across a function that will take a string parameter
                >containing an HTML table, and return a 2-dimensional array with each
                >element corresponding to the contents of a table cell?
                >
                >I see plenty of examples of doing the opposite: convert an array to
                >an HTML table. I want to go the other way, from an HTML table to an
                >array.
                >
                >-A


                • axlq

                  #9
                  Re: Creating an array from an HTML table

                  In article <se66n4-mvu.ln1@ophelia.g5n.co.uk>,
                  Toby A Inkster <usenet200707@tobyinkster.co.uk> wrote:
                  >>Regex could be the way to go.
                  >
                  >Argh! No! That way lies nightmares. Get the XML_HTMLSax3 class from PEAR
                  >and use that.

                  I agree Regex isn't what I want to mess with either. But PEAR is
                  unnecessary - especially if you don't run your own server and your
                  web-host provider doesn't support PEAR. The tableExtractor.class.php
                  from reallyshiny.com turned out to solve my problem, doesn't require
                  PEAR, and works quite well.

                  -A


                  • Sanders Kaufman

                    #10
                    Re: Creating an array from an HTML table

                    axlq wrote:
                    >
                    >I agree Regex isn't what I want to mess with either. But PEAR is
                    >unnecessary - especially if you don't run your own server and your
                    >web-host provider doesn't support PEAR. The tableExtractor.class.php
                    >from reallyshiny.com turned out to solve my problem, doesn't require
                    >PEAR, and works quite well.

                    Actually, you can install PEAR on many providers' servers without any
                    special permissions.
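                    One way to use a PEAR package without server-wide installation is
                    to upload its files into your own account and put that directory
                    first on PHP's include path. This is a minimal sketch, assuming a
                    hypothetical "pear" directory next to the script that holds
                    XML/HTMLSax3.php and any files it depends on; it is not
                    necessarily the method Sanders had in mind.

```php
<?php
// Hypothetical layout: PEAR package files uploaded to ./pear next to this script.
$localPear = __DIR__ . DIRECTORY_SEPARATOR . 'pear';

// Search the local directory first, then the server's default include path.
set_include_path($localPear . PATH_SEPARATOR . get_include_path());

// stream_resolve_include_path() returns the resolved path, or false if not found.
$file = stream_resolve_include_path('XML/HTMLSax3.php');
if ($file === false) {
    echo "XML/HTMLSax3.php not found; upload it under $localPear\n";
} else {
    require_once $file;
}
```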

