Analyze and read in html file

  • Radium

    Analyze and read in html file

    Hi,

    What I want is something similar to the SimpleXML extension of PHP, but for
    HTML.

    I have to analyze and read in certain tags from an HTML file in a comfortable
    manner.
    Is there a PHP extension/library which makes this possible?

    Thx

    Axel



  • lorento

    #2
    Re: Analyze and read in html file

    I think the XML extensions support parsing HTML.
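
    For instance, here is a minimal sketch, assuming PHP 5 or later with the
    DOM and SimpleXML extensions enabled. DOMDocument::loadHTML() accepts HTML
    that is not well-formed XML, and simplexml_import_dom() wraps the result
    for SimpleXML-style access. The sample markup and the variable names are
    made up for illustration.

    ```php
    <?php
    // Sample markup with an unclosed <p>: acceptable HTML, but not well-formed XML.
    $html = '<html><head><title>example page</title></head>'
          . '<body><p>first<p>second</body></html>';

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Wrap the DOM tree so it can be navigated in SimpleXML style.
    $xml = simplexml_import_dom($doc);
    $title = (string) $xml->head->title;

    // The DOM API itself can also pick out every tag of a given name.
    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $paragraphs[] = trim($p->textContent);
    }

    echo $title, "\n";
    print_r($paragraphs);
    ```

    How forgiving the parser is with real tag soup depends on the underlying
    libxml2 version, so it is worth trying on the actual pages.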




    • Radium

      #3
      Re: Analyze and read in html file

      I thought that only XHTML is XML compatible, but not HTML.
      So it would not be possible to read it via XML extension.

      Can someone please comment on that thought?

      Thx

      Axel




      "lorento" <laurente1234@yahoo.com> wrote in message
      news:1149583490.071250.31460@g10g2000cwb.googlegroups.com...
      > I think the XML extensions support parsing HTML.
      > --
      > http://www.mastervb.net
      > http://www.padbuilder.com



      • Colin McKinnon

        #4
        Re: Analyze and read in html file

        Radium wrote:

        > I thought that only XHTML is XML compatible, but not HTML.
        > So it would not be possible to read it via XML extension.
        >
        > Can someone please comment on that thought?

        These are rather sweeping descriptions, not actual language descriptions.
        The short answer is that if your code isn't XML, you need to get it fixed
        soon.

        C.


        • John Dunlop

          #5
          Re: Analyze and read in html file

          Radium:

          > What I want is something similar to the SimpleXML extension of PHP, but for
          > HTML.

          Be warned that there are two kinds of HTML: SGML-HTML, as specified
          in HTML specs, and tag-soup-HTML, as digested by browsers.

          --
          Jock


          • Malcolm Dew-Jones

            #6
            Re: Analyze and read in html file

            Radium (uh5d@rz.uni-karlsruhe.de) wrote:
            : Hi,

            : What I want is something similar to the SimpleXML extension of PHP, but for
            : HTML.

            : I have to analyze and read in certain tags from an HTML file in a comfortable
            : manner.
            : Is there a PHP extension/library which makes this possible?

            In PHP, not that I know of, though I would like to be wrong.

            If you know any Perl, then use the excellent HTML::Parser. It handles just
            about anything that a web site might throw at it. You could use the Perl
            script to build a PHP script.


            Assume text input something like

            <html><head><title>example page</title> (etc)


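            So write a Perl script with handlers something like this (an untested
            sketch; the file names are placeholders)

            use strict;
            use warnings;
            use HTML::Parser;

            open(my $tmp_php, '>', 'tmp_script.php') or die "Cannot write tmp_script.php: $!";
            print $tmp_php "<?php\n";

            # Escape \ and ' so the value is safe inside a single-quoted PHP string.
            sub php_quote { my ($s) = @_; $s =~ s/([\\'])/\\$1/g; return $s; }

            sub do_start_tag
            {
            my ($tag_name) = @_;
            print $tmp_php "handle_tag('", php_quote($tag_name), "');\n";
            }

            sub do_text
            {
            my ($raw_text) = @_;
            print $tmp_php "handle_text('", php_quote($raw_text), "');\n";
            }

            sub do_end_tag
            {
            my ($tag_name) = @_;
            print $tmp_php "handle_end_tag('", php_quote($tag_name), "');\n";
            }

            # The argspec strings tell HTML::Parser what to pass to each handler.
            HTML::Parser->new(
            api_version => 3,
            start_h => [\&do_start_tag, 'tagname'],
            text_h => [\&do_text, 'dtext'],
            end_h => [\&do_end_tag, 'tagname'],
            )->parse_file('input.html') or die "Cannot read input.html: $!";
            close($tmp_php);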


            From that you would get a temporary file with lines like

            handle_tag('html');
            handle_tag('head');
            handle_tag('title');
            handle_text('example page');
            handle_end_tag('title');
            handle_end_tag('head');


            Your main PHP script would run the Perl script, and then run the temporary
            PHP script (example shown just above), and your PHP functions like
            handle_tag etc. would be called just as if you had been able to parse the
            data directly from within PHP.
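
            A rough sketch of what that main PHP script might look like. The
            EventLog class, the temporary file handling and the parse_html.pl name
            in the comment are assumptions made for illustration. To keep the
            sketch runnable without Perl, it writes the example lines shown above
            into the temporary file itself.

            ```php
            <?php
            // Collects the parse events so the calling script can inspect them later.
            final class EventLog
            {
                public static $events = [];
            }

            function handle_tag($name)
            {
                EventLog::$events[] = "start:$name";
            }

            function handle_text($text)
            {
                EventLog::$events[] = "text:$text";
            }

            function handle_end_tag($name)
            {
                EventLog::$events[] = "end:$name";
            }

            // Stand-in for the Perl step: in real use, something like
            // shell_exec('perl parse_html.pl') (hypothetical name) would write this file.
            $generated = tempnam(sys_get_temp_dir(), 'gen');
            $lines = [
                "handle_tag('html');",
                "handle_tag('head');",
                "handle_tag('title');",
                "handle_text('example page');",
                "handle_end_tag('title');",
                "handle_end_tag('head');",
            ];
            file_put_contents($generated, "<?php\n" . implode("\n", $lines) . "\n");

            // Running the generated script calls the handlers defined above.
            include $generated;
            unlink($generated);

            print_r(EventLog::$events);
            ```

            Because the generated file is executed as PHP, the text escaping in the
            Perl step matters: only run this on output you produced yourself.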

            $0.10
