How to use PHP as a filter for HTML files? cURL?

This topic is closed.
  • Peter Valdemar Mørch

    Hi,

    In short: how do I modify selected tags/sections of an HTML file, using
    PHP as the "modifier"/filter? I would have thought this was a very
    common use of PHP...

    I have a set of existing .html files that are plain and ugly. I'd like
    to create a showdoc.php filter that adds consistent menus, css, look
    and feel, so that http://me/showdoc.php?d=story shows a nicely
    formatted http://me/story.html
    It:
    * puts in a nice standard header
    * opens story.html
    * extracts all <link> and <script> tags from the story <head>
    and adds them to the output <head>
    * extracts everything between <body> and </body>
    * rewrites all non-absolute hrefs, e.g.
    <a href="other.html"> to <a href="showdoc.php?d=other">
    * closes story.html
    * puts in a nice standard footer
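
The steps listed above could be sketched roughly as follows with PHP's DOM extension, whose `DOMDocument::loadHTML()` accepts ordinary, non-XML HTML. This is only a sketch under assumptions: the function names `rewrite_href()` and `filter_document()` are hypothetical, only `.htm`/`.html` links without a query string are rewritten, and the input is assumed to be ASCII or to declare its charset.

```php
<?php
// Hypothetical helpers sketching the showdoc.php steps with PHP's DOM extension.

/** Rewrite a relative link such as "other.html" to "showdoc.php?d=other". */
function rewrite_href(string $href): string
{
    $parts = parse_url($href);
    // Leave malformed links, absolute links and links with a query string alone.
    if ($parts === false || isset($parts['scheme']) || isset($parts['host'])
        || isset($parts['query'])) {
        return $href;
    }
    $path = $parts['path'] ?? '';
    // Skip pure fragments, root-relative paths and non-HTML targets.
    if ($path === '' || $path[0] === '/' || !preg_match('/^(.+)\.html?$/i', $path, $m)) {
        return $href;
    }
    $new = 'showdoc.php?d=' . rawurlencode($m[1]);
    return isset($parts['fragment']) ? $new . '#' . $parts['fragment'] : $new;
}

/** Return the <link>/<script> tags of the head and the inner HTML of the body. */
function filter_document(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect warnings about sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy <link> and <script> elements from the head, in document order.
    $head = '';
    $headNode = $doc->getElementsByTagName('head')->item(0);
    if ($headNode !== null) {
        foreach ($headNode->childNodes as $child) {
            if ($child instanceof DOMElement
                && in_array(strtolower($child->tagName), ['link', 'script'], true)) {
                $head .= $doc->saveHTML($child) . "\n";
            }
        }
    }

    // Rewrite the href of every <a> element in place.
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $a->setAttribute('href', rewrite_href($a->getAttribute('href')));
        }
    }

    // Serialize everything between <body> and </body>.
    $body = '';
    $bodyNode = $doc->getElementsByTagName('body')->item(0);
    if ($bodyNode !== null) {
        foreach ($bodyNode->childNodes as $child) {
            $body .= $doc->saveHTML($child);
        }
    }

    return ['head' => $head, 'body' => $body];
}
```

In a showdoc.php built on this, the `d` parameter would need validating before it is turned into a file name (for instance with `basename()`), and the standard header and footer would be printed around `$result['head']` and `$result['body']`.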

    I realize I can do this by editing all the .html files instead, but
    can't I just use php as a filter? Am I the first person to want to do
    this?

    How?

    * I _really_ want to avoid using regexps to match e.g. body and hrefs,
    because there are so many caveats involved. Multiline tags,
    attributes, for starters. Or how about <nasty attr="</body>"></nasty>
    (not sure that really is legal, though...)

    * xml_parse() parses XML, and HTML is not XML (e.g. valid HTML may
    omit end tags), so xml_parse() is out. Or is it?

    * Since I want to preserve all of the <body> except the rewritten
    hrefs, I'd like any parser involved to produce output that is easy
    to re-flatten into HTML when generating output.
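
One way to probe these three concerns without regexps is to feed a deliberately awkward fragment to `DOMDocument::loadHTML()` and serialize it again with `saveHTML()`. The sketch below assumes the DOM extension is available; `parse_lenient()` is a hypothetical helper name, and the exact serialized markup depends on the libxml2 version, so no output is shown.

```php
<?php
// Hypothetical helper: parse tag-soup HTML without emitting PHP warnings.
function parse_lenient(string $html): DOMDocument
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // buffer parser complaints
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Missing </p> end tags, an unknown element, and "</body>" inside an attribute.
$doc = parse_lenient(
    '<html><body><p>one<p>two<nasty attr="</body>">inside</nasty><p>tail</body></html>'
);

// The attribute value is read as a quoted string, not as a closing tag.
$nasty = $doc->getElementsByTagName('nasty')->item(0);
echo $nasty->getAttribute('attr'), "\n";

// Re-flatten the parsed tree back into HTML.
$body = $doc->getElementsByTagName('body')->item(0);
echo $doc->saveHTML($body), "\n";
```

The serialized result is normalized rather than byte-identical to the input (omitted end tags are typically written out), which may or may not matter for the "preserve the body" requirement.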

    There are examples out there using cURL, but they are often so simple
    that they print nothing of their own, only the output of curl_exec().
    In any useful application, wouldn't everyone have to extract selected
    info from the retrieved web page? What do cURL users do? Regexps
    only?
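
For what it is worth, `curl_exec()` only prints the page when `CURLOPT_RETURNTRANSFER` is left unset; with that option enabled it returns the body as a string, which can then be handed to a parser. A minimal sketch, assuming the cURL and DOM extensions are loaded (`fetch_html()` and `extract_links()` are hypothetical names):

```php
<?php
/** Fetch a page and return its body as a string instead of printing it. */
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body from curl_exec()
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body; // the handle is released when $ch goes out of scope
}

/** Collect the href value of every <a href="..."> in an HTML string. */
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}
```

A call such as `extract_links(fetch_html('http://me/story.html'))` would then yield the list of link targets without any regexp matching against the markup.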

    Peter