I'm returning to Perl and Linux after many years away, and while I
knew my way around Perl and Unix way back when, I'm new to this
world today.
I'm considering using LWP as the heart of a Web application and have a
number of questions.
It appears to me that the get method returns ONLY the content of the
single object referenced by the URL. Is this correct? To what degree,
if any, does LWP's get deal with scripts on the page that may be
involved in building the page content?
In the end, I need to get a page in much the same way a browser does
and then examine it, looking for a bunch of stuff in the text on the
page (as it would be rendered by IE or Mozilla). I also need to
examine the underlying HTML of the page as actually displayed for
another bunch of stuff. On XP (no flames please; surely Perl
programmers can forgive an attachment to the ugly real world) the IE
object model has two properties, innerText and innerHTML. The
innerText property is a linearized version of the text as displayed
on the page AFTER all scripts have executed. The innerHTML property
seems to be the HTML that would produce the page AFTER all scripts
have executed. It is this kind of structure that I need. Can LWP help
me here? What is the basic attack? Are there any examples in the Perl
world?
Thanks for any help/clues.
R