LWP questions

**Roel van der Steen** · Jul 19 '05, 05:00 AM

Re: LWP questions

On Tue, 16 Mar 2004 at 18:01 GMT, Richard Bell <rbell01824@earthlink.net> wrote:
> I'm considering using LWP as the heart of a Web application and have a
> number of questions.

LWP does not render the page, nor does it execute (client-side)
scripts, nor does it provide you with a DOM. However, you can
get the HTML using LWP and parse that with any of the available
HTML parsers (e.g., HTML-TreeBuilder).

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use LWP::Simple;

my $cachefile = 'mirrored.htm';

# Fetch the page into a local file (re-downloaded only if it has changed).
mirror('http://cpan.org', $cachefile);

my $tree = HTML::TreeBuilder->new_from_file($cachefile);

# Print the text of the first <table> element, if there is one.
my $h1 = $tree->look_down('_tag', 'table');
print $h1->as_text if $h1;

**Richard Bell** · Jul 19 '05, 05:00 AM

Re: LWP questions

Thanks Roel, that was very helpful.

For my application, I need something that will do all such things as
might happen in a real browser that would create user visible content
on the screen. For many of the pages I'll be working with that
includes various client side scripts and includes. While LWP gets
part of the way, it doesn't seem to go as far as this project needs.

As I mentioned, I'm newly returned to Unix/Linux and Perl. Is there
something that might be more appropriate? I've some previous
experience in IE COM automation under XP. Can I play the same sort of
game (or hopefully a simpler one) under Linux? What do I use for an
engine? Can I get by with wget (it seems to do a good job of
mirroring)? Will I need to work with Mozilla?

I'd appreciate any advice.

Thanks again.

R

On 17 Mar 2004 00:55:24 GMT, Roel van der Steen <roel-perl@st2x.net>
wrote:
>On Tue, 16 Mar 2004 at 18:01 GMT, Richard Bell <rbell01824@earthlink.net> wrote:
>> I'm considering using LWP as the heart of a Web application and have a
>> number of questions.
>
>LWP does not render the page, nor does it execute (client-side)
>scripts, nor does it provide you with a DOM. However, you can
>get the HTML using LWP and parse that with any of the available
>HTML parsers (e.g., HTML-TreeBuilder).
>
>#!/usr/bin/perl
>use strict;
>use warnings;
>use HTML::TreeBuilder;
>use LWP::Simple;
>
>my $cachefile = 'mirrored.htm';
>
>mirror('http://cpan.org', $cachefile);
>
>my $tree = HTML::TreeBuilder->new_from_file($cachefile);
>
>my $h1 = $tree->look_down('_tag', 'table');
>print $h1->as_text if $h1;

**Roel van der Steen** · Jul 19 '05, 05:00 AM

Re: LWP questions

(Top-posting reordered.)

On Wed, 17 Mar 2004 at 01:50 GMT, Richard Bell <rbell01824@earthlink.net> wrote:
> On 17 Mar 2004 00:55:24 GMT, Roel van der Steen <roel-perl@st2x.net>
> wrote:
>
>>On Tue, 16 Mar 2004 at 18:01 GMT, Richard Bell <rbell01824@earthlink.net> wrote:
>>> I'm considering using LWP as the heart of a Web application and have a
>>> number of questions.
>>
>>LWP does not render the page, nor does it execute (client-side)
>>scripts, nor does it provide you with a DOM.
>
> For many of the pages I'll be working with that
> includes various client side scripts and includes.
>
Maybe HTML::Display is more in the direction you want. Or WWW::Mechanize.
Did you already have a look at http://cpan.org ?
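
A minimal sketch of the WWW::Mechanize route, assuming the module is installed from CPAN. The helper name `page_links` and the script name in the usage comment are illustrative, not anything from the thread. Note that WWW::Mechanize, like LWP, does not execute JavaScript; it only automates fetching pages, following links, and submitting forms.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url and return the absolute URL of every link on it.
sub page_links {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 1 );   # die on HTTP errors
    $mech->get($url);
    return map { $_->url_abs->as_string } $mech->links;
}

# Usage (hypothetical script name): perl links.pl http://cpan.org
if (@ARGV) {
    print "$_\n" for page_links( $ARGV[0] );
}
```

If the links or form fields you need are written into the page by a client-side script, they will not show up here, because only the HTML as sent by the server is parsed.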

**Richard Bell** · Jul 19 '05, 05:00 AM

Re: LWP questions

On 17 Mar 2004 03:12:38 GMT, Roel van der Steen <roel-perl@st2x.net>
wrote:
>(Top-posting reordered.)
>
>On Wed, 17 Mar 2004 at 01:50 GMT, Richard Bell <rbell01824@earthlink.net> wrote:
>> On 17 Mar 2004 00:55:24 GMT, Roel van der Steen <roel-perl@st2x.net>
>> wrote:
>>
>>>On Tue, 16 Mar 2004 at 18:01 GMT, Richard Bell <rbell01824@earthlink.net> wrote:
>>>> I'm considering using LWP as the heart of a Web application and have a
>>>> number of questions.
>>>
>>>LWP does not render the page, nor does it execute (client-side)
>>>scripts, nor does it provide you with a DOM.
>>
>> For many of the pages I'll be working with that
>> includes various client side scripts and includes.
>>
>Maybe HTML::Display is more in the direction you want. Or WWW::Mechanize.

Thanks, I'll look into HTML::Display and WWW::Mechanize. I picked up
the O'Reilly books and am also checking the web on these packages, but
the learning curve right now is a bit stiff particularly when I'm not
really sure where to look or what to look at. Thanks for your help.
>Did you already have a look at http://cpan.org ?

I have checked CPAN. Lots of apparently good stuff there, but again
I'm faced with not knowing what is really appropriate for my needs.

I've thought about trying to automate Mozilla and accessing its DOM
object to get at what I want. Do you have any reflections on that
attack?

Thanks again for the new clues.

R

**Joe Smith** · Jul 19 '05, 05:00 AM

Re: LWP questions

Richard Bell wrote:
> Thanks Roel, that was very helpful.
>
> For my application, I need something that will do all such things as
> might happen in a real browser that would create user visible content
> on the screen. For many of the pages I'll be working with that
> includes various client side scripts and includes. While LWP gets
> part of the way, it doesn't seem to go as far as this project needs.

When LWP requests a page from a server, it is no different than any
other browser's request, in that the server will process server-side
includes.

If the HTML returned contains JavaScript, it is up to you to provide
a JavaScript interpreter. I've seen many JavaScript functions that
do things like ask the graphical browser they are running in for the
size (in pixels) of the currently active window so that they can
decide on the layout of the text they will be writing to the
document window. Other JavaScript uses include reading or
modifying the text being displayed in a field of a form. (Think of
<input type="text" name="clock" value="12:45:00 pm">.)

In other words, to handle a full range of client-side scripts,
you will have to re-invent a very large wheel: a complete browser
with graphical display and GUI widgets.

LWP is good at getting the raw HTML from the server. Postprocessing
the HTML on the client side before, during, and after rendering is
an entirely different kettle of fish.
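
To make the point concrete, here is a small sketch using HTML::TreeBuilder on an in-memory page. The page itself is made up for illustration (a clock field like the one above plus a script that would update it in a browser), and this is only a demonstration under those assumptions, not a recommended architecture.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical page: a clock field that a script would update in a real browser.
my $html = <<'HTML';
<html><body>
<form><input type="text" name="clock" value="12:45:00 pm"></form>
<script type="text/javascript">
document.forms[0].clock.value = new Date().toLocaleTimeString();
</script>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# The parser sees only the static markup; the script is never run,
# so the field still holds the value the server sent.
my $clock = $tree->look_down( _tag => 'input', name => 'clock' );
my $value = $clock ? $clock->attr('value') : undef;
print "clock value: $value\n" if defined $value;

# Script elements are in the tree, but only as inert source text.
my $script_count = () = $tree->look_down( _tag => 'script' );
print "script elements found: $script_count\n";

$tree->delete;
```

Whatever the script would have written into the form in a browser is simply absent from the parsed tree, which is why a scripted page needs a real browser engine behind it.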

I certainly would not want to emulate the quirks (features, bugs) of
IE 6 vs IE 5 vs Netscape vs Mozilla vs Opera.
-Joe

**Richard Bell** · Jul 19 '05, 05:00 AM

Re: LWP questions

No one ever said it would be easy.

I'm now looking into automating Mozilla (let it do the heavy lifting),
possibly from Perl, possibly using the Mozilla application
environment. Any ideas where I can get clues/examples/insight into
the issues from the Perl side? I've got the O'Reilly book for the app
environment so I'm reasonably armed there.

Richard

On Sat, 20 Mar 2004 22:33:08 GMT, Joe Smith <Joe.Smith@inwap.com>
wrote:

>Richard Bell wrote:
>
>> Thanks Roel, that was very helpful.
>>
>> For my application, I need something that will do all such things as
>> might happen in a real browser that would create user visible content
>> on the screen. For many of the pages I'll be working with that
>> includes various client side scripts and includes. While LWP gets
>> part of the way, it doesn't seem to go as far as this project needs.
>
>When LWP requests a page from a server, it is no different than any
>other browser's request, in that the server will process server-side
>includes.
>
>If the HTML returned contains JavaScript, it is up to you to provide
>a JavaScript interpreter. I've seen many JavaScript functions that
>do things like ask the graphical browser they are running in for the
>size (in pixels) of the currently active window so that they can
>decide on the layout of the text they will be writing to the
>document window. Other JavaScript uses include reading or
>modifying the text being displayed in a field of a form. (Think of
><input type="text" name="clock" value="12:45:00 pm">.)
>
>In other words, to handle a full range of client-side scripts,
>you will have to re-invent a very large wheel: a complete browser
>with graphical display and GUI widgets.
>
>LWP is good at getting the raw HTML from the server. Postprocessing
>the HTML on the client side before, during, and after rendering is
>an entirely different kettle of fish.
>
>I certainly would not want to emulate the quirks (features, bugs) of
>IE 6 vs IE 5 vs Netscape vs Mozilla vs Opera.
> -Joe
