Saving a web page

  • alexey_r@mail.ru

    Saving a web page

    Using HttpWebRequest and HttpWebResponse to retrieve a webpage seems
    clear enough.

    But unless I am missing something, this will only give me the HTML
    source of the webpage requested, and not all the images, stylesheets and
    so on. Is there a simple way to get the entire webpage?

    The alternatives I see now:
    1. Get a WebBrowser in the background to do it, but this seems very nasty.
       There _has_ to be a better way. Besides, how would I select the correct
       file type and enter the file name in the background?
    2. Interop with mshtml.dll. See above.
    3. After getting the HTML file, I could iterate through the images, etc.
       and request each of them separately.

    Thank you in advance!

  • Andy

    #2
    Re: Saving a web page

    You'll have to get the img tags and download them manually; basically,
    you write the code that a browser would normally run for you.

    So, parse the <img> tags (and <a> tags, if you like), then use
    HttpWebRequest to get the images.

    HTH
    Andy



    • Tom Spink

      #3
      Re: Saving a web page

      Hi,

      Unfortunately, there isn't a simple way. The way web browsers (usually)
      work is that they start rendering the page and download the
      images/stylesheets/whatnot as they need them. They parse the HTML,
      find an <img> tag or a <link> tag, and decide to download the file
      that the tag is referencing.

      You'll need to do the same; i.e. analyse the HTML you've received and
      decide what needs to be downloaded by looking at the tags.
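
      The download step for each file you decide you need could look roughly
      like this (untested sketch; the URL is a placeholder and would really
      come from the tags you've parsed):

      using System;
      using System.IO;
      using System.Net;

      class ResourceFetcher
      {
          // Download one referenced file (an image, a stylesheet, ...)
          // and write the raw bytes to disk.
          static void Save(Uri uri, string path)
          {
              HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
              using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
              using (Stream input = response.GetResponseStream())
              using (FileStream output = File.Create(path))
              {
                  byte[] buffer = new byte[8192];
                  int read;
                  while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                      output.Write(buffer, 0, read);
              }
          }

          static void Main()
          {
              // Placeholder address; in practice this comes from the parsed tags.
              Save(new Uri("http://example.com/style.css"), "style.css");
          }
      }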

      --
      Hope this helps,
      Tom Spink


      • Michael Nemtsev

        #4
        Re: Saving a web page

        Hello alexey_r@mail.ru,

        I'd save the page as MHT (web archive) and then parse it to get the images.
        BTW, the images are encoded inside the MHT.

        PS: This lib could be used for parsing: http://www.codeproject.com/csharp/mime_project.asp
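
        To create the MHT in the first place, one route is the CDO COM library
        (a rough, untested sketch; it assumes COM references to "Microsoft CDO
        for Windows 2000 Library" and "Microsoft ActiveX Data Objects", and the
        URL and file path are placeholders):

        class MhtSaver
        {
            static void Main()
            {
                // CDO fetches the page plus the files it references and packs
                // everything into one MIME (MHT) document.
                CDO.Message msg = new CDO.Message();
                msg.CreateMHTMLBody("http://example.com/page.html",
                                    CDO.CdoMHTMLFlags.cdoSuppressNone, "", "");

                // The finished message is exposed as an ADODB stream; save it to disk.
                ADODB.Stream stream = msg.GetStream();
                stream.SaveToFile(@"C:\page.mht",
                                  ADODB.SaveOptionsEnum.adSaveCreateOverWrite);
            }
        }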
        ---
        WBR,
        Michael Nemtsev :: blog: http://spaces.msn.com/laflour

        "At times one remains faithful to a cause only because its opponents do not
        cease to be insipid." (c) Friedrich Nietzsche



        • alexey_r@mail.ru

          #5
          Re: Saving a web page


          Michael Nemtsev wrote:
          > I'd save the page as MHT (web archive) and then parse it to get the images.
          > BTW, the images are encoded inside the MHT.

          Ah, thank you. But how do I save it as MHT?


          • alexey_r@mail.ru

            #6
            Re: Saving a web page


            Tom Spink wrote:
            > Unfortunately, there isn't a simple way. [...] You'll need to analyse
            > the HTML you've received and decide what needs to be downloaded by
            > looking at the tags.

            Thank you.


            • Michael Nemtsev

              #7
              Re: Saving a web page

              Hello alexey_r@mail.ru,

              alexey_r@mail.ru wrote:
              > Michael Nemtsev wrote:
              > > I'd save the page as MHT (web archive) and then parse it to get the images.
              > > BTW, the images are encoded inside the MHT.
              >
              > Ah, thank you. But how do I save it as MHT?
              ---
              WBR,
              Michael Nemtsev :: blog: http://spaces.msn.com/laflour

              "At times one remains faithful to a cause only because its opponents do not
              cease to be insipid." (c) Friedrich Nietzsche



              • alexey_r@mail.ru

                #8
                Re: Saving a web page


                Thank you again! Looks like it won't work for websites protected by
                a password, so I am back to plan A.


                • Michael Nemtsev

                  #9
                  Re: Saving a web page

                  Hello alexey_r@mail.ru,

                  What do you mean by "websites protected by password"?
                  Any example?
                  Have you tried to save those sites to MHT via IE?
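
                  If it's plain HTTP authentication, plan A can still work by
                  attaching credentials to the request, roughly like this
                  (untested sketch; the URL, user name and password are
                  placeholders):

                  using System;
                  using System.IO;
                  using System.Net;

                  class AuthenticatedFetch
                  {
                      static void Main()
                      {
                          HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
                              "http://example.com/protected/page.html");

                          // Works for HTTP Basic/Digest/NTLM authentication.
                          // A forms-based login would instead need a CookieContainer
                          // and a POST to the site's login page.
                          request.Credentials = new NetworkCredential("user", "password");

                          using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                          using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                          {
                              string html = reader.ReadToEnd();
                              Console.WriteLine(html.Length);
                          }
                      }
                  }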
                  ---
                  WBR,
                  Michael Nemtsev :: blog: http://spaces.msn.com/laflour

                  "At times one remains faithful to a cause only because its opponents do not
                  cease to be insipid." (c) Friedrich Nietzsche

