Get all document contents

**Martin Honnen** · Jul 23 '05, 03:41 PM

Re: Get all document contents

Christopher Benson-Manica wrote:
> Is there a way to get the entire contents of the current document as a
> string?

Some browsers, like IE or Opera, allow you to serialize an element, so you
can use
document.documentElement.outerHTML
to get the serialized markup of the <HTML> element.
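
As a minimal sketch of that idea, assuming a browser that exposes outerHTML on the root element: the helper name `getDocumentMarkup` is hypothetical, and it takes the document as a parameter so that it can return null where the property is missing.

```javascript
// Hypothetical helper: returns the browser's own serialization of the root
// element, or null when outerHTML is not supported.
function getDocumentMarkup(doc) {
  var root = doc && doc.documentElement;
  if (root && typeof root.outerHTML === "string") {
    return root.outerHTML;
  }
  return null;
}

// In a browser: var markup = getDocumentMarkup(document);
```

The result is whatever the browser chooses to serialize, not the original source text.
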
> I want to send the document contents to a markup validation
> service.

Send them the URL then, that way they can fetch the contents. outerHTML
will hardly do for validation as browsers apply their own serialization
and that way while your source might be XHTML with lower case tag names
the outerHTML might contain tags in upper case letters.

Of course there are also browser-dependent methods to get the source of
the page, see

http://jibbering.com/faq/#FAQ4_38

but XMLHttpRequest's responseText is known, for instance, not to handle
ISO-8859-x encodings properly.
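
A rough sketch of the XMLHttpRequest route, not necessarily what the FAQ entry shows: the helper name `fetchPageSource` and its injectable `XhrCtor` parameter are assumptions made for illustration, so that the function can be tried outside a browser.

```javascript
// Hypothetical helper: requests `url` again and hands the raw response text
// to `callback(error, source)`. `XhrCtor` is injectable; in a page you would
// pass the browser's XMLHttpRequest constructor.
function fetchPageSource(url, callback, XhrCtor) {
  var xhr = new XhrCtor();
  xhr.open("GET", url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 = request complete
    if (xhr.status === 200) {
      callback(null, xhr.responseText);
    } else {
      callback(new Error("HTTP status " + xhr.status), null);
    }
  };
  xhr.send(null);
}

// In a browser:
// fetchPageSource(location.href, function (err, src) { /* use src */ }, XMLHttpRequest);
```

Note that this issues a second request, so for dynamically generated pages the text received may differ from the page currently displayed.
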

Martin Honnen

http://JavaScript.FAQTs.com/

**Christopher Benson-Manica** · Jul 23 '05, 03:41 PM

Re: Get all document contents

Martin Honnen <mahotrash@yahoo.de> spoke thus:
> Send them the URL then, that way they can fetch the contents.

Obviously that would be the easy solution, but the pages I'd like to
do this with aren't accessible to the validator (users must be logged
in to view these pages).
> outerHTML
> will hardly do for validation as browsers apply their own serialization
> and that way while your source might be XHTML with lower case tag names
> the outerHTML might contain tags in upper case letters.

Hm, I see the problem. For the purposes of validation, though, it
should be possible to clean up the string without too much trouble
(convert all characters to lowercase to take care of the tags)
although it seems that attributes lose their enclosing double quotes
as well, which is unfortunate.
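
Lowercasing every character would also change text content and attribute values, so a narrower cleanup might touch only the tag names. A rough sketch, with a hypothetical helper `lowercaseTagNames`; the regular expression is an assumption and will also rewrite anything that merely looks like a tag inside comments, scripts or attribute values.

```javascript
// Hypothetical helper: lowercases element names in start and end tags only.
// It leaves attribute names, attribute values and text content untouched.
function lowercaseTagNames(markup) {
  return markup.replace(/<(\/?)([A-Za-z][A-Za-z0-9]*)/g, function (match, slash, name) {
    return "<" + slash + name.toLowerCase();
  });
}

// lowercaseTagNames('<DIV CLASS=x><P>Hello World</P></DIV>')
// returns '<div CLASS=x><p>Hello World</p></div>'
```

Restoring the missing attribute quotes is a harder problem and is not attempted here.
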
> Of course there are also browser-dependent methods to get the source of
> the page, see
> http://jibbering.com/faq/#FAQ4_38
> but XMLHttpRequest's responseText is known for instance to not handle
> ISO-8859-x encodings properly.

In what way does it fail to handle such encodings? I'll look into
something like this and see if I can make it work. Thanks.

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

**Matt Kruse** · Jul 23 '05, 03:41 PM

Re: Get all document contents

Christopher Benson-Manica wrote:
> Hm, I see the problem. For the purposes of validation, though, it
> should be possible to clean up the string without too much trouble
> (convert all characters to lowercase to take care of the tags)
> although it seems that attributes lose their enclosing double quotes
> as well, which is unfortunate.

In addition to that, browsers will add tags and content that are not present
in the source.
For example, they add <tbody> tags to tables even if there are none in your source.

Examining the browser's internal representation of your source is inadequate
for validation.

--
Matt Kruse

http://www.JavascriptToolbox.com

**RobG** · Jul 23 '05, 03:42 PM

Re: Get all document contents

Christopher Benson-Manica wrote:
> Martin Honnen <mahotrash@yahoo.de> spoke thus:
>
>> Send them the URL then, that way they can fetch the contents.
>
> Obviously that would be the easy solution, but the pages I'd like to
> do this with aren't accessible to the validator (users must be logged
> in to view these pages).

You can install the W3C validator locally.

Allowing a browser to parse the HTML first and then send it to the
validator will effectively invalidate your validation. AFAIK (but I
may well be wrong), you can't get the doctype declaration which is
fundamental to validating the page.

--
Rob

**Richard Cornford** · Jul 23 '05, 03:42 PM

Re: Get all document contents

RobG wrote:
<snip>
> ... . AFAIK (but I may well be wrong),
> you can't get the doctype declaration which is
> fundamental to validating the page.

On Mozilla and Opera (recent versions):-

document.doctype (object)
document.doctype.publicId (string)
document.doctype.systemId (string)

- could be used to reproduce it.
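
A minimal sketch of such a reconstruction: the helper name `buildDoctype` is hypothetical, and it additionally assumes the doctype object exposes a `name` property alongside publicId and systemId.

```javascript
// Hypothetical helper: rebuilds a doctype declaration from a doctype-like
// object (name, publicId, systemId). Returns "" when there is no doctype.
function buildDoctype(doctype) {
  if (!doctype) return "";
  var out = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
    if (doctype.systemId) out += ' "' + doctype.systemId + '"';
  } else if (doctype.systemId) {
    out += ' SYSTEM "' + doctype.systemId + '"';
  }
  return out + ">";
}

// In a browser: var declaration = buildDoctype(document.doctype);
```
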

The other issues raised about the likely validity of a serialised DOM
make doing so pointless in this context, but where a serialised DOM has
other uses it can be employed to make the results more complete (along
with maybe iterating the attributes collection of the documentElement in
order to supplement innerHTML with accurate HTML tags).
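
Iterating the attributes collection might look roughly like this. The helper name `buildOpenTag` is hypothetical, and the sketch assumes each entry in the collection has `name` and `value` properties.

```javascript
// Hypothetical helper: builds an opening tag from an element-like object with
// a tagName and an array-like attributes collection of {name, value} entries.
function buildOpenTag(element) {
  var tag = "<" + element.tagName.toLowerCase();
  var attrs = element.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    // Escape characters that would break a double-quoted attribute value.
    var value = String(attrs[i].value)
      .replace(/&/g, "&amp;")
      .replace(/"/g, "&quot;");
    tag += " " + attrs[i].name + '="' + value + '"';
  }
  return tag + ">";
}

// In a browser, one possible combination:
// var root = document.documentElement;
// var markup = buildOpenTag(root) + root.innerHTML + "</html>";
```
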

Richard.
