Alternative to documentElement.innerHTML?

  • Kyle

    Alternative to documentElement.innerHTML?

    I am presently making use of documentElement.innerHTML to retrieve
    page contents for manipulation, but I've noticed that the string value
    returned is not identical to the actual page source. Specifically,
    attribute assignments that look like:

    height=100 width=100

    in the real source, look like:

    height="100" width="100"

    in the returned value from documentElement.innerHTML.

    Further complicating things, forms that begin inside a table in this
    manner:

    <table><form ...><tr><td...> <input...></form></td>...

    Are returned as:

    <table><form ...></form><tr><td... ><input...

    If I modify the returned value from documentElement.innerHTML, then
    write it back to documentElement.innerHTML, many of the forms are
    non-functional.

    I am interested in any available alternatives that will function in
    recent Mozilla releases. Thank you,

    -Kyle
  • Randy Webb

    #2
    Re: Alternative to documentElement.innerHTML?

    Kyle wrote:
    > I am presently making use of documentElement.innerHTML to retrieve
    > page contents for manipulation, but I've noticed that the string value
    > returned is not identical to the actual page source. Specifically,
    > attribute assignments that look like:
    >
    > height=100 width=100
    >
    > in the real source, look like:
    >
    > height="100" width="100"
    >
    > in the returned value from documentElement.innerHTML.
    >
    > Further complicating things, forms that begin inside a table in this
    > manner:
    >
    > <table><form ...><tr><td...> <input...></form></td>...
    >
    > Are returned as:
    >
    > <table><form ...></form><tr><td... ><input...
    >
    > If I modify the returned value from documentElement.innerHTML, then
    > write it back to documentElement.innerHTML, many of the forms are
    > non-functional.
    >
    > I am interested in any available alternatives that will function in
    > recent Mozilla releases. Thank you,

    Validate your (X)HTML and you will solve a lot of those problems. Along
    with dropping tables for layout.

    Read the group FAQ, it discusses how to read a text file (2 methods),
    which is what you are trying to do.

    --
    Randy
    Chance Favors The Prepared Mind
    comp.lang.javascript FAQ - http://jibbering.com/faq/


    • Dirk Feytons

      #3
      Re: Alternative to documentElement.innerHTML?

      Kyle wrote:

      > I am presently making use of documentElement.innerHTML to retrieve
      > page contents for manipulation, but I've noticed that the string value
      > returned is not identical to the actual page source. Specifically,
      > attribute assignments that look like:
      [...]

      Take a look at the W3C DOM specification. It defines lots of methods
      to manipulate your document.

      --
      Dirk

      (PGP keyID: 0x448BC5DD - http://www.gnupg.org - http://www.pgp.com)

      ..oO° If I say I love you, will you stay or want to wither, fade away. If
      I show you the sun and the night of my past, will a smile cross your
      face but just vanish too fast. °Oo.


      • PeEmm

        #4
        Re: Alternative to documentElement.innerHTML?

        Kyle wrote, on 1/25/2004 6:51 AM:
        > I am presently making use of documentElement.innerHTML to retrieve
        > page contents for manipulation, but I've noticed that the string value
        > returned is not identical to the actual page source. Specifically,
        > attribute assignments that look like:
        >
        > height=100 width=100
        >
        > in the real source, look like:
        >
        > height="100" width="100"
        >
        > in the returned value from documentElement.innerHTML.
        >
        > Further complicating things, forms that begin inside a table in this
        > manner:
        >
        > <table><form ...><tr><td...> <input...></form></td>...
        >
        > Are returned as:
        >
        > <table><form ...></form><tr><td... ><input...
        >
        > If I modify the returned value from documentElement.innerHTML, then
        > write it back to documentElement.innerHTML, many of the forms are
        > non-functional.
        >
        > I am interested in any available alternatives that will function in
        > recent Mozilla releases. Thank you,
        >
        > -Kyle

        The DOM naturally only functions as expected if the HTML source is as
        expected, i.e. is valid according to the standards. The examples you give
        above are malformed HTML, so the DOM tries to do something about the mishmash.

        --
        /P.M.


        • Kyle

          #5
          Re: Alternative to documentElement.innerHTML?

          Randy Webb <hikksnotathome@aol.com> wrote in message news:<4KKdnTApiIG6_47dRVn-vw@comcast.com>...
          > Validate your (X)HTML and you will solve a lot of those problems. Along
          > with dropping tables for layout.

          This code is resident in a Mozilla extension, not a page that I've
          written. It isn't my HTML that I need to parse so I have no control
          over its validity.

          > Read the group FAQ, it discusses how to read a text file (2 methods),
          > which is what you are trying to do.

          I don't understand what you mean here. As far as I know, the "file"
          does not exist anywhere in the filesystem so this is untrue. I assume
          this content is somewhere in memory because "View Source" and Sherlock
          plugins make use of the real source without accessing the page a 2nd
          time.

          Thanks for any input.

          --Kyle


          • Kyle

            #6
            Re: Alternative to documentElement.innerHTML?

            PeEmm <lars.pm@ebox.tninet.se> wrote in message news:<bv08hn$nav1@ripley.netscape.com>...
            > The DOM naturally only functions as expected, if the HTML source is as
            > expected, i.e. is valid due to standards. The examples you give above
            > are malformed HTML, so the DOM tries to do something about the mishmash.

            I should have been more clear. This is a Mozilla Chrome extension, so
            I assume that I should have access to the same methods that Mozilla
            uses to display the source with "View Source" and retrieve the source
            for parsing with Sherlock plugins. Thanks,

            --Kyle


            • Randy Webb

              #7
              Re: Alternative to documentElement.innerHTML?

              Kyle wrote:

              > Randy Webb <hikksnotathome@aol.com> wrote in message news:<4KKdnTApiIG6_47dRVn-vw@comcast.com>...
              >
              >>Kyle wrote:
              >>
              >>>I am interested in any available alternatives that will function in
              >>>recent Mozilla releases. Thank you,
              >>
              >>Validate your (X)HTML and you will solve a lot of those problems. Along
              >>with dropping tables for layout.
              >
              >
              > This code is resident in a Mozilla extension, not a page that I've
              > written. It isn't my HTML that I need to parse so I have no control
              > over its validity.

              Ok.

              >
              >>Read the group FAQ, it discusses how to read a text file (2 methods),
              >>which is what you are trying to do.
              >
              >
              > I don't understand what you mean here. As far as I know, the "file"
              > does not exist anywhere in the filesystem so this is untrue. I assume
              > this content is somewhere in memory because "View Source" and Sherlock
              > plugins make use of the real source without accessing the page a 2nd
              > time.

              My response was in direct relation to the assumption (which I now see
              was incorrect) that you were trying to read the HTML code of an HTML file,
              and you wanted the original code, not the rendered code (they are
              different).

              If you load a page, and then run
              javascript:alert(document.documentElement.innerHTML);
              in the address bar, and then view the source of the page, on very, very
              few occasions will they be the same code.

              Example:
              When I open IE, it opens to about:blank. (actually, all of my browsers
              are set to open to about:blank)
              View>Source gives this code:
              <HTML></HTML>
              And that's it.
              javascript:alert(document.documentElement.innerHTML);
              alerts this:
              <HEAD></HEAD>
              <BODY></BODY>

              In Mozilla, about:blank view>Source gives this code:
              <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
              <html>
              <head><title> </title></head>
              <body></body>
              </html>

              I added line breaks for readability.

              javascript:alert(document.documentElement.innerHTML);
              gives this code:

              <head><title> </title></head><body></body>

              Note the missing DTD and HTML tags.

              In order to get the original, written code of a webpage into a
              variable that the page's javascript can use, you have to read the file
              from the server. And the only two ways I know of to do that are with an
              HTTPRequestObject or a Java applet, hence my suggestion to consult the FAQ.
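A rough sketch of the request-object route, assuming the browser exposes the `XMLHttpRequest` constructor (as recent Mozilla releases do). The helper name `fetchSource` and its third parameter are made up for this illustration; the third parameter only exists so the function can be exercised outside a browser. Note that this asks the server for the page a second time, so it can return different markup for dynamic pages, and ordinary page scripts are limited to same-origin URLs.

```javascript
// Fetch the raw, unparsed text of a URL and hand it to a callback.
// XhrCtor is a hypothetical injection point; in a browser it can be omitted.
function fetchSource(url, onDone, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open("GET", url, true); // true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 = request finished
    if (xhr.status === 200) {
      onDone(null, xhr.responseText); // text as sent by the server
    } else {
      onDone(new Error("HTTP status " + xhr.status), null);
    }
  };
  xhr.send(null);
}
```

In a page this might be called as `fetchSource(location.href, function (err, text) { if (!err) alert(text); });`.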

              Whether any of that helps with you trying to read a Mozilla Skin plugin,
              I don't know :(


              --
              Randy
              Chance Favors The Prepared Mind
              comp.lang.javascript FAQ - http://jibbering.com/faq/


              • Lasse Reichstein Nielsen

                #8
                Re: Alternative to documentElement.innerHTML?

                Randy Webb <hikksnotathome@aol.com> writes:

                > If you load a page, and then do
                > javascript:alert(document.documentElement.innerHTML);
                > In the address bar, and then view the source of the page, on very very
                > few occasions will they be the same code.

                Yes, browsers build the innerHTML string from the current structure
                of the document, whereas the view-source shows the original source code.
                That means that innerHTML is "unparsing" the DOM tree structure, and
                it would be surprising if it gave exactly the same formatting as the
                original source, even if the structure was the same.
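To make the "unparsing" point concrete, here is a toy serializer. It is not how any browser implements innerHTML; the node shape (`tag`, `attrs`, `children`) and the function names are invented for this sketch. The point is only that once the source has been parsed into a tree, the original quoting style is no longer stored anywhere, so the serializer has to pick one.

```javascript
// Escape the characters that would break a double-quoted attribute value.
function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Turn a toy node { tag, attrs, children } back into markup.
// Strings stand in for text nodes and are emitted as-is in this sketch.
function serialize(node) {
  if (typeof node === "string") return node;
  var attrs = Object.keys(node.attrs || {}).map(function (name) {
    return " " + name + '="' + escapeAttr(node.attrs[name]) + '"';
  }).join("");
  var inner = (node.children || []).map(serialize).join("");
  return "<" + node.tag + attrs + ">" + inner + "</" + node.tag + ">";
}

// The source might have said <td height=100 width=100>x</td>;
// after parsing, only names and values remain.
var parsed = { tag: "td", attrs: { height: "100", width: "100" }, children: ["x"] };
console.log(serialize(parsed)); // <td height="100" width="100">x</td>
```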

                ....
                > In Mozilla, about:blank view>Source gives this code:
                > <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
                > <html>
                ....

                > javascript:alert(document.documentElement.innerHTML);
                > gives this code:
                >
                > <head><title> </title></head><body></body>
                >
                > Note the missing DTD and HTML tags.

                Not surprising, since you ask for the *inner*HTML of the HTML element.
                If Mozilla supported the "outerHTML" property, you could also show
                the HTML tag. The document type node is even harder to find. It
                is the first child of the document node (where the HTML element
                is the second).
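As a sketch of how the missing pieces could be stitched back together, the following reads the standard DOM `doctype` property (with its `name`, `publicId` and `systemId`) and wraps `innerHTML` in the root tag. The function names and the `fakeDoc` stand-in are invented here so the example runs outside a browser; in a page you would pass `document` instead. Attributes on the root element are ignored, and the result is still the re-serialized tree, not the original source text.

```javascript
// Rebuild a doctype declaration from a DocumentType-like object.
function doctypeToString(dt) {
  if (!dt) return "";
  var s = "<!DOCTYPE " + dt.name;
  if (dt.publicId) {
    s += ' PUBLIC "' + dt.publicId + '"';
    if (dt.systemId) s += ' "' + dt.systemId + '"';
  } else if (dt.systemId) {
    s += ' SYSTEM "' + dt.systemId + '"';
  }
  return s + ">";
}

// Approximate outerHTML of the root element using only innerHTML.
function rootToString(doc) {
  var root = doc.documentElement;
  var tag = root.tagName.toLowerCase();
  var dt = doctypeToString(doc.doctype);
  return (dt ? dt + "\n" : "") + "<" + tag + ">" + root.innerHTML + "</" + tag + ">";
}

// Stand-in for a browser document, modelled on the about:blank example.
var fakeDoc = {
  doctype: { name: "html", publicId: "-//W3C//DTD HTML 4.01 Transitional//EN", systemId: "" },
  documentElement: { tagName: "HTML", innerHTML: "<head><title></title></head><body></body>" }
};
console.log(rootToString(fakeDoc));
```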

                /L
                --
                Lasse Reichstein Nielsen - lrn@hotpop.com
                DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
                'Faith without judgement merely degrades the spirit divine.'


                • Randy Webb

                  #9
                  Re: Alternative to documentElement.innerHTML?

                  Lasse Reichstein Nielsen wrote:
                  > Randy Webb <hikksnotathome@aol.com> writes:
                  >
                  >
                  >>If you load a page, and then do
                  >>javascript:alert(document.documentElement.innerHTML);
                  >>In the address bar, and then view the source of the page, on very very
                  >>few occasions will they be the same code.
                  >
                  >
                  > Yes, browsers build the innerHTML structure from the current structure
                  > of the document, whereas the view-source shows the original source code.
                  > That means that innerHTML is "unparsing" the DOM tree structure, and
                  > it would be surprising if it gave exactly the same formatting as the
                  > original source, even if the structure was the same.

                  Formatting aside, even then there are very very few occasions where the
                  browser will give you what it got. The only way to make them match
                  (aside from the DTD and HTML tags), is to grab it, paste it into your
                  editor and then use that code.

                  > ....
                  >
                  >>In Mozilla, about:blank view>Source gives this code:
                  >><!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
                  >><html>
                  >
                  > ....
                  >
                  >
                  >>javascript:alert(document.documentElement.innerHTML);
                  >>gives this code:
                  >>
                  >><head><title> </title></head><body></body>
                  >>
                  >>Note the missing DTD and HTML tags.
                  >
                  >
                  > Not surprising since you ask for the *inner*HTML of the HTML element.

                  True. But it only serves to reinforce my statement that if you want the
                  complete code of the file, you *must* read it from the server, and skip
                  the parsing. The only two ways I know of to do that are with a Java
                  applet (most widely supported) or with an HTTPRequestObject.

                  --
                  Randy
                  Chance Favors The Prepared Mind
                  comp.lang.javascript FAQ - http://jibbering.com/faq/
