How does a browser parse an html document?

  • Viken Karaguesian

    How does a browser parse an html document?

    Hello everyone,

    Me again. Trying to learn some more :>) I hope I got the terminology
    right.

    How does a browser parse (correct term?) an HTML document. I'm sure
    that every browser does it a little differently. Do they simply just
    read a document top-to-bottom and left-to-right and just display
    elements in the order in which they encounter them? Or, do they give
    priority to certain types of content? For instance, would a browser
    display text first, then all the images, then all Javascripts, etc?
    Say, for example, I had a large image (let's say 1MB). The image is
    floated left of my paragraph, so this is my code:

    <img src="image.jpg" style="float: left">
    <p> blah blah blah, yadda yadda yadda </p>

    Would a browser just load these in order, so I would have to wait for
    the 1MB image to load, then I'd see the text?

    I ask because there are times I just can't tell. Of course, a local copy
    of a website is always fast - it's on your hard drive. But there are
    times I will load the same page over and over again (emptying the cache
    in between reloads) and it seems to load in a different order each
    time. Sometimes I see text first, sometimes I see images first.

    Also, knowing this would also help me optimize pages to load faster.

    Thanks to all who reply. I'm really having fun with this and am
    enjoying learning. I may even go to a local university to see if there
    are any classes available :>)

    Viken K.

  • Beauregard T. Shagnasty

    #2
    Re: How does a browser parse an html document?

    Viken Karaguesian wrote:
    > Thanks to all who reply. I'm really having fun with this and am
    > enjoying learning. I may even go to a local university to see if
    > there are any classes available :>)

    If you missed the thread on alt.www.webmaster about Uni classes, have a
    read of "Web Design Teachers" from a few days ago:

    <http://groups.google.com/group/alt.www.webmaster/browse_frm/thread/9260ab7b04c2a812/2c3e5a604de2f532?tvc=1&q=web+design+teachers+alt.www.webmaster&hl=en#2c3e5a604de2f532>

    Tread carefully with the school. You're likely to learn much more useful
    stuff by lurking and hanging around these newsgroups. Remember to ask
    intelligent questions. <g>

    --
    -bts
    -Warning: I brake for lawn deer

    • Benjamin Niemann

      #3
      Re: How does a browser parse an html document?

      Viken Karaguesian wrote:
      > How does a browser parse (correct term?) an HTML document. I'm sure
      > that every browser does it a little differently. Do they simply just
      > read a document top-to-bottom and left-to-right and just display
      > elements in the order in which they encounter them? Or, do they give
      > priority to certain types of content? For instance, would a browser
      > display text first, then all the images, then all Javascripts, etc?
      > Say, for example, I had a large image (let's say 1MB). The image is
      > floated left of my paragraph, so this is my code:
      >
      > <img src="image.jpg" style="float: left">
      > <p> blah blah blah, yadda yadda yadda </p>
      >
      > Would a browser just load these in order, so I would have to wait for
      > the 1MB image to load, then I'd see the text?

      The image will be downloaded in the background while the rest of the
      document is loaded/parsed (the same happens for other external files like
      stylesheets, applets... but not JavaScript files AFAIK). If the image had
      WIDTH and HEIGHT attributes, the browser could reserve
      the right amount of space on the page and the following text could flow
      around it, even before the image is displayed. Without WIDTH/HEIGHT the
      browser will sometimes render the following text as if there were no image
      and then recalculate the layout when it knows the dimensions of the image.

      > Also, knowing this would also help me optimize pages to load faster.

      If you specify WIDTH/HEIGHT in your images, the other stuff on your page
      will not 'jump' around when the browser recalculates the layout. Speed
      improvement might be minimal, but it just looks much smoother.
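
      As a minimal sketch, here is the snippet from the question with explicit
      dimensions added. The 400 and 300 values are made up for illustration (use
      the real pixel size of image.jpg), and the empty alt text is only a
      placeholder.

      ```html
      <!-- width/height are assumed values: replace them with the image's real pixel size -->
      <img src="image.jpg" width="400" height="300" alt="" style="float: left">
      <p> blah blah blah, yadda yadda yadda </p>
      ```

      With the dimensions known up front, the browser can flow the paragraph
      around a 400x300 box straight away rather than recalculating the layout
      once the download finishes.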

      --
      Benjamin Niemann
      Email: pink at odahoda dot de
      WWW: http://www.odahoda.de/

      • Viken Karaguesian

        #4
        Re: How does a browser parse an html document?

        > If you missed the thread on alt.www.webmaster about Uni classes, have a
        > read of "Web Design Teachers" from a few days ago:
        >
        > <http://groups.google.com/group/alt.www.webmaster/browse_frm/thread/9260ab7b04c2a812/2c3e5a604de2f532?tvc=1&q=web+design+teachers+alt.www.webmaster&hl=en#2c3e5a604de2f532>

        Yikes! Wow...I'm at a loss for words.

        Viken K.

        • Alan J. Flavell

          #5
          Re: How does a browser parse an html document?

          On Wed, 25 Jan 2006, Beauregard T. Shagnasty wrote:
          > If you missed the thread on alt.www.webmaster about Uni classes,
          > have a read of "Web Design Teachers" from a few days ago:

          That's horrible... There was an analogous discussion on
          uk.net.web.authoring a couple of weeks past, from someone who was
          required to teach a web design module to a bizarre syllabus,
          apparently written by someone who didn't really understand the WWW.

          (An additional air of unreality was that they weren't allowed to use a
          web server nor connect to the Internet. All the design and browsing
          had to be done to the local filesystem.)

          TimBL predicted, ages back, that hand-coding would rapidly go out of
          fashion and be replaced by high-level design tools. Surely he
          couldn't, in his worst nightmares, have imagined the kind of
          preposterous HTML+CSS that would be extruded by (as far as I can see)
          all of the currently widespread commercial tools - except when they
          are in the hands of someone expert enough to keep them in check -
          which, basically, means knowing /how/ to hand-code, even when not
          actually /doing/ it.

          • axlq

            #6
            Re: How does a browser parse an html document?

            In article <Pine.LNX.4.62.0601251651010.11871@ppepc56.ph.gla.ac.uk>,
            Alan J. Flavell <flavell@ph.gla.ac.uk> wrote:
            >TimBL predicted, ages back, that hand-coding would rapidly go out of
            >fashion and be replaced by high-level design tools. Surely he
            >couldn't, in his worst nightmares, have imagined the kind of
            >preposterous HTML+CSS that would be extruded by (as far as I can see)
            >all of the currently widespread commercial tools - except when they
            >are in the hands of someone expert enough to keep them in check -
            >which, basically, means knowing /how/ to hand-code, even when not
            >actually /doing/ it.

            I'm happy to say that all my web design work has been hand coded.
            It's hard to avoid hand-coding when most of the pages are generated
            dynamically by compiled C++ CGI programs or php scripts that perform
            database manipulations.

            The only times I've ever used a web page authoring tool were to try
            something interesting to see what the code looked like; then I would
            use my own version of the cleaned-up code in my own pages.

            -A

            • Eric Lindsay

              #7
              Re: How does a browser parse an html document?

              In article <Pine.LNX.4.62.0601251651010.11871@ppepc56.ph.gla.ac.uk>,
              "Alan J. Flavell" <flavell@ph.gla.ac.uk> wrote:
              > TimBL predicted, ages back, that hand-coding would rapidly go out of
              > fashion and be replaced by high-level design tools. Surely he
              > couldn't, in his worst nightmares, have imagined the kind of
              > preposterous HTML+CSS that would be extruded by (as far as I can see)
              > all of the currently widespread commercial tools - except when they
              > are in the hands of someone expert enough to keep them in check -
              > which, basically, means knowing /how/ to hand-code, even when not
              > actually /doing/ it.

              Although there are many horrible examples of tools (and I still can't
              find any tools I like except for StyleMaster), I do have the impression
              that there is some gradual improvement. More important, at least some
              of the people writing the tools now seem seriously interested in only
              producing valid (X)HTML, and styling it with CSS. Since I don't want to
              write HTML and CSS myself (and I especially don't want to write my own
              CMS like I am doing at the moment), I looked through around 30 tools,
              hoping to find something that gave results I liked.

              On Macintosh, both Sandvox and Rapid Weaver (themes based, drag and
              drop) apparently produce valid code, and it looks clean. Sandvox is
              still beta, but I have great hopes for it in a few years. Rapid Weaver
              has a longer history. Both are produced by either one person or a few people.

              More important, in OS X, the Cocoa HTML generator (and command line
              textutil) that is used by default by most programs to produce HTML seems
              to me to actually be improving from release to release. For instance,
              you can now write a RTF document (the default) with links, lists and
              tables in TextEdit (the default editor) and get clean and valid HTML
              4.01 Strict output styled with CSS. It does a reasonable job with
              title, keywords, description and other meta in the head. It is not
              semantic, has no concept of h1, h2, etc or document outline, and a lot
              of stuff is inline styles. You can't include images, voice or movies in
              your HTML output (they get saved as a web archive instead of HTML). But
              each version has added facilities. I can see how you could add a
              post-processing step to clean and fix most of that up, and use it with
              existing CSS style sheets. So I hope for continued improvement.

              Mind you, I don't know why iWeb output seems such a bloated disaster ...
              but even that is using CSS and will validate, which is a whole heap
              better than a lot of past tools.

              If the web is ever to be full of clean, lean, valid code, it will have
              to come from tool makers being persuaded that that is the way their
              tools need to work. Converting individuals like me (and others who
              chance upon this group) is largely (alas) a waste of effort.

              --
              Eric Lindsay's web sites, featuring Airlie Beach diving, sailing tourist area, Psion Epoc computers, Gegenschein Science fiction fanzine.

              • Lachlan Hunt

                #8
                Re: How does a browser parse an html document?

                Viken Karaguesian wrote:
                > How does a browser parse (correct term?) an HTML document. I'm sure
                > that every browser does it a little differently.

                Well, um... That's like asking the ultimate question of life, the
                universe and everything. Nobody really knows for sure, although there
                are some theories on the subject [1] with some relation to the
                Heisenberg uncertainty principle. It varies significantly from browser
                to browser, and then varies even more when you consider quirks mode;
                though one thing we can be sure of is that no browser follows any
                formally defined parsing rules.

                > Do they simply just read a document top-to-bottom and left-to-right
                > and just display elements in the order in which they encounter them?

                They read through the source code from start to finish and attempt to
                build a DOM along the way and as it is being built, each element in the
                DOM is rendered on the screen according to the rules of CSS (in most
                cases). For images and other objects, if the height and width are known
                before it's loaded, the screen real estate is reserved for it. If they're
                unknown, the page will reflow when the intrinsic dimensions are found.

                > Or, do they give priority to certain types of content?

                No, they generally load it as quickly as possible as soon as they
                receive it.

                > For instance, would a browser display text first, then all the images,
                > then all Javascripts, etc?

                JavaScript in a script element is executed as soon as it is parsed
                (except when the defer attribute is used)
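
                A rough sketch of that difference, assuming two hypothetical script
                files (first.js and later.js are made-up names) and a browser that
                actually honours defer, which not all of them do:

                ```html
                <!DOCTYPE html>
                <html>
                <head>
                <title>Script order sketch</title>
                <!-- No defer: fetched and run as soon as the parser reaches this tag;
                     parsing of the rest of the document waits until it has finished. -->
                <script src="first.js"></script>
                <!-- defer: a supporting browser may carry on parsing and run this
                     script once the document has been parsed. -->
                <script src="later.js" defer></script>
                </head>
                <body>
                <p> blah blah blah, yadda yadda yadda </p>
                </body>
                </html>
                ```

                In a browser that ignores defer, both scripts simply run in source
                order as the parser meets them.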

                [1] http://ln.hixie.ch/?start=1137740632&count=1


                --
                Lachlan Hunt

                http://GetFirefox.com/ Rediscover the Web
                http://GetThunderbird.com/ Reclaim your Inbox

                • Andy Dingley

                  #9
                  Re: How does a browser parse an html document?

                  On Wed, 25 Jan 2006 17:01:16 +0000, "Alan J. Flavell"
                  <flavell@ph.gla.ac.uk> wrote:
                  >TimBL predicted, ages back, that hand-coding would rapidly go out of
                  >fashion and be replaced by high-level design tools.

                  AFAIR that quote only applied to HTML. He was thinking of
                  "off-the-shelf" CSS schemes a la ZenGarden that were hand-coded by
                  skilled designers, then massively re-used. The discrepancy seems to be
                  that the emergent web voted overwhelmingly for originality in design
                  rather than quality.


                  • Viken Karaguesian

                    #10
                    Re: How does a browser parse an html document?

                    Lachlan,

                    Thanks for the great answer. It was exactly what I was looking for.
                    Thanks to everyone for replying.

                    --
                    Viken K.



                    • Tim

                      #11
                      Re: How does a browser parse an html document?

                      On Wed, 25 Jan 2006 17:01:16 +0000, Alan J. Flavell sent:
                      > TimBL predicted, ages back, that hand-coding would rapidly go out of
                      > fashion and be replaced by high-level design tools. Surely he couldn't,
                      > in his worst nightmares, have imagined the kind of preposterous HTML+CSS
                      > that would be extruded by (as far as I can see) all of the currently
                      > widespread commercial tools - except when they are in the hands of someone
                      > expert enough to keep them in check - which, basically, means knowing
                      > /how/ to hand-code, even when not actually /doing/ it.

                      I seem to recall reading that it was the *intention* that HTML *would* be
                      machine generated (like most other document formats), but still human
                      readable (unlike most document formats).

                      The idea that you have to know how to build something by hand before you
                      can get a machine to do it for you is nothing new. You've only got to
                      look at how you train engineers, for example. Same for other skills, like
                      woodwork. I think the basic problem is that unskilled people think that
                      they can do skilled tasks just as well as an expert(*). God help us if
                      they take up an interest in first aid...

                      (*) I mean "expert," not "professional." A professional is merely someone who
                      gets paid to do something, as opposed to an "amateur" who doesn't. An
                      expert knows what they're doing.

                      --
                      If you insist on e-mailing me, use the reply-to address (it's real but
                      temporary). But please reply to the group, like you're supposed to.

                      This message was sent without a virus, please destroy some files yourself.
