why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

  • techguy_chicago@yahoo.com

    why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

    I just noticed on my website, with a link checker, that I have a bunch
    of URLs that reference a directory *above* my document root directory,
    but IE/Firefox/Opera never let on - they just seem to ignore the '../'
    I have in front of my links. Can this behavior be correct?

    So, my page is at this URL:

    http://www.mydomain.com/links.html

    And one of the links on that page, which has no 'base href' tags or
    anything else, says:

    <a href="../somedir/somepage.html"> Link here</a>

    My doc root is here:

    /www

    And my 'somedir' is here:

    /www/somedir

    but the URL, that I would expect to be broken, is not - it refers to:

    /somedir

    but the browser ignores the '../' directory references, apparently,
    once it reaches document root, and then dives down. In the case above,
    the initial page was served from document root, so there's no place
    left to go, but down.
    From quick testing, it also seems I can have a link with the following
    that would *still* work:

    <a href="../../../../../../../../../somedir/somepage.html"> Link
    here</a>

    It just doesn't seem right. Basically, if the URL references something
    higher than document root, then ignore that part of the URL?

    I'm all for leniency, but this just doesn't make any sense to me. Do I
    have it right? That the browsers just say 'ah, we knew what she meant
    anyways'?

  • Ståle Sæbøe

    #2
    Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

    techguy_chicago@yahoo.com wrote:

    > I'm all for leniency, but this just doesn't make any sense to me. Do I
    > have it right? That the browsers just say 'ah, we knew what she meant
    > anyways'?

    Well ... sort of. The browser gets the root directory from the server,
    so it knows where it is relative to that, and it knows how high it can
    go, then translates the URL to a valid one. This is essential if you
    want to create portable web sites without rewriting all the links every
    time you move them. Discarding excess ../ segments is practical because
    your server would probably not allow the user to browse your entire
    filesystem anyway. I do not know whether a user agent is required to do
    so, or whether it should instead try to browse the server filesystem for
    a valid path above the website root. The latter would probably be a
    serious security flaw, in my opinion. If you need visitors to access
    files and folders outside the website root folder you can use virtual
    folders (at least on IIS), but I advise against it. It is much more
    practical to keep all web files in your web root.

    I am sure many of the participants here can give you much more
    detailed and technical information about this question.


    • John Dunlop

      #3
      Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

      Somebody wrote:
      > So, my page is at this URL:
      >
      > http://www.mydomain.com/links.html

      Please use host names from RFC2606 in example URIs.

      http://www.ietf.org/rfc/rfc2606

      > And one of the links on that page, which has no 'base href' tags or
      > anything else, says:
      >
      > <a href="../somedir/somepage.html"> Link here</a>

      With a base URI of

      http://host.invalid/

      the abnormal relative-path reference ../foo/bar resolves to

      http://host.invalid/foo/bar


      RFC3986 sec. 5.2 describes an example algorithm for this; in
      particular, sec. 5.2.4 offers one way of removing 'dot
      segments'. Also, sec. 5.4.2 shows abnormal examples, the
      first of which might be of interest to you.

      http://www.ietf.org/rfc/rfc3986

      > My doc root

      You're confusing URI paths with filesystem paths.
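
As an illustration of the dot-segment removal mentioned above, here is a minimal Python sketch modelled on the procedure in RFC 3986 sec. 5.2.4. The function name `remove_dot_segments` and the sample paths are illustrative choices, not something from the thread. Note that it is pure string manipulation on the URI path and never consults the server's filesystem.

```python
def remove_dot_segments(path: str) -> str:
    """Remove '.' and '..' segments from a URI path (sketch after RFC 3986 sec. 5.2.4)."""
    output = []  # completed path segments, each keeping its leading "/"
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]              # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]              # "/../x" becomes "/x"
            if output:
                output.pop()             # step back one segment, if there is one
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)


# Base path "/" merged with the reference "../somedir/somepage.html":
print(remove_dot_segments("/../somedir/somepage.html"))  # /somedir/somepage.html
```

When the output is already empty there is nothing left to pop, so any surplus `..` simply disappears. That is why a link with nine `../` prefixes resolves to the same URI as a link with one.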

      --
      Jock


      • Stan Brown

        #4
        Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

        "" wrote in comp.infosystems.www.authoring.html:
        >So, my page is at this URL:
        >
        > http://www.mydomain.com/links.html

        Not Found

        The requested URL /links.html was not found on this server.

        --

        Stan Brown, Oak Road Systems, Tompkins County, New York, USA

        • Stan Brown

          #5
          Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

          "John Dunlop" wrote in comp.infosystems.www.authoring.html:
          >Somebody wrote:
          >
          >> So, my page is at this URL:
          >>
          >> http://www.mydomain.com/links.html
          >
          >Please use host names from RFC2606 in example URIs.

          Or better yet, post the actual URL!

          --

          Stan Brown, Oak Road Systems, Tompkins County, New York, USA

          • David Ross

            #6
            Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

            John Dunlop wrote:
            >
            > Somebody wrote:
            >
            > > So, my page is at this URL:
            > >
            > > http://www.mydomain.com/links.html
            >
            > Please use host names from RFC2606 in example URIs.
            >
            > http://www.ietf.org/rfc/rfc2606
            >
            > > And one of the links on that page, which has no 'base href' tags or
            > > anything else, says:
            > >
            > > <a href="../somedir/somepage.html"> Link here</a>
            >
            > With a base URI of
            >
            > http://host.invalid/
            >
            > the abnormal relative-path reference ../foo/bar resolves to
            >
            > http://host.invalid/foo/bar
            >
            > RFC3986 sec. 5.2 describes an example algorithm for this; in
            > particular, sec. 5.2.4 offers one way of removing 'dot
            > segments'. More, sec. 5.4.2 shows abnormal examples, the
            > first of which might be of interest to you.
            >
            > http://www.ietf.org/rfc/rfc3986
            >
            > > My doc root
            >
            > You're confusing URI paths with filesystem paths.

            However, if the reference is from a page NOT at the base, ../ at
            the beginning of a relative path is indeed meaningful. Thus, my
            own <URL:http://www.rossde.com/garden/diary/JanFeb05.html> contains
            the following references:

            <../garden_back.html>, which translates as
            <URL:http://www.rossde.com/garden/garden_back.html>

            <../../viewing_site.html>, which translates as
            <URL:http://www.rossde.com/viewing_site.html>

            The ../ is ignored only when it would translate to a path higher
            than the base allowed by your server. Thus, there is an implied
            base if you do not specify one.
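
The translations above can be reproduced with a short Python sketch using the standard library's `urllib.parse.urljoin`, which follows RFC 3986 resolution in Python 3.5 and later. The third reference, with one `../` too many, is an added illustration rather than a link from the page. The resolution happens entirely on the client from the page's own URL, without asking the server anything.

```python
from urllib.parse import urljoin

# The page's own URL acts as the implied base when no <base href> is given.
base = "http://www.rossde.com/garden/diary/JanFeb05.html"

references = [
    "../garden_back.html",         # one level up from /garden/diary/
    "../../viewing_site.html",     # two levels up, i.e. the top of the URL path
    "../../../viewing_site.html",  # hypothetical: one "../" more than the path has
]

resolved = [urljoin(base, ref) for ref in references]
for ref, url in zip(references, resolved):
    print(f"{ref} -> {url}")
```

The second and third references come out as the same URL, since there is no path segment left for the extra `../` to remove.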

            --

            David E. Ross
            <URL:http://www.rossde.com/>

            I use Mozilla as my Web browser because I want a browser that
            complies with Web standards. See <URL:http://www.mozilla.org/>.


            • Harlan Messinger

              #7
              Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

              John Dunlop wrote:
              > Somebody wrote:
              >
              >
              >>So, my page is at this URL:
              >>
              >> http://www.mydomain.com/links.html
              >
              >
              > Please use host names from RFC2606 in example URIs.
              >
              > http://www.ietf.org/rfc/rfc2606
              >

              This is good to know--I didn't before--but this person isn't creating a
              test suite that runs the risk of conflicting eventually with a real host
              name on the public internet. It's just a written example.

              [snip]
              > With a base URI of
              >
              > http://host.invalid/
              >
              > the abnormal relative-path reference ../foo/bar resolves to
              >
              > http://host.invalid/foo/bar
              >
              > RFC3986 sec. 5.2 describes an example algorithm for this; in
              > particular, sec. 5.2.4 offers one way of removing 'dot
              > segments'. More, sec. 5.4.2 shows abnormal examples, the
              > first of which might be of interest to you.
              >
              > http://www.ietf.org/rfc/rfc3986
              >
              >
              >>My doc root
              >
              > You're confusing URI paths with filesystem paths.
              >

              I don't know about other servers, but IIS automatically maps URI path
              components to like-named file system path components unless you
              explicitly configure the subpaths otherwise. This applies as well to
              ../, except that IIS can be set either to allow paths to places above
              the host root or not.


              • Tim

                #8
                Re: why do browsers compensate for bad URL's w.r.t. DOC_ROOT?

                Somebody wrote:
                >>> So, my page is at this URL:
                >>>
                >>> http://www.mydomain.com/links.html

                John Dunlop wrote:
                >> Please use host names from RFC2606 in example URIs.
                >>
                >> http://www.ietf.org/rfc/rfc2606

                Harlan Messinger <hmessinger.removethis@comcast.net> posted:

                > This is good to know--I didn't before--but this person isn't creating a
                > test suite that runs the risk of conflicting eventually with a real host
                > name on the public internet. It's just a written example.

                But what they've done is write an example down somewhere where it'll be
                archived and indexed.

                Should someone actually own the allegedly fake domain name (people
                often don't check whether someone else really owns it), the post can end
                up causing unwanted traffic at that website (as robots index the posts and
                follow any links, and as people try out the links in the posts while
                reading them).

                The last thing the owner of domain.com wants is a few thousand people
                trying some example link to see why it doesn't do what the poster is trying
                to do, when the poster's problem is really somewhere else.

                >> You're confusing URI paths with filesystem paths.

                > I don't know about other servers, but IIS automatically maps URI path
                > components to like-named file system path components unless you
                > explicitly configure the subpaths otherwise. This applies as well to
                > ../, except that IIS can be set either to allow paths to places above
                > the host root or not.

                Being able to escape from the root is a severe security breach. URIs
                should only map to filepaths in a manner that's strictly controlled by the
                server configuration. You don't want complete strangers being able to
                specify any path that they like on your system, to read any file that they
                like, merely by backing out of the server far enough.

                Anybody reading this thread and contemplating it needs to spend quite some
                time reading about why that's a seriously bad idea until they've been
                convinced not to do it. I can't think of a single example of where it'd be
                a good idea.
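
To make the containment idea concrete, here is a minimal Python sketch of the kind of check a server-side handler might apply before touching the filesystem. The function name `resolve_under_root` is hypothetical, and `/www` is the document root from the original post. It is only an illustration under those assumptions, not how IIS or any other particular server implements it.

```python
from pathlib import Path
from typing import Optional


def resolve_under_root(doc_root: str, url_path: str) -> Optional[Path]:
    """Map a URL path to a file path, refusing anything outside doc_root.

    Returns the resolved filesystem path, or None if the request would
    escape the document root (for example through "../" segments).
    """
    root = Path(doc_root).resolve()
    # Strip leading slashes so the URL path is joined relative to the root,
    # then resolve() collapses any ".." segments (and symlinks, if present).
    candidate = (root / url_path.lstrip("/")).resolve()
    if candidate == root or root in candidate.parents:
        return candidate
    return None


print(resolve_under_root("/www", "/somedir/somepage.html"))
print(resolve_under_root("/www", "/../etc/passwd"))  # None: escapes the root
```

Checking the resolved path, rather than scanning the raw string for `..`, is a deliberate choice in this sketch: it also catches escapes through symbolic links.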

                --
                If you insist on e-mailing me, use the reply-to address (it's real but
                temporary). But please reply to the group, like you're supposed to.

                This message was sent without a virus, please delete some files yourself.
