Validating directory and file path

  • Ted G

    Validating directory and file path

    Hello,

    I'm new to PHP and would like to ask:
    what is the common way to check whether
    a directory path (e.g. in a URL) and the requested file
    are in the proper format?

    E.g. if I gave my homepage URL
    in the format /usr/software/index.php
    it would be OK in my case, but e.g.
    /usr//software/index.php
    would of course be wrong.

    I tried parse_url() and similar functions, but
    they accept nearly everything (at least in my PHP version)
    and do not report any errors.

    Could a regular expression solve this?
    I haven't used them much, so can anyone say
    how to test for and detect that /usr//software/index.php
    is not in the proper format?

    Thanks,


  • Alvaro G. Vicario

    #2
    Re: Validating directory and file path

    *** Ted G wrote (Sun, 20 Feb 2005 14:30:59 +0200):
    > E.g. if I would give my homepage URL
    > in format /usr/software/index.php
    > it would be ok in my case but e.g.
    > /usr//software/index.php
    > would of course be wrong.

    I can't figure out what you're trying to do but if you are talking about
    file system paths then realpath() can be used to return canonical absolute
    paths or FALSE if file does not exist.
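
    A minimal sketch of that behaviour, assuming PHP 7 or later and a throwaway
    directory under the system temp dir (the directory and file names are
    illustrative only). It suggests that realpath() answers "does this file
    exist?" rather than "is this string well formed?": a doubled slash is
    silently collapsed, not rejected.

```php
<?php
// Sandbox so the example runs anywhere; names here are illustrative only.
$dir = sys_get_temp_dir() . '/realpath_demo_' . getmypid() . '/software';
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
file_put_contents($dir . '/index.php', '');

$clean   = realpath($dir . '/index.php');         // canonical absolute path
$doubled = realpath($dir . '//index.php');        // doubled slash is collapsed
$missing = realpath($dir . '/no-such-file.php');  // false: file does not exist

var_dump($clean === $doubled);  // bool(true)
var_dump($missing);             // bool(false)
```

    Because both spellings resolve to the same file, realpath() on its own
    cannot flag /usr//software/index.php as a typing error.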



    --
    -+ Álvaro G. Vicario - Burgos, Spain
    +- http://www.demogracia.com (the humor site varnished for the outdoors)
    -+ Send your questions to the group, not to my mailbox
    --


    • Ted G

      #3
      Re: Validating directory and file path

      Alvaro G. Vicario wrote:

      > *** Ted G wrote (Sun, 20 Feb 2005 14:30:59 +0200):
      >
      >>E.g. if I would give my homepage URL
      >>in format /usr/software/index.php
      >>it would be ok in my case but e.g.
      >>/usr//software/index.php
      >>would of course be wrong.
      >
      >
      > I can't figure out what you're trying to do but if you are talking about
      > file system paths then realpath() can be used to return canonical absolute
      > paths or FALSE if file does not exist.
      >

      OK, let me clarify.

      In a web application the user has a chance to give his or her homepage
      address, so I should validate that it is given in the
      proper format.
      An example of an error was: /usr//homepages/index.php

      => those // characters

      There might also be other typing errors when you enter your URL.

      So the question was: what is the easiest way in PHP to
      check that a URL or its path part (e.g. /usr/homepages/index.php) is
      written as the syntax requires?

      Br

      • Alvaro G. Vicario

        #4
        Re: Validating directory and file path

        *** Ted G wrote (Sun, 20 Feb 2005 15:07:00 +0200):
        > As an example of error was: /usr//homepages/index.php
        >
        > => those // characters
        >
        > There might be also other typing errors when you give your URL.
        >
        > So, the question was that what is the easiest way in PHP to
        > check that URL or its path part (e.g. /usr/homepages/index.php) is
        > written as syntax requires it?

        I don't think it's illegal to have // in the path part of a URL. Perhaps
        the best approach you can use is opening a socket to send a HEAD request
        and check the returned status code. That way you can actually check if the
        page exists and is up.
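
        A rough sketch of the HEAD-request idea, assuming PHP 7.1 or later and
        plain HTTP on port 80. The function names buildHeadRequest(),
        parseStatusCode() and headStatus() are made up for this example, and the
        code does not follow redirects or handle HTTPS.

```php
<?php
// Build the raw HEAD request for a host and path (HTTP/1.1 requires Host).
function buildHeadRequest(string $host, string $path): string
{
    return "HEAD {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

// Extract the numeric status code from a status line, or null if malformed.
function parseStatusCode(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Open a socket, send the request, and return the status code.
// Returns null when the connection fails or the reply is not HTTP.
function headStatus(string $host, string $path = '/', int $port = 80, float $timeout = 5.0): ?int
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null;
    }
    stream_set_timeout($fp, (int) ceil($timeout));
    fwrite($fp, buildHeadRequest($host, $path));
    $statusLine = fgets($fp);   // first line of the response
    fclose($fp);
    return $statusLine === false ? null : parseStatusCode($statusLine);
}

var_dump(parseStatusCode("HTTP/1.1 200 OK\r\n"));       // int(200)
var_dump(parseStatusCode("HTTP/1.0 404 Not Found\r\n")); // int(404)
```

        A call such as headStatus('host.invalid', '/usr//software/index.php')
        would then report whatever the server decides for that exact path,
        which is a check of existence rather than of syntax.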




        • Ted G

          #5
          Re: Validating directory and file path

          Alvaro G. Vicario wrote:

          > *** Ted G wrote (Sun, 20 Feb 2005 15:07:00 +0200):
          >
          >>As an example of error was: /usr//homepages/index.php
          >>
          >>=> those // characters
          >>
          >>There might be also other typing errors when you give your URL.
          >>
          >>So, the question was that what is the easiest way in PHP to
          >>check that URL or its path part (e.g. /usr/homepages/index.php) is
          >>written as syntax requires it?
          >
          >
          > I don't think it's illegal to have // in the path part of a URL. Perhaps
          > the best approach you can use is opening a socket to send a HEAD request
          > and check the returned status code. That way you can actually check if the
          > page exists and is up.
          >

          What I have usually done is use JavaScript on the browser side and
          Java's features on the server side to validate data.

          Unfortunately I'm not a pro in regular expressions or PHP either, so that's
          why I ask these questions ;)

          Yep, // is not valid in a URL string (path).

          OK, I can check with a string operation whether there are // or other illegal
          characters in the URL string and then use the checkdnsrr() function to check
          whether the host part is a living/valid host.
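
          A small sketch of that two-step check, assuming PHP 7 or later. The
          helpers hasEmptySegment() and hostResolves() are hypothetical names, and
          the DNS lookup needs network access, so it is left commented out.
          Whether an empty segment should count as an error is treated here as
          the application's own rule.

```php
<?php
// Hypothetical helper: true when the path contains an empty segment ("//").
function hasEmptySegment(string $path): bool
{
    return strpos($path, '//') !== false;
}

// Hypothetical helper: true when the host has an A, AAAA or MX record.
// checkdnsrr() performs the DNS lookup and therefore needs network access.
function hostResolves(string $host): bool
{
    foreach (['A', 'AAAA', 'MX'] as $type) {
        if (checkdnsrr($host, $type)) {
            return true;
        }
    }
    return false;
}

$url  = 'http://host.invalid/usr//homepages/index.php';
$path = (string) parse_url($url, PHP_URL_PATH);
$host = (string) parse_url($url, PHP_URL_HOST);

var_dump(hasEmptySegment($path));                       // bool(true)
var_dump(hasEmptySegment('/usr/homepages/index.php'));  // bool(false)
// var_dump(hostResolves($host));  // requires a working resolver
```

          Only the path is tested for "//", because the "//" after the scheme
          is a normal part of every absolute HTTP URL.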

          But I think there is also a more elegant way to do that...

          Br,


          • Chung Leong

            #6
            Re: Validating directory and file path


            "Ted G" <tg@not.valid.mail> wrote in message
            news:37ripjF5h05hoU1@individual.net...
            > Alvaro G. Vicario wrote:
            > Yep, // is not valid in URL string (path).
            >
            > Ok, I can check as a string operation if there are // or other illegal
            > characters in URL string and then use checkdnsrr-method to check if
            > the host part is a living/valid host.

            That's not true. Having // in the path part of a URL does not make it
            syntactically incorrect. It's at the discretion of the server to interpret
            what the path means. If it chooses to, the server can correct for the
            obvious typo.



            • Daniel Tryba

              #7
              Re: Validating directory and file path

              Chung Leong <chernyshevsky@hotmail.com> wrote:
              >> Yep, // is not valid in URL string (path).
              >>
              >> Ok, I can check as a string operation if there are // or other illegal
              >> characters in URL string and then use checkdnsrr-method to check if
              >> the host part is a living/valid host.
              >
              > That's not true. Having // in the path part of URL does not make it
              > syntactically incorrect. It's at the discretion of the server to interpret
              > what the path means. If it chooses to, the server can correct for the
              > obvious typo.

              The URI RFC (2396) says otherwise; servers correcting this do so at their
              own peril:

              3.3. Path Component

              The path component contains data, specific to the authority (or the
              scheme if there is no authority component), identifying the resource
              within the scope of that scheme and authority.

              path          = [ abs_path | opaque_part ]

              path_segments = segment *( "/" segment )
              segment       = *pchar *( ";" param )
              param         = *pchar

              pchar         = unreserved | escaped |
                              ":" | "@" | "&" | "=" | "+" | "$" | ","

              The path may consist of a sequence of path segments separated by a
              single slash "/" character. Within a path segment, the characters
              "/", ";", "=", and "?" are reserved. Each path segment may include a
              sequence of parameters, indicated by the semicolon ";" character.
              The parameters are not significant to the parsing of relative
              references.


              • Andy Hassall

                #8
                Re: Validating directory and file path

                On 20 Feb 2005 19:28:42 GMT, Daniel Tryba <spam@tryba.invalid> wrote:

                >Chung Leong <chernyshevsky@hotmail.com> wrote:
                >>> Yep, // is not valid in URL string (path).
                >>
                >> That's not true. Having // in the path part of URL does not make it
                >> syntactically incorrect.
                >
                >URI RFC (2396) says otherwise, servers correcting this do that at their
                >own peril:
                >
                >3.3. Path Component
                >
                > The path component contains data, specific to the authority (or the
                > scheme if there is no authority component), identifying the resource
                > within the scope of that scheme and authority.
                >
                > path = [ abs_path | opaque_part ]
                >
                > path_segments = segment *( "/" segment )
                > segment = *pchar *( ";" param )
                > param = *pchar
                >
                > pchar = unreserved | escaped |
                > ":" | "@" | "&" | "=" | "+" | "$" | ","
                >
                > The path may consist of a sequence of path segments separated by a
                > single slash "/" character. Within a path segment, the characters
                > "/", ";", "=", and "?" are reserved. Each path segment may include a
                > sequence of parameters, indicated by the semicolon ";" character.
                > The parameters are not significant to the parsing of relative
                > references.

                Under section 1.6, the definition of the BNF-like grammar, it's got:

                "elements may be preceded with <n>* to designate n or more repetitions of the
                following element; n defaults to 0."

                Segment's declared as:

                segment = *pchar *( ";" param )

                Doesn't that imply that a segment may be the empty string, consisting of zero
                repetitions of pchar and zero repetitions of ( ";" param ), so "//" is a valid
                production of segment *( "/" segment )? Or am I reading it wrong?
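
                One way to test that reading is to transcribe the quoted rules into a
                regular expression and try both paths. This is only a sketch, assuming
                PHP 7 or later: the function names are invented for the example, and
                the definitions of "unreserved" and "escaped" are taken from RFC 2396
                sections 2.3 and 2.4.1, which are not quoted above.

```php
<?php
// Build a PCRE pattern for abs_path = "/" path_segments (RFC 2396, section 3).
// Hypothetical helper; the rule names follow the grammar quoted above.
function absPathPattern(): string
{
    $unreserved = '[A-Za-z0-9_.!~*\'()-]';      // alphanum | mark
    $escaped    = '%[0-9A-Fa-f]{2}';            // "%" hex hex
    $pchar      = '(?:' . $unreserved . '|' . $escaped . '|[@:&=+$,])';
    $segment    = $pchar . '*(?:;' . $pchar . '*)*';    // *pchar *( ";" param )
    $pathSegs   = $segment . '(?:/' . $segment . ')*';  // segment *( "/" segment )
    return '#^/' . $pathSegs . '$#D';
}

// True when $path is a valid abs_path production under that grammar.
function matchesAbsPath(string $path): bool
{
    return preg_match(absPathPattern(), $path) === 1;
}

var_dump(matchesAbsPath('/usr/software/index.php'));   // bool(true)
var_dump(matchesAbsPath('/usr//software/index.php'));  // bool(true): empty segment
var_dump(matchesAbsPath('/usr/soft ware/index.php'));  // bool(false): space is not a pchar
```

                Under this transcription the doubled slash is accepted, because
                *pchar permits zero characters between two "/" separators.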

                --
                Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
                <http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool


                • John Dunlop

                  #9
                  parse_url

                  Ted G wrote:

                  > I tried those parse_url() etc methods but
                  > they eat at least everything (or my php-version)
                  > and does not understand any errors.

                  Should you pass something other than a URI to parse_url, it
                  only 'tries its best'. It does not validate the string.



                  Sadly, parse_url does not parse all URIs properly. Take
                  <http://host.invalid?query>, for example, which parse_url
                  thinks contains a host <host.invalid?query>. It doesn't; the
                  host is <host.invalid>, the path is empty though still
                  defined, and the query is <query>.

                  Instead, you can make use of the regular expression given in
                  RFC3986. Change

                  `^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?`

                  to

                  `^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?`

                  which separates a URI, giving you the scheme name, the
                  authority (the host of an HTTP URI) if present, the path,
                  the query if present, and the fragment identifier if
                  present. It always finds the scheme name and path, which
                  are always defined for every URI.
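
                  As a sketch of how the second expression might be applied with
                  preg_match(), assuming PHP 7.2 or later for the
                  PREG_UNMATCHED_AS_NULL flag. The function name splitUri() and the
                  array keys are made up for the example.

```php
<?php
// Hypothetical helper: split a URI reference into its five components
// using the non-capturing variant of the RFC 3986 expression.
// Components that are absent come back as null; an empty path is ''.
function splitUri(string $uri): array
{
    $re = '~^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?~';
    preg_match($re, $uri, $m, PREG_UNMATCHED_AS_NULL);
    return [
        'scheme'    => $m[1] ?? null,
        'authority' => $m[2] ?? null,
        'path'      => $m[3] ?? null,
        'query'     => $m[4] ?? null,
        'fragment'  => $m[5] ?? null,
    ];
}

$parts = splitUri('http://host.invalid?query');
var_dump($parts['authority']);  // string(12) "host.invalid"
var_dump($parts['path']);       // string(0) ""
var_dump($parts['query']);      // string(5) "query"
```

                  The expression only separates the components; it does not check
                  that each one uses legal characters, so any validation of the
                  path still has to be done on the 'path' element afterwards.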

                  --
                  Jock


                  • John Dunlop

                    #10
                    Re: Validating directory and file path

                    Daniel Tryba wrote:

                    > Chung Leong <chernyshevsky@hotmail.com> wrote:

                    > > That's not true. Having // in the path part of URL does not make it
                    > > syntactically incorrect. It's at the discretion of the server to interpret
                    > > what the path means. If it chooses to, the server can correct for the
                    > > obvious typo.
                    >
                    > URI RFC (2396) says otherwise, servers correcting this do that at their
                    > own peril:

                    (Note that 2396 was obsoleted by 3986 over a month ago. The
                    additions and modifications are listed in an appendix.)

                    > 3.3. Path Component

                    [ ... ]

                    Sorry, but I don't see anything in section 3.3 which
                    contradicts phpSt.Chung. He's right, as usual.

                    <http://host.invalid/path> and <http://host.invalid//path>
                    are both syntactically correct; in other words, they conform
                    to the rules 'URI' in RFC3986 and 'http_URL' in 2616. What
                    each one identifies, however, as Chung said, depends on the
                    server.

                    --
                    Jock
