Help with regex to validate URL format

  • Seb

    Help with regex to validate URL format

    Hi,

    I am trying to find the right regular expression which would only
    validate a URL with a given number of folders.


    Example:


    http://www.abc.com/folder/page.htm --Valid (4 slashes)


    http://www.abc.com/folder/subfolder/ --not valid (5 slashes)


    Basically, any URL not made of 4 slashes would be invalid.
    However, the URL:


    http://www.abc.com/folder/subfolder --would also be invalid


    Any ideas?


    Thanks,
    Seb

  • usenet+2004@john.dunlop.name

    #2
    Re: Help with regex to validate URL format

    Seb:
    > I am trying to find the right regular expression which would only
    > validate a URL with a given number of folders.

    URLs don't have, or refer to, folders. The parts in between the
    slashes in URL paths are called path segments, and they might or might
    not correspond to a part of a filesystem.

    > http://www.abc.com/folder/page.htm --Valid (4 slashes)
    > http://www.abc.com/folder/subfolder/ --not valid (5 slashes)
    > Basically, any URL not made of 4 slashes would be invalid.

    Count the number of slashes in the string.

    > http://www.abc.com/folder/subfolder --would also be invalid

    How would you distinguish that URL from your first example?

    Now you see the problems arising from the confusion of URL paths and
    filesystem paths.

    --
    Jock
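
A minimal sketch of the slash-counting suggestion, assuming PHP as used later in the thread. The helper name `count_slashes` is hypothetical. It also illustrates the objection raised above: the first and third example URLs both contain four slashes, so a count alone cannot tell them apart.

```php
<?php
// Hypothetical helper (not from the thread): counts every "/" in the string,
// including the two in "http://".
function count_slashes(string $url): int
{
    return substr_count($url, '/');
}

// The three example URLs from the original question.
$examples = [
    'http://www.abc.com/folder/page.htm',
    'http://www.abc.com/folder/subfolder/',
    'http://www.abc.com/folder/subfolder',
];

foreach ($examples as $url) {
    // Prints the slash count followed by the URL.
    echo count_slashes($url), '  ', $url, PHP_EOL;
}
```

The first and last URLs give the same count, so some further rule about the final path segment is needed.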


    • Seb

      #3
      Re: Help with regex to validate URL format

      Thanks.

      I guess all my actual files would have file extensions (.htm etc.) whereas
      a folder path segment wouldn't.

      The question was around which regular expression I can use to access
      something with 4 slashes, and which does not finish with ".***" or
      ".****".

      Thanks,
      Seb



      • Colin Fine

        #4
        Re: Help with regex to validate URL format

        Seb wrote:
        > I guess all my actual files would have file extensions (.htm etc.) whereas
        > a folder path segment wouldn't.
        >
        > The question was around which regular expression I can use to access
        > something with 4 slashes, and which does not finish with ".***" or
        > ".****".

        If that is indeed what you need (and assuming you mean 'does', not 'does
        not'),
        preg_match ('|^http://[^/]+/[^/]+/[^/.]+\.[^/.]{3,4}$|', $string)
        will do it.
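
A rough sketch of how a pattern of this shape behaves on the three URLs from the original question. The variable names are hypothetical. The dot before the extension is escaped as `\.` in `$escaped`; a bare `.` matches any single character, so the `$unescaped` variant is included only to show that it would also accept `http://www.abc.com/folder/subfolder`.

```php
<?php
// Pattern shape from the thread, with the extension dot escaped as "\.".
$escaped = '|^http://[^/]+/[^/]+/[^/.]+\.[^/.]{3,4}$|';
// Same pattern with a bare ".", which matches any single character.
$unescaped = '|^http://[^/]+/[^/]+/[^/.]+.[^/.]{3,4}$|';

// The three example URLs from the original question.
$urls = [
    'http://www.abc.com/folder/page.htm',
    'http://www.abc.com/folder/subfolder/',
    'http://www.abc.com/folder/subfolder',
];

foreach ($urls as $url) {
    // preg_match() returns 1 on a match and 0 on no match.
    printf(
        "escaped=%d unescaped=%d  %s\n",
        preg_match($escaped, $url),
        preg_match($unescaped, $url),
        $url
    );
}
```

Only the escaped pattern rejects the third URL, because "subfolder" contains no literal dot.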

        But you should be aware that there's nothing in the URL RFC that says
        you can't have a path like:



        Unless you have control over the format of valid URLs, you are not
        entitled to assume that xxx.yyy is the final part of a URL path.

        Incidentally, don't top-post if you don't want to bring down the Wrath
        of Jerry Stuckle. I've fixed yours.

        Colin
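
An alternative sketch that works on path segments rather than raw slashes, in line with the terminology used earlier in the thread. It relies on PHP's built-in `parse_url()` and `pathinfo()`; the function name `has_folder_and_file` is hypothetical, and the rule it encodes (exactly two segments, the last ending in a 3 or 4 character extension) is an assumption about what the original poster wants. It does not check the scheme or host, and the caveat above still applies: an extension-like suffix is only a convention of one's own site, not something URLs guarantee.

```php
<?php
// Hypothetical helper (not from the thread): accepts a URL only when its path
// has exactly two segments and the last one ends in a 3-4 character extension.
function has_folder_and_file(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path component, or the URL could not be parsed
    }

    // "/folder/page.htm" -> ["folder", "page.htm"]; a trailing slash adds an
    // empty third segment, so the count check below rejects it.
    $segments = explode('/', ltrim($path, '/'));
    if (count($segments) !== 2) {
        return false;
    }

    // pathinfo() returns an empty string when the segment has no extension.
    $extension = pathinfo($segments[1], PATHINFO_EXTENSION);
    $length = strlen($extension);

    return $length >= 3 && $length <= 4;
}

var_dump(has_folder_and_file('http://www.abc.com/folder/page.htm'));
var_dump(has_folder_and_file('http://www.abc.com/folder/subfolder/'));
var_dump(has_folder_and_file('http://www.abc.com/folder/subfolder'));
```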


        • Jerry Stuckle

          #5
          Re: Help with regex to validate URL format

          Colin Fine wrote:
          > Incidentally, don't top-post if you don't want to bring down the Wrath
          > of Jerry Stuckle. I've fixed yours.

          Colin,

          What do you mean "wrath"? I ask very politely, and generally only when
          answering a question.

          --
          ==================
          Remove the "x" from my email address
          Jerry Stuckle
          JDS Computer Training Corp.
          jstucklex@attglobal.net
          ==================


          • Seb

            #6
            Re: Help with regex to validate URL format

            Thanks!
