URLs and preg_match (again)

  • Justin Koivisto

    URLs and preg_match (again)

    OK, I found a thread from a while back (Oct 9, 2002) that helped me
    arrive at this pattern:

    `(((f|ht)tp://)(([^@:]+)([^@]+)?@)?([^:\/])+(:\d+)?(\/[^\s]+)?)`i

    OK, all is well and good with this until the URL is used at the end of a
    sentence. I am assuming that I will need a negative lookahead somehow,
    but I just can't wrap my mind around this one...

    Want to match only the URL...

    $string="... http://www.example.com.";
    $string="... http://www.example.com/.";
    $string="... http://www.example.com/page1.html?";
    $string="... http://www.example.com/info.php?id=4!";
    etc...

    The pattern above is pulling the last character from each string when I
    don't want it. Unfortunately, the URL can be _anything_ valid, and I
    don't have control of how it will be input. Can anyone help with this?

    TIA

    --
    Justin Koivisto - spam@koivi.com
    PHP POSTERS: Please use comp.lang.php for PHP related questions,
    alt.php* groups are not recommended.
    SEO Competition League: http://seo.koivi.com/
  • R. Rajesh Jeba Anbiah

    #2
    Re: URLs and preg_match (again)

    Justin Koivisto <spam@koivi.com> wrote in message news:<eoEec.503$m3.20915@news7.onvoy.net>...
    > OK, I found a thread from a while back (Oct 9, 2002) that helped me
    > arrive at this pattern:
    >
    > `(((f|ht)tp://)(([^@:]+)([^@]+)?@)?([^:\/])+(:\d+)?(\/[^\s]+)?)`i
    >
    > OK, all is well and good with this until the URL is used at the end of a
    > sentence. I am assuming that I will need a negative lookahead somehow,
    > but I just can't wrap my mind around this one...
    >
    > Want to match only the URL...
    >
    > $string="... http://www.example.com.";
    > $string="... http://www.example.com/.";
    > $string="... http://www.example.com/page1.html?";
    > $string="... http://www.example.com/info.php?id=4!";
    > etc...
    >
    > The pattern above is pulling the last character from each string when I
    > don't want it. Unfortunately, the URL can be _anything_ valid, and I
    > don't have control of how it will be input. Can anyone help with this?


    It is somewhat shocking to see even experts like Justin are lost in
    regular expressions.

    I must admit that I'm still poor at regular expressions even
    though I use two good tools: <http://www.weitz.de/regex-coach/> and
    <http://laurent.riesterer.free.fr/regexp/>

    I think, for your requirement this will work fine: (stolen from
    <http://groups.google.com/groups?selm=4d19834f.0303070418.506bb9a5%40posting.google.com>
    ;-) )

    (((http|ftp|https):\/\/[\w]+(.[\w]+)([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?))|(www.)|(ftp.)

    --
    http://www.sendmetoindia.com - Send Me to India!
    Email: rrjanbiah-at-Y!com


    • John Dunlop

      #3
      Re: URLs and preg_match (again)

      Justin Koivisto wrote:

      > OK, I found a thread that help out from a while back (Oct 9, 2002)

      If it was http://groups.google.com/groups?th=a48518c6e18574d9 , I
      caution you against blindly following advice from that thread. It's
      clear from Jeff Donnici's original article he wanted to *match* URIs,
      not *parse* them. Why on earth this curiosity was put forward I don't
      know (note, in particular, the delimiter appears unescaped in the
      pattern proper -- a sure sign of insufficient testing):

      | /((f|ht)tp://)(([^@:]+)([^@]+)?@)?([^:\/])+(:\d+)?(\/[^\s]+)?/

      You evidently noticed and mended the delimiter:

      > `(((f|ht)tp://)(([^@:]+)([^@]+)?@)?([^:\/])+(:\d+)?(\/[^\s]+)?)`i
      >
      > OK, all is well and good with this

      Consider the string: "http:// Lorem ipsum ... est laborum".

      Negative character classes match every character not in the class. In
      this case, that includes whitespace characters, which aren't allowed
      in URIs.
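A quick way to see this, as a minimal sketch: the pattern is the one from the original post, and the variable names are only for illustration. Because `[^:\/]` accepts spaces, the "host" part runs straight through ordinary prose.

```php
<?php
// Pattern copied from the original post, unchanged.
$pattern = '`(((f|ht)tp://)(([^@:]+)([^@]+)?@)?([^:\/])+(:\d+)?(\/[^\s]+)?)`i';
$subject = 'http:// Lorem ipsum ... est laborum';

// The negated class [^:\/] matches whitespace and letters alike,
// so the whole sentence is reported as one "URL".
preg_match($pattern, $subject, $m);
echo $m[0], "\n"; // prints the entire subject string
```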

      > until the URL is used at the end of a sentance.

      It's not just sentence terminators that rattle that regular
      expression: intra sentence spacing, words and punctuation all wreak
      havoc too. It might be OK for *parsing* URIs [1], I don't know, I
      didn't examine it and I'm not conversant with FTP URI syntax; but its
      URI *matching* rates very poorly.

      > I am assuming that I will need a negative lookahead somehow,
      > but I just can't wrap my mind around this one...
      >
      > Want to match only the URL...

      That depends on the URL, obviously; regular expressions, although
      powerful in some senses, afford no mind-reading capabilities.

      > $string="... http://www.example.com.";

      Any URI parser would recognise <http://www.example.com.> as a URI: an
      HTTP URI with a complete, or absolute, domain name of
      "www.example.com." (including the final period; the root label).

      > $string="... http://www.example.com/.";

      Again, an HTTP URI, this time with a path segment of ".". The final
      period does not have any special meaning here; it's simply a path
      segment.

      > $string="... http://www.example.com/page1.html?";

      Another HTTP URI, but with a path segment of "page1.html" and an empty
      query component.

      > $string="... http://www.example.com/info.php?id=4!";

      Yet another HTTP URI, this time with a path segment of "info.php" and
      a query component of "id=4!".

      > The pattern above is pulling the last character from each string when I
      > don't want it.

      So you know what the URIs are beforehand, right?

      > Unfortunately, the URL can be _anything_ valid, and I don't have control
      > of how it will be input.

      I'm not sure what you mean.

      > Can anyone help with this?

      We'd need more information to offer any help. You might be interested
      in Appendix E of RFC2396, which discusses recommended ways to delimit
      URIs. I'd like to say, in passing, that it makes no mention of the
      increasing use of parentheses ("(" and ")") to delimit URIs; the right
      parenthesis is allowed in a path segment, so the URI

      http://www.php.net/manual/en/)

      results in a 404 (at least it did at the time I wrote this,
      20040414T0703Z).

      As an example of how involved URI *matching* can be, here's a regular
      expression (PCRE) to match HTTP URIs (please excuse any line wrap):

      `(?:(?i)http)://(?:(?:(?:(?:[a-z\d][a-z\d-]*[a-z\d]|[a-z\d])\.)*(?:[a-
      z][a-z\d-]*[a-z\d]|[a-
      z])\.?)|(?:\d+\.\d+\.\d+\.\d+))(?::\d*)?(?:(?:/(?:(?:(?:[a-
      z\d_.!~*\'()\-:@&=+$,]|%[\da-f]{2})*(?:;(?:[a-z\d_.!~*\'()\-
      :@&=+$,]|%[\da-f]{2})*)*)(?:/(?:(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-
      f]{2})*(?:;(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-
      f]{2})*)*))*))(?:\?(?:[;/?:@&=+$,a-z\d\-_.!~*\'()]|%[\da-f])*)?)?`i

      Splitting it into more manageable chunks:

      $scheme = '(?:(?i)http)';
      $domainlabel = '(?:[a-z\d][a-z\d-]*[a-z\d]|[a-z\d])';
      $toplabel = '(?:[a-z][a-z\d-]*[a-z\d]|[a-z])';
      $hostname = "(?:(?:$domainlabel\.)*$toplabel\.?)";
      $ipv4address = '(?:\d+\.\d+\.\d+\.\d+)';
      $host = "(?:$hostname|$ipv4address)";
      $port = '(?::\d*)';
      $pchar = '(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-f]{2})';
      $param = "$pchar*";
      $segment = "(?:$pchar*(?:;$param)*)";
      $path_segments = "(?:$segment(?:/$segment)*)";
      $abspath = "(?:/$path_segments)";
      $query = '(?:\?(?:[;/?:@&=+$,a-z\d\-_.!~*\'()]|%[\da-f])*)';
      $http_uri = "$scheme://$host$port?(?:$abspath$query?)?";
      $pattern = "`$http_uri`i";

      preg_match_all($pattern, $subject, $matches);
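To make the point about trailing punctuation concrete, here is a sketch that assembles the same pieces (with the spacing artifacts removed) and runs them over the four test strings from the original question. The function names `http_uri_pattern()` and `first_http_uri()` are made up for this illustration. Since ".", "?" and "!" are all legal where they appear, the pattern keeps them.

```php
<?php
// Assemble the HTTP URI pattern from the sub-patterns given in the post.
function http_uri_pattern(): string
{
    $scheme        = '(?:(?i)http)';
    $domainlabel   = '(?:[a-z\d][a-z\d-]*[a-z\d]|[a-z\d])';
    $toplabel      = '(?:[a-z][a-z\d-]*[a-z\d]|[a-z])';
    $hostname      = "(?:(?:$domainlabel\.)*$toplabel\.?)";
    $ipv4address   = '(?:\d+\.\d+\.\d+\.\d+)';
    $host          = "(?:$hostname|$ipv4address)";
    $port          = '(?::\d*)';
    $pchar         = '(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-f]{2})';
    $param         = "$pchar*";
    $segment       = "(?:$pchar*(?:;$param)*)";
    $path_segments = "(?:$segment(?:/$segment)*)";
    $abspath       = "(?:/$path_segments)";
    $query         = '(?:\?(?:[;/?:@&=+$,a-z\d\-_.!~*\'()]|%[\da-f])*)';
    $http_uri      = "$scheme://$host$port?(?:$abspath$query?)?";
    return "`$http_uri`i";
}

// Return the first HTTP URI found in $text, or null if there is none.
function first_http_uri(string $text): ?string
{
    return preg_match(http_uri_pattern(), $text, $m) === 1 ? $m[0] : null;
}

$tests = [
    '... http://www.example.com.',
    '... http://www.example.com/.',
    '... http://www.example.com/page1.html?',
    '... http://www.example.com/info.php?id=4!',
];
foreach ($tests as $s) {
    // Each line printed still ends with the punctuation character.
    echo first_http_uri($s), "\n";
}
```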

      You might care to omit the case-insensitive internal option affecting
      the scheme name. Scheme names should be lowercase, but, "[f]or
      resiliency, programs interpreting URI should treat upper case letters
      as equivalent to lower case" (RFC2396, sec. 3.1). Technically, a URI
      with an uppercase scheme name isn't an absolute URI.

      Refs.:

      RFC2396, "Uniform Resource Identifiers (URI): Generic Syntax"

      RFC2616, "Hypertext Transfer Protocol -- HTTP/1.1", section 3.2

      RFC1738, "Uniform Resource Locators (URL)", section 3.2


      [1] You'll have read the example POSIX regular expression in RFC2396
      for parsing URI references. From Appendix B:

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
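For comparison, a short sketch of *parsing* with that Appendix B expression in PHP. The function name `split_uri_reference()` and the array keys are illustrative choices; the group numbers (2, 4, 5, 7, 9) are the ones the RFC assigns to scheme, authority, path, query and fragment.

```php
<?php
// Split a URI reference into its five components using the
// RFC2396 Appendix B expression. Unmatched parts come back as ''.
function split_uri_reference(string $uri): array
{
    $re = '`^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?`';
    preg_match($re, $uri, $m);
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

$parts = split_uri_reference('http://www.example.com/info.php?id=4!');
echo $parts['query'], "\n"; // id=4!
```

Note that the parser happily treats the "!" as part of the query, which is exactly the ambiguity under discussion.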



      --
      Jock


      • R. Rajesh Jeba Anbiah

        #4
        Re: URLs and preg_match (again)

        John Dunlop <usenet+2004@john.dunlop.name> wrote in message news:<MPG.1ae6f892abcbfae69896e8@News.Individual.NET>...
        > Justin Koivisto wrote:
        <snip>
        > As an example of how involved URI *matching* can be, here's a regular
        > expression (PCRE) to match HTTP URIs (please excuse any line wrap):
        >
        > `(?:(?i)http)://(?:(?:(?:(?:[a-z\d][a-z\d-]*[a-z\d]|[a-z\d])\.)*(?:[a-
        > z][a-z\d-]*[a-z\d]|[a-
        > z])\.?)|(?:\d+\.\d+\.\d+\.\d+))(?::\d*)?(?:(?:/(?:(?:(?:[a-
        > z\d_.!~*\'()\-:@&=+$,]|%[\da-f]{2})*(?:;(?:[a-z\d_.!~*\'()\-
        > :@&=+$,]|%[\da-f]{2})*)*)(?:/(?:(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-
        > f]{2})*(?:;(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-
        > f]{2})*)*))*))(?:\?(?:[;/?:@&=+$,a-z\d\-_.!~*\'()]|%[\da-f])*)?)?`i
        <snip>

        WOW!! I wonder how you manage to hold whole RFCs in your brain...

        --
        http://www.sendmetoindia.com - Send Me to India!
        Email: rrjanbiah-at-Y!com


        • Justin Koivisto

          #5
          Re: URLs and preg_match (again)

          R. Rajesh Jeba Anbiah wrote:

          > Justin Koivisto <spam@koivi.com> wrote in message news:<eoEec.503$m3.20915@news7.onvoy.net>...
          >
          >>OK, I found a thread from a while back (Oct 9, 2002) that helped me
          >>arrive at this pattern:
          >>
          >>`(((f|ht)tp://)(([^@:]+)([^@]+)?@)?([^:\/])+(:\d+)?(\/[^\s]+)?)`i
          >>
          >>OK, all is well and good with this until the URL is used at the end of a
          >>sentence. I am assuming that I will need a negative lookahead somehow,
          >>but I just can't wrap my mind around this one...
          >>
          >>Want to match only the URL...
          >>
          >>$string="... http://www.example.com.";
          >>$string="... http://www.example.com/.";
          >>$string="... http://www.example.com/page1.html?";
          >>$string="... http://www.example.com/info.php?id=4!";
          >>etc...
          >>
          >>The pattern above is pulling the last character from each string when I
          >>don't want it. Unfortunately, the URL can be _anything_ valid, and I
          >>don't have control of how it will be input. Can anyone help with this?
          >
          > It is somewhat shocking to see even experts like Justin are lost in
          > regular expressions.

          I'm considered an expert? THANKS FOR THE COMPLIMENT! ;) I'm OK with
          regex, but I have only started really using perl regex in the last year,
          so I have a way to go to learn about it.

          > I must admit the fact that I'm still poor in regular expression even
          > though I use two good tools: <http://www.weitz.de/regex-coach/> and
          > <http://laurent.riesterer.free.fr/regexp/>

          Heh, I think I will have to check those out... thanks for the links.

          > I think, for your requirement this will work fine: (stolen from
          > <http://groups.google.com/groups?selm=4d19834f.0303070418.506bb9a5%40posting.google.com>
          > ;-) )
          >
          > (((http|ftp|https):\/\/[\w]+(.[\w]+)([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?))|(www.)|(ftp.)

          hmm... Never even thought of looking in a VB newsgroup...

          I've pasted it in, and it worked for the 2 available tests I had in
          place. I'll let you know if there are any problems with it, and post
          whatever I can come up with for fixes.

          --
          Justin Koivisto - spam@koivi.com
          PHP POSTERS: Please use comp.lang.php for PHP related questions,
          alt.php* groups are not recommended.
          SEO Competition League: http://seo.koivi.com/


          • Justin Koivisto

            #6
            Re: URLs and preg_match (again)

            John Dunlop wrote:
            >
            > We'd need more information to offer any help. You might be interested
            > in Appendix E of RFC2396

            Basically, what I am doing is making hyperlinks out of urls typed into a
            text area. So I am trying to match urls, but need to parse for
            punctuation after them.
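One common way to approach that task, sketched under stated assumptions: match loosely up to the next whitespace, then peel sentence punctuation off the end before building the link. This is a heuristic rather than anything the RFC prescribes, the function name `linkify()` is invented for the example, and a URI that legitimately ends in one of the trimmed characters will be cut short.

```php
<?php
// Turn bare http/https/ftp URLs in plain text into HTML links.
// Trailing sentence punctuation is assumed NOT to belong to the URL.
function linkify(string $text): string
{
    $pattern = '`\b(?:https?|ftp)://[^\s<>"]+`i';
    return preg_replace_callback($pattern, function (array $m): string {
        $url  = rtrim($m[0], '.,;:!?');          // heuristic trim
        $tail = substr($m[0], strlen($url));     // punctuation put back after the link
        $safe = htmlspecialchars($url, ENT_QUOTES);
        return '<a href="' . $safe . '">' . $safe . '</a>' . $tail;
    }, $text);
}

echo linkify('Go to http://www.example.com/info.php?id=4! Now.'), "\n";
```

The surrounding text is passed through untouched here; in a real page it would also need escaping before output.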

            ....

            > As an example of how involved URI *matching* can be, here's a regular
            > expression (PCRE) to match HTTP URIs (please excuse any line wrap):
            >
            > $scheme = '(?:(?i)http)';
            > $domainlabel = '(?:[a-z\d][a-z\d-]*[a-z\d]|[a-z\d])';
            > $toplabel = '(?:[a-z][a-z\d-]*[a-z\d]|[a-z])';
            > $hostname = "(?:(?:$domainlabel\.)*$toplabel\.?)";
            > $ipv4address = '(?:\d+\.\d+\.\d+\.\d+)';
            > $host = "(?:$hostname|$ipv4address)";
            > $port = '(?::\d*)';
            > $pchar = '(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-f]{2})';
            > $param = "$pchar*";
            > $segment = "(?:$pchar*(?:;$param)*)";
            > $path_segments = "(?:$segment(?:/$segment)*)";
            > $abspath = "(?:/$path_segments)";
            > $query = '(?:\?(?:[;/?:@&=+$,a-z\d\-_.!~*\'()]|%[\da-f])*)';
            > $http_uri = "$scheme://$host$port?(?:$abspath$query?)?";
            > $pattern = "`$http_uri`i";
            >
            > preg_match_all($pattern,$subject,$matches)

            Honestly, I've never actually read an RFC dealing with internet
            protocols. (The only one I read was on the RTF file format, and gave up
            on that about 1/2 way through.)

            Interesting though that the above pattern makes some odd results:
            http://waf.rangenet.com/Edit1.php

            I think I will have to play some more with the patterns that were posted
            in this thread and see what I can make of them.

            --
            Justin Koivisto - spam@koivi.com
            PHP POSTERS: Please use comp.lang.php for PHP related questions,
            alt.php* groups are not recommended.
            SEO Competition League: http://seo.koivi.com/


            • Justin Koivisto

              #7
              Re: URLs and preg_match (again)

              R. Rajesh Jeba Anbiah wrote:
              > (((http|ftp|https):\/\/[\w]+(.[\w]+)([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?))|(www.)|(ftp.)

              This was nearly what I was looking for. I ended up editing a bit to come
              up with:
              (((https?|ftp)://[\w]+(\.[\w]+)([\w.,@?^=%&:/~\+#-]*[^?.,!:;\s])?))

              Which seems to be working right now. I'm a little suspicious of the
              "[^?.,!:;\s]" part - I assumed that I'd need some kind of look-ahead to
              do it. Anyway, here's the results:



              The last 7 urls had a space appended to the end (see source), so the
              results are what I was expecting to get.
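A small sketch for checking this edited pattern against the original test strings. The backtick delimiters and the helper name `match_edited()` are assumptions, since the post shows only the bare expression. The last case shows why the `[^?.,!:;\s]` tail deserves suspicion: it accepts any character that is not in its exclusion list, including ones outside the URL character class.

```php
<?php
// The edited pattern from the post, wrapped in backtick delimiters (assumed).
function match_edited(string $text): ?string
{
    $re = '`(((https?|ftp)://[\w]+(\.[\w]+)([\w.,@?^=%&:/~\+#-]*[^?.,!:;\s])?))`';
    return preg_match($re, $text, $m) === 1 ? $m[0] : null;
}

echo match_edited('... http://www.example.com.'), "\n";               // http://www.example.com
echo match_edited('... http://www.example.com/.'), "\n";              // http://www.example.com/
echo match_edited('... http://www.example.com/page1.html?'), "\n";    // http://www.example.com/page1.html
echo match_edited('... http://www.example.com/info.php?id=4!'), "\n"; // http://www.example.com/info.php?id=4

// The final negated class also swallows a closing parenthesis:
echo match_edited('(see http://www.example.com/)'), "\n";             // http://www.example.com/)
```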

              Thanks to those who replied!

              --
              Justin Koivisto - spam@koivi.com
              PHP POSTERS: Please use comp.lang.php for PHP related questions,
              alt.php* groups are not recommended.
              SEO Competition League: http://seo.koivi.com/


              • John Dunlop

                #8
                Re: URLs and preg_match (again)

                Justin Koivisto wrote:

                > John Dunlop wrote:

                [ ... ]

                > > $http_uri = "$scheme://$host$port?(?:$abspath$query?)?";

                [ ... ]

                > Interesting though that the above pattern makes some odd results:
                > http://waf.rangenet.com/Edit1.php

                The pattern on that page isn't identical to my offering. Compare

                $http_uri = "$scheme://$host$port?(?:$abspath$query?)?";

                and

                | $http_uri = $scheme.'://'.$host.$port.'?(?:'.$abspath.$query.'.?)?';
                ----------------------------------------------------------------^

                That final period significantly alters the match. If the URI contains
                a path, it must now end with a query component (or, at least, a "?"
                followed by an empty query component), for $query is no longer
                optional. Therefore, when applied to <http://domain.example/foo>, the
                pattern matches <http://domain.example> only.

                Fix it by removing the offending period.
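The effect is easy to reproduce with much smaller pieces. In this sketch the sub-patterns are deliberately simplified stand-ins (not the RFC-derived ones), chosen only to show how `$query.?` differs from `$query?`.

```php
<?php
// Simplified stand-ins for the real sub-patterns; illustration only.
$scheme  = 'http';
$host    = '[a-z\d.-]+';
$port    = '(?::\d*)';
$abspath = '(?:/[a-z\d._/-]*)';
$query   = '(?:\?[a-z\d=&]*)';

// Intended: the query is optional inside the optional path group.
$intended = "`$scheme://$host$port?(?:$abspath$query?)?`i";
// Altered: the stray period makes the query mandatory ("any char" is what became optional).
$altered  = "`$scheme://$host$port?(?:$abspath$query.?)?`i";

$uri = 'http://domain.example/foo';
preg_match($intended, $uri, $a);
preg_match($altered, $uri, $b);

echo $a[0], "\n"; // http://domain.example/foo
echo $b[0], "\n"; // http://domain.example
```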

                --
                Jock


                • Chung Leong

                  #9
                  Re: URLs and preg_match (again)

                  "R. Rajesh Jeba Anbiah" <ng4rrjanbiah@rediffmail.com> wrote in message
                  news:abc4d8b8.0404132035.64a53919@posting.google.com...
                  > Justin Koivisto <spam@koivi.com> wrote in message
                  > news:<eoEec.503$m3.20915@news7.onvoy.net>...
                  > It is somewhat shocking to see even experts like Justin are lost in
                  > regular expressions.
                  >
                  > I must admit the fact that I'm still poor in regular expression even
                  > though I use two good tools: <http://www.weitz.de/regex-coach/> and
                  > <http://laurent.riesterer.free.fr/regexp/>

                  The camel book sits on my desk even though I'm programming in PHP.

                  What is needed here is the magical \b meta character:

                  $re = '/(((https?|ftp):\/\/[\w]+(\.[\w]+)([^\s]*)?))(\/|\b)/i';
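A sketch of how this pattern behaves on the test strings from the original question; the helper name `match_with_boundary()` is made up for the example. The greedy `[^\s]*` is forced to back off until the match ends either on a "/" or at a word boundary, which drops trailing punctuation. The last case shows the flip side: a URL that genuinely ends in a non-word character such as "=" is shortened too.

```php
<?php
// The \b-based pattern from the post (spacing artifact removed).
function match_with_boundary(string $text): ?string
{
    $re = '/(((https?|ftp):\/\/[\w]+(\.[\w]+)([^\s]*)?))(\/|\b)/i';
    return preg_match($re, $text, $m) === 1 ? $m[0] : null;
}

echo match_with_boundary('... http://www.example.com.'), "\n";               // http://www.example.com
echo match_with_boundary('... http://www.example.com/.'), "\n";              // http://www.example.com/
echo match_with_boundary('... http://www.example.com/page1.html?'), "\n";    // http://www.example.com/page1.html
echo match_with_boundary('... http://www.example.com/info.php?id=4!'), "\n"; // http://www.example.com/info.php?id=4

// Side effect: a trailing "=" that belongs to the URL is also dropped.
echo match_with_boundary('http://www.example.com/search?q='), "\n";          // http://www.example.com/search?q
```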





                  • R. Rajesh Jeba Anbiah

                    #10
                    Re: URLs and preg_match (again)

                    Justin Koivisto <spam@koivi.com> wrote in message news:<XScfc.548$m3.23401@news7.onvoy.net>...
                    > John Dunlop wrote:
                    <snip>
                    >
                    > Interesting though that the above pattern makes some odd results:
                    > http://waf.rangenet.com/Edit1.php
                    >
                    > I think I will have to play some more with the patterns that were posted
                    > in this thread and see what I can make of them.

                    IMHO, it is better to stay with RFC standards as John says. As John
                    said, I think, the problem might be within splitting the pattern. I
                    have tried all your urls with "The Regex Coach"
                    <http://www.weitz.de/regex-coach/> with the following pattern:

                    (?:(?i)http)://(?:(?:(?:(?:[a-z\d][a-z\d-]*[a-z\d]|[a-z\d])\.)*(?:[a-z][a-z\d-]*[a-z\d]|[a-z])\.?)|(?:\d+\.\d+\.\d+\.\d+))(?::\d*)?(?:(?:/(?:(?:(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-f]{2})*(?:;(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-f]{2})*)*)(?:/(?:(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-f]{2})*(?:;(?:[a-z\d_.!~*\'()\-:@&=+$,]|%[\da-f]{2})*)*))*))(?:\?(?:[;/?:@&=+$,a-z\d\-_.!~*\'()]|%[\da-f])*)?)?

                    It is working fine for all your urls *except* for domains with "_"
                    (underscores), e.g. http://www.ex_ample.com/ . If I'm right, rejecting
                    those is also correct. I hope John will comment on it.

                    --
                    http://www.sendmetoindia.com - Send Me to India!
                    Email: rrjanbiah-at-Y!com


                    • Justin Koivisto

                      #11
                      Re: URLs and preg_match (again)

                      John Dunlop wrote:
                      > The pattern on that page isn't identical to my offering. Compare
                      >
                      > $http_uri = "$scheme://$host$port?(?:$abspath$query?)?";
                      >
                      > and
                      >
                      > | $http_uri = $scheme.'://'.$host.$port.'?(?:'.$abspath.$query.'.?)?';
                      > ----------------------------------------------------------------^
                      >
                      > That final period significantly alters the match. If the URI contains
                      > a path, it must now end with a query component (or, at least, a "?"
                      > followed by an empty query component), for $query is no longer
                      > optional. Therefore, when applied to <http://domain.example/foo>, the
                      > pattern matches <http://domain.example> only.
                      >
                      > Fix it by removing the offending period.

                      My mistake... I have now removed the extra period. However, it still
                      picks up the ending punctuation like "!" and "?" for strings like:

                      http://www.example.com/!
                      http://www.example.com/dirname/file.html!
                      http://www.example.com/dirname/file.html?query=My+Test+String&q2=search!

                      I can see that the last one is a bit misleading since that does look
                      like part of the query string.

                      Another one that I noticed is:
                      http://www.ex_ample.com

                      Are underscores not allowed in a domain name? Is this part of the RFC,
                      or just an oversight in the pattern?

                      --
                      Justin Koivisto - spam@koivi.com
                      PHP POSTERS: Please use comp.lang.php for PHP related questions,
                      alt.php* groups are not recommended.
                      SEO Competition League: http://seo.koivi.com/


                      • Justin Koivisto

                        #12
                        Re: URLs and preg_match (again)

                        Chung Leong wrote:

                        > "R. Rajesh Jeba Anbiah" <ng4rrjanbiah@rediffmail.com> wrote in message
                        > news:abc4d8b8.0404132035.64a53919@posting.google.com...
                        >
                        >>Justin Koivisto <spam@koivi.com> wrote in message
                        >>news:<eoEec.503$m3.20915@news7.onvoy.net>...
                        >
                        >> It is somewhat shocking to see even experts like Justin are lost in
                        >>regular expressions.
                        >>
                        >> I must admit the fact that I'm still poor in regular expression even
                        >>though I use two good tools: <http://www.weitz.de/regex-coach/> and
                        >><http://laurent.riesterer.free.fr/regexp/>
                        >
                        > The camel book sits on my desk even though I'm programming in PHP.
                        >
                        > What is needed here is the magical \b meta character:
                        >
                        > $re = '/(((https?|ftp):\/\/[\w]+(\.[\w]+)([^\s]*)?))(\/|\b)/i';

                        I had tried using the \b metacharacter, but I see that I didn't use
                        (\/|\b), and this is likely why it didn't work for me. This shortened
                        pattern seems to be doing the trick so far.

                        Thanks to all who have contributed!

                        --
                        Justin Koivisto - spam@koivi.com
                        PHP POSTERS: Please use comp.lang.php for PHP related questions,
                        alt.php* groups are not recommended.
                        SEO Competition League: http://seo.koivi.com/


                        • R. Rajesh Jeba Anbiah

                          #13
                          Re: URLs and preg_match (again)

                          "Chung Leong" <chernyshevsky@hotmail.com> wrote in message news:<X6OdnX5-7tTsQeDdRVn_iw@comcast.com>...
                          > "R. Rajesh Jeba Anbiah" <ng4rrjanbiah@rediffmail.com> wrote in message
                          > news:abc4d8b8.0404132035.64a53919@posting.google.com...

                          <snip>
                          > The camel book sits on my desk even though I'm programming in PHP.

                          Yes, the camel book is unbeatable. But most of the time I refer
                          to this quick table for my simpler tasks:
                          <http://www.phpedit.net/products/PHPEdit/manual/en/module.FindRegExp.php#Lv1_7>

                          --
                          http://www.sendmetoindia.com - Send Me to India!
                          Email: rrjanbiah-at-Y!com


                          • John Dunlop

                            #14
                            Re: URLs and preg_match (again)

                            R. Rajesh Jeba Anbiah wrote:

                            [ ... ]

                            > It is working fine for all your urls *except* for domains with "_"
                            > (underscores) eg. http://www.ex_ample.com/ .

                            Hostnames cannot contain underscores; they may only contain letters
                            ([a-zA-Z]), numbers ([0-9]), hyphens ([-]) and periods ([.]). Further
                            restrictions apply too. RFC952, as updated by RFC1123, specifies the
                            syntax of hostnames; the definition is: one or more names (a letter or
                            digit followed by any number of letters, digits or hyphens and ending
                            with a letter or digit) separated by periods.
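That definition translates fairly directly into a check. The following is a minimal sketch of the rule exactly as worded above; the function name `is_valid_hostname()` is invented here, and length limits are deliberately left out because they are not part of the description.

```php
<?php
// One label: a letter or digit, optionally followed by letters, digits
// or hyphens, and ending with a letter or digit. Labels are joined by periods.
function is_valid_hostname(string $host): bool
{
    $label = '[a-z\d](?:[a-z\d-]*[a-z\d])?';
    return preg_match("`^$label(?:\.$label)*\z`i", $host) === 1;
}

var_dump(is_valid_hostname('www.example.com'));  // bool(true)
var_dump(is_valid_hostname('www.ex_ample.com')); // bool(false): underscore
var_dump(is_valid_hostname('example-.com'));     // bool(false): label ends with a hyphen
```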




                            --
                            Jock


                            • John Dunlop

                              #15
                              Re: URLs and preg_match (again)

                              Justin Koivisto wrote:

                              > However, still picks up the ending punctuation like "!" and "?" for
                              > strings like:
                              > http://www.example.com/?
                              > http://www.example.com/!
                              > http://www.example.com/dirname/file.html!
                              > http://www.example.com/dirname/file.html?query=My+Test+String&q2=search!

                              How do you know the "ending punctuation" isn't part of the URI?

                              --
                              Jock
