What regex do I need to use to make this into a hyperlink ?

  • jeddiki
    Contributor
    • Jan 2009
    • 290

    What regex do I need to use to make this into a hyperlink ?

    Hi

    In my text output file that I will display on my webpage
    I have this included:

    "Located At: Http://pggg-online.org/blog.htm "

    and sometimes :

    "found here www.example.org/james.htm "

    What is the best way to change these into real
    working links eg:

    <a href='http://www.example.org/james.htm' target='_blank' >www.example.org/james.htm</a>
    I guess that the best way is with a regular expression ?

    Of course I have to make sure that I don't double up the
    http:// if it is already given

    Thanks for any advice
  • dlite922
    Recognized Expert Top Contributor
    • Dec 2007
    • 1586

    #2
    Here's what I have so far:

    Code:
    (http://)?(([-A-Z0-9]+\.)?[-A-Z0-9]+\.[A-Z]{2,10})/?([^\s])+(?=\s)
    but if you check Google, there's plenty of URL regexps you can modify for your own use.


    Dan


    • kovik
      Recognized Expert Top Contributor
      • Jun 2007
      • 1044

      #3
      Why is the slash after the domain optional, but the rest of the stuff after it is not? o.O


      • dlite922
        Recognized Expert Top Contributor
        • Dec 2007
        • 1586

        #4
        Originally posted by kovik
        Why is the slash after the domain optional, but the rest of the stuff after it is not? o.O
        not sure what you mean, that regex will match example.com

        and these too:

        something.example.com
        http://what.how.com/wow.html?#$%^&*something%20%30else

        Not tested and it does need some work. It was given to you as a start, not as a final solution.




        Dan


        • jeddiki
          Contributor
          • Jan 2009
          • 290

          #5
          Hi
          Thanks for suggesting that I Google it !!

          ( doh ! - (thats to myself !!) really, thanks .... it helped )

          I found this, which I think is what I need:

          Code:
          //PHP Example: Automatically link URL's inside text.
          
          $text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
          Can anyone spot any problems with using this ?


          PS - I saw this comment:
          Great regex, thanks ! Small thing : the '-' (dash) is missing - URL like this fails (http://web5.uottawa.ca/admingov/regl...-methodes.html)
          Where should I add the "-" ?

          Thanks


          • Atli
            Recognized Expert Expert
            • Nov 2006
            • 5062

            #6
            I would guess the comment means it should be:
            Code:
            $text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
            This fixes the problem with the URL posted in the comment, at least.


            • jeddiki
              Contributor
              • Jan 2009
              • 290

              #7
              Hi again,

              I tried that regex in my script but it is not doing anything.

              This is my script:

              It is taking the product details out of a database and displaying them.
              The description often contains a url

              Code:
              while($row = mysql_fetch_assoc($result)){
                 extract($row);
                $descript = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.\-]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $descrip);
                $extr = $totearn-$earn;				
                $disp1 = "<b><span style = \"color:maroon;\">$Rctr) $cat</span><br><span style = \"color:darkblue;\">$title</span> ID: $id</b><br>";
                $disp2 = "$descript<br>";
                $disp3 = "<b>$/sale: $earn, +/mth: $rebill, Pop: $pop, Gravity: $grav, Comm%: $comm, Refers: $refer, Total Earn: $totearn</b>";
                echo "<div style=\"width: 800px; text-align: left;\">
                <span>$disp1</span>
                <span>$disp2</span>
                <span>$disp3</span>
               <br><br>
               </div>";
              }
              The result of this can be seen live here:
              script-test

              When you see the form just click on the "Analyze Clickbank Now" button
              and you will see a list of results.

              The results have urls - but none of them are converted :(

              Have I done something wrong ?

              PS

              Just a thought - does that regex only work on https urls?
              If so, how do I make it work on all ?


              • kovik
                Recognized Expert Top Contributor
                • Jun 2007
                • 1044

                #8
                Originally posted by dlite922
                not sure what you mean, that regex will match example.com

                and these too:

                something.example.com
                http://what.how.com/wow.html?#$%^&*something%20%30else

                Not tested and it does need some work. It was given to you as a start, not as a final solution.




                Dan
                @dlite:

                What I was saying is that your regex also matches this:
                http://what.how.comwow.html?#$%^&*something%20%30else

                You made the slash after the domain optional.
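The point about the optional slash can be checked with a short sketch. The "@" delimiters and the i flag are assumptions (the pattern in post #2 was posted without either), and $tightened is only one hypothetical way to require the slash whenever a path follows the domain.

```php
<?php
// Pattern from post #2; the "@" delimiters and the i flag are assumed here.
$original = '@(http://)?(([-A-Z0-9]+\.)?[-A-Z0-9]+\.[A-Z]{2,10})/?([^\s])+(?=\s)@i';

// Hypothetical tightening: the path is optional, but if present it must start with "/".
$tightened = '@(http://)?(([-A-Z0-9]+\.)?[-A-Z0-9]+\.[A-Z]{2,10})(/\S*)?(?=\s)@i';

$malformed  = 'http://what.how.comwow.html?#$%^&*something%20%30else ';
$wellFormed = 'http://what.how.com/wow.html?#$%^&*something%20%30else ';

// The original pattern accepts the malformed string and reads "comwow" as the TLD.
preg_match($original, $malformed, $m);
echo $m[2], "\n"; // what.how.comwow

// The tightened pattern rejects the malformed string but still accepts the real URL.
echo preg_match($tightened, $malformed), "\n";  // 0
echo preg_match($tightened, $wellFormed), "\n"; // 1
```

With the slash optional, the engine is free to treat "comwow" as a TLD and everything after it as the path, which is why the first pattern matches.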


                • kovik
                  Recognized Expert Top Contributor
                  • Jun 2007
                  • 1044

                  #9
                  @jeddiki:

                  Firstly, there's no need to surround the entire regex in parentheses. The full regex match exists in $0.

                  Secondly, your current regex requires an http/https protocol. To make it optional, surround the protocol in parentheses and add a question mark after it: (https?://)?

                  Thirdly, the domain portion of your regex allows for 1 character TLDs. It also allows for TLDs with dashes. These do not exist. Change "([-\w\.]+)+" to "([-\w]\.)+(\w{2,})". This gives you multiple strings followed by periods, and then a 2 or more character string (without dashes).

                  The rest looks fine from here.
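Putting these suggestions together, a rough sketch might look like the following. The function name linkify is hypothetical, the pattern is a simplified variant rather than anyone's exact regex from this thread, and the trailing-punctuation handling is an extra assumption. It uses preg_replace_callback so that http:// is added only when the text did not already include a scheme. Note the + inside the label group, so each label can be longer than one character.

```php
<?php
// Hypothetical helper: wraps URLs (with or without a scheme) in <a> tags.
function linkify(string $text): string
{
    // Optional scheme, one or more dot-terminated labels, a TLD of 2+ letters,
    // then an optional port, path and query string.
    $pattern = '@\b(https?://)?(?:[-\w]+\.)+[a-z]{2,}(?::\d+)?(?:/[-\w/.%~]*(?:\?[^\s<]*)?)?@i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Sentence punctuation directly after a URL is not part of the link.
        $url = rtrim($m[0], '.,;:!?');
        $trailing = substr($m[0], strlen($url));

        // Add the scheme only when the text did not already supply one.
        $href = ($m[1] ?? '') !== '' ? $url : 'http://' . $url;

        $href  = htmlspecialchars($href, ENT_QUOTES);
        $label = htmlspecialchars($url, ENT_QUOTES);

        return "<a href=\"$href\" target=\"_blank\">$label</a>$trailing";
    }, $text);
}

echo linkify('found here www.example.org/james.htm '), "\n";
// found here <a href="http://www.example.org/james.htm" target="_blank">www.example.org/james.htm</a> 
echo linkify('Located At: Http://pggg-online.org/blog.htm '), "\n";
```

A pattern this loose will also link things that merely look like domains (for example a file name such as notes.txt), so treat it as a starting point rather than a complete URL matcher.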


                  • jeddiki
                    Contributor
                    • Jan 2009
                    • 290

                    #10
                    Hi,

                    Thanks for your help,

                    I tried to follow what you said but I get this error:

                    Warning: preg_replace() [function.preg-replace]: Compilation failed: missing terminating ] for character class at offset 68 in /home/guru54gt5/public_html/sys/cb_search.php on line 208
                    this is the regex:
                    $descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(: \d+)?(/([\w/_\.\]*(\?\S+)?)?)?)@ ', '<a href="$1">$1</a>', $descrip);
                    Can you see where I have gone wrong ?


                    • Dormilich
                      Recognized Expert Expert
                      • Aug 2008
                      • 8694

                      #11
                      The character class opened at offset 44 should be closed at offset 52, but its closing bracket is escaped (\]), so it is treated as a literal character and the class stays open.


                      • jeddiki
                        Contributor
                        • Jan 2009
                        • 290

                        #12
                        Again, thanks for the input,

                        I have changed the expression to:
                        $descript = preg_replace('@ ( (https?://)? ([-\w]\.)+(\w{2,})+(: \d+)?(/([\w/_\.]*(\?\S+)?)?)?)@ ', '<a href="$1">$1</a>', $descrip);
                        Now it doesn't error - but it does not convert the urls to
                        hyperlinks either :(

                        The result can be seen here: clickbank tool

                        When you click on the big button, you get a list

                        You see on the list in position 2 there is a url:
                        Code:
                        Http://www.conversational-hypnosis.com/affiliate.php.
                        and then in position 4 another one:
                        Code:
                        At MaverickMoneyMakers.com/Bonus.
                        and position 6:
                        Code:
                        www.affilorama.com/affiliates
                        None of them got converted.


                        • kovik
                          Recognized Expert Top Contributor
                          • Jun 2007
                          • 1044

                          #13
                          Firstly, get rid of the whitespace in the regex. Whitespace in a pattern is matched literally unless you explicitly tell the engine to ignore it with the x modifier.

                          Secondly, I made a mistake in my regex correction. Change "([-\w]\.)" to "([-\w]+\.)". Also, I just noticed that your regex doesn't allow dashes anywhere but in the domain. Why is that?
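The whitespace point can be seen in a small sketch. The sample text and the shortened pattern are made up for illustration. Without the x modifier the spaces in the pattern must appear literally in the subject; with it, unescaped whitespace in the pattern is ignored.

```php
<?php
// Made-up sample text for illustration.
$text = 'Visit http://example.org/page today';

// Spaces in the pattern are literal, so this needs a space right after "http://".
$spaced = '@ (https?://) ([-\w]+\.)+ (\w{2,}) @';

// Same pattern with the x modifier: the spaces are only there for readability.
$extended = '@ (https?://) ([-\w]+\.)+ (\w{2,}) @x';

echo preg_match($spaced, $text), "\n";   // 0
echo preg_match($extended, $text), "\n"; // 1
```

Either remove the spaces or add the x modifier; mixing spaced-out groups with the default mode silently stops the pattern from matching.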


                          • jeddiki
                            Contributor
                            • Jan 2009
                            • 290

                            #14
                            OK,

                            I got rid of the spaces.

                            and added that "+"

                            I also added a couple of "-" s

                            so now I have :
                            $descript = preg_replace('@ ((https?://)?([-\w]+\.)+(-\w{2,})+(:\d+)? (/([-\w/_\.]*(\?\S+)?)?)?)@ ', '<a href="$1">$1</a>', $descrip);
                            But alas, no improvement


                            • kovik
                              Recognized Expert Top Contributor
                              • Jun 2007
                              • 1044

                              #15
                              ... Take the dash out of the TLD. The way you wrote it requires that the first character is a dash, and TLDs don't even have dashes at all.

                              Also, are you aware that you use $descript and $descrip?
