How do I get the final url from a redirection?

  • jeddiki
    Contributor
    • Jan 2009
    • 290

    Hi,

    I want to capture the final url that a website redirects to.

    Here is an example of what I mean:

    www.example.com/sites.php?pd=45

    When you click on that link, the site will redirect you to

    www.Joe-Blogs.com/green/prod1.html?a=527

    As you can see they are two different sites.

    What I would like to do is pick the
    www.Joe-Blogs.com/green/prod1.html part of the final url
    and put it in a variable called $final_url.

    So if I have:


    Code:
    $first_url = "www.example.com/sites.php?pd=45";

    What would be the best way to get to that $final_url?

    Should I be using cURL, or would
    file() or file_get_contents() be able to get the URL?

    Any ideas on how I can get to my $final_url?
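For the second half of the question (keeping only the www.Joe-Blogs.com/green/prod1.html part), PHP's built-in parse_url() can split a URL into host, path and query string. A minimal sketch, assuming PHP 7.1+ and that a URL without a scheme should be treated as http://; the function name url_without_query() is hypothetical, and any port or user info is dropped.

```php
<?php
/**
 * Hypothetical helper: return the "host/path" part of a URL, dropping the
 * scheme, port, query string and fragment.
 */
function url_without_query(string $url): ?string
{
    // parse_url() only recognises the host when a scheme is present,
    // so assume http:// if the URL has none.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // not something we can interpret as a URL
    }
    $path = $parts['path'] ?? '';
    return $parts['host'] . $path;
}
```

For example, url_without_query("www.Joe-Blogs.com/green/prod1.html?a=527") returns www.Joe-Blogs.com/green/prod1.html, which could then be stored in $final_url.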
  • xNephilimx
    Recognized Expert New Member
    • Jun 2007
    • 213

    #2
    Are you trying to make some kind of web proxy? If so, there are quite a few around, like PHProxy http://www.phproxy.org/ (source code: http://sourceforge.net/projects/poxy/).
    There's no need to reinvent the wheel.

    Best regards

    • jeddiki
      Contributor
      • Jan 2009
      • 290

      #3
      Thanks,

      but no, I am not trying to build a proxy.

      I want to get the final url so that I can use it in another
      web site that does analysis based on the url.

      • jeddiki
        Contributor
        • Jan 2009
        • 290

        #4
        If I use cURL, the code below should get me to the final web page, right?

        Is the final destination in the header info?

        Code:
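        $target_url = "http://www.example.com/sites.php?pd=45";
        $userAgent = "Mozilla/5.0 (compatible; MyChecker/1.0)"; // placeholder user-agent string
        $cef = "curl_err.txt";       // cURL's verbose log is written here
        $ceh = fopen($cef, 'w');

        $ch = curl_init();           // the handle must exist before options are set
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($ch, CURLOPT_URL, $target_url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_STDERR, $ceh);
        curl_setopt($ch, CURLOPT_VERBOSE, 1);
        curl_setopt($ch, CURLOPT_HEADER, 1);            // include response headers in $output
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location: redirects
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);

        $output = curl_exec($ch);
        $info = curl_getinfo($ch);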
        How would I extract the final URL?

        Any ideas?
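For server-side redirects the target is indeed in the Location: response header. With CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION both on, the curl_exec() output begins with one header block per hop, so the last Location: value names the final page. A minimal sketch, assuming CRLF line endings and that every header block starts with an HTTP/ status line; the helper name is hypothetical, and relative Location: values are returned unresolved.

```php
<?php
/**
 * Hypothetical helper: given raw curl_exec() output (headers included),
 * return the value of the last Location: header, or null if none was seen.
 */
function last_location_header(string $raw): ?string
{
    $location = null;
    // Each hop contributes one header block: an "HTTP/..." status line
    // followed by header lines, terminated by a blank line.
    while (strncmp($raw, 'HTTP/', 5) === 0) {
        $end = strpos($raw, "\r\n\r\n");
        $block = $end === false ? $raw : substr($raw, 0, $end);
        foreach (explode("\r\n", $block) as $line) {
            if (stripos($line, 'Location:') === 0) {
                $location = trim(substr($line, strlen('Location:')));
            }
        }
        if ($end === false) {
            break; // headers only, no body followed
        }
        $raw = substr($raw, $end + 4); // move on to the next block (or the body)
    }
    return $location;
}
```

Reading the body is never needed here, because the loop stops at the first chunk that does not start with a status line.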

        • jeddiki
          Contributor
          • Jan 2009
          • 290

          #5
          Please HELP !!!

          I am really stuck on this one; surely someone knows how
          to do this?

          Maybe I should be using fsockopen()?

          Any ideas?

          Thanks

          • kovik
            Recognized Expert Top Contributor
            • Jun 2007
            • 1044

            #6
            Firstly, why are you doing this? Chances are that you are going about this incorrectly, and we can't help you if you don't give a clear idea of your end-goal.

            Secondly, redirection is not a standardized process. It can be performed via headers, meta-tags, or JavaScript. Do you plan to account for all of these?
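Of the three redirect kinds listed above, a meta-tag redirect can still be detected in the HTML that cURL returns. A rough, regex-based sketch (hypothetical helper, PHP 7.1+), assuming http-equiv appears before content inside the tag; it does not execute JavaScript, so script-based redirects would still be missed.

```php
<?php
/**
 * Hypothetical helper: find <meta http-equiv="refresh" content="N; url=...">
 * in an HTML page and return the target URL, or null if there is none.
 */
function meta_refresh_url(string $html): ?string
{
    $pattern = '/<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
             . 'content\s*=\s*["\']?\s*\d*\s*;?\s*url\s*=\s*["\']?([^"\'>\s]+)/i';
    if (preg_match($pattern, $html, $m)) {
        // Attribute values may contain entities such as &amp;.
        return html_entity_decode($m[1], ENT_QUOTES);
    }
    return null;
}
```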

            • jeddiki
              Contributor
              • Jan 2009
              • 290

              #7
              Hi,

              If it helps, I will give you a specific real example that
              many of us have heard of.

              Take the "hop" link that ClickBank affiliates use.

              It has the format: xxxx.PROD-ID.hop.clickbank.net

              When you click on a "hop" link, it does not go to clickbank.net
              but to the product's sales page, e.g. www.hip-new-product.com,
              so it must redirect by some method (I don't know which) to that
              sales page.

              What I want to do is capture that final URL and then use it
              somewhere else, for example on Alexa.com.

              To get site info from Alexa about a website, I need to type in Alexa.com?url=www.hip-new-product.com.

              Instead of that, I can use Alexa.com?url=$finalurl

              where $finalurl comes from following the redirect from clickbank.net.

              Hope that helps explain the process.

              It is true that I don't know which type of redirect is being used; all I want is the final
              landing page URL.

              Any ideas?

              • kovik
                Recognized Expert Top Contributor
                • Jun 2007
                • 1044

                #8
                I would suggest using cURL, as you have opted to, and making sure that you set CURLOPT_FOLLOWLOCATION to true. This will not follow JavaScript or meta-tag redirects, but it is designed to work for server-side (header) redirects. Then call curl_getinfo() with the option CURLINFO_EFFECTIVE_URL, which returns the final URL you are looking for as a string. If you call curl_getinfo() without an option instead, the 'url' key of the returned array holds the same value.
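A minimal sketch of this suggestion, wrapped in a hypothetical get_final_url() helper (PHP 7.1+). CURLOPT_MAXREDIRS and the 10-second default timeout are assumptions added to stop redirect loops and hung requests; they are not from the post.

```php
<?php
/**
 * Hypothetical helper: follow server-side (Location:) redirects and return
 * the final URL, or null if the request failed.
 */
function get_final_url(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);         // guard against redirect loops
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // keep the body out of the page output
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    if (curl_exec($ch) === false) {
        curl_close($ch);
        return null;
    }
    // Same value as curl_getinfo($ch)['url'].
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $final;
}
```

With the ClickBank example, $finalurl = get_final_url($hop_link) (where $hop_link is a hypothetical variable holding the hop link) would give the landing page URL, or null if the request failed.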

                • jeddiki
                  Contributor
                  • Jan 2009
                  • 290

                  #9
                  Thanks,

                  I have managed to get it working.

                  The only problem is, it takes nearly four hours to process
                  all 11,000 websites.

                  This equates to one every 1.26 seconds.

                  Do you think there is a quicker method?

                  Maybe I should put some of my code into a function, although
                  I don't know how much of it.

                  This is my code:

                  Code:
                  // Assumes $userAgent and $ceh (an open handle for cURL's verbose log) are set
                  // earlier, and that write_log(), write_error() and GetDomain() are defined.
                  $sql_url = "SELECT id FROM my_temp ORDER BY id";
                  $result_url = mysql_query($sql_url) or write_error("Could not SELECT id FROM my_temp ".mysql_error()." \r\n");

                  $ctr = 1;

                  while ($row_url = mysql_fetch_assoc($result_url)) {

                     $my_code = $row_url['id'];
                     $target_url = "http://zzzzz.$my_code.example.com/";

                     $ch = curl_init();
                     curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
                     curl_setopt($ch, CURLOPT_URL, $target_url);
                     curl_setopt($ch, CURLOPT_FAILONERROR, true);
                     curl_setopt($ch, CURLOPT_STDERR, $ceh);
                     curl_setopt($ch, CURLOPT_VERBOSE, 1);
                     curl_setopt($ch, CURLOPT_HEADER, 1);
                     curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
                     curl_setopt($ch, CURLOPT_AUTOREFERER, true);
                     curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
                     curl_setopt($ch, CURLOPT_TIMEOUT, 20);

                     $output = curl_exec($ch);
                     $info = curl_getinfo($ch);

                     if ($output === FALSE) {
                        write_log("No cURL data returned for $target_url [" . $info['http_code'] . "]\r\n");
                        if (curl_errno($ch)) {
                           // Function calls are not expanded inside double quotes, so concatenate them.
                           write_log("CURL error number: " . curl_errno($ch) . "\r\n CURL error: " . curl_error($ch) . "\r\n");
                        }
                     }
                     else {
                        // Final URL after cURL has followed every Location: redirect.
                        $new_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
                        $new_dom = GetDomain($new_url);

                        write_log("LOOK UP URL: $ctr) $target_url: $new_url, $new_dom\r\n");

                        // Escape every value that is interpolated into the SQL.
                        $esc_url  = mysql_real_escape_string($new_url);
                        $esc_dom  = mysql_real_escape_string($new_dom);
                        $esc_code = mysql_real_escape_string($my_code);

                        $sql_temp_url = "UPDATE my_temp SET
                           url = '$esc_url',
                           dom = '$esc_dom'
                           WHERE id = '$esc_code' ";

                        $result_temp_url = mysql_query($sql_temp_url)
                           or write_error("Could not UPDATE my_temp url # $ctr." . mysql_error() . " \r\n");
                        write_log("WRITTEN URL: $ctr) $my_code: $new_url, $new_dom\r\n");
                     }

                     curl_close($ch);
                     $ctr++;
                  }
                  If the cURL code should go into a function to make the while loop faster,
                  I am not sure how much of it should go in the function.

                  Would it be just this:

                  Code:
                  // $userAgent and $ceh are passed in: variables from the calling script
                  // are not visible inside a PHP function unless passed (or declared global).
                  function do_curl($target_url, $userAgent, $ceh) {
                     $ch = curl_init();
                     curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
                     curl_setopt($ch, CURLOPT_URL, $target_url);
                     curl_setopt($ch, CURLOPT_FAILONERROR, true);
                     curl_setopt($ch, CURLOPT_STDERR, $ceh);
                     curl_setopt($ch, CURLOPT_VERBOSE, 1);
                     curl_setopt($ch, CURLOPT_HEADER, 1);
                     curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
                     curl_setopt($ch, CURLOPT_AUTOREFERER, true);
                     curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
                     curl_setopt($ch, CURLOPT_TIMEOUT, 20);

                     $output = curl_exec($ch);
                     curl_close($ch);   // free the handle before returning
                     return $output;
                  }
                  Then call do_curl($target_url, $userAgent, $ceh)?

                  Or should the error-checking ifs go in the function as well?

                  What do you recommend to make it most efficient?

                  If doing this won't make any difference to the speed of execution, then
                  for readability, I will leave it as it is. :)

                  Would appreciate any input.

                  Thanks.
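Moving the code into a function mainly helps readability: the time per URL is spent waiting on the network, and a PHP function call costs very little by comparison. One possible shape, sketched under assumptions (hypothetical name resolve_url(), PHP 7.1+), that keeps the error checks inside the function and takes $userAgent as a parameter; the verbose-logging options are left out for brevity.

```php
<?php
/**
 * Hypothetical helper: fetch one URL, follow redirects and return
 * ['url' => final URL or null, 'error' => message or null].
 */
function resolve_url(string $target_url, string $userAgent, int $timeout = 20): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $target_url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $result = ['url' => null, 'error' => null];
    if (curl_exec($ch) === false) {
        // Concatenate: function calls are not expanded inside double-quoted strings.
        $result['error'] = 'cURL error ' . curl_errno($ch) . ': ' . curl_error($ch)
            . ' [HTTP ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . ']';
    } else {
        $result['url'] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    }
    curl_close($ch);
    return $result;
}
```

Inside the while loop this would become $r = resolve_url($target_url, $userAgent);, followed by logging $r['error'] or storing $r['url'] in the UPDATE.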

                  • dlite922
                    Recognized Expert Top Contributor
                    • Dec 2007
                    • 1586

                    #10
                    Get a faster internet connection. :)

                    Dan

                    • Curl Guy

                      #11
                      Look at curl_multi_init(). I just finished writing a checker that you pass an array of URLs to, and it checks them all at once. I've tested it with up to 1000 URLs at a time, and the execution time is roughly the same as that of the slowest-responding URL.

                      Mine is based on this: http://www.somacon.com/p537.php
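The checker described above isn't shown, so the following is only a minimal sketch of the same idea using the curl_multi_* functions, not the poster's code. The function name, the 20-second timeout and the 10-redirect limit are assumptions (PHP 7.1+). All transfers run concurrently, which is why total time can approach that of the slowest URL; for thousands of URLs you would probably feed them in batches of a size you choose.

```php
<?php
/**
 * Hypothetical helper: resolve many URLs in parallel and return
 * [original URL => final URL, or null if that transfer failed].
 */
function resolve_urls_parallel(array $urls, int $timeout = 20): array
{
    $mh = curl_multi_init();
    $handles = [];                            // index => cURL handle
    $results = array_fill_keys($urls, null);  // default: unresolved

    foreach ($urls as $i => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$i] = $ch;
    }

    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0);      // wait for socket activity instead of spinning
        }
        // Record each transfer as soon as it finishes.
        while (($msg = curl_multi_info_read($mh)) !== false) {
            $i = array_search($msg['handle'], $handles, true);
            if ($msg['result'] === CURLE_OK) {
                $results[$urls[$i]] = curl_getinfo($msg['handle'], CURLINFO_EFFECTIVE_URL);
            }
        }
    } while ($running > 0 && $status === CURLM_OK);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```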
