A fast alternative to HTTP::head ?

This topic is closed.
  • Ming

    A fast alternative to HTTP::head ?

    I have tens of thousands of links. All links will redirect visitors to
    another URL (the real URL). I want to know the real URLs behind those
    redirecting URLs.

    I just wrote a small piece of code to do it:

    $result = HTTP::head($link);
    if (PEAR::isError($result)) {
        echo "Error: " . $result->getMessage() . "<br>";
    } else {
        $chunks = split('%2F', $result['Location']);
        echo "\t" . $chunks[2] . '<br>';
    }

    It works, but it is simply too slow to get the real (destination) URLs
    for tens of thousands of redirecting URLs.

    Is there a fast way to do that?

    Thanks,

  • shimmyshack

    #2
    Re: A fast alternative to HTTP::head ?

    On Jun 23, 6:17 am, Ming <minghu...@gmail.com> wrote:
    > It works, but it is simply too slow to get the real (destination) URLs
    > for tens of thousands of redirecting URLs.
    >
    > Is there a fast way to do that?
    PHP doesn't seem like your best bet unless you run the code multiple
    times simultaneously to simulate multiple threads. What OS are you
    using? Divide the workload into 20 parts and run 20 copies of the
    code, or 9 if you are on XP SP2 (which caps concurrent half-open
    outbound connections at 10).
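    The run-several-copies idea can be sketched as a small worker script. This is only a sketch under assumptions not stated in the thread: the URLs live in a `urls.txt` file (one per line), PEAR's HTTP package is installed as in the original post, and the file/function names are made up for illustration.

```php
<?php
// Sketch of the "divide the workload and run N copies" approach.
// Assumptions (not from the thread): URLs are in urls.txt, one per line;
// PEAR HTTP is installed. Launch e.g. 20 copies, each with its own slice:
//   php worker.php 0 500 &  php worker.php 500 500 &  ...

// Pick this copy's slice of the full URL list.
function url_slice(array $urls, $offset, $count) {
    return array_slice($urls, $offset, $count);
}

if (PHP_SAPI === 'cli' && isset($argv[2])) {
    require_once 'HTTP.php';   // PEAR HTTP, as used in the original code
    $all = file('urls.txt', FILE_IGNORE_NEW_LINES);
    foreach (url_slice($all, (int)$argv[1], (int)$argv[2]) as $url) {
        $result = HTTP::head($url);
        if (PEAR::isError($result)) {
            echo "Error: " . $result->getMessage() . "\n";
        } elseif (isset($result['Location'])) {
            echo $url . "\t" . $result['Location'] . "\n";
        }
    }
}
```

    Each copy only touches its own slice, so the copies never duplicate work and their outputs can simply be concatenated afterwards.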

    Comment

    • Toby A Inkster

      #3
      Re: A fast alternative to HTTP::head ?

      Ming wrote:
      > It works, but it is simply too slow to get the real (destination) URLs
      > for tens of thousands of redirecting URLs.

      Well, one slow server will delay the whole thing. You might want to
      speed it up by using concurrency: i.e. you have a queue of tens of
      thousands of URLs which need "handling", and several "handlers" which
      each run a loop requesting a URL to resolve, resolving it, and then
      storing the result. You'll also need one thread to be a "queue
      manager" and one to be a "result storer".

      Overall, as DNS and HTTP can be quite a slow business, I'd recommend about
      12 handlers, one queue manager and one storer. The queue manager and
      storer can be a SQL database server if you like!

      Now, technically PHP is capable of doing this, but some other
      languages, like Perl and C, are a bit better suited to writing
      multi-threaded applications.
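      The handler-pool design above doesn't strictly require threads in PHP: the curl extension's multi interface can run many HEAD requests concurrently in a single process. Below is a rough sketch only — the batch size of 12 mirrors the suggested handler count, and the function name and timeout are my own choices, not anything from the thread.

```php
<?php
// Sketch of the concurrent handler idea using curl's multi interface.
// Resolves each URL's redirect chain with concurrent HEAD requests.
function resolve_redirects(array $urls, $batch = 12) {
    $results = array();
    foreach (array_chunk($urls, $batch) as $chunk) {
        $mh = curl_multi_init();
        $handles = array();
        foreach ($chunk as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request only
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // chase redirects
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // don't stall on slow servers
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }
        // Drive all transfers in this batch until they finish.
        do {
            $status = curl_multi_exec($mh, $running);
        } while ($status === CURLM_CALL_MULTI_PERFORM);
        while ($running && $status === CURLM_OK) {
            if (curl_multi_select($mh) === -1) {
                usleep(100000);  // select can fail on some systems; back off briefly
            }
            do {
                $status = curl_multi_exec($mh, $running);
            } while ($status === CURLM_CALL_MULTI_PERFORM);
        }
        foreach ($handles as $url => $ch) {
            // The final URL after all redirects were followed.
            $results[$url] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $results;
}
```

      With tens of thousands of URLs you would still want to persist results as each batch completes (the "result storer" role), for instance by inserting rows into the SQL database mentioned above.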

      --
      Toby A Inkster BSc (Hons) ARCS
      [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
      [OS: Linux 2.6.12-12mdksmp, up 3 days, 11:56.]

      A New Look for TobyInkster.co.uk
