If I have random and unpredictable user agent strings containing URLs, what is
the best way to extract the URL?
For example, let's say the string looks like this:
registered NYSE 943 <a href="http://netforex.net">Forex Trading Network Organization </a>info@netforex.org
What's the best way to extract http://netforex.net ?
I have code that checks for identifiable browsers and bots, but when the agent
string has no identifiable information other than a URL, I want to grab the URL.
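The extraction step may not need more than a single preg_match() call. Below is a minimal sketch. It assumes that a URL starts with http:// or https:// and ends at the first whitespace, quote or angle bracket. The helper name extract_first_url is hypothetical.

```php
<?php
// Hypothetical helper: return the first http(s) URL found in $agent, or null.
// Assumption: a URL ends at whitespace, a double/single quote, or < / >.
function extract_first_url(string $agent): ?string
{
    if (preg_match('~https?://[^\s"\'<>]+~i', $agent, $m)) {
        return $m[0]; // the whole match is the URL
    }
    return null; // no http(s) URL in the string
}

$agent = 'registered NYSE 943 <a href="http://netforex.net">Forex Trading Network '
       . 'Organization </a>info@netforex.org';

$url  = extract_first_url($agent);                              // "http://netforex.net"
$host = $url !== null ? parse_url($url, PHP_URL_HOST) : null;   // "netforex.net"
echo $url, "\n", $host, "\n";
```

preg_match() is also the usual replacement for eregi(), which was deprecated in PHP 5.3 and removed in PHP 7.0. If the result is echoed into a page, pass it through htmlspecialchars() first.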
Here's a first crack at it:
// ... [code omitted] ...
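elseif (stripos($agent, "http://") !== false)
{
    // drop everything before the first "http://" and let parse_url() find the host
    $agent = stristr($agent, "http://");
    $agent = parse_url($agent);
    $agent = isset($agent['host']) ? $agent['host'] : "";
    // check for subdomains: split the host into labels, TLD first
    $agent_a = explode(".", $agent);
    $agent_r = array_reverse($agent_a);
    $sub = count($agent_r) - 1;
    // keep only the first three characters of the last label (com, net, org, ...)
    $tld3 = substr($agent_r[0], 0, 3);
    $refurl = "";
    if (preg_match('/^(com|net|org|edu|biz|gov)$/i', $tld3)) // common TLDs
    {
        // rebuild the remaining labels in their original left-to-right order
        $domain = "";
        while ($sub > 0)
        {
            $domain = $domain . $agent_r[$sub] . ".";
            $sub--;
        }
        $refurl = $domain . $tld3;
    }
    if ($refurl !== "")
    {
        // escape before writing into HTML; the agent string is untrusted input
        $safe = htmlspecialchars($refurl, ENT_QUOTES);
        $referrer = "<a href='http://" . $safe . "'>" . $safe . "</a>";
    }
    else
    {
        $referrer = "unknown";
    }
}
else
{
    $referrer = "unknown";
}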
Are there any PHP functions that would help here? How should I handle subdomains,
and what about international domains?
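A common way to handle subdomains is to keep only the registrable domain, meaning the label just left of the public suffix plus that suffix. Doing this correctly for every country requires Mozilla's Public Suffix List. The sketch below hard-codes a tiny, hypothetical subset of multi-label suffixes only to illustrate the idea. The helper name registrable_domain is also hypothetical. For international domains, converting the host to its ASCII (Punycode) form with idn_to_ascii() from the intl extension before the lookup keeps the comparison consistent.

```php
<?php
// Hypothetical helper: reduce a host name to its registrable domain.
// Assumption: $multiLabelSuffixes is a tiny illustrative subset,
// NOT the real Public Suffix List.
function registrable_domain(string $host): string
{
    $multiLabelSuffixes = ['co.uk', 'org.uk', 'com.au', 'co.jp'];

    // normalise case and drop a trailing dot (fully qualified form)
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count  = count($labels);
    if ($count <= 2) {
        return implode('.', $labels); // already "example.com" or a bare name
    }

    // if the last two labels form a known multi-label suffix, keep three labels
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $keep    = in_array($lastTwo, $multiLabelSuffixes, true) ? 3 : 2;

    return implode('.', array_slice($labels, -$keep));
}

echo registrable_domain('www.netforex.net'), "\n"; // netforex.net
echo registrable_domain('news.bbc.co.uk'), "\n";   // bbc.co.uk
echo registrable_domain('netforex.net'), "\n";     // netforex.net
```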
Thanks in advance.