I'm writing an application that scrapes the code from client web sites to
look for links on the pages. I am using the file_get_contents() function to
grab the code, but I don't know how to handle web sites that may
be down or unavailable. I know file_get_contents() returns FALSE on
failure, but the error message still prints to the screen. How do I
avoid that?
Here's a snippet of my code:
function urlcheck($url, $sitelink) {
    // Grab the code from the web site (once, not twice)
    $html = file_get_contents($url);
    if ($html !== false) {
        // Regex to pull the link markup out of the page
        $relink = "/<a.+?href=[\"\'](.*?)[\"\'].*?>/i";
        // Put the matching link markup into an array called $links
        preg_match_all($relink, $html, $links);
        // Loop through the links on the page and look for a match
        for ($i = 0; $i < count($links[0]); $i++) {
            // !== false also catches a match at position 0
            if (strpos($links[1][$i], $sitelink) !== false) {
                return $links[0][$i];
            }
        }
    }
    else {
        print "Doesn't exist";
    }
}
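For what it's worth, here's a minimal sketch of the kind of guard I've been experimenting with, using the @ error-suppression operator (the host name is just a placeholder, not a real client site). I'm not sure suppressing the warning this way is the right approach, which is part of why I'm asking:

```php
<?php
// Sketch only: "http://example.invalid/" is a placeholder URL.
// The @ operator keeps the warning from printing when the fetch fails;
// we then check the return value strictly against false.
$html = @file_get_contents('http://example.invalid/');
if ($html === false) {
    // The warning was suppressed, so we can handle the failure quietly here.
    print "Site unavailable\n";
}
```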
Thanks
Clint Pidlubny