Basically, I am trying to understand and learn how to make a PHP spider (yes, I know it's not efficient, but it's only for a single website at a time).
Now the problem I am having is with the preg_match_all() function; the specific use is ->
Code:
preg_match_all( "#href=\"(https?://[&=a-zA-Z0-9-_./]+)\"#si", $html2, $links2 );
This bit of code will find all URLs on a page that look like this -> "http://www.google.com"
BUT will not find URLs like this -> "/main/index.html".
So to get around this I figured out the pattern for the second type of URL ->
Code:
preg_match_all( "#href=\"(/[&=a-zA-Z0-9-_./]+)\"#si", $html1, $links1 );
Now what I am trying to achieve is to join these two patterns into a single function call, or something that gets a similar result, for example like this ->
Code:
preg_match_all( "#href=\"(https?://[&=a-zA-Z0-9-_./]+)\"#si" || "#href=\"(/[&=a-zA-Z0-9-_./]+)\"#si", $html1, $links1);
Note: I have tried this method of using an or (||) operator to join the patterns, and it DID NOT WORK.
So any help in figuring out how to get the two patterns into a single function call would be great! Note that I am pretty bad at understanding how the patterns actually work.
Any help is greatly appreciated in advance, thanks :D
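For reference, one possible way to combine the two patterns, as a minimal sketch rather than a definitive fix: PHP's || operator works on values, so "pattern1" || "pattern2" evaluates to the boolean true before preg_match_all() ever sees a pattern. The "or" has to go inside the regex itself, using the | alternation character within a group. The $html sample string and the $links variable name below are made up for illustration; the character class is the one from the post, with the hyphen moved to the end so it is clearly a literal hyphen.

```php
<?php
// Hypothetical sample input; in the spider this would be the fetched page.
$html = '<a href="http://www.google.com">Google</a> '
      . '<a href="/main/index.html">Home</a> '
      . '<a href="https://example.com/a-b_c.html">Other</a>';

// A single pattern: the non-capturing group (?:https?://|/) accepts
// either an absolute prefix (http:// or https://) or a leading slash,
// and the character class then matches the rest of the URL.
$pattern = '#href="((?:https?://|/)[&=a-zA-Z0-9_./-]+)"#si';

preg_match_all($pattern, $html, $links);

// $links[1] holds the captured URLs of both kinds, in page order.
print_r($links[1]);
```

With this approach both kinds of URL end up in the same array ($links[1]), so the rest of the spider only has one list to loop over. Relative URLs such as "/main/index.html" would still need the site's host name prepended before they can be fetched.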