How to use 2 patterns for the preg_match_all function in PHP

  • chazzy69
    New Member
    • Sep 2007
    • 196


    Basically, I am trying to understand and learn how to make a PHP spider (yes, I know it's not efficient, but it's only for a single website at a time).

    The problem I am having is with the preg_match_all() function. The specific use is:

    Code:
    preg_match_all( "#href=\"(https?://[&=a-zA-Z0-9-_./]+)\"#si", $html2, $links2 );
    This bit of code will find all URLs on a page that look like this: "http://www.google.com"

    BUT it will not find URLs like this: "/main/index.html".

    To get around this, I worked out the pattern for the second type of URL:

    Code:
    preg_match_all( "#href=\"(/[&=a-zA-Z0-9-_./]+)\"#si", $html1, $links1 );
    What I am trying to achieve is to join these two patterns into a single call, or something that gets a similar result, for example like this:

    Code:
    preg_match_all( "#href=\"(https?://[&=a-zA-Z0-9-_./]+)\"#si" || "#href=\"(/[&=a-zA-Z0-9-_./]+)\"#si", $html1, $links1);
    Note: I have tried this method of using an OR (||) operator to join the patterns, and it DID NOT WORK.
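A possible explanation for why this fails, sketched under the assumption of standard PHP semantics: `||` is PHP's boolean operator, so the two pattern strings are collapsed into a single `true` before preg_match_all() ever sees them. The variable names below are only for illustration.

```php
<?php
// The two patterns from the post, stored in illustrative variables.
$absolute = "#href=\"(https?://[&=a-zA-Z0-9-_./]+)\"#si";
$relative = "#href=\"(/[&=a-zA-Z0-9-_./]+)\"#si";

// || is a boolean operator: both operands are non-empty strings (truthy),
// so the result is the boolean true, not a combined pattern.
$pattern = $absolute || $relative;

var_dump($pattern); // bool(true)
```

So the alternation has to happen inside the regular expression itself, using the regex `|` operator rather than PHP's `||`.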

    So any help in figuring out how to get the two patterns into a single call would be great! Note that I am pretty bad at understanding how the patterns actually work.

    Any help is greatly appreciated. Thanks in advance :D
  • chazzy69
    New Member
    • Sep 2007
    • 196

    #2
    It was painful, but I finally found a good tutorial on the use of regular expressions. It can be found at

    -> 'http://www.tipsntutorials.com/tutorials/PHP/50'

    Anyway, here is the solution I managed to work out:
    Code:
    preg_match_all( "#href=\"(((https?://)|(/))[&=a-zA-Z0-9-_./]+)\"#si", $html, $links );
    It was just a matter of using the OR (|) operator correctly inside the pattern. Thanks for the help anyway :D
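A minimal sketch of the same idea, assuming a made-up $html sample (not taken from any real page). It uses a non-capturing group `(?: ... )` so the full URL always lands in $links[1], and moves the hyphen to the end of the character class so it cannot be mistaken for a range. Both are stylistic variations on the pattern above, not the exact pattern from the post.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<a href="http://www.google.com">G</a> '
      . '<a href="/main/index.html">Home</a> '
      . '<a href="mailto:me@example.com">Mail</a>';

// (?:https?://|/) matches either an absolute prefix or a leading slash
// without adding extra capture groups; group 1 is the whole URL.
$pattern = '#href="((?:https?://|/)[&=a-zA-Z0-9_./-]+)"#si';

$count = preg_match_all($pattern, $html, $links);

// $links[0] holds the full href="..." matches, $links[1] just the URLs.
print_r($links[1]);
```

With this sample, the absolute and the relative links are both captured, while the mailto: link is skipped because it starts with neither prefix.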
