preg_match_all

  • Anthony Smith

    preg_match_all

    I am trying to take a web page and get all of the links. It almost
    works, but I am missing a few links.
    Here is what I am using.
    preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
        $s, $matches, PREG_SET_ORDER);


    It will not pick up links like this:

    <a class="highlight" href="browse.php?region=West+Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
    <b>Next &gt;</b>
    </a>


    How do I get it to pick up hrefs like the one above?
  • Rik Wasmus

    #2
    Re: preg_match_all

    On Sat, 31 May 2008 16:49:30 +0200, Anthony Smith <mrsmithq@hotmail.com>
    wrote:
    > I am trying to take a web page and get all of the links. It almost
    > works, but I am missing a few links.
    > Here is what I am using.
    > preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
    >     $s, $matches, PREG_SET_ORDER);
    >
    > It will not pick up links like this:
    >
    > <a class="highlight" href="browse.php?region=West+Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
    > <b>Next &gt;</b>
    > </a>
    >
    > How do I get it to pick up hrefs like the one above?
    Add the /s modifier
    --
    Rik Wasmus
    ....spamrun finished
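A minimal sketch of why the /s modifier matters, assuming PHP with its bundled PCRE: without /s, "." does not match a newline, so (.*?) cannot span the line breaks between <a ...> and </a> in the sample link. The string below is the question's link with the scraped spacing repaired; the variable names are only for illustration.

```php
<?php
// The link from the question; the link text sits on its own line.
$s = '<a class="highlight" href="browse.php?region=West+Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">'
   . "\n<b>Next &gt;</b>\n</a>";

// The original pattern, with and without the /s (dot-matches-newline) modifier.
$without_s = '/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i';
$with_s    = '/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/is';

$n1 = preg_match_all($without_s, $s, $m1, PREG_SET_ORDER);
$n2 = preg_match_all($with_s, $s, $m2, PREG_SET_ORDER);

echo $n1, "\n";        // 0: (.*?) stops at the first newline, so </a> is never reached
echo $n2, "\n";        // 1
echo $m2[0][1], "\n";  // the href value
```

With /s, $m2[0][2] holds the link text including its newlines ("\n<b>Next &gt;</b>\n"), so you may want to pass it through strip_tags() and trim() before displaying it.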


    • AnrDaemon

      #3
      Re: preg_match_all

      Greetings, Rik Wasmus.
      In reply to Your message dated Saturday, May 31, 2008, 19:08:16,
      >> I am trying to take a web page and get all of the links. It almost
      >> works, but I am missing a few links.
      >> Here is what I am using.
      >> preg_match_all('/href=[\"\']?([^\"\'>]*)[\"\']?[^>]*>(.*?)<\/a>/i',
      >>     $s, $matches, PREG_SET_ORDER);
      >>
      >> It will not pick up links like this:
      >>
      >> <a class="highlight" href="browse.php?region=West+Tennessee&amp;zips=38115&amp;mgrp=13&amp;p=2">
      >> <b>Next &gt;</b>
      >> </a>
      >>
      >> How do I get it to pick up hrefs like the one above?
      > Add the /s modifier
      That would work, after some deeper thought...
      But I'd like to offer a slightly different approach:

      preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is', $s, $matches, PREG_SET_ORDER);

      It has one downside: the URL ends up in group (2) or (3), depending on
      the quotes around it.
      So you have to pull out the result with a construction like

      foreach ($matches as $m) {
          $url_link = empty($m[3]) ? $m[2] : $m[3]; // (2) = quoted URL, (3) = unquoted URL
          $url_text = $m[4];                        // link text between <a ...> and </a>
      }


      --
      Sincerely Yours, AnrDaemon <anrdaemon@freemail.ru>
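A runnable sketch of the extraction step described above, assuming PHP; the two sample links (one with a quoted href, one unquoted) are hypothetical. In each match only one of groups 2 and 3 is non-empty, so the loop takes whichever one is set.

```php
<?php
// Hypothetical input: a quoted href, then an unquoted href whose text spans two lines.
$s = "<a href=\"page1.php?a=1\">First</a>\n<a href=page2.php>Second\nline</a>";

// The pattern from this post, unchanged.
$re = '#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is';

preg_match_all($re, $s, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    // Group 2 holds a quoted URL, group 3 an unquoted one; the unused group is "".
    $url_link = empty($m[3]) ? $m[2] : $m[3];
    $url_text = $m[4];
    $links[] = [$url_link, $url_text];
}
print_r($links);
```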


      • AnrDaemon

        #4
        Re: preg_match_all

        Greetings, AnrDaemon.
        In reply to Your message dated Wednesday, June 4, 2008, 23:00:34,
        > preg_match_all('#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is', $s, $matches, PREG_SET_ORDER);
        The regexp should read:
        '#href=(?:([\"\'])([^\"\'>]\S*?)\1|([^>\"\'\s]+))[^>]*>(.*?)</a>#is'


        --
        Sincerely Yours, AnrDaemon <anrdaemon@freemail.ru>
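A sketch, assuming PHP, of what the correction in post #4 changes. With an unquoted href followed by another attribute, the first pattern's unquoted branch ([^>\"\']+) runs all the way to ">", while the corrected branch stops at whitespace and leaves the rest of the tag to [^>]*. The sample markup is hypothetical. The last part swaps in a technique not used in the thread, a PCRE branch-reset group (?|...), as one possible way around the "group (2) or (3)" downside from post #3; it needs a PCRE build with branch-reset support, which current PHP versions have.

```php
<?php
// Hypothetical link: unquoted href followed by another attribute.
$s = '<a href=page.php class=nav>Next</a>';

// The first pattern (post #3) and the corrected one (post #4).
$first     = '#href=(?:([\"\'])([^\"\'>]\S*?)\1[^>]*|([^>\"\']+))>(.*?)</a>#is';
$corrected = '#href=(?:([\"\'])([^\"\'>]\S*?)\1|([^>\"\'\s]+))[^>]*>(.*?)</a>#is';

preg_match($first, $s, $a);
preg_match($corrected, $s, $b);

// First pattern: the unquoted branch runs up to ">" and swallows " class=nav".
echo $a[3], "\n";
// Corrected pattern: the URL stops at whitespace; [^>]* skips the rest of the tag.
echo $b[3], "\n";

// Not from the thread: inside (?|...) both alternatives share group numbers,
// so the URL is always group 2 and the link text group 3. The empty ()
// in the second alternative keeps the numbering aligned with the first.
$reset = '#href=(?|([\"\'])([^\"\'>]\S*?)\1|()([^>\"\'\s]+))[^>]*>(.*?)</a>#is';
preg_match_all($reset, '<a href="x.php">X</a> ' . $s, $c, PREG_SET_ORDER);
foreach ($c as $m) {
    echo $m[2], ' => ', $m[3], "\n";
}
```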
