Eregi pattern matching - bit of a challenge, I think

  • NimP


    Hi. I'm trying to detect any links contained within an HTML page
    using eregi pattern matching. I was wondering if there are any
    pattern-matching geniuses out there who could write a single pattern that
    covers all the different ways in which a link could be written.

    Current patterns I can think of include:

    <a href=x.com> no spaces between href, equals and URL; no quotation marks
    around URL
    <a href =x.com> space between href and equals, no space between equals and
    URL; no quotation marks around URL
    <a href= x.com> no space between href and equals, space between equals and
    URL; no quotation marks around URL
    <a href = x.com> space between href and equals, space between equals and
    URL; no quotation marks around URL


    <a href='x.com'> no spaces between href, equals and URL; single quotation
    marks around URL
    <a href ='x.com'> space between href and equals, no space between equals and
    URL; single quotation marks around URL
    <a href= 'x.com'> no space between href and equals, space between equals and
    URL; single quotation marks around URL
    <a href = 'x.com'> space between href and equals, space between equals and
    URL; single quotation marks around URL

    <a href="x.com"> no spaces between href, equals and URL; double quotation
    marks around URL
    <a href ="x.com"> space between href and equals, no space between equals and
    URL; double quotation marks around URL
    <a href= "x.com"> no space between href and equals, space between equals and
    URL; double quotation marks around URL
    <a href = "x.com"> space between href and equals, space between equals and
    URL; double quotation marks around URL

    <a href='x.com"> no spaces between href, equals and URL; mismatched quotation
    marks around URL (single to open, double to close)
    <a href ='x.com"> space between href and equals, no space between equals and
    URL; mismatched quotation marks around URL (single to open, double to close)
    <a href= 'x.com"> no space between href and equals, space between equals and
    URL; mismatched quotation marks around URL (single to open, double to close)
    <a href = 'x.com"> space between href and equals, space between equals and
    URL; mismatched quotation marks around URL (single to open, double to close)

    <a href="x.com'> no spaces between href, equals and URL; mismatched quotation
    marks around URL (double to open, single to close)
    <a href ="x.com'> space between href and equals, no space between equals and
    URL; mismatched quotation marks around URL (double to open, single to close)
    <a href= "x.com'> no space between href and equals, space between equals and
    URL; mismatched quotation marks around URL (double to open, single to close)
    <a href = "x.com'> space between href and equals, space between equals and
    URL; mismatched quotation marks around URL (double to open, single to close)
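Looked at another way, the list above is 4 spacing styles around the `=` combined with 5 quoting styles, 20 tags in all. A small PHP sketch that generates them as test input for whatever pattern ends up being used; `href_variants` is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical helper: build every <a href> variant listed above
// (4 spacing styles x 5 quoting styles = 20 test strings).
function href_variants(string $url): array
{
    // Spacing around "=": none, before, after, both.
    $spacings = ['=', ' =', '= ', ' = '];
    // Opening/closing quote pairs: none, single, double, and the two mismatches.
    $quotes = [['', ''], ["'", "'"], ['"', '"'], ["'", '"'], ['"', "'"]];

    $variants = [];
    foreach ($quotes as [$open, $close]) {
        foreach ($spacings as $eq) {
            $variants[] = '<a href' . $eq . $open . $url . $close . '>';
        }
    }
    return $variants;
}

// Print the 20 tags in the same order as the list above.
foreach (href_variants('x.com') as $tag) {
    echo $tag, "\n";
}
```

Running each generated string through a candidate pattern is a quick way to check that no variant has been missed.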


    I guess what's needed is something more advanced than:

    eregi("href=\"/(.*)\">", $string, $array_holding_results)
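Two things are worth knowing about that call. eregi() was deprecated in PHP 5.3 and removed in PHP 7.0, so preg_match() with the `i` modifier is the usual replacement. Also, the greedy `(.*)` runs on to the last `">` on the line rather than the first. A minimal sketch of the difference, assuming a made-up line holding two root-relative links:

```php
<?php
// Made-up input: two links on one line.
$html = '<a href="/one.html">One</a> <a href="/two.html">Two</a>';

// Greedy: (.*) grabs as much as it can, then backs off to the LAST "> on the line.
preg_match('#href="/(.*)">#i', $html, $greedy);

// Negated character class: stops at the first closing double quote.
preg_match('#href="/([^"]*)">#i', $html, $tight);

echo $greedy[1], "\n"; // one.html">One</a> <a href="/two.html
echo $tight[1], "\n";  // one.html
```

Using `#` as the delimiter avoids having to escape the `/` in the pattern.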

    I'd appreciate any help you could give,

    Thanks
    NimP






  • Jon Kraft

    #2
    Re: Eregi pattern matching - bit of a challenge, I think

    "NimP" <stu@sturobbie.co.uk> wrote:
    > Hi. I'm trying to detect any links that are contained within an HTML
    > page using eregi pattern matching. I was wondering if there are any
    > pattern matching geniuses out there who could write a pattern that
    > merges all the different manners in which a link could be written.


    I'm sure there is an easier solution out there somewhere, but going
    through your examples I came up with this (it won't validate a URL,
    though):

    preg_match("/<a(\s)+href(\s)*=(\s)*(['\"])*([a-z0-9_\-\.])+(['\"])*>/i",
    $string, $matches);

    echo htmlentities($matches[0]);
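Building on that pattern, a sketch that collects every link on a page with preg_match_all() (preg_match() stops at the first match) and captures the whole URL in a single group. `extract_links` is a hypothetical helper name. Like the pattern above, it only handles `<a>` tags whose sole attribute is `href`, and it does not validate URLs.

```php
<?php
// Hypothetical helper: return the href value of every matching <a> tag.
//  - \s* replaces (\s)* since no capture is needed around "=";
//  - ['"]? allows at most one quote on each side, matching or not;
//  - ([^'"\s>]+) captures the whole URL, so "/" and ":" are allowed too.
function extract_links(string $html): array
{
    $pattern = '/<a\s+href\s*=\s*[\'"]?([^\'"\s>]+)[\'"]?\s*>/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // capture group 1 from every match
}

$page = "<A HREF = 'x.com\"> and <a href=http://example.com/a.html>";
print_r(extract_links($page));
```

Here print_r lists `x.com` and `http://example.com/a.html`. A tag such as `<a class="c" href="x.com">` is not matched, so attributes before or after `href` would need extra handling.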

    Jon
