Regex question

  • remy rakic

    Hi all, I was trying to parse some HTML and ran into trouble with
    some regex processing (which I have never done before).

    What I am trying to do is get the content between two tags, including any
    HTML code inside it. I can do something like this:
    "<a>([\w\s]*)</a>" on "<a>Not cool</a><a>Absolutely not</a>", but that
    obviously only gets regular text content and no HTML tags. I wonder if
    someone could enlighten me on which regex to use to get the results
    "<really>Really not<cool/><at>all</at>" and "Absolutely not" from the string
    "<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely not</a></tag3></tag>"?
    (Note that I can't use XPath, since I'm not sure whether the site is XHTML
    compliant or not, as the example is not XML.)

    Should I process the content twice, or give up the regex approach in favor
    of regular 'string index' parsing?
    Thanks in advance
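
For illustration, here is a minimal sketch of one way the pattern could be written. It is in Python only as an assumption, since the post does not name a language; the names A_TAG, contents_of_a_tags and sample are hypothetical. The idea is to replace the [\w\s]* character class with a non-greedy "match anything" group, so that each match stops at the first closing </a> instead of the last one.

```python
import re

# Non-greedy group: take as little as possible up to the next "</a>".
# re.DOTALL lets "." also match line breaks inside the tag content.
A_TAG = re.compile(r"<a>(.*?)</a>", re.DOTALL)

def contents_of_a_tags(html):
    """Return the raw content (text and markup) of each <a>...</a> pair."""
    return A_TAG.findall(html)

sample = ("<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>..."
          "<tag3><a>Absolutely not</a></tag3></tag>")

print(contents_of_a_tags(sample))
# prints ['<really>Really not<cool/><at>all</at>', 'Absolutely not']
```

This sketch assumes the <a> tags carry no attributes and are never nested inside one another. If either can happen, a single pass with a regex is likely to give wrong results, and index-based scanning or a lenient HTML parser is probably the safer route.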

