Regex question

**remy rakic** · Nov 22 '05, 01:39 AM

Hi all, I was trying to parse some HTML and ran into trouble with
some regex processing (which I have never done before).

What I am trying to do is get the content between two tags, including any
HTML markup inside it. I can do things like this:
"<a>([\w\s]*)</a>" on "<a>Not cool</a><a>Absolutely not</a>", but that
obviously only captures plain text content and no HTML tags. Could someone
enlighten me on which regex to use in order to get the results
"<really>Really not<cool/><at>all</at>" and "Absolutely not" from the string
"<tag><tag2><a><really>Really
not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely
not</a></tag3></tag>"? (Note that I can't use XPath, since I'm not sure whether
the site is XHTML compliant; the example above is not XML.)
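
To make the problem concrete, here is a minimal sketch in Python (the language is an assumption; the post does not name one). It shows the pattern above next to one possible alternative: a non-greedy `.*?` with `re.DOTALL`. The names `A_CONTENT` and `a_contents` are made up for illustration, and the sketch assumes `<a>` elements are never nested and carry no attributes.

```python
import re

# The pattern from the post: only word characters and whitespace may
# appear between <a> and </a>, so any inner tag makes the match fail.
TEXT_ONLY = re.compile(r"<a>([\w\s]*)</a>")

# Hypothetical alternative: the non-greedy .*? stops at the first </a>,
# and re.DOTALL lets "." match line breaks as well.
# Assumption: <a> elements are not nested and have no attributes.
A_CONTENT = re.compile(r"<a>(.*?)</a>", re.DOTALL)


def a_contents(html):
    """Return the raw inner markup of every <a>...</a> pair, in order."""
    return A_CONTENT.findall(html)


sample = (
    "<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>..."
    "<tag3><a>Absolutely not</a></tag3></tag>"
)

print(TEXT_ONLY.findall(sample))  # ['Absolutely not']
print(a_contents(sample))
# ['<really>Really not<cool/><at>all</at>', 'Absolutely not']
```

This only holds under the stated assumptions; nested `<a>` elements or attributes such as `href` would need a different pattern or a real HTML parser.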

Should I process the content twice, or give up the regex approach for
regular 'string index' parsing?
Thanks in advance.
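
For comparison, the 'string index' alternative could be sketched roughly like this in Python (again an assumed language). The function name `a_contents_by_index` and its parameters are hypothetical, and it makes the same assumption that the tags are not nested and have no attributes.

```python
def a_contents_by_index(html, open_tag="<a>", close_tag="</a>"):
    """Collect the text between each open_tag/close_tag pair using str.find."""
    results = []
    pos = 0
    while True:
        start = html.find(open_tag, pos)
        if start == -1:
            break  # no more opening tags
        start += len(open_tag)
        end = html.find(close_tag, start)
        if end == -1:
            break  # unclosed tag: stop rather than guess
        results.append(html[start:end])
        pos = end + len(close_tag)  # continue scanning after this pair
    return results


sample = (
    "<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>..."
    "<tag3><a>Absolutely not</a></tag3></tag>"
)
print(a_contents_by_index(sample))
# ['<really>Really not<cool/><at>all</at>', 'Absolutely not']
```

A single pass is enough here, so the content would not need to be processed twice in either approach.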