Regex question

**Ron Bullman** · Nov 22 '05, 01:39 AM

Re: Regex question

remy,

How about <a>(?<1>.+?)</a>

Ron
"remy rakic" <liquid@spamhole.com> wrote in message
news:ea5aHMmUDHA.2272@TK2MSFTNGP11.phx.gbl...
> Hi all, I was trying to parse some HTML and found myself in trouble with
> some regex processing (which I have never done before).
>
> What I am trying to do is get the content between two tags, including any
> HTML code. I can do stuff like this:
> "<a>([\w\s]*)</a>" on "<a>Not cool</a><a>Absolutely not</a>", but that
> obviously only gets regular text content and no HTML tags. I wonder if
> someone could enlighten me on which regex to use in order to get the results
> "<really>Really not<cool/><at>all</at>" and "Absolutely not" from the string
> "<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely not</a></tag3></tag>"?
> (Note that I can't use XPath, since I'm not sure whether the site is
> XHTML compliant or not, as the example is not XML.)
>
> Should I process the content twice, or give up the regex approach for
> regular 'string index' parsing?
> Thanks in advance
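
A minimal sketch of the suggested non-greedy pattern applied to the sample string from the question. The `(?<1>...)` group syntax in the thread appears to be .NET-style; Python's `re` module is used here purely for illustration, so the group is written as a plain numbered group. The `re.DOTALL` flag and the names `A_CONTENT`, `sample`, and `matches` are assumptions added for this example, not part of the original posts.

```python
import re

# Non-greedy capture of everything between <a> and </a>.
# re.DOTALL is an assumption: it lets "." also match line breaks in the HTML.
A_CONTENT = re.compile(r"<a>(.+?)</a>", re.DOTALL)

# Sample string from the question, joined onto one logical line.
sample = (
    "<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>"
    "...<tag3><a>Absolutely not</a></tag3></tag>"
)

# findall returns the contents of group 1 for every match, left to right.
matches = A_CONTENT.findall(sample)
for content in matches:
    print(content)
# <really>Really not<cool/><at>all</at>
# Absolutely not
```

Note that this pattern only matches a bare `<a>` with no attributes, and it does not cope with `<a>` elements nested inside each other.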

**remy rakic** · Nov 22 '05, 01:40 AM

Re: Regex question

Aaah, the non-greedy option, now I know what it is used for. Thx Ron, it
works like a charm!

"Ron Bullman" <ron.bulman@mail.com> wrote in message
news:O5wWmeqUDHA.2156@TK2MSFTNGP11.phx.gbl...
> remy,
>
> How about <a>(?<1>.+?)</a>
>
> Ron
> "remy rakic" <liquid@spamhole.com> wrote in message
> news:ea5aHMmUDHA.2272@TK2MSFTNGP11.phx.gbl...
> > Hi all, I was trying to parse some HTML and found myself in trouble with
> > some regex processing (which I have never done before).
> >
> > What I am trying to do is get the content between two tags, including any
> > HTML code. I can do stuff like this:
> > "<a>([\w\s]*)</a>" on "<a>Not cool</a><a>Absolutely not</a>", but that
> > obviously only gets regular text content and no HTML tags. I wonder if
> > someone could enlighten me on which regex to use in order to get the results
> > "<really>Really not<cool/><at>all</at>" and "Absolutely not" from the string
> > "<tag><tag2><a><really>Really not<cool/><at>all</at></a></tag2>...<tag3><a>Absolutely not</a></tag3></tag>"?
> > (Note that I can't use XPath, since I'm not sure whether the site is
> > XHTML compliant or not, as the example is not XML.)
> >
> > Should I process the content twice, or give up the regex approach for
> > regular 'string index' parsing?
> > Thanks in advance
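
For readers wondering what the non-greedy option changes, here is a small comparison sketch in Python (chosen only for illustration; the thread itself does not name a language). It runs a greedy `.+` and a non-greedy `.+?` over the two-link string from the question. The variable names `text`, `greedy`, and `lazy` are made up for this example.

```python
import re

text = "<a>Not cool</a><a>Absolutely not</a>"

# Greedy: ".+" grabs as much as it can, so the match runs from the
# first <a> all the way to the LAST </a> in the string.
greedy = re.findall(r"<a>(.+)</a>", text)

# Non-greedy: ".+?" grabs as little as it can, so each match stops
# at the nearest </a>.
lazy = re.findall(r"<a>(.+?)</a>", text)

print(greedy)  # ['Not cool</a><a>Absolutely not']
print(lazy)    # ['Not cool', 'Absolutely not']
```

The greedy version returns one match that swallows the closing and opening tags in the middle, while the non-greedy version returns the two separate contents.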
