Remove HTML tags (except anchor tag) from a string using regular expressions

**Anand** · Jul 18 '05, 08:56 PM

Re: Remove HTML tags (except anchor tag) from a string using regular expressions

How about...

import re
content = re.sub('<([^!(a>)]([^(/a>)]|\n)*)>', '', content)
Seems to work for me.

HTH

-Anand

**Anand** · Jul 18 '05, 08:56 PM

Re: Remove HTML tags (except anchor tag) from a string using regular expressions

I meant
content = re.sub ('<[^!(a>)]([^>]|\n)*[^!(/a)]>', '', content)

Sorry for the mistake.
However, this also seems to print tags like <b>, <p>, etc.

-Anand
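
A note on why single-letter tags such as <b> and <p> survive the second pattern above: inside square brackets, parentheses are literal characters, so `[^!(a>)]` matches any one character other than `!`, `(`, `a`, `>` or `)`. With a second such class before the closing `>`, the pattern needs at least two characters between `<` and `>`. One possible alternative is a negative lookahead. The following is only a sketch, assuming reasonably well-formed markup with no `>` inside attribute values; the function name `strip_tags_except_anchor` is made up for illustration.

```python
import re

# Match any tag whose name is not exactly "a" (opening or closing).
# (?!/?a\b) rejects "<a ...>" and "</a>", while still allowing tags that
# merely start with "a", such as <abbr>. [^>]* also spans newlines.
NON_ANCHOR_TAG = re.compile(r'<(?!/?a\b)[^>]*>', re.IGNORECASE)


def strip_tags_except_anchor(content):
    """Remove every tag except <a ...> and </a> (hypothetical helper)."""
    return NON_ANCHOR_TAG.sub('', content)


sample = '<p>Hello <b>world</b>, see <a href="http://example.com">this link</a>.</p>'
print(strip_tags_except_anchor(sample))
```

Comments, scripts and malformed tags are not handled, so a real HTML parser is likely the safer choice for untrusted input.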

**Max M** · Jul 18 '05, 08:56 PM

Re: Remove HTML tags (except anchor tag) from a string using regular expressions

Nico Grubert wrote:

If it's not to learn, and you simply want it to work, try out this library:

http://zope.org/Members/chrisw/StripOGram/readme

--

hilsen/regards Max M, Denmark

mxm - IT's Mad Science

http://www.mxm.dk/

**Gabriel Cooper** · Jul 18 '05, 09:00 PM

Re: Remove HTML tags (except anchor tag) from a string using regular expressions

Max M wrote:
> If it's not to learn, and you simply want it to work, try out this
> library:
>
> http://zope.org/Members/chrisw/StripOGram/readme

>>> stripogram.html2safehtml('''first > last''',valid_tags=('i','a','br'))
'first > last'
>>> stripogram.html2safehtml('''first < last''',valid_tags=('i','a','br'))
'first first '

Keeping in mind that bare ">" and "<" are invalid HTML (they should be &gt;
and &lt;), why did it leave the greater-than sign, and why are there two "first"s?
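
For reference, the escaped forms mentioned above can be produced with the standard library. This is a small illustration using `html.escape` from Python 3, not part of stripogram; the variable names are made up for the example.

```python
import html

# The two test strings from the session above, with bare > and <.
samples = ["first > last", "first < last"]

# html.escape replaces &, < and > (and quotes by default) with entities.
escaped = [html.escape(text) for text in samples]

for line in escaped:
    print(line)
```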
