Need a Regular expression to remove a char for Unicode text

**harvey.thomas@informa.com** · Oct 13 '06, 11:25 AM

Re: Need a Regular expression to remove a char for Unicode text

శ్రీనివాస wrote:

Hi friends,
Can anyone tell me how I can remove a character from Unicode text?
కల్‌&హార is a Telugu word in Unicode. Here I want to
remove the '&', but not replace it with a zero-width character. One more thing:
if there is any whitespace before and after the '&', the text should
be kept as it is. Please tell me how I can work this out with regular
expressions.
Thanks and regards
Srinivasa Raju Datla

Don't know anything about Telugu, but is this the approach you want?

>>> import re
>>> x = u'\xfe\xff & \xfe\xff \xfe\xff&\xfe\xff'
>>> noampre = re.compile(r'(?<!\s)&(?!\s)', re.UNICODE).sub
>>> noampre('', x)
u'\xfe\xff & \xfe\xff \xfe\xff\xfe\xff'

The regular expression uses negative lookbehind and lookahead
assertions to check that there is no whitespace on either side of the '&'
character. Each match found is then replaced with the empty string.
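For Python 3 readers, the same lookaround idea can be sketched as below. It assumes Python 3, where `str` is already Unicode and `\s` is Unicode-aware by default, so the `re.UNICODE` flag is unnecessary. The helper name `remove_bare_ampersands` is hypothetical, and the Telugu word is spelled out as code-point escapes so the invisible U+200C before the '&' shows up in the source.

```python
import re

# '&' is removed only when it is not preceded by whitespace (negative
# lookbehind) and not followed by whitespace (negative lookahead).
BARE_AMP = re.compile(r"(?<!\s)&(?!\s)")


def remove_bare_ampersands(text: str) -> str:
    """Hypothetical helper: delete every '&' that has no adjacent whitespace."""
    return BARE_AMP.sub("", text)


# The Telugu word from the question: ka, la, virama, U+200C (ZWNJ), '&', ha, aa, ra
word = "\u0c15\u0c32\u0c4d\u200c&\u0c39\u0c3e\u0c30"
cleaned = remove_bare_ampersands(word)
print(ascii(cleaned))

# An '&' with whitespace on either side is left alone.
print(remove_bare_ampersands("a & b a&b a &b"))
```

U+200C is not whitespace as far as `\s` is concerned, so the ampersand in the Telugu word is removed while the non-joiner stays in place.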

**Sybren Stuvel** · Oct 13 '06, 11:35 AM

శ్రీనివాస enlightened us with:

Can anyone tell me how I can remove a character from Unicode
text? కల్<200c>&హార is a Telugu word in Unicode. Here I want to
remove the '&', but not replace it with a zero-width character. One more
thing: if there is any whitespace before and after the '&', the
text should be kept as it is.

So basically, you want to match <200c>& and replace it with <200c>,
but only if it's not surrounded by whitespace, right?

ur"(?<!\s)\u200c&(?!\s)" should match. I'm sure you'll be able to take
it from there.
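Following Sybren's suggestion, a minimal Python 3 sketch could match the U+200C ZERO WIDTH NON-JOINER together with the '&' and substitute the non-joiner alone, so only the ampersand disappears. This assumes the '&' always directly follows a U+200C, as in the example word; the helper name `drop_amp_after_zwnj` is hypothetical. Python 3.3 and later accept `\u200c` escapes inside a raw regex pattern.

```python
import re

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER

# ZWNJ followed by '&', skipped when whitespace sits before the ZWNJ
# or after the '&'.
ZWNJ_AMP = re.compile(r"(?<!\s)\u200c&(?!\s)")


def drop_amp_after_zwnj(text: str) -> str:
    """Hypothetical helper: replace each matched ZWNJ + '&' pair with the ZWNJ alone."""
    return ZWNJ_AMP.sub(ZWNJ, text)


word = "\u0c15\u0c32\u0c4d" + ZWNJ + "&" + "\u0c39\u0c3e\u0c30"
print(ascii(drop_amp_after_zwnj(word)))

# An '&' followed by a space is left unchanged.
print(ascii(drop_amp_after_zwnj("x" + ZWNJ + "& y")))
```

Putting the ZWNJ back in the replacement, rather than replacing with the empty string, keeps the pattern's intent explicit: the non-joiner is part of the word and only the '&' is unwanted.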

Sybren
--
Sybren Stüvel
Stüvel IT - http://www.stuvel.eu/

**Leo Kislov** · Oct 13 '06, 11:35 AM

On Oct 13, 4:44 am, harvey.tho...@informa.com wrote:

> >>> x = u'\xfe\xff & \xfe\xff \xfe\xff&\xfe\xff'
> >>> noampre = re.compile(r'(?<!\s)&(?!\s)', re.UNICODE).sub
> >>> noampre('', x)

He wants to replace & with a zero-width joiner, so the last call should be
noampre(u"\u200D", x)
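Since U+200D (ZERO WIDTH JOINER) and U+200C (ZERO WIDTH NON-JOINER) are easy to confuse, a short Python 3 sketch can compare both readings on the sample word: deleting the '&' versus replacing it with U+200D. It uses only the standard `re` and `unicodedata` modules; the sample word and variable names are illustrative.

```python
import re
import unicodedata

noampre = re.compile(r"(?<!\s)&(?!\s)").sub

word = "\u0c15\u0c32\u0c4d\u200c&\u0c39\u0c3e\u0c30"

removed = noampre("", word)       # plain deletion of the '&'
joined = noampre("\u200d", word)  # '&' replaced by U+200D ZERO WIDTH JOINER

# List the names of the format-control (category Cf) characters left in each result.
for label, text in (("removed", removed), ("joined", joined)):
    names = [unicodedata.name(ch) for ch in text if unicodedata.category(ch) == "Cf"]
    print(label, names)
```

With plain deletion only the original non-joiner remains; the U+200D replacement leaves a non-joiner immediately followed by a joiner.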

**Leo Kislov** · Oct 13 '06, 11:55 AM

On Oct 13, 4:55 am, "Leo Kislov" <Leo.Kis...@gmail.com> wrote:

> He wants to replace & with a zero-width joiner, so the last call should be
> noampre(u"\u200D", x)

Pardon my poor reading comprehension: the OP doesn't want a zero-width
joiner. Though I'm confused why he mentioned it at all.