MC
Remove accent marks from text?
Is there a string function in .NET that will remove the accent marks from letters? I know that's a slightly vague request... and that I could implement it by table lookup (and will do so unless something's already there). But can it be accomplished by switching a string among "cultures" or something like that?
-
Morten Wennevik [C# MVP]
RE: Remove accent marks from text?
"MC" wrote:
> Is there a string function in .NET that will remove the accent marks from letters? I know that's a slightly vague request... and that I could implement it by table lookup (and will do so unless something's already there). But can it be accomplished by switching a string among "cultures" or something like that?
If you first normalize the string to FormD, you can then remove the non-spacing
characters (and possibly modifier characters) from it. This effectively removes
accents (diacritics) as well.
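using System.Globalization;
using System.Text;

// FormD splits each accented letter into a base letter plus combining marks.
string normalizedString = regularString.Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder(normalizedString.Length);
foreach (char c in normalizedString)
{
    // Copy everything except the combining (non-spacing) marks.
    if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        sb.Append(c);
}
// Recompose whatever is left into the standard composed form.
regularString = sb.ToString().Normalize(NormalizationForm.FormC);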
--
Happy Coding!
Morten Wennevik [C# MVP]
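If the goal is only to match names while ignoring accents, rather than to rewrite the stored text, .NET's culture-aware comparison is the closest thing to the "switching cultures" idea in the question. The sketch below assumes that an accent- and case-insensitive comparison under the invariant culture is acceptable for your data. The class and method names (AccentInsensitive, NamesMatch) are hypothetical; CompareInfo.Compare and the CompareOptions.IgnoreNonSpace and CompareOptions.IgnoreCase flags are standard System.Globalization members.

```csharp
using System.Globalization;

// Sketch: compare two names while ignoring accents and case,
// without modifying either string. Names here are hypothetical.
public static class AccentInsensitive
{
    public static bool NamesMatch(string a, string b)
    {
        CompareInfo compare = CultureInfo.InvariantCulture.CompareInfo;
        // IgnoreNonSpace ignores non-spacing combining marks (accents);
        // IgnoreCase additionally ignores upper/lower case differences.
        return compare.Compare(a, b, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0;
    }
}
```

With these options, comparing "José" with "Jose" should report a match. This only compares; if the database itself must hold plain ASCII, the string still has to be transformed as in the snippet above. Collation data comes from the operating system or ICU, so edge cases may differ slightly between platforms.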
-
MC
Re: Remove accent marks from text?
> You can remove non spacing characters (and possibly modifier characters)
> from the string if you normalize it. This will effectively remove accents
> (diacritics) as well.
-
Jeff Johnson
Re: Remove accent marks from text?
"MC" <for.address.look@www.ai.uga.edu.slash.mc> wrote in message
news:%23QNZyzDOJHA.1908@TK2MSFTNGP04.phx.gbl...
>> You can remove non spacing characters (and possibly modifier characters)
>> from the string if you normalize it. This will effectively remove accents
>> (diacritics) as well.
> Thanks. I should have been clearer. Not only do I want to remove non-spacing
> characters, I also want to change accented letters to the corresponding
> unaccented letters. (This is for matching up foreign names... somebody long
> ago decided the database needed to be in plain ASCII.)
German umlauted letters. And will the Eszett (ß) get translated to "ss"...?
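Normalizing to FormD and stripping non-spacing marks turns umlauted letters into their plain base vowels (ü becomes u, not the German transliteration "ue"). Letters with no canonical decomposition, such as ß, æ, ø, œ and ł, pass through unchanged, so normalization alone does not turn ß into "ss". A minimal sketch, assuming you combine the table lookup mentioned in the original question with the normalize-and-strip step; the names AsciiFolder, Fold and Extra are hypothetical, and the table entries are illustrative rather than complete.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Sketch: fold text to plain letters for name matching.
// AsciiFolder, Fold and Extra are hypothetical names for this example.
public static class AsciiFolder
{
    // Hypothetical, deliberately incomplete table for letters that FormD
    // does not split into base letter + accent. Extend it for your data.
    private static readonly Dictionary<char, string> Extra = new Dictionary<char, string>
    {
        ['ß'] = "ss",
        ['Æ'] = "AE", ['æ'] = "ae",
        ['Ø'] = "O", ['ø'] = "o",
        ['Œ'] = "OE", ['œ'] = "oe",
        ['Ł'] = "L", ['ł'] = "l",
    };

    public static string Fold(string input)
    {
        // Step 1: compose first so precomposed table keys can match,
        // then replace any letter that has a table entry.
        string composed = input.Normalize(NormalizationForm.FormC);
        var mapped = new StringBuilder(composed.Length);
        foreach (char c in composed)
        {
            if (Extra.TryGetValue(c, out var replacement))
                mapped.Append(replacement);
            else
                mapped.Append(c);
        }

        // Step 2: decompose and drop the non-spacing (combining) marks.
        string decomposed = mapped.ToString().Normalize(NormalizationForm.FormD);
        var result = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                result.Append(c);
        }

        // Recompose anything decomposed but not stripped (e.g. Hangul syllables).
        return result.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Here Fold("Straße") should give "Strasse" and Fold("Łódź") should give "Lodz". Because the table is applied before decomposition, adding an entry such as 'ü' mapped to "ue" would take precedence over plain accent stripping if your matching rules need the German spelling. Characters that are neither in the table nor decomposable (for example, CJK text) are left as they are.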
-
Morten Wennevik [C# MVP]
Re: Remove accent marks from text?
"MC" wrote:
>> You can remove non spacing characters (and possibly modifier characters)
>> from the string if you normalize it. This will effectively remove accents
>> (diacritics) as well.
> Thanks. I should have been clearer. Not only do I want to remove non-spacing
> characters, I also want to change accented letters to the corresponding
> unaccented letters. (This is for matching up foreign names... somebody long
> ago decided the database needed to be in plain ASCII.)
That is exactly what you get by first normalizing (using FormD) and then
removing the non-spacing characters. For an accented Latin letter, the
normalized string contains the plain ASCII letter followed by a non-spacing
combining mark; together the two render as the original character. Remove
the non-spacing characters and all that remains is the unaccented text.
--
Happy Coding!
Morten Wennevik [C# MVP]
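To see the split described above, you can list each UTF-16 code unit of a FormD-normalized string together with its Unicode category. This is a small diagnostic sketch; DecompositionDemo and Describe are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Sketch: show how FormD splits a string into base letters and combining marks.
// DecompositionDemo and Describe are hypothetical names for this example.
public static class DecompositionDemo
{
    public static List<string> Describe(string text)
    {
        var result = new List<string>();
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            // One entry per code unit, e.g. "U+0065 LowercaseLetter".
            result.Add($"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}");
        }
        return result;
    }
}
```

For "\u00E9" (é) this should return U+0065 LowercaseLetter followed by U+0301 NonSpacingMark, while "\u00DF" (ß) comes back as a single U+00DF LowercaseLetter entry, which is why normalization alone leaves ß untouched.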