Remove accent marks from text?

  • MC

    Remove accent marks from text?

    Is there a string function in .NET that will remove the accent marks from letters? I know that's a slightly vague request... and that I could implement it by table lookup (and will do so unless something's already there). But can it be accomplished by switching a string among "cultures" or something like that?
  • Morten Wennevik [C# MVP]

    #2
    RE: Remove accent marks from text?


    "MC" wrote:
    Is there a string function in .NET that will remove the accent marks from letters? I know that's a slightly vague request... and that I could implement it by table lookup (and will do so unless something's already there). But can it be accomplished by switching a string among "cultures" or something like that?
    >
    Hi,

    You can strip the accents (diacritics) by normalizing the string to FormD,
    which splits each accented letter into its base letter followed by separate
    non-spacing marks, and then removing those marks.

    // Requires: using System.Globalization; using System.Text;
    string normalizedString = regularString.Normalize(NormalizationForm.FormD);

    StringBuilder sb = new StringBuilder(normalizedString);

    // Walk backwards so that removing a character does not skip the next one.
    for (int i = sb.Length - 1; i >= 0; i--)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(sb[i]) ==
            UnicodeCategory.NonSpacingMark)
            sb.Remove(i, 1);
    }
    regularString = sb.ToString();

    --
    Happy Coding!
    Morten Wennevik [C# MVP]


    • MC

      #3
      Re: Remove accent marks from text?

      > You can remove non spacing characters (and possibly modifier characters)
      > from the string if you normalize it. This will effectively remove accents
      > (diacritics) as well.
      Thanks. I should have been clearer. Not only do I want to remove non-spacing characters, I also want to change accented letters to the corresponding unaccented letters. (This is for matching up foreign names... somebody long ago decided the database needed to be in plain ASCII.)


      • Jeff Johnson

        #4
        Re: Remove accent marks from text?

        "MC" <for.address.look@www.ai.uga.edu.slash.mc> wrote in message
        news:%23QNZyzDOJHA.1908@TK2MSFTNGP04.phx.gbl...
        >You can remove non spacing characters (and possibly modifier characters)
        >from the string if you normalize it. This will effectively remove
        >accents
        >(diacritics) as well.
        >
        Thanks. I should have been clearer. Not only do I want to remove non-spacing characters, I also want to change accented letters to the corresponding unaccented letters. (This is for matching up foreign names... somebody long ago decided the database needed to be in plain ASCII.)

        Here's hoping no one has used alternate spellings, like <letter>+e for
        German umlauted letters. And will the Eszett (ß) get translated to "ss"...?
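
On the "ss" question: ß has no canonical decomposition, so normalizing to FormD leaves it unchanged, and the same holds for letters such as ø, ł, đ and æ. A minimal sketch of one way to cover them, assuming a small hand-written fallback table applied alongside the mark removal. The class name AsciiFolding, the method Fold and the table contents are hypothetical and deliberately incomplete; they are not part of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiFolding
{
    // Hypothetical, incomplete table for letters that FormD does not split
    // into base letter + combining mark. Extend it for the data at hand.
    private static readonly Dictionary<char, string> Fallbacks =
        new Dictionary<char, string>
        {
            { '\u00DF', "ss" }, // ß
            { '\u00D8', "O" },  // Ø
            { '\u00F8', "o" },  // ø
            { '\u0141', "L" },  // Ł
            { '\u0142', "l" },  // ł
            { '\u0110', "D" },  // Đ
            { '\u0111', "d" },  // đ
            { '\u00C6', "AE" }, // Æ
            { '\u00E6', "ae" }  // æ
        };

    public static string Fold(string input)
    {
        // Split accented letters into base letter + combining marks.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Drop the combining accents themselves.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Substitute letters that normalization left untouched.
            string replacement;
            if (Fallbacks.TryGetValue(c, out replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this sketch, AsciiFolding.Fold("Straße") gives "Strasse" and AsciiFolding.Fold("Müller") gives "Muller". It does nothing about the alternate-spelling problem: a name stored as "Mueller" will still not match "Muller".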



          • Morten Wennevik [C# MVP]

          #5
          Re: Remove accent marks from text?


          "MC" wrote:
          You can remove non spacing characters (and possibly modifier characters)
          from the string if you normalize it. This will effectively remove accents
          (diacritics) as well.
          >
          Thanks. I should have been clearer. Not only do I want to remove non-spacing characters, I also want to change accented letters to the corresponding unaccented letters. (This is for matching up foreign names... somebody long ago decided the database needed to be in plain ASCII.)

          That is exactly what you achieve by first normalizing (using FormD) and then
          removing the non-spacing characters. In the normalized string, each accented
          letter becomes its base ASCII letter followed by one or more non-spacing
          combining characters, which together render as the original character. Remove
          the non-spacing characters and all that remains is the unaccented text.
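
To see the decomposition for yourself, here is a small sketch that lists the code units of the FormD version of a string together with their Unicode categories. The class DecompositionDemo and the method Describe are made-up names for illustration only; Normalize and GetUnicodeCategory are the same framework calls used earlier in the thread.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DecompositionDemo
{
    // Returns one "U+XXXX/Category" entry per UTF-16 code unit of the
    // FormD (canonically decomposed) version of the input.
    public static string Describe(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        return string.Join(" ", decomposed.Select(c =>
            string.Format("U+{0:X4}/{1}", (int)c, CharUnicodeInfo.GetUnicodeCategory(c))));
    }
}
```

For example, DecompositionDemo.Describe("é") returns "U+0065/LowercaseLetter U+0301/NonSpacingMark": a plain "e" followed by a combining acute accent, which is the part the removal loop throws away. A character that comes back as a single entry has no decomposition, so this technique leaves it as it is.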

          --
          Happy Coding!
          Morten Wennevik [C# MVP]
