How to deal with all of those MS Word Funky characters

  • gene.ellis@gmail.com

    Put simply, I have a text box, and people commonly cut + paste
    information into this text box from Microsoft Word. The problem is that
    Word has all sorts of funky characters (smart quotes, em dashes) that
    the system (PHP-based) doesn't understand. Does anyone know of a way to
    filter out these Microsoft-specific characters? Does PHP have a special
    function for this? Thanks a lot!
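    For reference, assuming the form is served as ISO-8859-1 or Windows-1252
    so that each pasted character arrives as a single byte: Word's smart
    quotes, dashes, ellipsis and similar characters land in the byte range
    128-159, which ISO-8859-1 leaves undefined. A byte scan like this minimal
    sketch shows which ones are in a submitted string. find_word_bytes() is a
    hypothetical helper, not a PHP built-in, and on UTF-8 input it would also
    flag ordinary multi-byte characters, so it only makes sense for
    single-byte text.

```php
<?php
// Hypothetical helper (not a PHP built-in). Assumes $s is single-byte
// Windows-1252 text; on UTF-8 input it would report false positives.
function find_word_bytes($s)
{
    $found = array();
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $code = ord($s[$i]);
        // 128-159: undefined in ISO-8859-1, used by Windows-1252 for
        // smart quotes, dashes, ellipsis, euro sign, etc.
        if ($code >= 128 && $code <= 159) {
            $found[$i] = $code; // byte offset => byte value
        }
    }
    return $found;
}

print_r(find_word_bytes("\x93Smart quotes\x94 \x97 em dash"));
```

    For this sample the curly quotes come back as bytes 147 and 148 and the
    em dash as 151.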

  • Leo Andrews

    #2
    Re: How to deal with all of those MS Word Funky characters

    gene.ellis@gmail.com wrote:
    > Put simply, I have a text box, and people commonly cut + paste
    > information into this text box from Microsoft Word. The problem is that
    > Word has all sorts of funky characters (smart quotes, em dashes) that
    > the system (PHP-based) doesn't understand. Does anyone know of a way to
    > filter out these Microsoft-specific characters? Does PHP have a special
    > function for this? Thanks a lot!

    Hooray, I can actually be of use to this group for once. Yes: if you look
    in the user notes on php.net for the htmlentities() function, you will see
    an entry from mail at britlinks dot com (19-May-2004 05:27). I've listed
    it below for reference. Mind you, I'm sure the hardcore programmers on
    this group will be able to formulate a one-line regexp for this, and we
    look forward to seeing it.
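    Something along those lines, as a minimal sketch and again assuming
    single-byte Windows-1252 input: map the characters Word inserts most often
    to plain ASCII with strtr(), then drop whatever is left in the 128-159
    range with a single preg_replace(). word_to_ascii() is a hypothetical
    name, and the particular ASCII substitutes are a judgment call.

```php
<?php
// Hypothetical helper. Assumes $s is single-byte Windows-1252 text.
function word_to_ascii($s)
{
    // Map the characters Word inserts most often to ASCII look-alikes.
    $s = strtr($s, array(
        "\x85" => '...', // horizontal ellipsis
        "\x91" => "'",   // left single quotation mark
        "\x92" => "'",   // right single quotation mark / apostrophe
        "\x93" => '"',   // left double quotation mark
        "\x94" => '"',   // right double quotation mark
        "\x96" => '-',   // en dash
        "\x97" => '--',  // em dash
    ));
    // The one-line regexp: drop any remaining byte in the 128-159 range
    // (euro sign, trade mark sign, etc.).
    return preg_replace('/[\x80-\x9F]/', '', $s);
}

echo word_to_ascii("\x93Hello\x94 \x96 it\x92s done\x85"), "\n";
```

    strtr() with an array never rescans text it has already replaced, so the
    substitutions cannot chain into each other. The final regexp would damage
    UTF-8 text, whose multi-byte characters also use bytes in that range.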

    In the meantime, I hope this helps.


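    <?php
    // Strips slashes (added by magic_quotes_gpc on older PHP setups) and converts
    // special characters in $var to their HTML equivalents. Bytes 128-159 are
    // treated as Windows-1252, which is where Word's smart quotes and dashes live.
    function htmlfriendly($var, $nl2br = false){
        $chars = array(
            128 => '&#8364;', // euro sign
            130 => '&#8218;', // single low-9 quotation mark
            131 => '&#402;',  // f with hook
            132 => '&#8222;', // double low-9 quotation mark
            133 => '&#8230;', // horizontal ellipsis
            134 => '&#8224;', // dagger
            135 => '&#8225;', // double dagger
            136 => '&#710;',  // modifier letter circumflex
            137 => '&#8240;', // per mille sign
            138 => '&#352;',  // S with caron
            139 => '&#8249;', // single left-pointing angle quotation mark
            140 => '&#338;',  // OE ligature
            142 => '&#381;',  // Z with caron
            145 => '&#8216;', // left single quotation mark
            146 => '&#8217;', // right single quotation mark
            147 => '&#8220;', // left double quotation mark
            148 => '&#8221;', // right double quotation mark
            149 => '&#8226;', // bullet
            150 => '&#8211;', // en dash
            151 => '&#8212;', // em dash
            152 => '&#732;',  // small tilde
            153 => '&#8482;', // trade mark sign
            154 => '&#353;',  // s with caron
            155 => '&#8250;', // single right-pointing angle quotation mark
            156 => '&#339;',  // oe ligature
            158 => '&#382;',  // z with caron
            159 => '&#376;'); // Y with diaeresis
        // Charset ISO-8859-1 (the old default) leaves bytes 128-159 untouched, so
        // str_replace() can map them afterwards. Newer PHP defaults to UTF-8,
        // which would reject or mangle these single bytes.
        $var = str_replace(array_map('chr', array_keys($chars)), $chars,
            htmlentities(stripslashes($var), ENT_COMPAT, 'ISO-8859-1'));
        if($nl2br){
            return nl2br($var);
        } else {
            return $var;
        }
    }
    ?>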
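    For what it's worth, htmlentities() also accepts a charset argument, and
    the PHP manual lists cp1252 (Windows-1252) among the supported values. If
    the input really is Windows-1252, that should let htmlentities() produce
    the named entities itself (&ldquo;, &ndash;, &euro; and so on), so the
    lookup table above may not be needed. A minimal sketch:
    htmlfriendly_cp1252() is a hypothetical name, and stripslashes() is left
    out on the assumption that magic_quotes_gpc is off (it was removed
    entirely in PHP 5.4).

```php
<?php
// Hypothetical variant of htmlfriendly() that lets htmlentities() do the
// mapping. Assumes $var is Windows-1252 text, e.g. a form posted from an
// ISO-8859-1/Windows-1252 page, and that magic quotes are off.
function htmlfriendly_cp1252($var, $nl2br = false)
{
    // With the 'cp1252' charset, bytes such as 0x93/0x94 are decoded as
    // curly quotes and emitted as their HTML named entities.
    $var = htmlentities($var, ENT_QUOTES, 'cp1252');
    return $nl2br ? nl2br($var) : $var;
}

echo htmlfriendly_cp1252("\x93Hi\x94 \x96 \x80 5"), "\n";
```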
