character conversion from MS Word to HTML

  • saul.baizman@gmail.com

    character conversion from MS Word to HTML


    Here's a brief description of the problem. My organization has a
    client who cuts and pastes information from Microsoft Word documents
    into web-based forms, whose contents are then displayed on a website. I
    wish to convert the special characters, such as ellipses and trademark
    symbols (and whatever else Word might throw at us) into a proper HTML
    entity (&trade;) or character reference (&#174;) if the entity does
    not exist.

    Before you make any suggestions, let me share a brief overview of my
    previous attempts at a solution so neither of us wastes his time.
    Right now, I'm using a combination of the character map returned by
    get_html_translation_table(HTML_ENTITIES) and some kludgy code which
    manually maps the UTF-8 byte sequence of an MS Word special character to its
    HTML equivalent. For example,

    $replace_array[chr(226).chr(128).chr(152)] = "&lsquo;";

    I'd like to be able to do the above operation automatically / across
    the board for wacky Word characters. I suspect I may need to use the
    mbstring functions. If you have any advice, I'm happy to send helpful
    folks some chocolate for their troubles.
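
For reference, here is a minimal sketch of what an "across the board" conversion might look like with the mbstring functions. It rests on a few assumptions that are not confirmed anywhere above: the pasted text arrives as UTF-8 (the byte sequence 226, 128, 152 is the UTF-8 encoding of U+2018, which suggests so), Windows-1252 is a reasonable fallback when it does not, and it is acceptable to escape &, <, > and quotes along the way. The function name word_text_to_html is made up for illustration.

```php
<?php
// Hypothetical helper: the name and the fallback encoding are assumptions.
function word_text_to_html(string $text): string
{
    // Assumption: input is normally UTF-8. If the bytes are not valid
    // UTF-8, treat them as Windows-1252, Word's usual legacy code page.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
    }

    // Step 1: use a named entity wherever HTML 4.01 defines one
    // (&hellip;, &trade;, &lsquo;, ...). Note that this also escapes
    // &, <, > and both kinds of quotes.
    $text = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Step 2: whatever is still outside ASCII has no named entity, so
    // emit a decimal character reference (&#NNNN;) for it instead.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

// Sample input: ellipsis, trademark sign, curly single quotes, and a
// character (U+0141) that has no HTML 4.01 named entity.
echo word_text_to_html("Wait\u{2026} Acme\u{2122} \u{2018}quoted\u{2019} \u{0141}"), "\n";
```

If the form handler already escapes its input elsewhere, the fourth parameter of htmlentities (double_encode) can be set to false to avoid escaping existing entities a second time.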
