Escape to Unicode?

  • ReGenesis0@aol.com

    Escape to Unicode?

    I've begun dealing with PHP's XML functions (puttup!)

    I should say: PHP's DEFAULT XML functions, no extensions. Probably not
    5.0. I don't care...

    The POINT is, they choke on funny characters, even encoded funny
    characters. You need to use the Unicode value (change &ntilde; to &#241;).

    Whatevuh.

    That's why... Now ignore that part, because it will distract you and
    probably cause you to misconstrue the thrust of the question that follows:

    Does PHP have a function that will escape all funny characters in a
    string (encoded, unencoded, both, either...) to their Unicode
    equivalents?

    In a string. Ignore the XML parts of this question.

    (I'm looking at pre-processing the data coming into forms that will
    form the offending XML.)

    -Derik

  • Toby Inkster

    #2
    Re: Escape to Unicode?

    ReGenesis0 wrote:
    > The POINT is, they choke on funny characters, even encoded funny
    > characters. You need to use the Unicode value (change &ntilde; to &#241;).

    There is no such thing as &ntilde; in generic XML. &ntilde; is a purely
    HTML concept. There are only five predefined entities that XML parsers
    are expected to know:

    &amp;
    &lt;
    &gt;
    &quot;
    &apos;
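As a small illustration (a sketch, assuming PHP 5.4 or later for the `ENT_XML1` flag and PHP 7 for the type declarations), `htmlspecialchars()` in XML mode with `ENT_QUOTES` produces exactly these five entities and leaves other valid UTF-8 characters, such as ñ, untouched. The wrapper name `xml_escape()` is hypothetical.

```php
<?php
// Hypothetical wrapper: escape only the five characters XML predefines
// entities for. ENT_XML1 makes ' come out as &apos; instead of &#039;,
// and ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning ''.
function xml_escape(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');
}

foreach (['&', '<', '>', '"', "'"] as $ch) {
    echo $ch, ' => ', xml_escape($ch), PHP_EOL;
}
// & => &amp;
// < => &lt;
// > => &gt;
// " => &quot;
// ' => &apos;
```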

    If PHP understood &ntilde; in generic XML it would be behaving
    *incorrectly*. &ntilde; is undefined. (I'm assuming here that you've not
    written a DTD that defines what &ntilde; means, which seems like a
    reasonable assumption.)

    > Does PHP have a function that will escape all funny characters in a
    > string (encoded, unencoded, both, either...) to their Unicode
    > equivalents?

    It seems you want some function that converts:

    &ntilde; => &#241;
    &euro; => &#8364;

    You might be able to do this using html_entity_decode() to get everything
    in its raw form (e.g. it will convert &euro; to €) and then use a regular
    expression to convert things into numeric character references (e.g. €
    to &#8364;). Such a regular expression can be found in soapergem at gmail
    dot com's 10 May 2006 comment here:

    http://uk2.php.net/manual/en/function.htmlentities.php

    That said, you're better off correcting the root problem -- that &ntilde;
    is not correct XML.
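A minimal sketch of the decode-then-re-encode approach described above, assuming PHP 7.4 or later (the arrow function and type declarations need it, so this is newer than the PHP in the thread). The helper names `utf8_codepoint()` and `to_xml_text()` are hypothetical, and the regular expression is an illustrative one, not the one from the php.net comment. It also adds an `htmlspecialchars()` step, not mentioned above, so that a raw `&` or `<` produced by the decode step does not itself break the XML. The code point is computed by hand so the sketch does not depend on the mbstring or intl extensions.

```php
<?php
// Hypothetical helper: return the code point of one UTF-8 encoded character.
function utf8_codepoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch)); // bytes as 0-indexed ints
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                 | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical helper: turn input holding HTML entities, raw UTF-8, or both
// into XML-safe text that uses only numeric character references.
function to_xml_text(string $input): string
{
    // 1. Decode named (&ntilde;) and numeric (&#241;) entities to raw UTF-8.
    $raw = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 2. Re-escape &, <, >, " and ' so the text stays well-formed XML.
    $escaped = htmlspecialchars($raw, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');

    // 3. Replace each non-ASCII character with &#N; (N = decimal code point).
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        fn(array $m): string => '&#' . utf8_codepoint($m[0]) . ';',
        $escaped
    );
}

echo to_xml_text('Espa&ntilde;a costs 5 &euro; & more'), PHP_EOL;
// Espa&#241;a costs 5 &#8364; &amp; more
```

Because the first step decodes whatever entities are present, the same call should handle input that arrives encoded, unencoded, or mixed, which is what the original question asked for.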

    --
    Toby A Inkster BSc (Hons) ARCS
    Contact Me ~ http://tobyinkster.co.uk/contact


    • ReGenesis0@aol.com

      #3
      Re: Escape to Unicode?

      Toby Inkster wrote:
      > You might be able to do this using html_entity_decode() to get everything
      > in its raw form (e.g. it will convert &euro; to €) and then use a regular
      > expression to convert things into numeric character references (e.g. €
      > to &#8364;). Such a regular expression can be found in soapergem at gmail
      > dot com's 10 May 2006 comment here:
      > http://uk2.php.net/manual/en/function.htmlentities.php
      >
      > That said, you're better off correcting the root problem -- that &ntilde;
      > is not correct XML.

      ...which is precisely what I intend to do: I want to convert such tags
      as they come in as form inputs, before they're sent to become XML files.

      I'm not asking a question sideways of the problem and missing something
      obvious, am I? I hate when that happens...

      -Derik

