Passing extended chars

  • Sean O'Dwyer

    Passing extended chars

    I sometimes need to find a set of records via PHP/SQL with
    non-English/extended characters fed to the query via a hyperlink.

    For example, I have a navigation link on a site that tries to pass the
    phrase "Gesundheit und Schönheit" via HTTP, with the extended character
    (ö) correctly encoded as an HTML entity (&ouml;):

    http://www.domain.com/page.php?indus...Sch&ouml;nheit

    However "Gesundheit und Schönheit" is not passed to my variable, only
    "Gesundheit und Sch". I reckon the ampersand is causing trouble.

    If correctly encoding extended chars as HTML entities isn't working for
    what I want, how can I encode them for storage in XHTML or otherwise get
    the result I want?

    TIA,

    Sean
  • Daniel Tryba

    #2
    Re: Passing extended chars

    Sean O'Dwyer <nospam@spamfree.dud> wrote:
    > phrase "Gesundheit und Schönheit" via http with the extended character
    > (ö) correctly encoded as an HTML entity (&ouml;)
    >
    > http://www.domain.com/page.php?indus...Sch&ouml;nheit
    [snip]
    > If correctly encoding extended chars as HTML entities isn't working for
    > what I want, how can I encode them for storage in XHTML or otherwise get
    > the result I want?

    You are creating a _URL_, so if you want/need to encode a string for
    passing in the _URL_, you need to _URL encode_ it:

    and


    BTW, read the note under example 1 of the last URL.
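    A minimal sketch of that URL-encoding step, written in Python rather than
    PHP, using the standard `urllib.parse` module. `quote_plus()` behaves much
    like PHP's `urlencode()` (spaces become `+`); it encodes the text as UTF-8
    before percent-encoding, which assumes your pages and scripts use UTF-8.
    The parameter name `category` and the host `host.invalid` are
    hypothetical stand-ins, since the original URL is truncated.

```python
from urllib.parse import parse_qs, quote_plus, urlencode, urlsplit

# Hypothetical parameter name and host (the original URL is truncated).
PARAM = "category"
value = "Gesundheit und Schönheit"

# quote_plus() encodes the text as UTF-8, percent-encodes each octet that
# is not URL-safe and turns spaces into "+", much like PHP's urlencode().
encoded = quote_plus(value)
print(encoded)  # Gesundheit+und+Sch%C3%B6nheit

# urlencode() builds a whole query string from a mapping of parameters.
url = "http://host.invalid/page.php?" + urlencode({PARAM: value})
print(url)

# The receiving side decodes the query string and gets the full text back.
decoded = parse_qs(urlsplit(url).query)[PARAM][0]
print(decoded == value)  # True
```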


    • John Dunlop

      #3
      Re: Passing extended chars

      Sean O'Dwyer wrote:
      > http://www.domain.com/page.php?indus...Sch&ouml;nheit

      (Is that an example? If so, please follow RFC2606 and use
      reserved domain names which won't conflict with current or
      future ones; e.g., <http://host.invalid/>.)

      URIs are made up of only a subset of US-ASCII, so after the
      entity &ouml; is replaced, that isn't a URI. You can
      convert that IRI to a URI by converting 'ö' (U+00F6) to its
      UTF-8 encoding (the two octets C3 B6) and then
      percent-encoding each octet. Thus 'Schönheit' becomes
      'Sch%C3%B6nheit'.

      I don't know what happens in the wild, but that's the
      ratified way of encoding characters that are not allowed in
      URIs. See RFC3987 sec. 3.1.
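      The IRI-to-URI mapping described above can be sketched by hand in
      Python. This is an illustration, not a complete implementation of
      RFC 3987 sec. 3.1: ASCII characters are copied through, and every
      other character is turned into its UTF-8 octets, each written as %HH.
      The function name `iri_to_uri` is hypothetical, and the sketch assumes
      its input is already a valid IRI, so it does not escape ASCII
      characters such as spaces.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding its non-ASCII characters.

    Simplified sketch of RFC 3987 sec. 3.1: ASCII characters are copied
    unchanged; every other character is converted to its UTF-8 octets and
    each octet is written as %HH (uppercase hex).
    """
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)  # ASCII: assumed to be valid in the IRI already
        else:
            # e.g. 'ö' (U+00F6) -> UTF-8 bytes C3 B6 -> '%C3%B6'
            parts.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(parts)


print(iri_to_uri("ö"))  # %C3%B6
print(iri_to_uri("http://host.invalid/page.php?category=Schönheit"))
# http://host.invalid/page.php?category=Sch%C3%B6nheit
```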



      Here's how the expert Martin Dürst set up his URI:


      > However "Gesundheit und Schönheit" is not passed to my variable, only
      > "Gesundheit und Sch". I reckon the ampersand is causing trouble.

      &ouml; is simply a way to represent the character LATIN
      SMALL LETTER O WITH DIAERESIS in HTML. The trouble is that
      that character is not allowed unencoded in URIs.
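      Assuming the raw text &ouml; reaches the server unchanged (the
      truncation to "Sch" suggests it does), the symptom can be reproduced
      with Python's `urllib.parse.parse_qs` as a stand-in for the server's
      query-string parsing: a literal & starts a new parameter. The
      parameter name `category` is hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical parameter name; the entity text is sent literally.
raw_query = "category=Gesundheit+und+Sch&ouml;nheit"

# A literal "&" separates query parameters, so the value stops at "Sch";
# the leftover "ouml;nheit" is parsed as separate, empty parameter(s).
parsed = parse_qs(raw_query, keep_blank_values=True)
print(parsed["category"])  # ['Gesundheit und Sch']

# Escaping "&" and ";" as %26 and %3B keeps them inside the value, but then
# the server receives the entity text itself, not the character it names.
escaped = parse_qs("category=Gesundheit+und+Sch%26ouml%3Bnheit")
print(escaped["category"])  # ['Gesundheit und Sch&ouml;nheit']
```

      So the character itself, not its HTML entity, is what needs to be
      percent-encoded in the URL.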

      --
      Jock
