UTF-8 garbage characters

  • lkrubner@geocities.com

    UTF-8 garbage characters

    Pierre Goiffon Oct 6 2004, 4:29 am show options
    Newsgroups: comp.infosystems.www.authoring.html
    >> The problem with charset UTF-8 on pages with forms for e.g.
    >> guestbooks, formmail and bloggs is that writing in a non-english
    >> language can give garbage characters from the letters that is not
    >> represented in the english language. That's because what is writed in
    >> the text box don't get encoded, as text done with HTML editors does.
    >
    >I really can't understand your post. A server that sends a form to a client
    >with the appropriate charset headers should get in return all the users
    >input encoded in that charset. If the form is sent with a UTF-8 header, you
    >should get all the characters encoded in UTF-8. And so user could input any
    >character included in Unicode.

    I saw this old post and decided that I did not understand it.

    Suppose I have a form on a webpage and that form has a UTF-8 charset
    header. Suppose there is also a textarea in that form, and a submit
    button. Suppose I write something in Microsoft Word and use lots of
    strange characters, then I copy and paste it into the textarea and hit
    the submit button. At the other end, receiving the form, is a PHP
    script which takes that text and makes it a webpage, with a UTF-8
    charset header.

    If I understand what Pierre Goiffon is saying, then it sounds as if no
    garbage characters will appear on that page, no matter how many strange
    characters I used in the Word document. It sounds to me as if he is
    saying that everything will magically get transformed into a character
    that makes sense in UTF-8.

    Am I missing something? Surely that is not how it works?

  • Tim

    #2
    Re: UTF-8 garbage characters

    On 27 May 2005 12:19:25 -0700,
    lkrubner@geocities.com posted:

    > Suppose I have a form on a webpage and that form has a UTF-8 charset
    > header. Suppose there is also a textarea in that form, and a submit
    > button. Suppose I write something in Microsoft Word and use lots of
    > strange characters, then I copy and paste it into the textarea and hit
    > the submit button. At the other end, receiving the form, is a PHP
    > script which takes that text and makes it a webpage, with a UTF-8
    > charset header.
    >
    > If I understand what Pierre Goiffon is saying, then it sounds as if no
    > garbage characters will appear on that page, no matter how many strange
    > characters I used in the Word document. It sounds to me as if he is
    > saying that everything will magically get transformed into a character
    > that makes sense in UTF-8.

    That is what the user's system *should* do: perform any necessary
    conversion during the cut and paste, then send the data properly encoded.
    The recipient then handles it however it chooses.

    However, *some* computers do not do that. If you copy data from an
    application using Windows-1252 encoding into one using UTF-8, the
    cut-and-paste function doesn't translate it.

    It should, because it is the only intermediary, and only it (that
    computer) knows the two different encodings in use.
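
The mismatch described above can be reproduced in a few lines. This is a minimal Python sketch, not anything posted in the thread: the sample string and the variable names are assumptions chosen for illustration. It shows what happens when bytes written in one encoding are read back as the other.

```python
# Sample text with a curly apostrophe (U+2019) of the kind Word inserts,
# plus an accented letter. The string itself is an illustrative assumption.
text = "It\u2019s a caf\u00e9"

# Correct path: encode as UTF-8, decode as UTF-8. Nothing is lost.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Mismatch 1: UTF-8 bytes read as if they were Windows-1252.
# Each multi-byte character turns into two or three unrelated characters.
utf8_read_as_cp1252 = utf8_bytes.decode("cp1252")

# Mismatch 2: Windows-1252 bytes read as if they were UTF-8.
# The bytes are not valid UTF-8, so they become U+FFFD replacement characters.
cp1252_bytes = text.encode("cp1252")
cp1252_read_as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

print(round_trip)
print(utf8_read_as_cp1252)
print(cp1252_read_as_utf8)
```

The second line printed is the familiar "Itâ€™s a cafÃ©" style of garbage. In the third, the apostrophe and the accented letter are lost to replacement characters.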

    --
    If you insist on e-mailing me, use the reply-to address (it's real but
    temporary). But please reply to the group, like you're supposed to.

    This message was sent without a virus, please delete some files yourself.


    • lkrubner@geocities.com

      #3
      Re: UTF-8 garbage characters

      Thanks. That's good to know.


      • Shmuel (Seymour J.) Metz

        #4
        Re: UTF-8 garbage characters

        In <1117221565.722671.148530@g44g2000cwa.googlegroups.com>, on
        05/27/2005
        at 12:19 PM, lkrubner@geocities.com said:

        >If I understand what Pierre Goiffon is saying, then it sounds as if
        >no garbage characters will appear on that page,

        No. He is saying that garbage characters will only appear if you input
        garbage characters. He said nothing about whether a cut-and-paste from
        m$ word works correctly. If the result of the paste is to place the
        correct characters in the form, then you will not get garbage
        characters. If the effect of the paste is to put garbage characters in
        the form, then the problem is with Microsoft, not with the use of
        UTF-8.
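
A rough Python sketch of that point follows. It is an illustration only: the field name `comment` and the two helper functions are hypothetical, and the standard-library `urllib.parse` functions stand in for what a browser and a server-side script would do with a UTF-8 form. Whatever characters are in the form come out the other end unchanged, whether they were correct or already garbled by the paste.

```python
from urllib.parse import parse_qs, urlencode


def submit_form(comment: str) -> str:
    """Build the body a UTF-8 form submission would carry (hypothetical helper)."""
    return urlencode({"comment": comment}, encoding="utf-8")


def receive_form(body: str) -> str:
    """Decode the body as UTF-8, as the receiving script is assumed to do."""
    return parse_qs(body, encoding="utf-8")["comment"][0]


# Case 1: the paste put the correct characters into the form.
correct = "It\u2019s a caf\u00e9"
# Case 2: the paste had already mangled them before the form saw them.
already_garbled = "It\u00e2\u20ac\u2122s a caf\u00c3\u00a9"

for pasted in (correct, already_garbled):
    received = receive_form(submit_form(pasted))
    print(repr(pasted), "->", repr(received), received == pasted)
```

Both cases round-trip exactly. UTF-8 carries the garbled input as faithfully as the correct one, so any damage has to come from an earlier step.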

        --
        Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

        Unsolicited bulk E-mail subject to legal action. I reserve the
        right to publicly post or ridicule any abusive E-mail. Reply to
        domain Patriot dot net user shmuel+news to contact me. Do not
        reply to spamtrap@library.lspace.org
