Does PHP send out corrupted strings? (charset issue)

  • Gulzor
    New Member
    • Jul 2008
    • 27

    Hi,

    I fetch web pages using Zend_Http (I send out POST data, fetch the results, and so on)
    I have no problem with that.

    I ran mb_detect_encoding() on the returned HTML, and the function says it is UTF-8 encoded.

    Parts of the returned HTML must be sent back to the server. I store these parts in PHP strings.

    The problem is that when I send these strings back, all special characters (accents) come out as garbage!

    ---

    (-) The PHP script itself is saved in UTF-8.

    (-) I tried applying utf8_encode() to the returned HTML before storing the data in PHP strings.

    Do you have any tips? Is there something trivial that I am missing?

    Thank you
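
One thing worth ruling out here: if the HTML really is UTF-8 already, running it through utf8_encode() encodes it a second time, which is a common source of this kind of garbage. The sketch below is only an illustration with an assumed sample string; it uses mb_convert_encoding() from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode().

```php
<?php
// Assumed sample: "é" as the two UTF-8 bytes C3 A9.
$utf8 = "\xC3\xA9";

// Treating those bytes as ISO-8859-1 and converting again encodes each byte
// separately. This is the same conversion that utf8_encode() performs.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($double), "\n"; // c383c2a9
```

The four bytes in the second line display as "Ã©" when read as UTF-8, which matches the typical look of double-encoded accents.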
  • coolsti
    Contributor
    • Mar 2008
    • 310

    #2
    I don't know if this helps you at all, but I had an issue with characters when I migrated an application from one server to another. On the new server I suddenly had trouble getting Danish-specific letters to print correctly.

    A tip on this forum led me to include this line in the header of my HTML output:

    Code:
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    which tells the browser which character set to use. Apparently the older server was configured in such a way that this line was not necessary.

    Again, maybe this is 100 miles away from your problem!

    • Gulzor
      New Member
      • Jul 2008
      • 27

      #3
      I tried this when I used DOMDocument::loadHTML to query the HTML for the data, but the problem remains the same:

      When I send the data back to the server (through HTTP POST), it seems that the data is corrupted.

      Note that my script does not print anything to a browser; it runs on the command line.

      • Atli
        Recognized Expert Expert
        • Nov 2006
        • 5062

        #4
        If you are sending this via an HTTP request, you may have to specify the charset in the Content-Type header. Like:
        Code:
        Content-Type: text/html; charset=utf-8
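
As a rough sketch of where such a header goes on an outgoing POST, the snippet below builds the options for PHP's built-in HTTP stream wrapper. It assumes the data is sent as a form-encoded body, so the media type is application/x-www-form-urlencoded rather than text/html; build_utf8_post_options() and the sample field are made up for illustration and are not part of Zend_Http.

```php
<?php
// Hypothetical helper (not part of Zend_Http): builds stream-context options
// for a form-encoded POST whose Content-Type header declares UTF-8.
function build_utf8_post_options(array $fields): array
{
    // http_build_query() percent-encodes the raw bytes of each value.
    $body = http_build_query($fields);

    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded; charset=utf-8\r\n"
                       . 'Content-Length: ' . strlen($body) . "\r\n",
            'content' => $body,
        ],
    ];
}

// Assumed sample field: "café" with the é stored as UTF-8 bytes C3 A9.
$options = build_utf8_post_options(['name' => "caf\xC3\xA9"]);
echo $options['http']['content'], "\n"; // name=caf%C3%A9
```

The array can be passed to stream_context_create() and the resulting context to file_get_contents() to send the request. The header only labels the body; the bytes themselves still have to be valid UTF-8.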

        • Gulzor
          New Member
          • Jul 2008
          • 27

          #5
          Of course! Will do. Thank you.

          • Gulzor
            New Member
            • Jul 2008
            • 27

            #6
            Didn't work. I really don't know what I can do...

            • pbmods
              Recognized Expert Expert
              • Apr 2007
              • 5821

              #7
              Heya, Gulzor.

              mb_detect_encoding() is very, very timid. It will almost always say 'UTF-8', even when the string is not actually UTF-8.

              Try this:

              [code=php]
              if( mb_detect_encoding($str . 'a', 'ISO-8859-1,UTF-8') != 'UTF-8' )
              {
                  // utf8_encode() returns the converted string; it does not modify $str in place.
                  $str = utf8_encode($str);
              }
              [/code]

              For more info on why this works, check out my blog:
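
A different way to make the same decision, swapped in here as an alternative rather than taken from the post above, is to validate the bytes with mb_check_encoding() instead of guessing with mb_detect_encoding(). This is only a sketch: to_utf8() is a made-up helper, and it assumes that anything which is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper: returns $str as UTF-8, converting only when the bytes
// are not already valid UTF-8. Assumes the only other possibility is ISO-8859-1.
function to_utf8(string $str): string
{
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(to_utf8("\xE9")), "\n";     // c3a9 (converted from ISO-8859-1)
echo bin2hex(to_utf8("\xC3\xA9")), "\n"; // c3a9 (already UTF-8, unchanged)
```

Because a string that is already UTF-8 passes through unchanged, calling the helper twice cannot double-encode it.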

              • Gulzor
                New Member
                • Jul 2008
                • 27

                #8
                I tried but it still doesn't work.

                mb_detect_encoding($str.'a', 'ISO-8859-1,UTF-8')

                does not return the same value as

                mb_detect_encoding($str.'a', 'UTF-8,ISO-8859-1')

                When I output debug messages, the strings that I send back to the server and the strings returned from the server look the same...

                Aaaargh !!! it is getting on my nerves
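
When two strings look the same in debug output, comparing their raw bytes can show whether they really are the same, since a terminal may render different byte sequences as similar characters. A minimal sketch, with a made-up dump_bytes() helper and assumed sample values:

```php
<?php
// Hypothetical helper: formats a label plus the raw bytes of a string in hex,
// so two strings can be compared independently of how the terminal renders them.
function dump_bytes(string $label, string $value): string
{
    return sprintf('%s: %s (%d bytes)', $label, bin2hex($value), strlen($value));
}

// Assumed sample values: the same word in UTF-8 and in ISO-8859-1.
$sent     = "caf\xC3\xA9";
$received = "caf\xE9";

echo dump_bytes('sent', $sent), "\n";         // sent: 636166c3a9 (5 bytes)
echo dump_bytes('received', $received), "\n"; // received: 636166e9 (4 bytes)
var_dump($sent === $received);                // bool(false)
```

In the real script, $sent and $received would be the string stored from the fetched HTML and the value that actually ends up in the POST body.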
