Multibyte character?

  • vijay

    When I do a readfile or file_get_contents on a web page, the string I
    get back is corrupted for non-ASCII characters. For instance, when I do
    a readfile("http://abc/def"), "São Paulo" becomes "SÃ£o Paulo" on the
    calling page, although http://abc/def itself shows "São Paulo"
    correctly. Any idea how to fix this problem?

    To explain it a bit more: I have two pages, http://abc/def and
    http://abc/ghi.php, and I am trying to read the contents of
    http://abc/def from http://abc/ghi.php.
  • Willem Bogaerts

    #2
    Re: Multibyte character?

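    What you get is exactly right: readfile passes the bytes through
    unchanged. From your example, it appears that your text is UTF-8
    encoded and that the second page is (probably) Latin-1 (ISO-8859-1)
    encoded, so each two-byte UTF-8 character is displayed as two Latin-1
    characters. A plain "readfile" that ignores encodings is not enough to
    display "human" text.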
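    A minimal PHP sketch of what happens to the bytes, assuming (as above)
    that the source page is UTF-8 and the displaying page is Latin-1
    (ISO-8859-1). It needs the mbstring extension; the variable names are
    illustrative only.

```php
<?php
// "São Paulo" as UTF-8 bytes: "ã" is the two-byte sequence 0xC3 0xA3.
$utf8 = "S\xC3\xA3o Paulo";

// Reading those bytes as if they were Latin-1 turns each byte into its own
// character: 0xC3 -> "Ã", 0xA3 -> "£". Re-encoding that misreading as UTF-8
// makes it visible in a UTF-8 terminal.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n"; // SÃ£o Paulo

// The fix goes the other way: convert the UTF-8 text into the encoding of
// the page that displays it, so "ã" becomes the single Latin-1 byte 0xE3.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n"; // 53e36f205061756c6f
```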

    If you use curl, you can read the Content-Type response header, whose
    charset parameter names the encoding used, and use mbstring
    (mb_convert_encoding) to convert the text. Or, if it is always the same
    page you read, you know the encoding beforehand.
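    A minimal sketch of the curl-plus-mbstring approach, assuming the curl
    and mbstring extensions are loaded and that the calling page is
    ISO-8859-1. The function names charsetFromContentType and
    fetchConverted are hypothetical, and falling back to ISO-8859-1 when the
    server sends no charset is an assumption, not something the server
    guarantees.

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type value such as
// "text/html; charset=UTF-8". Returns $default when no charset is present.
function charsetFromContentType(?string $contentType, string $default = 'ISO-8859-1'): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return $default; // assumption: no declared charset means ISO-8859-1
}

// Hypothetical helper: fetch $url with curl and convert the body from the
// charset the server declared into $targetEncoding (the calling page's).
function fetchConverted(string $url, string $targetEncoding = 'ISO-8859-1'): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body, don't print it
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException("Fetching $url failed: " . curl_error($ch));
    }
    // The response's Content-Type header, e.g. "text/html; charset=UTF-8",
    // or null if the server sent none. The handle is freed with $ch (PHP 8+).
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    $source = charsetFromContentType(is_string($contentType) ? $contentType : null);

    return mb_convert_encoding($body, $targetEncoding, $source);
}
```

    In ghi.php this would be used as echo fetchConverted('http://abc/def');.
    If it is always the same page, the header lookup can be skipped:
    echo mb_convert_encoding(file_get_contents('http://abc/def'),
    'ISO-8859-1', 'UTF-8');. In PHP 8, mb_convert_encoding throws a
    ValueError if the declared charset is one mbstring does not know.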

    Best regards.
    --
    Willem Bogaerts

    Application smith
    Kratz B.V.
