UTF-8 encoding decoding not working with Danish characters

  • LarsM

    UTF-8 encoding decoding not working with Danish characters

    Hi all,
    I am new to XML, but I use it for an RSS feed.

    I have one problem, which I have really been struggling with.

    My XML document is generated from the contents of a MySQL database. It is
    UTF-8 encoded.

    However, the Danish special characters appear wrong.

    For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".

    See an example here:
    http://netm.dk/blog/rss/index_rss2.xml

    I thought that it could be because the encoding was not set in the document,
    so I added this:
    <?xml version="1.0" encoding="UTF-8" ?>
    However, that did not make any difference, as can be seen here:
    http://netm.dk/blog/rss/test_rss2.xml

    The text decodes correctly on my regular web pages on http://netm.dk/

    What am I doing wrong?

    Regards,
    Lars





  • Malte

    #2
    Re: UTF-8 encoding decoding not working with Danish characters

    LarsM wrote:
    > Hi all,
    > I am new to XML, but I use it for an RSS feed.
    >
    > I have one problem, which I have really been struggling with.
    >
    > My XML document is generated from the contents of a MySQL database. It is
    > UTF-8 encoded.
    >
    > However, the Danish special characters appear wrong.
    >
    > For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".
    >
    > See an example here:
    > http://netm.dk/blog/rss/index_rss2.xml
    >
    > I thought that it could be because the encoding was not set in the document,
    > so I added this:
    > <?xml version="1.0" encoding="UTF-8" ?>
    > However, that did not make any difference, as can be seen here:
    > http://netm.dk/blog/rss/test_rss2.xml
    >
    > The text decodes correctly on my regular web pages on http://netm.dk/
    >
    > What am I doing wrong?
    >
    > Regards,
    > Lars
    > www.netm.dk

    This is not limited to XML. I am trying to send mail with JavaMail. When I
    do this from a Windows PC, the Danish characters are garbled; when I run the
    exact same program on Linux, the characters get through fine.

    I hope we get rid of those ¤%@£¥ darned NLS issues sometime in my lifetime,
    but I doubt it.


    • Jürgen Kahrs

      #3
      Re: UTF-8 encoding decoding not working with Danish characters

      LarsM wrote:
      > My XML document is generated from the contents of a MySQL database. It is
      > UTF-8 encoded.

      You have to take care that *every* tool in the toolchain
      knows how to handle UTF-8 correctly. Could you give us
      a list of the tools involved?
      > The text decodes correctly on my regular web pages on http://netm.dk/

      Your web page looks OK to me.
      I bet the problem is in the database or shortly after it.


      • LarsM

        #4
        Re: UTF-8 encoding decoding not working with Danish characters


        "Jürgen Kahrs" wrote:
        >
        > Maybe you give us a list of tools involved ?

        Thanks Jürgen,
        The RSS feed is generated by the same blog application ("Boastmachine")
        that I use to generate the web pages. As far as I know, it accesses the
        database in the same way as for the "real" pages.
        But I will check up on that.
        -Lars





        • Jürgen Kahrs

          #5
          Re: UTF-8 encoding decoding not working with Danish characters

          LarsM wrote:
          > The RSS feed is being generated by the same Blog application
          > ("Boastmachine" ), which I use to generate the Web pages. As far as I know it
          > accesses the database in the same way as for the "real" pages.

          So the problem should be in the Blog application.
          > But I will check up on that.

          Good idea. Maybe there is simply a bug in the RSS
          extraction mechanism.


          • Malte

            #6
            Re: UTF-8 encoding decoding not working with Danish characters

            LarsM wrote:
            > Hi all,
            > I am new to XML, but I use it for an RSS feed.
            >
            > I have one problem, which I have really been struggling with.
            >
            > My XML document is generated from the contents of a MySQL database. It is
            > UTF-8 encoded.
            >
            > However, the Danish special characters appear wrong.
            >
            > For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".
            >
            > See an example here:
            > http://netm.dk/blog/rss/index_rss2.xml
            >
            > I thought that it could be because the encoding was not set in the document,
            > so I added this:
            > <?xml version="1.0" encoding="UTF-8" ?>
            > However, that did not make any difference, as can be seen here:
            > http://netm.dk/blog/rss/test_rss2.xml
            >
            > The text decodes correctly on my regular web pages on http://netm.dk/
            >
            > What am I doing wrong?
            >
            > Regards,
            > Lars
            > www.netm.dk

            Pointing my (Linux) Firefox browser at your web site, with the
            encoding set to UTF-8, I see your page fine. Setting the encoding to
            ISO-8859-1 generates the "Ã¥" stuff. One never knows how the users'
            browsers are set up.
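
The effect described above can be reproduced in a few lines. This is only a sketch in Python (the thread names no language for this), and the helper name `misread_as_latin1` is made up for the illustration. It encodes a letter as UTF-8 and then decodes the same bytes as ISO-8859-1, which is roughly what a browser set to ISO-8859-1 does with a UTF-8 page.

```python
# Illustration only: how UTF-8 bytes look when they are read as ISO-8859-1.

def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


for letter in "åøæ":
    raw = letter.encode("utf-8")  # two bytes for each of these letters
    # Show the letter, its UTF-8 bytes in hex, and the misread result.
    print(letter, raw.hex(), repr(misread_as_latin1(letter)))
```

Each Danish letter takes two bytes in UTF-8, and ISO-8859-1 maps every byte to one character, so one letter turns into two.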

            Look at this page: www.vietbao.com

            Great-looking, authentic Vietnamese text with UTF-8. Obviously it does
            not look good with ISO-8859-1 (the Vietnamese characters are not part of
            that character set).


            • LarsM

              #7
              Re: UTF-8 encoding decoding not working with Danish characters


              "Malte" wrote:
              > Pointing my (Linux) Firefox browser at your web site, and having encoding
              > set to utf-8, I see your page fine. Setting encoding to ISO-8859-1
              > generates the "Ã¥" stuff. One never knows how the users' browsers are set up.

              Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?

              -Lars



              • Nick Kew

                #8
                Re: UTF-8 encoding decoding not working with Danish characters

                LarsM wrote:
                > Hi all,
                > I am new to XML, but I use it for an RSS feed.
                >
                > I have one problem, which I have really been struggling with.
                >
                > My XML document is generated from the contents of a MySQL database. It is
                > UTF-8 encoded.

                No. It's ASCII encoded before an agent even looks at the document itself.
                See RFC 3023 for details.
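
A rough sketch of the RFC 3023 rule referred to here, written in Python purely for illustration. The function name `effective_xml_charset` is hypothetical, and the sketch is simplified (no BOM detection, no UTF-16 handling). It assumes the relevant case is an XML file served as text/xml without a charset parameter, in which case the charset defaults to US-ASCII regardless of the XML declaration.

```python
from typing import Optional


def effective_xml_charset(content_type: str,
                          declared_encoding: Optional[str] = None) -> str:
    """Guess the charset an RFC 3023-aware client assumes for an XML response.

    content_type      -- value of the HTTP Content-Type header
    declared_encoding -- encoding from the <?xml ... ?> declaration, if any
    Simplified on purpose: BOM detection and UTF-16 are ignored.
    """
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()

    # An explicit charset parameter on the header is authoritative.
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()

    # text/xml without a charset parameter defaults to US-ASCII,
    # whatever the XML declaration inside the document says.
    if media_type == "text/xml":
        return "us-ascii"

    # For application/xml the XML rules apply: use the declaration,
    # and fall back to UTF-8 when there is none.
    return (declared_encoding or "utf-8").lower()


print(effective_xml_charset("text/xml", "UTF-8"))
print(effective_xml_charset("text/xml; charset=UTF-8"))
print(effective_xml_charset("application/xml", "UTF-8"))
```

Under this reading, the encoding declaration in the document only matters when the server sends application/xml, or when the header carries an explicit charset.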

                The good news is that the fix is a single line in httpd.conf.

                --
                Nick Kew


                • Stanimir Stamenkov

                  #9
                  Re: UTF-8 encoding decoding not working with Danish characters

                  /LarsM/:
                  > My XML document is generated from the contents of a MySQL database. It is
                  > UTF-8 encoded.
                  >
                  > However, the Danish special characters appear wrong.
                  >
                  > For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".
                  >
                  > See an example here:
                  > http://netm.dk/blog/rss/index_rss2.xml

                  Sounds like a MySQL configuration issue to me.

                  --
                  Stanimir


                  • LarsM

                    #10
                    Re: UTF-8 encoding decoding not working with Danish characters


                    "Nick Kew" wrote:
                    >
                    > The good news is that the fix is a single line in httpd.conf.

                    I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
                    can I make the configuration change, then?

                    -Lars



                    • Henri Sivonen

                      #11
                      Re: UTF-8 encoding decoding not working with Danish characters

                      In article <420b5294$0$48698$edfadb0f@dread15.news.tele.dk>,
                      "LarsM" <mailTAKETHISAWAY@TAKETHISAWAYnetm.dk> wrote:
                      > I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
                      > can I make the configuration change, then?

                      In a .htaccess file if your host allows it. Failing that, you could ask
                      your host to map .xml to application/xml. Failing that, I recommend
                      switching to another host.
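
As a rough sketch of what such a .htaccess entry might look like: this assumes the host runs Apache with mod_mime and allows FileInfo overrides (neither is confirmed in this thread), and that the feed files end in .xml.

```apache
# .htaccess sketch; only effective if the host permits these overrides
# Serve .xml files as application/xml
AddType application/xml .xml
# Ask Apache to label .xml files as UTF-8 in the Content-Type header
AddCharset UTF-8 .xml
```

The first directive is the mapping suggested above; the second should add charset=UTF-8 to the header so that clients do not have to guess. If the host does not honor .htaccess files, these lines may have no effect or may be rejected.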

                      --
                      Henri Sivonen
                      hsivonen@iki.fi


                      • Malte

                        #12
                        Re: UTF-8 encoding decoding not working with Danish characters

                        LarsM wrote:
                        > "Malte" wrote:
                        >
                        >> Pointing my (Linux) Firefox browser at your web site, and having encoding
                        >> set to utf-8, I see your page fine. Setting encoding to ISO-8859-1
                        >> generates the "Ã¥" stuff. One never knows how the users' browsers are set up.
                        >
                        > Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?
                        >
                        > -Lars


                        That gives me the funny looking chars as well, regardless of encoding
                        settings in the browser.
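
One possible reading of this, offered as an assumption rather than a diagnosis: if the garbage shows up regardless of the browser setting, the feed's bytes may have been UTF-8 encoded twice somewhere in the toolchain. The Python sketch below checks a byte string for that pattern and strips one layer. The function name `undo_double_utf8` is hypothetical, and the sketch assumes the intermediate misreading was ISO-8859-1 (not Windows-1252).

```python
def undo_double_utf8(data: bytes) -> bytes:
    """Strip one layer of UTF-8-over-ISO-8859-1 encoding if it is present.

    Returns the input unchanged when it does not look double-encoded.
    """
    try:
        text = data.decode("utf-8")            # outer layer
        repaired = text.encode("iso-8859-1")   # back to the inner bytes
        repaired.decode("utf-8")               # inner bytes must be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return data
    return repaired


# Build a double-encoded sample: UTF-8 bytes misread as ISO-8859-1,
# then encoded as UTF-8 a second time.
good = "å".encode("utf-8")
double = good.decode("iso-8859-1").encode("utf-8")

print(double.hex(), "->", undo_double_utf8(double).hex())
print(good.hex(), "->", undo_double_utf8(good).hex())
```

The check is heuristic: text that legitimately contains a sequence like "Ã¥" would be altered as well, so it is better suited to inspecting a sample than to blind bulk repair.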

                        BTW, I solved my JavaMail NLS problem. I had Tomcat start with the
                        -DEncoding parameter set.


                        • LarsM

                          #13
                          Re: UTF-8 encoding decoding not working with Danish characters


                          "Henri Sivonen" wrote:
                          > In a .htaccess file if your host allows it. Failing that, you could ask
                          > your host to map .xml to application/xml. Failing that, I recommend
                          > switching to another host.

                          I've been reading through the RFC, but please enlighten me. What would the
                          syntax be for setting this? Please be as specific as possible.

                          Regards,
                          Lars



                          • Rob van der Putten

                            #14
                            Re: UTF-8 encoding decoding not working with Danish characters

                            Hi there


                            Henri Sivonen wrote:
                            > In a .htaccess file if your host allows it. Failing that, you could ask
                            > your host to map .xml to application/xml. Failing that, I recommend
                            > switching to another host.

                            lynx -head http://netm.dk/blog/rss/index_rss2.xml
                            HTTP/1.0 200 OK
                            Date: Thu, 10 Feb 2005 14:46:31 GMT
                            Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.9
                            Last-Modified: Tue, 08 Feb 2005 08:03:13 GMT
                            ETag: "bd67c1-1141-42087241"
                            Accept-Ranges: bytes
                            Content-Length: 4417
                            Content-Type: application/xml
                            Age: 704
                            X-Cache: HIT from www.sput.nl
                            X-Cache-Lookup: HIT from www.sput.nl:8080
                            Proxy-Connection: close

                            lynx -head http://netm.dk/blog/rss/test_rss2.xml
                            HTTP/1.0 200 OK
                            Date: Thu, 10 Feb 2005 14:48:18 GMT
                            Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.9
                            Last-Modified: Mon, 07 Feb 2005 18:45:44 GMT
                            ETag: "11e2dc0-1022-4207b758"
                            Accept-Ranges: bytes
                            Content-Length: 4130
                            Content-Type: application/xml
                            Age: 624
                            X-Cache: HIT from www.sput.nl
                            X-Cache-Lookup: HIT from www.sput.nl:8080
                            Proxy-Connection: close

                            This one is on my box:
                            lynx -head http://www.sput.nl/software/leased-line/leased-line.xml
                            HTTP/1.1 200 OK
                            Date: Thu, 10 Feb 2005 15:00:02 GMT
                            Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2
                            Last-Modified: Sun, 30 Jan 2005 07:44:42 GMT
                            ETag: "2787c-4840-41fc906a"
                            Accept-Ranges: bytes
                            Content-Length: 18496
                            Connection: close
                            Content-Type: text/xml; charset=UTF-8

                            However, my browser does consider all these files to be UTF-8 XML.


                            Regards,
                            Rob
                            --
                            +----------------------------------------------------------------------+
                            | The EU constitution will turn the EU into a SU |
                            | Vote against the EU constitution in the referendum |
                            +----------------------------------------------------------------------+


                            • LarsM

                              #15
                              Re: UTF-8 encoding decoding not working with Danish characters

                              Sorry, but exactly how do I make the setting that Nick Kew and Henri
                              Sivonen suggested?

                              I have been reading through the RFC, but it is not completely clear to me...

                              Cheers,
                              Lars


