Chinese encoded in UTF-8 and XML

  • Knackeback

    Chinese encoded in UTF-8 and XML

    Hi, I wrote an XML file with GNU Emacs 21.2.2 and with
    Chinese character content encoded in UTF-8.
    I wrote something like:

    <?xml version="1.0" encoding="UTF-8"?>
    <test>
    <chinese>¼»</chinese>
    <chinese>ÄÎ</chinese>
    </test>

    and then I used "C-x RET f" and chose utf-8.
    Then I typed "C-x C-s" to save my file.
    I hope this is the right way in Emacs to store the content
    as UTF-8 encoded text?!
    Now I tried to parse the file with xmllint. xmllint is a
    small XML parser program that comes with libxml2.
    The parser complains that the second "chinese" line is not proper
    UTF-8.

    ==>

    uhu:4: error: Input is not proper UTF-8, indicate encoding !
    <chinese>ÄÎ</chinese>
    ^
    uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F
    <chinese>ÄÎ</chinese>
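In that output, uhu:4 names the file and the line, and the Bytes: list appears to start at the offending byte. The same check can be made without an XML parser by decoding the raw file strictly as UTF-8 and turning the failing offset into a line number. A minimal Python sketch under that assumption; first_utf8_error and the sample document are hypothetical, with U+4E2D standing in as a correctly encoded character on line 3 and the two reported bytes on line 4.

```python
def first_utf8_error(data: bytes):
    """Return (line, offset, reason) for the first invalid UTF-8 sequence,
    or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as err:
        # Count newlines before the bad byte to get a 1-based line number,
        # which is how xmllint arrives at "uhu:4".
        line = data.count(b"\n", 0, err.start) + 1
        return line, err.start, err.reason
    return None


# Hypothetical sample: line 3 holds a correctly encoded character,
# line 4 holds the two bytes xmllint complained about.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    + b"<test>\n"
    + "<chinese>\u4e2d</chinese>\n".encode("utf-8")
    + b"<chinese>\xc4\xce</chinese>\n"
    + b"</test>\n"
)

result = first_utf8_error(sample)
print(result)  # (4, 78, 'invalid continuation byte')
```

To check a saved file, pass it the file's raw bytes, e.g. first_utf8_error(open("uhu", "rb").read()).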

    Interestingly, the parser only complains about the second
    "chinese" line.
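Why those particular bytes fail follows from how UTF-8 marks sequence lengths (RFC 3629): the high bits of a lead byte say how many continuation bytes of the form 10xxxxxx must follow. 0xC4 is 11000100, a two-byte lead, but the next byte 0xCE is 11001110, which is another lead byte rather than a continuation, so the sequence is malformed. 0x3C 0x2F are simply the "</" of the closing tag. The Python sketch below spells out that rule; the helper names are made up for illustration, and real decoders apply extra checks that it skips.

```python
def utf8_role(b: int) -> str:
    """Classify one byte by its high-order bits (UTF-8 bit patterns)."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx: a complete character
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: only valid after a lead byte
    if b < 0xE0:
        return "lead2"         # 110xxxxx: expects 1 continuation byte
    if b < 0xF0:
        return "lead3"         # 1110xxxx: expects 2 continuation bytes
    if b < 0xF8:
        return "lead4"         # 11110xxx: expects 3 continuation bytes
    return "invalid"


def first_bad_offset(data: bytes):
    """Offset of the first malformed sequence, or None.

    Simplified on purpose: checks bit patterns only, so it does not reject
    overlong forms (e.g. 0xC0 0x80), surrogates or code points > U+10FFFF.
    """
    expected = {"lead2": 1, "lead3": 2, "lead4": 3}
    i = 0
    while i < len(data):
        role = utf8_role(data[i])
        if role == "ascii":
            i += 1
            continue
        if role not in expected:  # stray continuation or invalid byte
            return i
        n = expected[role]
        tail = data[i + 1:i + 1 + n]
        if len(tail) < n or any(utf8_role(t) != "continuation" for t in tail):
            return i
        i += 1 + n
    return None


line4 = b"<chinese>\xc4\xce</chinese>"
print(first_bad_offset(line4))           # 9, the position of 0xC4
print(utf8_role(0xC4), utf8_role(0xCE))  # lead2 lead2
```

By the same rule 0xBC (10111100) is a continuation byte and cannot start a character, so bytes that display as ¼» in this archive would also be rejected; the archived rendering may therefore not match exactly what was in the file when xmllint ran.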

    I'm eager to see an explanation!



  • Andreas Prilop

    #2
    Re: chinese encoded in UTF-8 and XML

    Knackeback <knackeback@ran dspringer.de> wrote:
    > Content-Type: text/plain; charset=big5
    >
    > Hi, I wrote an XML file with GNU Emacs 21.2.2 and with
    > Chinese character content encoded in UTF-8.
    > [...]
    > I hope this is the right way in Emacs to store the content
    > as UTF-8 encoded text?!

    Probably not.

    > uhu:4: error: Input is not proper UTF-8, indicate encoding !
    > uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F

    It seems your text was Big5-encoded, not UTF-8-encoded.
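This diagnosis can be checked against the bytes themselves: the quoted header says charset=big5, and in Big5 each Chinese character takes two bytes, whereas the garbled ÄÎ and ¼» above are what those same bytes look like when read one byte per character as Latin-1. A small Python sketch using the standard latin-1, big5 and utf-8 codecs; views is a hypothetical helper, and which characters the pairs actually are is left to the reader's terminal.

```python
def views(raw: bytes):
    """Return (Latin-1 reading, Big5 reading, UTF-8 bytes of the Big5 reading)."""
    as_latin1 = raw.decode("latin-1")  # one character per byte: the garbled form
    as_big5 = raw.decode("big5")       # two bytes -> one Chinese character
    return as_latin1, as_big5, as_big5.encode("utf-8")


# The pair from the first line as displayed, then the pair xmllint reported.
for raw in (b"\xbc\xbb", b"\xc4\xce"):
    latin1, big5, utf8 = views(raw)
    # In UTF-8 the same character needs three bytes instead of two.
    print(latin1, len(big5), utf8.hex())
```

Two bytes per character in the file, where UTF-8 would need three, is a quick sign that a save went out as Big5.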


    • Micah Cowan

      #3
      Re: chinese encoded in UTF-8 and XML

      Knackeback <knackeback@ran dspringer.de> writes:
      > Hi, I wrote an XML file with GNU Emacs 21.2.2 and with
      > Chinese character content encoded in UTF-8.
      > I wrote something like:
      >
      > <?xml version="1.0" encoding="UTF-8"?>
      > <test>
      > <chinese>¼»</chinese>
      > <chinese>ÄÎ</chinese>
      > </test>
      >
      > and then I used "C-x RET f" and chose utf-8.
      > Then I typed "C-x C-s" to save my file.
      > I hope this is the right way in Emacs to store the content
      > as UTF-8 encoded text?!
      > Now I tried to parse the file with xmllint. xmllint is a
      > small XML parser program that comes with libxml2.
      > The parser complains that the second "chinese" line is not proper
      > UTF-8.
      >
      > ==>

      FWICT, Emacs doesn't have a Chinese input method that supports
      Unicode output... :-( ...I've had similar trouble with
      Japanese. I've also noticed that for Greek, for example, some input
      methods explicitly support Unicode and others do not.

      -Micah


      • Stefan Monnier

        #4
        Re: chinese encoded in UTF-8 and XML

        >> and then I used "C-x RET f" and chose utf-8.
        >> Then I typed "C-x C-s" to save my file.
        [...]
        > FWICT, Emacs doesn't have a Chinese input method that supports
        > Unicode output... :-( ...I've had similar trouble with

        But since he specified utf-8, Emacs should have complained rather than
        silently using some other coding-system.
        Please report the bug with M-x report-emacs-bug.
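The point here is that a coding system the user explicitly asked for should fail loudly rather than quietly fall back. Python's codec error handlers show the same trade-off; this is an analogy only, not a description of what Emacs does internally, and the sample character is arbitrary.

```python
text = "\u4e2d"  # a Chinese character that Latin-1 cannot represent

# Strict handling (Python's default): the encoder refuses and says why.
try:
    text.encode("latin-1")
    refusal = None
except UnicodeEncodeError as err:
    refusal = err.reason
print("strict:", refusal)  # strict: ordinal not in range(256)

# Lenient handling: the problem is papered over and the character is lost.
lossy = text.encode("latin-1", errors="replace")
print("replace:", lossy)  # replace: b'?'
```

The strict path surfaces the mismatch at save time; the lenient one defers it until some later tool, such as xmllint, trips over the result.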


        Stefan


        • Albert Chun-Chieh Huang

          #5
          Re: chinese encoded in UTF-8 and XML

          Knackeback <knackeback@ran dspringer.de> writes:
          > Hi, I wrote an XML file with GNU Emacs 21.2.2 and with
          > Chinese character content encoded in UTF-8.
          > I wrote something like:
          >
          > <?xml version="1.0" encoding="UTF-8"?>
          > <test>
          > <chinese>¼»</chinese>
          > <chinese>ÄÎ</chinese>
          > </test>

          In my Gnus on Emacs 21.3, the Chinese characters in your post
          show up as Big5. Maybe you should download and install the
          Mule-UCS package. With it, I can enter Big5-encoded Chinese
          characters, set the coding system to utf-8, and get a UTF-8
          encoded text file.
          Download mule-ucs from ftp://ftp.m17n.org and add the lines below
          to your .emacs file. The Big5-to-UTF-8 conversion is defined in
          big5c-ucs.el, which is located in mule-ucs/lisp/big5conv.

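          ;; Put Mule-UCS on the load path (adjust to your install location);
          ;; big5c-ucs.el lives in lisp/big5conv, so that may be needed too.
          (add-to-list 'load-path "/path/to/your/mule-ucs/")
          (add-to-list 'load-path "/path/to/your/mule-ucs/lisp")
          (add-to-list 'load-path "/path/to/your/mule-ucs/lisp/big5conv")

          ;; un-define sets up the Unicode coding systems (utf-8 etc.);
          ;; big5c-ucs provides the Big5 <-> Unicode conversion.
          (require 'un-define)
          (require 'big5c-ucs)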
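For a file that has already been saved as Big5, the same re-encoding can also be done outside Emacs. A minimal Python sketch, assuming the whole file is Big5 (plain ASCII markup is valid Big5, so the XML tags pass through unchanged); convert_big5_to_utf8 and the file names are hypothetical, and this is not how Mule-UCS works internally, only the same conversion by other means.

```python
from pathlib import Path
import tempfile
import xml.etree.ElementTree as ET


def convert_big5_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a Big5 text file as UTF-8.

    Strict decoding: raises UnicodeDecodeError if src is not valid Big5,
    instead of silently writing a damaged file.
    """
    text = src.read_bytes().decode("big5")
    dst.write_bytes(text.encode("utf-8"))


# Demo in a scratch directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "uhu.big5.xml"
    dst = Path(tmp) / "uhu.xml"
    src.write_bytes(
        b'<?xml version="1.0" encoding="UTF-8"?>\n'
        + b"<test>\n<chinese>\xbc\xbb</chinese>\n"
        + b"<chinese>\xc4\xce</chinese>\n</test>\n"
    )
    convert_big5_to_utf8(src, dst)
    # The converted file now matches its own encoding="UTF-8" declaration.
    root = ET.parse(dst).getroot()
    chars = [el.text for el in root.findall("chinese")]
    print(chars)
```

The XML declaration is left untouched because after conversion it is finally telling the truth.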

          --
          Chun-Chieh Huang, aka Albert | E-mail: jjhuang AT cm.nctu.edu.tw
          黃俊傑 |
          Department of Computer Science |
          National Tsing Hua University | MIME/ASCII/PDF/PostScript are welcome!
          HsinChu, Taiwan | NO MS WORD DOC FILE, PLEASE!
