How to get the ascii code of Chinese characters?

  • many_years_after

    How to get the ascii code of Chinese characters?

    Hi,everyone:

    Have you any ideas?

    Say whatever you know about this.


    thanks.

  • Philippe Martin

    #2
    Re: How to get the ascii code of Chinese characters?

    many_years_after wrote:
    Hi,everyone:
    >
    Have you any ideas?
    >
    Say whatever you know about this.
    >
    >
    thanks.
    Hi,

    You mean unicode I assume:


    Regards,

    Philippe

    • John Machin

      #3
      Re: How to get the ascii code of Chinese characters?

      many_years_after wrote:
      Hi,everyone:
      >
      Have you any ideas?
      >
      Say whatever you know about this.
      >
      Perhaps you had better explain what you mean by "ascii code of Chinese
      characters". Chinese characters ("hanzi") can be represented in many
      ways on a computer, in Unicode as well as many different "legacy"
      encodings, such as GB, GBK, big5, two different 4-digit telegraph
      codes, etc etc. They can also be spelled out in "roman" letters with or
      without tone indications (digits or "accents") in the pinyin system --
      is that what you mean by "ascii code"?

      Perhaps you might like to tell us what you want to do in Python with
      hanzi and "ascii codes", so that we can give you a specific answer.
      With examples, please -- like what are the "ascii codes" for the two
      characters in the common greeting that comes across in toneless pinyin
      as "ni hao"?

      Cheers,
      John

      • Philippe Martin

        #4
        Re: How to get the ascii code of Chinese characters?

        Philippe Martin wrote:
        many_years_after wrote:
        >
        >Hi,everyone:
        >>
        > Have you any ideas?
        >>
        > Say whatever you know about this.
        >>
        >>
        >thanks.
        Hi,
        >
        You mean unicode I assume:

        >
        Regards,
        >
        Philippe
        Hi,

        I have received a personal email on this:

        Kanji is indeed a Japanese subset of the Chinese character set.

        I just thought it would be relevant as it includes ~47000 characters.

        If I hurt anyone's feelings, sorry.

        Regards,

        Philippe

        • many_years_after

          #5
          Re: How to get the ascii code of Chinese characters?

          hi:

          what I want to do is just to make numbers as people input some Chinese
          character (hanzi, I mean). The same character will create the same
          number. So I think ASCII code can do this very well.

          John Machin wrote:
          many_years_after wrote:
          Hi,everyone:

          Have you any ideas?

          Say whatever you know about this.
          >
          Perhaps you had better explain what you mean by "ascii code of Chinese
          characters". Chinese characters ("hanzi") can be represented in many
          ways on a computer, in Unicode as well as many different "legacy"
          encodings, such as GB, GBK, big5, two different 4-digit telegraph
          codes, etc etc. They can also be spelled out in "roman" letters with or
          without tone indications (digits or "accents") in the pinyin system --
          is that what you mean by "ascii code"?
          >
          Perhaps you might like to tell us what you want to do in Python with
          hanzi and "ascii codes", so that we can give you a specific answer.
          With examples, please -- like what are the "ascii codes" for the two
          characters in the common greeting that comes across in toneless pinyin
          as "ni hao"?
          >
          Cheers,
          John

          • Marc 'BlackJack' Rintsch

            #6
            Re: How to get the ascii code of Chinese characters?

            In <1155999554.486951.303480@i42g2000cwa.googlegroups.com>,
            many_years_after wrote:
            what I want to do is just to make numbers as people input some Chinese
            character (hanzi, I mean). The same character will create the same
            number. So I think ASCII code can do this very well.
            No it can't. ASCII doesn't contain Chinese characters.



            Ciao,
            Marc 'BlackJack' Rintsch
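
A minimal sketch of the point above, assuming Python 3 (where every str is a Unicode string): trying to encode hanzi as ASCII raises an error, while toneless pinyin encodes without trouble. The helper name is_ascii is made up for illustration.

```python
def is_ascii(text):
    """Return True if every character in text fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character has no ASCII representation.
        return False
    return True


print(is_ascii("ni hao"))        # pinyin letters are plain ASCII
print(is_ascii("\u4f60\u597d"))  # the same greeting in hanzi is not
```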

            • Thorsten Kampe

              #7
              Re: How to get the ascii code of Chinese characters?

              * many_years_after (2006-08-19 12:18 +0100)
              Hi,everyone:
              >
              Have you any ideas?
              >
              Say whatever you know about this.
              contradictio in adiecto

              • Gerhard Fiedler

                #8
                Re: How to get the ascii code of Chinese characters?

                On 2006-08-19 12:42:31, Marc 'BlackJack' Rintsch wrote:
                many_years_after wrote:
                >
                >what I want to do is just to make numbers as people input some Chinese
                >character (hanzi, I mean). The same character will create the same
                >number. So I think ASCII code can do this very well.
                >
                No it can't. ASCII doesn't contain Chinese characters.
                Well, ASCII can represent the Unicode numerically -- if that is what the OP
                wants. For example, "U+81EC" (all ASCII) is one possible -- not very
                readable though <g> -- representation of a Hanzi character (see
                http://www.cojak.org/index.php?funct...kup&term=81EC).

                (I don't know anything about Hanzi or Mandarin... But that's Unicode, so
                this works :)

                Gerhard
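
The idea above can be sketched as follows, assuming Python 3. The built-in ord() returns the Unicode code point of a character as an integer, and formatting that integer in hexadecimal gives the all-ASCII "U+81EC" style. The function names code_points and u_notation are hypothetical, chosen only for this example.

```python
def code_points(text):
    """Return the Unicode code point of each character as an integer."""
    return [ord(ch) for ch in text]


def u_notation(text):
    """Return an all-ASCII string such as 'U+4F60 U+597D' for text."""
    return " ".join("U+{:04X}".format(ord(ch)) for ch in text)


greeting = "\u4f60\u597d"  # "ni hao", escaped so this source stays ASCII
print(code_points(greeting))  # [20320, 22909]
print(u_notation(greeting))   # U+4F60 U+597D
```

The same character always yields the same number, which seems to be what the original poster asked for.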

                • Peter Maas

                  #9
                  Re: How to get the ascii code of Chinese characters?

                  Gerhard Fiedler wrote:
                  Well, ASCII can represent the Unicode numerically -- if that is what the OP
                  wants.
                  No. ASCII characters range is 0..127 while Unicode characters range is
                  at least 0..65535.
                  For example, "U+81EC" (all ASCII) is one possible -- not very
                  readable though <g> -- representation of a Hanzi character (see
                  http://www.cojak.org/index.php?funct...kup&term=81EC).
                  U+81EC means a Unicode character which is represented by the number
                  0x81EC. There are some encodings defined which map Unicode sequences
                  to byte sequences: UTF-8 maps Unicode strings to sequences of bytes in
                  the range 0..255, UTF-7 maps Unicode strings to sequences of bytes in
                  the range 0..127. You *could* read the latter as ASCII sequences
                  but this is not correct.

                  How to do it in Python? Let chinesePhrase be a Unicode string with
                  Chinese content. Then

                  chinesePhrase_7bit = chinesePhrase.encode('utf-7')

                  will produce a sequence of bytes in the range 0..127 representing
                  chinesePhrase and *looking like* a (meaningless) ASCII sequence.

                  chinesePhrase_16bit = chinesePhrase.encode('utf-16be')

                  will produce a sequence with Unicode numbers packed in a byte
                  string in big endian order. This is probably closest to what
                  the OP wants.

                  Peter Maas, Aachen
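
A runnable version of the two calls above, as a sketch assuming Python 3, where encode() returns a bytes object. The variable names are illustrative only, and the phrase is the "ni hao" greeting mentioned earlier in the thread.

```python
# "ni hao" written with escapes so the source file itself stays ASCII.
chinese_phrase = "\u4f60\u597d"

utf7_bytes = chinese_phrase.encode("utf-7")
utf16be_bytes = chinese_phrase.encode("utf-16-be")

# Every UTF-7 byte is below 128, so the result looks like ASCII text.
seven_bit_only = all(byte < 128 for byte in utf7_bytes)

# UTF-16BE packs each of these two characters into two bytes, high byte first.
utf16be_hex = utf16be_bytes.hex()

print(seven_bit_only)
print(utf16be_hex)  # 4f60597d
print(utf7_bytes.decode("utf-7") == chinese_phrase)  # round trip is lossless
```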

                  • John Machin

                    #10
                    Re: How to get the ascii code of Chinese characters?


                    many_years_after wrote:
                    John Machin wrote:
                    many_years_after wrote:
                    Hi,everyone:
                    >
                    Have you any ideas?
                    >
                    Say whatever you know about this.
                    >
                    Perhaps you had better explain what you mean by "ascii code of Chinese
                    characters". Chinese characters ("hanzi") can be represented in many
                    ways on a computer, in Unicode as well as many different "legacy"
                    encodings, such as GB, GBK, big5, two different 4-digit telegraph
                    codes, etc etc. They can also be spelled out in "roman" letters with or
                    without tone indications (digits or "accents") in the pinyin system --
                    is that what you mean by "ascii code"?

                    Perhaps you might like to tell us what you want to do in Python with
                    hanzi and "ascii codes", so that we can give you a specific answer.
                    With examples, please -- like what are the "ascii codes" for the two
                    characters in the common greeting that comes across in toneless pinyin
                    as "ni hao"?

                    Cheers,
                    John
                    hi:
                    >
                    what I want to do is just to make numbers as people input some Chinese
                    character (hanzi, I mean). The same character will create the same
                    number. So I think ASCII code can do this very well.
                    >
                    *What* characters make *what* numbers? Stop thinking and give us some
                    *examples*

                    • Gerhard Fiedler

                      #11
                      Re: How to get the ascii code of Chinese characters?

                      On 2006-08-19 16:54:36, Peter Maas wrote:
                      Gerhard Fiedler wrote:
                      >Well, ASCII can represent the Unicode numerically -- if that is what the OP
                      >wants.
                      >
                      No. ASCII characters range is 0..127 while Unicode characters range is
                      at least 0..65535.
                      Actually, Unicode goes beyond 65535. But right in this sentence, you
                      represented the number 65535 with ASCII characters, so it doesn't seem to
                      be impossible.
                      >For example, "U+81EC" (all ASCII) is one possible -- not very
                      >readable though <g> -- representation of a Hanzi character (see
                      >http://www.cojak.org/index.php?funct...kup&term=81EC).
                      >
                      U+81EC means a Unicode character which is represented by the number
                      0x81EC.
                      Exactly. Both versions represented in ASCII right in your message :)
                      UTF-8 maps Unicode strings to sequences of bytes in the range 0..255,
                      UTF-7 maps Unicode strings to sequences of bytes in the range 0..127.
                      You *could* read the latter as ASCII sequences but this is not correct.
                      Of course not "correct". I guess the only "correct" representation is the
                      original Chinese character. But the OP doesn't seem to want this... so a
                      non-"correct" representation is necessary anyway.
                      How to do it in Python? Let chinesePhrase be a Unicode string with
                      Chinese content. Then
                      >
                      chinesePhrase_7bit = chinesePhrase.encode('utf-7')
                      >
                      will produce a sequence of bytes in the range 0..127 representing
                      chinesePhrase and *looking like* a (meaningless) ASCII sequence.
                      Actually, no. There are quite a few code positions in the range 0..127 that
                      don't "look like" anything (non-printable). And, as you say, this is rather
                      meaningless.
                      chinesePhrase_16bit = chinesePhrase.encode('utf-16be')
                      >
                      will produce a sequence with Unicode numbers packed in a byte
                      string in big endian order. This is probably closest to what
                      the OP wants.
                      That's what you think... but it's not really ASCII. If you want this in
                      ASCII, and readable, I still suggest transforming this sequence of 2-byte
                      values (for Chinese characters it will be 2 bytes per character) into a
                      sequence of something like U+81EC (or 0x81EC if you are a C fan or 81EC if
                      you can imply the rest)... that's where we come back to my original
                      suggestion :)

                      Gerhard
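
The transformation suggested above might look like the sketch below, assuming Python 3. It splits a UTF-16BE byte string into 16-bit values with the standard struct module and prints each one in "U+XXXX" form. The function name utf16be_units is hypothetical. One caveat: characters above U+FFFF (for example some rarer hanzi in CJK Extension B) come out as two surrogate values rather than a single code point, so the 2-bytes-per-character assumption only holds inside the Basic Multilingual Plane.

```python
import struct


def utf16be_units(data):
    """Split a UTF-16BE byte string into a list of 16-bit integers."""
    count = len(data) // 2
    # ">" selects big-endian order, "H" is an unsigned 16-bit integer.
    return list(struct.unpack(">{}H".format(count), data))


encoded = "\u81ec".encode("utf-16-be")
units = utf16be_units(encoded)
labels = ["U+{:04X}".format(unit) for unit in units]
print(labels)  # ['U+81EC']

# A character beyond 65535 is stored as a surrogate pair (two units).
print(utf16be_units("\U00020000".encode("utf-16-be")))
```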

                      • John Machin

                        #12
                        Re: How to get the ascii code of Chinese characters?


                        many_years_after wrote:
                        hi:
                        >
                        what I want to do is just to make numbers as people input some Chinese
                        character (hanzi, I mean). The same character will create the same
                        number. So I think ASCII code can do this very well.
                        >
                        Possibly you have "create" upside-down. Could you possibly be talking
                        about an "input method", in which people type in ascii letters (and
                        maybe numbers) and the *result* is a Chinese character? In other words,
                        what *everybody* uses to input Chinese characters?

                        Perhaps you could ask on the Chinese Python newsgroup.

                        *GIVE* *EXAMPLES* of what you want to do.

                        • many_years_after

                          #13
                          Re: How to get the ascii code of Chinese characters?


                          John Machin wrote:
                          many_years_after wrote:
                          hi:

                          what I want to do is just to make numbers as people input some Chinese
                          character (hanzi, I mean). The same character will create the same
                          number. So I think ASCII code can do this very well.
                          >
                          Possibly you have "create" upside-down. Could you possibly be talking
                          about an "input method", in which people type in ascii letters (and
                          maybe numbers) and the *result* is a Chinese character? In other words,
                          what *everybody* uses to input Chinese characters?
                          >
                          Perhaps you could ask on the Chinese Python newsgroup.
                          >
                          *GIVE* *EXAMPLES* of what you want to do.
                          Well, people may input from keyboard. They input some Chinese
                          characters, then, I want to create a number. The same number will be
                          created if they input the same Chinese characters.

                          • Ben Finney

                            #14
                            Re: How to get the ascii code of Chinese characters?

                            "many_years_after" <shuanyu@gmail.com> writes:
                            Well, people may input from keyboard. They input some Chinese
                            characters, then, I want to create a number. The same number will be
                            created if they input the same Chinese characters.
                            You seem to be looking for a hash.

                            <URL:http://docs.python.org/lib/module-md5>
                            <URL:http://docs.python.org/lib/module-sha>

                            If not, please tell us what your *purpose* is. It's not at all clear
                            from your questions what you are trying to achieve.

                            --
                            \ "I was in a bar the other night, hopping from barstool to |
                            `\ barstool, trying to get lucky, but there wasn't any gum under |
                            _o__) any of them." -- Emo Philips |
                            Ben Finney
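
A hedged sketch of the hash approach, assuming Python 3. The md5 and sha modules linked above do not exist in Python 3; the standard-library hashlib module is used here instead, and SHA-256 is an arbitrary choice for the example. The function name text_to_number is hypothetical.

```python
import hashlib


def text_to_number(text):
    """Map a string to an integer; equal strings always give equal integers."""
    # Hash functions work on bytes, so pick one fixed encoding first.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


first = text_to_number("\u4f60\u597d")   # "ni hao" in hanzi
second = text_to_number("\u4f60\u597d")  # same input typed again
other = text_to_number("\u81ec")

print(first == second)  # same characters, same number
print(first == other)   # different characters, different number
```

Unlike a plain code point, a hash covers a whole phrase in one number but cannot be turned back into the original characters.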

                            • Fredrik Lundh

                              #15
                              Re: How to get the ascii code of Chinese characters?

                              Gerhard Fiedler wrote:
                              >No. ASCII characters range is 0..127 while Unicode characters range is
                              >at least 0..65535.
                              >
                              Actually, Unicode goes beyond 65535.
                              you may want to look up "at least" in a dictionary.

                              </F>
