How to decode a string

  • Lad

    How to decode a string

    To decode a string successfully, I need to know which encoding it is in.
    The string may be encoded in utf8, in windows-1250, or in some other
    encoding.
    Is there a way to find out a string's encoding?
    Thank you for your help.
    L.

  • Fredrik Lundh

    #2
    Re: How to decode a string

    Lad wrote:
    > To decode a string successfully, I need to know which encoding it is in.

    ask whoever provided the string.

    > The string may be encoded in utf8, in windows-1250, or in some other
    > encoding. Is there a way to find out a string's encoding?

    in general, no. if you have enough text, you may guess, but the right
    approach for that depends on the application.

    </F>
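
A minimal sketch of the "guess" approach, assuming the only realistic candidates are the two encodings named in the question. The helper name `guess_decode` and the candidate list are illustrative, not from the thread, and the sketch uses Python 3, where `bytes` plays the role of the 8-bit string discussed here.

```python
def guess_decode(raw, candidates=("utf-8", "cp1250")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    # latin-1 maps every byte to a code point, so this never raises,
    # but the resulting text may be wrong.
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    name = "P\u0159ibylov\u00e1"  # sample Czech surname used for illustration
    print(guess_decode(name.encode("utf-8")))
    print(guess_decode(name.encode("cp1250")))
```

UTF-8 is tried first because it is strict: most text in a single-byte encoding is not valid UTF-8, whereas cp1250 accepts almost any byte sequence. A clean decode is still only a guess, which is why asking whoever produced the data remains the reliable option.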


    • Lad

      #3
      Re: How to decode a string


      Fredrik Lundh wrote:
      > Lad wrote:
      >> To decode a string successfully, I need to know which encoding it is in.
      >
      > ask whoever provided the string.
      >
      >> The string may be encoded in utf8, in windows-1250, or in some other
      >> encoding. Is there a way to find out a string's encoding?
      >
      > in general, no. if you have enough text, you may guess, but the right
      > approach for that depends on the application.
      >
      > </F>

      Fredrik,
      Thank you for your reply.
      The text is from a MySQL table field that uses the utf8_czech_ci collation,
      but when I try
      `RealName`.decode('utf8'), where RealName is that MySQL field,

      I get:
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3:
      ordinal not in range(128)

      Can you please suggest a solution?
      Thank you
      L.
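
The error message names the codec that actually failed. Here it says 'ascii' rather than 'utf8', which suggests that an implicit ASCII conversion happened somewhere other than the explicit utf8 decode. The sketch below shows how to read those details from the exception; it uses Python 3, and both the byte string and the helper name `failing_codec` are hypothetical, since the thread does not show the real value or the surrounding code.

```python
def failing_codec(data, encoding):
    """Return (codec, position, byte) for a failed decode, or None on success."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.encoding is the codec that raised, exc.start the offending offset
        return exc.encoding, exc.start, exc.object[exc.start]
    return None


raw = b"abc\xc3\xa1"  # hypothetical bytes: 0xc3 0xa1 is the UTF-8 form of 'á'

print(failing_codec(raw, "ascii"))  # fails at the first non-ASCII byte
print(failing_codec(raw, "utf-8"))  # decodes cleanly, so None
```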


      • Marc 'BlackJack' Rintsch

        #4
        Re: How to decode a string

        In <1156175385.580508.302330@m73g2000cwd.googlegroups.com>, Lad wrote:
        > The text is from a MySQL table field that uses the utf8_czech_ci collation,
        > but when I try
        > `RealName`.decode('utf8'), where RealName is that MySQL field,
        >
        > I get:
        > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3:
        > ordinal not in range(128)
        >
        > Can you please suggest a solution?

        Do you get this from converting the value from the database or from trying
        to print the unicode string? Can you give us the output of

        print repr(RealName)

        Ciao,
        Marc 'BlackJack' Rintsch


        • Lad

          #5
          Re: How to decode a string


          Marc 'BlackJack' Rintsch wrote:
          > In <1156175385.580508.302330@m73g2000cwd.googlegroups.com>, Lad wrote:
          >> The text is from a MySQL table field that uses the utf8_czech_ci collation,
          >> but when I try
          >> `RealName`.decode('utf8'), where RealName is that MySQL field,
          >>
          >> I get:
          >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3:
          >> ordinal not in range(128)
          >>
          >> Can you please suggest a solution?
          >
          > Do you get this from converting the value from the database or from trying
          > to print the unicode string? Can you give us the output of
          >
          > print repr(RealName)
          >
          > Ciao,
          > Marc 'BlackJack' Rintsch

          For the
          print repr(RealName) command
          I get

          P?ibylov\xe1 Ludmila
          where the ? should also be a character.
          Thank you for your help.
          L.


          • Fredrik Lundh

            #6
            Re: How to decode a string

            Lad wrote:
            > For the
            > print repr(RealName) command
            > I get
            >
            > P?ibylov\xe1 Ludmila
            > where the ? should also be a character.

            that's not very likely; repr() always includes quotes, always escapes
            non-ASCII characters, and optionally includes a Unicode prefix.

            please try this

            print "*", repr(RealName), type(RealName), "*"

            and post the entire output; that is, *everything* between the asterisks.

            </F>
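
The diagnostic above is Python 2. A rough Python 3 equivalent is sketched here, assuming `ascii()` as the stand-in for Python 2's `repr()` (in Python 3, `repr()` no longer escapes non-ASCII characters in text strings). The helper name `describe` and the sample values are illustrative only.

```python
def describe(value):
    """Build the diagnostic line: escaped repr plus type, between asterisks."""
    # ascii() escapes every non-ASCII character, so nothing in the output
    # depends on the terminal's encoding.
    return "* {} {} *".format(ascii(value), type(value))


print(describe(b"caf\xe9"))  # an 8-bit value, as a database driver might return
print(describe("caf\xe9"))   # already-decoded text
```

Because the output is pure ASCII, a "?" can only appear in it if the value itself contains a literal question mark, which is what makes this form safe to paste into a post.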


            • Lad

              #7
              Re: How to decode a string

              Fredrik Lundh wrote:
              > Lad wrote:
              >> For the
              >> print repr(RealName) command
              >> I get
              >>
              >> P?ibylov\xe1 Ludmila
              >> where the ? should also be a character.
              >
              > that's not very likely; repr() always includes quotes, always escapes
              > non-ASCII characters, and optionally includes a Unicode prefix.
              >
              > please try this
              >
              > print "*", repr(RealName), type(RealName), "*"
              >
              > and post the entire output; that is, *everything* between the asterisks.

              The result of print "*", repr(RealName), type(RealName), "*" is

              * 'Fritschov\xe1 Laura' <type 'str'> *


              Best regards,
              L


              • Fredrik Lundh

                #8
                Re: How to decode a string

                "Lad" wrote:
                > The result of print "*", repr(RealName), type(RealName), "*" is
                >
                > * 'Fritschov\xe1 Laura' <type 'str'> *

                looks like the MySQL interface is returning 8-bit strings using ISO-8859-1
                encoding (or some variation of that; \xE1 is "LATIN SMALL LETTER A
                WITH ACUTE" in 8859-1).

                have you tried passing "use_unicode=True" to the connect() call?

                </F>
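
The observation about \xE1 can be checked directly. This is a small sketch in Python 3 using the byte value reported in the thread; the helper name `is_valid` is hypothetical.

```python
import unicodedata


def is_valid(raw, encoding):
    """Return True if raw decodes without error in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


raw = b"Fritschov\xe1 Laura"  # the 8-bit value reported in the thread

text = raw.decode("iso-8859-1")
print(unicodedata.name(text[9]))     # name of the character that byte 0xE1 maps to
print(is_valid(raw, "utf-8"))        # a lone 0xE1 is not a complete UTF-8 sequence
print(raw.decode("cp1250") == text)  # windows-1250 maps 0xE1 to the same letter
```

The last line illustrates the "some variation of that" caveat: this particular value cannot tell ISO-8859-1 and windows-1250 apart, because both map 0xE1 to the same character.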




                • Lad

                  #9
                  Re: How to decode a string


                  Fredrik Lundh wrote:
                  > "Lad" wrote:
                  >> The result of print "*", repr(RealName), type(RealName), "*" is
                  >>
                  >> * 'Fritschov\xe1 Laura' <type 'str'> *
                  >
                  > looks like the MySQL interface is returning 8-bit strings using ISO-8859-1
                  > encoding (or some variation of that; \xE1 is "LATIN SMALL LETTER A
                  > WITH ACUTE" in 8859-1).
                  >
                  > have you tried passing "use_unicode=True" to the connect() call?
                  >
                  > </F>

                  Fredrik,
                  Thank you for your reply.
                  I found out that if I do not decode the string at all, it looks
                  correct. But I do not know why it is OK without decoding.
                  I use Django, and I do not pass "use_unicode=True" to the connect() call.
