sqlite utf8 encoding error

  • Greg Miller

    sqlite utf8 encoding error

    I have an application that uses sqlite3 to store job/error data. When
    I log in as a German user the error codes generated are translated into
    German. The error code text is then stored in the db. When I use the
    fetchall() to retrieve the data to generate a report I get the
    following error:

    Traceback (most recent call last):
      File "c:\Pest3\Glosser\baseApp\reportGen.py", line 199, in OnGenerateButtonNow
        self.OnGenerateButton(event)
      File "c:\Pest3\Glosser\baseApp\reportGen.py", line 243, in OnGenerateButton
        warningresult = messagecursor1.fetchall()
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    unsupported Unicode code range

    Does anyone have any idea on what could be going wrong? The string
    that I store in the database table is:

    'Keinen Text für Übereinstimmungsfehler gefunden'

    I thought that all strings were stored in unicode in sqlite.

    Greg Miller

  • Fredrik Lundh

    #2
    Re: sqlite utf8 encoding error

    Greg Miller wrote:

    > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    > unsupported Unicode code range
    >
    > Does anyone have any idea on what could be going wrong? The string
    > that I store in the database table is:
    >
    > 'Keinen Text für Übereinstimmungsfehler gefunden'

    $ more test.py
    # -*- coding: iso-8859-1 -*-
    u = u'Keinen Text für Übereinstimmungsfehler gefunden'
    s = u.encode("iso-8859-1")
    u = s.decode("utf-8") # <-- this gives an error

    $ python test.py
    Traceback (most recent call last):
      File "test.py", line 4, in ?
        u = s.decode("utf-8") # <-- this gives an error
      File "lib/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
    unsupported Unicode code range

    > I thought that all strings were stored in unicode in sqlite.

    Did you pass in a Unicode string or an 8-bit string when you stored the text?

    </F>
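
The test script above is Python 2. As a minimal sketch for readers on Python 3 (an assumption, since the thread predates it), the same mismatch can be reproduced by encoding the message as ISO-8859-1 and decoding the bytes as UTF-8. The variable names are illustrative only.

```python
# Python 3 sketch of the same mismatch: Latin-1 bytes read back as UTF-8.
text = "Keinen Text für Übereinstimmungsfehler gefunden"
latin1_bytes = text.encode("iso-8859-1")  # 'ü' becomes the single byte 0xFC

try:
    latin1_bytes.decode("utf-8")
    failure_position = None
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte the UTF-8 decoder rejected
    failure_position = exc.start

print(failure_position)
```

The failing index matches the start of the range in the original traceback, which points at the "ü" in "für".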



      • Sybren Stuvel

        #4
        Re: sqlite utf8 encoding error

        Greg Miller enlightened us with:
        > 'Keinen Text für Übereinstimmungsfehler gefunden'

        You posted it as "Keinen Text f<FC>r ...", which is Latin-1, not
        UTF-8.

        > I thought that all strings were stored in unicode in sqlite.

        Only if you put them into the DB as such. Make sure you're inserting
        UTF-8 text, since the DB won't do character conversion for you.

        Sybren
        --
        The problem with the world is stupidity. Not saying there should be a
        capital punishment for stupidity, but why don't we just take the
        safety labels off of everything and let the problem solve itself?
        Frank Zappa
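
To illustrate the point about inserting UTF-8 text, here is a minimal sketch using the standard-library `sqlite3` module on Python 3 (an assumption; the thread used pysqlite on Python 2). It passes a `str`, which the module encodes as UTF-8, and then reads the raw stored bytes back with `CAST(... AS BLOB)`. The table and column names are hypothetical.

```python
import sqlite3

text = "Keinen Text für Übereinstimmungsfehler gefunden"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (body TEXT)")  # hypothetical table
# Passing a str lets the driver hand SQLite well-formed UTF-8.
conn.execute("INSERT INTO messages (body) VALUES (?)", (text,))

# CAST(body AS BLOB) exposes the bytes exactly as they are stored.
stored_text, stored_bytes = conn.execute(
    "SELECT body, CAST(body AS BLOB) FROM messages"
).fetchone()
conn.close()

print(stored_text == text)
print(stored_bytes == text.encode("utf-8"))
```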

          • Jarek Zgoda

            #6
            Re: sqlite utf8 encoding error

            Fredrik Lundh wrote:

            >>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
            >>unsupported Unicode code range
            >>
            >>Does anyone have any idea on what could be going wrong? The string
            >>that I store in the database table is:
            >>
            >>'Keinen Text für Übereinstimmungsfehler gefunden'
            >
            > $ more test.py
            > # -*- coding: iso-8859-1 -*-
            > u = u'Keinen Text für Übereinstimmungsfehler gefunden'
            > s = u.encode("iso-8859-1")
            > u = s.decode("utf-8") # <-- this gives an error
            >
            > $ python test.py
            > Traceback (most recent call last):
            >   File "test.py", line 4, in ?
            >     u = s.decode("utf-8") # <-- this gives an error
            >   File "lib/encodings/utf_8.py", line 16, in decode
            >     return codecs.utf_8_decode(input, errors, True)
            > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
            > unsupported Unicode code range

            I can't wait for the moment when encoded strings go away from Python.
            The more I program in this language, the more confusion this difference
            causes. Most functions and various objects' methods now accept both
            strings and unicode, making it hard to find the sources of Unicode*Errors.

            --
            Jarek Zgoda

              • Serge Orlov

                #8
                Re: sqlite utf8 encoding error

                Jarek Zgoda wrote:
                > Fredrik Lundh wrote:
                >
                > >>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
                > >>unsupported Unicode code range
                > >>
                > >>Does anyone have any idea on what could be going wrong? The string
                > >>that I store in the database table is:
                > >>
                > >>'Keinen Text für Übereinstimmungsfehler gefunden'
                > >
                > > $ more test.py
                > > # -*- coding: iso-8859-1 -*-
                > > u = u'Keinen Text für Übereinstimmungsfehler gefunden'
                > > s = u.encode("iso-8859-1")
                > > u = s.decode("utf-8") # <-- this gives an error
                > >
                > > $ python test.py
                > > Traceback (most recent call last):
                > >   File "test.py", line 4, in ?
                > >     u = s.decode("utf-8") # <-- this gives an error
                > >   File "lib/encodings/utf_8.py", line 16, in decode
                > >     return codecs.utf_8_decode(input, errors, True)
                > > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
                > > unsupported Unicode code range
                >
                > I can't wait for the moment when encoded strings go away from Python.
                > The more I program in this language, the more confusion this difference
                > causes. Most functions and various objects' methods now accept both
                > strings and unicode, making it hard to find the sources of Unicode*Errors.

                Library writers can speed up the transition by hiding the 8-bit
                interface, for example:

                import sqlite
                sqlite.I_promise_to_pass_8bit_string_only_in_utf8_encoding(my_signature="sig.gif")

                If you don't call this function, 8-bit strings will not be accepted :)
                IMHO if libraries keep on accepting both str and unicode until Python
                3.0, it will just prolong the confusion of Unicode newbies instead of
                guiding them in the right direction _right now_.
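
The "refuse 8-bit strings" idea can also be applied in application code. The following is only a sketch, assuming Python 3 and the standard-library `sqlite3` module; `insert_message` and the `messages` table are hypothetical names, not part of any library.

```python
import sqlite3


def insert_message(conn, body):
    """Hypothetical helper: accept only str, never raw bytes."""
    if not isinstance(body, str):
        raise TypeError("body must be str, got %s" % type(body).__name__)
    conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (body TEXT)")
insert_message(conn, "Keinen Text für Übereinstimmungsfehler gefunden")

try:
    # Encoded bytes are rejected at the boundary instead of being stored.
    insert_message(conn, "für".encode("iso-8859-1"))
    rejected = False
except TypeError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
conn.close()
print(rejected, row_count)
```

Failing at insert time keeps the error next to the code that produced the bad value, rather than surfacing later in `fetchall()`.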

                  • Manlio Perillo

                    #10
                    Re: sqlite utf8 encoding error

                    On 17 Nov 2005 03:47:00 -0800, "Greg Miller" <et1ssgmiller@gmail.com>
                    wrote:

                    >I have an application that uses sqlite3 to store job/error data. When
                    >I log in as a German user the error codes generated are translated into
                    >German. The error code text is then stored in the db. When I use the
                    >fetchall() to retrieve the data to generate a report I get the
                    >following error:
                    >
                    >Traceback (most recent call last):
                    >  File "c:\Pest3\Glosser\baseApp\reportGen.py", line 199, in OnGenerateButtonNow
                    >    self.OnGenerateButton(event)
                    >  File "c:\Pest3\Glosser\baseApp\reportGen.py", line 243, in OnGenerateButton
                    >    warningresult = messagecursor1.fetchall()
                    >UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
                    >unsupported Unicode code range
                    >
                    >Does anyone have any idea on what could be going wrong? The string
                    >that I store in the database table is:
                    >
                    >'Keinen Text für Übereinstimmungsfehler gefunden'
                    >
                    >I thought that all strings were stored in unicode in sqlite.


                    No, they are stored as UTF-8 in sqlite and pysqlite has no way to make
                    sure the string you insert into the database is really encoded in
                    UTF-8 (the only secure way is to use Unicode strings).

                    How did you insert that string?

                    As a partial solution, try to disable the automatic conversion of
                    text fields to Unicode strings:


                    def convert_text(s):
                        # XXX do not use Unicode
                        return s


                    # Register the converter with SQLite
                    sqlite.register_converter("TEXT", convert_text)


                    ....connect("...",
                        detect_types=sqlite.PARSE_DECLTYPES|sqlite.PARSE_COLNAMES
                    )




                    Regards Manlio Perillo
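
On Python 3 with the standard-library `sqlite3` module (an assumption; the post targets pysqlite on Python 2), a comparable way to skip the automatic UTF-8 decoding is to set `text_factory = bytes` on the connection and decode by hand. In this sketch the mis-encoded row is simulated with `CAST(? AS TEXT)`, and the table name is hypothetical.

```python
import sqlite3

text = "Keinen Text für Übereinstimmungsfehler gefunden"
latin1_bytes = text.encode("iso-8859-1")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (body TEXT)")
# CAST(? AS TEXT) relabels the bound bytes as TEXT without converting them,
# which simulates a row written by a client that passed non-UTF-8 data.
conn.execute(
    "INSERT INTO messages (body) VALUES (CAST(? AS TEXT))", (latin1_bytes,)
)

# Return TEXT columns as raw bytes instead of decoding them as UTF-8.
conn.text_factory = bytes
(raw,) = conn.execute("SELECT body FROM messages").fetchone()
recovered = raw.decode("iso-8859-1")  # decode with the encoding actually used
conn.close()

print(recovered)
```

This only recovers the data if the original encoding is known; it does not repair the stored bytes.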

                      • Greg Miller

                        #12
                        Re: sqlite utf8 encoding error

                        Thank you for all your suggestions. I ended up casting the string to
                        unicode prior to inserting into the database.

                        Greg Miller
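
The fix described above, converting to Unicode before the insert, might look like the following on Python 3. This is a sketch under assumptions: the translated message is taken to arrive as ISO-8859-1 bytes, and `to_text` and the `messages` table are hypothetical names.

```python
import sqlite3


def to_text(value, encoding="iso-8859-1"):
    """Return a str, decoding bytes with the assumed source encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for the translated error text as it arrives from the application.
translated = "Keinen Text für Übereinstimmungsfehler gefunden".encode("iso-8859-1")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (body TEXT)")
conn.execute("INSERT INTO messages (body) VALUES (?)", (to_text(translated),))

(fetched,) = conn.execute("SELECT body FROM messages").fetchone()
conn.close()
print(fetched)
```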

                          • Manlio Perillo

                            #14
                            Re: sqlite utf8 encoding error

                            On 18 Nov 2005 09:09:24 -0800, "Greg Miller" <et1ssgmiller@gmail.com>
                            wrote:

                            >Thank you for all your suggestions. I ended up casting the string to
                            >unicode prior to inserting into the database.

                            Don't do it by hand if it can be done by an automated system.

                            Try with:

                            from pysqlite2 import dbapi2 as sqlite

                            def adapt_str(s):
                                # if you have declared this encoding at the beginning of the module
                                return s.decode("iso-8859-1")

                            sqlite.register_adapter(str, adapt_str)


                            Read the pysqlite documentation for more information.




                            Regards Manlio Perillo
