Yet another UnicodeDecodeError problem

  • ashitpro
    Recognized Expert Contributor
    • Aug 2007
    • 542


    Code:
    getResponse="HTTP/1.1 200 OK\r\nContent-Length: %d\r\nCache-Control: no-cache\r\n\r\n%s" % (payloadLength,byteString)
    getResponse = unicode(getResponse)
    On the second line I get a UnicodeDecodeError.
    I am trying to replace "200 OK" in the getResponse string with "500", using getResponse.replace(...).

    Internally it converts the string to unicode; I have just made that conversion explicit for better understanding.

    Any help?
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    What version of Python? Is your default encoding "ascii"? When converting from a standard string to a unicode string, a UnicodeError exception may be raised if a character that cannot be converted is encountered. A full traceback may provide the direct cause of the exception.
    Code:
    >>> s = "abcdef%s" % ("\xfc")
    >>> print s
    abcdefü
    >>> unicode(s)
    Traceback (most recent call last):
      File "<interactive input>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 6: ordinal not in range(128)
    >>>
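
The session above fails because no codec is named, so the default "ascii" codec is used. As a minimal sketch (assuming the bytes really are Latin-1, which the thread does not confirm), naming the codec explicitly avoids the implicit ASCII step. The variable names are illustrative, and the snippet is written so that it should run under both Python 2.6+ and Python 3:

```python
raw = b"abcdef\xfc"  # same bytes as the string in the session above

# Decoding as ASCII fails, because 0xfc is outside range(128).
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Assumption: the data is Latin-1. Every byte value maps to a code point
# in that codec, so this call cannot raise UnicodeDecodeError.
text = raw.decode("latin-1")
print(ascii_ok, len(text))
```

Latin-1 is only a guess here; for arbitrary binary data it is usually safer not to decode at all and to keep working with byte strings.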


    • ashitpro
      Recognized Expert Contributor
      • Aug 2007
      • 542

      #3
      I am using Python 2.6.
      The default encoding is "ascii".

      How do you tackle this problem?

      Let me put this more directly:

      getResponse = getResponse + (some binary data)

      getResponse.replace("this_str", "that_str")

      Obviously, in the second statement, it will try to decode the string as 'ascii' and throw an exception.

      Is there any standard way to deal with binary data in a string that would solve my problem?


      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        Check out the struct module.


        • dwblas
          Recognized Expert Contributor
          • May 2008
          • 626

          #5
          Someday I will have to take the time to learn Unicode, or just switch to Python 3.x. A workaround until then is to drop down to the characters' decimal (ordinal) values and compare those.
          Code:
          s = "abcdef%s200 OK\r\notherstuff" % ("\xfc")
          to_find = [ord(ltr) for ltr in "200 OK"]
          new_str_list = []
          len_s = len(s)
          for ctr in range(len_s):
              ltr = s[ctr]
              ## only compare when the whole pattern fits in the rest of the string
              found = ctr + len(to_find) <= len_s
              if found:
                  for x in range(len(to_find)):
                      if ord(s[ctr+x]) != to_find[x]:
                          found = False         ## not a match
                          break
              if found:
                  new_str_list.append("5")      ## replace the leading "2", giving "500 OK"
              else:
                  new_str_list.append(ltr)
          print("".join(new_str_list))
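
The loop above never decodes anything, because it only compares ordinals. As a hedged alternative, the built-in replace gives the same kind of result as long as every argument is a byte string: in Python 2 the implicit ASCII decode only happens when a byte string is mixed with a unicode object. The names `payload`, `header`, `response` and `patched`, and the payload bytes themselves, are made up for this sketch:

```python
# Illustrative binary body; any byte values would do.
payload = b"\xfc\x00\xff"
header = b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\nCache-Control: no-cache\r\n\r\n"
response = header + payload

# Both arguments are byte strings, so no decoding step is involved.
# The count of 1 limits the change to the status line.
patched = response.replace(b"200 OK", b"500", 1)
print(repr(patched))
```

Passing a count of 1 is a design choice for this sketch: it keeps the replacement from touching a "200 OK" that happens to occur inside the binary payload.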


          • ashitpro
            Recognized Expert Contributor
            • Aug 2007
            • 542

            #6
            I don't have a problem with switching to Python 3.x.

            Will that make any difference? If yes, how?


            • dwblas
              Recognized Expert Contributor
              • May 2008
              • 626

              #7
              "In Python 3, all strings are sequences of Unicode characters."
              See Section 4.3 at Dive Into Python 3 for more info.
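
To illustrate the difference, here is a small Python 3 sketch (the payload bytes and variable names are hypothetical). Text is `str` and binary data is `bytes`, and the two are never converted into each other implicitly, so mixing them raises a TypeError instead of attempting an ASCII decode:

```python
payload = bytes([0xFC, 0x00, 0xFF])  # made-up binary body

# Build the text part as str, then encode it explicitly before joining.
header = ("HTTP/1.1 200 OK\r\nContent-Length: %d\r\n"
          "Cache-Control: no-cache\r\n\r\n") % len(payload)
response = header.encode("ascii") + payload

# Mixing str arguments into a bytes method is rejected outright.
try:
    response.replace("200 OK", "500")
    mixed_ok = True
except TypeError:
    mixed_ok = False

# With bytes arguments the replacement works on the raw bytes.
patched = response.replace(b"200 OK", b"500", 1)
print(mixed_ok, patched[:14])
```

So the switch does not make the encoding question go away, but it turns a silent implicit conversion into an explicit choice at the point where text becomes bytes.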
