Unicode characters, XML/RSS

  • Adam W.

    Unicode characters, XML/RSS

    So I wrote a little video podcast downloading script that checks a
    list of RSS feeds and downloads any new videos. Every once in a while
    it finds a character outside the ASCII range (0-127) in the feed and
    my script blows up:

    Traceback (most recent call last):
      File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 88, in <module>
        mainloop()
      File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 75, in mainloop
        update()
      File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 69, in update
        couldhave = getshowlst(x[1],episodecnt)
      File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 30, in getshowlst
        masterlist = XMLWorkspace.parsexml(url)
      File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 54, in parsexml
        parse(url, FeedHandlerInst)
      File "C:\Python25\lib\xml\sax\__init__.py", line 33, in parse
        parser.parse(source)
      File "C:\Python25\lib\xml\sax\expatreader.py", line 107, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "C:\Python25\lib\xml\sax\xmlreader.py", line 123, in parse
        self.feed(buffer)
      File "C:\Python25\lib\xml\sax\expatreader.py", line 207, in feed
        self._parser.Parse(data, isFinal)
      File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 51, in characters
        self.data.append(string)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in
    position 236: ordinal not in range(128)


    Now it's my understanding that XML can contain non-ASCII Unicode
    characters as long as the encoding is specified, which it is (UTF-8).
    The feed passes every validator I've run it through, and every program
    I open it with seems to be OK with it, except my Python script. Why?
    Here is the URL of the feed in question:
    http://revision3.com/winelibraryreserve/
    My script is complaining about the è in Mourvèdre.

    At first glance I thought it was the data.append(string) call that
    wouldn't accept the Unicode, but even if I put a return in the
    characters handler, it still breaks. What am I doing wrong?
  • Stefan Behnel

    #2
    Re: Unicode characters, XML/RSS

    Adam W. wrote:
      File "C:\Python25\lib\xml\sax\expatreader.py", line 207, in feed
        self._parser.Parse(data, isFinal)
      File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 51, in characters
        self.data.append(string)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in
    position 236: ordinal not in range(128)

    You seem to be doing an implicit conversion from a unicode string to a byte
    string, maybe by concatenating ('+' operator) strings of different types or by
    writing it out into a file (or printing it, or ...) - I don't know what
    self.data is or does, since you didn't provide any code.
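
    Since the original code was not posted, the following is only a minimal
    sketch of the situation described above, not a reconstruction of the
    actual script. The handler name TitleCollector and the one-element feed
    are made up for illustration. It shows that a SAX characters() handler
    receives Unicode text, that encoding that text as ASCII fails on the è
    (which is what Python 2 attempts implicitly when Unicode is mixed with
    byte strings, passed to str(), or written to a plain file), and that an
    explicit UTF-8 encode succeeds.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: keeps character data as Unicode text."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.data = []

    def characters(self, content):
        # Appending to a list stores the Unicode string untouched;
        # no encoding happens here.
        self.data.append(content)


# A made-up, UTF-8 encoded stand-in for the real feed.
feed = u'<?xml version="1.0" encoding="UTF-8"?><title>Mourv\xe8dre</title>'.encode('utf-8')

handler = TitleCollector()
xml.sax.parseString(feed, handler)

# characters() may be called in several chunks, so join them.
text = u''.join(handler.data)

# This is the conversion Python 2 performs implicitly; done explicitly
# here, it fails the same way on the non-ASCII character.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Choosing the encoding explicitly at the output boundary avoids the error.
utf8_bytes = text.encode('utf-8')
```

    If the real self.data is something other than a list (a file, a
    byte-string buffer, a widget), the explicit encode would belong at the
    point where the text is handed to it.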

    Stefan
