Unicode characters, XML/RSS

Adam W.
#1


Jul 31 '08, 04:45 AM

So I wrote a little video podcast downloading script that checks a
list of RSS feeds and downloads any new videos. Every once in a while
it finds a character outside the ASCII range (code points above 127)
in a feed, and my script blows up:

Traceback (most recent call last):
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 88, in <module>
    mainloop()
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 75, in mainloop
    update()
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 69, in update
    couldhave = getshowlst(x[1],episodecnt)
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 30, in getshowlst
    masterlist = XMLWorkspace.parsexml(url)
  File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 54, in parsexml
    parse(url, FeedHandlerInst)
  File "C:\Python25\lib\xml\sax\__init__.py", line 33, in parse
    parser.parse(source)
  File "C:\Python25\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "C:\Python25\lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "C:\Python25\lib\xml\sax\expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 51, in characters
    self.data.append(string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 236: ordinal not in range(128)

Now, it's my understanding that XML can contain non-ASCII Unicode
characters as long as the encoding is specified, which it is (UTF-8).
The feed passes every validator I've run it through, and every program
I open it with seems to be fine with it, except my Python script. Why?
Here is the URL of the feed in question: http://revision3.com/winelibraryreserve/
My script is complaining about the accented e in "Mourvèdre".
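To see the error message in isolation: u'\xe8' in the traceback is "è", code point 232, which has no ASCII byte at all, while UTF-8 represents it as two bytes. The minimal sketch below uses Python 3 syntax (the thread itself is on Python 2.5, where the same ASCII encode happens implicitly when a unicode object is coerced to a byte string); the variable name is illustrative only.

```python
word = "Mourv\u00e8dre"  # the word from the feed; \u00e8 is "è"

print(ord(word[5]))          # 232 (0xe8): outside ASCII's range 0..127
print(word.encode("utf-8"))  # b'Mourv\xc3\xa8dre': "è" becomes two bytes

try:
    # Asking for ASCII explicitly fails the same way Python 2's
    # implicit unicode-to-str conversion does.
    word.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)
```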

At first glance I thought it was the data.append(string) call that
wasn't accepting the Unicode, but even if I put a return in the
characters handler, it still breaks. What am I doing wrong?
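For comparison, a SAX handler that only stores the text it receives parses a UTF-8 feed containing "è" without complaint: expat decodes the bytes and passes Unicode text to characters(). The sketch below is a minimal Python 3 illustration under assumptions: the feed snippet and the FeedHandler class are hypothetical stand-ins, not the real feed or the script's XMLWorkspace code.

```python
import xml.sax

# Hypothetical feed excerpt (not the real feed), encoded as UTF-8 bytes.
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
<item><title>Mourv\u00e8dre tasting</title></item>
</channel></rss>""".encode("utf-8")


class FeedHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.buffer = []   # text chunks for the title being read
        self.titles = []   # finished titles, kept as text, never encoded

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # expat may split one element's text across several calls
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.buffer))
            self.in_title = False


handler = FeedHandler()
xml.sax.parseString(FEED, handler)
print(handler.titles)
```

Joining the chunks in endElement also matters independently of encoding, since a single title is not guaranteed to arrive in one characters() call.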
Stefan Behnel
#2

Jul 31 '08, 06:05 AM

Re: Unicode characters, XML/RSS

Adam W. wrote:

>   File "C:\Python25\lib\xml\sax\expatreader.py", line 207, in feed
>     self._parser.Parse(data, isFinal)
>   File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 51, in characters
>     self.data.append(string)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 236: ordinal not in range(128)

You seem to be doing an implicit conversion from a unicode string to a byte
string, maybe by concatenating ('+' operator) strings of different types or by
writing it out into a file (or printing it, or ...) - I don't know what
self.data is or does, since you didn't provide any code.
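One common way to avoid such implicit conversions is to keep the parsed text as Unicode throughout and encode it explicitly, with a named codec, only where it leaves the program. A minimal Python 3 sketch, assuming a hypothetical output file for the episode list (the path and file name are illustrative); in Python 2.5 the corresponding tools would be u"...".encode("utf-8") or codecs.open(path, "w", "utf-8").

```python
import os
import tempfile

title = "Mourv\u00e8dre tasting"  # text as delivered by the SAX parser

# Hypothetical output file, created in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "episodes.txt")

# Name the encoding explicitly at the output boundary.
with open(path, "w", encoding="utf-8") as out:
    out.write(title)

with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'Mourv\xc3\xa8dre tasting'

# If the destination really must be ASCII, pick a lossy policy explicitly:
print(title.encode("ascii", "xmlcharrefreplace"))  # b'Mourv&#232;dre tasting'
print(title.encode("ascii", "replace"))            # b'Mourv?dre tasting'
```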

Stefan