Help me optimize my feed script.

**Carl Banks** · Jun 27 '08, 04:30 PM

Re: Help me optimize my feed script.

On Jun 26, 3:30 pm, bsag...@gmail.com wrote:

I wrote my own feed reader using feedparser.py but it takes about 14
seconds to process 7 feeds (on a windows box), which seems slow on my
DSL line. Does anyone see how I can optimize the script below? Thanks
in advance, Bill

    # UTF-8
    import feedparser

    rss = [
        'http://feeds.feedburner.com/typepad/alleyinsider/silicon_alley_insider',
        'http://www.techmeme.com/index.xml',
        'http://feeds.feedburner.com/slate-97504',
        'http://rss.cnn.com/rss/money_mostpopular.rss',
        'http://rss.news.yahoo.com/rss/tech',
        'http://www.aldaily.com/rss/rss.xml',
        'http://ezralevant.com/atom.xml'
    ]
    s = '<html>\n<head>\n<title>C:/x/test.htm</title>\n'

    s += '<style>\n'\
         'h3{margin:10px 0 0 0;padding:0}\n'\
         'a.x{color:black}'\
         'p{margin:5px 0 0 0;padding:0}'\
         '</style>\n'

    s += '</head>\n<body>\n<br />\n'

    for url in rss:
        d = feedparser.parse(url)
        title = d.feed.title
        link = d.feed.link
        s += '\n<h3><a href="'+ link +'" class="x">'+ title +'</a></h3>\n'
        # aldaily.com has weird feed
        if link.find('aldaily.com') != -1:
            description = d.entries[0].description
            s += description + '\n'
        for x in range(0,3):
            if link.find('aldaily.com') != -1:
                continue
            title = d.entries[x].title
            link = d.entries[x].link
            s += '<a href="'+ link +'">'+ title +'</a><br />\n'

    s += '<br /><br />\n</body>\n</html>'

    f = open('c:/scripts/myFeeds.htm', 'w')
    f.write(s)
    f.close()

    print
    print 'myFeeds.htm written'

Using the += operator to build up a string piece by piece is a common
bottleneck in programs. The first thing you should try is to get rid of
that. (Recent versions of Python have taken steps to optimize it, but
the optimization doesn't always apply, for example when more than one
reference to the string is alive.)

Instead, create a list like this:

    s = []

And append substrings to the list, like this:

    s.append('</head>\n<body>\n<br />\n')

Then, when writing the string out (or otherwise using it), join all
the substrings with the str.join method:

    f.write(''.join(s))
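
Put together, the list-and-join approach might look like the minimal sketch below. The names `build_page` and `sections` are made up for illustration and are not part of the original script; the markup is trimmed down to keep the example short.

```python
def build_page(sections):
    """Assemble an HTML page from (title, link) pairs using list + join."""
    parts = []  # collect fragments instead of growing one string with +=
    parts.append('<html>\n<head>\n<title>feeds</title>\n</head>\n<body>\n')
    for title, link in sections:
        parts.append('<h3><a href="%s" class="x">%s</a></h3>\n' % (link, title))
    parts.append('</body>\n</html>')
    return ''.join(parts)  # one concatenation at the very end


# Hypothetical input: one feed title and its link.
page = build_page([('Techmeme', 'http://www.techmeme.com/')])
```

The fragments are only copied once, when `join` runs, so the cost no longer depends on how many times the page is appended to.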

Carl Banks

**Jason Scheirer** · Jun 27 '08, 04:30 PM

Re: Help me optimize my feed script.

I can 100% guarantee you that the extended run time is network I/O
bound. Investigate using a thread pool to load the feeds in parallel.
Some code you might be able to shim in:

    # Extra imports
    import threading
    import Queue

    # Function that fetches and pushes
    def parse_and_put(url, queue_):
        parsed_feed = feedparser.parse(url)
        queue_.put(parsed_feed)

    # Set up some variables
    my_queue = Queue.Queue()
    threads = []

    # Set up a thread for fetching each URL
    for url in rss:
        url_thread = threading.Thread(target=parse_and_put, name=url,
                                      args=(url, my_queue))
        threads.append(url_thread)
        url_thread.setDaemon(False)  # Thread has setDaemon(), not setDaemonic()
        url_thread.start()

    # Wait for threads to finish
    for thread in threads:
        thread.join()

    # Push the results into a list
    feeds_list = []
    while not my_queue.empty():
        feeds_list.append(my_queue.get())

    # Do what you were doing before, replacing the "for url in rss" with
    # "for d in feeds_list"
    for d in feeds_list:
        title = d.feed.title
        link = d.feed.link
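
One thing to watch with the queue approach: results come out in whatever order the downloads finish, not in the order of the `rss` list. On Python 3 (the thread itself is Python 2), the same idea can be sketched with the standard `concurrent.futures` module, whose `map` keeps the input order. The helper name `parse_all` and the `fake_parse` stand-in are assumptions for illustration; in the real script you would presumably pass `feedparser.parse` instead.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_all(urls, parse, max_workers=8):
    """Call parse(url) for every URL on a pool of worker threads.

    Results are returned in the same order as `urls`, unlike a shared
    queue, which yields them in completion order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse, urls))


# Stand-in for feedparser.parse so the sketch runs without network access.
def fake_parse(url):
    return {'url': url, 'title': 'feed at ' + url}


feeds_list = parse_all(['http://a.example/rss', 'http://b.example/rss'],
                       fake_parse)
```

With the real parser this would be called as `parse_all(rss, feedparser.parse)`, and the existing `for d in feeds_list` loop would stay as it is.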
