Help me optimize my feed script.

  • bsagert@gmail.com

    Help me optimize my feed script.

    I wrote my own feed reader using feedparser.py, but it takes about 14
    seconds to process 7 feeds (on a Windows box), which seems slow on my
    DSL line. Can anyone see how I could optimize the script below?

    Thanks in advance,
    Bill

    # -*- coding: utf-8 -*-
    import feedparser

    rss = [
        'http://feeds.feedburner.com/typepad/alleyinsider/silicon_alley_insider',
        'http://www.techmeme.com/index.xml',
        'http://feeds.feedburner.com/slate-97504',
        'http://rss.cnn.com/rss/money_mostpopular.rss',
        'http://rss.news.yahoo.com/rss/tech',
        'http://www.aldaily.com/rss/rss.xml',
        'http://ezralevant.com/atom.xml',
    ]
    s = '<html>\n<head> \n<title>C:/x/test.htm</title>\n'

    s += ('<style>\n'
          'h3{margin:10px 0 0 0;padding:0}\n'
          'a.x{color:black}'
          'p{margin:5px 0 0 0;padding:0}'
          '</style>\n')

    s += '</head>\n<body>\n <br />\n'

    for url in rss:
        d = feedparser.parse(url)
        title = d.feed.title
        link = d.feed.link
        s += '\n<h3><a href="' + link + '" class="x">' + title + '</a></h3>\n'
        if 'aldaily.com' in link:
            # aldaily.com has a weird feed: show only the first entry's description
            s += d.entries[0].description + '\n'
        else:
            # Link to (at most) the first three entries of every other feed
            for entry in d.entries[:3]:
                s += '<a href="' + entry.link + '">' + entry.title + '</a><br />\n'

    s += '<br /><br />\n</body>\n</html>'

    # Explicit encoding so non-ASCII titles can be written on Windows;
    # the with-block closes the file when it ends
    with open('c:/scripts/myFeeds.htm', 'w', encoding='utf-8') as f:
        f.write(s)

    print()
    print('myFeeds.htm written')
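To check whether the 14 seconds go to downloading the feeds or to building the page, one could time each fetch on its own. This is a minimal sketch using only the standard library; `time_each` and its `fetch` parameter are hypothetical names, and in the script above `fetch` would be `feedparser.parse`.

```python
import time


def time_each(urls, fetch):
    """Call fetch(url) for each URL and return a list of (url, seconds) pairs.

    `fetch` is any one-argument callable, e.g. feedparser.parse (assumption).
    """
    timings = []
    for url in urls:
        start = time.perf_counter()
        fetch(url)  # only the fetch/parse call is inside the timed region
        timings.append((url, time.perf_counter() - start))
    return timings


if __name__ == '__main__':
    def fake_fetch(url):
        # Hypothetical stand-in that mimics network latency so the demo runs offline;
        # in the real script this would be time_each(rss, feedparser.parse).
        time.sleep(0.05)

    for url, seconds in time_each(['feed-a', 'feed-b'], fake_fetch):
        print('%-10s %.2f s' % (url, seconds))
```

If the per-feed times add up to most of the 14 seconds, the script is waiting on the network rather than on string handling.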
  • Carl Banks

    #2
    Re: Help me optimize my feed script.

    On Jun 26, 3:30 pm, bsag...@gmail.com wrote:
    > I wrote my own feed reader using feedparser.py but it takes about 14
    > seconds to process 7 feeds (on a windows box), which seems slow on my
    > DSL line. Does anyone see how I can optimize the script below? Thanks
    > in advance, Bill

    Using the += operator on strings is a common bottleneck in programs.
    The first thing you should try is to get rid of that. (Recent versions
    of CPython optimize it, but the optimization doesn't always apply, for
    example when more than one reference to the string is alive.)

    Instead, create a list like this:

    s = []

    And append substrings to the list, like this:

    s.append('</head>\n<body>\n <br />\n')

    Then, when writing the string out (or otherwise using it), join all
    the substrings with the str.join method:

    f.write(''.join(s))


    Carl Banks
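Applied to the whole page, the list-and-join approach might look like the sketch below. It assumes each feed has already been fetched and reduced to a `(feed_title, feed_link, entries)` tuple, where `entries` is a list of `(title, link)` pairs; that input shape and the name `build_page` are hypothetical, chosen so the example runs without feedparser or a network connection. The `<style>` block is left out for brevity, and the text is passed through `html.escape`, which the original script does not do.

```python
from html import escape


def build_page(feeds):
    """Return the HTML page as one string, built from a list with ''.join().

    `feeds` is a list of (feed_title, feed_link, entries) tuples, where
    `entries` is a list of (entry_title, entry_link) pairs (hypothetical shape).
    """
    parts = ['<html>\n<head>\n<title>C:/x/test.htm</title>\n</head>\n<body>\n<br />\n']
    for feed_title, feed_link, entries in feeds:
        parts.append('\n<h3><a href="%s" class="x">%s</a></h3>\n'
                     % (escape(feed_link), escape(feed_title)))
        for entry_title, entry_link in entries[:3]:  # at most three entries per feed
            parts.append('<a href="%s">%s</a><br />\n'
                         % (escape(entry_link), escape(entry_title)))
    parts.append('<br /><br />\n</body>\n</html>')
    # One join at the end instead of repeated s += ...
    return ''.join(parts)


if __name__ == '__main__':
    print(build_page([
        ('Example feed', 'http://example.com/',
         [('First post', 'http://example.com/1'),
          ('Second post', 'http://example.com/2')]),
    ]))
```

The list only holds references to the pieces, so each append is cheap, and `''.join` copies every character once when the final string is built.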


    • Jason Scheirer

      #3
      Re: Help me optimize my feed script.

      On Jun 26, 12:30 pm, bsag...@gmail.com wrote:
      > I wrote my own feed reader using feedparser.py but it takes about 14
      > seconds to process 7 feeds (on a windows box), which seems slow on my
      > DSL line. Does anyone see how I can optimize the script below? Thanks
      > in advance, Bill
      I can 100% guarantee you that the extended run time is network I/O
      bound. Investigate using a thread pool to load the feeds in parallel.
      Some code you might be able to shim in:

      # Extra imports
      import threading
      import queue  # this module was named Queue in Python 2

      # Function that fetches a feed and pushes the parsed result onto the queue
      def parse_and_put(url, queue_):
          parsed_feed = feedparser.parse(url)
          queue_.put(parsed_feed)

      # Set up some variables
      my_queue = queue.Queue()
      threads = []

      # Set up a thread for fetching each URL
      for url in rss:
          url_thread = threading.Thread(target=parse_and_put, name=url,
                                        args=(url, my_queue))
          threads.append(url_thread)
          url_thread.daemon = False  # non-daemon, the default for threads started from the main thread
          url_thread.start()

      # Wait for threads to finish
      for thread in threads:
          thread.join()

      # Push the results into a list (in completion order, not the order of rss)
      feeds_list = []
      while not my_queue.empty():
          feeds_list.append(my_queue.get())

      # Do what you were doing before, replacing "for url in rss" with
      # "for d in feeds_list"
      for d in feeds_list:
          title = d.feed.title
          link = d.feed.link
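The code above starts one thread per feed, which is fine for seven URLs. A bounded thread pool is also available in the standard library as `concurrent.futures.ThreadPoolExecutor` (Python 3.2 and later); that module is not used in this thread, so treat the following as an alternative sketch. `Executor.map` returns results in the same order as the input URLs, whereas the queue yields them in completion order. The name `fetch_all`, its `fetch` parameter and the `max_workers` default of 7 are assumptions; in the feed script, `fetch` would be `feedparser.parse`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=7):
    """Run fetch(url) for every URL on a pool of worker threads.

    Results come back in the same order as `urls`, regardless of which
    download finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits every call up front; list() waits for all results,
        # and an exception raised inside fetch is re-raised here.
        return list(pool.map(fetch, urls))


if __name__ == '__main__':
    def fake_fetch(url):
        # Hypothetical stand-in for feedparser.parse that sleeps to mimic latency.
        time.sleep(0.5)
        return url.upper()

    start = time.perf_counter()
    results = fetch_all(['a', 'b', 'c', 'd'], fake_fetch)
    print(results, '%.1f s' % (time.perf_counter() - start))
```

In the feed script this would replace both the thread loop and the queue: iterate with `for d in fetch_all(rss, feedparser.parse):` and keep the original loop body.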
