Any python scripts to do parallel downloading?

  • Frank Potter

    Any python scripts to do parallel downloading?

    I want to find a multithreaded downloading lib in python,
    can someone recommend one for me, please?
    Thanks~

  • Michele Simionato

    #2
    Re: Any python scripts to do parallel downloading?

    On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
    > I want to find a multithreaded downloading lib in python,
    > can someone recommend one for me, please?
    > Thanks~

    Why do you want to use threads for that? Twisted is the
    obvious solution for your problem, but you may use any
    asynchronous framework, as for instance the good ol
    Tkinter:

    """
    Example of asynchronous programming with Tkinter. Download 10 times
    the same URL.
    """

    import sys, urllib, itertools, Tkinter

    URL = 'http://docs.python.org/dev/lib/module-urllib.html'

    class Downloader(object):
        chunk = 1024

        def __init__(self, urls, frame):
            self.urls = urls
            self.downloads = [self.download(i) for i in range(len(urls))]
            self.tkvars = []
            self.tklabels = []
            for url in urls:
                var = Tkinter.StringVar(frame)
                lbl = Tkinter.Label(frame, textvar=var)
                lbl.pack()
                self.tkvars.append(var)
                self.tklabels.append(lbl)
            frame.pack()

        def download(self, i):
            src = urllib.urlopen(self.urls[i])
            size = int(src.info()['Content-Length'])
            for block in itertools.count():
                chunk = src.read(self.chunk)
                if not chunk: break
                percent = block * self.chunk * 100/size
                msg = '%s: downloaded %2d%% of %s K' % (
                    self.urls[i], percent, size/1024)
                self.tkvars[i].set(msg)
                yield None
            self.tkvars[i].set('Downloaded %s' % self.urls[i])

    if __name__ == '__main__':
        root = Tkinter.Tk()
        frame = Tkinter.Frame(root)
        downloader = Downloader([URL] * 10, frame)
        def next(cycle):
            try:
                cycle.next().next()
            except StopIteration:
                pass
            root.after(50, next, cycle)
        root.after(0, next, itertools.cycle(downloader.downloads))
        root.mainloop()
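
The scheduling idea in the example above (each download is a generator that yields after every chunk, and a loop advances them in turn) can be looked at separately from Tkinter and urllib. The following is only a minimal Python 3 sketch of that idea, not the code from the post: `fake_download` and `run_round_robin` are hypothetical names, the "download" merely records chunk numbers, and a plain loop stands in for `root.after`.

```python
def fake_download(name, n_chunks, log):
    """Generator standing in for a chunked download; yields after each chunk."""
    for block in range(n_chunks):
        log.append((name, block))   # pretend one chunk was read
        yield
    log.append((name, 'done'))

def run_round_robin(tasks):
    """Advance each generator one step in turn until all are exhausted."""
    pending = list(tasks)
    while pending:
        for task in list(pending):  # iterate over a copy so removal is safe
            try:
                next(task)
            except StopIteration:
                pending.remove(task)

log = []
run_round_robin([fake_download('a', 2, log), fake_download('b', 1, log)])
print(log)
```

Because every task gives up control after one chunk, no single download can starve the others, which is what keeps the GUI responsive in the Tkinter version.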


    Michele Simionato


    • Carl J. Van Arsdall

      #3
      Re: Any python scripts to do parallel downloading?

      Michele Simionato wrote:
      > On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
      >> I want to find a multithreaded downloading lib in python,
      >> can someone recommend one for me, please?
      >> Thanks~
      >
      > Why do you want to use threads for that? Twisted is the
      > obvious solution for your problem, but you may use any
      > asynchronous framework, as for instance the good ol [...]

      Well, since it will be io based, why not use threads? They are easy to
      use and it would do the job just fine. Then leverage some other
      technology on top of that.

      You could go as far as using wget via os.system() in a thread, if the
      app is simple enough.

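      import os
      import threading

      # websiteList is assumed to be a list of URLs defined elsewhere
      def getSite(site):
          # os.system() takes a single command string, so interpolate the URL
          os.system('wget %s' % site)

      threadList = []
      for site in websiteList:
          threadList.append(threading.Thread(target=getSite, args=(site,)))

      for thread in threadList:
          thread.start()

      for thread in threadList:
          thread.join()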

      --

      Carl J. Van Arsdall
      cvanarsdall@mvista.com
      Build and Release
      MontaVista Software


      • Carl Banks

        #4
        Re: Any python scripts to do parallel downloading?

        Michele Simionato wrote:
        > On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
        >> I want to find a multithreaded downloading lib in python,
        >> can someone recommend one for me, please?
        >> Thanks~
        >
        > Why do you want to use threads for that? Twisted is the
        > obvious solution for your problem,

        Overkill? Just to download a few web pages? You've got to be
        kidding.

        > but you may use any
        > asynchronous framework, as for instance the good ol
        > Tkinter:

        Well, of all the things you can use threads for, this is probably the
        simplest, so I don't see any reason to prefer an asynchronous method
        unless you're used to it. One Queue for dispatching should be enough
        to synchronize everything; maybe a Queue or a simple lock at the end
        as well, depending on the need.
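
One possible reading of the "one Queue for dispatching" suggestion, written as a Python 3 sketch rather than anything posted in the thread: worker threads pull URLs from a single `queue.Queue`, and a lock protects the shared results. The names `download_all` and `fetch` are hypothetical; `fetch` is passed in so that the demonstration runs without network access, and in real use it might wrap `urllib.request.urlopen(url).read()`.

```python
import queue
import threading

def download_all(urls, fetch, n_workers=4):
    """Fetch every URL using n_workers threads fed from one Queue."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()           # guards the shared results dict

    def worker():
        while True:
            url = jobs.get()
            if url is None:           # sentinel: no more work for this worker
                break
            try:
                body = fetch(url)
            except Exception as exc:  # record the failure, keep the thread alive
                body = exc
            with lock:
                results[url] = body

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for url in urls:
        jobs.put(url)
    for _ in threads:
        jobs.put(None)                # one sentinel per worker
    for t in threads:
        t.join()
    return results

# Demonstration with a stand-in fetch function (no network needed).
pages = download_all(['u1', 'u2', 'u3'], fetch=lambda url: url.upper())
print(pages)
```

The fixed number of workers also caps how many connections are open at once, which a thread-per-URL design does not.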

        The OP might not even care whether it's threaded or asynchronous.


        Carl Banks


        • Michele Simionato

          #5
          Re: Any python scripts to do parallel downloading?

          On Jan 31, 9:24 pm, "Carl Banks" <pavlovevide...@gmail.com> wrote:
          > Well, of all the things you can use threads for, this is probably the
          > simplest, so I don't see any reason to prefer an asynchronous method
          > unless you're used to it.

          Well, actually there is a reason why I prefer the asynchronous
          approach even for the simplest things: I can stop my program at any
          time with CTRL-C. When developing a threaded program, either I
          implement a mechanism for stopping the threads (which should be safe
          enough to survive the bugs introduced while I develop, BTW), or I
          have to resort to kill -9, and I *hate* that, especially since
          kill -9 does not honor try .. finally statements.

          In short, I prefer to avoid threads, *especially* for the simplest
          things. I use threads only when I am forced to, typically when I am
          using a multithreaded framework interacting with a database.
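
For readers wondering what "a mechanism for stopping the threads" could look like, here is a small Python 3 sketch under stated assumptions; it is not from the post. The names `worker` and `run_until_interrupted` are hypothetical, the work is simulated with `time.sleep`, and `run_for` exists only so the demonstration ends on its own. It relies on workers checking a shared `threading.Event` between small units of work, so a worker blocked in one long call would not notice the flag until that call returns.

```python
import threading
import time

def worker(stop_event, progress):
    """Do small units of work, checking the stop flag between units."""
    while not stop_event.is_set():
        progress.append(1)            # stand-in for reading one chunk
        time.sleep(0.01)

def run_until_interrupted(n_threads=3, run_for=0.1):
    stop_event = threading.Event()
    progress = []
    threads = [threading.Thread(target=worker, args=(stop_event, progress))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    try:
        # CTRL-C raises KeyboardInterrupt in the main thread while it waits
        # here; run_for only bounds this demonstration.
        time.sleep(run_for)
    except KeyboardInterrupt:
        pass
    finally:
        stop_event.set()              # ask every worker to finish
        for t in threads:
            t.join()
    return progress, threads

progress, threads = run_until_interrupted()
print(len(progress), 'units of work done before stopping')
```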

          Michele Simionato


          • Michele Simionato

            #6
            Re: Any python scripts to do parallel downloading?

            On Jan 31, 8:31 pm, "Carl J. Van Arsdall" <cvanarsd...@mvista.com>
            wrote:
            > Well, since it will be io based, why not use threads? They are easy to
            > use and it would do the job just fine. Then leverage some other
            > technology on top of that.
            >
            > You could go as far as using wget via os.system() in a thread, if the
            > app is simple enough.

            Calling os.system in a thread looks really perverse to me: you would
            lose CTRL-C without any benefit.
            Why not use subprocess.Popen instead?
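
A rough Python 3 sketch of the `subprocess.Popen` idea, offered as an illustration and not as the poster's code: `Popen` returns immediately, so all child processes run in parallel without any threads, and the main program stays interruptible. `run_in_parallel` is a hypothetical helper name, and the demonstration launches trivial Python child processes instead of `wget` so that it runs on any machine.

```python
import subprocess
import sys

def run_in_parallel(commands):
    """Start every command at once, then wait for all; returns exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        return [p.wait() for p in procs]
    except KeyboardInterrupt:
        for p in procs:
            if p.poll() is None:      # still running
                p.terminate()
        raise

# Real use might be: [['wget', url] for url in urls]  (assumes wget is installed).
demo = [[sys.executable, '-c', 'import sys; sys.exit(%d)' % code]
        for code in (0, 0, 3)]
exit_codes = run_in_parallel(demo)
print(exit_codes)
```

Passing each command as a list of arguments also avoids the shell, so a URL containing characters such as `&` or `;` is not interpreted as shell syntax.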

            I am unhappy with the current situation in Python. Whereas for most
            things Python is such that the simplest things look simple, this is
            not the case for threads. Unfortunately we have a threading module in
            the standard library, but not a "Twisted for pedestrians" module, so
            people overlook the simplest solution in favor of the complex one.

            Another thing I miss is a facility to run an iterator in the Tkinter
            mainloop: since Tkinter is not thread-safe, writing a
            multiple-download progress bar in Tkinter using threads is definitely
            less obvious than running an iterator in the main loop, as I
            discovered the hard way. Writing a facility to run iterators in
            Twisted is a three-liner, but it is not already there, nor standard :-(
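
To make the wish for "a facility to run an iterator in the Tkinter mainloop" concrete, here is one way such a helper might look in Python 3. It is a sketch under assumptions: `run_iterator` is a hypothetical name, it uses only the standard `after(ms, callback)` method of Tk widgets, and `FakeLoop` is a stand-in so the demonstration runs without a display. With real Tkinter one would pass a `tkinter.Tk()` instance and call its `mainloop()`.

```python
def run_iterator(widget, iterator, delay_ms=50):
    """Advance iterator one step per event-loop tick via widget.after()."""
    def step():
        try:
            next(iterator)
        except StopIteration:
            return                    # finished: do not reschedule
        widget.after(delay_ms, step)
    widget.after(0, step)

class FakeLoop:
    """Stand-in for a Tk widget so the demo runs without a display."""
    def __init__(self):
        self.pending = []
    def after(self, ms, callback, *args):
        self.pending.append((callback, args))   # delay is ignored in the fake
    def mainloop(self):
        while self.pending:
            callback, args = self.pending.pop(0)
            callback(*args)

def count_to(n, seen):
    for i in range(n):
        seen.append(i)
        yield

seen = []
loop = FakeLoop()
run_iterator(loop, count_to(3, seen))
loop.mainloop()
print(seen)
```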

            Michele Simionato


            • Michele Simionato

              #7
              Re: Any python scripts to do parallel downloading?

              On Feb 1, 1:43 pm, Jean-Paul Calderone <exar...@divmod.com> wrote:
              > On 31 Jan 2007 22:02:36 -0800, Michele Simionato <michele.simion...@gmail.com> wrote:
              >> Another thing I miss is a facility to run an iterator in the Tkinter
              >> mainloop: since Tkinter is not thread-safe, writing a
              >> multiple-download progress bar in Tkinter using threads is definitely
              >> less obvious than running an iterator in the main loop, as I
              >> discovered the hard way. Writing a facility to run iterators in
              >> Twisted is a three-liner, but it is not already there, nor standard :-(
              >
              > Have you seen the recently introduced twisted.internet.task.coiterate()?
              > It sounds like it might be what you're after.

              Oops! There is a misprint here: I meant "writing a facility to run
              iterators in TKINTER", not in Twisted. Twisted already has everything,
              even too much. I would like to have better support for asynchronous
              programming in the standard library, for people not needing the full
              power of Twisted. I also like to keep my dependencies at a minimum.

              Michele Simionato


              • Carl Banks

                #8
                Re: Any python scripts to do parallel downloading?

                On Jan 31, 3:37 pm, Jean-Paul Calderone <exar...@divmod.com> wrote:
                > On 31 Jan 2007 12:24:21 -0800, Carl Banks <pavlovevide...@gmail.com> wrote:
                >> Michele Simionato wrote:
                >>> On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
                >>>> I want to find a multithreaded downloading lib in python,
                >>>> can someone recommend one for me, please?
                >>>> Thanks~
                >>>
                >>> Why do you want to use threads for that? Twisted is the
                >>> obvious solution for your problem,
                >>
                >> Overkill? Just to download a few web pages? You've got to be
                >> kidding.
                >
                > Better "overkill" (whatever that is) than wasting time re-implementing
                > the same boring thing over and over for no reason.

                "I need to download some web pages in parallel."

                "Here's a tremendously large and complex framework. Download, install,
                and learn this large and complex framework. Then you can write your
                very simple throwaway script with ease."

                Is the twisted solution even shorter? Doing this with threads I'm
                thinking would be on the order of 20 lines of code.
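
As a point of comparison for the "order of 20 lines" estimate, here is a short threaded sketch in Python 3. It is not what the poster had in mind: it uses `concurrent.futures.ThreadPoolExecutor`, a standard-library module added to Python years after this thread was written. The names `fetch` and `fetch_all` are hypothetical, and the demonstration substitutes `len` for the real fetch function so that it runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url, timeout=10):
    """Return the body of url as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()

def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Download all urls concurrently; results keep the order of urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, urls))

# Offline demonstration with a stand-in for fetch:
sizes = fetch_all(['a', 'bb', 'ccc'], fetch_one=len)
print(sizes)
```

In real use one would call `fetch_all(list_of_urls)` and handle the exceptions that `urlopen` can raise for unreachable hosts or HTTP errors.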


                Carl Banks


                • Carl Banks

                  #9
                  Re: Any python scripts to do parallel downloading?

                  On Feb 1, 9:20 am, Jean-Paul Calderone <exar...@divmod.com> wrote:
                  > On 1 Feb 2007 06:14:40 -0800, Carl Banks <pavlovevide...@gmail.com> wrote:
                  >> On Jan 31, 3:37 pm, Jean-Paul Calderone <exar...@divmod.com> wrote:
                  >>> On 31 Jan 2007 12:24:21 -0800, Carl Banks <pavlovevide...@gmail.com> wrote:
                  >>>> Michele Simionato wrote:
                  >>>>> On Jan 31, 5:23 pm, "Frank Potter" <could....@gmail.com> wrote:
                  >>>>>> I want to find a multithreaded downloading lib in python,
                  >>>>>> can someone recommend one for me, please?
                  >>>>>> Thanks~
                  >>>>>
                  >>>>> Why do you want to use threads for that? Twisted is the
                  >>>>> obvious solution for your problem,
                  >>>>
                  >>>> Overkill? Just to download a few web pages? You've got to be
                  >>>> kidding.
                  >>>
                  >>> Better "overkill" (whatever that is) than wasting time re-implementing
                  >>> the same boring thing over and over for no reason.
                  >>
                  >> "I need to download some web pages in parallel."
                  >>
                  >> "Here's tremendously large and complex framework. Download, install,
                  >> and learn this large and complex framework. Then you can write your
                  >> very simple throwaway script with ease."
                  >>
                  >> Is the twisted solution even shorter? Doing this with threads I'm
                  >> thinking would be on the order of 20 lines of code.
                  >
                  > The /already written/ solution I linked to in my original response was five
                  > lines shorter than that.

                  And I suppose "re-implementing the same boring thing over and over" is
                  ok if it's 15 lines but is too much to bear if it's 20 (irrespective
                  of the additional large framework the former requires).


                  Carl Banks


                  • Carl Banks

                    #10
                    Re: Any python scripts to do parallel downloading?

                    On Feb 1, 12:40 am, "Michele Simionato" <michele.simion...@gmail.com>
                    wrote:
                    > On Jan 31, 9:24 pm, "Carl Banks" <pavlovevide...@gmail.com> wrote:
                    >> Well, of all the things you can use threads for, this is probably the
                    >> simplest, so I don't see any reason to prefer an asynchronous method
                    >> unless you're used to it.
                    >
                    > Well, actually there is a reason why I prefer the asynchronous
                    > approach even for the simplest things: I can stop my program at any
                    > time with CTRL-C. When developing a threaded program, either I
                    > implement a mechanism for stopping the threads (which should be safe
                    > enough to survive the bugs introduced while I develop, BTW), or I
                    > have to resort to kill -9, and I *hate* that, especially since
                    > kill -9 does not honor try .. finally statements.
                    >
                    > In short, I prefer to avoid threads, *especially* for the simplest
                    > things. I use threads only when I am forced to, typically when I am
                    > using a multithreaded framework interacting with a database.

                    Fair enough.

                    I'm just saying that just because something is good for funded,
                    important, enterprise tasks, it doesn't mean very simple stuff
                    automatically has to use it as well. For Pete's sake, even Perl works
                    for simple scripts.


                    Carl Banks
