multiprocessing eats memory

  • Max Ivanov

    multiprocessing eats memory

    I'm playing with the pyprocessing module and found that it eats lots of
    memory. I've made a small test case to show it. I pass ~45 MB of data to
    the worker processes and then get it back slightly modified. At any time
    in the main process there shouldn't be more than two copies of the data
    (the original and one result). I run it on an 8-core server and top
    shows me that the main process eats ~220 MB and the worker processes eat
    90-150 MB. Isn't that too much?

    A small test case is uploaded to pastebin: http://pastebin.ca/1210523
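    The pastebin link no longer resolves; below is a hypothetical
    reconstruction (all names and sizes are mine, not the original test
    case) of the pattern the post describes: data split into chunks, handed
    to `apply_async`, with every `AsyncResult` kept alive until all results
    are collected.

    ```python
    from multiprocessing import Pool

    def work(chunk):
        # Return the data slightly modified, as in the description above.
        return [x + 1 for x in chunk]

    if __name__ == "__main__":
        data = list(range(8000))              # stand-in for the ~45 MB payload
        n = 8
        size = len(data) // n
        chunks = [data[i * size:(i + 1) * size] for i in range(n)]

        with Pool(n) as pool:
            # Keep every AsyncResult alive in a list, as the test case did.
            asyncs = [pool.apply_async(work, (c,)) for c in chunks]
            results = [a.get() for a in asyncs]  # each .get() returns a full copy
    ```

    Each chunk is pickled into a worker and the modified copy is pickled
    back, so the parent ends up holding the original data, the pending
    pickled copies, and the results at the same time.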
  • Istvan Albert

    #2
    Re: multiprocessing eats memory

    On Sep 25, 8:40 am, "Max Ivanov" <ivanov.ma...@gmail.com> wrote:
    > At any time in the main process there shouldn't be more than two
    > copies of the data (the original and one result).
    From the looks of it you are storing a lot of references to various
    copies of your data via the asyncs set.


    • redbaron

      #3
      Re: multiprocessing eats memory

      On Sep 26, 04:20, Istvan Albert <istvan.alb...@gmail.com> wrote:
      > On Sep 25, 8:40 am, "Max Ivanov" <ivanov.ma...@gmail.com> wrote:
      > > At any time in the main process there shouldn't be more than two
      > > copies of the data (the original and one result).
      > From the looks of it you are storing a lot of references to various
      > copies of your data via the asyncs set.
      How could I avoid storing them? I need something to check whether each
      result is ready or not and retrieve it if it is. I couldn't see a way
      to achieve the same result without storing the asyncs set.


      • MRAB

        #4
        Re: multiprocessing eats memory

        On Sep 26, 9:52 am, redbaron <ivanov.ma...@gmail.com> wrote:
        > On Sep 26, 04:20, Istvan Albert <istvan.alb...@gmail.com> wrote:
        > > On Sep 25, 8:40 am, "Max Ivanov" <ivanov.ma...@gmail.com> wrote:
        > > > At any time in the main process there shouldn't be more than
        > > > two copies of the data (the original and one result).
        > > From the looks of it you are storing a lot of references to
        > > various copies of your data via the asyncs set.
        > How could I avoid storing them? I need something to check whether
        > each result is ready or not and retrieve it if it is. I couldn't
        > see a way to achieve the same result without storing the asyncs
        > set.
        You could give each worker process an ID and then have them put the
        ID into a queue to signal to the main process when finished.

        BTW, your test case modifies the asyncs set while iterating over
        it, which is a bad idea.
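        A minimal sketch of this suggestion (the worker function and queue
        names are mine, not from the thread): each worker puts its ID,
        together with its result, on a shared `Queue`, so the main process
        never stores `AsyncResult` objects at all.

        ```python
        from multiprocessing import Process, Queue

        def worker(wid, chunk, out):
            result = [x * 2 for x in chunk]   # placeholder for the real computation
            out.put((wid, result))            # signals "done" and delivers the result

        if __name__ == "__main__":
            out = Queue()
            chunks = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
            procs = [Process(target=worker, args=(wid, c, out))
                     for wid, c in chunks.items()]
            for p in procs:
                p.start()

            results = {}
            for _ in procs:                   # one queue item per worker
                wid, result = out.get()       # blocks until some worker finishes
                results[wid] = result

            for p in procs:                   # drain the queue before joining
                p.join()
        ```

        Note that the queue is fully drained before `join()` is called; a
        process that still has items buffered in a `Queue` will not
        terminate cleanly otherwise.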


        • redbaron

          #5
          Re: multiprocessing eats memory

          On Sep 26, 17:03, MRAB <goo...@mrabarnett.plus.com> wrote:
          > On Sep 26, 9:52 am, redbaron <ivanov.ma...@gmail.com> wrote:
          > > On Sep 26, 04:20, Istvan Albert <istvan.alb...@gmail.com> wrote:
          > > > On Sep 25, 8:40 am, "Max Ivanov" <ivanov.ma...@gmail.com> wrote:
          > > > > At any time in the main process there shouldn't be more
          > > > > than two copies of the data (the original and one result).
          > > > From the looks of it you are storing a lot of references to
          > > > various copies of your data via the asyncs set.
          > > How could I avoid storing them? I need something to check
          > > whether each result is ready or not and retrieve it if it is.
          > > I couldn't see a way to achieve the same result without
          > > storing the asyncs set.
          > You could give each worker process an ID and then have them put
          > the ID into a queue to signal to the main process when finished.
          And how could I retrieve the result from the worker process
          without an async?
          > BTW, your test case modifies the asyncs set while iterating
          > over it, which is a bad idea.
          My fault, there was list(asyncs) there originally.


          • Istvan Albert

            #6
            Re: multiprocessing eats memory

            On Sep 26, 4:52 am, redbaron <ivanov.ma...@gmail.com> wrote:
            > How could I avoid storing them? I need something to check
            > whether each result is ready or not and retrieve it if it is.
            > I couldn't see a way to achieve the same result without
            > storing the asyncs set.
            It all depends on what you are trying to do. The issue that you
            originally brought up is that of memory consumption.

            When processing data in parallel you will use up as much memory
            as the number of datasets you are processing at any given time.
            If you need to reduce memory use then you need to start fewer
            processes and use some mechanism to distribute the work to them
            as they become free (see the recommendation that uses queues).
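            One way to sketch this with the stock module (my example, not
            from the thread): `Pool.imap_unordered` streams results back one
            at a time as workers finish, so the parent never needs to hold
            an asyncs set or the full result list while waiting, and a
            small pool keeps fewer datasets being processed at once. For
            very large inputs you would also want to feed tasks in
            gradually, since the pool consumes the input iterable eagerly.

            ```python
            from multiprocessing import Pool

            def work(chunk):
                return sum(chunk)                  # placeholder computation

            def chunks():
                # Generate datasets lazily instead of building them all up front.
                for i in range(10):
                    yield list(range(i * 100, (i + 1) * 100))

            if __name__ == "__main__":
                with Pool(2) as pool:              # fewer processes, less memory
                    # Results arrive as workers finish, in completion order.
                    totals = sorted(pool.imap_unordered(work, chunks()))
            ```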


            • redbaron

              #7
              Re: multiprocessing eats memory

              > When processing data in parallel you will use up as much
              > memory as the number of datasets you are processing at any
              > given time.
              Worker processes eat 2-4 times more than I pass to them.

              > If you need to reduce memory use then you need to start
              > fewer processes and use some mechanism to distribute the
              > work to them as they become free (see the recommendation
              > that uses queues).
              I don't understand how I could use a Queue here. If a worker
              process finishes computing, it puts its ID into the Queue,
              and in the main process I retrieve that ID; but how could I
              then retrieve the result from the worker process?
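              To make the queue recommendation concrete, here is a hedged
              sketch (all names are mine, not from the thread) using two
              queues, one for tasks and one for results: workers pull work
              as they become free and push `(id, result)` pairs back, so
              the result itself travels through the queue and no
              `AsyncResult` objects are ever stored.

              ```python
              from multiprocessing import Process, Queue

              def worker(tasks, results):
                  while True:
                      item = tasks.get()
                      if item is None:             # sentinel: no more work
                          break
                      tid, chunk = item
                      results.put((tid, sum(chunk)))

              if __name__ == "__main__":
                  tasks, results = Queue(), Queue()
                  procs = [Process(target=worker, args=(tasks, results))
                           for _ in range(2)]
                  for p in procs:
                      p.start()

                  njobs = 5
                  for tid in range(njobs):
                      tasks.put((tid, list(range(tid, tid + 3))))
                  for _ in procs:
                      tasks.put(None)              # one sentinel per worker

                  # Drain all results before joining the workers.
                  out = dict(results.get() for _ in range(njobs))
                  for p in procs:
                      p.join()
              ```

              Only the chunks still sitting in the task queue plus the
              finished results are alive at any moment, which bounds the
              parent's memory use.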
