Looping-related Memory Leak

  • Tom Davis

    Looping-related Memory Leak

    I am having a problem where a long-running function will cause a
    memory leak / balloon for reasons I cannot figure out. Essentially, I
    loop through a directory of pickled files, load them, and run some
    other functions on them. In every case, each function uses only local
    variables and I even made sure to use `del` on each variable at the
    end of the loop. However, as the loop progresses the amount of memory
    used steadily increases.

    I had a related problem before where I would loop through a very large
    data-set of files and cache objects that were used to parse or
    otherwise operate on different files in the data-set. Once again,
    only local variables were used in the cached object's methods. After
    a while it got to the point where simply running these methods on the
    data took so long that I had to terminate the process (think, first
    iteration .01sec, 1000th iteration 10sec). The solution I found was
    to cause the cached objects to become "stale" after a certain number
    of uses and be deleted and re-instantiated.

    However, in the current case, there is no caching being done at all.
    Only local variables are involved. It would seem that over time
    objects take up more memory even when there are no attributes being
    added to them or altered. Has anyone experienced similar anomalies?
    Is this behavior to be expected for some other reason? If not, is
    there a common fix for it, i.e. manual GC or something?




  • Carl Banks

    #2
    Re: Looping-related Memory Leak

    On Jun 26, 5:19 am, Tom Davis <binju...@gmail.com> wrote:
    > I am having a problem where a long-running function will cause a
    > memory leak / balloon for reasons I cannot figure out. Essentially, I
    > loop through a directory of pickled files, load them, and run some
    > other functions on them. In every case, each function uses only local
    > variables and I even made sure to use `del` on each variable at the
    > end of the loop. However, as the loop progresses the amount of memory
    > used steadily increases.

    Do you happen to be using a single Unpickler instance? If so, change
    it to use a different instance each time. (If you just use the
    module-level load function, you are already using a different instance
    each time.)

    An Unpickler holds a reference to everything it has seen, which prevents
    the objects it unpickles from being garbage collected until the Unpickler
    itself is collected.


    Carl Banks
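
A minimal sketch of the "different instance each time" advice, assuming the pickles sit in one directory and end in `.pkl`. The function name `process_directory` and the `handle` callback are hypothetical, not taken from the original poster's code. Each `pickle.load` call builds its own Unpickler, so nothing is carried over from one file to the next.

```python
import pickle
from pathlib import Path


def process_directory(directory, handle):
    """Load every *.pkl file in `directory` and pass each object to `handle`."""
    results = []
    for path in sorted(Path(directory).glob("*.pkl")):
        with path.open("rb") as fh:
            # The module-level load creates a fresh Unpickler per call,
            # so no Unpickler state survives between files.
            obj = pickle.load(fh)
        # Store only what `handle` returns, never the unpickled object itself.
        results.append(handle(obj))
    return results
```

If `handle` returns something that still points at the unpickled data, the list `results` keeps that data alive, which is one of the pitfalls raised later in the thread.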


    • Peter Otten

      #3
      Re: Looping-related Memory Leak

      Tom Davis wrote:
      > I am having a problem where a long-running function will cause a
      > memory leak / balloon for reasons I cannot figure out. Essentially, I
      > loop through a directory of pickled files, load them, and run some
      > other functions on them. In every case, each function uses only local
      > variables and I even made sure to use `del` on each variable at the
      > end of the loop. However, as the loop progresses the amount of memory
      > used steadily increases.
      >
      > I had a related problem before where I would loop through a very large
      > data-set of files and cache objects that were used to parse or
      > otherwise operate on different files in the data-set. Once again,
      > only local variables were used in the cached object's methods. After
      > a while it got to the point where simply running these methods on the
      > data took so long that I had to terminate the process (think, first
      > iteration .01sec, 1000th iteration 10sec). The solution I found was
      > to cause the cached objects to become "stale" after a certain number
      > of uses and be deleted and re-instantiated.

      Here the alleged "memory leak" is clearly the cache, and the slowdown is
      caused by the garbage collector. The solution is to turn it off with
      gc.disable() during phases where your program allocates huge amounts of
      objects with the intent of keeping them for a longer time.

      > However, in the current case, there is no caching being done at all.
      > Only local variables are involved. It would seem that over time
      > objects take up more memory even when there are no attributes being
      > added to them or altered. Has anyone experienced similar anomalies?
      > Is this behavior to be expected for some other reason? If not, is
      > there a common fix for it, i.e. manual GC or something?

      Unless you post a script demonstrating the leak, I will assume you are
      overlooking a reference that keeps your data alive -- whether it's a true
      global or within a long-running function doesn't really matter.

      Peter
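
The `gc.disable()` suggestion could be sketched roughly as follows. The helper names `gc_paused` and `build_cache` are made up for illustration. Note that `gc.disable()` only switches off the cyclic collector. Reference counting keeps working, so objects that are not part of a cycle are still freed immediately.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily switch off the cyclic garbage collector."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state even if the body raised.
        if was_enabled:
            gc.enable()


def build_cache(n):
    """Allocate many long-lived objects without triggering collector passes."""
    with gc_paused():
        return [{"id": i, "payload": [i]} for i in range(n)]


cache = build_cache(1000)
```

The context manager restores the collector afterwards, so cycles created later in the program can still be reclaimed.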


      • Tom Davis

        #4
        Re: Looping-related Memory Leak

        On Jun 26, 5:38 am, Carl Banks <pavlovevide...@gmail.com> wrote:
        > Do you happen to be using a single Unpickler instance? If so, change
        > it to use a different instance each time. (If you just use the
        > module-level load function you are already using a different instance
        > each time.)
        >
        > Unpicklers hold a reference to everything they've seen, which prevents
        > objects it unpickles from being garbage collected until it is
        > collected itself.
        >
        > Carl Banks

        Carl,

        Yes, I was using the module-level unpickler. I changed it with little
        effect. I guess perhaps this is my misunderstanding of how GC works.
        For instance, if I have `a = Obj()` and run `a.some_method()` which
        generates a highly-nested local variable that cannot be easily garbage
        collected, it was my assumption that either (1) completing the method
        call or (2) deleting the object instance itself would automatically
        destroy any variables used by said method. This does not appear to be
        the case, however. Even when a variable/object's scope is destroyed,
        it would seem that variables/objects created within that scope cannot
        always be reclaimed, depending on their complexity.

        To me, this seems illogical. I can understand that the GC is
        reluctant to reclaim objects that have many connections to other
        objects and so forth, but once those objects' scopes are gone, why
        doesn't it force a reclaim? For instance, I can use timeit to create
        an object instance, run a method of it, then `del` the variable used
        to store the instance, but each loop thereafter continues to require
        more memory and take more time. 1000 runs may take .27 usec/pass
        whereas 100000 takes 2 usec/pass (Average).


        • Marc 'BlackJack' Rintsch

          #5
          Re: Looping-related Memory Leak

          On Mon, 30 Jun 2008 10:55:00 -0700, Tom Davis wrote:
          > To me, this seems illogical. I can understand that the GC is
          > reluctant to reclaim objects that have many connections to other
          > objects and so forth, but once those objects' scopes are gone, why
          > doesn't it force a reclaim? For instance, I can use timeit to create
          > an object instance, run a method of it, then `del` the variable used
          > to store the instance, but each loop thereafter continues to require
          > more memory and take more time. 1000 runs may take .27 usec/pass
          > whereas 100000 takes 2 usec/pass (Average).

          `del` just removes the name and one reference to that object. Objects are
          only deleted when there's no reference to them anymore. Your example
          sounds like you keep references to objects somehow that are accumulating,
          maybe by accident. Any class-level bound mutables or mutable default
          values in functions in that source code? That would be my first guess.

          Ciao,
          Marc 'BlackJack' Rintsch
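
A small sketch of the point about `del` and lingering references, under CPython's reference counting. The names `Doc`, `remember` and `forget` are hypothetical. A weak reference is used as a probe because it does not keep its target alive.

```python
import weakref


class Doc:
    """Hypothetical stand-in for a per-file object."""


def remember(doc, seen=[]):
    # Mutable default: the same list is shared by every call.
    seen.append(doc)
    return len(seen)


def forget(doc, seen=None):
    # A fresh list per call, so nothing outlives the call.
    seen = [] if seen is None else seen
    seen.append(doc)
    return len(seen)


a = Doc()
probe_a = weakref.ref(a)
forget(a)
del a  # last reference removed, so CPython frees the object right away
freed_after_forget = probe_a() is None

b = Doc()
probe_b = weakref.ref(b)
remember(b)
del b  # the default list still holds a reference, so the object survives
kept_after_remember = probe_b() is not None
```

In the second case `del b` removes only the name `b`. The object stays reachable through the default argument list of `remember`, which is the kind of accidental accumulation described above.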


          • Carl Banks

            #6
            Re: Looping-related Memory Leak

            On Jun 30, 1:55 pm, Tom Davis <binju...@gmail.com> wrote:
            > Carl,
            >
            > Yes, I was using the module-level unpickler. I changed it with little
            > effect. I guess perhaps this is my misunderstanding of how GC works.
            > For instance, if I have `a = Obj()` and run `a.some_method()` which
            > generates a highly-nested local variable that cannot be easily garbage
            > collected, it was my assumption that either (1) completing the method
            > call or (2) deleting the object instance itself would automatically
            > destroy any variables used by said method. This does not appear to be
            > the case, however. Even when a variable/object's scope is destroyed,
            > it would seem that variables/objects created within that scope cannot
            > always be reclaimed, depending on their complexity.
            >
            > To me, this seems illogical. I can understand that the GC is
            > reluctant to reclaim objects that have many connections to other
            > objects and so forth, but once those objects' scopes are gone, why
            > doesn't it force a reclaim?

            Are your objects involved in circular references, and do you have any
            objects with a __del__ method? Normally objects are reclaimed when
            the reference count goes to zero, but if there are cycles then the
            reference count never reaches zero, and they remain alive until the
            generational garbage collector makes a pass to break the cycle.
            However, the generational collector doesn't break cycles that involve
            objects with a __del__ method.
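
The difference between reference counting and the cycle collector can be seen in a short sketch. The `Node` class and `make_cycle` helper are hypothetical. The restriction on `__del__` applies to the Python versions current when this thread was written. From Python 3.4 onward (PEP 442), cycles containing objects with `__del__` can also be collected.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a
    return weakref.ref(a)    # a probe that does not keep `a` alive


gc.disable()  # no automatic collector passes, keeps the demo deterministic
try:
    probe = make_cycle()
    # Both local names are gone, yet the cycle keeps the pair alive.
    survived_refcounting = probe() is not None
    gc.collect()  # an explicit pass finds and frees the unreachable cycle
    freed_by_collector = probe() is None
finally:
    gc.enable()
```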

            Are you calling any C extensions that might be failing to decref an
            object? There could be a memory leak.

            Are you keeping a reference around somewhere? For example, you might
            be appending results to a list, where each result keeps a reference
            to all of your unpickled data for some reason.


            You know, we can throw out all these scenarios, but these suggestions
            are just common pitfalls. If it doesn't look like one of these
            things, you're going to have to do your own legwork to help isolate
            what's causing the behavior. Then if needed you can come back to us
            with more detailed information.

            Start with your original function, and slowly remove functionality
            from it until the bad behavior goes away. That will give you a clue
            what's causing it.


            Carl Banks
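
One way to make the "remove functionality until the bad behavior goes away" approach measurable is to record memory after every iteration. The sketch below uses `tracemalloc`, which ships with Python 3.4 and later and so was not available when this thread was written. The helper `growth_per_iteration` and the two example steps are hypothetical.

```python
import tracemalloc


def growth_per_iteration(step, iterations=5):
    """Run `step(i)` repeatedly and return traced memory in bytes after each run."""
    tracemalloc.start()
    try:
        usage = []
        for i in range(iterations):
            step(i)
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
        return usage
    finally:
        tracemalloc.stop()


leaked = []


def leaky_step(i):
    leaked.append(bytearray(100_000))  # deliberately keeps a reference


def clean_step(i):
    bytearray(100_000)  # temporary, freed as soon as the statement finishes
```

Passing the real per-file function as `step`, then again with one piece of its body removed, shows which piece is responsible when the numbers stop climbing.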


            • Tom Davis

              #7
              Re: Looping-related Memory Leak

              On Jun 30, 3:12 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
              > `del` just removes the name and one reference to that object. Objects are
              > only deleted when there's no reference to them anymore. Your example
              > sounds like you keep references to objects somehow that are accumulating.
              > Maybe by accident. Any class level bound mutables or mutable default
              > values in functions in that source code? Would be my first guess.
              >
              > Ciao,
              > Marc 'BlackJack' Rintsch

              Marc,

              Thanks for the tips. A quick confirmation:

              I took "class level bound mutables" to mean something like:

                  class A(object):
                      SOME_MUTABLE = [1, 2]
                      ...

              And "mutable default values" to mean:

                  ...
                  def a(self, arg=[1, 2]):
                      ...

              If this is correct, I have none of these. I understand your point
              about the references, but in my `timeit` example the statement is as
              simple as this:

              import MyClass
              a = MyClass()
              del a

              So, yes, it would seem that object references are piling up and not
              being removed. This is entirely by accident. Is there some kind of
              list somewhere that says "If your class has any of these attributes
              (mutable defaults, class-level mutables, etc.) it may not be properly
              dereferenced"? My obvious hack around this is to process only X files
              per run and set up a cron job to run the script over and over until
              all the files have been processed, but I'd much prefer to make the
              code run as intended.

              I ran a test overnight and found that at first a few documents were
              handled per second, but by the time I woke up it had slowed down so
              much that it took over an hour to process a single document! RAM
              usage went from 20 MB at the start to over 300 MB, when it should
              never need more than about 20 MB because everything is handled with
              local variables and new objects are instantiated for each document.
              This is a serious problem.

              Thanks,

              Tom
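
To see which kinds of objects are piling up, one option is to take a census of live objects by type before and after a batch of iterations. This is only a sketch. `live_object_counts`, `census_diff` and the `Parsed` class are hypothetical names, and `gc.get_objects()` reports only objects tracked by the cyclic collector (containers and class instances, not plain ints or strings).

```python
import gc
from collections import Counter


def live_object_counts():
    """Count collector-tracked objects, grouped by type name."""
    gc.collect()  # drop collectable garbage first so only survivors are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def census_diff(before, after, top=5):
    """Return the `top` type names whose counts grew the most."""
    growth = after - before  # Counter subtraction keeps positive counts only
    return growth.most_common(top)


class Parsed:
    """Hypothetical per-document result object."""


before = live_object_counts()
kept = [Parsed() for _ in range(1000)]  # simulate references piling up
after = live_object_counts()
report = census_diff(before, after)
```

A type whose count grows with every batch is a good place to start looking for the reference that keeps it alive, for example with `gc.get_referrers()`.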


              • Tom Davis

                #8
                Re: Looping-related Memory Leak

                On Jun 30, 8:24 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
                > Are your objects involved in circular references, and do you have any
                > objects with a __del__ method? Normally objects are reclaimed when
                > the reference count goes to zero, but if there are cycles then the
                > reference count never reaches zero, and they remain alive until the
                > generational garbage collector makes a pass to break the cycle.
                > However, the generational collector doesn't break cycles that involve
                > objects with a __del__ method.

                There are some circular references, but these are produced by objects
                created by BeautifulSoup. I try to decompose all of them, but if
                there's one part of the code to blame it's almost certainly this. I
                have no objects with __del__ methods, at least none that I wrote.

                > Are you calling any C extensions that might be failing to decref an
                > object? There could be a memory leak.

                Perhaps. Yet another thing to look into.

                > Are you keeping a reference around somewhere. For example, appending
                > results to a list, and the result keeps a reference to all of your
                > unpickled data for some reason.

                No.

                > You know, we can throw out all these scenarios, but these suggestions
                > are just common pitfalls. If it doesn't look like one of these
                > things, you're going to have to do your own legwork to help isolate
                > what's causing the behavior. Then if needed you can come back to us
                > with more detailed information.
                >
                > Start with your original function, and slowly remove functionality
                > from it until the bad behavior goes away. That will give you a clue
                > what's causing it.

                I realize this and thank you folks for your patience. I thought
                perhaps there was something simple I was overlooking, but in this case
                it would seem that there are dozens of things outside of my direct
                control that could be causing this, most likely from third-party
                libraries I am using. I will continue to try to debug this on my own
                and see if I can figure anything out. Memory leaks and failing GC and
                so forth are all new concerns for me.

                Thanks Again,

                Tom
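
The idea behind "decomposing" a parse tree can be illustrated with a toy structure. The `TreeNode` class below is a hypothetical stand-in and is not BeautifulSoup's implementation. Its `decompose` method only borrows the name. Parent and child links form reference cycles, and cutting them explicitly lets plain reference counting free the tree without waiting for a collector pass.

```python
import gc
import weakref


class TreeNode:
    """Hypothetical parsed-document node with parent and child links."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)  # parent <-> child forms a cycle

    def decompose(self):
        """Recursively cut parent/child links so refcounting can free the tree."""
        for child in self.children:
            child.decompose()
        self.children = []
        self.parent = None


def build_tree():
    root = TreeNode("html")
    body = TreeNode("body", root)
    TreeNode("p", body)
    return root


gc.disable()  # show that no cyclic collector pass is needed
try:
    root = build_tree()
    probe = weakref.ref(root)
    root.decompose()
    del root
    freed_without_gc = probe() is None
finally:
    gc.enable()
```

If any node of a tree is skipped, or some other object still holds a node, the remaining cycle survives until the cyclic collector runs.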
