How does "new" work in a loop?

This topic is closed.
  • Barry Kelly

    #46
    Re: How does "new" work in a loop?

    Barry Kelly <barry.j.kelly@gmail.com> wrote:
    >One possible implementation: the C# compiler compiles to IL, and the JIT
    >produces the actual code. The IL contains ldloc and stloc for locals,
    >and thus the JIT can make a note of where the last use of a variable
    >occurs for each basic block. Hence it can produce tables which indicate
    >which stack locations / registers are valid roots for given instruction
    >pointer ranges.

    There's another reason why it needs this info: so it can adjust all
    pointers to relocated objects after a GC has just finished.
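
    For readers unfamiliar with the term "root": a root is a starting point
    for the collector's traversal, and anything that can be reached from a
    root by following references survives. The following is only a toy
    sketch of that idea, not how the CLR implements it. The class name, the
    method name and the string-keyed graph are all hypothetical.

    ```csharp
    using System.Collections.Generic;

    static class ToyReachability
    {
        // Returns every node reachable from the given roots by following references.
        // 'references' maps a node name to the names it refers to (a made-up model).
        public static HashSet<string> Reachable(
            IEnumerable<string> roots,
            IDictionary<string, string[]> references)
        {
            var marked = new HashSet<string>();
            var pending = new Stack<string>(roots);
            while (pending.Count > 0)
            {
                string current = pending.Pop();
                if (!marked.Add(current))
                    continue; // already visited

                string[] targets;
                if (references.TryGetValue(current, out targets))
                {
                    foreach (string target in targets)
                        pending.Push(target);
                }
            }
            return marked;
        }
    }
    ```

    With a graph where a refers to b, b refers to c, and d refers to a, and
    with a as the only root, the result contains a, b and c. The node d is
    not reachable even though it holds a reference to a live node.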

    -- Barry

    --


    • Göran Andersson

      #47
      Re: How does "new" work in a loop?

      Barry Kelly wrote:
      >Göran Andersson <guffa@guffa.com> wrote:
      >
      >>But you yourself said that:
      >>
      >>>"Scope is a lexical concept that exists only at compile time."
      >>
      >>I guess that's not really so, then.
      >
      >I think you're confusing scope with GC reachability.
      >
      >In compiler theory, the word "scope" is overloaded. It can either refer
      >to (i) the extent of source code for which the identifier is valid ("the
      >scope of a variable") or (ii) the set of identifiers which are valid for
      >the current position while parsing the source code ("the variable isn't
      >in scope").
      >
      >It is implemented with the compiler's symbol table. After the compiler
      >has finished parsing and has resolved identifiers, scope no longer
      >exists. The information may be carried forward to the PDB for debugging,
      >but that's the end of it.
      >
      >GC reachability is the set of rules by which objects in a graph are
      >determined to be alive or eligible for collection. Reachability is
      >typically defined by (i) a set of object roots and (ii) the transitive
      >closure of objects referenced by these roots.
      >
      >The point is that the set of object roots at a particular location in
      >compiled code does not necessarily correspond exactly with the variables
      >which are lexically in scope at that location in the original source
      >code.
      >
      >A variable being lexically "in scope" does not imply that it is GC
      >reachable.
      >
      >-- Barry

      No, I'm not at all confused about what scope is. It's a bit surprising
      how much the GC knows about it, though. Even if it doesn't know the
      "available" scope of the variables, it seems to know the "utilized"
      scope, or the active lifetime of the variables (which may be shorter
      than the physical lifetime).

      Is there any information that supports the theory that the GC knows when
      a reference is no longer reachable? Can we trust that it will always be
      able to collect objects that won't be used?

      Does the scope matter? Will there be a difference between:

      for (int i=0; i<1000; i++) {
          byte[] buffer = new byte[10000];
      }

      and:

      byte[] buffer;
      for (int i=0; i<1000; i++) {
          buffer = new byte[10000];
      }

      Will it with certainty always be able to collect the previous buffer?
      Will it never differ from this?

      byte[] buffer;
      for (int i=0; i<1000; i++) {
          buffer = new byte[10000];
          buffer = null;
      }
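
      One way to probe this empirically is to track each buffer with a
      WeakReference, force a collection while still inside the method, and
      count the survivors. This is only a sketch: the class and method names
      are hypothetical, and the counts it reports depend on the runtime, on
      the JIT and on whether the build is optimized, so it shows what one
      implementation does and guarantees nothing.

      ```csharp
      using System;
      using System.Collections.Generic;
      using System.Runtime.CompilerServices;

      static class LoopVariants
      {
          // Forces a full collection and counts how many tracked buffers survived.
          static int CountAlive(List<WeakReference> tracked)
          {
              GC.Collect();
              GC.WaitForPendingFinalizers();
              GC.Collect();
              int alive = 0;
              foreach (WeakReference weak in tracked)
              {
                  if (weak.IsAlive) alive++;
              }
              return alive;
          }

          // Variant 1: the variable is declared inside the loop body.
          [MethodImpl(MethodImplOptions.NoInlining)]
          public static int DeclaredInsideLoop(int count)
          {
              var tracked = new List<WeakReference>();
              for (int i = 0; i < count; i++)
              {
                  byte[] buffer = new byte[10000];
                  tracked.Add(new WeakReference(buffer));
              }
              return CountAlive(tracked);
          }

          // Variant 2: the variable is declared before the loop.
          [MethodImpl(MethodImplOptions.NoInlining)]
          public static int DeclaredOutsideLoop(int count)
          {
              var tracked = new List<WeakReference>();
              byte[] buffer;
              for (int i = 0; i < count; i++)
              {
                  buffer = new byte[10000];
                  tracked.Add(new WeakReference(buffer));
              }
              return CountAlive(tracked);
          }

          // Variant 3: the variable is explicitly set to null in each iteration.
          [MethodImpl(MethodImplOptions.NoInlining)]
          public static int AssignedNullInLoop(int count)
          {
              var tracked = new List<WeakReference>();
              byte[] buffer;
              for (int i = 0; i < count; i++)
              {
                  buffer = new byte[10000];
                  tracked.Add(new WeakReference(buffer));
                  buffer = null;
              }
              return CountAlive(tracked);
          }
      }
      ```

      Each method returns the number of buffers that were still alive after
      the forced collection. Comparing the three results in a debug build
      and in a release build is probably more informative than any single
      number.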


      • Barry Kelly

        #48
        Re: How does "new" work in a loop?

        Göran Andersson <guffa@guffa.com> wrote:
        >Barry Kelly wrote:
        >>Göran Andersson <guffa@guffa.com> wrote:
        >>>But you yourself said that:
        >>>>
        >>>>"Scope is a lexical concept that exists only at compile time."
        >>>>
        >>>I guess that's not really so, then.
        >>I think you're confusing scope with GC reachability.
        >
        >No, I'm not at all confused about what scope is.

        I don't understand how you could have quoted me in this context without
        you being mistaken.

        >It's a bit surprising
        >how much the GC knows about it, though.

        The GC doesn't know anything about scope. That's what I've been trying
        to explain to you. The scope information is *LOST* after compile time.
        The thing that the GC knows about IS NOT SCOPE.

        >it seems to know the "utilized"
        >scope, or the active lifetime of the variables (which may be shorter
        >than the physical lifetime).

        Like I said in the other messages, the JIT needs this info (variable
        lifetime - not scope) for enregistering and stack reuse, and so it
        calculates it, and the GC needs this data for adjusting pointers after a
        collection.

        >Is there any information that supports the theory that the GC knows when
        >a reference is no longer reachable?

        The JIT can only detect the last use of a given variable definition (in
        the Static Single Assignment (SSA) model of "variable definition"). It's
        the JIT compiler that is doing the analysis, not the GC.

        I recommend that you Google up on:

        * Use-Definition chain, Definition-Use chain (ud-chain, du-chain)
        * Static Single Assignment (SSA - this is a more modern approach)

        Alternatively, you can look up use-def / def-use chains in the Dragon
        book (Compilers: Principles, Techniques and Tools, by Aho, Sethi &
        Ullman).

        >Can we trust that it will always be
        >able to collect objects that won't be used?

        No, it can only collect objects which aren't used. For example:

        ---8<---
        object x = new object();
        Halting_Problem(); // might not return
        Console.WriteLine(x);
        --->8---

        The JIT clearly can't determine that x is dead at the point of calling
        Halting_Problem(), so the GC can't collect x.

        >Does the scope matter?

        THE SCOPE DOESN'T EXIST IN IL. The scope is GONE, GONE, GONE, ALL GONE,
        after the C# compiler has produced IL. Use ILDASM to decompile an
        assembly some time. You will notice that THERE IS NO SCOPE INFORMATION
        in the dump. There is only a list of local variables per method.

        >Will there be a difference between:
        >
        >for (int i=0; i<1000; i++) {
        >    byte[] buffer = new byte[10000];
        >}
        >
        >and:
        >
        >byte[] buffer;
        >for (int i=0; i<1000; i++) {
        >    buffer = new byte[10000];
        >}
        >
        >Will it with certainty always be able to collect the previous buffer?
        >Will it never differ from this?

        It's entirely implementation defined, based on how smart the JIT is at
        recognizing that variables are no longer needed. It's a function of the
        sophistication of the compiler. It's in the JIT's interest to discover
        when variables are no longer needed, because that creates room for other
        variables to be enregistered, or stack space minimized.
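
        To make the difference between "in scope" and "still a root" concrete,
        here is a small sketch built on WeakReference and GC.KeepAlive, both of
        which are real framework APIs. The class and method names are
        hypothetical. In the first method the local is in scope during the
        collection but is never read afterwards, so an optimizing JIT is free to
        stop reporting it as a root; whether it actually does so depends on the
        implementation and build settings. In the second method the later call
        to GC.KeepAlive counts as a use, which keeps the object alive across the
        collection.

        ```csharp
        using System;
        using System.Runtime.CompilerServices;

        static class LivenessDemo
        {
            // x stays lexically in scope, but nothing reads it after the WeakReference
            // is created. The result may be false in an optimized build.
            [MethodImpl(MethodImplOptions.NoInlining)]
            public static bool AliveWithoutLaterUse()
            {
                object x = new object();
                var weak = new WeakReference(x);
                GC.Collect();
                GC.WaitForPendingFinalizers();
                return weak.IsAlive;
            }

            // Same code, but x is used after the collection, so it must still be a root
            // while the collection runs.
            [MethodImpl(MethodImplOptions.NoInlining)]
            public static bool AliveWithLaterUse()
            {
                object x = new object();
                var weak = new WeakReference(x);
                GC.Collect();
                GC.WaitForPendingFinalizers();
                bool alive = weak.IsAlive;
                GC.KeepAlive(x); // a use of x after the collection
                return alive;
            }
        }
        ```

        The scope of x is identical in both methods. Only the position of the
        last use differs, which is the information the JIT works from.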

        -- Barry

        --


        • Göran Andersson

          #49
          Re: How does "new" work in a loop?

          Barry Kelly wrote:
          >Göran Andersson <guffa@guffa.com> wrote:
          >
          >>Barry Kelly wrote:
          >>>Göran Andersson <guffa@guffa.com> wrote:
          >>>>
          >>>>But you yourself said that:
          >>>>>
          >>>>>"Scope is a lexical concept that exists only at compile time."
          >>>>>
          >>>>I guess that's not really so, then.
          >>>I think you're confusing scope with GC reachability.
          >>No, I'm not at all confused about what scope is.
          >
          >I don't understand how you could have quoted me in this context without
          >you being mistaken.

          No, I can see that. Hopefully it will dawn on you.

          >>It's a bit surprising
          >>how much the GC knows about it, though.
          >
          >The GC doesn't know anything about scope. That's what I've been trying
          >to explain to you. The scope information is *LOST* after compile time.
          >The thing that the GC knows about IS NOT SCOPE.

          If you read more than just one sentence at a time, perhaps you would
          understand what I am saying, instead of getting stuck on a single word.

          >>it seems to know the "utilized"
          >>scope, or the active lifetime of the variables (which may be shorter
          >>than the physical lifetime).
          >
          >Like I said in the other messages, the JIT needs this info (variable
          >lifetime - not scope) for enregistering and stack reuse, and so it
          >calculates it, and the GC needs this data for adjusting pointers after a
          >collection.

          No, it doesn't. It uses the same information for that as it did to
          determine which objects can be collected. If it used different
          information in the phases, it would mess up the references.

          >>Can we trust that it will always be
          >>able to collect objects that won't be used?
          >
          >No, it can only collect objects which aren't used. For example:
          >
          >---8<---
          >object x = new object();
          >Halting_Problem(); // might not return
          >Console.WriteLine(x);
          >--->8---
          >
          >The JIT clearly can't determine that x is dead at the point of calling
          >Halting_Problem(), so the GC can't collect x.

          Well, that is obvious, isn't it?

          Ok, let me rephrase the question a bit more precisely:

          Can we trust that it will always be able to collect objects that
          can't possibly be used later in the execution?

          >>Does the scope matter?
          >
          >THE SCOPE DOESN'T EXIST IN IL. The scope is GONE, GONE, GONE, ALL GONE,
          >after the C# compiler has produced IL. Use ILDASM to decompile an
          >assembly some time. You will notice that THERE IS NO SCOPE INFORMATION
          >in the dump. There is only a list of local variables per method.

          Will you PLEASE STOP SHOUTING!

          I wasn't asking whether the scope exists in the IL code. I was asking
          whether the scope matters.

          If you would please try to look beyond your hang-up on this word, maybe
          you could try to understand the question?


          • Jon Skeet [C# MVP]

            #50
            Re: How does "new" work in a loop?

            Göran Andersson <guffa@guffa.com> wrote:

            <snip>
            >Ok, let me rephrase the question a bit more precisely:
            >
            >Can we trust that it will always be able to collect objects that
            >can't possibly be used later in the execution?

            On the current implementations, in release mode? I believe so, in
            simple cases. The JIT doesn't do complex analysis, so if you had:

            bool first=true;

            object bigObject = (...);

            for (int i=0; i < 100000; i++)
            {
                if (first)
                {
                    useObject(bigObject);
                    first = false;
                }

                // Code not using bigObject
            }

            then the JIT wouldn't work out that first could never become true after
            the first iteration and bigObject would therefore never be used after
            that point. That's one of the few situations where it might make sense
            to set a local variable to null.
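
            As a sketch of what that could look like, the version below drops the
            reference as soon as the one and only use has happened. The byte array
            standing in for the big object, the UseObject helper and the class name
            are placeholders invented for the example, and whether the assignment
            makes any measurable difference depends on the JIT in use.

            ```csharp
            using System;

            static class NullAfterUse
            {
                // Hypothetical stand-in for whatever work needs the big object.
                static long UseObject(byte[] data)
                {
                    return data.Length;
                }

                public static long Run(int iterations, int bigSize)
                {
                    bool first = true;
                    byte[] bigObject = new byte[bigSize]; // placeholder for a large allocation
                    long total = 0;

                    for (int i = 0; i < iterations; i++)
                    {
                        if (first)
                        {
                            total += UseObject(bigObject);
                            first = false;
                            // The JIT is unlikely to prove that this branch never runs
                            // again, so drop the reference explicitly.
                            bigObject = null;
                        }

                        total += i; // code not using bigObject
                    }
                    return total;
                }
            }
            ```

            After the assignment the local no longer refers to the array, so the
            array is unreachable from this method for the remaining iterations
            regardless of how much analysis the JIT performs.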


            I looked at the CLR spec and I found *very* little about garbage
            collection. No guarantees about this kind of thing at all. Normally I'm
            a spec hound in terms of only coding to the spec, but the ramifications
            of only trusting to the spec in this case are so horrible that I
            believe it makes more sense to go with what happens in reality.

            --
            Jon Skeet - <skeet@pobox.com>
            http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
            If replying to the group, please do not mail me too


            • John J. Hughes II

              #51
              Re: How does "new" work in a loop?

              Jon,

              Thanks for the debate. At this point I have not changed my mind, but it does
              give me food for thought. The next time I have some slow time I will
              research the matter further, taking your points and other points in this
              thread into consideration.

              But as a last comment: in one of your other messages in this thread you made
              the following comment:
              >then the JIT wouldn't work out that first could never become true after
              >the first iteration and bigObject would therefore never be used after
              >that point. That's one of the few situations where it might make sense
              >to set a local variable to null.
              Since I have some rather long-running threads which create some long-lived
              variables, it's possible that setting some of them to null causes memory to be
              returned which was being held before.

              Regards,
              John

              "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
              news:MPG.1f18dfac673567f098d2dd@msnews.microsoft.com...
              John J. Hughes II <no@invalid.com> wrote:
              >I do agree the memory is not marked... poor verbiage on my part.
              >>
              >I don't think your example really proves anything since you are calling
              >garbage collection.
              >
              Well, I can make an example which ends up garbage collecting due to
              other activity if you want. It'll do the same thing. Just change the
              call to GC.Collect() to
              >
              for (int i=0; i < 10000000; i++)
              {
                  byte[] b = new byte[1000];
              }
              >
              and you'll see the same thing.
              >
              >I have no argument that when GC runs it will clean up
              >memory that is not being used. I personally believe that all references
              >to a variable are not removed in a timely fashion unless you tell them
              >to be. The key here is timely.
              >
              It's not a matter of the reference being removed. It's a case of the
              release-mode garbage collector ignoring variables which are no longer
              relevant.
              >
              >Again, as I have said, I had a problem with memory creep. The only change I
              >made was to add using statements; the problem slowed down but was not
              >eliminated.
              >
              And *that* can have a significant impact - because many classes which
              implement IDisposable also have finalizers which are suppressed when
              you call Dispose. That really *does* affect when the memory can be
              freed, and can make a big difference.
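
              The effect described here can be sketched with a small class whose
              finalizer only counts how often it runs. The names TrackedResource and
              DisposeDemo are hypothetical, and the class holds no real resource. The
              point is only that Dispose calls GC.SuppressFinalize, so a disposed
              instance never has to wait for the finalizer thread before its memory
              can be reclaimed.

              ```csharp
              using System;
              using System.Runtime.CompilerServices;
              using System.Threading;

              sealed class TrackedResource : IDisposable
              {
                  public static int FinalizerRuns; // incremented on the finalizer thread

                  public void Dispose()
                  {
                      // A real class would release its resource here.
                      GC.SuppressFinalize(this); // the finalizer no longer needs to run
                  }

                  ~TrackedResource()
                  {
                      Interlocked.Increment(ref FinalizerRuns);
                  }
              }

              static class DisposeDemo
              {
                  [MethodImpl(MethodImplOptions.NoInlining)]
                  static void Allocate(int count, bool dispose)
                  {
                      for (int i = 0; i < count; i++)
                      {
                          if (dispose)
                          {
                              using (var resource = new TrackedResource())
                              {
                                  // work with the resource
                              }
                          }
                          else
                          {
                              new TrackedResource(); // abandoned without Dispose
                          }
                      }
                  }

                  // Returns how many finalizers ran for the instances allocated here.
                  public static int FinalizerRunsAfter(int count, bool dispose)
                  {
                      int before = Volatile.Read(ref TrackedResource.FinalizerRuns);
                      Allocate(count, dispose);
                      GC.Collect();
                      GC.WaitForPendingFinalizers();
                      return Volatile.Read(ref TrackedResource.FinalizerRuns) - before;
                  }
              }
              ```

              When every instance is disposed, no finalizer runs for it. When the
              instances are simply abandoned, each one has to be finalized before
              its memory can be reused, which delays reclamation.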
              >
              >The second change was to add value=null statements (shotgun blast style)
              >and the problem went away. Since it was a production system I used great
              >care to change as little as possible so I really don't think I fixed any
              >other problems.
              >
              I'm afraid I still don't believe you saw what you claimed to be seeing
              - not on a production system. You *would* see improvements in a
              debugger, but that's a different matter.
              >
              >If at some point in the near future I can give you code which proves
              >my point I will be happy to, but the last time I had the problem it
              >required a system running full blown for 14 days on average.
              >>
              >That being said I may have gotten my head wet and decided it was raining
              >when it was snowing. I decided to use an umbrella and my head is not wet
              >now.
              >
              I really suspect you were mistaken, I'm afraid.
              >
              --
              Jon Skeet - <skeet@pobox.com>
              http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
              If replying to the group, please do not mail me too


              • Tony Sinclair

                #52
                Re: How does "new" work in a loop?

                On Fri, 07 Jul 2006 11:09:58 -0700, Tony Sinclair <no@spam.com> wrote:
                >My sincere gratitude to everyone who responded.
                I'm afraid the debate on my question sailed over my head quite some
                time ago, but in case anyone is interested, I can give you the actual
                results of my program.

                I used essentially the same code in my OP on an internet file that I
                downloaded, which comprised 50+ segments of 24 MB each, so each time
                through my loop, I was allocating a 24 MB buffer as "new." (I do
                intend to incorporate the improvements suggested, especially the
                using statement, but I haven't gotten to it yet.) I watched the
                memory data with the MS Task manager as I started and ran my program.

                When it started, the program quickly grabbed an extra 24MB from the
                memory pool. It never went more than 1MB above that for the rest of
                the run, and the file assembled perfectly. There was no shortage of
                memory at the time (I have 1GB of physical memory, and Task Manager
                showed about 2500MB of virtual memory available. When I started my
                program, about 700MB of this was committed).

                I conclude that even in a short loop, with no shortage of memory, and
                with no hints from me that might help speed it up, the GC acts quickly
                enough to dispose of the old buffer as soon as it's unneeded. I note
                that even though the loop is short in lines, the CPU probably has a
                lot of time on its hands while I am writing to the output buffer. I
                might try this test again with a high-CPU task running in the
                background and see what happens.

                Thanks again to everyone for their help.
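
                For anyone who wants to repeat this kind of observation from inside
                the program rather than from Task Manager, a rough sketch follows. It
                uses GC.GetTotalMemory, which reports the managed heap only, so its
                numbers will not match the process figures Task Manager shows. The
                class and method names are hypothetical, and the loop only allocates
                buffers; it does not download or write anything.

                ```csharp
                using System;

                static class SegmentLoop
                {
                    // Allocates one fresh buffer per segment, as in the loop described
                    // above, and returns the largest managed-heap size seen on the way.
                    public static long PeakManagedBytes(int segments, int segmentBytes)
                    {
                        long peak = 0;
                        for (int i = 0; i < segments; i++)
                        {
                            byte[] buffer = new byte[segmentBytes];
                            buffer[0] = (byte)i; // touch the buffer so it is really used

                            long now = GC.GetTotalMemory(false); // do not force a collection
                            if (now > peak) peak = now;
                        }
                        return peak;
                    }
                }
                ```

                Calling SegmentLoop.PeakManagedBytes(50, 24 * 1024 * 1024) roughly
                mirrors the 50 segments of 24 MB mentioned above. A peak that stays
                near one or two buffer sizes would be consistent with the old buffers
                being collected as the loop runs.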
