How does "new" work in a loop?

**Barry Kelly** · Jul 8 '06, 12:15 PM

Re: How does "new" work in a loop?

Barry Kelly <barry.j.kelly@gmail.com> wrote:

One possible implementation: the C# compiler compiles to IL, and the JIT
produces the actual code. The IL contains ldloc and stloc for locals,
and thus the JIT can make a note of where the last use of a variable
occurs for each basic block. Hence it can produce tables which indicate
which stack locations / registers are valid roots for given instruction
pointer ranges.

There's another reason why it needs this info: so it can adjust all
pointers to relocated objects after a GC has just finished.
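
A minimal sketch of how the effect described above could be observed, assuming a console program; the names `LivenessDemo` and `SurvivesAfterLastUse` are made up for illustration. It holds a `WeakReference` to an object, uses the object one last time, forces a collection while the variable is still lexically in scope, and reports whether the object survived. The result is not guaranteed either way: it depends on the runtime, on whether optimizations are enabled, and on whether a debugger is attached.

```csharp
using System;
using System.Runtime.CompilerServices;

static class LivenessDemo
{
    // Returns true if the object was still alive after a forced collection
    // that happens after the last read of the local variable x.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool SurvivesAfterLastUse()
    {
        object x = new object();
        WeakReference weak = new WeakReference(x);

        Console.WriteLine(x.GetHashCode()); // last use of x

        GC.Collect();
        GC.WaitForPendingFinalizers();

        // x is still in scope here, but it is never read again, so the
        // JIT is free (not obliged) to stop reporting it as a root.
        return weak.IsAlive;
    }

    static void Main()
    {
        Console.WriteLine("alive after last use: " + SurvivesAfterLastUse());
    }
}
```

Comparing an optimized build run outside the debugger with a debug build is one way to see whether a given JIT tracks the last use.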

-- Barry

--

Entropy Overload

http://barrkel.blogspot.com/

**Göran Andersson** · Jul 8 '06, 04:15 PM

Re: How does "new" work in a loop?

Barry Kelly wrote:

> Göran Andersson <guffa@guffa.com> wrote:
>
>> But you yourself said that:
>>
>> "Scope is a lexical concept that exists only at compile time."
>>
>> I guess that's not really so, then.
>
> I think you're confusing scope with GC reachability.
>
> In compiler theory, the word "scope" is overloaded. It can either refer
> to (i) the extent of source code for which the identifier is valid ("the
> scope of a variable") or (ii) the set of identifiers which are valid for
> the current position while parsing the source code ("the variable isn't
> in scope").
>
> It is implemented with the compiler's symbol table. After the compiler
> has finished parsing and has resolved identifiers, scope no longer
> exists. The information may be carried forward to the PDB for debugging,
> but that's the end of it.
>
> GC reachability is the set of rules by which objects in a graph are
> determined to be alive or eligible for collection. Reachability is
> typically defined by (i) a set of object roots and (ii) the transitive
> closure of objects referenced by these roots.
>
> The point is that the set of object roots at a particular location in
> compiled code does not necessarily correspond exactly with the variables
> which are lexically in scope at that location in the original source
> code.
>
> A variable being lexically "in scope" does not imply that it is GC
> reachable.
>
> -- Barry

No, I'm not at all confused about what scope is. It's a bit surprising
how much the GC knows about it, though. Even if it doesn't know the
"available" scope of the variables, it seems to know the "utilized"
scope, or the active lifetime of the variables (which may be shorter
than the physical lifetime).

Is there any information that supports the theory that the GC knows when
a reference is no longer reachable? Can we trust that it will always be
able to collect objects that won't be used?

Does the scope matter? Will there be a difference between:

    for (int i = 0; i < 1000; i++) {
        byte[] buffer = new byte[10000];
    }

and:

    byte[] buffer;
    for (int i = 0; i < 1000; i++) {
        buffer = new byte[10000];
    }

Will it with certainty always be able to collect the previous buffer?
Will it never differ from this?

    byte[] buffer;
    for (int i = 0; i < 1000; i++) {
        buffer = new byte[10000];
        buffer = null;
    }

**Barry Kelly** · Jul 8 '06, 07:05 PM

Re: How does "new" work in a loop?

Göran Andersson <guffa@guffa.com> wrote:

> Barry Kelly wrote:
>
>> Göran Andersson <guffa@guffa.com> wrote:
>>
>>> But you yourself said that:
>>>
>>> "Scope is a lexical concept that exists only at compile time."
>>>
>>> I guess that's not really so, then.
>>
>> I think you're confusing scope with GC reachability.
>
> No, I'm not at all confused about what scope is.

I don't understand how you could have quoted me in this context without
you being mistaken.

> It's a bit surprising
> how much the GC knows about it, though.

The GC doesn't know anything about scope. That's what I've been trying
to explain to you. The scope information is *LOST* after compile time.
The thing that the GC knows about IS NOT SCOPE.

> it seems to know the "utilized"
> scope, or the active lifetime of the variables (which may be shorter
> than the physical lifetime).

Like I said in the other messages, the JIT needs this info (variable
lifetime - not scope) for enregistering and stack reuse, and so it
calculates it, and the GC needs this data for adjusting pointers after a
collection.

> Is there any information that supports the theory that the GC knows when
> a reference is no longer reachable?

The JIT can only detect the last use of a given variable definition (in
the Static Single Assignment (SSA) sense of "variable definition"). It's
the JIT compiler that is doing the analysis, not the GC.

I recommend that you read up on:

* Use-Definition chain, Definition-Use chain (ud-chain, du-chain)
* Static Single Assignment (SSA - this is a more modern approach)

Alternatively, you can look up use-def / def-use chains in the Dragon
book (Compilers: Principles, Techniques, and Tools, by Aho, Sethi &
Ullman).

> Can we trust that it will always be
> able to collect objects that won't be used?

No, it can only collect objects which aren't used. For example:

---8<---
object x = new object();
Halting_Problem(); // might not return
Console.WriteLine(x);
--->8---

The JIT clearly can't determine that x is dead at the point of calling
Halting_Problem(), so the GC can't collect x.

> Does the scope matter?

THE SCOPE DOESN'T EXIST IN IL. The scope is GONE, GONE, GONE, ALL GONE,
after the C# compiler has produced IL. Use ILDASM to decompile an
assembly some time. You will notice that THERE IS NO SCOPE INFORMATION
in the dump. There is only a list of local variables per method.
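
As a rough illustration of the point about IL, and as an alternative to reading an ILDASM dump, the same flat list of locals can be inspected through reflection. This is only a sketch: `LocalsDemo` and `TwoScopes` are hypothetical names, and the number of slots reported depends on the compiler and its optimization settings (an optimizing build may remove or merge locals).

```csharp
using System;
using System.Reflection;

static class LocalsDemo
{
    // Two locals declared in two separate lexical scopes.
    static void TwoScopes()
    {
        { byte[] a = new byte[1]; Console.WriteLine(a.Length); }
        { byte[] b = new byte[2]; Console.WriteLine(b.Length); }
    }

    // Prints the local variable slots recorded in the method body and
    // returns how many there are (-1 if no body is available).
    public static int CountLocals()
    {
        MethodInfo method = typeof(LocalsDemo).GetMethod(
            "TwoScopes", BindingFlags.NonPublic | BindingFlags.Static);
        MethodBody body = method.GetMethodBody();
        if (body == null) return -1;

        foreach (LocalVariableInfo local in body.LocalVariables)
        {
            // Only a slot index and a type are exposed here: no name and
            // no information about which block declared the variable.
            Console.WriteLine("slot " + local.LocalIndex + ": " + local.LocalType);
        }
        return body.LocalVariables.Count;
    }

    static void Main()
    {
        CountLocals();
    }
}
```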

> Will there be a difference between:
>
>     for (int i = 0; i < 1000; i++) {
>         byte[] buffer = new byte[10000];
>     }
>
> and:
>
>     byte[] buffer;
>     for (int i = 0; i < 1000; i++) {
>         buffer = new byte[10000];
>     }
>
> Will it with certainty always be able to collect the previous buffer?
> Will it never differ from this?

It's entirely implementation defined, based on how smart the JIT is at
recognizing that variables are no longer needed. It's a function of the
sophistication of the compiler. It's in the JIT's interest to discover
when variables are no longer needed, because that creates room for other
variables to be enregistered, or stack space minimized.
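
Since the answer is implementation defined, one way to find out what a particular runtime does is to measure it. The following is a sketch under stated assumptions, not a definitive test: the class and method names are invented, the iteration count is reduced to keep the forced collections cheap, and a `WeakReference` stands in for "can the previous buffer be collected?". It runs the three loop shapes from the question and counts how many previous buffers survive a forced collection.

```csharp
using System;

static class LoopVariants
{
    const int Iterations = 20;
    const int Size = 10000;

    // Forces a collection and reports whether the previous buffer survived it.
    static bool Survived(WeakReference previous)
    {
        if (previous == null) return false;
        GC.Collect();
        return previous.IsAlive;
    }

    public static int DeclaredInside()
    {
        int survivors = 0;
        WeakReference previous = null;
        for (int i = 0; i < Iterations; i++)
        {
            byte[] buffer = new byte[Size];
            if (Survived(previous)) survivors++;
            previous = new WeakReference(buffer);
        }
        return survivors;
    }

    public static int DeclaredOutside()
    {
        int survivors = 0;
        WeakReference previous = null;
        byte[] buffer;
        for (int i = 0; i < Iterations; i++)
        {
            buffer = new byte[Size];
            if (Survived(previous)) survivors++;
            previous = new WeakReference(buffer);
        }
        return survivors;
    }

    public static int NulledEachTime()
    {
        int survivors = 0;
        WeakReference previous = null;
        byte[] buffer;
        for (int i = 0; i < Iterations; i++)
        {
            buffer = new byte[Size];
            if (Survived(previous)) survivors++;
            previous = new WeakReference(buffer);
            buffer = null; // explicit clearing, as in the third variant
        }
        return survivors;
    }

    static void Main()
    {
        Console.WriteLine("declared inside:  " + DeclaredInside());
        Console.WriteLine("declared outside: " + DeclaredOutside());
        Console.WriteLine("nulled each time: " + NulledEachTime());
    }
}
```

In every variant the variable has already been overwritten with the new buffer when the collection runs, so any difference between the three counts would come from how the JIT handles temporaries rather than from the source-level scope.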

-- Barry

--

Entropy Overload

http://barrkel.blogspot.com/

**Göran Andersson** · Jul 9 '06, 11:05 AM

Re: How does "new" work in a loop?

Barry Kelly wrote:

> Göran Andersson <guffa@guffa.com> wrote:
>
>> Barry Kelly wrote:
>>
>>> Göran Andersson <guffa@guffa.com> wrote:
>>>
>>>> But you yourself said that:
>>>>
>>>> "Scope is a lexical concept that exists only at compile time."
>>>>
>>>> I guess that's not really so, then.
>>>
>>> I think you're confusing scope with GC reachability.
>>
>> No, I'm not at all confused about what scope is.
>
> I don't understand how you could have quoted me in this context without
> you being mistaken.

No, I can see that. Hopefully it will dawn on you.

>> It's a bit surprising
>> how much the GC knows about it, though.
>
> The GC doesn't know anything about scope. That's what I've been trying
> to explain to you. The scope information is *LOST* after compile time.
> The thing that the GC knows about IS NOT SCOPE.

If you read more than just one sentence at a time, perhaps you would
understand what I am saying, instead of getting stuck on a single word.

>> it seems to know the "utilized"
>> scope, or the active lifetime of the variables (which may be shorter
>> than the physical lifetime).
>
> Like I said in the other messages, the JIT needs this info (variable
> lifetime - not scope) for enregistering and stack reuse, and so it
> calculates it, and the GC needs this data for adjusting pointers after a
> collection.

No, it doesn't. It uses the same information for that as it did to
determine which objects can be collected. If it used different
information in the phases, it would mess up the references.

>> Can we trust that it will always be
>> able to collect objects that won't be used?
>
> No, it can only collect objects which aren't used. For example:
>
> ---8<---
> object x = new object();
> Halting_Problem(); // might not return
> Console.WriteLine(x);
> --->8---
>
> The JIT clearly can't determine that x is dead at the point of calling
> Halting_Problem(), so the GC can't collect x.

Well, that is obvious, isn't it?

Ok, let me rephrase the question a bit more precisely:

Can we trust that it will always be able to collect objects that
can't possibly be used later in the execution?

>> Does the scope matter?
>
> THE SCOPE DOESN'T EXIST IN IL. The scope is GONE, GONE, GONE, ALL GONE,
> after the C# compiler has produced IL. Use ILDASM to decompile an
> assembly some time. You will notice that THERE IS NO SCOPE INFORMATION
> in the dump. There is only a list of local variables per method.

Will you PLEASE STOP SHOUTING!

I wasn't asking whether the scope exists in the IL code. I was asking
whether the scope matters.

If you would please look beyond your hangup on this word, maybe you
could try to understand the question?

**Jon Skeet [C# MVP]** · Jul 9 '06, 09:25 PM

Re: How does "new" work in a loop?

Göran Andersson <guffa@guffa.com> wrote:

<snip>

> Ok, let me rephrase the question a bit more precisely:
>
> Can we trust that it will always be able to collect objects that
> can't possibly be used later in the execution?

On the current implementations, in release mode? I believe so, in
simple cases. The JIT doesn't do complex analysis, so if you had:

    bool first = true;

    object bigObject = (...);

    for (int i = 0; i < 100000; i++)
    {
        if (first)
        {
            useObject(bigObject);
            first = false;
        }

        // Code not using bigObject
    }

then the JIT wouldn't work out that first could never become true after
the first iteration and bigObject would therefore never be used after
that point. That's one of the few situations where it might make sense
to set a local variable to null.
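
A runnable variation on the example above, as a sketch only. The names `NullAfterFirstUse`, `UseObject` and `BigObjectAliveInLoop`, the 1,000,000-byte array standing in for the big object, and the three-iteration loop are all assumptions made for illustration. The `clearAfterUse` flag switches the explicit null assignment on and off, and a `WeakReference` reports whether the big object is still alive during later iterations. What it prints depends on the JIT and build settings, so no particular output is claimed.

```csharp
using System;

static class NullAfterFirstUse
{
    // Hypothetical stand-in for useObject in the example.
    static void UseObject(byte[] data)
    {
        Console.WriteLine("using " + data.Length + " bytes");
    }

    // Returns whether the big object was still alive on the last
    // iteration, well after its only use.
    public static bool BigObjectAliveInLoop(bool clearAfterUse)
    {
        bool first = true;
        byte[] bigObject = new byte[1000000];
        WeakReference weak = new WeakReference(bigObject);
        bool aliveLater = false;

        for (int i = 0; i < 3; i++)
        {
            if (first)
            {
                UseObject(bigObject);
                first = false;
                if (clearAfterUse)
                {
                    bigObject = null; // drop the reference explicitly
                }
            }
            else
            {
                // Code not using bigObject
                GC.Collect();
                aliveLater = weak.IsAlive;
            }
        }
        return aliveLater;
    }

    static void Main()
    {
        Console.WriteLine("without null: " + BigObjectAliveInLoop(false));
        Console.WriteLine("with null:    " + BigObjectAliveInLoop(true));
    }
}
```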

I looked at the CLR spec and I found *very* little about garbage
collection. No guarantees about this kind of thing at all. Normally I'm
a spec hound in terms of only coding to the spec, but the ramifications
of only trusting to the spec in this case are so horrible that I
believe it makes more sense to go with what happens in reality.

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

**John J. Hughes II** · Jul 10 '06, 12:55 PM

Re: How does "new" work in a loop?

Jon,

Thanks for the debate. At this point I have not changed my mind, but it does
give me food for thought. The next time I have some slow time I will
research the matter further, taking your points and the other points in this
thread into consideration.

But as a last comment: in one of your other messages in this thread you made
the following comment:

> then the JIT wouldn't work out that first could never become true after
> the first iteration and bigObject would therefore never be used after
> that point. That's one of the few situations where it might make sense
> to set a local variable to null.

Since I have some rather long-running threads which create some long-lived
variables, it's possible that setting some of them to null caused memory to be
returned which was being held before.

Regards,
John

"Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
news:MPG.1f18dfac673567f098d2dd@msnews.microsoft.com...

> John J. Hughes II <no@invalid.com> wrote:
>
>> I do agree the memory is not marked... poor verbiage on my part.
>>
>> I don't think your example really proves anything since you are calling
>> garbage collection.
>
> Well, I can make an example which ends up garbage collecting due to
> other activity if you want. It'll do the same thing. Just change the
> call to GC.Collect() to
>
>     for (int i = 0; i < 10000000; i++)
>     {
>         byte[] b = new byte[1000];
>     }
>
> and you'll see the same thing.
>
>> I have no argument that when GC runs it will clean up
>> memory that is not being used. I personally believe that all references
>> to a variable are not removed in a timely fashion unless you tell them
>> to be. The key here is timely.
>
> It's not a matter of the reference being removed. It's a case of the
> release-mode garbage collector ignoring variables which are no longer
> relevant.
>
>> Again, as I have said, I had a problem with memory creep. The only change
>> I made was to add using statements; the problem slowed down but was not
>> eliminated.
>
> And *that* can have a significant impact - because many classes which
> implement IDisposable also have finalizers which are suppressed when
> you call Dispose. That really *does* affect when the memory can be
> freed, and can make a big difference.
>
>> The second change was to add value=null statements (shotgun blast style)
>> and the problem went away. Since it was a production system I used great
>> care to change as little as possible, so I really don't think I fixed any
>> other problems.
>
> I'm afraid I still don't believe you saw what you claimed to be seeing
> - not on a production system. You *would* see improvements in a
> debugger, but that's a different matter.
>
>> If at some point in the near future I can give you code which proves
>> my point I will be happy to, but the last time I had the problem it
>> required a system running full blown for 14 days on average.
>>
>> That being said, I may have gotten my head wet and decided it was raining
>> when it was snowing. I decided to use an umbrella and my head is not wet
>> now.
>
> I really suspect you were mistaken, I'm afraid.
>
> --
> Jon Skeet - <skeet@pobox.com>
> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
> If replying to the group, please do not mail me too
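
The quoted remark about IDisposable and finalizers can be sketched as follows. `TempResource` and `DisposeDemo` are hypothetical names, and the class holds no real resource; the point is only the shape of the pattern. An object with a finalizer that has not been suppressed has to be finalized before its memory can be reclaimed, so it survives at least one extra collection. Calling `Dispose` (here via a `using` statement) lets the class call `GC.SuppressFinalize` and avoid that delay.

```csharp
using System;

// Hypothetical resource wrapper, for illustration only.
sealed class TempResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        if (Disposed) return;
        Disposed = true;
        // A real class would release its resource here. Afterwards the
        // finalizer has nothing left to do, so tell the GC to skip it.
        GC.SuppressFinalize(this);
    }

    // Safety net: runs only if Dispose was never called.
    ~TempResource()
    {
        Disposed = true;
    }
}

static class DisposeDemo
{
    public static bool UseAndDispose()
    {
        TempResource resource = new TempResource();
        using (resource)
        {
            // work with the resource
        }
        // Dispose has been called by the using statement at this point.
        return resource.Disposed;
    }

    static void Main()
    {
        Console.WriteLine("disposed: " + UseAndDispose());
    }
}
```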

**Tony Sinclair** · Jul 10 '06, 08:35 PM

Re: How does "new" work in a loop?

On Fri, 07 Jul 2006 11:09:58 -0700, Tony Sinclair <no@spam.com> wrote:

> My sincere gratitude to everyone who responded.

I'm afraid the debate on my question sailed over my head quite some
time ago, but in case anyone is interested, I can give you the actual
results of my program.

I used essentially the same code as in my OP on an internet file that I
downloaded, which comprised 50+ segments of 24 MB each, so each time
through my loop, I was allocating a 24 MB buffer with "new". (I do
intend to incorporate the improvements suggested, especially the
using statement, but I haven't gotten to it yet.) I watched the
memory data with the MS Task Manager as I started and ran my program.

When it started, the program quickly grabbed an extra 24MB from the
memory pool. It never went more than 1MB above that for the rest of
the run, and the file assembled perfectly. There was no shortage of
memory at the time (I have 1GB of physical memory, and Task Manager
showed about 2500MB of virtual memory available. When I started my
program, about 700MB of this was committed).

I conclude that even in a short loop, with no shortage of memory, and
with no hints from me that might help speed it up, the GC acts quickly
enough to dispose of the old buffer as soon as it's unneeded. I note
that even though the loop is short in lines, the CPU probably has a
lot of time on its hands while I am writing to the output buffer. I
might try this test again with a high-CPU task running in the
background and see what happens.
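
For anyone who wants to repeat this kind of test without Task Manager, here is a rough sketch that tracks the managed heap from inside the program. It is not the original program: `SegmentLoop` and `PeakManagedBytes` are invented names, the 50 segments of 24 MB are taken from the description above, and a single write to the buffer stands in for the real download and file output. `GC.GetTotalMemory(false)` returns an estimate, so the figures are approximate.

```csharp
using System;

static class SegmentLoop
{
    // Allocates one buffer per segment and returns the highest managed
    // heap size observed right after an allocation.
    public static long PeakManagedBytes(int segments, int segmentBytes)
    {
        long peak = 0;
        for (int i = 0; i < segments; i++)
        {
            byte[] buffer = new byte[segmentBytes];
            buffer[buffer.Length - 1] = 1; // stand-in for filling and writing a segment
            peak = Math.Max(peak, GC.GetTotalMemory(false));
        }
        return peak;
    }

    static void Main()
    {
        long peak = PeakManagedBytes(50, 24 * 1024 * 1024);
        Console.WriteLine("peak managed heap: " + peak / (1024 * 1024) + " MB");
        Console.WriteLine("gen 2 collections: " + GC.CollectionCount(2));
    }
}
```

A peak close to one buffer's size would match the Task Manager observation; a peak of several buffers would mean collections lagged behind the allocations on that runtime.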

Thanks again to everyone for their help.
