When is "volatile" used instead of "lock"?

This topic is closed.
  • Jon Skeet [C# MVP]

    #46
Re: When is "volatile" used instead of "lock"?

Peter Ritchie [C# MVP] <PRSoCo@newsgroups.nospam> wrote:
    It specifies how the system as a whole must behave: given a certain
    piece of IL, there are valid behaviours and invalid behaviours. If you
    can observe that a variable has been read before a lock has been
    acquired and that value has then been used (without rereading) after
    the lock has been acquired, then the CLR has a bug, pure and simple.
    It violates the spec in a pretty clear-cut manner.
    That's not the same thing as saying use of Monitor.Enter and Monitor.Exit
    are what are used to maintain that behaviour.
    Well, without that guarantee for Monitor.Enter/Monitor.Exit I don't
    believe it would be possible to write thread-safe code.
ECMA-335 section 12.6.5 has "[calling Monitor.Enter]...shall implicitly
perform a volatile read operation...", which says to me that one volatile
operation is performed. And "[calling Monitor.Exit]...shall implicitly perform
a volatile write operation." A write to what? As in this snippet:
Monitor.Enter(this.locker);
Trace.WriteLine(this.number);
Monitor.Exit(this.locker);
    It doesn't matter what the volatile write is to - it's the location in
    the CIL that matters. No other writes can be moved (logically) past
    that write, no matter what they're writing to.
It only casually mentions "See [section] 12.6.7", which discusses acquire
and release semantics in the context of the volatile prefix (assuming the C#
volatile keyword is what causes generation of this prefix).
    I don't see what's "casual" about it, nor why you should believe that
    12.6.7 should only apply to instructions with the "volatile." prefix.
    The section starts off by mentioning the prefix, but then talks in
    terms of volatile reads and volatile writes - which is the same terms
    as 12.6.5 talks in.
12.6.7 only mentions "the read" or "the write"; it does not mention anything
about a set or block of reads/writes. I think you've made quite a leap getting
to: code between Monitor.Enter and Monitor.Exit has volatility guarantees.
    I really, really haven't. I think the problem is the one I talk about
    above - you're assuming that *what* is written to matters, rather than
    just the location of a volatile write in the CIL stream. Look at the
    guarantee provided by the spec:

    <quote>
A volatile read has "acquire semantics" meaning that the read is
guaranteed to occur prior to any references to memory that occur after
the read instruction in the CIL instruction sequence. A volatile write
has "release semantics" meaning that the write is guaranteed to happen
after any memory references prior to the write instruction in the CIL
instruction sequence.
    </quote>

    Where does that say anything about it being dependent on what is being
    written or what is being read? It just talks about reads and writes
    being moved in terms of their position in the CIL sequence.

    So, no write that occurs before the call to Monitor.Exit in the IL can
    be moved beyond the call to Monitor.Exit in the memory model, and no
    read that occurs after Monitor.Enter in the IL can be moved to earlier
    than Monitor.Enter in the memory model. That's all that's required for
    thread safety.
    Writing a sample "that works" is meaningless to me. I've dealt with
    thousands of snippets of code "that worked" in certain circumstances (usually
    resulting in me fixing them to "really work").
    I'm not talking about certain circumstances - I'm talking about
    *guarantees* provided by the CLI spec.

I'm saying that I can write code which doesn't use volatile but which
is *guaranteed* to work. I believe you won't be able to provide any
example of how it could fail without the CLI spec itself being
violated.
You're free to interpret the spec any way you want, and if you've gotten
information from Chris or Vance, you've got their interpretation of the spec,
and, best case, you've got information specific to Microsoft's JIT/IL
compilers.
    Well, I've got information specific to the .NET 2.0 memory model (which
    is stronger than the CLI specified memory model) elsewhere.

However, I feel pretty comfortable having the interpretation of experts
who possibly contributed to the spec, or at least have direct contact
with those who wrote it.
Based upon the spec, I *know* that this is safe code:

public volatile int number;
public void DoSomething() {
    this.number = 1;
}
This is equally as safe:

public volatile int number;
public void DoSomething() {
    lock(locker) {
        this.number = 1;
    }
}
I think it's open to interpretation of the spec whether this is safe:

public int number;
public void DoSomething() {
    lock(locker) {
        this.number = 1;
    }
}
    Well, this is why I suggested that I post a complete program - then you
    could suggest ways in which it could go wrong, and I think I'd be able
    to defend it in fairly clear-cut terms.
...it might be safe in Microsoft's implementations; but that's not open
information and I don't think it's due to Monitor.Enter/Monitor.Exit.
    I *hope* we won't just have to agree to disagree, but I realise that
    may be the outcome :(
I don't see what the issue with volatile is, if you're not using "volatile"
for synchronization. Worst case with this:

public volatile int number;
public void DoSomething() {
    this.number = 1;
}

you've explicitly stated your volatility usage/expectation: more readable,
makes no assumptions...
    It implies that without volatility you've got problems - which you
    haven't (provided you use locking correctly). This means you can use a
    single way of working for *all* types, regardless of whether you can
    use the volatile modifier on them.
Whereas:

public int number;
public void DoSomething() {
    lock(locker) {
        this.number = 1;
    }
}

...best case, this isn't as readable because it uses implicit volatility
side-effects.
    If you're not used to that being the idiom, you're right. However, if
    I'm writing thread-safe code (most types don't need to be thread-safe)
    I document what lock any shared data comes under. I can rarely get away
    with a single operation anyway.

    Consider the simple change from this:

    this.number = 1;

    to this:

    this.number++;

    With volatile, your code is now broken - and it's not obvious, and
    probably won't show up in testing. With lock, it's not broken.
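Jon's point can be sketched outside C# as well. The following is a minimal, hypothetical Java analogue (Java's `synchronized` plays the role of C#'s `lock`; the `Counter` class and its members are invented for illustration), showing that a read-modify-write stays correct under a lock with no `volatile` anywhere:

```java
// Hypothetical Java analogue of lock-based increment: number++ is a
// read-modify-write, but because every access happens inside the same
// lock, no update is lost and each thread sees the latest value.
public class Counter {
    private final Object locker = new Object();
    private int number;  // deliberately NOT volatile

    public void increment() {
        synchronized (locker) {  // analogue of C#'s lock(locker)
            number++;
        }
    }

    public int get() {
        synchronized (locker) {
            return number;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        Runnable work = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get());  // always 200000: no lost updates
    }
}
```

The same two threads incrementing a merely volatile field could interleave their read/increment/write steps and lose updates, which is exactly the silent breakage described above.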
What happens with the following code?

internal class Tester {
    private Object locker = new Object();
    private Random random = new Random();
    public int number;

    public Tester()
    {
        DoWork(false);
    }

    public void UpdateNumber() {
        Monitor.Enter(locker);
        DoWork(true);
    }
    What happens here is that I don't let this method go through code
    review. There have to be *very* good reasons not to use lock{}, and in
    those cases there would almost always still be a try/finally.

    I wouldn't consider using volatile just to avoid the possibility of
    code like this (which I've never seen in production, btw).

    private void DoWork(Boolean doOut) {
        this.number = random.Next();
        if(doOut)
        {
            switch(random.Next(2))
            {
                case 0:
                    Out1();
                    break;
                case 1:
                    Out2();
                    break;
            }
        }
    }

    private void Out1() {
        Monitor.Exit(this.locker);
    }

    private void Out2() {
        Monitor.Exit(this.locker);
    }
}
...clearly there isn't enough information merely from the existence of
Monitor.Enter and Monitor.Exit to maintain those guarantees.
    It's the other way round - the JIT compiler doesn't have enough
    information to perform certain optimisations, simply because it can't
    know whether or not Monitor.Exit will be called.

    Assuming the CLR follows the spec, it can't move the write to number to
    after the call to random.Next() - because that call to random.Next()
    may involve releasing a lock, and it may involve a write.

    Now, I agree that it really limits the scope of optimisation for the
    JIT - but that's what the CLI spec says.
    Again you're treating atomicity as almost interchangeable with
    volatility,
    <snip>
    No, I'm not. I said you don't need to synchronize an atomic invariant but
    you still need to account for its volatility (by declaring it volatile). I
    didn't say volatility was a secondary concern, I said it needs to be
    accounted for equally. I was implying that using the "lock" keyword is not
    as clear in terms of volatility assumptions/needs as is the "volatile"
keyword. If I read some code that uses "lock", I can't assume the author
    did that for volatility reasons and not just synchronization reasons; whereas
    if she had put "volatile" on a field, I know for sure why she put that there.
    I use lock when I'm going to use shared data. When I use shared data, I
    want to make sure I don't ignore previous changes - hence it needs to
    be volatile.

    Volatility is a natural consequence of wanting exclusive access to a
    shared variable - which is why exactly the same strategy works in Java,
    by the way (which has a slightly different memory model). Without the
    guarantees given by the CLI spec, having a lock would be pretty much
    useless.
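Since Jon appeals to the same strategy working in Java, here is a small sketch of that idiom (the `Handoff` class and its names are made up for illustration): plain, non-volatile fields are handed between threads safely because both sides use the same lock, whose release/acquire edges order the writes and reads.

```java
// Sketch: no field is volatile, yet the consumer is guaranteed to see the
// producer's writes, because both touch the data only under 'locker'.
public class Handoff {
    private final Object locker = new Object();
    private int payload;    // plain field, guarded by locker
    private boolean ready;  // plain field, guarded by locker

    public void publish(int value) {
        synchronized (locker) {  // releasing the lock: writes can't move past it
            payload = value;
            ready = true;
        }
    }

    public int awaitValue() {
        while (true) {
            synchronized (locker) {  // acquiring the lock: sees published writes
                if (ready) return payload;
            }
            Thread.yield();  // not ready yet; let the producer run
        }
    }

    public static void main(String[] args) {
        Handoff h = new Handoff();
        new Thread(() -> h.publish(42)).start();
        System.out.println(h.awaitValue());  // prints 42
    }
}
```

Remove the two `synchronized` blocks and the consumer is no longer guaranteed ever to see `ready` become true - the visibility really does come from the lock.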
    This *is* guaranteed, it's the normal way of working in the framework
    (as Willy said, look for volatile fields in the framework itself)
Which ones? Like Hashtable.version or StringBuilder.m_StringValue?
Yup, there are a few - but I believe there are far more places which
use the natural (IMO) way of sharing data via exclusive access, and
take into account the volatility that naturally provides.

    --
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
    If replying to the group, please do not mail me too


    • Willy Denoyette [MVP]

      #47
Re: When is "volatile" used instead of "lock"?

"Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
news:1182265936.322285.14370@q75g2000hsh.googlegroups.com...
On Jun 19, 3:11 pm, Peter Ritchie [C# MVP] <PRS...@newsgroups.nospam>
      wrote:
      >I guess we'll just have to disagree on a few things, for the reasons I've
      >already stated. I don't see much point in going back and forth saying
      >the
      >same things...
      >
      I should say (and I've only just remembered) that a few years ago I
      was unsure where the safety came from, and I mailed someone (Vance
      Morrison? Chris Brumme?) who gave me the explanation I've been giving
      you.
      >
      >With regard to runtime volatile read/writes and acquire/release semantics
      >of
>Monitor.Enter and Monitor.Exit we can agree.
      >>
      >I don't agree that anything specified in either 334 or 335 covers all
      >levels
      >of potential compile-time class member JIT/IL compiler optimizations.
      >
      It specifies how the system as a whole must behave: given a certain
      piece of IL, there are
      >
      >I don't agree that "int number; void UpdateNumber(){ lock(locker){
      >number++;}}" is equally as safe as "volatile int number; void
      >UpdateNumber() {
      >number++; }"
      >
      I agree - the version without the lock is *unsafe*. Two threads could
      both read, then both increment, then both store in the latter case.
      With the lock, everything is guaranteed to work.
      >
      >With the following Monitor.Enter/Exit IL, for example:
      >
      <snip>
      >
      >...what part of that IL tells the JIT/IL compiler that Tester.number
      >specifically should be treated differently--where lines commented // *
      >are
      >the only lines distinct to usage of Monitor.Enter/Exit?
      >
      The fact that it knows Monitor.Enter is called, so the load (in the
      logical memory model) cannot occur before Monitor.Enter. Likewise it
      knows that Monitor.Exit is called, so the store can't occur after
      Monitor.Exit. If it calls another method which *might* call
      Monitor.Enter/Exit, it likewise can't move the reads/writes as that
      would violate the spec.
      >
      >...where an IL compiler is given ample amounts of information that
>Tester.number should be treated differently.
      >
      It's being given ample
      >
      >I don't think it's safe, readable, or future friendly to utilize syntax
      >strictly for their secondary consequences (using Monitor.Enter/Exit not
      >for
      >synchronizatio n but for acquire/release semantics. As in the above line
      >where modification of an int is already atomic; "synchronizatio n" is
      >irrelevant), even if they were effectively identical to another syntax.
      >Yes,
      >if you've got a non-atomic invariant you still have to synchronize (with
      >lock, etc.)... but volatility is different and needs to be accounted for
      >equally as much as thread-safety.
      >
      Again you're treating atomicity as almost interchangeable with
      volatility, when they're certainly not. Synchronization is certainly
      relevant whether or not writes are atomic. Atomicity just states that
you won't see a "half way" state; volatility states that you will see
      the "most recent" value. That's a huge difference.
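That distinction can be made concrete with a hedged Java sketch (the `AtomicityVsVisibility` class and its members are invented for illustration; Java's `AtomicInteger` stands in for an interlocked increment):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Atomicity: you never observe a half-written value.
// Volatility/visibility: you always observe the most recent value.
// 'volatile' gives the second, but NOT an atomic read-modify-write.
public class AtomicityVsVisibility {
    private volatile int vol;  // reads always see the latest write...
    private final AtomicInteger atomic = new AtomicInteger();

    // ...but vol++ is read-modify-write: two racing threads can lose updates.
    public void unsafeIncrement() { vol++; }
    public int volValue() { return vol; }

    // getAndIncrement is atomic AND visible: no update is ever lost.
    public void safeIncrement() { atomic.getAndIncrement(); }
    public int atomicValue() { return atomic.get(); }
}
```

Single-threaded, both increments behave identically; only under contention does the non-atomic volatile increment lose updates, which is why the two properties must be kept separate.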
      >
      The volatility is certainly not just a "secondary consequence" - it's
      vital to the usefulness of locking.
      >
      Consider a type which isn't thread-aware - in other words, nothing is
      marked as volatile, but it also has no thread-affinity. That should be
      the most common kind of type, IMO. You can't retrospectively mark the
      fields as being volatile, but you *do* want to ensure that if you use
      objects of the type carefully (i.e. always within a consistent lock)
      you won't get any unexpected behaviour. Due to the guarantees of
      locking, you're safe. Otherwise, you wouldn't be. Without that
      guarantee, you'd be entirely at the mercy of type authors for *all*
      types that *might* be used in a multi-threaded environment making all
      their fields volatile.
      >
      Further evidence that it's not just a secondary effect, but one which
      certainly *can* be relied on: there's no other thread-safe way of
      using doubles. They *can't* be marked as volatile - do you really
      believe that MS would build .NET in such a way that wouldn't let you
      write correct code to guarantee that you see the most recent value of
      a double, rather than one cached in a register somewhere?
      >
      This *is* guaranteed, it's the normal way of working in the framework
      (as Willy said, look for volatile fields in the framework itself) and
      it's perfectly fine to rely on it.
      >

I see that my remark about the FCL was too strongly worded; I didn't mean to
say that "volatile" fields were not used at all in the FCL. Sure, they are
used, but only in a context where the author wanted to guarantee that a
field (most often a bool) access had acquire/release semantics and would not
be reordered, not in the context of a locked region. Also note that a large
part of the FCL was written against v1.0 (targeting X86 only) at a time when
there was no VolatileRead and long before the Interlocked class was
introduced.
The latest bits in the FCL more often use Interlocked and VolatileXXX
operations than the volatile modifier.
      Also note that volatile does not imply a memory barrier, while lock,
      Interlocked ops. and VolatileXXX do effectively imply a MemoryBarrier. The
      way the barrier is implemented is platform specific, on X86 and X64 a full
      barrier is raised, while on IA64 it depends on the operation.
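The Interlocked-style code Willy mentions can be sketched with Java's rough analogue (an assumption of this sketch): `AtomicInteger.compareAndSet` standing in for `Interlocked.CompareExchange`, with the `CasCounter` class invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free read-modify-write via a CAS loop, the pattern behind
// Interlocked-style code: read, compute, then publish atomically,
// retrying if another thread got there first.
public class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    public int addTen() {
        while (true) {
            int seen = value.get();               // volatile read
            int want = seen + 10;
            if (value.compareAndSet(seen, want))  // atomic compare-and-swap
                return want;                      // we won the race
            // else: another thread updated value first; retry
        }
    }
}
```

The compare-and-swap both updates atomically and acts as the barrier Willy describes, so no lock is needed for this read-modify-write.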


      Willy.


• Peter Ritchie [C# MVP]

        #48
Re: When is "volatile" used instead of "lock"?

        "Jon Skeet [C# MVP]" wrote:
        I'm saying that I can write code which doesn't use volatile but which
        is *guaranteed* to work. I believe you won't be able to provide any
        exmaple of how it could fail without the CLI spec itself being
        violated.
Actually, I'm having a hard time getting the JIT to optimize *any* member
fields, even with lack of locking. Local variables seem to be optimized into
registers easily, but not member fields...

If I could get an optimization of a member field I believe I would be able
to show an example.

        For example:
private Random random = new Random();
public int Method()
{
    int result = 0;
    for(int i = 0; i < this.random.Next(); ++i)
    {
        result += 10;
    }
    return result;
}

        ebx is used for result (and edi for i) while in the loop; but with:
private Random random = new Random();
private int number;
public int Method()
{
    for(int i = 0; i < this.random.Next(); ++i)
    {
        this.number += 10;
    }
    return this.number;
}

...number is always accessed directly and never optimized to a register. I
think I'd find the same thing with re-ordering.


        • Jon Skeet [C# MVP]

          #49
Re: When is "volatile" used instead of "lock"?

Peter Ritchie [C# MVP] <PRSoCo@newsgroups.nospam> wrote:
          "Jon Skeet [C# MVP]" wrote:
          I'm saying that I can write code which doesn't use volatile but which
          is *guaranteed* to work. I believe you won't be able to provide any
example of how it could fail without the CLI spec itself being
          violated.
          Actually, I'm having a hard time getting the JIT to optimize *any* member
fields, even with lack of locking. Local variables seem to be optimized into
          registers easily, but not member fields...
          I can well believe that, just as an easy way of fulfilling the spec.
If I could get an optimization of a member field I believe I would be able
to show an example.
          Well, rather than arguing from a particular implementation (which, as
          you've said before, may be rather stricter than the spec requires) I'd
          be perfectly happy arguing from the spec itself. Then at least if there
          are precise examples where I interpret the spec to say one thing and
          you interpret it a different way, we'll know exactly where our
          disagreement is.

          <snip code>

          --
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
          If replying to the group, please do not mail me too


          • Willy Denoyette [MVP]

            #50
Re: When is "volatile" used instead of "lock"?

"Peter Ritchie [C# MVP]" <PRSoCo@newsgroups.nospam> wrote in message
news:B3CC9E00-7F15-4259-A12D-C485DD525C02@microsoft.com...
"Jon Skeet [C# MVP]" wrote:
>I'm saying that I can write code which doesn't use volatile but which
>is *guaranteed* to work.
<snip>
...number is always accessed directly and never optimized to a register. I
think I'd find the same thing with re-ordering.



In your sample, the member field has to be read from the object location in
the GC heap, and after the addition it has to be written back to the same
location.
The write "this.number += ..." must be a store with release semantics to
fulfill the rules imposed by the CLR memory model. Note that this model
derives from the ECMA model!

            The assembly code of the core part of the loop, looks something like this
            (your mileage may vary):

            mov eax,dword ptr [ebp-10h]
            add dword ptr [eax+8],0Ah

            here the object reference of the current instance (this) is loaded from
            [ebp-10h] and stored in eax, after which 0Ah is added to the location of the
            'number' field [eax+8].

            Question is what else do you expect to optimize any further, and what are
            you expecting to illustrate?

            Willy.


            • Ben Voigt [C++ MVP]

              #51
Re: When is "volatile" used instead of "lock"?


"Willy Denoyette [MVP]" <willy.denoyette@telenet.be> wrote in message
news:etCf632sHHA.1376@TK2MSFTNGP02.phx.gbl...
"Peter Ritchie [C# MVP]" <PRSoCo@newsgroups.nospam> wrote in message
news:B3CC9E00-7F15-4259-A12D-C485DD525C02@microsoft.com...
<snip>
Question is what else do you expect to optimize any further, and what are
you expecting to illustrate?
              I know what just *did* get illustrated -- that the .NET JIT doesn't optimize
              nearly as well as the C++ optimizing compiler.


• Peter Ritchie [C# MVP]

                #52
Re: When is "volatile" used instead of "lock"?

Willy, I'm not following where you are going with your comment. As I've
said, my example should have been number = 10 (or something similar) to
capture CLI atomicity guarantees; but either operation is optimized to a
single opcode on x86:

                x86 for number += 10:
                int count = random.Next();
                00000000 56 push esi
                00000001 8B F1 mov esi,ecx
                00000003 8B 4E 04 mov ecx,dword ptr [esi+4]
                00000006 8B 01 mov eax,dword ptr [ecx]
                00000008 FF 50 3C call dword ptr [eax+3Ch]
                0000000b 8B D0 mov edx,eax
for(int i = 0; i < count; ++i)
                0000000d 33 C0 xor eax,eax
                0000000f 85 D2 test edx,edx
                00000011 7D 0B jge 0000001E
                {
                number += 10;
                00000013 83 46 08 0A add dword ptr [esi+8],0Ah
for(int i = 0; i < count; ++i)
                00000017 83 C0 01 add eax,1
                0000001a 3B C2 cmp eax,edx
                0000001c 7F F5 jg 00000013

                and x86 for number = 10:
                int count = random.Next();
                00000000 56 push esi
                00000001 8B F1 mov esi,ecx
                00000003 8B 4E 04 mov ecx,dword ptr [esi+4]
                00000006 8B 01 mov eax,dword ptr [ecx]
                00000008 FF 50 3C call dword ptr [eax+3Ch]
                0000000b 8B D0 mov edx,eax
for(int i = 0; i < count; ++i)
                0000000d 33 C0 xor eax,eax
                0000000f 85 D2 test edx,edx
                00000011 7D 0E jge 00000021
                {
                number = 10;
                00000013 C7 46 08 0A 00 00 00 mov dword ptr [esi+8],0Ah
for(int i = 0; i < count; ++i)
                0000001a 83 C0 01 add eax,1
                0000001d 3B C2 cmp eax,edx
                0000001f 7F F2 jg 00000013

I don't know if your comment was supposed to show the adjacentness of object
reference load to the increment; but it's clearly not doing that (the object
reference load is hoisted out of the loop and before the Next() call, which
is where it needs it first). If that wasn't your point, pardon my ramblings;
but they do provide basis for what follows...

But, the difference here is irrelevant. It's the difference between a
member field and a local variable. x86 for the same code with a local
variable instead of a member field:
                int count = random.Next();
                00000000 8B C1 mov eax,ecx
                00000002 8B 48 04 mov ecx,dword ptr [eax+4]
                00000005 8B 01 mov eax,dword ptr [ecx]
                00000007 FF 50 3C call dword ptr [eax+3Ch]
                0000000a 8B D0 mov edx,eax
                int result = 0;
                0000000c 33 C9 xor ecx,ecx
for (int i = 0; i < count; ++i)
                0000000e 33 C0 xor eax,eax
                00000010 85 D2 test edx,edx
                00000012 7D 0A jge 0000001E
                {
                result += 10;
                00000014 83 C1 0A add ecx,0Ah
for (int i = 0; i < count; ++i)
                00000017 83 C0 01 add eax,1
                0000001a 3B C2 cmp eax,edx
                0000001c 7F F6 jg 00000014

                I was expecting the JIT to do much better optimizations (looping x times
                assigning the same value to number) than it had. Sure, the difference
                between an single add with a register and a memory location is small... In
                the increment case, I was expecting something similar to the local variable:
                by using a register for the duration of the loop. If Next() returned 10, the
                loop would effectively be:

                number += 10; number += 10; number += 10; number += 10;
                number += 10; number += 10; number += 10; number += 10;
                number += 10; number += 10; // joined for brevity

All adjacent writes on the same thread, where optimizing to a register would
mean removing a write...

                And, in fact, if you do 10 increments instead of a loop, the JIT *still*
                won't optimize any writes away. I know it knows how; because it will do it
                with local variables.

Which leads me to believe that the JIT implementation is giving people a
false sense of security by not introducing things clearly accounted for in
the specs. Not to mention, it makes the whole discussion somewhat moot.


                • Willy Denoyette [MVP]

                  #53
Re: When is "volatile" used instead of "lock"?

"Peter Ritchie [C# MVP]" <PRSoCo@newsgroups.nospam> wrote in message
news:8C109B5F-F143-4096-90CA-9305CEF659D4@microsoft.com...
                  Willy, I'm not following where you are going with your comment. As I've
                  said, my example should have been number = 10 (or something similar) to
                  capture CLI atomicity guarentees; but either operation is optimized to a
                  single opcode on x86:
                  >
                  x86 for number += 10:
                  int count = random.Next();
                  00000000 56 push esi
                  00000001 8B F1 mov esi,ecx
                  00000003 8B 4E 04 mov ecx,dword ptr [esi+4]
                  00000006 8B 01 mov eax,dword ptr [ecx]
                  00000008 FF 50 3C call dword ptr [eax+3Ch]
                  0000000b 8B D0 mov edx,eax
                  for(int i = 0; i count; ++i)
                  0000000d 33 C0 xor eax,eax
                  0000000f 85 D2 test edx,edx
                  00000011 7D 0B jge 0000001E
                  {
                  number += 10;
                  00000013 83 46 08 0A add dword ptr [esi+8],0Ah
for(int i = 0; i < count; ++i)
                  00000017 83 C0 01 add eax,1
                  0000001a 3B C2 cmp eax,edx
                  0000001c 7F F5 jg 00000013
                  >
                  and x86 for number = 10:
                  int count = random.Next();
                  00000000 56 push esi
                  00000001 8B F1 mov esi,ecx
                  00000003 8B 4E 04 mov ecx,dword ptr [esi+4]
                  00000006 8B 01 mov eax,dword ptr [ecx]
                  00000008 FF 50 3C call dword ptr [eax+3Ch]
                  0000000b 8B D0 mov edx,eax
for(int i = 0; i < count; ++i)
                  0000000d 33 C0 xor eax,eax
                  0000000f 85 D2 test edx,edx
                  00000011 7D 0E jge 00000021
                  {
                  number = 10;
                  00000013 C7 46 08 0A 00 00 00 mov dword ptr [esi+8],0Ah
for(int i = 0; i < count; ++i)
                  0000001a 83 C0 01 add eax,1
                  0000001d 3B C2 cmp eax,edx
                  0000001f 7F F2 jg 00000013
                  >
I don't know if your comment was supposed to show the adjacency of the
object reference load to the increment; but it's clearly not doing that
(the object reference load is hoisted out of the loop and before the Next()
call, which is where it's first needed). If that wasn't your point, pardon
my ramblings; but they do provide a basis for what follows...
                  >

This is because you run this code in a "managed debugger"; the JIT produces
different code from what is produced when no managed debugger is attached!
You need to run the code (release version) in a native debugger to see what
the JIT really produces for 'release' mode code.
As unmanaged debugger you can use any of the debuggers from the Debugging
Tools for Windows, like windbg, sdb etc... (which is what I prefer, because
it's more powerful than the VS2005 debugger).
You can also use VS2005 as unmanaged debugger, but you need to make sure you
break into an unmanaged debugging session. That means you cannot use
System.Diagnostics.Debugger.Break(); you have to call Kernel32.dll's
DebugBreak().

                  Here is the PInvoke signature:
[DllImport("kernel32"), SuppressUnmanagedCodeSecurity]
static extern void DebugBreak();

Add a call to DebugBreak() in your code, run the program without debugging
(CTRL+F5) from within VS, and wait until a break is hit. Select 'Debug' and
pick the current VS instance from the list in the "VS JIT Debugger" dialog.
When the break message is hit, press 'Break', and in the following dialog
press 'Show Disassembly'.
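Putting Willy's steps together, a sketch of the harness (the class and method names are my own; note that DebugBreak() raises a breakpoint exception that terminates the process if no debugger attaches, hence the opt-in guard):

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

static class NativeBreakDemo
{
    // Unmanaged breakpoint: the "VS JIT Debugger" dialog then offers a native
    // attach, unlike System.Diagnostics.Debugger.Break(), which signals a
    // managed debugger to the CLR.
    [DllImport("kernel32"), SuppressUnmanagedCodeSecurity]
    static extern void DebugBreak();

    public static void BreakIfRequested(bool requested)
    {
        // An unhandled EXCEPTION_BREAKPOINT kills the process, so only break
        // when explicitly requested (e.g. via a command-line flag).
        if (requested)
            DebugBreak();
    }
}
```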

                  When you hit this point you'll see this (partly stripped) :

                  X86, member variable number = 10;
                  ....
                  0032013A mov eax,dword ptr [ebp-10h]
                  0032013D mov dword ptr [eax+8],0Ah
                  00320144 add edx,1
                  00320147 cmp edx,esi
                  00320149 jl 0032013A
                  0032014B mov eax,dword ptr [ebp-10h]
                  0032014E mov eax,dword ptr [eax+8]
                  .......


The first two instructions are the load of the 'this' instance pointer into
eax, and the store of '0Ah' into the member field 'number' of 'this'.
This sequence is repeated until the loop counter reaches the count value.
But, the difference here is irrelevant. It's the difference between a
member field and a local variable. x86 for the same code with a local
variable instead of a member field:
                  int count = random.Next();
                  00000000 8B C1 mov eax,ecx
                  00000002 8B 48 04 mov ecx,dword ptr [eax+4]
                  00000005 8B 01 mov eax,dword ptr [ecx]
                  00000007 FF 50 3C call dword ptr [eax+3Ch]
                  0000000a 8B D0 mov edx,eax
                  int result = 0;
                  0000000c 33 C9 xor ecx,ecx
for (int i = 0; i < count; ++i)
                  0000000e 33 C0 xor eax,eax
                  00000010 85 D2 test edx,edx
                  00000012 7D 0A jge 0000001E
                  {
                  result += 10;
                  00000014 83 C1 0A add ecx,0Ah
for (int i = 0; i < count; ++i)
                  00000017 83 C0 01 add eax,1
                  0000001a 3B C2 cmp eax,edx
                  0000001c 7F F6 jg 00000014
                  >
                  I was expecting the JIT to do much better optimizations (looping x times
                  assigning the same value to number) than it had.
                  That's right, the JIT optimizer is quite conservative when optimizing loops.
                  However, I don't know who writes code like this:
                  for(int i = 0; i < count ; ++i)
                  result = 10;

Sure, the difference between a single add with a register and a memory
location is small... In the increment case, I was expecting something
similar to the local variable: using a register for the duration of the
loop. If Next() returned 10, the loop would effectively be:
                  >
                  number += 10; number += 10; number += 10; number += 10;
                  number += 10; number += 10; number += 10; number += 10;
                  number += 10; number += 10; // joined for brevity
                  >
Same here, this can be optimized by storing the number in a local before
running the loop and, once done, moving the local to the field variable.
This is something you should do whenever you are dealing with field
variables in long-running algorithms.
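The pattern Willy recommends can be sketched as follows (illustrative names, not code from the original posts):

```csharp
using System;

class Accumulator
{
    private int number;

    // Naive version: reads and writes the member field on every iteration.
    public void AddDirect(int count)
    {
        for (int i = 0; i < count; ++i)
            number += 10;
    }

    // Willy's suggestion: accumulate in a register-friendly local,
    // then write the result back to the field once after the loop.
    public void AddViaLocal(int count)
    {
        int local = number;
        for (int i = 0; i < count; ++i)
            local += 10;
        number = local;
    }

    public int Number => number;
}
```

Both methods compute the same result; the second simply hands the JIT a local it is willing to enregister, which is the optimization the disassembly listings above show it will not perform on a member field.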

Granted, in the sample above, the loop won't be optimized as aggressively as
a native compiler would do (a C compiler will hoist the loop completely),
but again, I don't know if anyone writes code like this.
All adjacent writes on the same thread, where optimizing to a register would
mean removing a write...
                  >
And, in fact, if you do 10 increments instead of a loop, the JIT *still*
won't optimize any writes away. I know it knows how, because it will do it
with local variables.
                  >
Which leads me to believe that the JIT implementation is giving people a
false sense of security by not introducing optimizations clearly allowed for
in the specs. Not to mention, it makes the whole discussion somewhat moot.
                  >
Don't know what this has to do with security and the specs; this is about
loop optimizing, right?

                  Willy.

                  Comment

• Peter Ritchie [C# MVP]

                    #54
Re: When is "volatile" used instead of "lock"?

                    "Willy Denoyette [MVP]" wrote:
                    <snip>
This is because you run this code in a "managed debugger"; the JIT produces
different code from what is produced when no managed debugger is attached!
You need to run the code (release version) in a native debugger to see what
the JIT really produces for 'release' mode code.
As unmanaged debugger you can use any of the debuggers from the Debugging
Tools for Windows, like windbg, sdb etc... (which is what I prefer, because
it's more powerful than the VS2005 debugger).
You can also use VS2005 as unmanaged debugger, but you need to make sure you
break into an unmanaged debugging session. That means you cannot use
System.Diagnostics.Debugger.Break(); you have to call Kernel32.dll's
DebugBreak().
I've been using Vance Morrison's guide to observing optimized managed code [1].
                    That's right, the JIT optimizer is quite conservative when optimizing loops.
                    As I already pointed out, it optimized identical loops--not using member
                    fields.
                    However, I don't know who writes code like this:
                    for(int i = 0; i < count ; ++i)
                    result = 10;
                    That's irrelevant, the optimizer doesn't know who's likely to write what
                    code. The exercise is to show optimized code.
Same here, this can be optimized by storing the number in a local before
running the loop and, once done, moving the local to the field variable.
This is something you should do whenever you are dealing with field
variables in long-running algorithms.
                    >
Granted, in the sample above, the loop won't be optimized as aggressively as
a native compiler would do (a C compiler will hoist the loop completely),
                    Huh? In the post you replied to I showed an example where the JIT *did*
                    hoist the loop completely, just not with member fields.
Don't know what this has to do with security and the specs; this is about
loop optimizing, right?
No, as I pointed out, it's about getting an example of JIT optimization of
member fields. It doesn't have to be a loop; it's just that loop
optimization is easy to generate.

                    [1] http://blogs.msdn.com/vancem/archive...20/535807.aspx

                    Comment

                    • Willy Denoyette [MVP]

                      #55
Re: When is "volatile" used instead of "lock"?

"Peter Ritchie [C# MVP]" <PRSoCo@newsgroups.nospam> wrote in message
news:0BC83EEC-7D0B-4E28-B06D-F3B284E7701E@microsoft.com...
                      "Willy Denoyette [MVP]" wrote:
                      <snip>
>This is because you run this code in a "managed debugger"; the JIT
>produces
>different code from what is produced when no managed debugger is
>attached!
>You need to run the code (release version) in a native debugger to see
>what
>the JIT really produces for 'release' mode code.
>As unmanaged debugger you can use any of the debuggers from the Debugging
>Tools for Windows, like windbg, sdb etc... (which is what I prefer,
>because
>it's more powerful than the VS2005 debugger).
>You can also use VS2005 as unmanaged debugger, but you need to make sure
>you
>break into an unmanaged debugging session. That means you cannot use
>System.Diagnostics.Debugger.Break(); you have to call Kernel32.dll's
>DebugBreak().
                      >
I've been using Vance Morrison's guide to observing optimized managed code
[1]
                      >
Which is wrong; it doesn't show the machine code the JIT would produce when
you don't run the code in a managed debugger (VS debugger or mdbg). The CLR
knows that it runs in the managed debugger, using the "managed debugger
interfaces" (ICorDebug - COM interfaces), and forces the JIT to produce
different code than it would when no managed debugger is attached! What
Vance calls optimized code is not what the JIT produces when run outside of
the debugger. That's why I always use Windbg to analyze assembly code.

You don't have to believe me, just do as I said and try to run the code in
windbg (you can download the latest builds for free from
http://www.microsoft.com/whdc/devtoo...g/default.mspx), or, as I have
explained in my previous post, using VS2005, but take care not to run in the
VS Debugger!
If you still don't believe me, you can ngen your code and run "dumpbin
/rawdata program.ni.exe", where "program" is the name of the assembly. The
ngen'd image can be found in:
C:\Windows\assembly\NativeImages_v2.0.50727_32\blable....

                      The output should contain something like:

                      30002640: 8B 45 F0 C7 40 08 0A 00 00 00 83 C2 01 3B D1 7C .E­Ã@......â” ¬.;Ð|
                      30002650: EF 8B 45 F0 8B 40 08 8B 7D CC 89 7E 0C 8D 65 F4 ´.E­.@..}╠.~..e¶
                      30002660: 5B 5E 5F 5D C3 CC CC CC BF 27 00 30 6A 46 00 30 [^_]├╠╠╠┐'.0jF.0
                      .....

To find the exact addresses you will have to look at the unmanaged debugger
output...
Here is how it looks when I ran this in windbg:

vols2_ni!Willys.Test.Method2()+0x7c:
30002640 8b45f0 mov eax,dword ptr [ebp-10h]
30002643 c740080a000000 mov dword ptr [eax+8],0Ah
3000264a 83c201 add edx,1
3000264d 3bd1 cmp edx,ecx
3000264f 7cef jl vols2_ni!Willys.Test.Method2()+0x7c
(30002640)

                      Now you just have to compare the sequences of bytes.
What I see when running in windbg or sdb or VS2005's VSJIT debugger and what
I see produced by ngen are exactly the same; are you telling me that what I
see is not correct?
                      >That's right, the JIT optimizer is quite conservative when optimizing
                      >loops.
                      >
                      As I already pointed out, it optimized identical loops--not using member
                      fields.
                      >
                      >However, I don't know who writes code like this:
                      > for(int i = 0; i < count ; ++i)
                      > result = 10;
                      >
                      That's irrelevant, the optimizer doesn't know who's likely to write what
                      code. The exercise is to show optimized code.
                      >
                      >Same here, this can be optimized by storing the number in a local before
                      >running the loop and once done moving the local to the field variable.
                      >This
                      >is something you should do whenever you are dealing with field variable
                      >in
                      >long running algorithms.
                      >>
                      >Granted, in the sample above, the loop won't be optimized as agressively
                      >as
                      >a native compiler would do (a C compiler will hoist the loop completely),
                      >
                      Huh? In the post you replied to I showed an example where the JIT *did*
                      hoist the loop completely, just not with member fields.
                      >
                      Yes, but again you were running in the managed debugger!
                      > Don't know what this has to do with security and the specs, this is
                      >about
                      >loop optimizing, right?.
                      >
No, as I pointed out, it's about getting an example of JIT optimization of
member fields. It doesn't have to be a loop; it's just that loop
optimization is easy to generate.
                      >
                      [1] http://blogs.msdn.com/vancem/archive...20/535807.aspx
                      >
                      Willy.

                      Comment

• Peter Ritchie [C# MVP]

                        #56
Re: When is "volatile" used instead of "lock"?

It appears the x86 JIT's interpretation (or that of its design team) is
somewhat similar to my interpretation, in that compile-time optimization
restrictions and run-time acquire/release semantics are separate
considerations.

The JIT is using the protected region as the guard in which not to optimize,
not Monitor.Enter/Monitor.Exit. Anything within a try block will be
considered a volatile operation by the compile-time optimizer; it has
nothing to do with Enter/Exit or directly with "lock" (other than that lock
is implemented with a try block). Secondary to that, and outside of any
documentation I've been able to find, it appears that all member access (I
only tried instance, not class) is considered a volatile operation by the
JIT in terms of optimizations, regardless of being in or out of a protected
region (obviously acquire/release semantics are the responsibility of
Enter/Exit, MemoryBarrier, etc. and are not implicitly obtained without
them). Therefore acquire/release semantics guarantees do not directly
affect what the JIT decides not to optimize.

                        For example:
                        int result = 0;
Monitor.Enter(locker);
                        result += 10;
                        result += 10;
Monitor.Exit(locker);
                        result += 10;
                        result += 10;
                        result += 10;
                        return result;

                        ....is compile-time optimized by the JIT to the equivalent of:
Monitor.Enter(locker);
Monitor.Exit(locker);
                        return 50;

....and you get acquire/release semantics on nothing in the current thread
(other than "locker" in the call to Exit).

                        And:
                        int result = 0;
                        try
                        {
                        result += 10;
                        result += 10;
                        }
                        finally {
                        result += 10;
                        result += 10;
                        result += 10;
                        }
                        return result;

                        ....is compile-time optimized by the JIT to the equivalent of:
                        int result = 0;
                        try
                        {
                        result += 10;
                        result += 10;
                        }
                        finally {
                        result += 30;
                        }
                        return result;

                        ....but I do not get acquire/release semantics within the try block.

                        And finally:

                        int result = 0;
Monitor.Enter(locker);
                        try {
                        result += 10;
                        result += 10;
                        } finally {
Monitor.Exit(locker);
                        result += 10;
                        result += 10;
                        result += 10;
                        }
                        return result;

                        ....is compile-time optimized by the JIT to the equivalent of:

                        int result = 0;
Monitor.Enter(locker);
                        try {
                        result += 10;
                        result += 10;
                        } finally {
Monitor.Exit(locker);
                        result += 30;
                        }
                        return result;

                        ....and this is the only example where I get the JIT optimization AND
                        acquire/release semantics guarantees you've been talking about.

                        I don’t believe this compile-time optimization behaviour is covered clearly,
                        if at all, in 335.

                        --
                        Browse http://connect.microsoft.com/VisualStudio/feedback/ and vote.

                        Microsoft MVP, Visual Developer - Visual C#

                        Comment

                        • Ben Voigt [C++ MVP]

                          #57
Re: When is "volatile" used instead of "lock"?

                          This is because you run this code in a "managed debugger", the JIT
                          produces
different code from what is produced when no managed debugger is attached!
                          The JIT produces different code when *started* in a debugger. When a
                          debugger is attached later, managed or not, the optimized code is already
                          generated.


                          Comment

                          • Willy Denoyette [MVP]

                            #58
Re: When is "volatile" used instead of "lock"?

"Ben Voigt [C++ MVP]" <rbv@nospam.nospam> wrote in message
news:ecY0PvCtHHA.1416@TK2MSFTNGP06.phx.gbl...
                            >
                            >This is because you run this code in a "managed debugger", the JIT
                            >produces
>different code from what is produced when no managed debugger is
                            >attached!
                            >
                            The JIT produces different code when *started* in a debugger. When a
                            debugger is attached later, managed or not, the optimized code is already
                            generated.
                            >
Yep, but this is not the case when an "unmanaged debugger" is attached.
Using an unmanaged debugger like sdb, you can break into the debugger before
the CLR is even loaded; after JITing you will get 'fidelity' code. Unmanaged
debuggers aren't using the ICorDebug COM interface to interact with the
CLR (using the CLR's debugger thread as present in any managed code
process).
Note that when running in the VS debugger, you can get the same behavior;
you only need to take care not to break using
System.Diagnostics.Debugger.Break(), else you will get ICorDebug as the
interface and the CLR will signal the presence of a managed debugger to the
JIT.

                            Willy.

                            Comment

                            • Jon Skeet [C# MVP]

                              #59
Re: When is "volatile" used instead of "lock"?

Peter Ritchie [C# MVP] <PRSoCo@newsgroups.nospam> wrote:
It appears the x86 JIT's interpretation (or that of its design team) is
somewhat similar to my interpretation, in that compile-time optimization
restrictions and run-time acquire/release semantics are separate
considerations.
                              They can be implemented separately without the team having decided that
                              our reading of the spec is incorrect.
The JIT is using the protected region as the guard in which not to optimize,
not Monitor.Enter/Monitor.Exit. Anything within a try block will be
considered a volatile operation by the compile-time optimizer; it has
nothing to do with Enter/Exit or directly with "lock" (other than that lock
is implemented with a try block).
                              That may be *a* type of optimisation blocking - it doesn't mean it's
                              the only one.
Secondary to that, and outside of any documentation I've been able to find,
it appears that all member access (I only tried instance, not class) is
considered a volatile operation by the JIT in terms of optimizations,
regardless of being in or out of a protected region (obviously
acquire/release semantics are the responsibility of Enter/Exit,
MemoryBarrier, etc. and are not implicitly obtained without them).
Therefore acquire/release semantics guarantees do not directly affect what
the JIT decides not to optimize.
                              Unless the reason the JIT decided not to optimise *anything* for member
                              variables is because it's simpler than trying to work out exactly where
                              it can and can't optimise due to Monitor.Enter/Exit.

                              By not reordering member access, the JIT is automatically complying
                              with the spec without having to do any extra checking. That doesn't
                              mean that the guarantees given by the spec don't apply - just that the
                              JIT is being stricter than it needs to.
                              For example:
                              int result = 0;
Monitor.Enter(locker);
                              result += 10;
                              result += 10;
Monitor.Exit(locker);
                              result += 10;
                              result += 10;
                              result += 10;
                              return result;
                              >
                              ...is compile-time optimized by the JIT to the equivalent of:
Monitor.Enter(locker);
Monitor.Exit(locker);
                              return 50;
That's certainly interesting - but the difference can't be *observed*
because no other thread has access to the value on that thread's stack.

                              I'll readily confess that I can't see where that's made clear in the
                              spec, unless it's the section about 12.6.4. It certainly makes sense
                              though - optimising within a stack can be done easily without
                              introducing bugs.

                              Another argument in favour of this is that the volatile prefix can't be
                              applied to the ldloc instruction - it's only applicable for potentially
                              shared data:

                              <quote>
                              The volatile. prefix specifies that addr is a volatile address (i.e.,
                              it can be referenced externally to the current thread of execution) and
                              the results of reading that location cannot be cached or that multiple
                              stores to that location cannot be suppressed.
                              </quote>
...and you get acquire/release semantics on nothing in the current thread
(other than "locker" in the call to Exit).
                              Again you're talking about acquire/release semantics *on* something -
                              which is something the spec doesn't talk about. It talks about
                              acquire/release semantics at a particular point in time.
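Jon's reading - acquire/release applying at a particular point in the instruction stream rather than to particular variables - is what makes the ordinary shared-counter idiom safe. A minimal sketch of my own (not code from the thread); the C# lock statement compiles to the Monitor.Enter/try/finally/Monitor.Exit shape discussed above:

```csharp
using System;
using System.Threading;

class SharedCounter
{
    private readonly object locker = new object();
    private int number;   // shared field: only touched while holding the lock

    public void Add(int amount)
    {
        lock (locker)          // Monitor.Enter: implicit volatile read (acquire)
        {
            number += amount;  // cannot move above the Enter or below the Exit
        }                      // Monitor.Exit: implicit volatile write (release)
    }

    public int Read()
    {
        lock (locker) { return number; }
    }
}
```

Per 335 section 12.6.5, the Enter/Exit pair orders every surrounding read and write regardless of which location they touch, which is why no volatile modifier on number is needed when all access goes through the lock.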
                              And:
                              <snip example with try/finally>
I don't believe this compile-time optimization behaviour is covered clearly,
if at all, in 335.
                              The spec doesn't talk about compile-time vs run-time optimisation
                              though - it talks about observable behaviour. As a developer trying to
                              write code which is guaranteed to work against the spec, I don't care
                              whether the JIT has to do more or less work depending on the CPU it's
                              on - I just care that my code works in all situations.

                              I still believe the spec guarantees that for the situation I've
                              specified.

                              --
Jon Skeet - <skeet@pobox.com>
                              http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                              If replying to the group, please do not mail me too

                              Comment

                              • Peter Ritchie [C#MVP]

                                #60
                                Re: When is &quot;volatile& quot; used instead of &quot;lock&quot ; ?

It's somewhat moot, I feel, at this point to discuss it much further, other
than to continue to say each other's interpretation is different. But, if
you're interested in arguing from the spec itself... If you want to take it
offline, just reply to the email address you have for me.

The behaviour I've observed proves nothing about anyone's interpretation of
the spec; it merely speaks to what appears to be the JIT's opinion of what a
volatile operation is, despite what the spec says. What I've observed shows
optimization blocking and acquire/release semantics are at least considered
independently (I'll admit it's not proof that the JIT does or does not take
into account Enter/Exit calls, but it can't use that to decide how it should
generate assembler for another method, regardless of whether a JIT allows
any optimization of observable reads/writes). The combination of behaviour
I've observed may suggest the JIT is attempting to guarantee that observable
reads/writes can't be reordered and therefore must be flushed from Enter to
Exit (which is good); but that neither proves your interpretation of the
spec nor disproves mine.

                                If the MS x86 JIT does not in fact optimize member fields to fulfil
                                that/those particular guarantee(s), that really just substantiates my
                                assertion that the spec. is unclear and that your rebuttal is interpretive.
                                It also contradicts information you've said you received from
                                Vance and Chris. No offence or implication that you didn't receive that
                                information; just that it seems contradictory, if it's indeed true that the
                                JIT doesn't optimize member fields and therefore does not need to look for
                                Enter/Exit...

                                You're readily confessing that the reasons for the side-effects you're
                                relying upon are not clear in the spec, yet you're still advocating reliance
                                upon the side-effects (arguing from the spec itself)?

                                To reduce typing:

                                "Conforming implementations of the CLI are free to execute programs using
                                any technology that guarantees, within a single thread of execution, that
                                side-effects and exceptions generated by a thread are visible in the order
                                specified by the CIL. For this purpose only volatile operations (including
                                volatile reads) constitute visible side-effects."

                                I'll call that statement optimization allowance 1 (OA1).
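As a hypothetical sketch of what OA1 permits (the `Counter` type and its field names are invented for illustration, not from the thread): within a single thread, only volatile operations are "visible side-effects", so an intermediate store to an ordinary field may legally be eliminated.

```csharp
using System;

class Counter
{
    public int value;          // ordinary field: intermediate writes are not
                               // "visible side-effects" under OA1
    public volatile int flag;  // volatile field: every write is a visible side-effect

    public void Run()
    {
        value = 1;  // OA1 permits removing this dead store...
        value = 2;  // ...keeping only the final one
        flag = 1;   // volatile write: may not be removed or reordered past this point
    }
}

class Program
{
    static void Main()
    {
        var c = new Counter();
        c.Run();
        Console.WriteLine(c.value);
        Console.WriteLine(c.flag);
    }
}
```

Whether the store of 1 actually survives is unobservable from this thread, which is exactly the latitude OA1 grants.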

                                "Acquiring a lock (System.Threading.Monitor.Enter or entering a synchronized
                                method) shall implicitly perform a volatile read operation, and releasing a
                                lock (System.Threading.Monitor.Exit or leaving a synchronized method) shall
                                implicitly perform a volatile write operation."

                                I'll call that statement locking rule 1 (LR1).
                                Again you're talking about acquire/release semantics *on* something -
                                which is something the spec doesn't talk about. It talks about
                                acquire/release semantics at a particular point in time.
                                Semantics. In the context, the acquire/release semantics ensure the flushing
                                to memory of nothing: no values are read after the Enter and before the Exit
                                other than "locker" (the only thing read merely supports the existence of
                                Enter/Exit), and there are no observable writes. It's an example that shows the
                                lack of clarity of LR1 (i.e. "implicitly perform[ing] a volatile write"
                                of what?), the only association between Enter/Exit and acquire/release.
                                Yes, it has the side effect of having flushed values to memory for
                                subsequent reads (the acquire semantics), but that makes no guarantees for
                                the instructions immediately following Exit and therefore no guarantees on
                                any reads.

                                What LR1 implies for acquire/release semantics hinges on whether you believe
                                the implication that everything on and after a call to Enter and before a
                                call to Exit constitutes one volatile read and that the call to Exit
                                constitutes one volatile write. No matter how you interpret that paragraph
                                it's unclear. Regardless of interpretation it still leaves the code between
                                related Enter and Exit calls in a black hole (ignoring the fact there is no
                                syntax ensuring related Enter/Exit calls occur in the same block, the same
                                method, or even the same assembly). With your interpretation the release
                                semantics for the block occur at the call to Exit; which leaves any writes
                                within the block without release semantics until the end of the block and
                                therefore makes no guarantees any writes are visible to other threads until
                                Exit.
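A minimal sketch of the reading being debated (the `Publisher` type and its members are invented names): writes made inside the Enter/Exit pair need not be visible to other threads until the Exit performs its implicit volatile write, so a reader must also enter the same lock to be sure of seeing them.

```csharp
using System;
using System.Threading;

class Publisher
{
    private readonly object locker = new object();
    private int data;
    private bool ready;

    public void Publish()
    {
        Monitor.Enter(locker);    // implicit volatile read: acquire semantics
        try
        {
            data = 42;            // these writes need not be visible to
            ready = true;         // other threads yet...
        }
        finally
        {
            Monitor.Exit(locker); // implicit volatile write: release semantics --
                                  // only here are the writes above guaranteed published
        }
    }

    public bool TryRead(out int value)
    {
        Monitor.Enter(locker);    // acquire: sees everything released by a prior Exit
        try
        {
            value = ready ? data : 0;
            return ready;
        }
        finally
        {
            Monitor.Exit(locker);
        }
    }
}

class Program
{
    static void Main()
    {
        var p = new Publisher();
        var t = new Thread(p.Publish);
        t.Start();
        t.Join();                 // the writer has passed its Exit by now
        int value;
        Console.WriteLine(p.TryRead(out value));
        Console.WriteLine(value);
    }
}
```

The try/finally around the body is what the C# lock statement would emit; it's spelled out here to make the Enter/Exit pair explicit.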

                                Without clarity of LR1, you can neither make the connection between
                                acquire/release semantics and with Enter/Exit nor, therefore, the connection
                                to observable side-effect guarantees.
                                The spec doesn't talk about compile-time vs run-time optimisation
                                though - it talks about observable behaviour.
                                And that has been my point. Without taking into account what the JIT *is*
                                doing, your interpretation of the guarantee(s) means the following is safe:
                                //thread A:
                                instance.firstIntMember = 1;
                                instance.firstIntMember = 2;

                                //thread B:
                                instance.secondIntMember = 3;
                                instance.secondIntMember = 4;

                                //thread C:
                                Monitor.Enter(locker);
                                instance.otherMember = instance.firstIntMember;
                                instance.anotherMember = instance.secondIntMember;
                                Monitor.Exit(locker);

                                ...including atomicity rules: the assignment to otherMember in C is
                                "guaranteed" to see any observable side-effects made to firstIntMember, and
                                the assignment to anotherMember in C is "guaranteed" to see any observable
                                side-effects made to secondIntMember. And yet, nowhere in that code is
                                there enough information for a JIT to make any decisions about what and what not
                                to optimize in A and B, especially considering thread A code and thread B
                                code are likely in different methods than C and that they could be in
                                different assemblies: JITted independently. OA1 suggests it could optimize
                                away the assignment of 1 in A and 3 in B because that isn't observable "within
                                [that] single thread of execution."

                                Is it good code? Of course not. Should it pass code review? Of course not.
                                What is and isn't sanctioned code is outside the domain of a C# compiler,
                                the JIT, or the CLI. The point is the spec is unclear in this area, and to
                                use it as a crutch to support using syntax because of its side-effects is,
                                in my opinion, not a good practice. Using observed behaviour as a crutch is
                                better; but if the behaviour doesn't match the spec it's subject to change
                                and, again, not a good practice.
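For what it's worth, the conventional repair of a thread A/B/C situation like the one above is to have every thread that touches the fields take the same lock, via the C# lock statement (which compiles to the Monitor.Enter / try/finally / Monitor.Exit pattern). The `Instance` type and member names below are invented for illustration:

```csharp
using System;
using System.Threading;

class Instance
{
    private readonly object locker = new object();
    public int firstIntMember, secondIntMember, otherMember, anotherMember;

    // Each writer takes the lock, giving the JIT a concrete
    // acquire/release point in every method that touches the fields.
    public void WriteFirst(int v)  { lock (locker) { firstIntMember = v; } }
    public void WriteSecond(int v) { lock (locker) { secondIntMember = v; } }

    public void Copy()
    {
        lock (locker)   // acquire here; release at the implicit Exit below
        {
            otherMember = firstIntMember;
            anotherMember = secondIntMember;
        }               // Monitor.Exit runs in the compiler-generated finally
    }
}

class Program
{
    static void Main()
    {
        var instance = new Instance();
        var a = new Thread(() => { instance.WriteFirst(1); instance.WriteFirst(2); });
        var b = new Thread(() => { instance.WriteSecond(3); instance.WriteSecond(4); });
        a.Start(); b.Start();
        a.Join();  b.Join();     // both writers have released the lock
        instance.Copy();
        Console.WriteLine(instance.otherMember);
        Console.WriteLine(instance.anotherMember);
    }
}
```

Because the joins happen before Copy, the final values are deterministic here; in general only the lock ordering, not the joins, is what publishes the writes.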


                                Comment
