Deadlock in multithread C# app

  • Gordon Cone

    Deadlock in multithread C# app

    I am currently debugging a deadlock in a multithreaded C# application that
    makes lots of calls to legacy unmanaged code. The application runs on Windows
    Server 2003 under .NET 1.1 SP1 and uses the server version of the CLR.

    When the deadlock happens the process is using 0 CPU.

    I was able to collect crash dumps and have been trying to make sense of
    them using WinDbg.

    !threads shows that PreEmptive GC is disabled for both thread 20 and thread
    26. Other threads are running (all with PreEmptive GC enabled), but I have
    left them out to keep this shorter.

                                   PreEmptive GC Alloc                         Lock
       ID     ThreadOBJ  State     GC         Context               Domain     Count APT Exception
    20 0x1658 0x16e083d0 0x1800222 Disabled   0x01275044:0x01276a64 0x001498b8 1     MTA (Threadpool Worker) System.NullReferenceException
    26 0x958  0x414274c0 0x1800220 Disabled   0x0d327830:0x0d327844 0x001498b8 2     MTA (GC) (Threadpool Worker) System.NullReferenceException


    Here is the clrstack for thread 26. ttGetRteVarVariation is unmanaged code.
    It looks like it had a problem and an exception (probably a
    NullReferenceException) was thrown. There are GCFrames in the middle of the
    stack, which made me wonder whether the GC is involved.

    ESP        EIP
    0x40bae194 0x7c82ed54 [FRAME: HelperMethodFrame]
    0x40bae1c0 0x799ef4fc [DEFAULT] [hasThis] Void System.Diagnostics.StackTrace.CaptureStackTrace(I4,Boolean,Class System.Threading.Thread,Class System.Exception)
    0x40bae1e0 0x799f11fc [DEFAULT] [hasThis] Void System.Diagnostics.StackTrace..ctor(Class System.Exception,Boolean)
    0x40bae1ec 0x799ef21f [DEFAULT] String System.Environment.GetStackTrace(Class System.Exception)
    0x40bae248 0x16e32c14 [FRAME: InterceptorFrame] [DEFAULT] String System.Environment.GetStackTrace(Class System.Exception)
    0x40bae258 0x799f1b09 [DEFAULT] [hasThis] String System.Exception.get_StackTrace()
    0x40bae260 0x799f1a32 [DEFAULT] [hasThis] String System.Exception.ToString()
    0x40bae28c 0x16e32b54 [FRAME: InterceptorFrame] [DEFAULT] [hasThis] String System.Exception.ToString()
    0x40bae29c 0x799fc50e [DEFAULT] [hasThis] String System.Exception.InternalToString()
    0x40bae578 0x791b33cc [FRAME: GCFrame]
    0x40baefa4 0x791b33cc [FRAME: GCFrame]
    0x40baf670 0x791b33cc [FRAME: NDirectMethodFrameGeneric] [DEFAULT] String <namespaceremoved>.ttGetRteVarVariation(I4)
    0x40baf680 0x40278218 [DEFAULT] [hasThis] String <namespaceremoved>.GetRteVarVariation(I4)
      at [+0x8] [+0x0]
    0x40baf684 0x402781df [DEFAULT] [hasThis] String <namespaceremoved>.get_Variation()
      at [+0x1f] [+0x0]
    <more stack trace after this but not that interesting>

    Here is the native stack for thread 26. It looks like the thread was trying
    to allocate space (probably for the exception) and that forced a call to the
    GC.

    ChildEBP RetAddr
    40badb8c 7c822114 ntdll!KiFastSystemCallRet
    40badb90 77e6711b ntdll!NtWaitForMultipleObjects+0xc
    40badc38 77e61075 kernel32!WaitForMultipleObjectsEx+0x11a
    40badc54 791f2ff8 kernel32!WaitForMultipleObjects+0x18
    40bade8c 791f311c mscorsvr!Thread::SysSuspendForGC+0x248
    40badea4 791f337d mscorsvr!GCHeap::SuspendEE+0xcf
    40badec0 791f5775 mscorsvr!GCHeap::GarbageCollectGeneration+0x13f
    40badef0 791bd4ae mscorsvr!gc_heap::allocate_more_space+0x181
    40bae118 791b5411 mscorsvr!GCHeap::Alloc+0x7b
    40bae12c 791b93c3 mscorsvr!Alloc+0x3a
    40bae14c 791b9411 mscorsvr!FastAllocateObject+0x25
    40bae1b8 799ef4fc mscorsvr!JIT_NewFast+0x2c
    WARNING: Stack unwind information not available. Following frames may be wrong.
    40bae1cc 799f11fc mscorlib_79990000+0x5f4fc
    40bae1e0 799ef21f mscorlib_79990000+0x611fc
    40bae204 791d8504 mscorlib_79990000+0x5f21f
    40bae218 00000000 mscorsvr!DoDeclarativeSecurity+0x1a

    I went looking for the garbage collection thread to see what it was up to.
    I found 4 GC threads. They all look the same. Here is the native stack for
    one of the GC threads:

    2 Id: 2394.1910 Suspend: 1 Teb: 7ffdd000 Unfrozen
    ChildEBP RetAddr  Args to Child
    WARNING: Stack unwind information not available. Following frames may be wrong.
    00daff70 77e6ba12 000000bc ffffffff 00000000 ntdll!KiFastSystemCallRet
    00daff84 791f3206 000000bc ffffffff 00000000 kernel32!WaitForSingleObject+0x12
    00daffac 79224ac2 00000000 00daffec 77e66063 mscorsvr!gc_heap::gc_thread_function+0x2f
    00daffb8 77e66063 00150448 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
    00daffec 00000000 79224aa4 00150448 00000000 kernel32!GetModuleFileNameA+0xeb


    It looks like the garbage collector is waiting for something to happen. From
    the research I have done, the garbage collector has to suspend all threads
    before doing its work, and it cannot suspend a thread while PreEmptive GC is
    disabled for that thread. As I understand it, PreEmptive GC can be disabled
    if the thread is calling unmanaged code. That looks like a classic deadlock:

    1) The garbage collector is waiting for all threads to suspend.
    2) A thread is waiting for the garbage collector to finish.
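
    The circular wait in the two points above can be written as a toy C# model.
    This is only a sketch of the suspected pattern, not how the CLR suspends
    threads: the class, the two events and the 500 ms timeouts are all
    hypothetical stand-ins. A real deadlock would wait forever. The timeouts are
    there only so that the demo terminates.

```csharp
using System;
using System.Threading;

// Toy model only: two events stand in for the runtime's internal signalling.
// None of these names correspond to real CLR internals.
public class CircularWaitModel
{
    // Set by the worker once it has reached a point where it can be suspended.
    private readonly ManualResetEvent workerSuspended = new ManualResetEvent(false);
    // Set by the collector once the collection has finished.
    private readonly ManualResetEvent collectionDone = new ManualResetEvent(false);

    private bool collectorTimedOut;
    private bool workerTimedOut;

    public bool CollectorTimedOut { get { return collectorTimedOut; } }
    public bool WorkerTimedOut { get { return workerTimedOut; } }

    // Models the collector: it cannot proceed until the worker is suspended.
    private void Collector()
    {
        if (workerSuspended.WaitOne(500, false))
            collectionDone.Set();
        else
            collectorTimedOut = true;
    }

    // Models the worker: it will not suspend until the collection has finished.
    private void Worker()
    {
        if (collectionDone.WaitOne(500, false))
            workerSuspended.Set();
        else
            workerTimedOut = true;
    }

    // Starts both threads and waits for them to give up.
    public void Run()
    {
        Thread collector = new Thread(new ThreadStart(Collector));
        Thread worker = new Thread(new ThreadStart(Worker));
        collector.Start();
        worker.Start();
        collector.Join();
        worker.Join();
    }

    public static void Main()
    {
        CircularWaitModel model = new CircularWaitModel();
        model.Run();
        Console.WriteLine("collector timed out: " + model.CollectorTimedOut);
        Console.WriteLine("worker timed out: " + model.WorkerTimedOut);
    }
}
```

    Neither event is ever set, because each side only signals after the other
    has signalled first, so both waits expire. That is the shape of what I think
    the dump is showing.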

    I believe I know why the code is throwing the NullReferenceException and I
    am fixing it.

    My worry is that this is a generic problem (a GC is forced while PreEmptive
    GC is disabled) that could still happen even after I get rid of the
    NullReferenceExceptions.

    Does anybody have any ideas or thoughts here? Have I found a bug/feature in
    the .NET framework?
