Race condition in DacEnumerableHashTable::BaseFindNextEntryByHash #75041

Closed
carlossanlop opened this issue Sep 2, 2022 · 7 comments · Fixed by #75099
Labels
area-VM-coreclr tenet-reliability Reliability/stability related issue (stress, load problems, etc.)

Comments

@carlossanlop
Member

Found in a release/7.0 backport PR: #75004

Please help determine whether this needs a fix backported to 7.0.

C:\h\w\BA110A1A\w\9BF808FF\e>"C:\h\w\BA110A1A\p\dotnet.exe" exec --runtimeconfig System.Composition.TypedParts.Tests.runtimeconfig.json --depsfile System.Composition.TypedParts.Tests.deps.json xunit.console.dll System.Composition.TypedParts.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Composition.TypedParts.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Composition.TypedParts.Tests (found 44 test cases)
  Starting:    System.Composition.TypedParts.Tests (parallel test collections = on, max threads = 2)
Fatal error. Internal CLR error. (0x80131506)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Xunit.Sdk.TestCollectionRunner`1+<RunTestClassesAsync>d__28[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], xunit.execution.dotnet, Version=2.4.2.0, Culture=neutral, PublicKeyToken=8d05b1bb7a6fdb6c]](<RunTestClassesAsync>d__28<System.__Canon> ByRef)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Start[[Xunit.Sdk.TestCollectionRunner`1+<RunTestClassesAsync>d__28[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], xunit.execution.dotnet, Version=2.4.2.0, Culture=neutral, PublicKeyToken=8d05b1bb7a6fdb6c]](<RunTestClassesAsync>d__28<System.__Canon> ByRef)
   at Xunit.Sdk.TestCollectionRunner`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].RunTestClassesAsync()
   at Xunit.Sdk.TestCollectionRunner`1+<RunAsync>d__27[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Xunit.Sdk.TestCollectionRunner`1+<RunAsync>d__27[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], xunit.execution.dotnet, Version=2.4.2.0, Culture=neutral, PublicKeyToken=8d05b1bb7a6fdb6c]](<RunAsync>d__27<System.__Canon> ByRef)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Start[[Xunit.Sdk.TestCollectionRunner`1+<RunAsync>d__27[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], xunit.execution.dotnet, Version=2.4.2.0, Culture=neutral, PublicKeyToken=8d05b1bb7a6fdb6c]](<RunAsync>d__27<System.__Canon> ByRef)
   at Xunit.Sdk.TestCollectionRunner`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].RunAsync()
   at Xunit.Sdk.XunitTestAssemblyRunner.RunTestCollectionAsync(Xunit.Sdk.IMessageBus, Xunit.Abstractions.ITestCollection, System.Collections.Generic.IEnumerable`1<Xunit.Sdk.IXunitTestCase>, System.Threading.CancellationTokenSource)
   at Xunit.Sdk.XunitTestAssemblyRunner+<>c__DisplayClass14_2.<RunTestCollectionsAsync>b__2()
   at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
   at System.Threading.Tasks.Task+<>c.<.cctor>b__273_0(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.Tasks.Task.ExecuteEntry()
   at System.Threading.Tasks.SynchronizationContextTaskScheduler+<>c.<.cctor>b__8_0(System.Object)
   at Xunit.Sdk.MaxConcurrencySyncContext.RunOnSyncContext(System.Threading.SendOrPostCallback, System.Object)
   at Xunit.Sdk.MaxConcurrencySyncContext+<>c__DisplayClass11_0.<WorkerThreadProc>b__0(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at Xunit.Sdk.ExecutionContextHelper.Run(System.Object, System.Action`1<System.Object>)
   at Xunit.Sdk.MaxConcurrencySyncContext.WorkerThreadProc()
   at Xunit.Sdk.XunitWorkerThread+<>c.<QueueUserWorkItem>b__5_0(System.Object)
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task+<>c.<.cctor>b__273_0(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread)
   at System.Threading.Tasks.ThreadPoolTaskScheduler+<>c.<.cctor>b__10_0(System.Object)
   at System.Threading.Thread+StartHelper.RunWorker()
   at System.Threading.Thread+StartHelper.Run()
   at System.Threading.Thread.StartCallback()
----- end Fri 09/02/2022 18:08:01.56 ----- exit code -1073741819 ----------------------------------------------------------
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 2, 2022
@ghost

ghost commented Sep 3, 2022

Tagging subscribers to this area: @dotnet/area-system-composition
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: carlossanlop
Assignees: -
Labels:

area-System.Composition, untriaged

Milestone: -

@jkotas
Member

jkotas commented Sep 3, 2022

Stacktrace of the crash

 # Child-SP          RetAddr               Call Site
00 (Inline Function) --------`--------     coreclr!DacEnumerableHashTable<InstMethodHashTable,InstMethodHashEntry,4>::BaseFindNextEntryByHash+0x13 [D:\a\_work\1\s\src\coreclr\vm\dacenumerablehash.inl @ 375] 
01 (Inline Function) --------`--------     coreclr!InstMethodHashTable::FindMethodDesc+0x38c [D:\a\_work\1\s\src\coreclr\vm\instmethhash.cpp @ 151] 
02 (Inline Function) --------`--------     coreclr!InstantiatedMethodDesc::FindLoadedInstantiatedMethodDesc+0x457 [D:\a\_work\1\s\src\coreclr\vm\genmeth.cpp @ 594] 
03 000000ed`9e27db70 00007ffa`c24f8eb9     coreclr!MethodDesc::FindOrCreateAssociatedMethodDesc+0xea2 [D:\a\_work\1\s\src\coreclr\vm\genmeth.cpp @ 1132] 
04 000000ed`9e27e340 00007ffa`c2448e78     coreclr!MethodTable::TryResolveConstraintMethodApprox+0x1e9 [D:\a\_work\1\s\src\coreclr\vm\methodtable.cpp @ 8628] 
05 000000ed`9e27e3d0 00007ffa`c24471e0     coreclr!Dictionary::PopulateEntry+0xe08 [D:\a\_work\1\s\src\coreclr\vm\genericdict.cpp @ 1074] 
06 (Inline Function) --------`--------     coreclr!JIT_GenericHandleWorker+0xa0 [D:\a\_work\1\s\src\coreclr\vm\jithelpers.cpp @ 3130] 
07 000000ed`9e27e640 00007ffa`c253e917     coreclr!JIT_GenericHandle_Framed+0x1c8 [D:\a\_work\1\s\src\coreclr\vm\jithelpers.cpp @ 3194] 
08 000000ed`9e27e8e0 00007ffa`635297cd     coreclr!JIT_GenericHandleMethod+0xa7 [D:\a\_work\1\s\src\coreclr\vm\jithelpers.cpp @ 3224] 
09 000000ed`9e27e950 000000ed`9e27f6b0     System_Private_CoreLib!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start<<RunTestClassesAsync>d__28>+0x9d

@jkotas
Member

jkotas commented Sep 3, 2022

The problem is that m_pNextEntry changed from non-null to null between the time we checked it and the time we fetched it:

while (pVolatileEntry->m_pNextEntry)
{
    // Advance to the next entry.
    pVolatileEntry = pVolatileEntry->m_pNextEntry;
    if (pVolatileEntry->m_iHashValue == iHash)

@VSadov Could you please take a look? It seems to be related to your lock-free changes.
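
For illustration, the check-then-fetch window described in this comment can be closed by loading the next pointer once per iteration and using only that local copy. The sketch below is a simplified stand-in, not the runtime's code and not necessarily what the eventual fix does: the `Entry` type, the use of `std::atomic`, and the name `FindNextEntryByHash` are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified entry type. The real VolatileEntry in the runtime
// has a different layout; only the two fields relevant to the race are kept.
struct Entry {
    std::uint32_t hash;
    std::atomic<Entry*> next{nullptr};
    explicit Entry(std::uint32_t h) : hash(h) {}
};

// Returns the next entry after `entry` whose hash matches, or nullptr.
// The next pointer is loaded exactly once per iteration, so a concurrent
// writer that resets it to nullptr cannot slip in between the null check
// and the dereference.
Entry* FindNextEntryByHash(Entry* entry, std::uint32_t hash) {
    assert(entry != nullptr);
    for (;;) {
        Entry* next = entry->next.load(std::memory_order_acquire);
        if (next == nullptr) {
            return nullptr;  // end of chain (or chain cut by a concurrent writer)
        }
        entry = next;        // advance using the value that was just checked
        if (entry->hash == hash) {
            return entry;
        }
    }
}
```

With a chain `a(1) -> b(2) -> c(1)`, calling `FindNextEntryByHash(&a, 1)` skips `b` and returns `&c`. A reader that loses the race sees a shortened chain and reports no match instead of dereferencing a null pointer.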

@jkotas jkotas added area-VM-coreclr tenet-reliability Reliability/stability related issue (stress, load problems, etc.) and removed area-System.Composition labels Sep 3, 2022
@jkotas jkotas added this to the 7.0.0 milestone Sep 3, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 3, 2022
@VSadov
Member

VSadov commented Sep 3, 2022

Is this easy to repro, or was it only seen once? (I just wonder if we have a reliable repro.)

@jkotas
Member

jkotas commented Sep 3, 2022

It was only seen once. No reliable repro.

@AntonLapounov AntonLapounov changed the title CLR failure when running System.Composition.TypedParts tests Race condition in DacEnumerableHashTable::BaseFindNextEntryByHash Sep 5, 2022
@AntonLapounov
Member

This might be the cause of intermittent crossgen2 failures, e.g., #74913 (we started to collect dumps just recently). Note that the race condition existed even before #61346. Line 200 in the old code below would set the m_pNextEntry pointer of the first bucket's entry to NULL.

// Try to lock out readers from scanning this bucket. This is obviously a race which may fail.
// However, note that it's OK if somebody is already in the list - it's OK if we mess with the bucket
// groups, as long as we don't destroy anything. The lookup function will still do appropriate
// comparison even if it wanders aimlessly amongst entries while we are rearranging things. If a
// lookup finds a match under those circumstances, great. If not, they will have to acquire the lock &
// try again anyway.
(GetBuckets())[i] = NULL;
while (pEntry != NULL)
{
    DWORD dwNewBucket = pEntry->m_iHashValue % cNewBuckets;
    PTR_VolatileEntry pNextEntry = pEntry->m_pNextEntry;
    pEntry->m_pNextEntry = pNewBuckets[dwNewBucket];
    pNewBuckets[dwNewBucket] = pEntry;
    pEntry = pNextEntry;
}
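
To make the effect of the quoted loop concrete, here is a single-threaded sketch of the same push-front relinking. It uses a hypothetical `Entry` struct and a `MoveBucket` helper that do not exist in the runtime. It shows only that an entry whose next pointer was non-null before the move can have a null next pointer afterwards, which is the state a concurrent reader could observe between its check and its fetch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified entry type used only for this illustration.
struct Entry {
    std::uint32_t hash;
    Entry* next;
};

// Moves every entry of one old bucket chain into newBuckets, relinking each
// entry at the front of its new chain (the same shape as the quoted loop).
void MoveBucket(Entry* head, std::vector<Entry*>& newBuckets) {
    Entry* entry = head;
    while (entry != nullptr) {
        std::size_t index = entry->hash % newBuckets.size();
        Entry* nextEntry = entry->next;
        // If the target bucket is still empty, this stores nullptr into
        // entry->next, even though entry->next was non-null a moment ago.
        entry->next = newBuckets[index];
        newBuckets[index] = entry;
        entry = nextEntry;
    }
}
```

For example, two entries with hashes 2 and 6 share a bucket in a 4-bucket table. After `MoveBucket` runs against an 8-bucket table, they land in buckets 2 and 6 and both have a null next pointer, so a reader still positioned on the first entry would find its successor gone.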

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 6, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 7, 2022