GCStress: Remove special handling for call to CORINFO_HELP_STOP_FOR_GC #38317

Merged
merged 2 commits into dotnet:master from GCStressFollowUp
Jun 25, 2020


AndyAyersMS
Member

We shouldn't need this anymore as the case it protects against should be
covered by the new check added in #38246.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 24, 2020
@AndyAyersMS
Member Author

@jkotas PTAL. Will add on some GC stress legs if this looks good.

cc @BruceForstall in case you're interested.

Ran this locally on the test from #37236 and let it run through a few thousand iterations with no failures. GC counts look comparable to before as well.

// 6) Now, thread T can modify the stack (ex: RedirectionFrame setup) while the GC thread is scanning it.
//
// This race is now mitigated below. Where we won't initiate a stress mode GC
// for a thread in cooperative mode with an active ICF, if g_TtrapReturningThreads is true.
Member


Suggested change
- // for a thread in cooperative mode with an active ICF, if g_TtrapReturningThreads is true.
+ // for a thread in cooperative mode with an active ICF, if g_TrapReturningThreads is true.
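
For readers following along, the condition described in the comment under review can be read as a simple predicate. This is only a minimal sketch, not the runtime's code: `StressThreadState` and `ShouldSuppressStressGC` are hypothetical names, the `trapReturningThreads` parameter stands in for the value of `g_TrapReturningThreads`, and "ICF" is assumed to mean an inlined call frame.

```cpp
#include <cstdint>

// Hypothetical, simplified view of the per-thread state the comment refers to.
// These fields are illustrative and do not mirror the runtime's Thread class.
struct StressThreadState {
    bool inCooperativeMode;  // thread is currently in cooperative GC mode
    bool hasActiveICF;       // an ICF (assumed: inlined call frame) is active
};

// Returns true when a stress-mode GC should NOT be initiated for this thread.
// trapReturningThreads stands in for the runtime's g_TrapReturningThreads value.
inline bool ShouldSuppressStressGC(const StressThreadState& thread,
                                   std::int32_t trapReturningThreads) {
    return thread.inCooperativeMode
        && thread.hasActiveICF
        && trapReturningThreads != 0;
}
```

All three conditions must hold for the stress GC to be skipped; a thread in preemptive mode, or one without an active ICF, would still be eligible under this reading.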

Member

@jkotas jkotas left a comment


LGTM. Thanks!

@AndyAyersMS
Member Author

The Interop\COM\NativeClients\Events test failed under gc stress on xarch. Looking at one of the dumps, the finalizer thread is hitting some kind of recursive exception. Bottom of the stack is:

41e ntdll!RtlDispatchException
41f ntdll!KiUserExceptionDispatch
420 ntdll!RtlpIncrementCriticalSectionContentionCount
421 ntdll!RtlpWaitOnCriticalSection
422 ntdll!RtlpEnterCriticalSectionContended
423 ntdll!RtlEnterCriticalSection
424 COMClientEvents!__acrt_lock
425 COMClientEvents!_free_dbg
426 COMClientEvents!__crt_internal_free_policy::operator()<char>
427 COMClientEvents!__crt_unique_heap_ptr<char,__crt_internal_free_policy>::release
428 COMClientEvents!__crt_unique_heap_ptr<char,__crt_internal_free_policy>::~__crt_unique_heap_ptr<char,__crt_internal_free_policy>
429 COMClientEvents!__crt_stdio_output::formatting_buffer::~formatting_buffer
42a COMClientEvents!__crt_stdio_output::common_data<wchar_t>::~common_data<wchar_t>
42b COMClientEvents!__crt_stdio_output::output_adapter_data<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t> >::~output_adapter_data<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t> >
42c COMClientEvents!__crt_stdio_output::standard_base<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t> >::~standard_base<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t> >
42d COMClientEvents!__crt_stdio_output::format_validation_base<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t> >::~format_validation_base<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t> >
42e COMClientEvents!__crt_stdio_output::output_processor<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t>,__crt_stdio_output::format_validation_base<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t> > >::~output_processor<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t>,__crt_stdio_output::format_validation_base<wchar_t,__crt_stdio_output::string_output_adapter<wchar_t> > >
42f COMClientEvents!common_vsprintf<__crt_stdio_output::format_validation_base,wchar_t>
430 COMClientEvents!common_vsnprintf_s<wchar_t>
431 COMClientEvents!__stdio_common_vsnwprintf_s
432 COMClientEvents!_VCrtDbgReportW
433 COMClientEvents!_CrtDbgReportW
434 COMClientEvents!issue_debug_notification
435 COMClientEvents!__acrt_report_runtime_error
436 COMClientEvents!abort
437 COMClientEvents!__vcrt_getptd
438 COMClientEvents!__CxxFrameHandler4
439 ntdll!RtlpExecuteHandlerForException
43a ntdll!RtlDispatchException
43b ntdll!KiUserExceptionDispatch
43c ntdll!RtlpIncrementCriticalSectionContentionCount
43d ntdll!RtlpWaitOnCriticalSection
43e ntdll!RtlpEnterCriticalSectionContended
43f ntdll!RtlEnterCriticalSection
440 COMClientEvents!_Mtxlock
441 COMClientEvents!std::_Lockit::_Lockit
442 COMClientEvents!std::_Container_base12::_Orphan_all
443 COMClientEvents!std::_Tree_val<std::_Tree_simple_types<std::pair<long const ,std::basic_string<unsigned short,std::char_traits<unsigned short>,std::allocator<unsigned short> > > > >::_Erase_head<std::allocator<std::_Tree_node<std::pair<long const ,std::basic_string<unsigned short,std::char_traits<unsigned short>,std::allocator<unsigned short> > >,void *> > >
444 COMClientEvents!std::_Tree<std::_Tmap_traits<long,std::basic_string<unsigned short,std::char_traits<unsigned short>,std::allocator<unsigned short> >,std::less<long>,std::allocator<std::pair<long const ,std::basic_string<unsigned short,std::char_traits<unsigned short>,std::allocator<unsigned short> > > >,0> >::{dtor}
445 COMClientEvents!`anonymous namespace'::EventSink::`scalar deleting destructor'
446 COMClientEvents!UnknownImpl::DoRelease
447 COMClientEvents!`anonymous namespace'::EventSink::Release
448 coreclr!SafeReleasePreemp
449 coreclr!RCW::ReleaseAllInterfaces
44a coreclr!RCW::ReleaseAllInterfacesCallBack
44b coreclr!RCW::Cleanup
44c coreclr!RCWCleanupList::ReleaseRCWListRaw
44d coreclr!RCWCleanupList::ReleaseRCWListInCorrectCtx
44e coreclr!RCWCleanupList::CleanupAllWrappers
44f coreclr!`SyncBlockCache::CleanupSyncBlocks'::`11'::__Body::Run
450 coreclr!SyncBlockCache::CleanupSyncBlocks
451 coreclr!Thread::DoExtraWorkForFinalizer
452 coreclr!FinalizerThread::FinalizerThreadWorker
453 coreclr!ManagedThreadBase_DispatchInner
454 coreclr!ManagedThreadBase_DispatchMiddle
455 coreclr!``ManagedThreadBase_DispatchOuter'::`11'::__Body::Run'::`5'::__Body::Run
456 coreclr!`ManagedThreadBase_DispatchOuter'::`11'::__Body::Run
457 coreclr!ManagedThreadBase_DispatchOuter
458 coreclr!ManagedThreadBase_NoADTransition
459 coreclr!FinalizerThread::FinalizerThreadStart
45a kernel32!BaseThreadInitThunk
45b ntdll!RtlUserThreadStart

and the lower-most critical section looks like it has been freed

    [+0x000] DebugInfo        : 0x0 [Type: _RTL_CRITICAL_SECTION_DEBUG *]
    [+0x008] LockCount        : -6 [Type: long]
    [+0x00c] RecursionCount   : 0 [Type: long]
    [+0x010] OwningThread     : 0x0 [Type: void *]
    [+0x018] LockSemaphore    : 0xffffffffffffffff [Type: void *]
    [+0x020] SpinCount        : 0x0 [Type: unsigned __int64]

main thread is trying to exit

00 ntdll!ZwWaitForSingleObject
01 ntdll!LdrpDrainWorkQueue
02 ntdll!RtlExitUserProcess
03 kernel32!FatalExit
04 COMClientEvents!exit_or_terminate_process
05 COMClientEvents!common_exit
06 COMClientEvents!exit
07 COMClientEvents!__scrt_common_main_seh
08 COMClientEvents!__scrt_common_main
09 COMClientEvents!mainCRTStartup
0a kernel32!BaseThreadInitThunk
0b ntdll!RtlUserThreadStart

I suspect this change has altered thread timing and exposed a shutdown race in the test case. It runs fine locally.

@AaronRobinsonMSFT does this seem plausible?

@AaronRobinsonMSFT
Member

> I suspect this change has altered thread timing and exposed a shutdown race in the test case. It runs fine locally.
>
> @AaronRobinsonMSFT does this seem plausible?

@AndyAyersMS Sigh... The short answer is "yes": this is a shutdown race involving the STL. The std::_Lockit type is typically used for debug iterators and attempts to detect undefined behavior in usage. Since this is in the shutdown path, cleaning up std types from the finalizer thread seems to have some unexpected behavior, probably because the CRT is in the middle of being shut down, or has already been shut down, from the main thread.

Feel free to update the test's use of std::map below or file a bug and assign it to me. I can handle this tomorrow.

class EventSink : public UnknownImpl, public TestingEvents
{
    std::map<DISPID, std::wstring> _firedEvents;

@AndyAyersMS
Member Author

Thanks Aaron.

There are some timeouts in Linux testing across arm, arm32, and x64. For the arm and arm64 cases there's now one extra stress interrupt in the pinvoke path -- it would not surprise me if these interrupts are more costly on Linux than on Windows. Still, it seems a bit hard to believe that this relatively small number of extra interrupts (which do not induce GCs) could dramatically impact timing.

For x64 it seems unlikely the new suppression logic is more costly than the old, so not sure what's up there either.

I am going to see if I can get a handle on the slowdown in GC stress with this change.

@AaronRobinsonMSFT
Member

/cc @janvorli

@AndyAyersMS
Member Author

arm32 gcstress timeouts look similar to the ones we've seen in past runs, e.g. https://dev.azure.com/dnceng/public/_build/results?buildId=659155&view=ms.vss-test-web.build-test-results-tab

Note that without stress some of these tests finish in ~5 seconds. The timeout is set to 1 hour.

@BruceForstall is this a known issue?

@BruceForstall
Member

Yes, there are lots of GCStress=3 arm32 timeouts. There probably isn't a current issue; I was planning to open one when the rest of the noise is eliminated.

There is also #38230 for Linux arm32 failures.

@AndyAyersMS
Member Author

Ok, I'm going to merge this.

@BruceForstall keep this one in mind if you see any regressions in next weekend's GC stress runs.

@AndyAyersMS AndyAyersMS merged commit 7058b5e into dotnet:master Jun 25, 2020
@AndyAyersMS AndyAyersMS deleted the GCStressFollowUp branch June 25, 2020 23:41
@xiangzhai
Contributor

:mips-interest

@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
7 participants