-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition between GCInfo and Rundown #70609
Conversation
This makes the JIT/EE contract complicated. Instead of teaching JIT the intricacies of publishing the code and related artifacts that is VM implementation detail, it would be better to publish everything in the right order once the JIT returns here: runtime/src/coreclr/vm/jitinterface.cpp Line 12954 in 86a59cd
|
@jkotas Looking at the prior art, it seems like Then |
Yep. |
With the test app below it would hit asserts on a checked build within a minute before, and I have run it for an hour or so with no issue now on windows x64. On x86 it runs out of memory after ~25k dynamic methods, but works fine up until then. I am going to run it against linux arm32 to make sure, but I have no reason to suspect it will be any different there
|
src/coreclr/vm/eventtrace.cpp
Outdated
// it is not in use yet. | ||
#ifdef TARGET_X86 | ||
hdrInfo gcInfo; | ||
DecodeGCHdrInfo(codeInfo.GetGCInfoToken(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this rather check for NULL GetGCInfoToken? I do not think we should be attempting to decode the GC info that has not been published.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe this change is not needed with the new approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent a couple hours today trying to convince myself if this check is needed or not, and the more I look at it the more convinced I am that the race condition is different than the original hypothesis
The EECodeHeapIterator
uses MethodSectionIterator
to go through all active jitted methods, and it will only return a method if the appropriate index in HeapList::pHdrMap
is set.
But we only set the index in pHdrMap
in EEJitManager::NibbleMapSet
, which is called from CEEJitInfo::WriteCode
, after the GCInfo is generated.
Either I'm missing something or the real issue is a combination of the freeing happening in the wrong order on all arches, and then pointer tearing on arm archictures because the publishing for codeHeaders happens outside the lock
runtime/src/coreclr/vm/jitinterface.cpp
Line 10927 in 48adb84
memcpy(codeWriterHolder.GetRW(), m_CodeHeaderRW, m_codeWriteBufferSize); |
If I run my repro app with the fix to move freeing the code header to after freeing the code data, I no longer hit the assert on x64, which suggests but does not confirm my hypothesis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the memcpy I point out there also happens before NibbleMapSet
, so I think I am missing something
src/coreclr/vm/jitinterface.cpp
Outdated
@@ -10970,6 +10970,12 @@ void CEEJitInfo::WriteCode(EEJitManager * jitMgr) | |||
UnwindInfoTable::PublishUnwindInfoForMethod(m_moduleBase, m_CodeHeader->GetUnwindInfo(0), m_totalUnwindInfos); | |||
#endif // defined(TARGET_AMD64) | |||
|
|||
{ | |||
ExecutableWriterHolder<BYTE *> gcInfoWriterHolder(m_CodeHeader->GetGCInfoAddr(), sizeof(void *)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExecutableWriterHolder
is not a cheap operation. WriteCodeBytes
has one already. Can we refactor such that we have just one ExecutableWriterHolder for both operations?
src/coreclr/vm/codeman.cpp
Outdated
@@ -3215,21 +3215,22 @@ BYTE* EEJitManager::allocGCInfo(CodeHeader* pCodeHeader, DWORD blockSize, size_t | |||
} CONTRACTL_END; | |||
|
|||
MethodDesc* pMD = pCodeHeader->GetMethodDesc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just pass in the MethodDesc
instead of the whole CodeHeader
. This method does not need the CodeHeader
anymore.
src/coreclr/vm/jitinterface.cpp
Outdated
block = m_jitManager->allocGCInfo(m_CodeHeaderRW,(DWORD)size, &m_GCinfo_len); | ||
if (!block) | ||
m_pGCInfo = m_jitManager->allocGCInfo(m_CodeHeaderRW,(DWORD)size, &m_GCinfo_len); | ||
if (!m_pGCInfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be needed. allocGCInfo
throws on OOM.
@davmason is this PR still active or should we move to draft mode? |
Still active, just didn't finish it before I took vacation |
@jkotas - I've tested the heck out of it and have convinced myself the only change needed is to free the dynamic code heaps after freeing the code data. I can run my test program for hours without a crash with just that change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
The test failure is #70450 |
Fixes #69375