System.Buffers.ArrayPool.Tests.ArrayPoolUnitTests.RentBufferFiresRentedDiagnosticEvent failure #42899
Tagging subscribers to this area: @tannergooding, @pgovind, @jeffhandley
Tagging subscribers to this area: @tommcdon
Given that this is a diagnostic event test and that the process spent just 10 ms of processor time, this is most likely an EventPipe race condition that results in a hang during startup.
I looked at the test: runtime/src/libraries/System.Buffers/tests/ArrayPool/UnitTests.cs, lines 402 to 419 in 642b7cb.
I don't think this would go through EventPipe. If I recall correctly, because this is a managed … Looking at the history, this also appears to be the first failure of this test in recent history. There isn't a dump of the remote executor process in AzDO/Helix, so we'll have to see whether this happens again to get more details.
According to the log, the process spent 10 ms of processor time. That is not enough to start executing managed code. I think this got stuck in the unmanaged runtime startup, most likely in the unmanaged EventPipe or tracing initialization, since it happened while executing tests that stress these components.
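The argument above rests on comparing CPU time with wall-clock time: a process that is merely slow keeps a core busy, while a blocked process accrues almost no CPU time no matter how long it waits. The Python sketch below is a hypothetical triage helper illustrating that reasoning; the function name, the 5% threshold, and the 60 s window are illustrative assumptions, not values taken from RemoteExecutor, Helix, or the logs.

```python
def classify_timeout(cpu_seconds: float, wall_seconds: float,
                     busy_ratio: float = 0.05) -> str:
    """Hypothetical heuristic for a child process killed on timeout.

    A slow process burns CPU for much of the wall-clock window; a blocked
    process (deadlock, waiting on a lock) accumulates almost none.
    The busy_ratio threshold is an arbitrary illustrative choice.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    ratio = cpu_seconds / wall_seconds
    return "blocked" if ratio < busy_ratio else "slow"


# 10 ms of CPU over an assumed 60 s timeout window: almost certainly blocked.
print(classify_timeout(0.010, 60.0))  # blocked
# 45 s of CPU over the same window: the process was working, just slowly.
print(classify_timeout(45.0, 60.0))   # slow
```

A multi-threaded process can report more CPU time than wall time; the ratio then exceeds 1 and the helper still reports "slow", which is the intended reading.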
EventPipe should be dormant for this test. Since the event producer is a managed EventSource and the consumer is an in-process EventListener, EventPipe shouldn't be doing anything out of the ordinary, especially before managed code runs. All the eventing in this test happens in managed code. I agree that something must have happened before managed code executed; I'm just not sure it was EventPipe. Have we seen any similar failures where a RemoteExecutor-based test times out with the remote process having executed for only a few milliseconds?
Failed again in job: runtime 20201102.59
Failed test: System.Buffers.ArrayPool.Tests.ArrayPoolUnitTests.RentBufferFiresRentedDiagnosticEvent on net6.0-OSX-Release-x64-CoreCLR_checked-OSX.1013.Amd64.Open
This has happened again recently (e.g. in https://dev.azure.com/dnceng/public/_build/results?buildId=871598&view=ms.vss-test-web.build-test-results-tab&runId=27884432&resultId=158379&paneView=debug). I updated the issue to be live-tracked via runfo.
We also saw #44037 failing very frequently recently; I disabled the test as it was taking out too many PRs, but it had a similar symptom. |
Failed again in job: runtime 20201122.5
Failed test: System.Buffers.ArrayPool.Tests.ArrayPoolUnitTests.RentBufferFiresRentedDiagnosticEvent on net6.0-OSX-Release-x64-CoreCLR_checked-OSX.1013.Amd64.Open
Happened again on: #45430
I'm going to disable the test.
According to runfo, we hit this 36 times in the last month, 8 times this week, and once today: https://runfo.azurewebsites.net/tracking/issue/90. An interesting pattern is that it only happens under a Checked CoreCLR.
@jkotas, do you think this could be because a Checked CoreCLR is so slow that RemoteExecutor kills it before it finishes? If so, we can just add …
Just based on the remote processes' stacks above, I think it's more likely this is a deadlock.
OK, thanks for looking. I will then disable it with an [ActiveIssue] attribute.
I can take a look at fixing this, @safern. Please tag or assign me in the issue you open.
Sounds good. I will just open a PR to disable the test while this issue is being fixed. Thanks for looking into it.
It is a classic A-B/B-A deadlock. The main thread is waiting for this lock:
The lock is held by this thread, which is itself waiting on a lock held by the main thread:
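To make the A-B/B-A pattern concrete, here is a minimal Python sketch (not .NET code from the runtime) in which two threads take two hypothetical locks in opposite orders. Barriers force the bad interleaving deterministically, and `acquire(timeout=...)` lets the demo finish where a real deadlock would block forever.

```python
import threading

# Hypothetical stand-ins for the two locks seen in the two threads' stacks.
lock_a = threading.Lock()
lock_b = threading.Lock()

both_hold_first = threading.Barrier(2)  # both threads now hold their first lock
both_tried = threading.Barrier(2)       # both threads have tried their second lock
results = {}


def worker(name, first, second):
    with first:
        both_hold_first.wait()
        # A real deadlock would wait forever; the timeout lets the demo end.
        got_second = second.acquire(timeout=0.5)
        if got_second:
            second.release()
        results[name] = got_second
        # Keep holding `first` until the other thread has also given up,
        # so neither thread can sneak in after the other times out.
        both_tried.wait()


main = threading.Thread(target=worker, args=("main", lock_a, lock_b))
other = threading.Thread(target=worker, args=("other", lock_b, lock_a))
main.start()
other.start()
main.join()
other.join()

# Neither thread could take its second lock: each holds what the other needs.
print(sorted(results.items()))  # [('main', False), ('other', False)]
```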
Is this issue #45059? Some EventSource initializations (e.g. manifest generation) request an ArrayPool buffer, which then calls ArrayPool.Log to check whether it is enabled, which in turn requires the same lock.
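A chain like the one described (EventSource initialization → ArrayPool rent → enabled check on ArrayPool.Log → a lock already held elsewhere) is one way two code paths end up taking the same pair of locks in opposite orders. One general remedy for lock-order inversions is to make every path acquire the locks in a single global order. The Python sketch below illustrates only that technique; it is not the fix applied for this issue, and `ordered_locks`, `lock_a`, and `lock_b` are hypothetical names.

```python
import threading
from contextlib import contextmanager

lock_a = threading.Lock()  # hypothetical: e.g. a lock held by the event listener
lock_b = threading.Lock()  # hypothetical: e.g. a lock taken during source init


@contextmanager
def ordered_locks(*locks):
    """Acquire locks in one global order (by id()) and release in reverse."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    try:
        yield
    finally:
        for lock in reversed(ordered):
            lock.release()


counter = 0


def worker(first, second, iterations=10_000):
    global counter
    for _ in range(iterations):
        # Callers name the locks in opposite orders, but ordered_locks always
        # takes them in the same order, so no A-B/B-A cycle can form.
        with ordered_locks(first, second):
            counter += 1


t1 = threading.Thread(target=worker, args=(lock_a, lock_b))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a))
t1.start()
t2.start()
t1.join(timeout=10)
t2.join(timeout=10)
print(t1.is_alive(), t2.is_alive(), counter)  # False False 20000
```

Ordering by `id()` is only a convenient total order for the demo; real code usually documents an explicit lock hierarchy instead.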
I think
@benaadams, this is a different issue from #45059 (deadlock in the EventListener implementation used in the test), but yes, PortableThreadPool is triggering the deadlock more frequently.
The test failed in CI on my PR (with a change unrelated to the issue):
https://dev.azure.com/dnceng/public/_build/results?buildId=836124&view=ms.vss-test-web.build-test-results-tab&runId=26640246&resultId=172262&paneView=history
Runfo Tracking Issue: system.buffers.arraypool.tests.arraypoolunittests.rentbufferfiresrenteddiagnosticevent