Fix GCStress timeouts in JIT/jit64 #85040
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
PTAL @kunalspathak (and this should help with the weekend gcstress failure)
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
@trylek @davidwrighton We've been hitting gcstress timeouts every time we add merged test groups. The behavior indicates some degradation over time within a gcstress process (probably the original motivation for striping). However, we've also seen individual tests take much longer, even when first or early in a merged test group run. My new theory is that the extra stack frames have a prohibitively high cost (and likely it's just the test executor methods with the N try/catch blocks).

The current iteration of this PR is (overly) aggressive at simplifying the stack. It also still marks several tests as RequiresProcessIsolation, leftover from my initial experiments. Before I go further, I was hoping to get some feedback on the area. My thought is to go to one test per TestExecutor (and therefore simplify the logic there), make XHarnessTestRunner match it for consistency, and keep the RPIs in order to get gcstress testing unblocked. They can be removed in the future, though this is low priority since individual tests don't hurt test throughput too much.
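For readers outside the test infrastructure, the two executor shapes being compared can be sketched as follows. This is a hedged illustration in Java rather than the generated C#, and all names here (`ExecutorShapes`, `runOne`, `test1`…) are hypothetical, not the actual generated code: one executor method carrying a separate try/catch region per wrapped test, versus a thin helper giving each test its own single-try/catch frame.

```java
public class ExecutorShapes {
    static void test1() { }
    static void test2() { throw new RuntimeException("boom"); }
    static void test3() { }

    // Shape suspected of causing the expensive frames: one executor method
    // whose body contains N separate try/catch regions, one per wrapped test.
    static int runAllInline() {
        int failures = 0;
        try { test1(); } catch (Throwable e) { failures++; }
        try { test2(); } catch (Throwable e) { failures++; }
        try { test3(); } catch (Throwable e) { failures++; }
        return failures;
    }

    // Proposed shape: each test call goes through a helper that holds exactly
    // one try/catch, so no single frame accumulates N exception-handling regions.
    static int runOne(Runnable test) {
        try { test.run(); return 0; } catch (Throwable e) { return 1; }
    }

    static int runViaHelpers() {
        return runOne(ExecutorShapes::test1)
             + runOne(ExecutorShapes::test2)
             + runOne(ExecutorShapes::test3);
    }

    public static void main(String[] args) {
        System.out.println(runAllInline());   // 1 failure either way;
        System.out.println(runViaHelpers());  // only the frame shape differs
    }
}
```

Both shapes report the same failures; the difference is purely in how the exception-handling metadata is distributed across stack frames, which is what matters under gcstress.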
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
fyi - I'm now looking at using BuildAsStandalone in gcstress builds to completely avoid merged test groups for now. See #85284, though it will probably take a few rounds for me to get the yaml right.
@markples - Do you think we might be able to reduce some of these costs by emitting calls to the individual test entrypoints through helper methods so that each such helper method would have just the one try-catch block?
@trylek This PR currently does that (it was easy by setting the grouping value to 1). I think it helped but still hit a problem (though it's been long enough that I don't remember the details), which is why I had shelved this and was trying the BuildAsStandalone approach. However, that has hit an issue: at least one of the HardwareIntrinsics projects is big enough to time out on its own (test merging can stripe within a project since it is dealing with individual tests).
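For context, "striping" here means partitioning a project's individual tests across multiple CI work items so that no single item runs long enough to hit the timeout. A minimal round-robin sketch of the idea (class and method names are hypothetical, not the actual merged-test infrastructure):

```java
import java.util.ArrayList;
import java.util.List;

public class TestStriping {
    // Round-robin assignment of tests to `stripes` work items: test i lands
    // in stripe i % stripes, so stripe sizes differ by at most one.
    static List<List<String>> stripe(List<String> tests, int stripes) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < stripes; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < tests.size(); i++) {
            buckets.get(i % stripes).add(tests.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> tests = List.of("t0", "t1", "t2", "t3", "t4");
        System.out.println(stripe(tests, 2)); // [[t0, t2, t4], [t1, t3]]
    }
}
```

This only works when the harness can address individual tests, which is why merged test groups can stripe within a project while a BuildAsStandalone project is an indivisible unit.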
fyi - this is close but I'm waiting for test results
@trylek I propose that we move forward with these fixes for now. They might be overkill, and we might change things again in the future, but this gets jit64 gcstress under control and lets us move forward. A few JIT\Regression legs are still slow but working. (Also resetting @kunalspathak's review since much has changed since then.)
Looks great to me, thanks Mark!
/azp run runtime-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
MemorySsa is failing elsewhere.
running gcstress yet again because my other change restructured the groups
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime, runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 2 pipeline(s).
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
This reverts commit 0992368.
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
Previous test run might have passed, but the devops machine flaked out. JIT/jit64 and JIT/opt appear to be ok.
/azp run runtime-coreclr gcstress0x3-gcstress0xc
Azure Pipelines successfully started running 1 pipeline(s).
Some of the gcstress legs are still quite slow, suggesting more striping would be desirable. Hopefully this run is sufficient to unblock testing, with striping handled separately, though osx arm64 continues to be stubborn.
Build analysis is showing a failure from a previous run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=277134&view=results
This includes several changes that seem to help with the timeouts. It might be overkill but seems like a good direction as this has been broken for a while.
Should fix #85590