leaktest: recover from panics #45983

tbg · 2020-03-11T14:33:58Z

This turns tests that panic into failed tests, which means that the test
suite has a chance of running to completion.

Release justification: testing-only change
Release note: None
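For readers unfamiliar with the pattern, here is a minimal sketch of turning a panic into a test failure, not the actual leaktest code. The `reporter` interface, `recordingReporter`, `afterTest`, and `runTest` are hypothetical names standing in for `*testing.T` and the leaktest hook, so the example runs outside `go test`.

```go
package main

import "fmt"

// reporter is a hypothetical stand-in for the part of *testing.T used here.
type reporter interface {
	Errorf(format string, args ...interface{})
}

// recordingReporter records failures instead of failing a real test.
type recordingReporter struct{ failures []string }

func (r *recordingReporter) Errorf(format string, args ...interface{}) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

// afterTest returns a function meant to be deferred at the top of a test.
// If the test body panics on the same goroutine, the panic is recovered
// and reported as an ordinary failure.
func afterTest(t reporter) func() {
	return func() {
		if r := recover(); r != nil {
			t.Errorf("test panicked: %v", r)
		}
	}
}

// runTest mimics a test function that installs the hook and runs its body.
func runTest(t reporter, body func()) {
	defer afterTest(t)()
	body()
}

func main() {
	r := &recordingReporter{}
	runTest(r, func() { panic("boom") })
	runTest(r, func() {}) // a later "test" still gets to run
	fmt.Println(r.failures)
}
```

Note that `recover` only has an effect here because the closure returned by `afterTest` is itself the deferred function, and only for panics raised on the goroutine that deferred it.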

tbg · 2020-03-11T14:35:00Z

This was useful for #45984.

tbg · 2020-12-03T12:58:49Z

bors r=nvanbenschoten

craig · 2020-12-03T13:29:59Z

Build succeeded:

GitHub CI (Cockroach)

This patch effectively reverts cockroachdb#45983. The leaktest will now re-panic when invoked on a panic unwind, instead of failing the test but otherwise swallowing the panic and letting further tests run.

Recovering from panics is generally not a Go thing to do. The leaktest can only recover from panics on the test's main goroutine (panics on other goroutines, for example panics in a TestServer/TestCluster, still crash the binary), so the leaktest's behavior is inconsistent at best. The panic-swallowing behavior is also surprising and magical - tests using leaktest get it, tests not using leaktest don't, and figuring out this difference is hard.

Additionally, having the leaktest recover panics is contagious - it puts pressure on other random code to be mindful of its behavior under panic unwind. For example, the Stopper has complex logic for recovering panics and doing best-effort cleanup, recovering more panics from confused cleanup code, so that ultimately future tests can run in a more or less sane environment.

I don't think the magic is worth it. Let's try again without it. I believe that, at the time the leaktest magic was added, panics in Go tests were not resulting in a very clean experience in CI - it wasn't trivial to see which test failed. I don't know if that's still the case.

Release note: None
Epic: None
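The re-panic behavior described above could look roughly like the following sketch. It is an illustration under assumptions, not the code from the patch: `afterTestRepanic` and `runChecked` are hypothetical names, and `check` stands in for whatever leak checking the hook would do on a clean exit.

```go
package main

import "fmt"

// afterTestRepanic returns a function meant to be deferred by a test.
// When it runs during a panic unwind, it skips the check and re-panics
// with the same value so the panic keeps propagating; otherwise it runs
// the check as usual.
func afterTestRepanic(check func()) func() {
	return func() {
		if r := recover(); r != nil {
			panic(r) // do not swallow: let the panic continue unwinding
		}
		check()
	}
}

// runChecked mimics a test that installs the hook and then runs its body.
func runChecked(body func(), check func()) {
	defer afterTestRepanic(check)()
	body()
}

func main() {
	// Clean exit: the check runs.
	runChecked(func() {}, func() { fmt.Println("check ran") })

	// Panicking body: the check is skipped and the panic reaches the caller.
	// It is recovered here only so that this demo exits normally.
	defer func() { fmt.Println("propagated:", recover()) }()
	runChecked(func() { panic("boom") }, func() { fmt.Println("check ran") })
}
```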

Before this patch, Stopper.Stop() would do a "best-effort" shutdown when called while a panic is unwinding: it would signal quiescence, then run all closers async while swallowing their panics, then wait a bit, then re-panic. If the stopper is to do any shutdown on panics, this shutdown has to be somewhat involved, since some closers are known to panic easily when closed in an unclean state (e.g. memory accounts with bytes taken out of them panic), and tasks can fail to respond to quiescence, etc. This is all too complicated, and pretty unsound, for little, if any, gain, so this patch gets rid of it all in favor of not doing any shutdown and re-panicking immediately.

This best-effort shutdown was added in cockroachdb#3581, for the purported benefit of CLI tests that wanted to trigger panic code-paths. The respective code is long gone - as far as I can dig up, it had to do with ancient, per-SQL cli kv-client commands. Over time, this panic recovery became load-bearing in cockroachdb#45983, when leaktest was modified to not crash the package binary on panics, and instead continue with further tests. If we are to continue with other tests after a panicking one, we'd better try to clean up the panicking TestServer as best we can. The leaktest panic behavior is reverted in the prior commit.

Release note: None
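As a rough illustration of "no shutdown, re-panic immediately", here is a toy sketch. The `toyStopper` type and its methods are hypothetical and far simpler than the real Stopper; the only point shown is that closers are skipped when `Stop` runs during a panic unwind.

```go
package main

import "fmt"

// toyStopper is a hypothetical, heavily simplified stand-in for a stopper:
// it only keeps a list of closers to run on Stop.
type toyStopper struct {
	closers []func()
}

// AddCloser registers a function to run on a clean Stop.
func (s *toyStopper) AddCloser(f func()) {
	s.closers = append(s.closers, f)
}

// Stop runs the closers on a clean shutdown. If it is invoked as a deferred
// call while a panic is unwinding, it runs nothing and re-panics right away.
// recover only observes the panic when Stop is deferred directly
// (defer s.Stop()).
func (s *toyStopper) Stop() {
	if r := recover(); r != nil {
		panic(r)
	}
	for _, c := range s.closers {
		c()
	}
}

func main() {
	s := &toyStopper{}
	s.AddCloser(func() { fmt.Println("closer ran") })

	// The outer recover exists only so this demo exits normally.
	defer func() { fmt.Println("propagated:", recover()) }()
	defer s.Stop()
	panic("boom")
}
```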

leaktest: recover from panics

8398c22

tbg requested a review from nvanbenschoten March 11, 2020 14:33

nvanbenschoten approved these changes Mar 11, 2020

craig bot merged commit b02c788 into cockroachdb:master Dec 3, 2020

andreimatei mentioned this pull request Dec 1, 2022

leaktest: don't recover from panics #92874

Closed
