-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
leaktest: recover from panics #45983
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This turns tests that panic into failed tests, which means that the test suite has a chance of running to completion. Release justification: testing-only change Release note: None
This was useful for #45984. |
nvanbenschoten
approved these changes
Mar 11, 2020
bors r=nvanbenschoten |
Build succeeded: |
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this pull request
Nov 28, 2022
This patch effectively reverts cockroachdb#45983. The leaktest will now re-panic when invoked on a panic unwind, instead of failing the test but otherwise swallowing the panic and letting further tests run. Recovering from panics is generally not a Go thing to do. The leaktest can only recover from panics on the test's main goroutine (panics on other goroutines, for example panics in a TestServer/TestCluster still crash the binary), so the leaktest's behavior is inconsistent at best. The panic-swallowing behavior is also surprising and magical - tests using leaktest get it, tests not using leaktests don't get it, and figuring out this difference is hard. Additionally, having the leaktest recover panics is contageous - it puts pressure on other random code to be mindful of its behavior under panic unwind. For example, the Stopper has complex logic for recovering panics and doing best-effort cleanup, recovering more panics from confused cleanup code, so that ultimately future tests can run in a more or less sane environment. I don't think the magic is worth it. Let's try again without it. I believe that, at the time the leaktest magic was added, panics in go tests were not resulting in a very clean experience in CI - it wasn't trivial to see what test failed. I don't know if that's still the case. Release note: None Epic: None
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this pull request
Nov 28, 2022
Before this patch, Stopper.Stop() would do a "best-effort" shutdown when called while a panic is unwinding: it would signal quiescence, and then run all closer async while swallowing their panics, then wait a bit, then re-panic. If the stopper is to do any shutdown on panics, this shutdown has to be somewhat involved, since some closers are known to panic easily when closed in an unclean state (e.g. memory accounts with bytes taken out of them panic), and tasks can fail to respond to quiescence, etc. This is all too complicated, and pretty unsound, for little, if any, gain, so this patch gets rid of it all in favor of not doing any shutdown and re-panicking immediately. This best-effort shutdown was added in cockroachdb#3581, for the purported benefit of CLI tests that wanted to trigger panic code-paths. The respective code is long gone - as far as I can dig up, it had to do with ancient, per-SQL cli kv-client commands. Over time, this panic recovery became load-bearing in cockroachdb#45983, when leaktest was modified to not crash the package binary on panics, and instead continue with further tests. If we are to continue with other tests after a panicking one, we'd better try to cleanup the panicking TestServer as best we can. The leaktest panic behavior is reverted in the prior commit. Release note: None
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this pull request
Dec 2, 2022
This patch effectively reverts cockroachdb#45983. The leaktest will now re-panic when invoked on a panic unwind, instead of failing the test but otherwise swallowing the panic and letting further tests run. Recovering from panics is generally not a Go thing to do. The leaktest can only recover from panics on the test's main goroutine (panics on other goroutines, for example panics in a TestServer/TestCluster still crash the binary), so the leaktest's behavior is inconsistent at best. The panic-swallowing behavior is also surprising and magical - tests using leaktest get it, tests not using leaktests don't get it, and figuring out this difference is hard. Additionally, having the leaktest recover panics is contageous - it puts pressure on other random code to be mindful of its behavior under panic unwind. For example, the Stopper has complex logic for recovering panics and doing best-effort cleanup, recovering more panics from confused cleanup code, so that ultimately future tests can run in a more or less sane environment. I don't think the magic is worth it. Let's try again without it. I believe that, at the time the leaktest magic was added, panics in go tests were not resulting in a very clean experience in CI - it wasn't trivial to see what test failed. I don't know if that's still the case. Release note: None Epic: None
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this pull request
Dec 2, 2022
Before this patch, Stopper.Stop() would do a "best-effort" shutdown when called while a panic is unwinding: it would signal quiescence, and then run all closer async while swallowing their panics, then wait a bit, then re-panic. If the stopper is to do any shutdown on panics, this shutdown has to be somewhat involved, since some closers are known to panic easily when closed in an unclean state (e.g. memory accounts with bytes taken out of them panic), and tasks can fail to respond to quiescence, etc. This is all too complicated, and pretty unsound, for little, if any, gain, so this patch gets rid of it all in favor of not doing any shutdown and re-panicking immediately. This best-effort shutdown was added in cockroachdb#3581, for the purported benefit of CLI tests that wanted to trigger panic code-paths. The respective code is long gone - as far as I can dig up, it had to do with ancient, per-SQL cli kv-client commands. Over time, this panic recovery became load-bearing in cockroachdb#45983, when leaktest was modified to not crash the package binary on panics, and instead continue with further tests. If we are to continue with other tests after a panicking one, we'd better try to cleanup the panicking TestServer as best we can. The leaktest panic behavior is reverted in the prior commit. Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This turns tests that panic into failed tests, which means that the test
suite has a chance of running to completion.
Release justification: testing-only change
Release note: None