Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

leaktest: recover from panics #45983

Merged
merged 1 commit into from
Dec 3, 2020
Merged

leaktest: recover from panics #45983

merged 1 commit into from
Dec 3, 2020

Conversation

tbg
Copy link
Member

@tbg tbg commented Mar 11, 2020

This turns tests that panic into failed tests, which means that the test
suite has a chance of running to completion.

Release justification: testing-only change
Release note: None

This turns tests that panic into failed tests, which means that the test
suite has a chance of running to completion.

Release justification: testing-only change
Release note: None
@tbg tbg requested a review from nvanbenschoten March 11, 2020 14:33
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg
Copy link
Member Author

tbg commented Mar 11, 2020

This was useful for #45984.

@tbg
Copy link
Member Author

tbg commented Dec 3, 2020

bors r=nvanbenschoten

@craig
Copy link
Contributor

craig bot commented Dec 3, 2020

Build succeeded:

@craig craig bot merged commit b02c788 into cockroachdb:master Dec 3, 2020
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Nov 28, 2022
This patch effectively reverts cockroachdb#45983. The leaktest will now re-panic
when invoked on a panic unwind, instead of failing the test but
otherwise swallowing the panic and letting further tests run. Recovering
from panics is generally not a Go thing to do. The leaktest can only
recover from panics on the test's main goroutine (panics on other
goroutines, for example panics in a TestServer/TestCluster still crash
the binary), so the leaktest's behavior is inconsistent at best. The
panic-swallowing behavior is also surprising and magical - tests using
leaktest get it, tests not using leaktests don't get it, and figuring
out this difference is hard. Additionally, having the leaktest recover
panics is contageous - it puts pressure on other random code to be
mindful of its behavior under panic unwind. For example, the Stopper has
complex logic for recovering panics and doing best-effort cleanup,
recovering more panics from confused cleanup code, so that ultimately
future tests can run in a more or less sane environment.

I don't think the magic is worth it. Let's try again without it.

I believe that, at the time the leaktest magic was added, panics in go
tests were not resulting in a very clean experience in CI - it wasn't
trivial to see what test failed. I don't know if that's still the case.

Release note: None
Epic: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Nov 28, 2022
Before this patch, Stopper.Stop() would do a "best-effort" shutdown when
called while a panic is unwinding: it would signal quiescence, and then
run all closer async while swallowing their panics, then wait a bit,
then re-panic. If the stopper is to do any shutdown on panics, this
shutdown has to be somewhat involved, since some closers are known to
panic easily when closed in an unclean state (e.g. memory accounts with
bytes taken out of them panic), and tasks can fail to respond to
quiescence, etc. This is all too complicated, and pretty unsound, for
little, if any, gain, so this patch gets rid of it all in favor of not
doing any shutdown and re-panicking immediately.

This best-effort shutdown was added in cockroachdb#3581, for the purported benefit
of CLI tests that wanted to trigger panic code-paths. The respective
code is long gone - as far as I can dig up, it had to do with ancient,
per-SQL cli kv-client commands.
Over time, this panic recovery became load-bearing in cockroachdb#45983, when
leaktest was modified to not crash the package binary on panics, and
instead continue with further tests. If we are to continue with other
tests after a panicking one, we'd better try to cleanup the panicking
TestServer as best we can. The leaktest panic behavior is reverted in
the prior commit.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Dec 2, 2022
This patch effectively reverts cockroachdb#45983. The leaktest will now re-panic
when invoked on a panic unwind, instead of failing the test but
otherwise swallowing the panic and letting further tests run. Recovering
from panics is generally not a Go thing to do. The leaktest can only
recover from panics on the test's main goroutine (panics on other
goroutines, for example panics in a TestServer/TestCluster still crash
the binary), so the leaktest's behavior is inconsistent at best. The
panic-swallowing behavior is also surprising and magical - tests using
leaktest get it, tests not using leaktests don't get it, and figuring
out this difference is hard. Additionally, having the leaktest recover
panics is contageous - it puts pressure on other random code to be
mindful of its behavior under panic unwind. For example, the Stopper has
complex logic for recovering panics and doing best-effort cleanup,
recovering more panics from confused cleanup code, so that ultimately
future tests can run in a more or less sane environment.

I don't think the magic is worth it. Let's try again without it.

I believe that, at the time the leaktest magic was added, panics in go
tests were not resulting in a very clean experience in CI - it wasn't
trivial to see what test failed. I don't know if that's still the case.

Release note: None
Epic: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Dec 2, 2022
Before this patch, Stopper.Stop() would do a "best-effort" shutdown when
called while a panic is unwinding: it would signal quiescence, and then
run all closer async while swallowing their panics, then wait a bit,
then re-panic. If the stopper is to do any shutdown on panics, this
shutdown has to be somewhat involved, since some closers are known to
panic easily when closed in an unclean state (e.g. memory accounts with
bytes taken out of them panic), and tasks can fail to respond to
quiescence, etc. This is all too complicated, and pretty unsound, for
little, if any, gain, so this patch gets rid of it all in favor of not
doing any shutdown and re-panicking immediately.

This best-effort shutdown was added in cockroachdb#3581, for the purported benefit
of CLI tests that wanted to trigger panic code-paths. The respective
code is long gone - as far as I can dig up, it had to do with ancient,
per-SQL cli kv-client commands.
Over time, this panic recovery became load-bearing in cockroachdb#45983, when
leaktest was modified to not crash the package binary on panics, and
instead continue with further tests. If we are to continue with other
tests after a panicking one, we'd better try to cleanup the panicking
TestServer as best we can. The leaktest panic behavior is reverted in
the prior commit.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants