-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli/start: ensure startup panics are reported to stderr #91823
Conversation
76e9501
to
23927cf
Compare
23927cf
to
bcc243f
Compare
ci barking on unrelated flake #91828. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)
pkg/cli/start.go
line 662 at r1 (raw file):
// The settings.Values, if available, determines whether a Sentry // report should be sent. No Sentry report is sent if sv is nil. logcrash.ReportPanic(ctx, sv, r, 1)
pls comment the 1
inline
pkg/util/log/logcrash/crash_reporting.go
line 137 at r1 (raw file):
// RecoverAndReportNonfatalPanic is an alternative RecoverAndReportPanic that // does not re-panic in Release builds. // sv may not be nil.
Isn't this an unusual comment?
Code quote:
sv may not be nil.
pkg/util/log/logcrash/crash_reporting.go
line 153 at r1 (raw file):
// does not re-panic in release builds. // sv may not be nil. func ReportNonfatalPanic(ctx context.Context, sv *settings.Values, r interface{}, depth int) {
consider adding a comment about r
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
// avoids stalls and helps some tests in `./cli` finish cleanly (where // panics happen on purpose). if r := recover(); r != nil {
This whole recovery business in the stopper strikes me as dubious. Out of the motivations in the comment above, I only buy "avoid stalls". I think we can achieve that with putting a log.Fatal()
inside this recover
block, and nothing else. I would get rid of the new EmergencyStop
; make the one caller outside of this file also log.Fatal()
instead.
pkg/sql/rowexec/processor_utils_test.go
line 222 at r3 (raw file):
panic(r) } else { p.config.FlowCtx.EvalCtx.Stop(ctx)
here aren't you breaking your _must_ be called in a defer...
comment? Consider relaxing that comment.
8183c5e
to
b2356f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/cli/start.go
line 662 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls comment the
1
inline
Done.
pkg/util/log/logcrash/crash_reporting.go
line 137 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Isn't this an unusual comment?
I don't understand your comment.
pkg/util/log/logcrash/crash_reporting.go
line 153 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
consider adding a comment about
r
Done.
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
This whole recovery business in the stopper strikes me as dubious. Out of the motivations in the comment above, I only buy "avoid stalls". I think we can achieve that with putting a
log.Fatal()
inside thisrecover
block, and nothing else. I would get rid of the newEmergencyStop
; make the one caller outside of this file alsolog.Fatal()
instead.
I don't like the idea of log.Fatal for use in unit tests.
Another argument against it is that it prevents the leak prevention checker from running.
pkg/sql/rowexec/processor_utils_test.go
line 222 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
here aren't you breaking your
_must_ be called in a defer...
comment? Consider relaxing that comment.
Done.
b2356f2
to
65af6c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @knz and @tbg)
pkg/sql/sem/eval/context.go
line 18 at r7 (raw file):
"time" apd "github.com/cockroachdb/apd/v3"
see if this is a good change. I don't have a problem with it, but maybe your editor is not like most people's if it added it (the vast majority of importers to not have the alias)?
pkg/sql/sem/eval/context.go
line 439 at r7 (raw file):
// EmergencyStop should be called in the recover() branch of a context // shutdown when `defer evalCtx.Stop()` is not possible/desirable. func (ec *Context) EmergencyStop(c context.Context) {
similar to the stopper discussion, I would not add this and leave the handful of tests that use it vulnerable to a bogus panic about some account being closed without memory returned to it (or whatever TestingMon.EmergencyStop()
is supposed to help with).
pkg/util/log/logcrash/crash_reporting.go
line 137 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I don't understand your comment.
I mean generally speaking, pointer args are assumed to be non-nil, unless otherwise specified. Is this sv
special enough to warrant a comment? (I definitely did not mean adding more words to the comment :) )
I think you did have a comment about some other sv
somewhere being potentially nil
, but it's not in this commit, and it's probably not reason enough to pollute all these other functions.
pkg/util/log/logcrash/crash_reporting.go
line 154 at r5 (raw file):
// the recovered panic object. // The caller is responsible for ensuring that the argument sv is non-nil. func ReportNonfatalPanic(ctx context.Context, sv *settings.Values, r interface{}, depth int) {
does anyone call this function? I would get rid of RecoverAndReportNonfatalPanic()
(I consider it harmful), rather than introducing (or at least exporting) more brothers to it.
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I don't like the idea of log.Fatal for use in unit tests.
Another argument against it is that it prevents the leak prevention checker from running.
What I'm actually bothered by here is the introduction of EmergecyStop
; otherwise I don't necessarily have a problem with the old code. But EmergencyStop
I think is too complicated; it's hard to understand when one needs to call it and why, etc. And it leads to a cascade of uncertainty - for example, does transientCluster.Close()
now need a comment to say that it should only be used in a defer
? EmergencyStop
is only used in transientCluster.Close()
, which is used only by two tests. I say let those tests vulnerable to the danger of the Stopper.Stop()
call deadlocking, and don't touch this code at all.
One thing I would do to this code, though, is unify this recover block with the recovery part of the defer
above. I think the
if r := recover(); r != nil {
s.handlePanic(ctx, r)
}
part can be massaged into this block. As written, it's quite hard to keep track of what's going on.
pkg/util/stop/stopper.go
line 591 at r8 (raw file):
// stopper.Stop(ctx) // }() func (s *Stopper) EmergencyStop(ctx context.Context) {
sprinkle some comment saying that this is best effort
65af6c6
to
40af148
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)
pkg/sql/sem/eval/context.go
line 439 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
similar to the stopper discussion, I would not add this and leave the handful of tests that use it vulnerable to a bogus panic about some account being closed without memory returned to it (or whatever
TestingMon.EmergencyStop()
is supposed to help with).
Done.
pkg/util/log/logcrash/crash_reporting.go
line 137 at r1 (raw file):
unless otherwise specified. Is this sv special enough to warrant a comment?
Yes, because some other functions in this file support sv being nil, I found it coherent to clarify when it must not be nil so that there's no open question.
pkg/util/log/logcrash/crash_reporting.go
line 154 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
does anyone call this function? I would get rid of
RecoverAndReportNonfatalPanic()
(I consider it harmful), rather than introducing (or at least exporting) more brothers to it.
Yes alas we have a few calls to it remaining.
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
What I'm actually bothered by here is the introduction of
EmergecyStop
; otherwise I don't necessarily have a problem with the old code. ButEmergencyStop
I think is too complicated; it's hard to understand when one needs to call it and why, etc. And it leads to a cascade of uncertainty - for example, doestransientCluster.Close()
now need a comment to say that it should only be used in adefer
?EmergencyStop
is only used intransientCluster.Close()
, which is used only by two tests. I say let those tests vulnerable to the danger of theStopper.Stop()
call deadlocking, and don't touch this code at all.One thing I would do to this code, though, is unify this recover block with the recovery part of the
defer
above. I think theif r := recover(); r != nil { s.handlePanic(ctx, r) }
part can be massaged into this block. As written, it's quite hard to keep track of what's going on.
I've re-written this bit. Do you like it better?
pkg/util/stop/stopper.go
line 591 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
sprinkle some comment saying that this is best effort
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @knz and @tbg)
pkg/util/log/logcrash/crash_reporting.go
line 154 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Yes alas we have a few calls to it remaining.
a few calls to what? This ReportNonfatalPanic
that you're introducing now only has the one caller above. I would not introduce it, or at least not export it.
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I've re-written this bit. Do you like it better?
I like that EmergencyStop
is gone.
Other than that, I personally like the old comment around here better. You've moved the "avoids stalls" part around, which I think is the important one; in the old phrasing, it was pretty clear that the "avoid stalls" refers to the "would likely block" - so the stalls are about stopper tasks not terminating. In the new phrasing, it's not clear what these stalls refer to - are they internal to the stopper, or not.
I continue to stand by the suggestion to merge this block with the defered recover above. I understood that, they way the defered one is written, it will catch panics from the Close()
calls below, but that comes at the cost of convoluted code and I don't think it's worth it. We've lived without catching those panics forever cause the code was broken (which you're fixing), right? I say we continue to live without it. In tests, such a panic will be caught by leaktest
, for better or worse, and the goroutine leak detection will be inhibited.
I don't care too much about all this though, so feel free to do what you want if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @knz)
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
I continue to stand by the suggestion to merge this block with the defered recover above. I understood that, they way the defered one is written, it will catch panics from the Close() calls below, but that comes at the cost of convoluted code and I don't think it's worth it. We've lived without catching those panics forever cause the code was broken (which you're fixing), right? I say we continue to live without it. In tests, such a panic will be caught by leaktest, for better or worse, and the goroutine leak detection will be inhibited.
Instead of the defered recover above that catches panics from closers in the case when Stopper.Stop()
is not called while unwinding from a panic, what I would find useful would be catching panics in go c.Close()
below (inside the other goroutine) - so when the closers are called on this panic recovery path. Cause that's when the damn memory monitors panic because memory hasn't been release to them. And when they panic, that panic ends up printed, and the original panic doesn't make it to the output.
This just happened to me now, and it's not the first time.
ok i can do this too |
40af148
to
b0c7db7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did one more round of improvements. PTAL.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/util/log/logcrash/crash_reporting.go
line 154 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
a few calls to what? This
ReportNonfatalPanic
that you're introducing now only has the one caller above. I would not introduce it, or at least not export it.
Done.
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
it was pretty clear that the "avoid stalls" refers to the "would likely block" - so the stalls are about stopper tasks not terminating.
Gotcha. I restored that part.
what I would find useful would be catching panics in go c.Close() below (inside the other goroutine)
Done.
31680bd
to
2e2fb5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz)
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
it was pretty clear that the "avoid stalls" refers to the "would likely block" - so the stalls are about stopper tasks not terminating.
Gotcha. I restored that part.
what I would find useful would be catching panics in go c.Close() below (inside the other goroutine)
Done.
Last pitch for removing the deferred recover on line 541.
pkg/util/stop/stopper.go
line 580 at r16 (raw file):
// panic recovery which will ensure panic objects go to logs. defer s.recover(ctx) s.Quiesce(ctx)
Quiesce
has its own recover
inside, doesn't it? Which, btw, I'd get rid of, along with getting rid of the defer s.recover
here; I don't think they're worth having. I don't think Quiesce
is likely to ever panic.
Short of getting rid of them, I'd make it such that, when called on this recovery path, Quiesce
doesn't re-panic on panics it recovers internally. Because what we want to see is the original panic - see below.
pkg/util/stop/stopper.go
line 598 at r16 (raw file):
// (If we let the panic below happen too early, the async // errors from Quiesce/Close may never get a chance to get reported.) time.Sleep(2 * time.Second)
I think this sleep makes sense on its own (although I'd make it more like 5ms instead of 2s for the common enough case where every test in a pkg starts panicking; I don't want to wait forever for that run), but I think its effects on what gets printed are the wrong way around. What we really want to see is the original panic (r
). Seeing other panics from the closers is secondary. With this sleep, we're going to see only the closer panic, right?
What I'd do is print r
as soon as we recover it, and then perhaps also pass it into s.recover
to be printed alongside every other panic from closers, and also I wouldn't repanic on closer panics.
8927371
to
923116d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we're getting there soon.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei)
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
Instead of the defered recover above that catches panics from closers in the case when Stopper.Stop() is not called while unwinding from a panic, what I would find useful would be catching panics in go c.Close() below (inside the other goroutine) - so when the closers are called on this panic recovery path
I've re-read this and I believe we need both.
The above defer (for when Stop is not called while unwinding), applies for panics caused in the synchronous path (at the end of the function). We do need to recover from (and log) those.
Meanwhile, we also need the recover/log for the async panics below (as you suggested) for the recover that applies during unwinding. IMHO they are distinct.
Last pitch for removing the deferred recover on line 541.
I've peeled it out of the way where it wasn't needed any more. Let me know what you think.
pkg/util/stop/stopper.go
line 580 at r16 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Quiesce
has its ownrecover
inside, doesn't it? Which, btw, I'd get rid of, along with getting rid of thedefer s.recover
here; I don't think they're worth having. I don't thinkQuiesce
is likely to ever panic.
Short of getting rid of them, I'd make it such that, when called on this recovery path,Quiesce
doesn't re-panic on panics it recovers internally. Because what we want to see is the original panic - see below.
Let's keep the one in Quiesce.
I agree we want to let the original panic appear in logs. But we can achieve this by reporting it earlier (as per your comment below). I've changed that.
Meanwhile I think it's important to let all the panic objects propagate however. If there's a panic under Quiesce/Close, that's non-recoverable even if the code "above" Stop() decides to absorb its own panic.
pkg/util/stop/stopper.go
line 598 at r16 (raw file):
its effects on what gets printed are the wrong way around
Good catch. I added the logging as suggested.
Again I don't think we should avoid the re-panic. They are signalling something bad and we shouldn't paper over them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I would not backport this to any releases.
but :), I'd like to drag @tbg into this and argue about the past. I think the panic recovery that was added in #45983 is a bad idea. If that wasn't there, I would push for this here patch to look differently. So I want to see if you'd be fine with removing it.
No other leaktest on the Internet recovers panics, and generally I don't think any Go testing framework does. Recovering panics in tests is not really a thing because you can only do it, at best, on the test's main goroutine - a panic on any other goroutine will still kill the package run. With our complicated code, even catching panics on the main goroutine doesn't work reliably - because such a recovered panic will frequently cause a secondary panic on a Closer
registered with the Stopper
(e.g. a memory account being closed without all the memory returned to it). Also, recovering panics leads to leaked goroutines, causes us to disable the leaktest
for future tests, and generally just leads to pollution. Plus, putting this in the leaktest is too magic. It took me a good while to figure out why some tests had one behavior and others didn't, and I was very surprised to find this hidden recover.
I say we stop doing it - at least until we get annoyed enough by panics again.
WDYT?
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @knz)
pkg/util/log/logcrash/crash_reporting.go
line 154 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Done.
inline this function into the caller
pkg/util/stop/stopper.go
line 552 at r4 (raw file):
Last pitch for removing the deferred recover on line 541.
I've peeled it out of the way where it wasn't needed any more. Let me know what you think.
I think what you have now is better, thanks.
pkg/util/stop/stopper.go
line 580 at r16 (raw file):
Meanwhile I think it's important to let all the panic objects propagate however. If there's a panic under Quiesce/Close, that's non-recoverable even if the code "above" Stop() decides to absorb its own panic.
Let's separate the Quiesce
from the Closers
.
Quiesce
simply does not panic. There are no callbacks in there. It doesn't need any recovery logic (therecover()
call inside it, and the sleep above). Therecover
call inside it was added by excessive zeal.- For the
Closers
, I don't really agree that they're panics happening when we're already recovering from a panic need to be re-panicked. A panic in aCloser
is not any less recoverable than the panic that's stopping the stopper; if anything, it is more recoverable. I think the panic recovery that we do in the leaktest is a bad idea but, if we like that recovery, then I think we definitely do not want it hindered by a panicking closer.
pkg/util/stop/stopper.go
line 540 at r17 (raw file):
} // In any case, this Stopper is done for once we return from here.
I wouldn't add this comment. The "in any case" will only confuse most readers.
pkg/util/stop/stopper.go
line 558 at r17 (raw file):
// quiesce and closes asynchronously. This hopefully causes a // number of routines to still terminate, which simplifies // subsequent goroutine dumps. It may avoids stalls and helps
double space before "avoids"
pkg/util/stop/stopper.go
line 573 at r17 (raw file):
// below, and cause the go runtime to terminate the process too // early. So we wait a little on the return path. defer time.Sleep(50 * time.Millisecond)
I would not add this sleep; I don't think it's worth it - Quiesce
can't really panic. Any line of code we can remove from around here I think is worth removing.
pkg/util/stop/stopper.go
line 605 at r17 (raw file):
// is catching a panic already encountered in the caller. // NB: Quiesce has its own panic recovery/logging code. No need to
I wouldn't add this NB :P
pkg/util/stop/stopper.go
line 609 at r17 (raw file):
s.Quiesce(ctx) // The following defer handles panics encountered on the synchronous
nit: this comment doesn't add anything (the code is straight-forward to read without it), and has the potential to become stale. Consider removing.
f3d2856
to
f203419
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL
wrt backports: I suggest we exclude the last commit (the one under discussion) from the backports.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei)
pkg/util/log/logcrash/crash_reporting.go
line 154 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
inline this function into the caller
Done.
pkg/util/stop/stopper.go
line 580 at r16 (raw file):
I don't really agree that they're panics happening when we're already recovering from a panic need to be re-panicked.
Now I see what you mean. Since we're already going to propagate a panic object (the original one), we may as well eat up the Closer panics instead of letting them propagate to the go runtime. I'm OK with that.
Done.
Quiesce: The recover call inside it was added by excessive zeal.
I checked the git log and it was there from the beginning. I think there's wisdom in regularity -- it makes the code easier to read and avoids raising the question "why do some methods recover panics and others not".
pkg/util/stop/stopper.go
line 598 at r16 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
its effects on what gets printed are the wrong way around
Good catch. I added the logging as suggested.
Again I don't think we should avoid the re-panic. They are signalling something bad and we shouldn't paper over them.
I checked manually and 50ms or less was too short. With 100ms we're good. This is non-ordinary code anyway.
pkg/util/stop/stopper.go
line 540 at r17 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I wouldn't add this comment. The "in any case" will only confuse most readers.
Done.
pkg/util/stop/stopper.go
line 558 at r17 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
double space before "avoids"
Done.
pkg/util/stop/stopper.go
line 573 at r17 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I would not add this sleep; I don't think it's worth it -
Quiesce
can't really panic. Any line of code we can remove from around here I think is worth removing.
Done.
pkg/util/stop/stopper.go
line 605 at r17 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I wouldn't add this NB :P
Done.
pkg/util/stop/stopper.go
line 609 at r17 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: this comment doesn't add anything (the code is straight-forward to read without it), and has the potential to become stale. Consider removing.
Done.
f203419
to
f2f9870
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but for the record, I think this commit had it right, and all the panic recovery that came to the stopper afterwards was misguided. My opinion is that we should go back to a simpler time - we should stop recovering in leaktest
, and we should figure out something else, out of the regular stopper.Stop()
code, for the cli
tests that cause panics (which I think were the original reason why some recovery was added to the stopper).
You never said where you stand on the leaktest
, and I am curious.
I wish @tbg would also opine, as the one who introduced the leaktest
recovery.
about backports, as a matter of practice, I'd ask why should we backport anything here? The default should be a pretty strong no backport. The first thing you need to tick off on the backport PR says
Patches should only be created for serious issues or test-only changes.
Is that met here? Me complaining that I don't get the output I want when I'm introducing server startup bugs during development does not count.
I don't take that backport checklist too seriously at time (but I also almost never backport anything), but I don't see a clear reason to backport this. If you do, however, I'd approve the first commit; I think the rest are clearly unjustifiable.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @knz)
pkg/util/stop/stopper.go
line 542 at r19 (raw file):
defer func() { if r := recover(); r != nil { s.handlePanic(ctx, r)
this change on its own I think has the problem that unregister(s)
is no longer called when stopper.Stop()
is called while unwinding from a panic. That sounds bad?
This is solved in the later commit, and I think this commit should be squashed anyway as I say elsewhere.
pkg/util/stop/stopper.go
line 540 at r21 (raw file):
the comments on the OnPanic
options say
// When Stop() is invoked during stack unwinding, OnPanic is also invoked, but
// Stop() may not have carried out its duties.
That's no longer true, is it? (I don't know if it was ever true)
The OnPanic
option is not used by anyone; let's take the opportunity to delete it - particularly since it's getting hard to explain when the thing runs and when it doesn't.
pkg/util/stop/stopper.go
line 608 at r21 (raw file):
s.Quiesce(ctx) defer func() {
This can be defer s.recover()
, right? And in that case, I think there's no more use of the handlePanic
method introduced in a prior commit; let's remove it. Consider squashing that commit with this one.
This fixes a regression introduced in fc7bfdf (in 2017!) The `recover()` call from `RecoverAndReportPanic` doesn't work unless the function is called via `defer` directly. It can't be called from another function e.g. a closure with `defer func() { ... myPanicCatcher() ... }`. This commit clarifies this in the API docs for the `logcrash` package and fixes the only call site that was incorrect. Release note (bug fix): Server crashes that occur during startup are now more clearly reported in logs and the standard error output.
Release note: None
Release note: None
f2f9870
to
934012f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this commit had it right
which I think were the original reason why some recovery was added to the stopper
It's not just that - we also wanted to capture all panics happening during server shutdown and ensure they appear in logs. Which, because of the go runtime vs stderr kerfduddle, wasn't happening on its own.
Me complaining that I don't get the output I want when I'm introducing server startup bugs during development does not count.
👍 works for me.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/util/stop/stopper.go
line 542 at r19 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
this change on its own I think has the problem that
unregister(s)
is no longer called whenstopper.Stop()
is called while unwinding from a panic. That sounds bad?
This is solved in the later commit, and I think this commit should be squashed anyway as I say elsewhere.
You're right. I'll squash them together before merge.
pkg/util/stop/stopper.go
line 540 at r21 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
the comments on the
OnPanic
options say// When Stop() is invoked during stack unwinding, OnPanic is also invoked, but
// Stop() may not have carried out its duties.That's no longer true, is it? (I don't know if it was ever true)
TheOnPanic
option is not used by anyone; let's take the opportunity to delete it - particularly since it's getting hard to explain when the thing runs and when it doesn't.
Nice. Let's simplify.
pkg/util/stop/stopper.go
line 608 at r21 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
This can be
defer s.recover()
, right? And in that case, I think there's no more use of thehandlePanic
method introduced in a prior commit; let's remove it. Consider squashing that commit with this one.
Yay, that works.
okay this looks good now (and much simpler) bors r=andreimatei |
Build succeeded: |
I'm fine with removing this. This was added somewhat opportunistically to duct tape around the poor intra-test isolation that we get in Go tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with removing this. This was added somewhat opportunistically to duct tape around the poor intra-test isolation that we get in Go tests.
Cool. Here's what I think of the whole topic: #92627. PTAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
Fixes #91700.
This fixes a regression introduced in
fc7bfdf (in 2017!)
The
recover()
call fromRecoverAndReportPanic
doesn't work unless the function is called viadefer
directly. It can't be called from another function e.g. a closure withdefer func() { ... myPanicCatcher() ... }
.This commit clarifies this in the API docs for the
logcrash
package and fixes the only call site that was incorrect.Release note (bug fix): Server crashes that occur during startup are now more clearly reported in logs and the standard error output.