log: fatal log is not written inline in the log file #62768
(Referenced snippet: cockroach/pkg/util/log/clog.go, lines 352 to 355 at commit 8451341.)
It doesn't look like any locks are held during msg seqno generation, so it would make sense that they can go out of order in the actual files. If I interpret your snippet above correctly, 619 (the stack trace) got reordered with 620. (The counter is per-sink, btw.)

I think what's happening here is that in the repro above, rendering the fatal error probably took a long time (they tend to be large!):

```diff
diff --git a/pkg/cli/start.go b/pkg/cli/start.go
index 80718edb50..fa95e24f93 100644
--- a/pkg/cli/start.go
+++ b/pkg/cli/start.go
@@ -59,6 +59,13 @@ import (
 	"google.golang.org/grpc"
 )
+type longTime struct{}
+
+func (longTime) String() string {
+	time.Sleep(5 * time.Second)
+	return "yawn"
+}
+
 
 // startCmd starts a node by initializing the stores and joining
 // the cluster.
 var startCmd = &cobra.Command{
@@ -688,6 +695,8 @@ If problems persist, please see %s.`
 			fmt.Print(msgS.StripMarkers())
 		}
 
+		time.Sleep(5 * time.Second)
+		log.Fatalf(ctx, "exploding at exactly %s but delayed via %s", timeutil.Now(), longTime{})
 		return nil
 	}(); err != nil {
 		errChan <- err
```

and it behaves like what you describe above. If I take out
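To illustrate the mechanism described above, here is a minimal, self-contained Go sketch. It is not CockroachDB code; the names and types are made up for illustration. It shows how an entry can claim its sequence number early and still reach the file late, because no lock is held while its arguments are rendered:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// seq stands in for a per-sink entry counter: it is assigned before the
// message is rendered, and no lock is held during rendering.
var seq uint64

// emit models a sink write: take a sequence number, render the message
// (possibly slowly), then append the finished line under the sink's lock.
func emit(mu *sync.Mutex, file *[]string, msg fmt.Stringer) {
	n := atomic.AddUint64(&seq, 1) // counter taken first
	rendered := msg.String()       // rendering happens outside the lock
	mu.Lock()
	defer mu.Unlock()
	*file = append(*file, fmt.Sprintf("%d %s", n, rendered))
}

// slowMsg models a fatal report that is expensive to render (large stacks).
type slowMsg string

func (s slowMsg) String() string { time.Sleep(200 * time.Millisecond); return string(s) }

// fastMsg models an ordinary log message.
type fastMsg string

func (s fastMsg) String() string { return string(s) }

func main() {
	var mu sync.Mutex
	var file []string
	var wg sync.WaitGroup

	wg.Add(2)
	go func() { defer wg.Done(); emit(&mu, &file, slowMsg("fatal: something exploded")) }()
	time.Sleep(50 * time.Millisecond) // let the slow entry claim its counter first
	go func() { defer wg.Done(); emit(&mu, &file, fastMsg("ordinary message")) }()
	wg.Wait()

	// The file shows counter 2 before counter 1: out of order by position,
	// but the counters still tell you which entry was started first.
	for _, line := range file {
		fmt.Println(line)
	}
}
```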
As Tobias explained, it probably has to do with goroutine scheduling, plus the cost of packaging the fatal error and of sending the crash report to Sentry before the process is terminated.
I don't see evidence this is the case (if you have it, please point me to it).
The serial number indicates the point at which it happens.
As explained above, the fatal message takes a while to package and emit. It's not being forgotten. The precise point can be derived from the sequence number.
This exists because, if there are multiple sinks and the error occurs while writing to one sink, it may only be reported to another sink. We'll want more testing / tutorials to show and tell what this mechanism is about.
Discussed above.
Discussed above.
From the previous exchange, I understand that we should clarify (in user-facing docs, and in the API docs) that logging is now a concurrent package, that no global lock is held, and that the sequence number should therefore be used to understand the order in which messages are emitted. Is there any other remaining work needed?
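To make that guidance concrete, here is a rough sketch of how a reader could reorder a log file by the per-sink counter. It assumes, purely for illustration, that the counter is the first standalone integer following the bracketed tags on each line; the pattern would need adjusting to the actual log format.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"sort"
	"strconv"
)

// counterRE is a hypothetical pattern: it grabs the first standalone
// integer after a closing bracket (the tags), which is where this sketch
// assumes the per-sink entry counter lives.
var counterRE = regexp.MustCompile(`\]\s+(\d+)\s`)

type entry struct {
	seq  int
	line string
}

func main() {
	var entries []entry
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1<<20), 1<<20) // allow long log lines
	for sc.Scan() {
		line := sc.Text()
		m := counterRE.FindStringSubmatch(line)
		if m == nil {
			continue // continuation or unparseable line; ignored in this sketch
		}
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		entries = append(entries, entry{seq: n, line: line})
	}
	// Reorder by counter: this recovers the order in which entries were
	// started, regardless of the order in which they reached the file.
	sort.Slice(entries, func(i, j int) bool { return entries[i].seq < entries[j].seq })
	for _, e := range entries {
		fmt.Println(e.line)
	}
}
```

Something like `go run reorder.go < cockroach.log` (with the sketch saved under whatever name you like) would then print the entries in the order they were started rather than the order in which they landed in the file.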
We actually had a bug there which got fixed in a later version.
I'm seeing that a `log.Fatal` only appears in the log file a second after it actually happens. This seems to have something to do with the log message being somehow redirected to the "dev channel"? I don't quite understand what's going on, but I think it'd be better for the fatal to also appear inline at the point where it happens.
Right now I see this:
I would like to see (a copy of) the fatal message instead of "the server is terminating due to a fatal error (see the DEV channel for details)" (I think this other message should not exist), because the point where the fatal happens is important.
Then, I'm a bit confused about how come I'm seeing other messages after a fatal (particularly .5s worth of them), given that the fatal, I think, calls `os.Exit()`, which I'd expect to be fast. And then I'm further confused about why the final message has a counter of 619; there's probably an explanation, but that counter value doesn't seem to make sense given the position of the "the server is terminating due to a fatal error" message, nor is it the highest counter in the file.
9.log
Jira issue: CRDB-2670