kvserver: include Raft application on leaseholder in request trace #72738
Conversation
Nice, thanks! When I run a single-node cluster I get a populated span (I added in the
However, when running a 3-node cluster I only get the span operation itself, and none of the contained events, regardless of which node I'm connected to (including leaseholder):
I'll very occasionally get another event here, but usually not. Regardless, this still shows how long was spent on replication, which is great. It'd be nice if we could add in the actual Raft replication trace at some point too (including network propagation to followers and fsync latency).
Force-pushed from c1986ab to d996767
However, when running a 3-node cluster I only get the span operation itself, and none of the contained events, regardless of which node I'm connected to (including leaseholder):
(FWIW, in case that was surprising, the node you're connected to is not supposed to matter.)
I think you've discovered that Raft application is frequently asynchronous with respect to the client acknowledgement (i.e. the end of the BatchRequest RPC). My commit message deserved to call that out, and now it does.
So, in these cases, command application races with the RPC finish, which collects the trace. That's why you might see fewer or more messages, depending on timing.
I've taken the opportunity to include a log message hinting at async application which I think would have helped. It required some plumbing; PTAL at the new prepended commit.
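The race described above can be sketched with a toy example. The span type below is illustrative only, not CockroachDB's actual tracing package: the point is that when application is asynchronous, the trace collected at RPC finish can miss the application events.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// traceSpan is a toy stand-in for a tracing span; the names here are
// hypothetical, not CockroachDB's tracing API.
type traceSpan struct {
	mu     sync.Mutex
	events []string
}

func (s *traceSpan) record(ev string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.events = append(s.events, ev)
}

func (s *traceSpan) collect() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.events...)
}

// demo returns the events visible when the RPC finishes, and the events
// visible once command application has completed.
func demo() (atRPCFinish, afterApply []string) {
	sp := &traceSpan{}
	applied := make(chan struct{})

	// Raft application runs asynchronously with respect to the client ack.
	go func() {
		time.Sleep(10 * time.Millisecond) // simulated apply latency
		sp.record("applying command")
		close(applied)
	}()

	// The RPC finishes and collects the trace without waiting for
	// application, so the application event can be missing.
	atRPCFinish = sp.collect()

	<-applied
	afterApply = sp.collect()
	return atRPCFinish, afterApply
}

func main() {
	early, late := demo()
	fmt.Printf("at RPC finish: %d events; after apply: %d events\n", len(early), len(late))
}
```

Depending on timing, a real trace may therefore show fewer or more application messages, which matches the behavior observed on the 3-node cluster.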
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @irfansharif, @nvanbenschoten, and @tbg)
Yeah, suspected as much -- thanks for clarifying. The additional trace event is helpful.
Reviewed 6 of 6 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @nvanbenschoten)
Before this patch, CheckedCommand.AckSuccess() was called with a Raft worker context. That wasn't great because each command captures a better context to use - one that derives from the proposal's ctx in the case of local proposals. This patch switches to using that context by exposing the captured context through the Command interface. Taking advantage of the new ctx, we also log a message about early acks, as it seems like a notable hint to see in a trace.

This patch also cleans up most existing uses of that captured context to go through the new interface method; before, various code paths were type-asserting the implementation of the Command and getting the internal context that way. This patch moves the responsibility of deciding what context to use upwards, to callers.

Release note: None
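A minimal sketch of the interface change this commit message describes, with hypothetical names standing in for the real CheckedCommand/apply.Command types:

```go
package main

import (
	"context"
	"fmt"
)

// Command is a hypothetical reduction of the command interface: each
// command now exposes the context it captured at proposal time, so the
// caller decides which context to use. Names are illustrative only.
type Command interface {
	Ctx() context.Context
}

// replicatedCmd models a command whose ctx derives from the proposal's
// ctx in the case of local proposals.
type replicatedCmd struct {
	ctx context.Context
}

func (c *replicatedCmd) Ctx() context.Context { return c.ctx }

// ackSuccess uses the command's captured context rather than the Raft
// worker's context, so the early-ack hint lands in the request's trace.
func ackSuccess(cmd Command) string {
	ctx := cmd.Ctx()
	_ = ctx // a real implementation would log into this ctx's trace
	return "acknowledging command early; application will continue async"
}

func main() {
	cmd := &replicatedCmd{ctx: context.Background()}
	fmt.Println(ackSuccess(cmd))
}
```

The design point is the one the commit message makes: the callee no longer type-asserts its way to an internal context; the interface hands the captured context to whoever needs it.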
Before this patch, the decoding and application of a Raft command was not included in the recording of the request that generated the respective command, even in the case where consensus is synchronous with respect to the request (i.e. non-AsyncConsensus requests). This was because, although we plumb tracing information below Raft, the span under which Raft processing was occurring was Fork()ed from the parent span (i.e. the request evaluation's span). The reasons for that are not good:

1) Forking (as opposed to creating a regular child) was introduced in cockroachdb#39425. I'm not sure whether there was a particular reason for this decision. Perhaps there was fear at the time about the child outliving the parent - although generally it doesn't because, in the case of async consensus, we fork a parent which I think will outlive the child: https://github.com/cockroachdb/cockroach/blame/13669f9c9bd92a4c3b0378a558d7735f122c4e72/pkg/kv/kvserver/replica_raft.go#L157 In the case of sync consensus requests, it's possible for the Raft application span to outlive the evaluation span when the application part (as opposed to the consensus part) can be done async (see CanAckBeforeApplication). Still, regardless of the exact details of the lifetimes, with time it has become clear that it's appropriate to create a child when it's expected to usually not outlive the parent, even if, on occasion, it can.

2) Forked spans used to be included in the parent's recording until cockroachdb#59815. This explains why we regressed here - Raft application used to be included in recordings properly.

This patch makes the application span a regular child, so the Raft application span is included in the request trace.

Touches cockroachdb#70864. That issue asks for a new tracing feature, but I think what motivated it is the application info missing from query traces. This patch is sufficient for that.

Release note (general change): Tracing transaction commits now includes details about replication.
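To illustrate the distinction the commit message draws, here is a toy model of fork vs. child spans (a hypothetical API, not the real pkg/util/tracing): a forked span becomes a local root and does not appear in the parent's recording, while a regular child does.

```go
package main

import "fmt"

// Span is a toy tracer: child spans are registered with the parent and
// show up in its recording; forked ("local root") spans are not. This
// mirrors the behavior described above, not CockroachDB's actual API.
type Span struct {
	op       string
	children []*Span
}

// Child creates a regular child span, included in the parent's recording.
func (p *Span) Child(op string) *Span {
	c := &Span{op: op}
	p.children = append(p.children, c)
	return c
}

// Fork creates a local root: the parent keeps no reference to it, so
// the parent's recording will not include it.
func (p *Span) Fork(op string) *Span {
	return &Span{op: op}
}

// Recording returns the operations of this span and all its children.
func (p *Span) Recording() []string {
	out := []string{p.op}
	for _, c := range p.children {
		out = append(out, c.Recording()...)
	}
	return out
}

func main() {
	req := &Span{op: "request evaluation"}
	req.Fork("raft application (forked)")  // missing from the recording
	req.Child("raft application (child)")  // included in the recording
	fmt.Println(req.Recording())
}
```

With the forked relationship, the Raft application span simply never shows up when the request's recording is collected, which is the regression the patch fixes by switching to a regular child.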
Raft application was forking a span on followers instead of the more usual creation of child spans. This was probably done because the parent span is usually a long-running Raft worker span, and adding children to such long-running spans is generally not a great idea. But these days there are better ways of dealing with long-running tasks - they can have Sterile spans, meaning that the parent declares that it doesn't want any children. And indeed the task in question uses such a Sterile span: https://github.com/cockroachdb/cockroach/blob/2ad2bee257e78970ce2c457ddd6996099ed6727a/pkg/kv/kvserver/scheduler.go#L217

So the forking is not needed, and this patch changes it to a regular child relationship. This has a couple of benefits:

- The parent is not always a long-running task. I believe it can also be a snapshot application, in which case we want the span in question to be a regular child (e.g. because tracing a snapshot application should also include this span). This shows that the decision about what kind of relationship a child should have to its parent is not appropriate for the child to take.
- Less contention on the active spans registry. Forking a span internally creates a "local root" span that needs to be added to the registry directly, instead of being added indirectly through its parent.

Release note: None
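The sterile-span idea can be sketched as follows (again with a hypothetical API, not CockroachDB's tracing package): a sterile parent declines children, so callers can always request a regular child relationship without bloating the long-running worker span's recording.

```go
package main

import "fmt"

// Span models the sterile-span behavior described above: a long-running
// worker span declares it wants no children, and attempted children
// become independent roots instead. Hypothetical API for illustration.
type Span struct {
	op       string
	sterile  bool
	children []*Span
}

// Child attaches a child span, unless the parent is sterile, in which
// case the new span becomes an independent root.
func (p *Span) Child(op string) *Span {
	if p.sterile {
		return &Span{op: op}
	}
	c := &Span{op: op}
	p.children = append(p.children, c)
	return c
}

func main() {
	worker := &Span{op: "raft scheduler worker", sterile: true}
	app := worker.Child("raft application")
	fmt.Println(len(worker.children), app.op)
}
```

This is why the child/parent decision can live with the caller: the long-running parent opts out via sterility, so the child code no longer needs to special-case it by forking.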
Force-pushed from d996767 to f2bc7d9
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
Build succeeded.