util/tracing: optimize gRPC span creation for Batch and DistSQL #76199
Conversation
First two commits are #76174. Included here because they add a benchmark that demonstrates the benefit.
Force-pushed from 58c1b48 to 94d1492.
Force-pushed from 94d1492 to 8710f9e.
This patch defaults the COCKROACH_REAL_SPANS and COCKROACH_REUSE_TRACING_SPANS env vars to true. This causes tracing spans to be created even for operations that are not explicitly traced (i.e. operations for which a "trace recording" has not been requested). This was already the case in unit tests. Creating the tracing spans is nice because, while a span is alive, it is present in the active spans registry, which can be queried in various ways (e.g. through the crdb_internal.node_inflight_trace_spans virtual table, and through the /debug/tracez console page). I've got big plans for exposing this registry.

I believe this change has almost no measurable performance impact. Some unit test benchmarks:

```
make bench PKG=./pkg/sql/tests BENCHES='BenchmarkKV///rows=1$$'
name                        old time/op  new time/op  delta
KV/Insert/Native/rows=1-32  92.2µs ±11%  88.0µs ± 6%    ~     (p=0.421 n=5+5)
KV/Insert/SQL/rows=1-32      322µs ± 6%   312µs ± 4%    ~     (p=0.151 n=5+5)
KV/Update/Native/rows=1-32   114µs ± 0%   118µs ± 3%  +3.64%  (p=0.016 n=4+5)
KV/Update/SQL/rows=1-32      398µs ± 2%   402µs ± 5%    ~     (p=0.690 n=5+5)
KV/Delete/Native/rows=1-32  94.3µs ± 2%  91.0µs ± 5%    ~     (p=0.056 n=5+5)
KV/Delete/SQL/rows=1-32      266µs ±10%   260µs ±17%    ~     (p=0.690 n=5+5)
KV/Scan/Native/rows=1-32    19.4µs ± 3%  21.0µs ± 5%  +8.01%  (p=0.008 n=5+5)
KV/Scan/SQL/rows=1-32        194µs ± 8%   201µs ± 5%    ~     (p=0.421 n=5+5)
```

And 30 runs of kv95 and kv0 roachtests, with results processed through benchstat:

```
kv95:
name                    old ops/sec  new ops/sec  delta
kv95/enc=false/nodes=3  29.6k ± 4%   29.2k ± 8%   -1.33%  (p=0.018 n=30+30)

kv0:
name                   old ops/sec  new ops/sec  delta
kv0/enc=false/nodes=3  13.6k ± 4%   13.5k ± 3%     ~     (p=0.217 n=30+27)
```

These measurements were taken with cockroachdb#76199 - the last pending performance optimization for the tracing library in a long chain of them.

It's possible that this change causes some roachtests to fail with span use-after-finish assertions; roachtests are configured to crash on these assertions, unlike production. But I don't expect that to happen (at least, not too much), since unit tests have been running like this for a while. I've also kicked off a roachtest run; it's far from done, but so far everything looks good.

Release note: The crdb_internal.node_inflight_trace_spans virtual table will now present traces for all operations ongoing on the respective node. Before, the table reflected only a small percentage of ongoing operations unless tracing was explicitly enabled.
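For readers following along, here is a minimal Go sketch of the registry mechanics the description relies on. All names and shapes are assumptions for illustration, not CockroachDB's actual implementation: each span registers itself at creation and unregisters on Finish, so in-flight operations can be enumerated at any time.

```go
package tracing

import "sync"

// span is a toy span; the real one carries much more state.
type span struct {
	id        uint64
	operation string
}

// activeSpansRegistry tracks every span that has been created but not yet
// finished. Because spans are now created even for untraced operations, a
// walk over this map sees all in-flight operations on the node.
type activeSpansRegistry struct {
	mu    sync.Mutex
	spans map[uint64]*span // keyed by span ID
}

func newRegistry() *activeSpansRegistry {
	return &activeSpansRegistry{spans: make(map[uint64]*span)}
}

// register is called at span creation.
func (r *activeSpansRegistry) register(s *span) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.spans[s.id] = s
}

// unregister is called from the span's Finish().
func (r *activeSpansRegistry) unregister(s *span) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.spans, s.id)
}

// visit enumerates in-flight spans; something like the
// crdb_internal.node_inflight_trace_spans virtual table or the
// /debug/tracez page can be served from such a walk.
func (r *activeSpansRegistry) visit(fn func(*span)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, s := range r.spans {
		fn(s)
	}
}
```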
LGTM, but maybe @nvanbenschoten should also have a look?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
-- commits, line 9 at r5:
imported?
pkg/server/node.go, line 1137 at r6 (raw file):
```
}
ctx, newSpan = tr.StartSpanCtx(ctx, tracing.BatchMethodName,
	tracing.WithRemoteParent(remoteParent),
```
For my understanding, where else are we not able to use the FromTraceInfo variant? It was a little hard for me to trace this out, so do you think we can have a comment over WithRemoteParent that outlines the scenario(s) it's now used in?
Force-pushed from 8710f9e to 56f78fc.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
Previously, aayushshah15 (Aayush Shah) wrote…
imported?
Indeed. Done.
pkg/server/node.go, line 1137 at r6 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
For my understanding, where else are we not able to use the FromTraceInfo variant? It was a little hard for me to trace this out, so do you think we can have a comment over WithRemoteParent that outlines the scenario(s) it's now used in?
I've updated the comment on WithRemoteParent.

We can/should only use WithRemoteParentFromTraceInfo when we have a TraceInfo that already lives on the heap (i.e. one can always create a TraceInfo, but if one does so only to pass it to WithRemoteParentFromTraceInfo, that defeats the purpose, since it'll escape). There are two situations where we have TraceInfos on the heap: in the Batch RPC and in the SetupFlow RPC; the TraceInfo is part of the RPC request proto, and the proto is naturally on the heap.

Besides these two cases, there are a couple of other cases where we want to use WithRemoteParent* and we do not have a TraceInfo. In all these cases, we have a SpanMeta synthesized from gRPC metadata. All these places allocate (see the sketch after this list):
- in the gRPC interceptor that handles tracing for most RPCs (but not for Batch and SetupFlow);
- in the RPC handlers for Batch (i.e. right here) and SetupFlow, in the backwards-compatibility case;
- in the Raft application path, where we marshal the trace info in a very inefficient way. I'll probably do something about this. The respective code is only enabled for verbose tracing, at the moment.
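To make the escape argument concrete, here is a toy sketch of the two option flavors. Names and layouts are assumptions, not the actual tracing package. The key observation: under gc's "direct interface" optimization, an option struct holding a single pointer converts to the SpanOption interface without allocating, while one holding a multi-word SpanMeta value does not.

```go
package tracing

// Toy shapes standing in for tracingpb.TraceInfo and tracing.SpanMeta.
type TraceInfo struct {
	TraceID      uint64
	ParentSpanID uint64
}

type SpanMeta struct {
	traceID uint64
	spanID  uint64
}

type spanOptions struct {
	remoteParentMeta SpanMeta   // set by the SpanMeta flavor
	remoteParentInfo *TraceInfo // set by the TraceInfo flavor
}

// SpanOption is a functional option consumed by span creation.
type SpanOption interface{ apply(*spanOptions) }

type fromTraceInfoOpt struct{ ti *TraceInfo }

func (o fromTraceInfoOpt) apply(opts *spanOptions) { opts.remoteParentInfo = o.ti }

// WithRemoteParentFromTraceInfo stores only a pointer to a proto that the
// RPC machinery already heap-allocated. A single-pointer struct is a
// direct-interface type in gc, so converting it to SpanOption does not
// allocate.
func WithRemoteParentFromTraceInfo(ti *TraceInfo) SpanOption {
	return fromTraceInfoOpt{ti: ti}
}

type fromSpanMetaOpt struct{ sm SpanMeta }

func (o fromSpanMetaOpt) apply(opts *spanOptions) { opts.remoteParentMeta = o.sm }

// WithRemoteParentFromSpanMeta copies the multi-word SpanMeta into the
// option; that value cannot live in the interface word, so the conversion
// heap-allocates a copy. This is the allocation the TraceInfo flavor avoids.
func WithRemoteParentFromSpanMeta(sm SpanMeta) SpanOption {
	return fromSpanMetaOpt{sm: sm}
}
```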
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/server/node.go, line 1137 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote… (the reply quoted above)
👍 thanks
Reviewed 10 of 10 files at r7, 10 of 10 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/util/tracing/span_options.go, line 252 at r8 (raw file):
```
// be added to the StartSpan invocation.
//
// If you're in possession of a TraceInfo instead of a SpanMeta, prefer using
```
If WithRemoteParentFromTraceInfo is now the preferred method of setting a RemoteParent, do you think it's worth renaming this instead to indicate that it should be avoided when possible? Is this one still more common?
pkg/util/tracing/span_options.go, line 277 at r8 (raw file):
```
}

// WithRemoteParentFromTraceInfo is like WithRemoteParent, except the remote
```
nit: the construct usually precedes the apply method in this file. Consider reordering for consistency.
pkg/util/tracing/tracer.go, line 1125 at r7 (raw file):
```
// the moment, but let's not rely on that.
if !opts.Parent.empty() && opts.Parent.i.crdb != nil {
	s.i.crdb.parentSpanID = opts.parentSpanID()
```
I don't understand this code, but will we enter this branch if !opts.RemoteParent.Empty()?
We weren't setting the parent ID on spans with remote parents. This patch fixes that, because it's the right thing to do and also because there were comments implying that we did. The fact that we didn't set it had no major consequences, because importing a remote recording was overwriting the parent ID of the first imported span. The next patch has a test for this.

Release note: None
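A minimal sketch of the behavior change being described, with assumed names (the real span options type is much richer): the parent span ID is derived from whichever parent is present, remote included, instead of only on the local-parent path.

```go
package tracing

// Toy shapes; names are illustrative assumptions.
type spanRef struct{ spanID uint64 }  // reference to a local parent span
type spanMeta struct{ spanID uint64 } // metadata identifying a remote parent

type parentOptions struct {
	parent       *spanRef  // local parent, if any
	remoteParent *spanMeta // remote parent, if any
}

// parentSpanID derives the new span's parent ID from whichever parent is
// set. The fix amounts to consulting the remote parent too, rather than
// populating the field only when the parent is local.
func (o parentOptions) parentSpanID() uint64 {
	switch {
	case o.parent != nil:
		return o.parent.spanID
	case o.remoteParent != nil:
		return o.remoteParent.spanID
	default:
		return 0
	}
}
```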
The WithRemoteParent(SpanMeta) span creation option allocates. This patch adds a flavor that doesn't allocate; it takes in a *tracingpb.TraceInfo, which is the structure passed in some RPC requests (kv and DistSQL). The TraceInfo is already on the heap, so we can take advantage of that. This saves one allocation in these RPCs.

```
name                        old time/op    new time/op    delta
SetupSpanForIncomingRPC-32     442ns ± 0%     308ns ± 1%  -30.24%  (p=0.008 n=5+5)

name                        old alloc/op   new alloc/op   delta
SetupSpanForIncomingRPC-32      145B ± 0%       49B ± 0%  -66.21%  (p=0.008 n=5+5)

name                        old allocs/op  new allocs/op  delta
SetupSpanForIncomingRPC-32      2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.008 n=5+5)
```

Release note: None
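One way to sanity-check this kind of claim is testing.AllocsPerRun. This is a hedged sketch reusing the toy WithRemoteParentFrom* options from the earlier sketch, not the actual benchmark from the patch:

```go
package tracing

import "testing"

var sink SpanOption // prevents the compiler from eliding the conversions

// TestRemoteParentOptionAllocs illustrates the allocation difference
// between the two toy option flavors defined above. Exact counts depend
// on the compiler (gc stores single-pointer structs directly in the
// interface word), so we log rather than assert.
func TestRemoteParentOptionAllocs(t *testing.T) {
	ti := &TraceInfo{TraceID: 1, ParentSpanID: 2} // already on the heap
	fromInfo := testing.AllocsPerRun(1000, func() {
		sink = WithRemoteParentFromTraceInfo(ti) // expected: 0 allocs/op
	})
	sm := SpanMeta{traceID: 1, spanID: 2}
	fromMeta := testing.AllocsPerRun(1000, func() {
		sink = WithRemoteParentFromSpanMeta(sm) // expected: 1 alloc/op
	})
	t.Logf("allocs/op: FromTraceInfo=%v, FromSpanMeta=%v", fromInfo, fromMeta)
}
```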
Force-pushed from 56f78fc to dce4cec.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/util/tracing/span_options.go, line 252 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If WithRemoteParentFromTraceInfo is now the preferred method of setting a RemoteParent, do you think it's worth renaming this instead to indicate that it should be avoided when possible? Is this one still more common?
This one is still more common, counting by number of callers. There are more ways to get your hands on a SpanMeta.

I've renamed it to WithRemoteParentFromSpanMeta, to avoid the appearance of favoritism.

I will try to merge SpanMeta with TraceInfo in the future. I haven't made up my mind exactly; there are minor differences between the two.
pkg/util/tracing/span_options.go, line 277 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: the construct usually precedes the apply method in this file. Consider reordering for consistency.
Done.
pkg/util/tracing/tracer.go, line 1125 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't understand this code, but will we enter this branch if !opts.RemoteParent.Empty()?
This line moves out of the if in the next commit. Fixed this commit now.
Reviewed 1 of 10 files at r9, 14 of 14 files at r10, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
bors r+
Build succeeded.
76403: util/tracing: enable the trace registry beyond tests r=andreimatei a=andreimatei

Co-authored-by: Andrei Matei <[email protected]>