-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: introduce distributed epoch-level tracing #10315
Conversation
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has totally checked 3611 files.
Valid | Invalid | Ignored | Fixed |
---|---|---|---|
1608 | 1 | 2002 | 0 |
Click to see the invalid file list
- src/meta/src/barrier/trace.rs
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
@@ -93,7 +93,7 @@ message Barrier { | |||
ResumeMutation resume = 8; | |||
} | |||
// Used for tracing. | |||
bytes span = 2; | |||
map<string, string> tracing_context = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never persist this message so it's okay.
// Drop the span as the inner executor has finished processing the barrier (then all | ||
// data from the previous epoch). | ||
let _ = std::mem::replace(&mut span, Span::none()); | ||
|
||
yield message; | ||
|
||
// Create a new span after we're called again. Now we're in a new epoch and the | ||
// parent of the span is updated. | ||
span = new_span(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ordering here could be really subtle. I find the current implementation to be the most understandable.
let (subtask, executor) = subtask::wrap(executor, actor_context.id); | ||
subtasks.push(subtask); | ||
executor.boxed() | ||
// TODO(bugen): subtask does not work with tracing spans. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm considering removing this optimization or introduce fragment-level implementation back, as we're fearless to No-Shuffle exchange anymore.
// TODO(bugen): better service name | ||
// https://github.com/jaegertracing/jaeger-ui/issues/336 | ||
format!("{}-{}", settings.name, id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically for web services, the nodes with the same service type (for HA purposes) should have the same name, so they'll not be distinguishable in Jaeger UI. However I find it not that fit our usages so I add the unique ID here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM!
By the way, I am wondering how much overhead (i.e. performance loss) it takes to run tracing? It would be nice if it can be enabled all the time.
@@ -281,9 +292,67 @@ pub fn init_risingwave_logger(settings: LoggerSettings, registry: prometheus::Re | |||
}); | |||
}; | |||
|
|||
let filter = filter::Targets::new().with_target("aws_smithy_client::retry", Level::DEBUG); | |||
// Tracing layer | |||
if let Ok(endpoint) = std::env::var("RW_TRACING_ENDPOINT") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May print a log line to show tracing has been enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we cannot get it printed before the logger is initialized. 😂 I put a println
here.
@@ -367,7 +368,7 @@ impl<S: StateStoreIterItemStream> MonitoredStateStoreIter<S> { | |||
} | |||
|
|||
fn into_stream(self) -> impl StateStoreIterItemStream { | |||
Self::into_stream_inner(self) | |||
Self::into_stream_inner(self).instrument(tracing::trace_span!("store_iter")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that each time of storage iteration will be shown as a span in the Jaegar's graph? If so, it sounds like there will a large amount of data, and it will actually be row-level instead of "epoch-level".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that each time of storage iteration will be shown as a span in the Jaegar's graph? If so, it sounds like there will a large amount of data,
Yes but only if the tracing level is set to TRACE
. Currently, we follow the logging level filter so the levels should be DEBUG
or INFO
by default in production.
it will actually be row-level instead of "epoch-level".
By "epoch-level" I mean the spans are traced in the granularity of "epoch". That is, operations in different epochs are records under different root spans, then different pages in Jaeger.
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Agree. My idea is also to enable it in production. Let's get it tested after the cloud infrastructure supports collecting these tracing events. |
Signed-off-by: Bugen Zhao <[email protected]>
cc. @arkbriar kube-bench as well |
LGTM. |
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
cc. @huangjw806 |
Sounds great! Several questions:
cc. @Nebulazhang Please take a look at the OTLP-compatible services and see if there are production ready solutions. |
Please help create an issue under the risingwavelabs/risingwave-operator repo about the tracing support. Thanks! |
I've investigated Grafana Tempo these days and found it to be a seamless experience since we already use Grafana and Loki in the cloud. We can further integrate them together and navigate between traces and logs or metrics with ease. |
Signed-off-by: Bugen Zhao <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #10315 +/- ##
==========================================
- Coverage 70.48% 70.46% -0.02%
==========================================
Files 1258 1260 +2
Lines 214803 214944 +141
==========================================
+ Hits 151403 151464 +61
- Misses 63400 63480 +80
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 5 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
This reverts commit 1705f2a.
Signed-off-by: Bugen Zhao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Signed-off-by: Bugen Zhao <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Introduce distributed epoch-level tracing across the meta node and all compute nodes in a cluster, which could be helpful for developers to investigate barrier latency. (#10000)
Preview:
How to read this timeline
x
) is about to inject, ends when the next barrier (with previous epochx
) is fully collected and committed. This includes the whole lifetime of the epochx
in the system.|
in the span) indicating that the next barrier is injected. Therefore, the time from this symbol to the end of the span will be the barrier latency of the next barrier.How to enable distributed tracing
Tracing
withrisedev configure
and adduse: jaeger
to the RiseDev profile. You will see the Jaeger UI URL after launching.RW_TRACING_ENDPOINT
env to its OTLP gRPC server.How does it work
TracingContext
is introduced in this PR.Barrier
proto and other related request bodies.How to add more spans or events here
target: "rw_tracing"
to anevent!
to only show it in Jaeger without outputting it into the log.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Click here for Documentation
Types of user-facing changes
Please keep the types that apply to your changes, and remove the others.
Release note