Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce distributed epoch-level tracing #10315

Merged
merged 28 commits into from
Jun 19, 2023
Merged

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Jun 13, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Introduce distributed epoch-level tracing across the meta node and all compute nodes in a cluster, which could be helpful for developers to investigate barrier latency. (#10000)

Preview:

image

How to read this timeline

  • Different colors represent different nodes in the cluster.
  • Click on the span to get more structural information.
  • The meta span starts when a barrier (with current epoch x) is about to inject, ends when the next barrier (with previous epoch x) is fully collected and committed. This includes the whole lifetime of the epoch x in the system.
  • There'll be an "info event" (marked as a black | in the span) indicating that the next barrier is injected. Therefore, the time from this symbol to the end of the span will be the barrier latency of the next barrier.
  • For each actor or executor, the span starts when it is first polled after the barrier is yielded, ends just before it yields the next barrier.
  • Some significant storage spans are included in this PR. We can add more spans in the future.

How to enable distributed tracing

  • For local development, enable Tracing with risedev configure and add use: jaeger to the RiseDev profile. You will see the Jaeger UI URL after launching.
  • For manual deployments, start any OTLP-compatible tracing service and set RW_TRACING_ENDPOINT env to its OTLP gRPC server.

How does it work

  • The core idea is to serialize the tracing span to a trace context and propagate it through the wire. A utility named TracingContext is introduced in this PR.
  • For standard RPC calls, a middleware can be introduced and do this automatically in the request headers. (not in this PR)
  • There's no intuitive client-server hierarchy for barrier propagation. So we put the trace context manually in the Barrier proto and other related request bodies.

How to add more spans or events here

  • All existing logs (events) will be reported and recorded in Jaeger, following the same filtering configuration as stdout.
  • Add target: "rw_tracing" to an event! to only show it in Jaeger without outputting it into the log.
  • To add spans in the scope of the execution, follow the documentation here.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR contains user-facing changes.
Click here for Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

BugenZhao added 11 commits June 12, 2023 20:11
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 3611 files.

Valid Invalid Ignored Fixed
1608 1 2002 0
Click to see the invalid file list
  • src/meta/src/barrier/trace.rs

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao BugenZhao marked this pull request as ready for review June 14, 2023 11:37
Signed-off-by: Bugen Zhao <[email protected]>
@@ -93,7 +93,7 @@ message Barrier {
ResumeMutation resume = 8;
}
// Used for tracing.
bytes span = 2;
map<string, string> tracing_context = 2;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never persist this message so it's okay.

Comment on lines +61 to +69
// Drop the span as the inner executor has finished processing the barrier (then all
// data from the previous epoch).
let _ = std::mem::replace(&mut span, Span::none());

yield message;

// Create a new span after we're called again. Now we're in a new epoch and the
// parent of the span is updated.
span = new_span();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ordering here could be really subtle. I find the current implementation to be the most understandable.

let (subtask, executor) = subtask::wrap(executor, actor_context.id);
subtasks.push(subtask);
executor.boxed()
// TODO(bugen): subtask does not work with tracing spans.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering removing this optimization or introduce fragment-level implementation back, as we're fearless to No-Shuffle exchange anymore.

Comment on lines +330 to +332
// TODO(bugen): better service name
// https://github.com/jaegertracing/jaeger-ui/issues/336
format!("{}-{}", settings.name, id),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically for web services, the nodes with the same service type (for HA purposes) should have the same name, so they'll not be distinguishable in Jaeger UI. However I find it not that fit our usages so I add the unique ID here.

Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM!

By the way, I am wondering how much overhead (i.e. performance loss) it takes to run tracing? It would be nice if it can be enabled all the time.

@@ -281,9 +292,67 @@ pub fn init_risingwave_logger(settings: LoggerSettings, registry: prometheus::Re
});
};

let filter = filter::Targets::new().with_target("aws_smithy_client::retry", Level::DEBUG);
// Tracing layer
if let Ok(endpoint) = std::env::var("RW_TRACING_ENDPOINT") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May print a log line to show tracing has been enabled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we cannot get it printed before the logger is initialized. 😂 I put a println here.

@@ -367,7 +368,7 @@ impl<S: StateStoreIterItemStream> MonitoredStateStoreIter<S> {
}

fn into_stream(self) -> impl StateStoreIterItemStream {
Self::into_stream_inner(self)
Self::into_stream_inner(self).instrument(tracing::trace_span!("store_iter"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that each time of storage iteration will be shown as a span in the Jaegar's graph? If so, it sounds like there will a large amount of data, and it will actually be row-level instead of "epoch-level".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that each time of storage iteration will be shown as a span in the Jaegar's graph? If so, it sounds like there will a large amount of data,

Yes but only if the tracing level is set to TRACE. Currently, we follow the logging level filter so the levels should be DEBUG or INFO by default in production.

it will actually be row-level instead of "epoch-level".

By "epoch-level" I mean the spans are traced in the granularity of "epoch". That is, operations in different epochs are records under different root spans, then different pages in Jaeger.

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao
Copy link
Member Author

By the way, I am wondering how much overhead (i.e. performance loss) it takes to run tracing? It would be nice if it can be enabled all the time.

Agree. My idea is also to enable it in production. Let's get it tested after the cloud infrastructure supports collecting these tracing events.

Signed-off-by: Bugen Zhao <[email protected]>
@fuyufjh
Copy link
Member

fuyufjh commented Jun 16, 2023

By the way, I am wondering how much overhead (i.e. performance loss) it takes to run tracing? It would be nice if it can be enabled all the time.

Agree. My idea is also to enable it in production. Let's get it tested after the cloud infrastructure supports collecting these tracing events.

cc. @arkbriar kube-bench as well

@fuyufjh
Copy link
Member

fuyufjh commented Jun 16, 2023

LGTM.

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
@arkbriar
Copy link
Contributor

By the way, I am wondering how much overhead (i.e. performance loss) it takes to run tracing? It would be nice if it can be enabled all the time.

Agree. My idea is also to enable it in production. Let's get it tested after the cloud infrastructure supports collecting these tracing events.

cc. @arkbriar kube-bench as well

cc. @huangjw806

@arkbriar
Copy link
Contributor

By the way, I am wondering how much overhead (i.e. performance loss) it takes to run tracing? It would be nice if it can be enabled all the time.

Agree. My idea is also to enable it in production. Let's get it tested after the cloud infrastructure supports collecting these tracing events.

Sounds great! Several questions:

  • How's the overhead when tracing is enabled?
  • Does the status of the tracing service affect how RisingWave behaves? IIUC, it shouldn't.
  • How much data will be produced every day?

cc. @Nebulazhang Please take a look at the OTLP-compatible services and see if there are production ready solutions.

@arkbriar
Copy link
Contributor

Please help create an issue under the risingwavelabs/risingwave-operator repo about the tracing support. Thanks!

@BugenZhao
Copy link
Member Author

cc. @Nebulazhang Please take a look at the OTLP-compatible services and see if there are production ready solutions.

I've investigated Grafana Tempo these days and found it to be a seamless experience since we already use Grafana and Loki in the cloud. We can further integrate them together and navigate between traces and logs or metrics with ease.

Signed-off-by: Bugen Zhao <[email protected]>
@codecov
Copy link

codecov bot commented Jun 19, 2023

Codecov Report

Merging #10315 (5ed1836) into main (ade16ec) will decrease coverage by 0.02%.
The diff coverage is 41.53%.

@@            Coverage Diff             @@
##             main   #10315      +/-   ##
==========================================
- Coverage   70.48%   70.46%   -0.02%     
==========================================
  Files        1258     1260       +2     
  Lines      214803   214944     +141     
==========================================
+ Hits       151403   151464      +61     
- Misses      63400    63480      +80     
Flag Coverage Δ
rust 70.46% <41.53%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/cmd/src/lib.rs 0.00% <0.00%> (ø)
src/cmd_all/src/bin/risingwave.rs 0.00% <0.00%> (ø)
src/common/src/config.rs 82.12% <ø> (-0.17%) ⬇️
src/common/src/util/mod.rs 0.00% <ø> (ø)
src/compute/src/lib.rs 0.00% <ø> (ø)
src/compute/src/rpc/service/stream_service.rs 0.00% <0.00%> (ø)
src/meta/src/barrier/progress.rs 7.73% <0.00%> (ø)
src/risedevtool/src/service_config.rs 0.00% <ø> (ø)
src/risedevtool/src/task/compactor_service.rs 0.00% <0.00%> (ø)
src/risedevtool/src/task/compute_node_service.rs 0.00% <0.00%> (ø)
... and 20 more

... and 5 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@BugenZhao BugenZhao enabled auto-merge June 19, 2023 07:27
Copy link
Member

@yezizp2012 yezizp2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@BugenZhao BugenZhao added this pull request to the merge queue Jun 19, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 19, 2023
Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao BugenZhao enabled auto-merge June 19, 2023 09:14
@BugenZhao BugenZhao added this pull request to the merge queue Jun 19, 2023
Merged via the queue into main with commit 11768fd Jun 19, 2023
@BugenZhao BugenZhao deleted the bz/epoch-tracing branch June 19, 2023 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants