Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix overridden Jaeger span duration #195

Merged
merged 4 commits into from
Dec 16, 2022

Conversation

jclee
Copy link
Contributor

@jclee jclee commented Nov 30, 2022

When starting to record a Jaeger span, we do two different clock reads -- we read the calendar clock for the timestamp, and we read from the monotonic clock for the start of the span duration. We also allow callers to override the timestamp, to record a span that started earlier. But this override was only affecting the timestamp, not the start of the span duration, so durations for overridden spans were incorrect.

Ideally, we'd want callers to provide both the monotonic and calendar clock measurements for their span overrides, but as a stopgap we can compute the elapsed calendar time at the time of the override and subtract it from the current monotonic clock time, to estimate the intended duration start time.

(The official Jaeger C++ library has a more complex system, allowing users to specify both clock overrides, or just one or the other -- in which case the missing value is estimated using a similar calculation:

https://github.com/jaegertracing/jaeger-client-cpp/blob/e56b1bdb/src/jaegertracing/Tracer.cpp#L41
https://github.com/opentracing/opentracing-cpp/blob/06b57f48/include/opentracing/util.h#L40-L63

It appears the rust opentelemetry library doesn't have this problem, since it just uses monotonic clocks for everything... which presumably has a different set of problems.)

@jclee jclee requested a review from harrishancock November 30, 2022 20:32
@github-actions
Copy link

github-actions bot commented Nov 30, 2022

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@jclee
Copy link
Contributor Author

jclee commented Nov 30, 2022

I have read the CLA Document and I hereby sign the CLA

@jasnell
Copy link
Member

jasnell commented Dec 3, 2022

Added the needs-internal-pr label to ensure that this doesn't land without the internal PR being ready to go. We can remove the label once the readiness of the internal PR is confirmed.

Copy link
Collaborator

@harrishancock harrishancock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits; thanks for fixing this!

@@ -253,9 +253,10 @@ Trace::Exception::Exception(rpc::Trace::Exception::Reader reader)
name(kj::str(reader.getName())),
message(kj::str(reader.getMessage())) {}

Tracer::Span::Span(kj::Own<Tracer> tracer, kj::Maybe<Jaeger::SpanData> spanData)
Tracer::Span::Span(kj::Own<Tracer> tracer, kj::Maybe<Jaeger::SpanData> spanData,
kj::TimePoint durationStartTime)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we typically use 4-space indentation for line continuations, even in constructors (which looks a bit funny since it aligns with the :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I think I copy-pasted the definition from the header and neglected to tidy up the spacing.

: tracer(kj::mv(tracer)), spanData(kj::mv(spanData)),
durationStartTime(kj::systemPreciseMonotonicClock().now()) {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this doesn't need/have an isPredictableModeForTest()...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wondered about that, too... and probably wrote the original code. :I It's because the measurement is only used for calculating the duration, and the duration calculation itself is covered by isPredictableModeForTest().

I guess it wouldn't hurt much to just use isPredictableModeForTest() everywhere, or define a common function that does that. We already have several places that are defining something like getTimeForJaegerSpan() locally.

Copy link
Collaborator

@harrishancock harrishancock Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, a comment would probably suffice.

Edit: Probably doesn't matter.

@@ -494,7 +494,8 @@ Tracer::Span Tracer::makeSpan(kj::StringPtr operationName,
}
kj::TimePoint durationStartTime = kj::origin<kj::TimePoint>();
if (parentSpanContext != nullptr || overrideParent != nullptr) {
durationStartTime = kj::systemPreciseMonotonicClock().now();
auto delta = kj::systemPreciseCalendarClock().now() - overrideStartTime;
durationStartTime = kj::systemPreciseMonotonicClock().now() - delta;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to have an explanatory comment and/or a TODO to require both start time overrides in the future.

@jclee jclee force-pushed the jlee/fix-overridden-jaeger-span-duration branch from 7f95471 to 51ede46 Compare December 12, 2022 23:43
@jclee jclee force-pushed the jlee/fix-overridden-jaeger-span-duration branch from 51ede46 to 113d1bf Compare December 12, 2022 23:54
@jclee
Copy link
Contributor Author

jclee commented Dec 12, 2022

Fixed indentation, added TODO for duration override, added internal PR, and rebased.

I'm not sure what the overhead of clock reads is, but presumably it would be
good to avoid the overhead when not tracing, which should be the common case.
… time

Fixes incorrect durations seen on overridden spans.
@jclee jclee force-pushed the jlee/fix-overridden-jaeger-span-duration branch from 113d1bf to 750864b Compare December 16, 2022 03:25
@jclee jclee merged commit 644d55b into main Dec 16, 2022
@jclee jclee deleted the jlee/fix-overridden-jaeger-span-duration branch December 16, 2022 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants