RUM-3470 feat: Head-based sampling for local and (automatic) distributed tracing #1783
let spanContext1 = DDSpanContext(traceID: .init(idHi: 10, idLo: 100), spanID: 200, parentSpanID: .mockAny(), baggageItems: .mockAny())
let spanContext2 = DDSpanContext(traceID: 3, spanID: 4, parentSpanID: .mockAny(), baggageItems: .mockAny())
let spanContext1 = DDSpanContext(traceID: .init(idHi: 10, idLo: 100), spanID: 200, parentSpanID: .mockAny(), baggageItems: .mockAny(), sampleRate: .mockRandom(), isKept: .mockRandom())
let spanContext2 = DDSpanContext(traceID: 3, spanID: 4, parentSpanID: .mockAny(), baggageItems: .mockAny(), sampleRate: .mockRandom(), isKept: .mockRandom())

let httpHeadersWriter = HTTPHeadersWriter(sampler: .mockKeepAll())
let httpHeadersWriter = HTTPHeadersWriter(sampler: Sampler.mockKeepAll())
This surfaces the problem with head-based sampling in manual distributed tracing. Regardless of the `isKept` decision made by the tracer for `spanContext`, distributed trace headers are sampled exclusively using the sampler configured in `*HTTPHeadersWriter`.
I will work on this in "part 2" of this PR.
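A minimal sketch of the mismatch described in the comment above, under stated assumptions: every type here (`FixedSampler`, `ManualSpanContext`, `IndependentHeadersWriter`) and the `"sampled"` header key are hypothetical stand-ins, not the SDK's API. It only illustrates how a writer that consults its own sampler can contradict the decision already stored on the span context.

```swift
// Hypothetical sketch; type names and the header key are illustrative only.

/// A deterministic stand-in for a sampler, so the outcome is reproducible.
struct FixedSampler {
    let decision: Bool
    func sample() -> Bool { decision }
}

/// Stand-in for a span context that already carries the tracer's decision.
struct ManualSpanContext {
    let isKept: Bool
}

/// Mimics the behavior described above: the writer consults only its own sampler.
struct IndependentHeadersWriter {
    let sampler: FixedSampler

    func headers(for context: ManualSpanContext) -> [String: String] {
        // `context.isKept` is never read here, which is the problem being pointed out.
        return ["sampled": sampler.sample() ? "1" : "0"]
    }
}

let droppedSpan = ManualSpanContext(isKept: false)
let writer = IndependentHeadersWriter(sampler: FixedSampler(decision: true))
let writtenHeaders = writer.headers(for: droppedSpan)
// The header reports "1" even though the tracer dropped the span.
```

With head-based sampling, the expectation would be that the written header agrees with `isKept`; how the follow-up PR achieves that is not shown here.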
Datadog Report
Branch report: ✅ 0 Failed, 3004 Passed, 0 Skipped, 12m 19.28s Wall Time
🔻 Code Coverage Decreases vs Default Branch (13)
b552be4 to 39341f0
sampleRate: parentSpanContext?.sampleRate ?? sampler.samplingRate,
isKept: parentSpanContext?.isKept ?? sampler.sample()
💡 This is the main point of the head-based sampling implementation in this PR:
- use the sampling decision from the parent span (if available),
- or determine it based on the provided sampler (which can be the local or the distributed trace sampler).
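The two bullet points above can be sketched in isolation. This is a simplified illustration, assuming hypothetical types (`SamplingSketch`, `RandomSamplerSketch`, `SpanContextSketch`) and a hypothetical `makeContext(parent:sampler:)` helper. Only the two `??` expressions mirror the diff lines quoted above.

```swift
// Hypothetical stand-ins for illustration; not the SDK's actual types.
protocol SamplingSketch {
    /// Sampling rate in percent, 0...100.
    var samplingRate: Float { get }
    /// Makes a fresh keep/drop decision.
    func sample() -> Bool
}

struct RandomSamplerSketch: SamplingSketch {
    let samplingRate: Float

    func sample() -> Bool {
        // Same idea as the `Sampler` shown in the diff: keep roughly `samplingRate` percent.
        return Float.random(in: 0.0..<100.0) < samplingRate
    }
}

struct SpanContextSketch {
    let sampleRate: Float
    let isKept: Bool
}

/// Head-based sampling: decide once at the root, inherit the decision everywhere else.
func makeContext(parent: SpanContextSketch?, sampler: SamplingSketch) -> SpanContextSketch {
    return SpanContextSketch(
        sampleRate: parent?.sampleRate ?? sampler.samplingRate,
        // `??` evaluates its right-hand side lazily, so the sampler is only
        // consulted when there is no parent decision to inherit.
        isKept: parent?.isKept ?? sampler.sample()
    )
}

let root = makeContext(parent: nil, sampler: RandomSamplerSketch(samplingRate: 50))
// The child ignores its own sampler (0% here) and follows the root's decision.
let child = makeContext(parent: root, sampler: RandomSamplerSketch(samplingRate: 0))
```

Because the child copies both values from its parent, every span in the trace ends up with the same `isKept` and the same `sampleRate`.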
…atic distributed tracing
39341f0 to 8a89706
Identifier = "HeadBasedSamplingTests/testSamplingLocalTrace()">
Identifier = "HeadBasedSamplingTests/testSendingDroppedDistributedTraceWithNoParent_throughTracerAPI()">
💡 Some tests were renamed, some "red" ones were enabled (now "green"), and some new "red" tests were added (disabled) to be fixed in the next PR.
/// Returns the trace of the current execution context.
func traceContext() -> TraceContext?
💡 No longer needed. It was actually doubling the effort:
- the active span was obtained once in `modify(request:)` of the Trace handler,
- it was obtained a second time in `intercept(task:)` in `NetworkInstrumentationFeature`,
- it was a no-op in the RUM handler.
Now we read it only once, in the Trace handler, and pass the `parentSpanID` in `TraceContext`.
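A rough sketch of the "read once, pass along" flow described above. All names (`TraceContextSketch`, `TraceHandlerSketch`, `InterceptionSketch`), the `"trace-id"` header key, and the fixed IDs are hypothetical; the real method signatures in the SDK may differ.

```swift
// Hypothetical sketch; these types only mirror the names used in the comment above.
struct TraceContextSketch {
    let traceID: UInt64
    let spanID: UInt64
    /// ID of the span that was active when the request was modified, if any.
    let parentSpanID: UInt64?
}

/// Stand-in for the per-task interception object that outlives `modify`.
final class InterceptionSketch {
    var traceContext: TraceContextSketch?
}

struct TraceHandlerSketch {
    /// Reads the active span ID once and returns the context alongside the injected headers.
    func modify(headers: [String: String], activeSpanID: UInt64?) -> ([String: String], TraceContextSketch) {
        // Fixed IDs keep the sketch deterministic; a real tracer would generate them.
        let context = TraceContextSketch(traceID: 1, spanID: 2, parentSpanID: activeSpanID)
        var headers = headers
        headers["trace-id"] = String(context.traceID)
        return (headers, context)
    }
}

let handler = TraceHandlerSketch()
let (injectedHeaders, traceContext) = handler.modify(headers: [:], activeSpanID: 42)

// The interception stores what the handler already computed; there is no second lookup.
let interception = InterceptionSketch()
interception.traceContext = traceContext
```

Returning the context from `modify` means later interception steps can rely on `parentSpanID` without asking the tracer for the active span again.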
@@ -6,6 +6,8 @@

import Foundation

// TODO: RUM-3470 Add tests to `URLSessionInterceptor`
💡 `URLSessionInterceptor` has no test coverage. I will add tests in the next PR. Here, we're merging against a feature branch.
func testGivenOpenTracing_whenInterceptingRequests_itInjectsTrace() throws {
// Given
💡 All these tests are not necessary because we removed `traceContext()` in `DatadogURLSessionHandler`.
func testTraceContext_whenInterceptionStarts_withActiveSpan_itReturnCurrentSpan() {
// When
💡 Not necessary with the removal of `traceContext()` from `DatadogURLSessionHandler`.
Looks good overall 👍 but I left a blocking point that we can discuss first.
Looks good, nothing blocking.
@@ -58,7 +58,7 @@ public class HTTPHeadersWriter: TracePropagationHeadersWriter {
/// Initializes the headers writer.
///
/// - Parameter sampler: The sampler used for headers injection.
public init(sampler: Sampler) {
public init(sampler: Sampling) {
❗ This is a breaking change
Technically yes, but the `Sampler` type is only defined in `DatadogInternal` without being re-exported from user-facing modules. In practice, the only supported public initializer for `HTTPHeadersWriter` is `init(sampleRate:)`, which is not altered by this change.
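To make the compatibility argument above concrete, here is a miniature of the change. It is a sketch under assumptions: `HeadersWriterSketch` is hypothetical, and the `Sampling` and `Sampler` definitions are simplified re-declarations for illustration, not the ones in `DatadogInternal`.

```swift
// Hypothetical miniature of the change; the real types live in DatadogInternal.
protocol Sampling {
    func sample() -> Bool
}

struct Sampler: Sampling {
    /// Sampling rate in percent, 0...100.
    let samplingRate: Float

    func sample() -> Bool {
        return Float.random(in: 0.0..<100.0) < samplingRate
    }
}

final class HeadersWriterSketch {
    let sampler: Sampling

    /// Internal-style initializer: after the change it accepts any `Sampling` implementation.
    init(sampler: Sampling) {
        self.sampler = sampler
    }

    /// User-facing-style initializer: its signature is the same before and after the change.
    convenience init(sampleRate: Float) {
        self.init(sampler: Sampler(samplingRate: sampleRate))
    }
}

// Call sites that pass a concrete `Sampler` or only a rate keep compiling.
let writerFromSampler = HeadersWriterSketch(sampler: Sampler(samplingRate: 100))
let writerFromRate = HeadersWriterSketch(sampleRate: 100)
```

One kind of call site does need touching: implicit-member syntax against the parameter type. A static helper declared on the concrete struct is no longer found once the parameter is the protocol, which appears consistent with the test diff switching from `.mockKeepAll()` to `Sampler.mockKeepAll()`.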
@@ -21,3 +30,25 @@ public struct Sampler {
return Float.random(in: 0.0..<100.0) < samplingRate
}
}

/// A sampler that determines sampling decisions deterministically (the same each time).
public struct DeterministicSampler: Sampling {
For inspiration, I think we can have multiple samplers, like here: https://github.com/open-telemetry/opentelemetry-swift/blob/4eb75bc8f0d6449bc197637b5c6044b51ab8c4ed/Sources/OpenTelemetrySdk/Trace/Samplers/Samplers.swift#L10
I'm particularly interested in `ParentBasedSampler`.
@ganeshnj I'm not sure what the actual ask is here 🙂, could you clarify? The SDK doesn't expose any sampler implementation to the end-user (interface + impls exist only in `DatadogInternal`). Because end-users only work with the `sampleRate: Float` API today, I don't see a reason for introducing more samplers than we need for SDK internals.
Apologies, it wasn't clear. I meant to say that for our internal workings we can use some existing pattern. I wasn't expecting any action on the comment. It's just for information in case we have tricky cases around the sampling decision.
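For readers unfamiliar with the pattern mentioned in this thread, the parent-based idea can be expressed as a sampler that wraps another sampler. This is a hedged sketch only: `SamplingIdea`, `FixedDecisionSampler` and `ParentBasedSamplerIdea` are hypothetical names, and nothing like this is added by the PR, which keeps the equivalent logic inline when a span context is created.

```swift
// Hypothetical sketch of a parent-based sampler; not part of the SDK or of this PR.
protocol SamplingIdea {
    func sample() -> Bool
}

/// Always returns the decision it was created with (deterministic).
struct FixedDecisionSampler: SamplingIdea {
    let shouldSample: Bool
    func sample() -> Bool { shouldSample }
}

/// Follows the parent's decision when there is one, otherwise asks the root sampler.
struct ParentBasedSamplerIdea: SamplingIdea {
    /// `nil` means "no parent span", so a fresh decision is needed.
    let parentDecision: Bool?
    let root: SamplingIdea

    func sample() -> Bool {
        return parentDecision ?? root.sample()
    }
}

let keepAll = FixedDecisionSampler(shouldSample: true)
let forRootSpan = ParentBasedSamplerIdea(parentDecision: nil, root: keepAll)
let forChildOfDropped = ParentBasedSamplerIdea(parentDecision: false, root: keepAll)
```

Here `forRootSpan.sample()` defers to `keepAll`, while `forChildOfDropped.sample()` returns the parent's `false` regardless of the wrapped sampler.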
✅ Local trace with implicit parent:
This should rather be from the parent-child point of view.
It already is:
^ one parent with two children
Well done!
c452a2d into ncreated/RUM-3470/head-based-sampling
What and why?
📦 This PR implements head-based sampling for the scenarios defined earlier in #1772. It also adds tests for one more class of scenarios: using a distributed trace created manually with the `OTFormatWriter` API.
See all scenarios for head-based sampling vs their support level:
- `URLSessionInstrumentation`:
- `URLSessionInstrumentation` with implicit parent:
- (`*HTTPHeadersWriter`):
- (`*HTTPHeadersWriter`) with implicit parent:
The last 2 scenarios are defined as failing (and disabled) tests. They will be supported and enabled in the next PR.
How?
I made a few changes that enable this feature.
The sampling decision is now a part of `DDSpanContext`
In OT, creating a child span requires passing the `OTSpanContext` of the parent. This makes it a good place for propagating the sampling decision for head-based sampling. For that reason, `sampleRate` and `isKept` are added to `DDSpanContext`.
Note: While `isKept` is crucial for sampling the whole trace equally, the `sampleRate` is not. We need to propagate it so it can later be sent in `SpanEvent` for computing BE-side metrics.
The `OTFormatReader` implementations can now extract the sampling decision
Because the `OTTracer` can encode and decode the `OTSpanContext`:
the implementations of HTTP header readers (DD, W3C and B3) were extended to extract the `isKept` and `sampleRate`.
Note: The `sampleRate` is not encoded as a header, so it needs to be supplied by the `tracer` in `extract(reader:)`.
`NetworkInstrumentationFeature` no longer decodes `TraceContext` from request headers
I enhanced the implementation of passing `TraceContext` between task interception methods. It now avoids encoding it through request headers, which enables passing extra information not available in standard headers (the `sampleRate`).
Before:
Now:
Note: This simplifies and enhances task interception in the main flow. However, it required additional changes in `URLSessionInterceptor`, which is required for the Alamofire Extension.
Review checklist
Custom CI job configuration (optional)
tools/