-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Suppress Tracing context key #1653
Add Suppress Tracing context key #1653
Conversation
4a162f3
to
4f3c671
Compare
4f3c671
to
53ef6f2
Compare
@open-telemetry/specs-approvers Take a look at this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am -1 on this change in the current form, it feels rushed to me (lacks extensibility), without a use case clearly documented
#530 is mostly about stopping tracing, not stopping all instrumentation. This change would make a lot more sense to me if it was about stopping tracing only, but even then a single boolean is a crude design that cannot be extended further. |
I don't understand the need to be able to extend this key in the future. The whole point of having context with opaque keys was to be able to create keys that mean different things. We can make one for tracing and in the future if we want another signal disabled (or all signals) we can just create a new key. |
…ontrib/opentelemetry-specification into suppress-instrumentation
@yurishkuro modified to only suppress tracing by creating non-recording spans and preventing injection. Extract and metrics removed from the wording. |
I want to add that this mechanism is being used by me in few instrumentation libraries and proven to be very handy. It is useful to reduce the visual and performance overhead of collecting non-interesting underlying implementation details of high-level libraries such as:
You can read more about it here |
I probably miss something fundamental here, but why #530 cannot be solved by using Samplers? Especially point 1, of health checks and such. |
+1
My problem is not with the opaque key, but with growing the surface of the API. For comparison, the OpenTracing API had 6 methods in total: inject, extract, start, finish, setTag, addLog. Here we're proposing two new methods to the public API surface to account for a relatively obscure use case. And we're not even making it extensible so that the cognitive overhead of wider API surface could be amortized for more use cases in the future (back to my original #1653 (comment)). |
haha i think i might also be missing something because I can't see how samplers could fix this issue in an automatic way without using context. how does the exporter/spanprocessor signal to the sampler that this is an export span that shouldn't be sampled? |
sorry for the accidental close and reopen while trying to comment
This is not really a new API, simply leveraging an existing mechanism. Even if it is a new API, I don't think we can use 'growing API' as a counterargument because then no new APIs will ever be added.
i'm not sure I agree that the use case is obscure. suppressing tracing is used by at least every exporter that uses an instrumented transmission channel like http. additionally, @blumamir already cited several examples where he found this useful.
I must not be understanding what you mean by extensibility. To me it seems like it would be easy to add another key to the list of predefined keys in the future. If you're asking for this one key to be able to disable multiple signals in a configurable way, I would argue that is more confusing than just having multiple keys. |
In my opinion, the transport spans can be interesting to see in some cases (debugging low-level issues, performance issues, etc), but most of the time they just add noise to the trace and distract the attention from the logical operation. Allowing the user to turn this instrumentation feature on and off via config option can be very handy to support a wide range of users with different needs. Before this mechanism was introduced, we had to "clean" the trace in the backend which is possible but more tedious. |
@@ -183,6 +183,8 @@ When asked to create a Span, the SDK MUST act as if doing the following in order | |||
A non-recording span MAY be implemented using the same mechanism as when a | |||
`Span` is created without an SDK installed or as described in | |||
[wrapping a SpanContext in a Span](api.md#wrapping-a-spancontext-in-a-span). | |||
If the [`SuppressTracing`](./api.md#suppress-tracing) `Context` flag is set, | |||
the newly created `Span` should be a non-recording `Span`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there be a situation where it might be desired to suppress tracing but then re-enable it later in the same trace? For example a chain of 5 middlewares (all instrumented by a common instrumentation) where a user suppresses tracing in middleware 2 and then enable it again in middleware 4. If we generate non-recording spans, the spans from middleware 4 and 5 will not will not be children of the span generated by middleware 2. Not recording any spans when tracing is suppressed instead would allow this case to generate correct traces.
This could also cause issues with context propagation. If tracing is suppressed on a code path that end up making a network call and there is user/instrumentation code that inject trace context into the outgoing call, it'll end up either not injecting context or injecting non-recording span's span ID depending on the implementation of non-recording span.
May be there are other such cases.
If we were to actually suppress tracing, i.e, not record any spans at all instead of non-recording spans, both of these cases would work. Neither solution is perfect but not recording anything would probably result in traces that are "less broken".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there be a situation where it might be desired to suppress tracing but then re-enable it later in the same trace?
If we generate non-recording spans, the spans from middleware 4 and 5 will not will not be children of the span generated by middleware 2. Not recording any spans when tracing is suppressed instead would allow this case to generate correct traces
Sure I think that's a valid usecase. It's definitely not solved by this though for the reasons you stated. It's not easy to solve either because this puts the burden on the caller to not create spans. If an instrumentation doesn't check the context key before calling startSpan
, then it can't be suppressed.
This could also cause issues with context propagation. If tracing is suppressed on a code path that end up making a network call and there is user/instrumentation code that inject trace context into the outgoing call, it'll end up either not injecting context or injecting non-recording span's span ID depending on the implementation of non-recording span.
Yes, it was my intention to suppress injection. For the usecases described, I think this is the desired behavior.
If we were to actually suppress tracing, i.e, not record any spans at all instead of non-recording spans, both of these cases would work. Neither solution is perfect but not recording anything would probably result in traces that are "less broken".
The spec states startSpan
MUST return something implementing the Span interface, so I don't really see any option other than returning a non-recording span. Open to ideas if you have them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if context API could act like a stack in this case. Suppressing instrumentation could store a reference to the active span and undoing it later would re-activate that span again. That'd solve both the cases but it probably bring up more issues and complicates implementation too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context API already works like a stack since context is immutable, you need a new copy for nested scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I think that's a valid usecase. It's definitely not solved by this though for the reasons you stated. It's not easy to solve either because this puts the burden on the caller to not create spans. If an instrumentation doesn't check the context key before calling startSpan, then it can't be suppressed.
True but if we make it part of the spec or semantic conventions then it's very likely they that they will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True but if we make it part of the spec or semantic conventions then it's very likely they that they will.
In my experience this is not true. Instrumentations created by SIGs will, but the community is very unlikely to read the spec that closely.
Interesting issue, this might help solving my long standing issue at work were a lot of spans are created for RPC and HTTP by instrumentation libraries spans while I don't really care about those in some situations, when calling a specific function/action but do care about in all other cases. For example, I am using Datastore which uses RPC for it calls means I am having spans for dns (for the host), RPC connect, my database client span, my database sql builder span, etc. In this case I want to suppress the RPC connect, NET connect, DNS connect spans but only in one case were I doing many calls to Datastore or when I am calling many underlaying REST services in one class/function but still have the behaviour everywhere else. E.g.
|
This seems to be less trivial than I initally thought.
After reading all the feedback on samplers, I think this could indeed also work: We provide, in the SDK, a Sampler EDIT: The other difference with the sampler solution would be that the current PR suggests not to inject at all if sampling is suppressed while with the current sampling decisions we would inject a span context, just with sampled flag set to zero. I created #1663 because I think this is an idea worth thinking about. |
That use case might be better solved by providing samplers access to a readable parent span (I think this already works in some languages with some casting, since the sampler already gets the full parent context) |
If the method to set the flag on the context is in the SDK, then this cannot be leveraged by Instrumentations to suppress underlying, right? (unless instrumentation takes SDK dependency.) |
I would like to avoid instrumentations taking on the SDK as a dependency since this is essentially the primary reason for the strict API/SDK split. If a library author builds opentelemetry into their library they should only have to depend on the API |
To quote #530
Please address alternative solutions to these issues. The problem is so non-obscure and non-rushed that Python and Ruby implementations independently created this solution over a year ago. |
@pauldraper Are you asking me to address alternatives? |
Sorry, @yurishkuro |
Discussed this in the spec SIG today. This PR boils down to 3 use-cases
Prevent Exporter LoopsThis can be solved in the SDK only with no API change and would not require this API to exist. Instrumentations Suppress Lower LevelsThis is a special case of a larger question: how should we handle cases where an instrumentation wants to be the "final" span? This PR suggests that they be suppressed and not traced, but there may be other solutions to this problem. Suppress Noisy SpansIn some ways this is an extension of (2), but was not my original target in this PR and is possibly better solved with other mechanisms like smarter sampling. TakeawayFor now, the specification SIG suggested that JS SIG should move this functionality out of the API into some sort of API extensions package. Instrumentations can rely on this package which our SDK will respect, but third party SDKs will be under no official obligation to respect. I am going to leave this open because I still believe the mechanism has value and I have not yet seen a proposal for solving (2) that is obviously better than this. The linked issue is currently one of our oldest open issues and having multiple languages already having implemented this on the SDK level seems to me to show the value in this mechanism. |
How? Example situation: gRPC module uses the http module. The gRPC module is instrumented. The http module is instrumented. The gRPC instrumentation creates a client span. The http module creates a client span. Client spans are now doubled up which AFAIK is a problem for systems with two-sided spans like Zipkin. To prevent doubled-up client spans, the gRPC instrumentation (which uses only the API) needs a way to tell the http instrumentation (which uses only the API) to not trace. This is a very common situation. Am I missing something? |
Your example isn't an exporter loop, which is where you snipped your comment from. |
This is only one possible solution. Another theoretically (not currently) possible solution is that the SDK user / application owner configures the SDK to suppress any local child spans of spans with type=CLIENT and having an IMHO, having the grpc instrumentation decide whether or not the child spans should be suppressed is wrong. The HTTP child spans are potentially useful, e.g., if you have a specific HTTP-related issue you want to debug. |
Ah, yes. My bad. Regardless, I would like to understand the group's answer to that.
FWIW I agree, I think client/server spans are terrible. Zipkin two-sided spans are terrible. Protocols are inherently nested and Otel seems to ignore that (#526). But, under the current Opentelemetry approach, it seems that double-client spans are Bad, yet there is no good solution to prevent them. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
@dyladan As the related issue will still exists, are you ok with closing this PR? |
I don't mind if we close this issue as long as we eventually solve this use-case. The issue has been around for over a year with no movement |
Oh definitely, let's keep #530 open until we finally solve it for all SIGs in an uniform way. Thanks! |
Fixes #530
As discussed in the specification SIG today, this adds a predefined context key for suppressing instrumentation.