-
Notifications
You must be signed in to change notification settings - Fork 166
Automatic propagation of peer.service #247
Automatic propagation of peer.service #247
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This OTEP does not discuss the rollout risks. This new field has a very specific semantics that it must be modified or erased before calling a downstream service, neither of which is the default behavior (default is to propagate state unchanged). So if this behavior is introduced into some SDKs, the data in a company for this specific attribute will be a complete mess until every service deploys the updated SDK version.
In general, I am not a fan of using baggage for things that are supposed to change on every hop, it's a very clear abuse of the mechanism.
|
||
Automatic propagation of `peer.service` through `TraceState`. | ||
|
||
## Motivation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation does not read convincing to me. Why does a service need to know who called it? If this is part of a transitive root cause isolation workflow, then you just use distributed traces for that. If this is about some business-specific behavior depending on who called you, e.g. multi-tenant behavior. then I think this mechanism is quite inappropriate - relying on a deployment/infra name of the caller is a pretty narrow use case, not suitable for general purpose multi-tenancy. So please describe Users and Jobs To Be Done of this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see this being helpful in two scenarios in my work:
- Sampling rule where you look at a given event and can see that its caller is X. For example, filtering out a noisy branches in a trace but that caller is something you want to keep.
- Folks who want to get this information eventually via tracing, but can't today, and so if there's an easier way to "add otel" without fully adopting tracing and getting this caller info, that'd be helpful for them.
Related work, cc @bogdandrutu @kalyanaj |
3bf006e
to
c8b500a
Compare
Hey @yurishkuro Added sampling scenarios that may throw light into how useful this feature could be. Please review. |
|
||
### Use scenarios | ||
|
||
Sampling can benefit from knowing the calling service, specifically: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As sampling is mentioned as a primary usage scenario for this feature, I wonder if would make sense to bundle this together with other sampling related values for which there's a proposal to propagate them via trace state: #235
All those values could then be consistently used and populated by samplers, one wouldn't need to invent a new configurable mechanism in SDKs (at least for the client side).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value makes sense to propagate with the sampling-related attributes in the tracestate
, which are now covered in open-telemetry/semantic-conventions#793. Still, I see it as an independent property.
@carlosalberto I think we should try to build in more protection against accidental propagation of the peer service information. Also, I'm afraid "upstream" can lead to confusion--although unless we do something to avoid accidental propagation, it's literal and true-- the upstream service name would be the nearest ancestor context that happened to set it. I want us to consider a mechanism that helps us scope tracestate variables to limit their impact in the future. I'm thinking of an entirely new tracestate vendor code for information (e.g, That said, this seems like an opportunity to allow peers to exchange more than only their service name. If we had a tracestate field for exchange of arbitrary peer-related variables, then the new SDK configuration knobs would be:
Then, a
Receivers would apply these variables to the context and drop them from the tracestate before creating a new context. |
Closing this as there doesn't seem to be enough interest and I don't plan to work on it myself (It can always be "resurrected" by copying the OTEP file and opening it as a new instance, etc) |
Knowing the service name on the other side of a remote call is valuable troubleshooting information. The semantic conventions represent this via
peer.service
, which needs to be manually populated.This information can be effectively derived in the backend using the
Resource
of the parentSpan
, but is otherwise not available at Collector processing time, where it could be used for sampling and transformation purposes.Defining (optional) automated population of
peer.service
will greatly help adoption of this attribute by users and vendors explicitly interested in this scenario.Based on open-telemetry/semantic-conventions#439