Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context propagation to client instrumentation #3825

Closed
wants to merge 22 commits into from

Conversation

jpkrohling
Copy link
Member

@jpkrohling jpkrohling commented Jan 15, 2024

Changes

This spec change adds the notion of Client Propagators, needed for the use-cases described at #3811. It's not clear to me whether this change warrants an OTEP, and I'll gladly create one if needed.

Edit: here's a prototype showing how this can be used in the Go SDK: jpkrohling/opentelemetry-go@c60efdf#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693

Related issues:

Checklist

Signed-off-by: Juraci Paixão Kröhling [email protected]

@jpkrohling jpkrohling requested review from a team January 15, 2024 13:26
@jpkrohling
Copy link
Member Author

cc @cedricziel, @pyohannes

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that's missing for me is the discussion of the mechanics of backwards propagation. In the current wording the ClientPropagator sounds nearly symmetrical to classic forward propagators, but it is very different. It is effectively a one-hop mechanism, since it is not possible to implement a multi-hop "backtrace" solution (e.g. sampling on errors) given the existing Context APIs. I don't believe the intention of this PR is to solve the latter, so I would make the one-hop nature explicit.

Comment on lines 352 to 353
OpenTelemetry uses `Propagators` to serialize and deserialize cross-cutting
concern values such as `Span`s (usually only the `SpanContext` portion) and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OpenTelemetry uses `Propagators` to serialize and deserialize cross-cutting
concern values such as `Span`s (usually only the `SpanContext` portion) and
OpenTelemetry uses `Propagators` to serialize and deserialize payloads for
cross-cutting applications, such as `Span`s (usually only the `SpanContext` portion) and

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness sake this PR should include any changes to the Trace spec as well. The Trace API uses the spanid from the propagated context as the parent of the span created further or create a root span when there is no context propagated. When a Client propagator is configured, what is the expected behavior for what used to be a root span previously?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I had only the "client instrumentation" case in mind when writing this. For the classic microservices case, I'm not sure what use the client propagation would be used for: if A makes a client to B and includes a trace context, does it need to know anything at all about B's context?

Or perhaps B is creating a new trace ID and you'd want it as a span link on A's call? I'm open to ideas here.

cc @yurishkuro

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a valid question how a response received from the server is meant to be handled. For example, it cannot be stored in the context as that new context will immediately go out of scope. So while what's described in the PR now is reasonably abstract and stand-alone, it is not actually useful for the stated purposes without additional clarification.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro, I can clarify that from a client perspective, but I'm interested in seeing if you have suggestions for the server-to-server scenario. Or perhaps this first version should exclusively be about client<-server, and not server<-server?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be open to having this as an enhancement to this on a follow-up proposal? I feel like this needs more discussion, and that it's not entirely related to the goal of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would like to see a prototype. I do not see a working solution in this proposal, only a piece of it, the physical propagation of context upwards, but not how it can be used in the client, in which layers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpkrohling I'm okay with keeping the usage of the response propagator in a separate PR but it's up to the approvers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro, can you confirm you've seen this example? Or did you mean you expect a PoC with a browser-based example?

jpkrohling/opentelemetry-go@c60efdf#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro, I can clarify that from a client perspective

I don't see see any indication or what an SDK or instrumentation should do, or can be configured to do when it encounters receives context encountered backwards. So this seems like something that could be theoretically useful without a practical way to be useful. Without some corresponding spec on what should happen when these are received, its not clear what we gain by standardizing this.

I know this is experimental and would evolve, but would be good to include some sort of hint on how this is expected to evolve. Do we imagine this SDKs or instrumentation or something else to handle these? Presumably the behavior would vary by configured ResponsePropagator on the client side. What might the receiver of W3C tracesponse be expected to do?

@mmanciop
Copy link

mmanciop commented Jan 15, 2024

I personally would find more intuitive something on the lines of PropagatorToCaller, CallerBackwardsPropagator or BackwardsPropagator

specification/configuration/sdk-environment-variables.md Outdated Show resolved Hide resolved
specification/context/api-propagators.md Outdated Show resolved Hide resolved
specification/context/api-propagators.md Outdated Show resolved Hide resolved
specification/context/api-propagators.md Outdated Show resolved Hide resolved
@jpkrohling
Copy link
Member Author

It is effectively a one-hop mechanism

Good call, I think it was implied in one of the paragraphs, but explicit is better than implicit. I'm careful here to not prescribe too much, leaving room for the concrete implementations. I could add one example using Server-Timing, making sure to note that it's not an example that might end up in an implementation (given how things can still move on the W3C TraceContext spec).

@jpkrohling
Copy link
Member Author

Of the suggested names, I think I prefer BackwardsPropagator the most. I'm just not sure whether it should be "backward" or "backwards"; perhaps a native speaker can help us here?

specification/context/api-propagators.md Outdated Show resolved Hide resolved
specification/context/api-propagators.md Outdated Show resolved Hide resolved
specification/configuration/sdk-environment-variables.md Outdated Show resolved Hide resolved
@pyohannes
Copy link
Contributor

It is effectively a one-hop mechanism, since it is not possible to implement a multi-hop "backtrace" solution (e.g. sampling on errors) given the existing Context APIs.

@yurishkuro Would backpropagation im combination with deferred sampling qualify as a multi-hop solution, or do you have something else in mind?

With deferred sampling in place and the client changing its sampling decision based on back-propagated context, sampling decisions can eventually be back-propagated through multiple hops.

Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specification change needs stability markers (and clearly outline new sections from existing stable sections).

Additionally, I'd like to see a prototype implementation somewhere in any SDK language (branch/fork is fine) before approving.

specification/context/api-propagators.md Outdated Show resolved Hide resolved
specification/context/api-propagators.md Outdated Show resolved Hide resolved
Comment on lines 352 to 353
OpenTelemetry uses `Propagators` to serialize and deserialize cross-cutting
concern values such as `Span`s (usually only the `SpanContext` portion) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness sake this PR should include any changes to the Trace spec as well. The Trace API uses the spanid from the propagated context as the parent of the span created further or create a root span when there is no context propagated. When a Client propagator is configured, what is the expected behavior for what used to be a root span previously?

specification/context/api-propagators.md Show resolved Hide resolved
@yurishkuro
Copy link
Member

Of the suggested names, I think I prefer BackwardsPropagator the most.

I would rather go for ResponsePropagator, to emphasize one-hop, not a transitive backwards prop.

@yurishkuro
Copy link
Member

@yurishkuro Would back propagation in combination with deferred sampling qualify as a multi-hop solution, or do you have something else in mind?

@pyohannes yes, in theory, but there is no specification how that would work.

@jpkrohling jpkrohling force-pushed the jpkrohling/issue3811 branch from 7a3e4e9 to d104bb2 Compare January 23, 2024 13:35
@jpkrohling
Copy link
Member Author

@jsuereth:

This specification change needs stability markers (and clearly outline new sections from existing stable sections).

Can you please take another look? I added stability markers to the new sections, but I think it's important to mention the new response propagator elsewhere. I marked the propagator as experimental there in one place but not in the overview. If you think it's not enough, let me know: I think the current state should be clear enough to determine what's experimental and what's stable, while still maintaining the readability.

@jpkrohling
Copy link
Member Author

I would make the one-hop nature explicit

I added a paragraph about this, let me know if you think it's sufficient:

The trace context obtained from response propagators are meant to be consumed
only by the caller and shouldn't be propagated further back.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

specification/configuration/file-configuration.md Outdated Show resolved Hide resolved
specification/context/api-propagators.md Outdated Show resolved Hide resolved
specification/context/api-propagators.md Show resolved Hide resolved
@jpkrohling jpkrohling force-pushed the jpkrohling/issue3811 branch from e631ca7 to 4329fe3 Compare January 31, 2024 14:41
CHANGELOG.md Outdated Show resolved Hide resolved
specification/configuration/sdk-environment-variables.md Outdated Show resolved Hide resolved
@jpkrohling
Copy link
Member Author

I think this is ready for a final review. I added a small PoC using the Go SDK as base, showing how this could work:

jpkrohling/opentelemetry-go@c60efdf#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693

There's one open question, which I think we can handle as part of a follow-up discussion: what should a client do when there's already a trace context (like in a server<-server response). IMO, the incoming traceresponse should only be used to add a span link, but I'm not sure where this should be specified (if at all).

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@jpkrohling
Copy link
Member Author

Status update: I'm totally swamped right now, and asked for the Faro Web SDK folks here at Grafana Labs to help me with the PoC (cc @codecapitano). @johnbley , I shared your offer with them as well, I'd love to see some collaboration there :-)

Since I don't have any new events on my schedule until OTel Community Day, I should be able to clear my queue by the time the PoC gets done.

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@scheler
Copy link
Contributor

scheler commented Apr 29, 2024

@jpkrohling Client Instrumentation SIG would be a good place to take this forward. We discussed this topic in the Client SIG with @johnbley a few weeks ago and we realized a couple things -

  1. That the traceresponse spec supports only span linking across browser span and server span, and not help with a parent-child relationship.
  2. The current document-load instrumentation suggests setting up a parent-child relationship in a single trace by having the server send its span's parent context in a meta tag by name traceparent.

While both have their pros and cons, it would be good to discuss how we want to take this forward.

@jpkrohling
Copy link
Member Author

Based on this latest information, I'm marking this as draft.

@jpkrohling jpkrohling marked this pull request as draft April 30, 2024 08:37
Copy link

github-actions bot commented May 8, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 8, 2024
@jpkrohling jpkrohling removed the Stale label May 8, 2024
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 29, 2024
@codecapitano
Copy link

codecapitano commented May 29, 2024

Small update:
@jpkrohling handed the task over and we (@codecapitano, @cedricziel) will continue the PoC work.

@github-actions github-actions bot removed the Stale label May 30, 2024
Copy link

github-actions bot commented Jun 6, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 6, 2024
@jpkrohling jpkrohling removed the Stale label Jun 7, 2024
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jun 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.