-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Resource to ShouldSample. #1658
Add Resource to ShouldSample. #1658
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@Oberon00 let's split this into 2 PRs to allow progress:
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This (written by @jkwatson ) was the general feeling during a SIG call a few weeks ago (hence the comment), namely: do not add stuff to the API at all, unless absolutely necessary. @bogdandrutu would you mind answering the latest's answers on this topic? |
I agree that adding stuff to the API (the SDK's plugin API in this case) should be minimized generally. But one goal behind this is IMHO keeping the implementation flexible, allowing later changes, by not exposing too many internals. Crucially, this contradicts the fact that the readable span ("SpanData") has the Resource as a public property, which would allow the assumption that different Spans provided to the same SpanProcessor (exporter) can have different resources.
I think the last one doesn't matter much (though we seem to support it for spanprocessors/exporters). But the first and second one are a problem. |
@Oberon00 I think exposing the Resource in the SpanData was a mistake that hurts a lot the performance having to do the mapping for OTLP. So I would not repeat that mistake, also the reasoning was because SpanProcessor can be shared across multiple TraceProviders which in practice (never seen this case) could have been avoided and share exporter if needed. So I think as explained before unless we prove with clear real code example that this does not work we should not do it. And by real code I need to see the project that tries to do something with resource and cannot because of this problem. |
Also the example and the motivation that an object is shared between multiple instances is not a good reason to add unnecessary APIs that will limit performance, see #1658 (review) |
@anuraaga please give me a real example when sampling by resource attributes is useful in practice. I've seen extremely limited number of Applications with more than one Provider (personally 0, but let's say they exists) and usually in all these cases I expect that the attributes that are different in the Resource to be the "service" (or at least that was the motivation before to support multiple Providers) and if that is the case you can simply just ignore the resource attributes and set different Samplers per service instance of the Provider. |
I agree with that actually. I think exporters and span processors and samplers should only ever used with a single TracerProvider (span processor). There is also #508 open which would change how resources are passed (probably we need to close that one as its too late). But now we have the Resource in the SpanData and I feel we already have reached consensus that #1298 is worthwhile to implement. If we pass the resource to the sampler, we can just start passing the new resource with the new attributes. @anuraaga What do you think of adding an OnResourceChanged callback once we implement #1298? Would that improve usability? |
I think if the SDK provided a dependency injection mechanism like this (presumably the callback would be called on the first resource too so in practice a way to inject Resource) then that solves the usability issue too. It seems like a bigger change and I'd be worried such a change would get blocked for some other reason, but it seems like the only think to try. |
@Oberon00 if we were to do a breaking change (since this one is) I would suggest to better investigate the path where all of these components are associated with only one Resource and maybe have a hook when the "Resource" changes to support late Resource resolution :) This is my thought, I would better invest my time in doing something that we know is more performant and can improve things than adding more "simple" but not optimal changes. |
We have already "broken" this interface for 1.7 because we added the InstrumentationLibrary, which is why we are discussing this PR now. |
Could we resolve this PR with an additional statement that language SIGs MAY provide optimized Sampler interfaces that bind certain arguments such as Resource and InstrumentationLibrary when they are known to be constant? That way we allow this PR in but @bogdandrutu's reservations are met, and we don't block #1298 (because I wrote "when they are known to be constant", any optimized API the language SIGs create will have to contend with 1298 when the time comes). |
How could that optimized interface look like? |
There is already language in the SDK specification that offers SDKs a lot of freedom in how TraceID and SpanIDs are generated, I'm referring to "When asked to create a Span, the SDK MUST act as if doing the following in order:" Step 2 is "Query the I imagine the specification just saying what the act of calling In Go, |
I think @bogdandrutu has switched arguments over the course of the review: Originally, the suggestion was to add a more complex family of interfaces in order to gain some performance (#1658 (comment)). Now, the main issue with this PR seems to be that we should keep the API minimal and not add unneeded things (#1658 (comment)). |
I believe the underlying problem is that we do not have a data model for Resources. |
Hmm, we do have a data model for the resources themselves IMHO, but the current solution of how they are connected to the actual telemetry is unsatisfying. E.g. you pass it to a tracer provider and separately to a meter? |
I apologize, I over-stated that. I meant to say the data model we have is not strong enough, but that's a topic we should continue discussing in #1298. The resource SDK spec says:
We haven't made any requirements about the "entity" described there, and it means everyone has a different opinion about what this PR means.
Yes. Also, as with #1298, the resource data model is not strong enough to accommodate Prometheus and OpenMetrics (#1298 (comment)). In #1588 you wrote:
Let me suggest that the problem is you're trying to use a Sampler API to suppress Spans. When DynaTrace originally introduced the Named Tracer and instrumentation library concept, it was stated that an SDK may decide to suppress spans based on their library. I.e., it's the I've made statements about how treating Sampler as a means of filtering spans is causing confusion which is an obstacle for probability sampling here: #1844 (comment) If your goal is to suppress spans based on instrumentation library, that should happen before the Sampler is called. If your goal is to suppress spans based on resource, you can likewise do that before the Sampler is called. If your goal is to setup sampling rules that are applied on a per-instrumentation library basis, we should be specifying a View Configuration mechanism for spans that will configure a per-library Sampler (for example). Based on this above reasoning, I'm on the fence about whether InstrumentationLibrary belongs in the sampler API at all. The reason InstrumentationLibrary is different than Resource, and why it was easier to accept, is that InstrumentationLibrary is at least a variable across the "entity producing telemetry", whereas Resource is not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm removing my approval, now that I've understood the motivation better. If the goal is to suppress spans based on Resource (or Instrumentation Library), better to do that before the Sampler is called.
That's not really possible, the sampler is the only hook that is exposed for that purpose. Replacing the TracerProvider with a custom implementation is usually not practical (one reason being that Resource APIs are only defined in the "official" SDK). |
At least for me the goal is to change sample rate based on service name, or k8s pod name, or hostname, type of resource attributes. Often this is something like seeing metrics report failure on one service or host and wanting to crank up the sampling rate for that service temporarily. This works when sampling configuration is managed centrally, for example with X-ray remote sampling, or even if it just reads a file from etcd. So attributes in the resource seem useful to have available when configuring sampling. I guess instrumentation library may be more geared to suppression, not sure but not resource. |
@jmacd I have to agree with @Oberon00 here: where else can we suppress spans (and how) if not by calling Sampler and returning |
All these attributes you know them from outside (in case of a remote config) so you can decide which config to send to each pod/service instead of having a configuration for each attribute in the same file. Let's get |
This seems like a nice design, especially for an OTel configuration service API. AFAIK current remote sampling APIs can't accept a Resource so this approach doesn't seem to support them well. And I think for medium size teams with a dozen or so microservices on on-prem hosts, a single file pushed through Ansible or such seems reasonable - I've done similar for experiment configs before, not specifically sampling. I hope OTel doesn't just assume these are bad use cases. A span has many attributes, some of them happen to be on the span, others on the Resource. It seems inconsistent to only provide one of these when sampling though. |
I will close this PR if/when #1942 is merged. |
Closing due to merge of #1942. |
We've made an argument that performance suffers when every Sampler invocation considers the resource or instrumentation library. It sounds like you're asking for a standard mechanism. In the Go SDK, what you're asking for is two new interfaces:
Since all of the machinery implied by these two new interfaces is part of setting up and configuring an SDK, it falls outside the scope of OTel's current specification. You could build these and use them, and have what you want. Should OpenTelemetry offer these interfaces? Users want configurable tracer SDKs, so I think the answer is yes. For your simple application of a file containing resource and library parameters for sampling configuration, you can: the |
Fixes #1588
Changes