Exemplars support for Prometheus Histogram #2812
Comments
@jonatan-ivanov I saw that you already pushed #2813 some time ago. Are you planning to propose a PR for this as well? Is there anything blocking you? Would you mind if I gave it a shot (similar to your implementation for Counter)? |
@neiser I am (was? 😃), but we are always happy to see contributions so if you want to work on it, please feel free to do so. |
@jonatan-ivanov Alright, then let me see if I can provide a PR in the coming days. |
@jonatan-ivanov I've looked into the code and figured that I'd need to start at Anyway, my question now is: should I attempt a refactoring that would make getting exemplars into histogram buckets easier? Or should I just hack a lot of stuff around the "prometheus" part of Micrometer to achieve whatever resolving this issue takes? |
Hello, I did some tests to integrate exemplars into histograms. The README is in French, but you can take a look at: github / micrometer-prometheus-examplar. If you want to check it, I'll be happy to try to find an implementation with you. It seems to work from a Spring Boot application (by defining a PrometheusMeterRegistry bean to override the default actuator behaviour), but I still have a little problem.... |
@neiser If my memory serves me well, I wanted to extend Micrometer "abstracts away backends" in the sense that a
Some In case of Prometheus, there are multiple backends that support the Prometheus format, and Prometheus itself supports multiple formats ("Prometheus Text" and OpenMetrics). VictoriaMetrics also supports the Prometheus format, but it defines its own histogram flavor, which is also reused in OpenTSDB. :) So if you use an XyzMeterRegistry, that does not mean you need to use the Xyz backend. It rather means that you need to use a backend that supports the Xyz format (e.g.: with the Or in the other way: if you use Xyz as a backend you don't need to use XyzMeterRegistry but you can use a different one if that backend supports multiple standards (e.g.: if you use DataDog, you can use the |
@fscellos @jonatan-ivanov Thank you for your answers. This helped me a bit. But I can't promise that I can make progress before next week. So if you find time over the weekend to work on this, please let me know so that we don't do the work twice! |
Hello. As the target is to apply trace sampling, I will take a look to see whether it is possible to include the TraceFlags information (cf. https://www.w3.org/TR/trace-context/#sampled-flag) to decide whether a specific trace is eligible to be set as an exemplar in a bucket (or counter). If you have client-side sampling (which can be done with instrumentation sampling on the client side) at, say, 1%, we can't use the other 99% of the traces as exemplars because we won't be able to find them in the storage backend (i.e., in a Prometheus instance, if you click on such an exemplar, you won't find its trace in backend storage). So we must only keep traces that have the sampled flag set to "01" (the others, with TraceFlags set to "00", won't be stored). |
Note that the previous remarks also apply to Counter. |
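To illustrate the sampled-flag check described above, here is a minimal sketch in plain Java. It assumes the trace context arrives as a W3C `traceparent` header value of the form `version-traceid-parentid-flags`, where the flags field is two hex digits and the lowest bit is the sampled flag. The class name `TraceParentFlags` is hypothetical and is not part of Micrometer or the Prometheus Java client.

```java
// Hypothetical helper, not part of Micrometer or the Prometheus client.
final class TraceParentFlags {
    private TraceParentFlags() {}

    /**
     * Returns true if the trace-flags field of a W3C traceparent value
     * has the sampled bit (0x01) set. Malformed input yields false.
     */
    static boolean isSampled(String traceparent) {
        if (traceparent == null) {
            return false;
        }
        // Expected layout: version-traceid-parentid-flags
        String[] parts = traceparent.trim().split("-");
        if (parts.length < 4 || parts[3].length() != 2) {
            return false;
        }
        try {
            int flags = Integer.parseInt(parts[3], 16);
            return (flags & 0x01) != 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Flags "01": eligible to be used as an exemplar.
        System.out.println(isSampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"));
        // Flags "00": the trace will not be stored, so it should be skipped.
        System.out.println(isSampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00"));
    }
}
```

Only traces for which this check passes would be worth attaching to a bucket or counter, since the others never reach the trace storage backend.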
Hello. That's cool. But I still don't understand how you can manage sampling on traces, as my PR on the Prometheus Java client hasn't been merged yet (prometheus/client_java#766) |
@fscellos Haven't we discussed this here: #3019 (comment)? If the I tested this using Spring Cloud Sleuth ( |
@fscellos @jonatan-ivanov I've patched a demo Spring Boot app such that the actuator endpoint includes exemplars. As long as spring-projects/spring-boot#30472 isn't merged, you need to explicitly pass the exemplar sampler into the Prometheus registry, assuming the app is instrumented with the OpenTelemetry agent (see Dockerfile for details). Note that when testing all this locally with curl, you need to use an Accept header, for example Note that the demo uses Micrometer 1.9.0-SNAPSHOT; I hope the release will happen soon 🙂 Thanks y'all for finishing this feature. Exemplars are really something I've always wished I had when debugging some nasty cloud bugs. |
@neiser Please take a look at the description of the PR that belongs to this issue: #3091 In the samples (see the description of the PR) you will find a demo spring boot app (with the sampler bean), here's the boot PR to auto-configure it: spring-projects/spring-boot#30472
|
@jonatan-ivanov Thanks, I was aware of that I agree that spring-projects/spring-boot#30472 is the correct solution. However, as shown in my little Spring Boot demo (see previous comment), you don't need to wait: exemplars can already be reported now if you're using OpenTelemetry agent instrumentation rather than Spring Sleuth. Again, thanks for pushing this feature! |
Hello @neiser. My stack is as follows: some microservices (Java, Node.js, Go, PHP), some of them instrumented through the OpenTelemetry Operator (I'm on k8s; this operator just injects the OTel agent into the target pods). In my configuration, I specify that trace sampling is set to 25% with the "parentbased_traceidratio" strategy.
For OpenTelemetry (I don't know how it works for Sleuth), sampling cannot work correctly if this "trace flags" information is not taken into account. That's why I submitted this PR (prometheus/client_java#766) to the Prometheus Java client (the important point in this PR is the addition of a "spanContextSupplier.isSampled()" check in DefaultExemplarSampler, which takes these trace flags into account). So until this PR for the Prometheus client is merged and integrated, yes, you will generate a trace_id in the exported metrics, but when you want to explore it from a dashboard, in Grafana for example (with a link to Tempo or Jaeger), sometimes you'll retrieve a trace (when the sampling decision sets the trace flags to 01) and sometimes you'll get "trace not found" (when the sampling decision sets the trace flags to 00). If you set sampling to 100%, you won't see this problem, but in a production environment it is impossible to have such a sampling strategy because you would post too many traces to your trace storage backend (the ratio depends on how many requests you have, but 1% sampling seems to be a realistic value). If my explanation is not clear, I encourage you to apply a low sampling rate and check your trace storage system (or Grafana dashboard) to see for yourself. |
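The behaviour argued for above (only emit exemplar labels when the current span is sampled) can be sketched as follows. This is an illustration under stated assumptions, not the code from prometheus/client_java#766: `SampledExemplarGate`, `SpanInfo`, and `SimpleSpan` are hypothetical stand-ins defined here so the example is self-contained, and the label names `trace_id` and `span_id` are assumed.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; these types are NOT the Prometheus client API.
final class SampledExemplarGate {
    private SampledExemplarGate() {}

    /** Hypothetical view of the current span, including its sampling decision. */
    interface SpanInfo {
        String traceId();
        String spanId();
        boolean sampled();
    }

    /** Simple immutable SpanInfo used for demonstration. */
    static final class SimpleSpan implements SpanInfo {
        private final String traceId;
        private final String spanId;
        private final boolean sampled;

        SimpleSpan(String traceId, String spanId, boolean sampled) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.sampled = sampled;
        }

        @Override public String traceId() { return traceId; }
        @Override public String spanId() { return spanId; }
        @Override public boolean sampled() { return sampled; }
    }

    /**
     * Returns exemplar labels only for sampled spans with both ids present.
     * Unsampled spans yield Optional.empty(), so no exemplar is attached
     * and dashboards never link to a trace that was not stored.
     */
    static Optional<Map<String, String>> exemplarLabels(SpanInfo span) {
        if (span == null || !span.sampled()
                || span.traceId() == null || span.spanId() == null) {
            return Optional.empty();
        }
        return Optional.of(Map.of("trace_id", span.traceId(), "span_id", span.spanId()));
    }

    public static void main(String[] args) {
        SpanInfo kept = new SimpleSpan("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true);
        SpanInfo dropped = new SimpleSpan("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", false);
        System.out.println(exemplarLabels(kept).isPresent());
        System.out.println(exemplarLabels(dropped).isPresent());
    }
}
```

With a gate like this in front of the exemplar sampler, a low sampling ratio simply means fewer exemplars, rather than exemplars that lead to "trace not found".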
@fscellos Thanks for your detailed answer. I've applied a little patch here which should respect the fact that not all traces are sampled and thus not all are suitable for exemplars. |
@neiser : yes it should work like this. |
This issue belongs to #2672 (Support for Exemplars), see the details there.