Inconsistent results between prometheusexporter and prometheusremotewrite #4975
Comments
@albertteoh Could you make this work? Or is there any workaround?
Hi @ankitnayan, my workaround was to filter out any latencies > 24 hours, which isn't nice, but it does the job for my use case at least.
@albertteoh I have a similar issue, but using Prometheus. spanMetricsProcessor is creating such a bucket
I guess the spanMetricsProcessor code needs to deal with Go number conversion when using float64. Please have a look at these lines: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/spanmetricsprocessor/processor.go#L140 I've simulated it here, and the behavior was the same. So, when Prometheus reads that "number", it behaves this way. I don't know if this doc can help.
@luistilingue I upgraded to
@ankitnayan I was using 0.40.0 and I upgraded my whole stack (otel collector to 0.46.0, prometheus to 2.33.5, and javaagent to 1.11.1), but the issue still occurs.
Hello all, I believe this is caused by a bug we found in Prometheus that causes the We faced an issue whereby New Relic dropped our datapoints because of this. The issue existed with 0.61.0 but became much worse with 0.62.0. We built a custom image updating
Actually, looking at it, it is not the same bug, but the same issue: The
Pinging code owners: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Describe the bug
When using prometheusremotewrite to export metrics to M3, I'm getting latencies that are over 200 years when queried from M3.
However, when scraping these metrics from prometheus, the latencies look correct.
Am I configuring something incorrectly?
Steps to reproduce
What did you expect to see?
Identical 95th percentile latencies, or at least close enough to one another.
What did you see instead?
Latencies from M3 were over 200 years, whereas from Prometheus, they were a more sensible ~200ms.
Here are two screenshots of the same query executed against Prometheus and M3 data sources respectively:
Prometheus
M3
To reduce the search space by ruling out M3 and spanmetrics processor as possible causes, I also checked the logs (these are from an earlier run):
Here, I log the total latency_count as well as the latency_bucket counts within the spanmetrics processor. I've taken logs at two different times, 10 seconds apart, and as you can see, the count is consistent with the sum of bucket_counts:
However, this is the log output from the last metrics pipeline in the config below, i.e.:
As you can see, the total count is 1 but the bucket count total is 2 + 1 = 3, and so I believe the +Inf tries to account for this discrepancy, resulting in -2 represented as the uint64 equivalent of 18446744073709551614. I have also seen versions in logs where the total count > sum of bucket counts, leading to a "positive" spillover +Inf count.
What version did you use?
Version: opentelemetry-collector-contrib@master
What config did you use?
Config: (e.g. the yaml config file)
Environment
OS: MacOS
Compiler (if manually compiled): go 1.16
Additional context
cc @bogdandrutu
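The +Inf arithmetic described under "What did you see instead?" can be reproduced in isolation. The following is a minimal sketch, not the exporter's actual code: it assumes the +Inf bucket is derived as the total count minus the sum of the explicit bucket counts, using unsigned 64-bit integers. The function name infBucketCount is hypothetical and exists only for illustration.

```go
package main

import "fmt"

// infBucketCount is a hypothetical helper (not taken from the collector).
// It assumes the +Inf bucket is computed as total minus the sum of the
// explicit bucket counts, all as uint64.
func infBucketCount(total uint64, buckets []uint64) uint64 {
	var sum uint64
	for _, b := range buckets {
		sum += b
	}
	// Unsigned subtraction wraps modulo 2^64 when sum > total,
	// so a "negative" result shows up as a very large count.
	return total - sum
}

func main() {
	// Total count of 1 with explicit bucket counts 2 and 1, as in the logs.
	fmt.Println(infBucketCount(1, []uint64{2, 1}))
}
```

With a total of 1 and bucket counts summing to 3, the result wraps to 18446744073709551614, which matches the value seen in the logs. When the total exceeds the bucket sum, the same subtraction produces the "positive" spillover instead.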