-
Notifications
You must be signed in to change notification settings - Fork 460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
histogram_quantile does not work correctly for higher quantiles (missing sorting of the buckets)? #3706
Comments
Is this the issue that 0d01d62 was intended to fix? |
Thanks @markj-db and @gibbscullen for following up. Mark, you are right lack of sorting is not the problem. The m3 implementation does sort the buckets by
I figured out what the problem is. I was using the following query: The problem is that the As a consequence, the Buckets 0.256 and above seem to have the same count. Here is the same data point but showing numbers with full precision: A a quick and dirty workaround is to get rid of the "noise" by rounding off some of the mantissa to make sum commutative and associative again. Unfortunately AFAIK Prometheus does not have a function for rounding number to certain precision so, as a workaround, I can simply multiply numbers by very large number, round to integer and then divide by the same very large number. I can think of a couple of ways this issue could be "mitigated" in Prometheus/m3. These are just brainstorming ideas, not fully thought through:
|
@gibbscullen I believe this is a pretty serious problem. AFAICT histogram_quantile queries for high percentiles will return wrong result. I'd love to help out but I am not a golang developer and onboarding (getting to the point where I can go through dev cycle: build, test, modify) could take me a significant amount of time. |
@jodzga the example you gave of https://github.com/m3db/m3/blob/master/src/query/executor/state.go#L192 This could well be a source of non-determinism if not handled carefully -- I'm not sure how the query is distributed or how any reduction operations are done. |
@jodzga @markj-db sounds like you may already be working on this together, but one idea would be to change the default here m3/src/cmd/services/m3query/config/config.go Lines 279 to 280 in 2e5700a
prometheus )
|
Function
histogram_quantile
does not work correctly for higher quantiles.The image below shows the edge case, for quantile 1.0 but the same is true for high quantiles, close to 1.0 e.g. 0.9999.
Above, you can see that buckets "0.256" and above have counts 4284. The 1-quantile should be 0.256 but on the top graph you can see that it is calculated to be 8.192.
The source code for the function that computes the quantile in Prometheus:
https://github.com/prometheus/prometheus/blob/main/promql/quantile.go#L73
The same for m3:
m3/src/query/functions/linear/histogram_quantile.go
Line 216 in 1843061
The sort.Search is supposed to return the smallest bucket but it seems that the m3 version does not sort the buckets first. Perhaps m3 version assumes that buckets are sorted somewhere outside the function?
The text was updated successfully, but these errors were encountered: