metrics: users should have control over histogram granularity for connection latencies #96000
Labels
A-observability-inf
C-enhancement
T-observability
Issues like #95833 have made clear that any static list of histogram buckets is bound to produce confusing results in some cases. On the other hand, an extremely large set of buckets creates performance problems for customers who ingest histograms via Prometheus.
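To illustrate how a fixed bucket list can mislead, here is a minimal Python sketch. The helper names `bucket_counts` and `estimate_quantile` are hypothetical, and the estimator is a simplified version of the linear interpolation that Prometheus's `histogram_quantile()` performs, not its exact implementation. It does not reproduce #95833. It only shows how samples clustered inside a wide bucket get smeared across the whole bucket.

```python
import bisect


def bucket_counts(samples, bounds):
    """Count samples into buckets with inclusive upper bounds, plus a +Inf bucket."""
    counts = [0] * (len(bounds) + 1)
    for s in samples:
        # bisect_left picks the first bound >= s, i.e. Prometheus-style `le` buckets.
        counts[bisect.bisect_left(bounds, s)] += 1
    return counts


def estimate_quantile(q, bounds, counts):
    """Simplified Prometheus-style quantile estimate from bucket counts.

    Assumes samples are spread uniformly inside each bucket and interpolates
    linearly. The lowest bucket's lower edge is taken to be 0.
    """
    rank = q * sum(counts)
    cum = 0
    for i, c in enumerate(counts):
        if c > 0 and cum + c >= rank:
            if i == len(bounds):
                return bounds[-1]  # +Inf bucket: only "above the last bound" is known
            lower = bounds[i - 1] if i > 0 else 0.0
            return lower + (bounds[i] - lower) * (rank - cum) / c
        cum += c
    return bounds[-1]
```

For example, if all 1,000 samples are exactly 2 ms and the bounds are 1, 10 and 100 ms, the estimated p99 is about 9.91 ms. Adding bounds at 2 and 5 ms brings it down to about 1.99 ms, but every extra bound is one more `_bucket` series that Prometheus has to ingest.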
One possible implementation is to keep two histograms per metric: a coarse, fixed-bucket histogram for Prometheus output, and a more granular HdrHistogram-based one for computing internal percentiles. The latter should remain quite accurate, while the former can stay coarse, keeping Prometheus ingestion cheap and making it easier for customers to use in Grafana.
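A minimal sketch of the two-histogram idea, assuming integer latencies in microseconds. `DualHistogram`, `prometheus_buckets()` and `percentile()` are hypothetical names, not CockroachDB's metrics API. The fine histogram uses a simple log-linear bucketing in the spirit of HdrHistogram, not the real HdrHistogram library.

```python
import bisect
import math


class DualHistogram:
    """Hypothetical sketch: one record() call feeds two histograms.

    * coarse: a short, fixed bucket list meant for Prometheus export.
    * fine: log-linear buckets (HdrHistogram-style, not the real library)
      used only for internal percentile computation.
    """

    def __init__(self, coarse_bounds, precision_bits=7):
        self.coarse_bounds = sorted(coarse_bounds)
        # One count per bound plus a final +Inf bucket (upper bounds inclusive).
        self.coarse_counts = [0] * (len(self.coarse_bounds) + 1)
        self.precision_bits = precision_bits
        self.fine_counts = {}  # (shift, sub) -> count
        self.total = 0

    def _fine_key(self, value):
        # Keep the top (precision_bits + 1) significant bits of the value;
        # values below 2**(precision_bits + 1) are stored exactly.
        shift = max(0, value.bit_length() - 1 - self.precision_bits)
        return shift, value >> shift

    def record(self, value):
        """Record a non-negative integer latency (e.g. in microseconds)."""
        if value < 0:
            raise ValueError("latency must be non-negative")
        self.coarse_counts[bisect.bisect_left(self.coarse_bounds, value)] += 1
        key = self._fine_key(value)
        self.fine_counts[key] = self.fine_counts.get(key, 0) + 1
        self.total += 1

    def prometheus_buckets(self):
        """Cumulative (le, count) pairs, the shape a Prometheus histogram exposes."""
        out, cum = [], 0
        for le, c in zip(self.coarse_bounds + [math.inf], self.coarse_counts):
            cum += c
            out.append((le, cum))
        return out

    def percentile(self, q):
        """Nearest-rank percentile (0 < q <= 1) computed from the fine histogram.

        Returns the upper edge of the matching fine bucket, so the result
        overestimates by a relative error below 2**-precision_bits.
        """
        if self.total == 0:
            raise ValueError("no samples recorded")
        rank = max(1, math.ceil(q * self.total))
        cum = 0
        # Sort fine buckets by their lower edge (sub << shift) to walk them in value order.
        for shift, sub in sorted(self.fine_counts, key=lambda k: k[1] << k[0]):
            cum += self.fine_counts[(shift, sub)]
            if cum >= rank:
                return ((sub + 1) << shift) - 1
        raise AssertionError("unreachable: rank exceeds total")
```

For example, after recording 1,000 samples of 2,000 µs into `DualHistogram([1_000, 10_000, 100_000])`, only four cumulative bucket series are exported, while `percentile(0.5)` returns 2,007 µs, within 2**-7 of the true value. Raising `precision_bits` tightens internal percentiles without adding a single exported series.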
This problem is not limited to connection latencies, but it is most easily visible in that metric.
Jira issue: CRDB-23890