metrics: users should have control over histogram granularity for connection latencies #96000

dhartunian · 2023-01-26T15:24:41Z

Issues like #95833 have made it clear that any static histogram bucket list is likely to produce confusing outcomes. On the other hand, an extremely large set of buckets creates performance problems for customers who ingest histograms via Prometheus.

One possible implementation is to keep separate histograms for Prometheus output, and more granular hdrhistogram-based ones for computing internal percentiles. The latter should remain quite accurate, while the former could be more coarse to enable easier Grafana use by customers.
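To make the proposal concrete, here is a minimal sketch of the dual-histogram idea, not the actual CockroachDB implementation. Every name in it (`bucketHist`, `dualLatency`, `expBounds`) is hypothetical, the bucket boundaries are made-up values in milliseconds, and a plain fine-grained exponential bucket list stands in for the hdrhistogram-based histogram mentioned above.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucketHist is a fixed-boundary histogram (hypothetical type). counts[i]
// holds observations v with bounds[i-1] < v <= bounds[i]; the final slot
// is the overflow bucket for values above the last boundary.
type bucketHist struct {
	bounds []float64
	counts []uint64
	total  uint64
}

func newBucketHist(bounds []float64) *bucketHist {
	return &bucketHist{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

func (h *bucketHist) record(v float64) {
	// SearchFloat64s returns the first index i with bounds[i] >= v,
	// or len(bounds) when v exceeds every boundary.
	i := sort.SearchFloat64s(h.bounds, v)
	h.counts[i]++
	h.total++
}

// quantile returns the upper bound of the bucket that contains the q-th
// quantile, so its error is limited by the width of that bucket.
func (h *bucketHist) quantile(q float64) float64 {
	if h.total == 0 {
		return math.NaN()
	}
	rank := uint64(math.Ceil(q * float64(h.total)))
	if rank == 0 {
		rank = 1
	}
	var seen uint64
	for i, c := range h.counts {
		seen += c
		if seen >= rank && i < len(h.bounds) {
			return h.bounds[i]
		}
	}
	return math.Inf(1) // quantile fell into the overflow bucket
}

// expBounds returns n boundaries growing geometrically from min to max.
func expBounds(min, max float64, n int) []float64 {
	out := make([]float64, n)
	factor := math.Pow(max/min, 1/float64(n-1))
	b := min
	for i := range out {
		out[i] = b
		b *= factor
	}
	return out
}

// dualLatency records each observation twice: into a coarse histogram
// intended for export and a fine one intended for internal percentiles.
type dualLatency struct {
	exported *bucketHist // few buckets: cheap to scrape and store
	internal *bucketHist // many buckets: stand-in for an hdrhistogram
}

func newDualLatency() *dualLatency {
	return &dualLatency{
		// Assumed boundaries in milliseconds, chosen only for illustration.
		exported: newBucketHist([]float64{1, 10, 100, 1000, 10000}),
		// 401 boundaries over the same range, roughly 2.3% apart.
		internal: newBucketHist(expBounds(1, 10000, 401)),
	}
}

func (d *dualLatency) record(ms float64) {
	d.exported.record(ms)
	d.internal.record(ms)
}

func main() {
	d := newDualLatency()
	for _, ms := range []float64{3, 12, 12, 14, 250} {
		d.record(ms)
	}
	fmt.Println("exported buckets:", len(d.exported.bounds), "p50 upper bound:", d.exported.quantile(0.5))
	fmt.Println("internal buckets:", len(d.internal.bounds), "p50 upper bound:", d.internal.quantile(0.5))
}
```

The trade-off in this sketch is that every observation is recorded twice, in exchange for a small exported series count and internal percentiles whose error is bounded by the much narrower fine buckets.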

This problem is not limited to connection latencies, but it is most easily visible in that metric.

Jira issue: CRDB-23890

dhartunian added the C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception) and T-observability-inf labels Jan 26, 2023

blathers-crl bot added the A-observability-inf label Jan 26, 2023

exalate-issue-sync bot added T-observability and removed T-observability-inf labels Mar 21, 2024
