-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(approx_topk): Reduce memory usage of HyperLogLog in approx_topk. #15559
Conversation
The overall performance is only improved slightly. See https://raintank-corp.slack.com/archives/C029V4SSS9L/p1735567230600989?thread_ts=1734692230.148749&cid=C029V4SSS9L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -585,7 +587,8 @@ func (b *LabelsBuilder) LabelsResult() LabelsResult { | |||
|
|||
// Get all labels at once and sort them | |||
b.buf = b.UnsortedLabels(b.buf) | |||
sort.Sort(b.buf) | |||
// sort.Sort(b.buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line can be removed
@@ -19,7 +19,7 @@ func NewCountMinSketch(w, d uint32) (*CountMinSketch, error) { | |||
Depth: d, | |||
Width: w, | |||
Counters: make2dslice(w, d), | |||
HyperLogLog: hyperloglog.New16(), | |||
HyperLogLog: hyperloglog.New16NoSparse(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should leave a comment.
// Sparse HLL sketches should result in less memory usage for cardinalities of 100k or less but the automatic transition from sparse
// to non-sparse sketches above that cardinality range results in significantly more memory allocs/bytes.
// Until we have a reliable way of estimating the cardinality set in advance, always use non-sparse for faster performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to know that the auto transition in the HLL library is this expensive 👍
What this PR does / why we need it:
The count min sketch data structure backing the new
approx_topk
aggregation uses HyperLogLog (HLL) to track the actual cardinality of the aggregated vector.We were using the sparse version of the HLL. However, that would result in a memory and allocation overhead.
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR