perf(approx_topk): Reduce memory usage of HyperLogLog in approx_topk. #15559

jeschkies · 2024-12-30T09:47:44Z

What this PR does / why we need it:

The count min sketch data structure backing the new approx_topk aggregation uses HyperLogLog (HLL) to track the actual cardinality of the aggregated vector.

We were using the sparse version of the HLL, which resulted in memory and allocation overhead. This PR switches to the non-sparse version.

                               │ before.log  │              after.log              │
                               │   sec/op    │   sec/op     vs base                │
HeapCountMinSketchVectorAdd-16   425.4m ± 2%   357.7m ± 3%  -15.91% (p=0.000 n=10)

                               │  before.log   │              after.log               │
                               │     B/op      │     B/op      vs base                │
HeapCountMinSketchVectorAdd-16   12.098Mi ± 0%   2.627Mi ± 0%  -78.29% (p=0.000 n=10)

                               │ before.log  │             after.log              │
                               │  allocs/op  │  allocs/op   vs base               │
HeapCountMinSketchVectorAdd-16   116.9k ± 0%   108.8k ± 0%  -6.92% (p=0.000 n=10)
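
For readers unfamiliar with the structure named in the description, here is a minimal sketch of a count-min sketch in Go. It is an illustration only: the type, the method names, and the FNV-based double hashing are assumptions made for this example, not the code in pkg/logql/sketch, and the HyperLogLog field that the real struct carries for cardinality tracking is left out.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// countMinSketch is a hypothetical, simplified count-min sketch:
// depth rows, each holding width counters. Width must be > 0.
type countMinSketch struct {
	width, depth uint32
	counters     [][]uint64
}

func newCountMinSketch(width, depth uint32) *countMinSketch {
	counters := make([][]uint64, depth)
	for i := range counters {
		counters[i] = make([]uint64, width)
	}
	return &countMinSketch{width: width, depth: depth, counters: counters}
}

// column picks the counter for key in the given row. It splits one 64-bit
// FNV-1a hash into two halves and combines them (double hashing).
func (s *countMinSketch) column(key string, row uint32) uint32 {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	return (h1 + row*h2) % s.width
}

// Add increments one counter per row for key.
func (s *countMinSketch) Add(key string, count uint64) {
	for row := uint32(0); row < s.depth; row++ {
		s.counters[row][s.column(key, row)] += count
	}
}

// Estimate returns the minimum counter across rows. Collisions can only
// inflate counters, so the result is never below the true count.
func (s *countMinSketch) Estimate(key string) uint64 {
	est := ^uint64(0)
	for row := uint32(0); row < s.depth; row++ {
		if c := s.counters[row][s.column(key, row)]; c < est {
			est = c
		}
	}
	return est
}

func main() {
	s := newCountMinSketch(1024, 4)
	s.Add("app=foo", 3)
	s.Add("app=bar", 1)
	s.Add("app=foo", 2)
	fmt.Println(s.Estimate("app=foo") >= 5) // prints "true"
}
```

The sketch answers "how often did this key occur" in fixed memory but cannot say how many distinct keys it has seen, which is why a separate cardinality estimator such as HLL sits next to it.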

Special notes for your reviewer:

Checklist

Reviewed the CONTRIBUTING.md guide (required)
Documentation added
Tests updated
Title matches the required conventional commits format, see here
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR


jeschkies · 2024-12-30T16:55:27Z

Overall performance improved only slightly. See https://raintank-corp.slack.com/archives/C029V4SSS9L/p1735567230600989?thread_ts=1734692230.148749&cid=C029V4SSS9L

chaudum

LGTM

chaudum · 2025-01-07T16:16:27Z

pkg/logql/log/labels.go

@@ -585,7 +587,8 @@ func (b *LabelsBuilder) LabelsResult() LabelsResult {

 	// Get all labels at once and sort them
 	b.buf = b.UnsortedLabels(b.buf)
-	sort.Sort(b.buf)
+	// sort.Sort(b.buf)


Line can be removed
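
For context on the call being replaced here: one commit in this PR is titled "Avoid allocations by using slices.SortFunc". `sort.Sort` takes a `sort.Interface`, so passing a slice type boxes it into an interface value, which can allocate; the generic `slices.SortFunc` works on the slice directly. A minimal sketch of that pattern follows, using a hypothetical `label` type and helper rather than the actual Loki types:

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// label is a hypothetical stand-in for a name/value label pair.
type label struct{ Name, Value string }

// sortLabels sorts in place by name without going through sort.Interface.
func sortLabels(ls []label) {
	slices.SortFunc(ls, func(a, b label) int {
		return cmp.Compare(a.Name, b.Name)
	})
}

func main() {
	ls := []label{{"job", "loki"}, {"app", "api"}, {"env", "prod"}}
	sortLabels(ls)
	fmt.Println(ls[0].Name, ls[1].Name, ls[2].Name) // prints "app env job"
}
```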

cstyan · 2025-01-09T01:08:45Z

pkg/logql/sketch/cms.go

@@ -19,7 +19,7 @@ func NewCountMinSketch(w, d uint32) (*CountMinSketch, error) {
 		Depth:       d,
 		Width:       w,
 		Counters:    make2dslice(w, d),
-		HyperLogLog: hyperloglog.New16(),
+		HyperLogLog: hyperloglog.New16NoSparse(),


I think we should leave a comment.

// Sparse HLL sketches should result in less memory usage for cardinalities of 100k or less but the automatic transition from sparse
// to non-sparse sketches above that cardinality range results in significantly more memory allocs/bytes.
// Until we have a reliable way of estimating the cardinality set in advance, always use non-sparse for faster performance.

cstyan

good to know that the auto transition in the HLL library is this expensive 👍
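
To make the trade-off in the suggested comment concrete, here is a toy model in Go. It is explicitly not the HLL library's implementation: `toySketch`, the register count, and the conversion threshold are all made up for illustration. It only models the general shape "start with a sparse map, convert to a dense array past a threshold" and compares allocation counts against starting dense.

```go
package main

import (
	"fmt"
	"testing"
)

const (
	registers       = 1 << 14 // toy dense register count
	sparseThreshold = 1 << 10 // toy cutoff for the sparse-to-dense conversion
)

// toySketch is a hypothetical stand-in for an HLL-style sketch.
type toySketch struct {
	sparse map[uint32]uint8
	dense  []uint8
}

func newSparse() *toySketch { return &toySketch{sparse: make(map[uint32]uint8)} }
func newDense() *toySketch  { return &toySketch{dense: make([]uint8, registers)} }

// insert keeps the maximum rank seen per register index.
func (s *toySketch) insert(idx uint32, rank uint8) {
	idx %= registers
	if s.dense != nil {
		if rank > s.dense[idx] {
			s.dense[idx] = rank
		}
		return
	}
	if rank > s.sparse[idx] {
		s.sparse[idx] = rank
	}
	if len(s.sparse) > sparseThreshold {
		// Transition: allocate the dense array and copy every entry over.
		s.dense = make([]uint8, registers)
		for i, r := range s.sparse {
			s.dense[i] = r
		}
		s.sparse = nil
	}
}

// fill inserts n pseudo-random register updates.
func fill(s *toySketch, n uint32) {
	for i := uint32(0); i < n; i++ {
		s.insert(i*2654435761, uint8(i%32)+1)
	}
}

func main() {
	const n = 50_000
	sparseAllocs := testing.AllocsPerRun(10, func() { fill(newSparse(), n) })
	denseAllocs := testing.AllocsPerRun(10, func() { fill(newDense(), n) })
	fmt.Printf("sparse-first: %.0f allocs/run, dense-only: %.0f allocs/run\n",
		sparseAllocs, denseAllocs)
}
```

In this toy, the sparse-first path pays for the growing map and then for the dense array as well, while the dense-only path allocates its registers once. The real numbers for the actual library are the benchmark results in the PR description.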

jeschkies added 7 commits December 24, 2024 11:26

Benchmark full pipeline.

0937d39

Define log generator

c2c4efa

Benchmark

388664d

Note todo

1127c3f

Avoid allocations by using slices.SortFunc

22835ff

Use non sparse HLL to avoid allocations

448aafe

Reduce HLL memory usage

5039b1d

jeschkies requested a review from a team as a code owner December 30, 2024 09:47

pull-request-size bot added the size/M label Dec 30, 2024

jeschkies mentioned this pull request Dec 30, 2024

Karsten/benchmark engine #15558

Closed

6 tasks

jeschkies requested review from chaudum and cstyan December 30, 2024 09:48

jeschkies changed the title ~~(perf) Reduce memory usage of HyperLogLog in approx_topk.~~ (perf): Reduce memory usage of HyperLogLog in approx_topk. Dec 30, 2024

jeschkies changed the title ~~(perf): Reduce memory usage of HyperLogLog in approx_topk.~~ perf(approx_topk): Reduce memory usage of HyperLogLog in approx_topk. Dec 30, 2024

jeschkies added 2 commits December 30, 2024 12:12

Merge remote-tracking branch 'grafana/main' into karsten/benchmark-en…

081ebef

…gine

Revert return

be07020

pull-request-size bot added size/S and removed size/M labels Dec 30, 2024

chaudum approved these changes Jan 7, 2025

cstyan reviewed Jan 9, 2025

cstyan approved these changes Jan 9, 2025

chaudum merged commit bef2043 into main Jan 9, 2025
60 checks passed

chaudum deleted the karsten/benchmark-engine branch January 9, 2025 10:27

loki-gh-app bot mentioned this pull request Jan 13, 2025

chore(k237): release 3.4.0 #15705

Closed

loki-gh-app bot mentioned this pull request Jan 20, 2025

chore(k238): release 3.4.0 #15847

Closed

This was referenced Feb 3, 2025

chore(k240): release 3.4.0 #16074

Closed

chore(k239): release 3.4.0 #16102

Merged

chore(k241): release 3.4.0 #16153

Closed

loki-gh-app bot mentioned this pull request Feb 12, 2025

chore(k239): release 3.4.0 (backport main) #16210

Merged
