
Make use of a per-call memory allocator for loading cached chunks #4074

Merged: 11 commits merged into main from 56quarters/slab-caching-bucket on Feb 1, 2023

Conversation

@56quarters (Contributor) commented on Jan 25, 2023:

What this PR does

Inject a slab pool into the context used by caching BucketReader implementations that allows cache clients to reuse memory for results.

Signed-off-by: Nick Pillitteri [email protected]
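
To illustrate the approach, here is a minimal sketch under assumed names (slabPool, WithSlabPool and SlabPoolFromContext are placeholders standing in for Mimir's pool.SafeSlabPool[byte] and whatever context helpers the PR actually uses): a per-call pool travels in the request context so downstream cache fetches can allocate result buffers from it and release them together.

package cacheutil

import "context"

// slabPool is a hypothetical stand-in for a per-call allocator such as
// pool.SafeSlabPool[byte]: it hands out byte slices and releases them all at once.
type slabPool struct{ slabs [][]byte }

func (p *slabPool) Get(n int) []byte {
    buf := make([]byte, n) // a real slab pool would sub-slice larger shared slabs
    p.slabs = append(p.slabs, buf)
    return buf
}

func (p *slabPool) Release() { p.slabs = nil }

type poolContextKey struct{}

// WithSlabPool injects a pool into the context for downstream cache fetches.
func WithSlabPool(ctx context.Context, p *slabPool) context.Context {
    return context.WithValue(ctx, poolContextKey{}, p)
}

// SlabPoolFromContext returns the injected pool, or nil if none was set.
func SlabPoolFromContext(ctx context.Context) *slabPool {
    p, _ := ctx.Value(poolContextKey{}).(*slabPool)
    return p
}

Under this scheme the caching bucket would wrap the request context with WithSlabPool before fetching, and the cache client would draw result buffers from SlabPoolFromContext instead of allocating fresh slices for every key.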

Which issue(s) this PR fixes or relates to

See #3772
See #3968

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@56quarters (Contributor, Author) commented:

Initial testing looks promising (the change is being tested in zone b in the following screenshots).

Reduced allocations:
[Screenshot: Grafana Explore, cortex-dev-01-dev-us-central-0]

Marginally less CPU used:
[Screenshot: Grafana Explore, cortex-dev-01-dev-us-central-0]

Request latency is unchanged:
[Screenshot: Grafana Explore, cortex-dev-01-dev-us-central-0]

Profiling results:

  • Share of CPU usage from GC goes from 8% (zone c, control) to 6% (zone b, this change)
  • Share of bytes allocated by the Memcached client goes from about 32% (zone c, control) to 4% (zone b, this change)

@56quarters 56quarters force-pushed the 56quarters/slab-caching-bucket branch 2 times, most recently from 4602368 to f14724e Compare January 25, 2023 23:04
@56quarters (Contributor, Author) commented:

I haven't included any tests for this change because the mock cache backend in combination with a mock allocator would have been more code than the change itself. I can add some if people feel strongly about it.

@56quarters 56quarters marked this pull request as ready for review January 25, 2023 23:45
@56quarters 56quarters requested a review from a team as a code owner January 25, 2023 23:45
@pracucci pracucci self-requested a review January 27, 2023 15:59
@pracucci (Collaborator) left a comment:


Great work, I love it!

@@ -221,7 +244,13 @@ func (cb *CachingBucket) Get(ctx context.Context, name string) (io.ReadCloser, e
contentKey := cachingKeyContent(name)
existsKey := cachingKeyExists(name)

hits := cfg.cache.Fetch(ctx, []string{contentKey, existsKey})
var opts []cache.Option
Collaborator review comment on this diff:

[nit] This code is duplicated below. You may consider moving it to a function getCacheOptions(ctx).
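
For illustration, one possible shape for that helper, with stand-in types (cacheOption, slabPool, slabPoolFromContext and withAllocator below are placeholders, not the real cache.Option machinery used by the PR):

package cachingbucket

import "context"

// cacheOption and slabPool stand in for cache.Option and pool.SafeSlabPool[byte].
type cacheOption interface{}
type slabPool struct{}

type poolContextKey struct{}

// slabPoolFromContext is a hypothetical lookup for the per-call pool.
func slabPoolFromContext(ctx context.Context) *slabPool {
    p, _ := ctx.Value(poolContextKey{}).(*slabPool)
    return p
}

// withAllocator is a placeholder for however the cache client accepts a custom allocator.
func withAllocator(p *slabPool) cacheOption { return p }

// getCacheOptions centralizes the per-call option construction that would
// otherwise be duplicated at every cache.Fetch call site.
func getCacheOptions(ctx context.Context) []cacheOption {
    var opts []cacheOption
    if slabs := slabPoolFromContext(ctx); slabs != nil {
        opts = append(opts, withAllocator(slabs))
    }
    return opts
}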

@@ -1717,12 +1719,13 @@ func (b *bucketBlock) readIndexRange(ctx context.Context, off, length int64) ([]
return buf.Bytes(), nil
}

func (b *bucketBlock) readChunkRange(ctx context.Context, seq int, off, length int64, chunkRanges byteRanges) (*[]byte, error) {
func (b *bucketBlock) readChunkRange(ctx context.Context, seq int, off, length int64, chunkRanges byteRanges, chunkSlabs *pool.SafeSlabPool[byte]) (*[]byte, error) {
Collaborator review comment on this diff:

[nit] I would name the new param chunksPool instead of chunkSlabs to keep consistency with the rest of the code. Really a nit!

@56quarters 56quarters force-pushed the 56quarters/slab-caching-bucket branch from 5ee98f3 to f212028 Compare January 27, 2023 17:02
@dimitarvdimitrov (Contributor) commented:

Code looks good 👍

I was testing some queries with sparse series (queries that touch series that aren't stored one after the other in the index or the chunk files, e.g. selecting every third series) in the same cluster as your tests. During that time zone-b had a noticeably higher heap than zone-a and zone-c. Do you have an idea what might be causing this? Should it be a concern?

[Screenshot: heap usage comparison across zones]

@56quarters 56quarters marked this pull request as draft January 27, 2023 20:01
@56quarters (Contributor, Author) commented:

Based on further testing I'm having doubts about this change. I'm going to add some additional instrumentation and see if I can tell what's causing the unexpected heap usage.

@pracucci (Collaborator) left a comment:

Good job! I left a minor comment.

return &getReader{
c: cfg.cache,
ctx: ctx,
r: reader,
buf: new(bytes.Buffer),
slabs: slabs,
Collaborator review comment on this code:

I don't think we need to pass the slabs at all here. Reason is that getReader is used on cache miss, so there will be no memory from the pool. I think in this case we can just release the pool once we exit Get().

Contributor (Author) replied:

Ah, good point.

Inject a slab pool into the context used by caching BucketReader
implementations that allows cache clients to reuse memory for results.

Signed-off-by: Nick Pillitteri <[email protected]>
@56quarters 56quarters force-pushed the 56quarters/slab-caching-bucket branch from 3ba9b0d to 0e05715 Compare February 1, 2023 14:41
@56quarters 56quarters marked this pull request as ready for review February 1, 2023 18:25
@56quarters (Contributor, Author) commented:

For posterity, after the change to create and free the pool.SafeSlabPool following the lifecycle of the io.ReadCloser returned by CachingBucket methods, the heap and RSS look much more reasonable for this change.
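
A minimal sketch of that lifecycle, using placeholder types rather than the PR's real ones: the pool is created per Get() call and released only when the caller closes the returned reader, while on a cache miss (where no result bytes come from the pool) it can be released immediately.

package cachingbucket

import (
    "bytes"
    "io"
)

// slabPool stands in for pool.SafeSlabPool[byte].
type slabPool struct{ slabs [][]byte }

func (p *slabPool) Release() { p.slabs = nil }

// pooledReadCloser frees the per-call pool when the caller closes the reader,
// so pool-backed cache results stay valid for exactly as long as they are read.
type pooledReadCloser struct {
    io.Reader
    slabs *slabPool
}

func (r *pooledReadCloser) Close() error {
    if r.slabs != nil {
        r.slabs.Release()
    }
    return nil
}

// newPooledReader wraps bytes fetched into pool-backed buffers; only the cache
// hit path needs this wrapper, since a miss holds nothing from the pool.
func newPooledReader(data []byte, slabs *slabPool) io.ReadCloser {
    return &pooledReadCloser{Reader: bytes.NewReader(data), slabs: slabs}
}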

In the following screenshot: zone a = another experiment, zone b = this change, zone c = control.

[Screenshot: Mimir / Reads resources dashboard, Grafana]

This change results in a higher working set, which is odd, but it seems to be explained by the caching the kernel is doing (note the larger buff/cache figure for zone-b below).

/ # hostname -s && free -m
store-gateway-zone-b-0
              total        used        free      shared  buff/cache   available
Mem:          32112        6046       13715           4       12352       25737
Swap:             0           0           0
/ # hostname -s && free -m
store-gateway-zone-c-0
              total        used        free      shared  buff/cache   available
Mem:          32112        6932       15048           4       10133       24849
Swap:             0           0           0

@56quarters 56quarters merged commit c6a4d93 into main Feb 1, 2023
@56quarters 56quarters deleted the 56quarters/slab-caching-bucket branch February 1, 2023 20:03
56quarters added a commit that referenced this pull request Feb 1, 2023
There's no value to changing the default and it's possible to introduce subtle
performance problems by not using the default.

See #4074 (comment)

Signed-off-by: Nick Pillitteri <[email protected]>
pracucci added a commit that referenced this pull request Feb 2, 2023
Deprecate `blocks-storage.bucket-store.chunks-cache.subrange-size` flag (#4135)

* Deprecate `blocks-storage.bucket-store.chunks-cache.subrange-size` flag

There's no value to changing the default and it's possible to introduce subtle
performance problems by not using the default.

See #4074 (comment)

Signed-off-by: Nick Pillitteri <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Mauro Stettler <[email protected]>

---------

Signed-off-by: Nick Pillitteri <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Mauro Stettler <[email protected]>