
Make use of a per-call memory allocator for loading cached chunks #4074

Merged: 11 commits merged into main from 56quarters/slab-caching-bucket on Feb 1, 2023

Conversation

@56quarters (Contributor) commented on Jan 25, 2023:

What this PR does

Inject a slab pool into the context used by caching BucketReader implementations that allows cache clients to reuse memory for results.

Signed-off-by: Nick Pillitteri [email protected]
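
To illustrate the approach, here is a minimal sketch under assumed names (slabPool, WithSlabPool and SlabPoolFromContext are placeholders standing in for Mimir's pool.SafeSlabPool[byte] and whatever context helpers the PR actually uses): a per-call pool travels in the request context so downstream cache fetches can allocate result buffers from it and release them together.

package cacheutil

import "context"

// slabPool is a hypothetical stand-in for a per-call allocator such as
// pool.SafeSlabPool[byte]: it hands out byte slices and releases them all at once.
type slabPool struct{ slabs [][]byte }

func (p *slabPool) Get(n int) []byte {
    buf := make([]byte, n) // a real slab pool would sub-slice larger shared slabs
    p.slabs = append(p.slabs, buf)
    return buf
}

func (p *slabPool) Release() { p.slabs = nil }

type poolContextKey struct{}

// WithSlabPool injects a pool into the context for downstream cache fetches.
func WithSlabPool(ctx context.Context, p *slabPool) context.Context {
    return context.WithValue(ctx, poolContextKey{}, p)
}

// SlabPoolFromContext returns the injected pool, or nil if none was set.
func SlabPoolFromContext(ctx context.Context) *slabPool {
    p, _ := ctx.Value(poolContextKey{}).(*slabPool)
    return p
}

Under this scheme the caching bucket would wrap the request context with WithSlabPool before fetching, and the cache client would draw result buffers from SlabPoolFromContext instead of allocating fresh slices for every key.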

Which issue(s) this PR fixes or relates to

See #3772
See #3968

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@56quarters (Contributor, Author) commented:

Initial testing looks promising (the change is being tested in zone b in the following screenshots).

Reduced allocations:
[Screenshot: Grafana Explore, cortex-dev-01-dev-us-central-0]

Marginally less CPU used:
[Screenshot: Grafana Explore, cortex-dev-01-dev-us-central-0]

Request latency is unchanged:
[Screenshot: Grafana Explore, cortex-dev-01-dev-us-central-0]

Profiling results:

  • Share of CPU usage from GC goes from 8% (zone c, control) to 6% (zone b, this change)
  • Share of bytes allocated by the Memcached client goes from about 32% (zone c, control) to 4% (zone b, this change)

@56quarters 56quarters force-pushed the 56quarters/slab-caching-bucket branch 2 times, most recently from 4602368 to f14724e Compare January 25, 2023 23:04
@56quarters (Contributor, Author) commented:

I haven't included any tests for this change because the mock cache backend in combination with a mock allocator would have been more code than the change itself. I can add some if people feel strongly about it.

@56quarters 56quarters marked this pull request as ready for review January 25, 2023 23:45
@56quarters 56quarters requested a review from a team as a code owner January 25, 2023 23:45
@pracucci pracucci self-requested a review January 27, 2023 15:59
@pracucci (Collaborator) left a comment:


Great work, I love it!

@@ -221,7 +244,13 @@ func (cb *CachingBucket) Get(ctx context.Context, name string) (io.ReadCloser, e
contentKey := cachingKeyContent(name)
existsKey := cachingKeyExists(name)

hits := cfg.cache.Fetch(ctx, []string{contentKey, existsKey})
var opts []cache.Option
Collaborator review comment on this diff:

[nit] This code is duplicated below. You may consider moving it to a function getCacheOptions(ctx).
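
For illustration, one possible shape for that helper, with stand-in types (cacheOption, slabPool, slabPoolFromContext and withAllocator below are placeholders, not the real cache.Option machinery used by the PR):

package cachingbucket

import "context"

// cacheOption and slabPool stand in for cache.Option and pool.SafeSlabPool[byte].
type cacheOption interface{}
type slabPool struct{}

type poolContextKey struct{}

// slabPoolFromContext is a hypothetical lookup for the per-call pool.
func slabPoolFromContext(ctx context.Context) *slabPool {
    p, _ := ctx.Value(poolContextKey{}).(*slabPool)
    return p
}

// withAllocator is a placeholder for however the cache client accepts a custom allocator.
func withAllocator(p *slabPool) cacheOption { return p }

// getCacheOptions centralizes the per-call option construction that would
// otherwise be duplicated at every cache.Fetch call site.
func getCacheOptions(ctx context.Context) []cacheOption {
    var opts []cacheOption
    if slabs := slabPoolFromContext(ctx); slabs != nil {
        opts = append(opts, withAllocator(slabs))
    }
    return opts
}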

@@ -1717,12 +1719,13 @@ func (b *bucketBlock) readIndexRange(ctx context.Context, off, length int64) ([]
return buf.Bytes(), nil
}

func (b *bucketBlock) readChunkRange(ctx context.Context, seq int, off, length int64, chunkRanges byteRanges) (*[]byte, error) {
func (b *bucketBlock) readChunkRange(ctx context.Context, seq int, off, length int64, chunkRanges byteRanges, chunkSlabs *pool.SafeSlabPool[byte]) (*[]byte, error) {
Collaborator review comment on this diff:

[nit] I would name the new param chunksPool instead of chunkSlabs to keep consistency with the rest of the code. Really a nit!

@56quarters 56quarters force-pushed the 56quarters/slab-caching-bucket branch from 5ee98f3 to f212028 Compare January 27, 2023 17:02
@dimitarvdimitrov (Contributor) commented:

Code looks good 👍

I was testing some queries with sparse series (queries that touch series that aren't stored one after the other in the index or the chunk files, e.g. selecting every third series) in the same cluster as your tests. During that time zone-b had a noticeably higher heap than zone-a and zone-c. Do you have an idea what might be causing this? Should it be a concern?

[Screenshot: heap usage comparison across zones]

@56quarters 56quarters marked this pull request as draft January 27, 2023 20:01
@56quarters (Contributor, Author) commented:

Based on further testing I'm having doubts about this change. I'm going to add some additional instrumentation and see if I can tell what's causing the unexpected heap usage.

@pracucci (Collaborator) left a comment:

Good job! I left a minor comment.

return &getReader{
c: cfg.cache,
ctx: ctx,
r: reader,
buf: new(bytes.Buffer),
slabs: slabs,
Collaborator review comment on this code:

I don't think we need to pass the slabs at all here. Reason is that getReader is used on cache miss, so there will be no memory from the pool. I think in this case we can just release the pool once we exit Get().

Contributor (Author) replied:

Ah, good point.

Inject a slab pool into the context used by caching BucketReader
implementations that allows cache clients to reuse memory for results.

Signed-off-by: Nick Pillitteri <[email protected]>
@56quarters 56quarters force-pushed the 56quarters/slab-caching-bucket branch from 3ba9b0d to 0e05715 Compare February 1, 2023 14:41
@56quarters 56quarters marked this pull request as ready for review February 1, 2023 18:25
@56quarters (Contributor, Author) commented:

For posterity, after the change to create and free the pool.SafeSlabPool following the lifecycle of the io.ReadCloser returned by CachingBucket methods, the heap and RSS look much more reasonable for this change.
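
A minimal sketch of that lifecycle, using placeholder types rather than the PR's real ones: the pool is created per Get() call and released only when the caller closes the returned reader, while on a cache miss (where no result bytes come from the pool) it can be released immediately.

package cachingbucket

import (
    "bytes"
    "io"
)

// slabPool stands in for pool.SafeSlabPool[byte].
type slabPool struct{ slabs [][]byte }

func (p *slabPool) Release() { p.slabs = nil }

// pooledReadCloser frees the per-call pool when the caller closes the reader,
// so pool-backed cache results stay valid for exactly as long as they are read.
type pooledReadCloser struct {
    io.Reader
    slabs *slabPool
}

func (r *pooledReadCloser) Close() error {
    if r.slabs != nil {
        r.slabs.Release()
    }
    return nil
}

// newPooledReader wraps bytes fetched into pool-backed buffers; only the cache
// hit path needs this wrapper, since a miss holds nothing from the pool.
func newPooledReader(data []byte, slabs *slabPool) io.ReadCloser {
    return &pooledReadCloser{Reader: bytes.NewReader(data), slabs: slabs}
}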

In the following screenshot: zone a = another experiment, zone b = this change, zone c = control.

[Screenshot: Mimir / Reads resources dashboard, Grafana]

This change results in a higher working set, which is odd, but it seems to be explained by the caching the kernel is doing (note the larger buff/cache figure for zone-b below).

/ # hostname -s && free -m
store-gateway-zone-b-0
              total        used        free      shared  buff/cache   available
Mem:          32112        6046       13715           4       12352       25737
Swap:             0           0           0
/ # hostname -s && free -m
store-gateway-zone-c-0
              total        used        free      shared  buff/cache   available
Mem:          32112        6932       15048           4       10133       24849
Swap:             0           0           0

@56quarters 56quarters merged commit c6a4d93 into main Feb 1, 2023
@56quarters 56quarters deleted the 56quarters/slab-caching-bucket branch February 1, 2023 20:03
56quarters added a commit that referenced this pull request Feb 1, 2023
There's no value to changing the default and it's possible to introduce subtle
performance problems by not using the default.

See #4074 (comment)

Signed-off-by: Nick Pillitteri <[email protected]>
pracucci added a commit that referenced this pull request Feb 2, 2023
Deprecate `blocks-storage.bucket-store.chunks-cache.subrange-size` flag (#4135)

* Deprecate `blocks-storage.bucket-store.chunks-cache.subrange-size` flag

There's no value to changing the default and it's possible to introduce subtle
performance problems by not using the default.

See #4074 (comment)

Signed-off-by: Nick Pillitteri <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Mauro Stettler <[email protected]>

---------

Signed-off-by: Nick Pillitteri <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Mauro Stettler <[email protected]>