Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffle User Order in UsersScanner #10513

Merged
merged 13 commits into from
Jan 27, 2025

Conversation

dmwilson-grafana
Copy link
Contributor

@dmwilson-grafana dmwilson-grafana commented Jan 24, 2025

If there is a rollout, or any kind of graceful shutdown of compactors, they terminate ongoing block cleanup and bucket index update jobs. This can lead to the bucket index not being updated for an extended period if jobs are terminated back-to-back.

What this PR does

This PR addresses the above by shuffling the order of users submitted to BlocksCleaner by UsersScanner. When users are always processed in lexicographic order, successive restarts can cause read path failures for tenants with stale indexes. By randomizing this order, we reduce the probability that these indexes become stale.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@CLAassistant
Copy link

CLAassistant commented Jan 24, 2025

CLA assistant check
All committers have signed the CLA.

@dmwilson-grafana dmwilson-grafana changed the title Prioritize Users with Least Recently Updated Bucket Indexes in BlocksCleaner Shuffle User Order in BlocksCleaner Jan 24, 2025
@dmwilson-grafana dmwilson-grafana changed the title Shuffle User Order in BlocksCleaner Shuffle User Order in UsersScanner Jan 24, 2025
@dmwilson-grafana dmwilson-grafana marked this pull request as ready for review January 24, 2025 19:44
@dmwilson-grafana dmwilson-grafana requested a review from a team as a code owner January 24, 2025 19:44
Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some small suggestions. I'll let @seizethedave approve

pkg/storage/tsdb/users_scanner_test.go Outdated Show resolved Hide resolved
}
}

func TestUsersScanner_ScanUsers_ShouldReturnRandomizedOrder(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this test is testing. It doesn't seem like it's ensuring a random order.

Copy link
Contributor Author

@dmwilson-grafana dmwilson-grafana Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the shuffle, the order of the input (users) would be the same as the order of actual. On L94, I check that we have the same set of users (as you mentioned, this can be changed to assert.ElementsMatch) and on L97 I check that it's actually been shuffled , i.e. users != actual.

A better test might involve setting a specific rand.Source and checking that it shuffles to a specific order.

@seizethedave seizethedave self-requested a review January 24, 2025 21:36
Copy link
Contributor

@seizethedave seizethedave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is looking good. Let me know what you think about these comments.

CHANGELOG.md Outdated
@@ -35,6 +35,7 @@
* `cortex_ingester_tsdb_block_postings_for_matchers_cache_misses_total`
* `cortex_ingester_tsdb_block_postings_for_matchers_cache_requests_total`
* `cortex_ingester_tsdb_block_postings_for_matchers_cache_skips_total`
* [ENHANCEMENT] Compactor: Shuffle order of users in `UsersScanner` before submitting to `BlocksCleaner`. #10513
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this briefly say "why" in addition to "how"? So that unfamiliar readers understand why they might upgrade, and so on. :)

@@ -46,6 +47,10 @@ func (s *UsersScanner) ScanUsers(ctx context.Context) (users, markedForDeletion
return nil, nil, err
}

rand.Shuffle(len(users), func(i, j int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we've changed course on this a time or two, but ScanUsers is a general purpose accessor function used by many services, whereas this shuffle behavior is particular to the needs of compactor. I actually think we can move the shuffle to runCleanup after we call refreshOwnedUsers, with a brief handrail comment noting why we'd do such a thing.

Copy link
Contributor

@seizethedave seizethedave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@seizethedave seizethedave merged commit 6ac5d6d into main Jan 27, 2025
28 checks passed
@seizethedave seizethedave deleted the dwilson/run-bucket-index-cleanup-by-index-age branch January 27, 2025 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants