-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shuffle User Order in UsersScanner #10513
Shuffle User Order in UsersScanner #10513
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some small suggestions. I'll let @seizethedave approve
} | ||
} | ||
|
||
func TestUsersScanner_ScanUsers_ShouldReturnRandomizedOrder(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this test is testing. It doesn't seem like it's ensuring a random order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the shuffle, the order of the input (users
) would be the same as the order of actual
. On L94, I check that we have the same set of users (as you mentioned, this can be changed to assert.ElementsMatch
) and on L97 I check that it's actually been shuffled , i.e. users
!= actual
.
A better test might involve setting a specific rand.Source and checking that it shuffles to a specific order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is looking good. Let me know what you think about these comments.
CHANGELOG.md
Outdated
@@ -35,6 +35,7 @@ | |||
* `cortex_ingester_tsdb_block_postings_for_matchers_cache_misses_total` | |||
* `cortex_ingester_tsdb_block_postings_for_matchers_cache_requests_total` | |||
* `cortex_ingester_tsdb_block_postings_for_matchers_cache_skips_total` | |||
* [ENHANCEMENT] Compactor: Shuffle order of users in `UsersScanner` before submitting to `BlocksCleaner`. #10513 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this briefly say "why" in addition to "how"? So that unfamiliar readers understand why they might upgrade, and so on. :)
pkg/storage/tsdb/users_scanner.go
Outdated
@@ -46,6 +47,10 @@ func (s *UsersScanner) ScanUsers(ctx context.Context) (users, markedForDeletion | |||
return nil, nil, err | |||
} | |||
|
|||
rand.Shuffle(len(users), func(i, j int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we've changed course on this a time or two, but ScanUsers
is a general purpose accessor function used by many services, whereas this shuffle behavior is particular to the needs of compactor. I actually think we can move the shuffle to runCleanup
after we call refreshOwnedUsers
, with a brief handrail comment noting why we'd do such a thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
If there is a rollout, or any kind of graceful shutdown of compactors, they terminate ongoing block cleanup and bucket index update jobs. This can lead to the bucket index not being updated for an extended period if jobs are terminated back-to-back.
What this PR does
This PR addresses the above by shuffling the order of users submitted to
BlocksCleaner
byUsersScanner
. When users are always processed in lexicographic order, successive restarts can cause read path failures for tenants with stale indexes. By randomizing this order, we reduce the probability that these indexes become stale.Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.