release-22.1: add checkpointing for raftAppliedIndexTermMigration
#84909
Conversation
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
Is this worth adding a test for? (I assume we've manually verified the change.)
Force-pushed from 3124f01 to 271d25c
I hear ya, but it's a pretty annoying test to write, since it's going to need a bunch of testing knobs and stuff -- it'd be easier to test if we generalized this, but I don't really have time to get sidetracked before stability. And yes, I've manually verified that this works on migration errors, coordinator failures, and pause/resume.
Force-pushed from 271d25c to db989ff
A few more words in the release note about the observed symptoms before this patch would be helpful, methinks.
Force-pushed from a62dfb1 to c9db8d7
How's this?
It's great, thanks!
Force-pushed from c9db8d7 to 0508edc
84875: backupccl: handle range keys in BACKUP r=erikgrinaker a=msbutler

Previously, BACKUP would not back up range tombstones. With this patch, BACKUPs with revision_history will back up range tombstones. Non-revision-history backups are not affected by this diff because MVCCExportToSST already filters all tombstones out of the backup.

Specifically, this patch replaces the iterators used in the backup_processor with the pebbleIterator, which has baked-in range key support. This refactor introduces a 5% regression in backup runtime, even when the backup has no range keys, though #83051 hopes to address this gap. See details below on the benchmark experiment.

At this point a backup with range keys is restorable, thanks to #84214. Note that the restore codebase still touches iterators that are not range-key aware. This is not a problem because restored data does not have range keys, nor do the empty ranges that restore dumps data into. These iterators (e.g. in SSTBatcher and in CheckSSTConflicts) will be updated when #70428 gets fixed.

Fixes #71155

Release note: none

To benchmark this diff, the following commands were run at sha a5ccdc3, with and without this commit, over three trials:

```
roachprod create -n 5 --gce-machine-type=n2-standard-16 $CLUSTER
roachprod put $CLUSTER [build] cockroach
roachprod wipe $CLUSTER; roachprod start $CLUSTER;
roachprod run $CLUSTER:1 -- "./cockroach workload init bank --rows 1000000000"
roachprod sql $CLUSTER:1 -- -e "BACKUP INTO 'gs://somebucket/michael-rangkey?AUTH=implicit'"
```

The backup on the control binary took on average 478 seconds with a stddev of 13 seconds, while the backup with the treatment binary took on average 499 seconds with a stddev of 8 seconds.

84883: kvserver: add server-side transaction retry metrics r=arulajmani a=arulajmani

This patch adds a few new metrics to track successful/failed server-side transaction retries. Specifically, whenever we attempt to retry a read or write batch or run into a read-within-uncertainty-interval error, we increment specific counters indicating whether the retry was successful.

Release note: None

85074: upgrades: add checkpointing for `raftAppliedIndexTermMigration` r=irfansharif a=erikgrinaker

Forward-port of #84909, for posterity.

----

The `raftAppliedIndexTermMigration` upgrade migration could be unreliable. It iterates over all ranges and runs a `Migrate` request which must be applied on all replicas. However, if any ranges merge or replicas are unavailable, the migration fails and starts over from the beginning. In large clusters with many ranges, this meant that it might never complete.

This patch makes the upgrade more robust by retrying each `Migrate` request 5 times, and checkpointing the progress after every fifth batch (1000 ranges), allowing resumption on failure. At some point this should be made part of the migration infrastructure.

NB: This fix was initially submitted for 22.1, and even though the migration will be removed for 22.2, it is forward-ported for posterity.

Release note: None

85086: eval: stop ignoring all ResolveOIDFromOID errors r=ajwerner a=rafiss

fixes #84448

The decision about whether an error is safe to ignore is made at the place where the error is created/returned. This way, the callers don't need to be aware of any new error codes that the implementation may start returning in the future.

Release note (bug fix): Fixed incorrect error handling that could cause casts to OID types to fail in some cases.
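As a loose illustration of the combined point-and-range-key iteration that #84875 leans on, here is a minimal standalone sketch against the open-source pebble API. It assumes a recent pebble release (where `NewIter` returns an error); the store path, keys, and values are invented for the example, and CockroachDB's actual backup_processor goes through its own storage-layer wrappers rather than raw pebble:

```go
package main

import (
	"fmt"
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	// Open a throwaway store; FormatNewest enables range-key support.
	db, err := pebble.Open("demo-rangekeys", &pebble.Options{
		FormatMajorVersion: pebble.FormatNewest,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Write one point key and one range key (the rough analogue of a
	// range tombstone covering [a, c)). Keys/values are made up.
	if err := db.Set([]byte("b"), []byte("point-value"), nil); err != nil {
		log.Fatal(err)
	}
	if err := db.RangeKeySet([]byte("a"), []byte("c"), nil, []byte("range-value"), nil); err != nil {
		log.Fatal(err)
	}

	// KeyTypes selects the combined point-and-range iterator: the mode
	// of iteration that makes range keys visible to a scan at all.
	iter, err := db.NewIter(&pebble.IterOptions{
		KeyTypes: pebble.IterKeyTypePointsAndRanges,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer iter.Close()

	for valid := iter.First(); valid; valid = iter.Next() {
		hasPoint, hasRange := iter.HasPointAndRange()
		// Report each range key once, when the iterator first enters it.
		if hasRange && iter.RangeKeyChanged() {
			start, end := iter.RangeBounds()
			fmt.Printf("range key [%s, %s)\n", start, end)
		}
		if hasPoint {
			fmt.Printf("point key %s = %s\n", iter.Key(), iter.Value())
		}
	}
}
```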
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
The `raftAppliedIndexTermMigration` upgrade migration could be unreliable. It iterates over all ranges and runs a `Migrate` request which must be applied on all replicas. However, if any ranges merge or replicas are unavailable, the migration fails and starts over from the beginning. In large clusters with many ranges, this meant that it might never complete.

This patch makes the upgrade more robust by retrying each `Migrate` request 5 times, and checkpointing the progress after every fifth batch (1000 ranges), allowing resumption on failure. At some point this should be made part of the migration infrastructure.
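A minimal sketch of the retry-and-checkpoint shape described above -- not the actual upgrade code: `runMigrate`, `saveCheckpoint`, `RangeDesc`, and the 200-ranges-per-batch figure (inferred from "every fifth batch (1000 ranges)") are all illustrative assumptions:

```go
package migration

import (
	"context"
	"fmt"
)

const (
	migrateRetries     = 5   // per-range Migrate attempts before giving up
	rangesPerBatch     = 200 // assumed batch size
	checkpointInterval = 5   // checkpoint every 5th batch = every 1000 ranges
)

// RangeDesc stands in for the real range descriptor type.
type RangeDesc struct {
	StartKey, EndKey []byte
}

// migrateAllRanges walks the ranges in batches, retrying each Migrate
// request up to migrateRetries times, and persists a resumption key
// after every checkpointInterval-th batch. runMigrate and saveCheckpoint
// are hypothetical helpers, not CockroachDB APIs.
func migrateAllRanges(
	ctx context.Context,
	ranges []RangeDesc,
	runMigrate func(context.Context, RangeDesc) error,
	saveCheckpoint func(context.Context, []byte) error,
) error {
	batches := 0
	for i := 0; i < len(ranges); i += rangesPerBatch {
		end := i + rangesPerBatch
		if end > len(ranges) {
			end = len(ranges)
		}
		for _, desc := range ranges[i:end] {
			var err error
			for attempt := 1; attempt <= migrateRetries; attempt++ {
				if err = runMigrate(ctx, desc); err == nil {
					break
				}
			}
			if err != nil {
				// On failure, a restart resumes from the last saved
				// checkpoint rather than from the beginning.
				return fmt.Errorf("migrating range at %s: %w", desc.StartKey, err)
			}
		}
		batches++
		if batches%checkpointInterval == 0 {
			if err := saveCheckpoint(ctx, ranges[end-1].EndKey); err != nil {
				return err
			}
		}
	}
	return nil
}
```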
Resolves #84073.
Release note (bug fix): the 22.1 upgrade migration "21.2-56: populate
RangeAppliedState.RaftAppliedIndexTerm for all ranges" is now more
resilient to failures, by checkpointing its progress in case it fails
and must be restarted. This migration must be applied across all
ranges and replicas in the system, and can fail with 'operation "wait
for Migrate application" timed out' if any replicas are temporarily
unavailable, which is increasingly likely to happen in large clusters
with many ranges -- previously, this would restart the migration from
the start, and might never make it all the way through.
Release justification: makes upgrades more robust.