Ensure green in RecoveryIT#testHistoryUUIDIsGenerated #31542
Closed
The `RecoveryIT#testHistoryUUIDIsGenerated` test failed because the primary and replica have different history_uuid after a rolling upgrade from 5.6 to 6.3. This failure should be very infrequent and can happen in 6.3+ in the following scenario.
1. Have both primary and replica assigned on v5.6 nodes.
2. Shut down and upgrade the replica node to 6.3.
3. When the replica comes back, it will execute a file-based recovery. Since the commit from the primary does not have a history_uuid, the replica will generate a new one via `Store#ensureIndexHasHistoryUUID`.
4. There is no ensureGreen in the mixed cluster mode in the test; thus if we restart the primary before phase2 of the recovery has finished, the replica shard will be failed. The commit of the replica might have a history_uuid already.
5. When the primary node comes back, it will be assigned as the primary. Since the commit in the primary store does not have a history_uuid, the primary generates a new one via `Store#bootstrapNewHistory`.
6. The replica node will recover from the primary again with its current commit. This commit has a different history_uuid and is considered safe (max_seqno = -1, global_checkpoint = -1). The replica executes an operation-based recovery, so the primary and replica end up with different history_uuids.
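The scenario above can be followed more easily with a toy model. This is only a sketch, not Elasticsearch code: `HistoryUuidScenario`, `Commit`, and every method in it are hypothetical names, and the branch that waits for the first recovery assumes that a completed recovery lets the upgraded copy become the source of the history_uuid for the other copy, which the description does not spell out.

```java
import java.util.UUID;

/** Toy model of the rolling-upgrade scenario; all names are hypothetical. */
final class HistoryUuidScenario {

    /** The last commit of one shard copy. */
    static final class Commit {
        String historyUuid;            // null: written by a 5.6 node, no history_uuid
        long maxSeqNo = -1;
        long globalCheckpoint = -1;
    }

    /** Stands in for the "generate a new history_uuid if the commit has none" behaviour. */
    static void generateIfMissing(Commit c) {
        if (c.historyUuid == null) {
            c.historyUuid = UUID.randomUUID().toString();
        }
    }

    /** File-based recovery: the target copies the source commit, then fills in a missing UUID. */
    static void fileBasedRecovery(Commit source, Commit target) {
        target.historyUuid = source.historyUuid;
        target.maxSeqNo = source.maxSeqNo;
        target.globalCheckpoint = source.globalCheckpoint;
        generateIfMissing(target);
    }

    /** Returns true if both copies end up with the same history_uuid. */
    static boolean run(boolean waitForRecovery) {
        Commit a = new Commit(); // node A: initial primary, still on 5.6 (step 1)
        Commit b = new Commit(); // node B: initial replica

        // Steps 2-3: B is upgraded and recovers from A; A's commit has no
        // history_uuid, so B generates its own.
        fileBasedRecovery(a, b);

        if (waitForRecovery) {
            // Assumption: the recovery completed, so when A restarts it
            // recovers from B's commit and inherits B's history_uuid.
            fileBasedRecovery(b, a);
        } else {
            // Step 4: A restarts before phase2 finishes; B is failed but keeps its commit.
            // Step 5: A comes back as primary and generates its own history_uuid.
            generateIfMissing(a);
            // Step 6: B's commit looks safe, so no files are copied (ops-based
            // recovery) and B keeps the UUID it generated earlier.
            boolean looksSafe = b.maxSeqNo == -1 && b.globalCheckpoint == -1;
            if (!looksSafe) {
                fileBasedRecovery(a, b);
            }
        }
        return a.historyUuid.equals(b.historyUuid);
    }

    public static void main(String[] args) {
        System.out.println("without waiting: same history_uuid = " + run(false));
        System.out.println("with waiting:    same history_uuid = " + run(true));
    }
}
```

In this model the copies diverge only when the primary restarts before the first recovery has completed, which is the window the test left open.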
Note that having different history_uuids is not a severe problem, because the history_uuid is merely used to prevent ops-based recovery between shards with different histories.
There was a change (#28676) in 6.3 that causes this failure. Previously the global_checkpoint of the translog checkpoint of an empty index commit in the replica in step 6 would be -2 instead of -1. This change makes the replica execute an ops-based recovery instead of a file-based one in step 6.
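A rough way to see why -2 versus -1 matters, assuming the replica treats its commit as safe when max_seqno is at or below the global checkpoint (an assumption consistent with the "considered safe" remark in step 6, not a copy of the real check). `RecoveryModeSketch`, `chooseMode`, and the constant names are illustrative only.

```java
/** Sketch of how the replica's global checkpoint could decide the recovery mode. */
final class RecoveryModeSketch {

    static final long UNASSIGNED_SEQ_NO = -2; // value previously seen for an empty commit
    static final long NO_OPS_PERFORMED = -1;  // value seen after the change in 6.3

    enum Mode { FILE_BASED, OPS_BASED }

    /**
     * Assumed rule: the commit is safe, and ops-based recovery is attempted,
     * only if every operation in it is covered by the global checkpoint.
     */
    static Mode chooseMode(long maxSeqNo, long globalCheckpoint) {
        boolean safeCommit = maxSeqNo <= globalCheckpoint;
        return safeCommit ? Mode.OPS_BASED : Mode.FILE_BASED;
    }

    public static void main(String[] args) {
        // Empty commit before the change: -1 <= -2 is false, so files are copied.
        System.out.println(chooseMode(NO_OPS_PERFORMED, UNASSIGNED_SEQ_NO));
        // Empty commit after the change: -1 <= -1 is true, so only operations are replayed.
        System.out.println(chooseMode(NO_OPS_PERFORMED, NO_OPS_PERFORMED));
    }
}
```

Under this rule a file-based recovery would have overwritten the replica's history_uuid with the primary's, which is why the mismatch could not occur before the change.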
This commit ensures that we wait for the recovery in the mixed cluster mode in `RecoveryIT#testHistoryUUIDIsGenerated`.

Closes #31291