Ensure green in RecoveryIT#testHistoryUUIDIsGenerated #31542

Closed
dnhatn wants to merge 1 commit

@dnhatn dnhatn commented Jun 23, 2018

The RecoveryIT#testHistoryUUIDIsGenerated test failed because the primary and the replica had different history_uuids after a rolling upgrade from 5.6 to 6.3.

RecoveryIT.testHistoryUUIDIsGenerated <<< FAILURES!
07:49:28    > Throwable #1: java.lang.AssertionError: different history uuid found for shard on BPdfH1zoREy4bR0ga9iksg
07:49:28    > Expected: "D9AAXTrJT7WOJsynoFB0LQ"
07:49:28    >      but: was "JsLgmYhvTEyjm_x9NWESGQ"
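
For readers unfamiliar with the assertion, the check behind this message can be sketched as follows. This is a minimal, self-contained illustration and not the test's actual code: the class name, the method name, and the map of node id to history_uuid are hypothetical, and "node-holding-primary" is a placeholder because the log does not name the other node.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: every copy of a shard should report the same history_uuid.
final class HistoryUuidCheckSketch {

    // Returns the id of the first node whose history_uuid differs from the
    // first entry's history_uuid, or null when all copies agree.
    static String firstMismatchedNode(Map<String, String> historyUuidByNode) {
        Iterator<Map.Entry<String, String>> it = historyUuidByNode.entrySet().iterator();
        if (!it.hasNext()) {
            return null; // no copies, nothing to compare
        }
        String expected = it.next().getValue();
        while (it.hasNext()) {
            Map.Entry<String, String> copy = it.next();
            if (!Objects.equals(expected, copy.getValue())) {
                return copy.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The UUIDs and the second node id are taken from the failure above;
        // the first node id is a placeholder.
        Map<String, String> copies = new LinkedHashMap<>();
        copies.put("node-holding-primary", "D9AAXTrJT7WOJsynoFB0LQ");
        copies.put("BPdfH1zoREy4bR0ga9iksg", "JsLgmYhvTEyjm_x9NWESGQ");

        String node = firstMismatchedNode(copies);
        System.out.println(node == null
            ? "all copies share one history_uuid"
            : "different history uuid found for shard on " + node);
    }
}
```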

This failure should be very rare; it can happen in 6.3+ in the following scenario:

  1. Both the primary and the replica are assigned to v5.6 nodes.

  2. The replica node is shut down and upgraded to 6.3.

  3. When the replica comes back, it executes a file-based recovery. Since the commit from the primary does not have a history_uuid, the replica generates a new one via Store#ensureIndexHasHistoryUUID.

  4. The test has no ensureGreen in the mixed-cluster mode; thus, if we restart the primary before phase2 of the recovery has finished, the replica shard is failed. The replica's commit might already have a history_uuid.

  5. When the primary node comes back, its shard copy is assigned as the primary again. Since the commit in the primary store does not have a history_uuid, the primary generates a new one via Store#bootstrapNewHistory.

  6. The replica node recovers from the primary again, starting from its current commit. This commit has a different history_uuid and is considered safe (max_seqno = -1, global_checkpoint = -1). The replica executes an operation-based recovery, so the primary and the replica end up with different history_uuids.

Note that having different history_uuids is not a severe problem, because the history_uuid is merely used to prevent ops-based recovery between shard copies with different histories.

A change in 6.3 (#28676) causes this failure. Previously, the global_checkpoint in the translog checkpoint of an empty index commit on the replica in step 6 would have been -2 instead of -1. With this change, the replica in step 6 executes an ops-based recovery instead of a file-based one.
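
To make the -2 versus -1 difference concrete, here is a simplified model of the decision, not the actual Elasticsearch recovery code. It assumes the replica's commit counts as safe when max_seqno <= global_checkpoint, in which case recovery can start from local_checkpoint + 1 (ops-based); otherwise the starting sequence number is unassigned and the recovery falls back to copying files. The class and method names are hypothetical; the constant names mirror what I believe are the values in Elasticsearch's SequenceNumbers class.

```java
// Simplified, hypothetical model of how a replica picks its recovery type.
final class StartingSeqNoSketch {

    static final long UNASSIGNED_SEQ_NO = -2L; // "no value known"
    static final long NO_OPS_PERFORMED = -1L;  // "known, but no operations yet"

    // Assumed rule: the commit is safe when it contains nothing above the
    // global checkpoint; then ops can be replayed from localCheckpoint + 1.
    static long startingSeqNo(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {
        if (maxSeqNo <= globalCheckpoint) {
            return localCheckpoint + 1;
        }
        return UNASSIGNED_SEQ_NO;
    }

    static String recoveryKind(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {
        return startingSeqNo(maxSeqNo, localCheckpoint, globalCheckpoint) == UNASSIGNED_SEQ_NO
            ? "file-based"
            : "ops-based";
    }

    public static void main(String[] args) {
        // Empty index commit: max_seqno = -1 and local_checkpoint = -1.
        System.out.println("global_checkpoint = -2: "
            + recoveryKind(NO_OPS_PERFORMED, NO_OPS_PERFORMED, UNASSIGNED_SEQ_NO));
        System.out.println("global_checkpoint = -1: "
            + recoveryKind(NO_OPS_PERFORMED, NO_OPS_PERFORMED, NO_OPS_PERFORMED));
    }
}
```

Under this model, an empty commit with global_checkpoint = -2 fails the safety check (-1 <= -2 is false), while the same commit with global_checkpoint = -1 passes it, which matches the behavior change described above.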

This commit makes RecoveryIT#testHistoryUUIDIsGenerated wait for the recovery to finish in the mixed-cluster mode.
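
The diff itself is not part of this thread, so the following is only a sketch of what "waiting for green" amounts to, written against the cluster health REST API with the JDK 11+ HTTP client rather than the Elasticsearch test framework. The class name, the base URL, the index name, and the 70s timeout are assumptions for illustration; treating any non-200 response as "not green in time" is also an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: block until an index is green before the next upgrade step.
final class WaitForGreenSketch {

    // Builds a cluster health request that waits until the index is green
    // and no shards are relocating, or until the timeout elapses.
    static URI healthUri(String baseUrl, String index, String timeout) {
        return URI.create(baseUrl + "/_cluster/health/" + index
            + "?wait_for_status=green&wait_for_no_relocating_shards=true&timeout=" + timeout);
    }

    static void waitForGreen(String baseUrl, String index)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(healthUri(baseUrl, index, "70s"))
            .GET()
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        // Assumption: anything other than 200 means the wait did not succeed.
        if (response.statusCode() != 200) {
            throw new AssertionError("index [" + index + "] did not become green: "
                + response.body());
        }
    }

    public static void main(String[] args) {
        // Only prints the request URI; calling waitForGreen needs a running cluster.
        System.out.println(healthUri("http://localhost:9200", "my-index", "70s"));
    }
}
```

Calling such a helper in the mixed-cluster phase, before the next node is restarted, would keep the replica's recovery from being interrupted as described in step 4.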

Closes #31291

@dnhatn dnhatn added the >test, >upgrade, :Distributed Indexing/Recovery, v7.0.0, v6.4.0, and v6.3.1 labels Jun 23, 2018
@dnhatn dnhatn requested review from bleskes and ywelsch June 23, 2018 21:20
@elasticmachine
Pinging @elastic/es-distributed

@dnhatn dnhatn commented Jun 23, 2018

The argument is flawed. I am closing this and taking another look.

@dnhatn dnhatn closed this Jun 23, 2018
@dnhatn dnhatn deleted the ensure_green_test_history_uuid branch June 23, 2018 23:18