Track local checkpoint on primary immediately #25434
Conversation
Today after a primary shard is recovered or promoted from a replica, it will not be tracking its own local checkpoint until the first indexing operation on the shard. This leads to situations where a shard is relocated and the knowledge it sends in the primary context about its own local checkpoint violates the global checkpoint. This commit rectifies this situation by tracking the local checkpoint on such shards immediately on recovery/promotion/relocation.
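A minimal sketch of the idea only, not the actual Elasticsearch code: as soon as a shard becomes an active primary (recovery, promotion, or relocation), its own allocation id and current local checkpoint are added to the in-sync set rather than waiting for the first indexing operation. All names below are illustrative stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

class CheckpointTrackerSketch {

    private final Map<String, Long> inSyncLocalCheckpoints = new HashMap<>();

    /** Called immediately on recovery/promotion/relocation for the primary itself. */
    void trackPrimary(final String ownAllocationId, final long localCheckpoint) {
        inSyncLocalCheckpoints.put(ownAllocationId, localCheckpoint);
    }

    /**
     * With the primary always present in the in-sync set, the computed global
     * checkpoint can no longer run ahead of the primary's own local checkpoint.
     */
    long computeGlobalCheckpoint() {
        return inSyncLocalCheckpoints.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(-1L); // -1 stands in for "no operations performed"
    }
}
```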
@ywelsch I'm opening this for discussion.
* master:
  Do not swallow exception when relocating
  Docs: Fix typo for request cache (elastic#25444)
  Remove implicit 32-bit support
  [DOCS] reworded to prevent code span rendering glitch (elastic#25442)
  Disallow multiple concurrent recovery attempts for same target shard (elastic#25428)
  Update global checkpoint when increasing primary term on replica (elastic#25422)
  Add backwards compatibility indices for 5.4.3
  Add version 5.4.3 after release
  Update MSI installer images (elastic#25414)
  Add missing newline at end of SetsTests.java
  Rename handoff primary context transport handler
  correct expected thrown exception in mappingMetaData to ElasticsearchParseException (elastic#25410)
  test: Make many percolator integration tests real integration tests
  [DOCS] Update docs to use shared attribute file (elastic#25403)
  Add Javadocs and tests for set difference methods
  Tests: Add parsing test for AggregationsTests (elastic#25396)
  test: get upgrade status for all indices
  Mute SignificantTermsAggregatorTests#testSignificance()
…cal-checkpoint
* enhance/single-updateshardstate-method:
  Some cleanup
  Do not swallow exception when relocating
  Docs: Fix typo for request cache (elastic#25444)
  Remove implicit 32-bit support
  [DOCS] reworded to prevent code span rendering glitch (elastic#25442)
  Disallow multiple concurrent recovery attempts for same target shard (elastic#25428)
  Update global checkpoint when increasing primary term on replica (elastic#25422)
  Add backwards compatibility indices for 5.4.3
  Add version 5.4.3 after release
  Update MSI installer images (elastic#25414)
  Add missing newline at end of SetsTests.java
  fix test
  Rename handoff primary context transport handler
  Provide single IndexShard method to update state on incoming cluster state
* master:
  Use a single method to update shard state
I've made a suggestion around the update condition and suggested to use unit tests instead.
@@ -360,6 +365,7 @@ synchronized void updateAllocationIdsFromPrimaryContext(final PrimaryContext pri
                .allMatch(e -> e.value == SequenceNumbersService.UNASSIGNED_SEQ_NO) : inSyncLocalCheckpoints;
        assert StreamSupport
                .stream(trackingLocalCheckpoints.spliterator(), false)
                .filter(e -> !e.key.equals(allocationId)) // during primary relocation a shard can already know its local checkpoint
I think we should not filter out the allocation id, but keep the method as before and instead assert in the IndexShard.updateAllocationIdsFromPrimaryContext method that the local checkpoint provided as part of the primary context is equal to the local checkpoint that exists locally.
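A hedged sketch of the suggested assertion; the method and accessor names are illustrative stand-ins, not the actual IndexShard/PrimaryContext API. The point is that the checkpoint the primary context carries for this shard's own allocation id must match the checkpoint the shard already knows locally.

```java
import java.util.Map;

class PrimaryContextAssertionSketch {

    void updateAllocationIdsFromPrimaryContext(final Map<String, Long> inSyncLocalCheckpoints,
                                               final String ownAllocationId,
                                               final long locallyKnownCheckpoint) {
        final Long provided = inSyncLocalCheckpoints.get(ownAllocationId);
        assert provided != null && provided == locallyKnownCheckpoint
                : "primary context checkpoint [" + provided + "] differs from local checkpoint ["
                + locallyKnownCheckpoint + "]";
        // ... then hand the context to the sequence-number service unchanged
    }
}
```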
final SequenceNumbersService seqNoService = engine.seqNoService();
seqNoService.updateAllocationIdsFromMaster(applyingClusterStateVersion, activeAllocationIds, initializingAllocationIds);
if ((currentState == IndexShardState.POST_RECOVERY && state == IndexShardState.STARTED) ||
        recoveryState.getRecoverySource().getType().equals(RecoverySource.Type.PEER)) {
With the change I suggested above we can leave out the == PEER condition and instead add && currentRouting.isRelocationTarget == false to currentState == IndexShardState.POST_RECOVERY && state == IndexShardState.STARTED.
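A self-contained sketch of the condition proposed above; the enum values mirror the IndexShardState names in the diff, and isRelocationTarget stands in for the ShardRouting accessor referenced in the comment. Everything else is illustrative.

```java
class TrackingConditionSketch {

    enum IndexShardState { POST_RECOVERY, STARTED }

    static boolean shouldTrackOwnLocalCheckpoint(final IndexShardState currentState,
                                                 final IndexShardState newState,
                                                 final boolean isRelocationTarget) {
        // the "|| recovery source == PEER" clause is dropped in favor of requiring
        // that the shard is not a relocation target
        return currentState == IndexShardState.POST_RECOVERY
                && newState == IndexShardState.STARTED
                && isRelocationTarget == false;
    }
}
```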
 * Tests that a primary shard tracks its own local checkpoint after starting.
 */
@ESIntegTestCase.ClusterScope(scope = Scope.TEST, numDataNodes = 0)
public class LocalCheckpointIT extends ESIntegTestCase {
I wonder if we need an integration test for this. A test under IndexShardTests would be able to check the same.
Yeah, now that we are not doing this via indices cluster state service, this is now possible. I will update.
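A hedged sketch of what such a unit test might look like under IndexShardTests, assuming the newStartedShard/closeShards helpers from IndexShardTestCase and local/global checkpoint accessors on IndexShard; the exact helper and accessor names may differ in the codebase at this point.

```java
public void testPrimaryTracksOwnLocalCheckpointAfterStart() throws Exception {
    final IndexShard primary = newStartedShard(true); // a primary with no indexing operations yet
    try {
        // before this change the primary only learned its own checkpoint on the first
        // indexing operation; with it, the global checkpoint is already well defined
        assertThat(primary.getGlobalCheckpoint(), equalTo(primary.getLocalCheckpoint()));
    } finally {
        closeShards(primary);
    }
}
```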
This will be superseded by a forthcoming PR.
Closes #25415, relates #10708, relates #25355