auto_expand_replicas might lead to premature shard deletion #21717

ywelsch · 2016-11-21T20:32:28Z

Assume a 2-node cluster with an index that has 1 primary and 1 replica. The index has auto_expand_replicas set to 0-all. The second node drops, which leads the first node to automatically reset number_of_replicas to 0. This adjustment is triggered in a delayed fashion: MetaDataUpdateSettingsService listens for cluster state changed events and submits a cluster state update task to adjust number_of_replicas when it detects that the currently configured number of replicas no longer fits the number of data nodes. Assume this update completes successfully. When the second node rejoins, it sees a shard routing table in which all shards are active (= primary only) and starts deleting its local shard copy. Shortly thereafter (maybe a few milliseconds?) the first node updates the cluster state again, auto-expanding the number of replicas back to 1. By then, however, the second node has deleted its data and needs to resync the complete shard.
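The ordering problem can be illustrated with a small toy model. This is only a sketch of the sequence described above, not Elasticsearch code: `ClusterState`, `auto_expand`, `node_should_delete_local_copy` and `simulate` are hypothetical names, and the deletion rule (a node drops an unassigned local copy once all expected copies are active) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Toy cluster state for one shard (hypothetical model, not the real class)."""
    data_nodes: set          # data nodes currently in the cluster
    number_of_replicas: int  # current index setting
    assigned: set            # nodes the routing table assigns a shard copy to

    def all_copies_active(self) -> bool:
        # One primary plus number_of_replicas copies are expected.
        return len(self.assigned) == 1 + self.number_of_replicas


def auto_expand(state: ClusterState) -> None:
    """Follow-up update task: with 0-all, replicas = data nodes - 1."""
    state.number_of_replicas = max(0, len(state.data_nodes) - 1)


def node_should_delete_local_copy(state: ClusterState, node: str) -> bool:
    """Assumed rule: delete an unassigned local copy when all copies are active."""
    return node not in state.assigned and state.all_copies_active()


def simulate():
    events = []
    on_disk = {"node1", "node2"}  # nodes holding shard data on disk
    state = ClusterState({"node1", "node2"}, 1, {"node1", "node2"})

    # node2 drops; the delayed task shrinks number_of_replicas to 0.
    state.data_nodes.discard("node2")
    state.assigned.discard("node2")
    auto_expand(state)
    events.append(("after_drop", state.number_of_replicas))

    # node2 rejoins and sees the state BEFORE the auto-expand task has run.
    state.data_nodes.add("node2")
    if node_should_delete_local_copy(state, "node2"):
        on_disk.discard("node2")
        events.append(("node2_deleted_copy", state.number_of_replicas))

    # Only now does the follow-up task expand replicas back to 1.
    auto_expand(state)
    events.append(("after_rejoin_expand", state.number_of_replicas))

    needs_full_resync = "node2" not in on_disk
    return events, needs_full_resync


if __name__ == "__main__":
    print(simulate())
```

In this model the full resync is avoided if `auto_expand` runs in the same step as the node join, i.e. before the rejoining node evaluates the routing table.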


clintongormley · 2016-12-09T10:22:54Z

Duplicate of #1873

ywelsch added the :Allocation label Nov 21, 2016

clintongormley added the help wanted, adoptme, and >bug labels Nov 22, 2016

clintongormley closed this as completed Dec 9, 2016

lcawl added the :Distributed Indexing/Distributed label and removed the :Allocation label Feb 13, 2018
