
Curious case of draining a data node in the presence of X-Pack Security #32340

Closed
jordansissel opened this issue Jul 24, 2018 · 6 comments
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) :Security/Security Security issues without another label

Comments

@jordansissel
Contributor

jordansissel commented Jul 24, 2018

Describe the feature:

When x-pack security is enabled and at least one data node is excluded with cluster.routing.allocation.exclude._ip, the cluster state will never show green because a single .security shard remains unallocated.

Ideally, cluster state green would be usable in a draining scenario even when X-Pack Security is active.

Elasticsearch version (bin/elasticsearch --version): docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.3

Plugins installed: x-pack and whatever else comes on the docker image.

JVM version (java -version): docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.3

OS version (uname -a if on a Unix-like system): docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.3

Description of the problem including expected versus actual behavior:

A few weeks ago, I migrated my Elasticsearch data nodes to bigger machines. To do this, I roughly followed these steps:

  1. Bring new nodes online and join the cluster
  2. Drain one old machine at a time with shard allocation filtering
  3. Proceed with additional drains, one at a time, only when the cluster finishes moving shards. I intended to observe this with cluster state "green".
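Step 2 above can be sketched with cluster-level shard allocation filtering. This is a minimal example, assuming a local cluster; the IP is a placeholder for the node being drained:

```shell
# Exclude one old data node (placeholder IP) from shard allocation;
# Elasticsearch then relocates its shards onto the remaining nodes.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}'
```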

My plan was to drain a single node and wait for the cluster to be green. However, it seems the way X-Pack Security configures the .security index prevents this. If there are 6 data nodes, the number of replicas for .security will be 5. When draining a node with shard allocation filtering, this means the cluster will complete draining and still be yellow!
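For context, and as an assumption about the 6.x behavior consistent with the auto-expand discussion later in this thread: the .security index is created with index.auto_expand_replicas set to 0-all, so its replica count follows the number of data nodes. A quick way to check (placeholder host):

```shell
# Inspect the .security-6 index settings; with auto_expand_replicas "0-all",
# number_of_replicas tracks (data node count - 1), so each drained node
# leaves one .security replica that can never be assigned.
curl -s "localhost:9200/.security-6/_settings?pretty"
```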

I understand solving this may be a challenge given the complexities (replica count based on data nodes, shard allocation filtering to exclude whole nodes, etc), but I wanted to raise it for discussion.

My current workaround is to query /_cat/shards, look for UNASSIGNED shards, and conclude "draining complete" only when every remaining UNASSIGNED shard belongs to the .security index (one per drained data node).

Example:

index                              shard prirep state          docs   store ip          node
.security-6                        0     r      UNASSIGNED                              
.security-6                        0     r      UNASSIGNED                              
.security-6                        0     r      UNASSIGNED                              
.security-6                        0     r      UNASSIGNED                              

For the above, I expect one UNASSIGNED per drained data node. In the case above, I have 4 drained data nodes, so 4 .security replica shards are unassigned.
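The workaround can be sketched as a small shell check. The sample data below stands in for a live _cat/shards call; index names and counts are illustrative:

```shell
# Sample output in _cat/shards column order:
#   index shard prirep state docs store ip node
shards='logs-2018.07 0 p STARTED 100 1mb 10.0.0.2 node-2
.security-6 0 r UNASSIGNED'
# Against a live cluster this would instead be:
#   shards=$(curl -s "localhost:9200/_cat/shards")

# Count UNASSIGNED shards that do NOT belong to the .security index.
non_security_unassigned=$(printf '%s\n' "$shards" \
  | awk '$4 == "UNASSIGNED" && $1 !~ /^\.security/' | wc -l)

# Draining is complete when only .security replicas remain unassigned.
if [ "$non_security_unassigned" -eq 0 ]; then
  echo "draining complete"
else
  echo "still draining: $non_security_unassigned shard(s) left"
fi
```

Against the sample data this prints "draining complete", since the only UNASSIGNED row is a .security-6 replica.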

Steps to reproduce:

  1. Enable X-Pack Security
  2. Use shard allocation filtering to exclude a single node
  3. Observe that the cluster never goes green, because a single .security replica shard remains unassigned.
@dnhatn dnhatn added the :Security/Security Security issues without another label label Jul 24, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-security

@dnhatn dnhatn added the :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Jul 24, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@jasontedor
Member

@jaymode I think this will be sufficiently addressed by the plan to stop using auto-expand replicas, is that right?

@jaymode
Member

jaymode commented Jul 25, 2018

That is correct

@tvernum
Contributor

tvernum commented Jul 25, 2018

You guys just managed to beat me to this.
Related issue: #29933

@ywelsch
Contributor

ywelsch commented Jul 25, 2018

Closed in favor of #29933

@ywelsch ywelsch closed this as completed Jul 25, 2018
7 participants