Curious case of draining a data node in the presence of X-Pack Security #32340
Labels: :Distributed Coordination/Allocation, :Security/Security
Describe the feature:
When X-Pack Security is enabled and at least one data node is excluded with `cluster.routing.allocation.exclude._ip`, the cluster state will never show `green` because a single `.security` shard remains unallocated.

Ideally, cluster state `green` would be usable in a draining scenario even when X-Pack Security is active.

Elasticsearch version (`bin/elasticsearch --version`): docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.3

Plugins installed: x-pack and whatever else comes on the docker image.

JVM version (`java -version`): docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.3

OS version (`uname -a` if on a Unix-like system): docker.elastic.co/elasticsearch/elasticsearch-platinum:6.2.3

Description of the problem including expected versus actual behavior:
A few weeks ago, I migrated my Elasticsearch data nodes to bigger machines. To do this, I roughly followed these steps:
My plan was to drain a single node and wait for the cluster to be green. However, it seems the way X-Pack Security configures the `.security` index prevents this. If there are 6 data nodes, the number of replicas for `.security` will be 5, so the index needs 6 nodes to hold its 6 shard copies (1 primary plus 5 replicas). When draining a node with shard allocation filtering, only 5 nodes remain eligible, which means the cluster will complete draining and still be yellow!

I understand solving this may be a challenge given the complexities (replica count based on data nodes, shard allocation filtering to exclude whole nodes, etc.), but I wanted to raise it for discussion.
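The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration of the behaviour described in this issue, not Elasticsearch code: the function name is hypothetical, and it assumes that `.security` has a single primary shard, that its replica count stays at "data nodes minus one" even for excluded nodes, and that a node never holds two copies of the same shard.

```python
def unassigned_security_copies(data_nodes: int, drained_nodes: int) -> int:
    """Number of .security shard copies that cannot be placed anywhere.

    Assumptions (from the behaviour reported in this issue, not from
    Elasticsearch source): one primary shard, replicas = data_nodes - 1
    regardless of allocation filtering, at most one copy per node.
    """
    if data_nodes < 1 or not 0 <= drained_nodes <= data_nodes:
        raise ValueError("need data_nodes >= 1 and 0 <= drained_nodes <= data_nodes")
    total_copies = 1 + (data_nodes - 1)          # primary + replicas
    eligible_nodes = data_nodes - drained_nodes  # nodes not excluded by the filter
    return max(0, total_copies - eligible_nodes)


print(unassigned_security_copies(6, 1))  # 1
print(unassigned_security_copies(6, 4))  # 4
```

Under these assumptions every drained node leaves exactly one copy without a home, which is why the cluster stays yellow once the drain has finished.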
My current workaround is to use `/_cat/shards`, looking for UNASSIGNED, and concluding "draining complete" only when exactly 1 UNASSIGNED shard exists and that shard must belong to the `.security` index.

Example:
For the above, I expect one UNASSIGNED per drained data node. In the case above, I have 4 drained data nodes, so 4 .security replica shards are unassigned.
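The workaround can be expressed as a small check over the text returned by `/_cat/shards`. The following is a minimal sketch, not the script I actually run: the function names and the sample rows (including the `logs-2018.07.01` index) are made up for illustration, and it assumes the first four columns of each row are index, shard, prirep and state, for example when requested with `h=index,shard,prirep,state`.

```python
from typing import List, Tuple

SECURITY_INDEX = ".security"


def unassigned_shards(cat_output: str) -> List[Tuple[str, str, str]]:
    """Return (index, shard, prirep) for every UNASSIGNED row.

    Assumes each row starts with: index shard prirep state
    """
    rows = []
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            rows.append((fields[0], fields[1], fields[2]))
    return rows


def draining_complete(cat_output: str, drained_nodes: int = 1) -> bool:
    """True when the only UNASSIGNED shards are .security replicas,
    one per drained data node."""
    unassigned = unassigned_shards(cat_output)
    only_security_replicas = all(
        index == SECURITY_INDEX and prirep == "r"
        for index, _shard, prirep in unassigned
    )
    return only_security_replicas and len(unassigned) == drained_nodes


# Hypothetical sample output, not taken from a real cluster.
SAMPLE = """\
logs-2018.07.01 0 p STARTED
logs-2018.07.01 0 r STARTED
.security 0 p STARTED
.security 0 r STARTED
.security 0 r UNASSIGNED
"""

print(draining_complete(SAMPLE, drained_nodes=1))  # True
```

This encodes only the check described above; it does not look at relocating shards or at which node each shard currently lives on.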
Steps to reproduce: