.security with status yellow when cluster.routing.allocation.same_shard.host enabled #29933

Closed
elasticmachine opened this issue Apr 28, 2017 · 9 comments
Assignees: jaymode
Labels: >bug, :Distributed Coordination/Allocation (all issues relating to the decision making around placing a shard, both master logic & on the nodes), :Security/Security (security issues without another label)

Comments

@elasticmachine
Collaborator

Original comment by @MarxDimitri:

Enabling cluster.routing.allocation.same_shard.host on deployments that run multiple nodes on a single host puts the .security index into yellow health. Tested with version 5.3.
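
For reference, a dynamic cluster setting like this is normally enabled through the cluster settings API. A minimal sketch (the thread does not show whether persistent or transient settings were used here):

# sketch only, not taken from the original report
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.same_shard.host": true
  }
}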

GET /_cat/indices
health status index                             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
…
yellow open   .security                         bpl6HgGdRcqmwZEqAAGBnw   1  11          1            0     17.3kb          2.8kb
…
GET /_cluster/allocation/explain
{
  "index": ".security",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "REPLICA_ADDED",
    "at": "2017-04-24T11:50:54.057Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "9-hnXjLEThy-c4TbfgqUTg",
      "node_name": "esindex-il03_warm_2",
      "transport_address": "172.27.130.206:9300",
      "node_attributes": {
        "firezone": "l",
        "box_type": "warm"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[.security][0], node[9-hnXjLEThy-c4TbfgqUTg], [R], s[STARTED], a[id=yutvMaxhTfCHI6lUgyvCDQ]]"
        }
      ]
    },
    {
      "node_id": "FSGDMHoIQwWtypVKi-Tf2A",
      "node_name": "esindex-il05_warm_2",
      "transport_address": "172.27.130.208:9301",
      "node_attributes": {
        "firezone": "l",
        "box_type": "warm"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[.security][0], node[FSGDMHoIQwWtypVKi-Tf2A], [R], s[STARTED], a[id=ffHbyCIaT0WcNpB25Xu17g]]"
        }
      ]
    },
    {
      "node_id": "G7FHRbvPSWKgrbLdwgP5Cw",
      "node_name": "esindex-il05_warm_1",
      "transport_address": "172.27.130.208:9300",
      "node_attributes": {
        "firezone": "l",
        "box_type": "warm"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated on host address [172.27.130.208], where it already exists on node [G7FHRbvPSWKgrbLdwgP5Cw]; set cluster setting [cluster.routing.allocation.same_shard.host] to false to allow multiple nodes on the same host to hold the same shard copies"
        }
      ]
    },
    {
      "node_id": "I0IBpNXYQj63CtqFzXEQng",
      "node_name": "esindex-il01_hot_2",
      "transport_address": "172.27.130.198:9301",
      "node_attributes": {
        "firezone": "l",
        "box_type": "hot"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated on host address [172.27.130.198], where it already exists on node [I0IBpNXYQj63CtqFzXEQng]; set cluster setting [cluster.routing.allocation.same_shard.host] to false to allow multiple nodes on the same host to hold the same shard copies"
        }
      ]
    },
    {
      "node_id": "Wht2Q3m4ShCG9fDb1_WPAA",
      "node_name": "esindex-il04_warm_2",
      "transport_address": "172.27.130.207:9301",
      "node_attributes": {
        "firezone": "l",
        "box_type": "warm"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated on host address [172.27.130.207], where it already exists on node [Wht2Q3m4ShCG9fDb1_WPAA]; set cluster setting [cluster.routing.allocation.same_shard.host] to false to allow multiple nodes on the same host to hold the same shard copies"
        }
      ]
    },
    {
      "node_id": "XZMimYlLThSbMY3L9sYFbA",
      "node_name": "esindex-il03_warm_1",
      "transport_address": "172.27.130.206:9301",
      "node_attributes": {
        "firezone": "l",
        "box_type": "warm"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated on host address [172.27.130.206], where it already exists on node [XZMimYlLThSbMY3L9sYFbA]; set cluster setting [cluster.routing.allocation.same_shard.host] to false to allow multiple nodes on the same host to hold the same shard copies"
        }
      ]
    },
    {
      "node_id": "XiYyR0iMR6Kmb49DYoeuTw",
      "node_name": "esindex-il00_hot_2",
      "transport_address": "172.27.130.197:9300",
      "node_attributes": {
        "firezone": "l",
        "box_type": "hot"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[.security][0], node[XiYyR0iMR6Kmb49DYoeuTw], [R], s[STARTED], a[id=0IuYqdzeRs-FXqHpzNDSjQ]]"
        }
      ]
    },
    {
      "node_id": "ZSk0zfN3SJeg09PDkK8E6Q",
      "node_name": "esindex-il01_hot_1",
      "transport_address": "172.27.130.198:9300",
      "node_attributes": {
        "firezone": "l",
        "box_type": "hot"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[.security][0], node[ZSk0zfN3SJeg09PDkK8E6Q], [R], s[STARTED], a[id=yDdnf-wkSlySVhwhm2JsNA]]"
        }
      ]
    },
    {
      "node_id": "ceG6PFJ6Qq2s2ia8dAB68w",
      "node_name": "esindex-il02_hot_2",
      "transport_address": "172.27.130.199:9301",
      "node_attributes": {
        "firezone": "l",
        "box_type": "hot"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated on host address [172.27.130.199], where it already exists on node [ceG6PFJ6Qq2s2ia8dAB68w]; set cluster setting [cluster.routing.allocation.same_shard.host] to false to allow multiple nodes on the same host to hold the same shard copies"
        }
      ]
    },
    {
      "node_id": "eE8aD3CwTJeEEg30KbfFDg",
      "node_name": "esindex-il00_hot_1",
      "transport_address": "172.27.130.197:9301",
      "node_attributes": {
        "firezone": "l",
        "box_type": "hot"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated on host address [172.27.130.197], where it already exists on node [eE8aD3CwTJeEEg30KbfFDg]; set cluster setting [cluster.routing.allocation.same_shard.host] to false to allow multiple nodes on the same host to hold the same shard copies"
        }
      ]
    },
    {
      "node_id": "uKD4Gy6ZRe-SolYYQNVMmw",
      "node_name": "esindex-il02_hot_1",
      "transport_address": "172.27.130.199:9300",
      "node_attributes": {
        "firezone": "l",
        "box_type": "hot"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[.security][0], node[uKD4Gy6ZRe-SolYYQNVMmw], [R], s[STARTED], a[id=Yln4Q0N6TrGD1FCuFvXgMg]]"
        }
      ]
    },
    {
      "node_id": "ui7i2lC3S7K_hvjF8_BMtw",
      "node_name": "esindex-il04_warm_1",
      "transport_address": "172.27.130.207:9300",
      "node_attributes": {
        "firezone": "l",
        "box_type": "warm"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[.security][0], node[ui7i2lC3S7K_hvjF8_BMtw], [P], s[STARTED], a[id=cNLniAwgS4mrZDFDPRxc3g]]"
        }
      ]
    }
  ]
}
@elasticmachine
Collaborator Author

Original comment by @abeyad:

I don't follow; this is expected behavior. If you set cluster.routing.allocation.same_shard.host to true, then even if there are two different nodes on the same host and a shard copy exists on one of those nodes, a replica copy won't be allocated to the other node, because they are on the same physical host. By default the setting is false, so if two nodes run on the same host, they can each be allocated a copy of the same shard.
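
For anyone checking their own cluster, the current and default values of this setting can be read back from the cluster settings API. A small sketch, not part of the original discussion:

# sketch: include_defaults also returns settings that were never set explicitly
GET /_cluster/settings?include_defaults=true&flat_settings=true

With flat_settings, the response lists cluster.routing.allocation.same_shard.host under defaults (or under persistent/transient if it was set explicitly).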

Does that make sense?

@elasticmachine
Collaborator Author

Original comment by @tvernum:

I think the issue (which we might not consider to be a problem) is that the security template sets

    "auto_expand_replicas" : "0-all",

So (I assume, without having actually verified it myself) there's an inherent incompatibility between x-pack security and cluster.routing.allocation.same_shard.host.

If you are running multiple nodes on the same host, and want to set cluster.routing.allocation.same_shard.host to true in order to ensure your replicas are physically distributed, then your security index will always be yellow.

I think the issue, if there is one, is that cluster.routing.allocation.same_shard.host is a blunt instrument: it applies to all indices, even those that are set to auto-expand to all nodes.
In this particular case it looks like the desired effect would be for it to apply to the user's own indices but be ignored by the security index.
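
To verify the template assumption on a running cluster, the setting can be read straight off the security index. A sketch, assuming the calling user is permitted to see the .security index settings:

# sketch: returns only the auto_expand_replicas setting of the .security index
GET /.security/_settings?filter_path=*.settings.index.auto_expand_replicas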

@elasticmachine
Collaborator Author

Original comment by @jaymode:

I think this ties into whether or not we should use auto-expand replicas for the security index (there is another issue open about that).

@elasticmachine
Collaborator Author

Original comment by @abeyad:

@tvernum thanks for the explanation, I follow what you're saying. Yes, given that auto_expand_replicas is used in this manner, setting cluster.routing.allocation.same_shard.host to true will always guarantee yellow health for the .security index.

@elasticmachine
Collaborator Author

Original comment by @robin13:

There are a lot of people running multiple nodes on one machine; all of them should be using cluster.routing.allocation.same_shard.host to avoid placing a primary and a replica of the same shard on the same machine, so for all of them it seems that using security will be a problem...

@elasticmachine elasticmachine added the :Security/Security and discuss labels Apr 25, 2018
@colings86 colings86 added the >bug label Apr 25, 2018
@DaveCTurner DaveCTurner added the :Distributed Coordination/Allocation label Jun 15, 2018
@elasticmachine
Collaborator Author

Pinging @elastic/es-distributed

@ywelsch
Contributor

ywelsch commented Jun 29, 2018

The root cause for this is a well-known issue: auto-expand-replicas and shard allocation settings such as filtering, awareness and same_shard.host do not play well together: #2869

@jaymode We discussed this during FixitFriday and wondered if we need to solve the larger issue (linked above) or whether we could just not auto-expand the .security index to all nodes. IIRC you mentioned performance concerns a year ago or so. Is this still a current concern, or could we change "auto_expand_replicas" : "0-all" to, for example, "auto_expand_replicas" : "0-1"? I've marked it as team-discuss for the security team (and would be happy to attend the security meeting when this is discussed).
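
In template terms the proposal boils down to the security index carrying a bounded range instead of 0-all; a rough sketch of the relevant settings fragment only, not the actual template:

  "settings": {
    "index": {
      "auto_expand_replicas": "0-1"
    }
  }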

@jaymode jaymode self-assigned this Jun 29, 2018
@jaymode
Member

jaymode commented Jun 29, 2018

@ywelsch I think we should just go ahead with 0-1, especially since it can be updated by the user if necessary.
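
For example, a user who still wants a copy of the security index on every node could restore the old behavior on their own cluster via the index settings API; a sketch, assuming .security resolves to the security index on that cluster:

# sketch: re-enables expansion to all nodes for the security index
PUT /.security/_settings
{
  "index": {
    "auto_expand_replicas": "0-all"
  }
}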

@ywelsch ywelsch added and then removed the "help wanted" and "adoptme" labels Jun 29, 2018
@ywelsch
Contributor

ywelsch commented Jun 29, 2018

OK, thanks @jaymode

jaymode added a commit to jaymode/elasticsearch that referenced this issue Aug 24, 2018
This change removes the use of 0-all for auto expand replicas for the
security index. The use of 0-all causes some unexpected behavior with
certain allocation settings. This change allows us to avoid these with
a default install. If necessary, the number of replicas can be tuned by
the user.

Closes elastic#29933
Closes elastic#29712
jaymode added a commit that referenced this issue Aug 24, 2018
This change removes the use of 0-all for auto expand replicas for the
security index. The use of 0-all causes some unexpected behavior with
certain allocation settings. This change allows us to avoid these with
a default install. If necessary, the number of replicas can be tuned by
the user.

Closes #29933
Closes #29712