Do not use auto_expand_replicas #3580
Conversation
The following is a simple reproduction script:
# Start two ES nodes and for clarity make sure each node stores data to a different location
# ES 5.2.2
./bin/elasticsearch -E path.data=./node1/data -E path.logs=./node1/logs
./bin/elasticsearch -E path.data=./node2/data -E path.logs=./node2/logs
# ES 2.4.4
# ./bin/elasticsearch -Dpath.data=./node1/data -Dpath.logs=./node1/logs
# ./bin/elasticsearch -Dpath.data=./node2/data -Dpath.logs=./node2/logs
After the nodes are up and the cluster is formed:
ES1=http://localhost:9200
ES2=http://localhost:9201
# curl $ES1/_cluster/health?pretty
# curl $ES2/_cluster/health?pretty
# Prepare index template using the "auto_expand_replicas"
curl -X POST $ES1/_template/auto_index -d '{
"template": "auto*",
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"auto_expand_replicas": "0-3"
}
}'
# Another index template using 1/1 shard/replica schema
curl -X POST $ES1/_template/fixed_index -d '{
"template": "fixed*",
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}'
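As an optional sanity check (not strictly required for the reproduction), the registered templates can be read back with the template API:
curl $ES1/_template/auto_index?pretty
curl $ES1/_template/fixed_index?pretty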
# Index two documents. Each goes to different index.
# - "auto" index uses 'auto_index_expand'
# - "fixed" index uses 1/1 schema
curl -X POST $ES1/auto/type -d '{ "name": "foo" }'
curl -X POST $ES1/fixed/type -d '{ "name": "foo" }'
# ... and let's make sure the data is flushed to disk
# so we can clearly track Lucene segment files on the FS
curl -X POST "$ES1/_flush?wait_if_ongoing=true&force=true" At this point we can check curl $ES1/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open auto 6OLcLCRhT8GM34aExedFkg 1 1 1 0 6.6kb 3.3kb
green open fixed miBt1Om1SsegDb67XNpmIg 1 1 1 0 6.6kb 3.3kb
And we can also check the data store for each node:
$ cd ./node1/data/nodes/0/indices
$ du -h
4.0K ./6OLcLCRhT8GM34aExedFkg/0/_state
16K ./6OLcLCRhT8GM34aExedFkg/0/index
8.0K ./6OLcLCRhT8GM34aExedFkg/0/translog
28K ./6OLcLCRhT8GM34aExedFkg/0
4.0K ./6OLcLCRhT8GM34aExedFkg/_state
32K ./6OLcLCRhT8GM34aExedFkg
4.0K ./miBt1Om1SsegDb67XNpmIg/0/_state
16K ./miBt1Om1SsegDb67XNpmIg/0/index
8.0K ./miBt1Om1SsegDb67XNpmIg/0/translog
28K ./miBt1Om1SsegDb67XNpmIg/0
4.0K ./miBt1Om1SsegDb67XNpmIg/_state
32K ./miBt1Om1SsegDb67XNpmIg
64K .
$ cd ./node2/data/nodes/0/indices
$ du -h
4.0K ./6OLcLCRhT8GM34aExedFkg/0/_state
16K ./6OLcLCRhT8GM34aExedFkg/0/index
8.0K ./6OLcLCRhT8GM34aExedFkg/0/translog
28K ./6OLcLCRhT8GM34aExedFkg/0
4.0K ./6OLcLCRhT8GM34aExedFkg/_state
32K ./6OLcLCRhT8GM34aExedFkg
4.0K ./miBt1Om1SsegDb67XNpmIg/0/_state
16K ./miBt1Om1SsegDb67XNpmIg/0/index
8.0K ./miBt1Om1SsegDb67XNpmIg/0/translog
28K ./miBt1Om1SsegDb67XNpmIg/0
4.0K ./miBt1Om1SsegDb67XNpmIg/_state
32K ./miBt1Om1SsegDb67XNpmIg
64K .
Both nodes hold data for both indices. Now, let's disable allocation:
curl -X PUT $ES1/_cluster/settings -d '{
"transient": {"cluster.routing.allocation.enable": "none"}
}'
# {"acknowledged":true,"persistent":{},"transient":{"cluster":{"routing":{"allocation":{"enable":"none"}}}}} Shutdown the node2. # The following will appear in log of node1:
# [CJxI8Hy] updating number_of_replicas to [0] for indices [auto]
# [CJxI8Hy] [auto/6OLcLCRhT8GM34aExedFkg] auto expanded replicas to [0]
Start node2 again:
./bin/elasticsearch -E path.data=./node2/data -E path.logs=./node2/logs
# its log will stop at:
# [mQvaM96] started
Now the data for the auto index on node2 is gone (only its _state directory remains):
$ du -h
4.0K ./6OLcLCRhT8GM34aExedFkg/_state
4.0K ./6OLcLCRhT8GM34aExedFkg
4.0K ./miBt1Om1SsegDb67XNpmIg/0/_state
16K ./miBt1Om1SsegDb67XNpmIg/0/index
8.0K ./miBt1Om1SsegDb67XNpmIg/0/translog
28K ./miBt1Om1SsegDb67XNpmIg/0
4.0K ./miBt1Om1SsegDb67XNpmIg/_state
32K ./miBt1Om1SsegDb67XNpmIg
36K .
Now, let's enable allocation:
curl -X PUT $ES1/_cluster/settings -d '{
"transient": {"cluster.routing.allocation.enable": "all"}
}'
# {"acknowledged":true,"persistent":{},"transient":{"cluster":{"routing":{"allocation":{"enable":"all"}}}}} And the data for index $ du -h
4.0K ./6OLcLCRhT8GM34aExedFkg/0/_state
16K ./6OLcLCRhT8GM34aExedFkg/0/index
8.0K ./6OLcLCRhT8GM34aExedFkg/0/translog
28K ./6OLcLCRhT8GM34aExedFkg/0
4.0K ./6OLcLCRhT8GM34aExedFkg/_state
32K ./6OLcLCRhT8GM34aExedFkg
4.0K ./miBt1Om1SsegDb67XNpmIg/0/_state
16K ./miBt1Om1SsegDb67XNpmIg/0/index
8.0K ./miBt1Om1SsegDb67XNpmIg/0/translog
28K ./miBt1Om1SsegDb67XNpmIg/0
4.0K ./miBt1Om1SsegDb67XNpmIg/_state
32K ./miBt1Om1SsegDb67XNpmIg
64K .
For more detailed investigation it is possible to turn on logging like this:
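The exact logger settings were not preserved in this description; for illustration only, loggers can be adjusted dynamically via the cluster settings API (ES 5.x keys shown; ES 2.x uses keys without the org.elasticsearch prefix):
curl -X PUT $ES1/_cluster/settings -d '{
  "transient": {
    "logger.org.elasticsearch.cluster.service": "DEBUG",
    "logger.org.elasticsearch.indices.recovery": "TRACE"
  }
}'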
This brings related low-level info into the ES logs.
@lukas-vlcek the intent with using auto_expand_replicas was so that customers would not have to manage the replica count themselves as their cluster size changes. Are we going to start recommending to customers that they should be updating the number of replicas based on their cluster size, and also providing commands/documentation on how they would accomplish this? That is my concern with us no longer using auto_expand_replicas.
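For reference (this command is not from the original discussion), changing the replica count of an existing index is a single call to the index settings API, e.g. against the reproduction cluster above:
curl -X PUT $ES1/fixed/_settings -d '{
  "index": { "number_of_replicas": 2 }
}'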
@ewolinetz I understand and welcome the effort to minimize user maintenance work, and I am not against it. But this ticket is not about ease of maintenance; it tries to describe the issues associated with using auto_expand_replicas. From what I understand, when the replica count is auto-expanded back up after a node rejoins, the shard data has to be fully recovered from the other nodes again (as shown in the reproduction above).
//cc @portante
@ewolinetz As for the second part of your question:
I think unless we can reliably automate this for users, providing tooling sounds like a good option to me. What is wrong with a cluster in yellow state compared to a single-node cluster in green state? With a single-node cluster aren't you at permanent risk of losing service availability anyway?
I think we just need to document the cases where having a yellow cluster is OK. It is really a problem for large deployments if nodes have to be unnecessarily initialized with Elasticsearch data.
@lukas-vlcek can you also open a PR for this against release-1.5?
@@ -7,8 +7,7 @@ script:
 index:
   number_of_shards: 1
-  number_of_replicas: 0
-  auto_expand_replicas: 0-2
+  number_of_replicas: 1
Let's make this 0 by default, and give the customers a setting to change it to the value they want.
I would think both of these numbers (shard and replica counts) should be templated to allow a variable to change their values.
Can I make the argument that this is something we can specify as part of the install?
Or if a customer states they want an ES cluster size of 2 or more, we would then change this value?
Or, if not specified, we would change this value based on the intended cluster size (0-2), but then give users the ability to override it if they so desire.
can we make the number_of_replicas configurable instead?
@lukas-vlcek It looks like this PR is to address https://bugzilla.redhat.com/show_bug.cgi?id=1430910. Could you also update the commit message to 'bug 1430910. MESSAGE HERE', which will add the commit info to the PR.
@ewolinetz If we are to make this configurable, we should consider how to incorporate these changes as part of a map. It is going to become exceedingly unwieldy to create a var like 'openshift_logging_ADDYOURCONFIGOPTIONHERE' every time we add these tweaks. It probably should be something like the sketch below.
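The concrete snippet did not survive in this thread; a hypothetical shape for such a map (the variable name and keys below are illustrative only) could be:
# hypothetical variable name and keys
openshift_logging_es_config:
  number_of_shards: 1
  number_of_replicas: 0
  recover_after_time: 5m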
@jcantrill I'd just be hesitant to increase the complexity simply to reduce verbosity.
@ewolinetz I don't see this as being that much more complex. Ansible already supports the notion of variable files, which are YAML, as opposed to inventory files. This seems like a more advanced configuration anyway, where we might expect users to have or be able to utilize variable files.
@lukas-vlcek can you update the template for the ES config to use two variables, one for primary shards and one for replicas? Something along the lines of the sketch below. The defaults should probably be 1 primary and 0 or 1 replicas (either should be fine; customers are currently used to having a yellow state if they only have one node).
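The inline example was lost here; judging from the variable names that appear later in this PR, the suggestion was presumably along these lines (a sketch, not the exact template):
# sketch only; variable names taken from the diff later in this thread
index:
  number_of_shards: "{{ es_number_of_shards | default('1') }}"
  number_of_replicas: "{{ es_number_of_replicas | default('0') }}"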
@ewolinetz I will do it on Monday (on PTO today 😎).
Force-pushed from 80381c1 to c4a615e.
@ewolinetz let me know if this is what you meant.
@ewolinetz Maybe I should add the used variables also to both
@lukas-vlcek, I think the default for shard replication should be 0.
Force-pushed from c4a615e to e4f8ffb.
@portante updated
@lukas-vlcek if you could just add to
We likely should also have ops instances of these as well, since customers may want to configure them differently. That would mean that we would then pass in the values to the templates in
Can you also update the README for the role just to explain what these vars do? Otherwise LGTM.
Force-pushed from e4f8ffb to b203e12.
@ewolinetz I think I need some help figuring out how to modify
Force-pushed from 072947d to b73bd39.
@ewolinetz I did another update, PTAL.
roles/openshift_logging/README.md
Outdated
@@ -72,6 +72,8 @@ When both `openshift_logging_install_logging` and `openshift_logging_upgrade_log
- `openshift_logging_es_recover_after_time`: The amount of time ES will wait before it tries to recover. Defaults to '5m'.
- `openshift_logging_es_storage_group`: The storage group used for ES. Defaults to '65534'.
- `openshift_logging_es_nodeselector`: A map of labels (e.g. {"node":"infra","region":"west"} to select the nodes where the pod will land.
- `openshift_logging_es_number_of_shards`: The number of shards for every new index created in ES. Defaults to '1'.
"The number of primary shards"
roles/openshift_logging/README.md
Outdated
@@ -88,6 +90,8 @@ same as above for their non-ops counterparts, but apply to the OPS cluster insta
- `openshift_logging_es_ops_pvc_prefix`: logging-es-ops
- `openshift_logging_es_ops_recover_after_time`: 5m
- `openshift_logging_es_ops_storage_group`: 65534
- `openshift_logging_es_ops_number_of_shards`: The number of shards for every new index created in ES. Defaults to '1'.
"The number of primary shards"
Force-pushed from 82526cb to 44a5dcc.
@ewolinetz updated
aos-ci-test
@@ -134,6 +136,8 @@
openshift_logging_es_recover_after_time: "{{openshift_logging_es_ops_recover_after_time}}"
Should this be openshift_logging_es_recover_after_time? Or should that be openshift_logging_es_ops_recover_after_time?
@portante It can make sense; I will look at this tomorrow.
@portante I think it is correct.
The {{openshift_logging_es_recover_after_time}} value is set to the ${RECOVER_AFTER_TIME} env variable in roles/openshift_logging/templates/es.j2, which is then used in the elasticsearch.yml.j2 template.
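Paraphrasing that wiring as a sketch (the actual template contents are not quoted in this thread, so the surrounding structure is assumed):
# roles/openshift_logging/templates/es.j2 (DeploymentConfig), roughly:
env:
- name: "RECOVER_AFTER_TIME"
  value: "{{ openshift_logging_es_recover_after_time }}"
# elasticsearch.yml.j2, roughly:
gateway:
  recover_after_time: ${RECOVER_AFTER_TIME}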
//cc @ewolinetz (Erik, see below please)
However, I do not understand why this value is set only for the Ops DeploymentConfig in the tasks/install_elasticsearch.yaml script. Please compare the vars section for Ops with the vars section for Non-Ops.
Why are the following four vars set only for the Ops DeploymentConfig?
es_node_quorum
es_recover_after_nodes
es_recover_expected_nodes
openshift_logging_es_recover_after_time
Apart from this, if there is any issue with openshift_logging_es_recover_after_time usage and configuration, then I suggest addressing it in a separate ticket/PR.
Per my IRC comment, the difference is that when running the 'ops' tasks, we update the variables to be the 'ops' variables. Technically, we should have written these tasks to run twice with a single set of tasks: once with non-ops and once with ops (see the sketch below). The four variables in question are found in vars/main.yaml.
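A minimal sketch of that idea, assuming a hypothetical task layout (the variable names and item structure below are illustrative, not the actual role):
# hypothetical: run the same ES install tasks once per cluster flavor
- include: install_elasticsearch.yaml
  vars:
    es_component: "{{ item.component }}"
    es_recover_after_time: "{{ item.recover_after_time }}"
  with_items:
  - { component: "es", recover_after_time: "{{ openshift_logging_es_recover_after_time }}" }
  - { component: "es-ops", recover_after_time: "{{ openshift_logging_es_ops_recover_after_time }}" }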
44a5dcc - State: success - All Test Contexts: aos-ci-jenkins/OS_unit_tests - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-2-unit-tests-1179/44a5dcc62e582724a79dfe83bebe7c30a4d89ab2.txt |
44a5dcc - State: error - All Test Contexts: aos-ci-jenkins/OS_3.4_containerized - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-1182/44a5dcc62e582724a79dfe83bebe7c30a4d89ab2.txt |
44a5dcc - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.5,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-1182/44a5dcc62e582724a79dfe83bebe7c30a4d89ab2.txt |
44a5dcc - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.4_NOT_containerized, aos-ci-jenkins/OS_3.4_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-1182/44a5dcc62e582724a79dfe83bebe7c30a4d89ab2.txt |
44a5dcc - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.5,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-1182/44a5dcc62e582724a79dfe83bebe7c30a4d89ab2.txt |
Force-pushed from 44a5dcc to a59661b.
rebased
Force-pushed from a59661b to 9460def.
aos-ci-test
9460def - State: success - All Test Contexts: aos-ci-jenkins/OS_unit_tests - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-2-unit-tests-1199/9460defa2c2764b279322efb88a65149416683f5.txt |
9460def - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.4_NOT_containerized, aos-ci-jenkins/OS_3.4_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-1202/9460defa2c2764b279322efb88a65149416683f5.txt |
9460def - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.4_containerized, aos-ci-jenkins/OS_3.4_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.4,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-1202/9460defa2c2764b279322efb88a65149416683f5.txt |
9460def - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_NOT_containerized,OSE_VER=3.5,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster,TargetBranch=master,nodes=openshift-ansible-slave-1202/9460defa2c2764b279322efb88a65149416683f5.txt |
9460def - State: success - All Test Contexts: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" - Logs: https://aos-ci.s3.amazonaws.com/openshift/openshift-ansible/jenkins-openshift-ansible-3-test-matrix-CONTAINERIZED=_containerized,OSE_VER=3.5,PYTHON=System-CPython-2.7,TOPOLOGY=openshift-cluster-containerized,TargetBranch=master,nodes=openshift-ansible-slave-1202/9460defa2c2764b279322efb88a65149416683f5.txt |
[merge]
Evaluated for openshift ansible merge up to 9460def
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/88/) (Base Commit: fdd0889)
-  number_of_replicas: 0
-  auto_expand_replicas: 0-2
+  number_of_shards: {{ es_number_of_shards | default ('1') }}
+  number_of_replicas: {{ es_number_of_replicas | default ('0') }}
not exactly sure why, but because of the way we use this template, the values have to be quoted. Please change this to be:
number_of_shards: "{{ es_number_of_shards | default ('1') }}"
number_of_replicas: "{{ es_number_of_replicas | default ('0') }}"
You'll have to submit a new PR since this one was merged.
Do not use the auto_expand_replicas feature. It is associated with unnecessary data recovery from other nodes.
See for details: elastic/elasticsearch#1873