Make REST API check stricter #882
Changes from all commits
```diff
@@ -128,29 +128,40 @@ def create(self):
         return elasticsearch.Elasticsearch(hosts=self.hosts, ssl_context=self.ssl_context, **self.client_options)
 
 
-def wait_for_rest_layer(es, max_attempts=20):
+def wait_for_rest_layer(es, max_attempts=40):
     """
     Waits for ``max_attempts`` until Elasticsearch's REST API is available.
 
     :param es: Elasticsearch client to use for connecting.
     :param max_attempts: The maximum number of attempts to check whether the REST API is available.
-    :return: True iff Elasticsearch is available.
+    :return: True iff Elasticsearch's REST API is available.
     """
+    # assume that at least the hosts that we expect to contact should be available. Note that this is not 100%
+    # bullet-proof as a cluster could have e.g. dedicated masters which are not contained in our list of target hosts
+    # but this is still better than just checking for any random node's REST API being reachable.
+    expected_node_count = len(es.transport.hosts)
+    logger = logging.getLogger(__name__)
     for attempt in range(max_attempts):
+        logger.debug("REST API is available after %s attempts", attempt)
         import elasticsearch
         try:
-            es.info()
+            # see also WaitForHttpResource in Elasticsearch tests. Contrary to the ES tests we consider the API also
+            # available when the cluster status is RED (as long as all required nodes are present)
+            es.cluster.health(wait_for_nodes=">={}".format(expected_node_count))
+            logger.info("REST API is available for >= [%s] nodes after [%s] attempts.", expected_node_count, attempt)
             return True
         except elasticsearch.ConnectionError as e:
             if "SSL: UNKNOWN_PROTOCOL" in str(e):
                 raise exceptions.SystemSetupError("Could not connect to cluster via https. Is this an https endpoint?", e)
             else:
-                time.sleep(1)
+                logger.debug("Got connection error on attempt [%s]. Sleeping...", attempt)
+                time.sleep(3)
         except elasticsearch.TransportError as e:
-            if e.status_code == 503:
-                time.sleep(1)
-            elif e.status_code == 401:
-                time.sleep(1)
+            # cluster block, x-pack not initialized yet, our wait condition is not reached
+            if e.status_code in (503, 401, 408):
+                logger.debug("Got status code [%s] on attempt [%s]. Sleeping...", e.status_code, attempt)
+                time.sleep(3)
             else:
+                logger.warning("Got unexpected status code [%s] on attempt [%s].", e.status_code, attempt)
                 raise e
     return False
```

Review thread (on the longer sleep interval between retries):

- With the default …
- I'm fine increasing this, although I'd opt for more retries instead of a larger sleep period.
- More retries is fine by me too.
- I've increased the number of retries to 40 now in cbf6dec.
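For context on the discussion above: the change swaps a bare `es.info()` ping for a cluster-health call that also waits until the expected number of nodes has joined, and it roughly trades 20 one-second sleeps (about 20 seconds of waiting) for up to 40 three-second sleeps (about 120 seconds, not counting time spent inside each health call). Below is a minimal standalone sketch of the same idea, assuming the elasticsearch-py client of that era; the function name, host address, and retry/sleep parameters are illustrative only, not Rally's actual configuration.

```python
import logging
import time

import elasticsearch


def wait_for_all_nodes(es, expected_node_count, max_attempts=40, sleep_seconds=3):
    """Return True once the REST layer reports at least ``expected_node_count`` joined nodes."""
    logger = logging.getLogger(__name__)
    for attempt in range(max_attempts):
        try:
            # Succeeds even when cluster health is RED, as long as enough nodes are present.
            es.cluster.health(wait_for_nodes=">={}".format(expected_node_count))
            logger.info("REST API is available for >= [%s] nodes after [%s] attempts.", expected_node_count, attempt)
            return True
        except elasticsearch.ConnectionError:
            # Node not reachable yet (e.g. HTTP port not open); back off and retry.
            time.sleep(sleep_seconds)
        except elasticsearch.TransportError as e:
            # 503: cluster block, 401: security not initialized yet, 408: wait condition not met in time.
            if e.status_code in (503, 401, 408):
                time.sleep(sleep_seconds)
            else:
                raise
    return False


# Illustrative usage: the worst case is roughly max_attempts * sleep_seconds = 40 * 3 = 120 seconds of sleeping.
# es = elasticsearch.Elasticsearch(hosts=["127.0.0.1:9200"])
# print(wait_for_all_nodes(es, expected_node_count=len(es.transport.hosts)))
```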
Review thread (on the new comment explaining the `expected_node_count` assumption):

- This is a very useful comment to have. Question: would it be dangerous to trigger a sniff of the eligible HTTP hosts (e.g. via a helper method in our EsClientFactory invoking `elasticsearch.Transport#sniff_hosts()`)? I was thinking that if we explicitly ask for a fresh list of hosts before the check, then no unavailable hosts should be reachable. Then the same call could be invoked before the load driver starts. The caveat with this approach is that it could potentially override the list explicitly provided by `--target-hosts`. Thoughts?
- I'd try not to build any smartness into this. Without reading all of the involved code, I don't think we can reason about which nodes will be returned by the `sniff_hosts` call on cluster bootstrap (let's assume that not all nodes are up yet, or that not all of them have opened the HTTP port). I was even considering exposing an explicit command line parameter but thought that this would be a good compromise.
- Agreed. Let's keep things simple here. For the record, I had a look at what the elasticsearch-py client does when sniffing is invoked: it collects a list of eligible HTTP hosts via `/_nodes/_all/http`.
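To make the sniffing discussion concrete: what `Transport#sniff_hosts()` does boils down to querying the nodes-info API for HTTP publish addresses. A rough sketch of an equivalent manual query with elasticsearch-py follows; the host address is a placeholder, and the commented-out `sniff_on_start` line shows the client option the reviewers decided not to rely on because it could override `--target-hosts`.

```python
import elasticsearch

# Placeholder address; in Rally this would come from --target-hosts.
es = elasticsearch.Elasticsearch(hosts=["127.0.0.1:9200"])

# Equivalent of GET /_nodes/_all/http: list every node that exposes an HTTP endpoint.
nodes_info = es.nodes.info(node_id="_all", metric="http")
for node_id, node in nodes_info["nodes"].items():
    # publish_address is the address clients should use to reach this node.
    print(node_id, node.get("name"), node["http"]["publish_address"])

# Automatic sniffing would refresh the client's host list from the same API,
# potentially replacing the explicitly configured target hosts:
# es = elasticsearch.Elasticsearch(hosts=["127.0.0.1:9200"], sniff_on_start=True)
```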