Exponential backoff of failed allocation #24530
I think it's good to explore this. We can still keep the hard limit (and may increase it) - we built the feature for configuration mistakes - but slow down re-assignment. @clintongormley did you run into a specific issue that triggered this? |
@bleskes Just from user feedback |
Previously, a failed allocation was retried in a tight loop that filled up log files and caused the cluster to become unstable. We solved this problem by limiting the number of retries. However, that solution requires manual intervention after the environment has been adjusted. This PR aims to reduce user intervention by increasing the number of retries and adding exponential backoff delays between retries. Closes elastic#24530
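For illustration, here is a minimal sketch of the kind of capped exponential backoff described above. The class name, 5s base delay, one-hour cap, and doubling factor are assumptions made for this example, not values from the actual implementation:

```java
import java.util.concurrent.TimeUnit;

/**
 * Illustrative capped exponential backoff for allocation retries.
 * The base delay, cap, and doubling factor are assumptions for this sketch,
 * not the values used by Elasticsearch itself.
 */
public final class AllocationRetryBackoff {

    private static final long BASE_DELAY_MILLIS = TimeUnit.SECONDS.toMillis(5);
    private static final long MAX_DELAY_MILLIS = TimeUnit.HOURS.toMillis(1);

    /** Returns how long to wait before the next attempt, given the number of failures so far. */
    static long delayMillis(int failedAttempts) {
        // Double the delay on each failure: 5s, 10s, 20s, ... capped at one hour.
        long delay = BASE_DELAY_MILLIS << Math.min(failedAttempts, 20); // bound the shift to avoid overflow
        return Math.min(delay, MAX_DELAY_MILLIS);
    }

    public static void main(String[] args) {
        for (int failures = 0; failures < 12; failures++) {
            System.out.printf("after %d failure(s) -> wait %d ms%n", failures, delayMillis(failures));
        }
    }
}
```

Capping the delay (here at one hour, echoing the once-per-hour suggestion later in the thread) keeps the retry schedule bounded so that it neither floods the logs nor stops trying altogether.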
Pinging @elastic/es-distributed |
FWIW I think we should lose the limit and just keep trying, at sufficiently low frequency for it not to be disruptive (e.g. back off until once-per-hour) |
Hello, |
Thanks @dhwanilpatel. I've already started working on this. I've removed the misleading |
As far as I've been able to tell, the only case where we need indefinite retries is where the allocation repeatedly fails because the shard lock cannot be obtained, typically because the previous copy of the shard is still shutting down. The effect of the proposal here would be to keep retrying until the shard eventually shuts down, no matter how long that takes. I would prefer that we address the underlying causes of slow shard shutdowns, because this will bring the cluster back to health much more quickly and will result in fewer full-shard recoveries after a network wobble. |
Another reason for failing allocations that eventually succeed is transient memory pressure on the target node. A related point is that we typically only repeatedly try allocation on one or two nodes, because we only avoid the very last node on which the allocation failed. |
Hi, any update on this? Has this been picked up yet? |
Work continues on making it so we no longer need this feature, yes. |
I doubt that it would be possible to avoid every scenario that needs exponential backoff on retries. If we're bothering to retry shard allocation anyway, why not do it right and have a backoff system? By the way, using AWS's ES service in CN-Northeast, unassigned shards come up after N retries more frequently than seems reasonable - something like once a month. |
Seeing this on 7.5.2 - I guess the nodes were out of space but it stopped retrying? That seems like something that should be retried indefinitely rather than requiring a manual curl POST. |
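For reference, the manual retry mentioned here is the cluster reroute API with the `retry_failed` flag, which asks the master to retry allocations that have exhausted their retries. A minimal example, assuming a node reachable on localhost:9200:

```sh
# Retry shard allocations that previously hit the max_retries limit
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
```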
Pinging @elastic/es-distributed (Team:Distributed) |
Today when applying a new cluster state we block the cluster applier thread for up to 5s while waiting to acquire each shard lock. Failure to acquire the shard lock is treated as an allocation failure, so after 5 retries (by default) we give up on the allocation. The shard lock may be held by some other actor, typically the previous incarnation of the shard which is still shutting down, but it will eventually be released. Yet, 5 retries of 5s each is sometimes not enough time to wait. Instead it makes more sense to wait indefinitely. Moreover there's no reason why we have to create the `IndexShard` while applying the cluster state, because the shard remains in state `INITIALIZING`, and therefore unused, while it coordinates its own recovery. With this commit we try and acquire the shard lock during cluster state application, but do not wait if the lock is unavailable. Instead, we schedule a retry (also executed on the cluster state applier thread) and proceed with the rest of the cluster state application process. Relates elastic#24530
Today when applying a new cluster state we block the cluster applier thread for up to 5s while waiting to acquire each shard lock. Failure to acquire the shard lock is treated as an allocation failure, so after 5 retries (by default) we give up on the allocation. The shard lock may be held by some other actor, typically the previous incarnation of the shard which is still shutting down, but it will eventually be released. Yet, 5 retries of 5s each is sometimes not enough time to wait. Knowing that the shard lock will eventually be released, we can retry much more tenaciously. Moreover there's no reason why we have to create the `IndexShard` while applying the cluster state, because the shard remains in state `INITIALIZING`, and therefore unused, while it coordinates its own recovery. With this commit we try and acquire the shard lock during cluster state application, but do not wait if the lock is unavailable. Instead, we schedule a retry (also executed on the cluster state applier thread) and proceed with the rest of the cluster state application process. Relates #24530
Today when applying a new cluster state we block the cluster applier thread for up to 5s while waiting to acquire each shard lock. Failure to acquire the shard lock is treated as an allocation failure, so after 5 retries (by default) we give up on the allocation. The shard lock may be held by some other actor, typically the previous incarnation of the shard which is still shutting down, but it will eventually be released. Yet, 5 retries of 5s each is sometimes not enough time to wait. Knowing that the shard lock will eventually be released, we can retry much more tenaciously. Moreover there's no reason why we have to create the `IndexShard` while applying the cluster state, because the shard remains in state `INITIALIZING`, and therefore unused, while it coordinates its own recovery. With this commit we try and acquire the shard lock during cluster state application, but do not wait if the lock is unavailable. Instead, we schedule a retry (also executed on the cluster state applier thread) and proceed with the rest of the cluster state application process. Relates elastic#24530 Backport of elastic#94545 and elastic#94623 (and a little bit of elastic#94417) to 8.7
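To make the mechanism described in these commits concrete, here is a rough sketch of the "try the lock now, otherwise reschedule" pattern using plain Java concurrency primitives. The class and method names, the retry interval, and the single-threaded executor standing in for the applier thread are illustrative assumptions, not Elasticsearch's actual internals:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Sketch of "try to acquire the lock, and if it is unavailable schedule a retry"
 * instead of blocking the applier thread for a fixed window.
 */
class NonBlockingShardStarter {

    private final ReentrantLock shardLock = new ReentrantLock();
    private final ScheduledExecutorService applierThread = Executors.newSingleThreadScheduledExecutor();

    void tryCreateShard() {
        if (shardLock.tryLock()) {      // do not wait if the lock is unavailable
            try {
                startShard();           // the shard stays INITIALIZING while it coordinates its own recovery
            } finally {
                shardLock.unlock();
            }
        } else {
            // The previous incarnation of the shard is presumably still shutting down;
            // retry later on the applier thread instead of failing the allocation.
            applierThread.schedule(this::tryCreateShard, 5, TimeUnit.SECONDS);
        }
    }

    private void startShard() {
        System.out.println("shard created; recovery proceeds asynchronously");
    }
}
```

The point of the pattern is that cluster state application never blocks on the lock: either the shard is created immediately, or a retry is queued and the applier moves on to the rest of the state.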
Recent changes such as #95121 and #108145 have greatly diminished the failure rate for shard allocation due to unavailable shard locks, and other miscellaneous changes have made it less susceptible to memory pressure too. We'll continue to address other reasons for failed allocations, but as a general rule we'd rather make the recovery process resilient to failure at lower levels and avoid retrying the top-level allocation completely. Therefore I'm closing this. |
In #18467 we solved the problem where the failed allocation of a shard is retried in a tight loop, filling up the log file with exceptions. Now, after five failures, the allocation is no longer attempted until the user triggers it.
The downside of this approach is that it requires user intervention.
Would it be possible to add some kind of exponential backoff so that allocation attempts continue to be made, but less frequently? That way we still avoid flooding the logs, but if the situation resolves itself, the shard will be allocated automatically.