-
-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
restart_on_update considered harmful #288
Comments
Perhaps a better solution for server nodes would be to send |
I have not used it before, but this is one thought ~ https://github.com/websterclay/chef-dominodes |
I am comfortable with making the change to using SIGHUP. |
I think that having any kind of cluster mutex is a code smell. I would much rather leave it to the system administrators to upgrade the agent. What I'll probably end up doing is making sure that the configuration resource sends a reload (SIGHUP) and that updates to the installation don't actually change the running service. |
Please note consul itself is designed to handle and support these use cases. See examples below for a method which would prevent multiple nodes from restarting simultaneously. Second thing which is more a policy issue - IMHO at the end of the day restricting service restarts and associated behavior is NOT the responsibility of individual cookbooks (e.g. the consul-cookbook) - or if there's need, it should be configurable with a default to restart. It's up to the END user to manage server/cluster state safely vs. a specific cookbook to manage that state. PagerDuty uses etcd to orchestrate cluster changes. See the RFC for an example on where they added cluster state management support to chef. https://github.com/chef/chef-rfc/blob/master/rfc039-event-handler-dsl.md This way for systems that do NOT care about such a restart, this becomes a moot point. And it continues to work for those of us who DO have cluster state tackled already ;) |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
This cookbook restarts Consul if the configuration changes. That seems reasonable at first.
But Consul relies heavily on maintaining quorum; if quorum is not maintained, the Consul cluster goes into an outage. All that's needed is for
ceil(N/2)
servers to restart at the same time. Allowing Chef to restart Consul servers dramatically increases the likelihood of encountering this scenario. I believe that I have experienced this several times now. Here's what happens.At that point, my only option is to rebuild the Consul cluster. Not good.
I recommend that this cookbook make one of the following changes:
false
and come with a stern warning.Isn't it strange that Consul doesn't have a built-in way of safely restarting individual servers, or even the entire cluster?
The text was updated successfully, but these errors were encountered: