Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OCF HA] Change master score computation & split-brain detection logic #929

Merged
merged 2 commits into from
Aug 22, 2016

Conversation

dmitrymex
Copy link
Contributor

The proposed two commits fix the following 2 bugs opened in Fuel downstream:
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894

For the explanation, please see the commit messages.

Right now we assign 1000 to the oldest nodes and 1 to others. That
creates a problem when Master restarts and no node is promoted until
that node starts back. In that case the returned node will have score
of 1, like all other slaves and Pacemaker will select to promote it
again. The node is clean empty and afterwards other slaves join to
it, wiping their data as well. As a result, we loose all the messages.

The new algorithm actually ranks nodes, not just selects the oldest
one. It also maintains the invariant that if node A started later
than node B, then node A score must be smaller than that of
node B. As a result, freshly started node has no chance of being
selected in preference to older node. If several nodes start
simultaneously, among them an older node might temporarily receive
lower score than a younger one, but that is neglectable.

Also remove any action on demote or demote notification - all of
these duplicate actions done in stop or stop notification. With these
removed, changing master on a running cluster does not affect RabbitMQ
cluster in any way - we just declare another node master and that is
it. It is important for the current change because master score might
change after initial cluster start up causing master migration from
one node to another.

This fix is a prerequsite for fix to Fuel bugs
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894
Previous split brain logic worked as follows: each slave checked
that it is connected to master. If check fails, slave restarts. The
ultimate flaw in that logic is that there is little guarantee that
master is alive at the moment. Moreover, if master dies, it is very
probable that during the next monitor check slaves will detect its
death and restart, causing complete RabbitMQ cluster downtime.

With the new approach master node checks that slaves are connected to
it and orders them to restart if they are not. The check is performed
after master node health check, meaning that at least that node
survives. Also, orders expire in one minute and freshly started node
ignores orders to restart for three minutes to give cluster time to
stabilize.

Also corrected the problem, when node starts and is already clustered.
In that case OCF script forgot to start the RabbitMQ app, causing
subsequent restart. Now we ensure that RabbitMQ app is running.

The two introduced attributes rabbit-start-phase-1-time and
rabbit-ordered-to-restart are made private. In order to allow master
to set node's order to restart, both ocf_update_private_attr and
ocf_get_private_attr signatures are expanded to allow passing
node name.

Finally, a bug is fixed in ocf_get_private_attr. Unlike crm_attribute,
attrd_updater returns empty string instead of "(null)", when an
attribute is not defined on needed node, but is defined on some other
node. Correspondingly changed code to expect empty string, not a
"(null)".

This fix is a fix for Fuel bugs
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894
@dmitrymex
Copy link
Contributor Author

dmitrymex commented Aug 22, 2016

That is a version of commits which passed through several iterations of review in downstream Fuel in https://review.openstack.org/#/c/324646/ and https://review.openstack.org/#/c/324647/.

@bogdando, @binarin : guys, what do you think?

@bogdando
Copy link

We had a long discussion on that topic, I checked the changes against network partitions (by Jepsen's Nemesis) and for all of the cases the cluster was auto recovered. Dmitry also prepared docs update (UML flow changes). So nothing to add from my side, LGTM. Thank you for a hard work, folks!

@michaelklishin michaelklishin self-assigned this Aug 22, 2016
@michaelklishin
Copy link
Member

Thank you!

@michaelklishin michaelklishin merged commit cc8a3c7 into rabbitmq:stable Aug 22, 2016
dmitrymex added a commit to dmitrymex/rabbitmq-website that referenced this pull request Aug 23, 2016
This makes documentation correspond to the current state of the
OCF script after PR
rabbitmq/rabbitmq-server#929
was merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants