[OCF HA] Change master score computation & split-brain detection logic #929

dmitrymex · 2016-08-22T11:35:30Z

The proposed two commits fix the following 2 bugs opened in Fuel downstream:
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894

For the explanation, please see the commit messages.

Right now we assign 1000 to the oldest nodes and 1 to others. That creates a problem when Master restarts and no node is promoted until that node starts back. In that case the returned node will have score of 1, like all other slaves and Pacemaker will select to promote it again. The node is clean empty and afterwards other slaves join to it, wiping their data as well. As a result, we loose all the messages. The new algorithm actually ranks nodes, not just selects the oldest one. It also maintains the invariant that if node A started later than node B, then node A score must be smaller than that of node B. As a result, freshly started node has no chance of being selected in preference to older node. If several nodes start simultaneously, among them an older node might temporarily receive lower score than a younger one, but that is neglectable. Also remove any action on demote or demote notification - all of these duplicate actions done in stop or stop notification. With these removed, changing master on a running cluster does not affect RabbitMQ cluster in any way - we just declare another node master and that is it. It is important for the current change because master score might change after initial cluster start up causing master migration from one node to another. This fix is a prerequsite for fix to Fuel bugs https://bugs.launchpad.net/fuel/+bug/1559136 https://bugs.launchpad.net/mos/+bug/1561894

Previous split brain logic worked as follows: each slave checked that it is connected to master. If check fails, slave restarts. The ultimate flaw in that logic is that there is little guarantee that master is alive at the moment. Moreover, if master dies, it is very probable that during the next monitor check slaves will detect its death and restart, causing complete RabbitMQ cluster downtime. With the new approach master node checks that slaves are connected to it and orders them to restart if they are not. The check is performed after master node health check, meaning that at least that node survives. Also, orders expire in one minute and freshly started node ignores orders to restart for three minutes to give cluster time to stabilize. Also corrected the problem, when node starts and is already clustered. In that case OCF script forgot to start the RabbitMQ app, causing subsequent restart. Now we ensure that RabbitMQ app is running. The two introduced attributes rabbit-start-phase-1-time and rabbit-ordered-to-restart are made private. In order to allow master to set node's order to restart, both ocf_update_private_attr and ocf_get_private_attr signatures are expanded to allow passing node name. Finally, a bug is fixed in ocf_get_private_attr. Unlike crm_attribute, attrd_updater returns empty string instead of "(null)", when an attribute is not defined on needed node, but is defined on some other node. Correspondingly changed code to expect empty string, not a "(null)". This fix is a fix for Fuel bugs https://bugs.launchpad.net/fuel/+bug/1559136 https://bugs.launchpad.net/mos/+bug/1561894

dmitrymex · 2016-08-22T11:37:47Z

That is a version of commits which passed through several iterations of review in downstream Fuel in https://review.openstack.org/#/c/324646/ and https://review.openstack.org/#/c/324647/.

@bogdando, @binarin : guys, what do you think?

bogdando · 2016-08-22T12:55:39Z

We had a long discussion on that topic, I checked the changes against network partitions (by Jepsen's Nemesis) and for all of the cases the cluster was auto recovered. Dmitry also prepared docs update (UML flow changes). So nothing to add from my side, LGTM. Thank you for a hard work, folks!

michaelklishin · 2016-08-22T14:31:36Z

Thank you!

This makes documentation correspond to the current state of the OCF script after PR rabbitmq/rabbitmq-server#929 was merged.

dmitrymex added 2 commits August 22, 2016 14:27

michaelklishin self-assigned this Aug 22, 2016

michaelklishin merged commit cc8a3c7 into rabbitmq:stable Aug 22, 2016

dmitrymex added a commit to dmitrymex/rabbitmq-website that referenced this pull request Aug 23, 2016

Update images and notes in Pacemaker guide

fddf13a

This makes documentation correspond to the current state of the OCF script after PR rabbitmq/rabbitmq-server#929 was merged.

dmitrymex mentioned this pull request Aug 23, 2016

Update images and notes in Pacemaker guide rabbitmq/rabbitmq-website#247

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[OCF HA] Change master score computation & split-brain detection logic #929

[OCF HA] Change master score computation & split-brain detection logic #929

dmitrymex commented Aug 22, 2016

dmitrymex commented Aug 22, 2016 •

edited

Loading

bogdando commented Aug 22, 2016

michaelklishin commented Aug 22, 2016

[OCF HA] Change master score computation & split-brain detection logic #929

[OCF HA] Change master score computation & split-brain detection logic #929

Conversation

dmitrymex commented Aug 22, 2016

dmitrymex commented Aug 22, 2016 • edited Loading

bogdando commented Aug 22, 2016

michaelklishin commented Aug 22, 2016

dmitrymex commented Aug 22, 2016 •

edited

Loading