From 5febcc6dbba307bf0d74fd535370d2fe26eca9d6 Mon Sep 17 00:00:00 2001
From: Tahlia Richardson <3069029+tahliar@users.noreply.github.com>
Date: Fri, 9 Sep 2022 17:22:33 +1000
Subject: [PATCH 1/7] Move SBD requirements to requirements section
jsc#DOCTEAM-94
---
xml/article_installation.xml | 104 ++++++++++-------------------------
1 file changed, 30 insertions(+), 74 deletions(-)
diff --git a/xml/article_installation.xml b/xml/article_installation.xml
index 91583eb6d..8ac39a3b9 100644
--- a/xml/article_installation.xml
+++ b/xml/article_installation.xml
@@ -99,12 +99,8 @@
Two servers with software as specified in .
- &sys-req-hw-nodes;
-
+
+ &sys-req-hw-nodes;
@@ -113,7 +109,28 @@
Node fencing/&stonith;
- &sys-req-hw-stonith;
+
+ &sys-req-hw-stonith;
+ To use SBD, the following requirements must be met:
+
+
+ The path to the shared storage device must be persistent and
+ consistent across all nodes in the cluster. Use stable device names
+ such as /dev/disk/by-id/dm-uuid-part1-mpath-abcedf12345.
+
+
+
The SBD device must not
use host-based RAID or LVM2, and must not reside on a DRBD* instance.
+
+
+
+
+ For details of how to set up shared storage, refer to the
+
+ &storage_guide; for &sls; &productnumber;.
+
+
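As a hedged illustration of the stable-device-name requirement above (not part of this patch; the multipath example name is only illustrative), a persistent path can be checked on each node:

# List persistent device links; the same by-id path must resolve on every cluster node
ls -l /dev/disk/by-id/
# For multipath storage, multipath -ll (from multipath-tools) shows the dm-uuid mapping in use
multipath -ll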
@@ -324,10 +341,6 @@
Using SBD as fencing mechanism
-
If you have shared storage, for example, a SAN (Storage Area Network),
@@ -336,69 +349,16 @@
and the external/sbd &stonith; resource agent.
-
- Requirements for SBD
During setup of the first node with crm cluster init, you
can decide whether to use SBD. If yes, you need to enter the path to the shared
storage device. By default, crm cluster init will automatically
create a small partition on the device to be used for SBD.
- To use SBD, the following requirements must be met:
-
-
- The path to the shared storage device must be persistent and
- consistent across all nodes in the cluster. Use stable device names
- such as /dev/disk/by-id/dm-uuid-part1-mpath-abcedf12345.
-
-
-
- The SBD device must not
- use host-based RAID, LVM2, nor reside on a DRBD* instance.
-
-
-
-
-
- For details of how to set up shared storage, refer to the
-
- &storage_guide; for &sls; &productnumber;.
-
-
-
-
-
- Enabling the softdog watchdog for SBD
- In &sls;, watchdog support in the kernel is enabled by default: It ships
+ In &sls;, watchdog support in the kernel is enabled by default: it ships
with several kernel modules that provide hardware-specific
watchdog drivers. The &hasi; uses the SBD daemon as the software component
that feeds the watchdog.
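A minimal sketch of the interactive step described above, assuming crmsh's -s/--sbd-device option (verify against your crmsh version; the cluster name and device path are placeholders):

# Initialize the first node and point SBD at the shared device in one go
crm cluster init --name mycluster -s /dev/disk/by-id/dm-uuid-part1-mpath-abcedf12345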
@@ -412,10 +372,11 @@
&important-softdog-limit;
+ Enabling the softdog watchdog for SBD
- Create a persistent, shared storage as described in .
+ Create a persistent, shared storage as described in . Don't need this step if it's a prereq
@@ -427,10 +388,6 @@
&prompt.root;systemctl restart systemd-modules-load
-
Test if the softdog module is loaded correctly:
&prompt.root;lsmod | grep dog
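In addition to lsmod, a quick hedged check (standard tooling, not from the patch) that a watchdog device node actually exists after loading softdog:

# A /dev/watchdog node should appear once a watchdog driver is loaded
ls -l /dev/watchdog*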
@@ -445,11 +402,10 @@ softdog 16384 1
Usually it boils down to "sbd -d DEV list" and "sbd -d DEV message &node2; test"
- We highly recommend to test the SBD fencing mechanism for proper function
- to prevent a split scenario. Such a test can be done by blocking the &corosync;
+ We highly recommend testing the SBD fencing mechanism for proper function
+ to prevent a split brain scenario. Such a test can be done by blocking the &corosync;
cluster communication.
-
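The removed comment above mentions the usual SBD checks; a hedged sketch of such a test run (device path and node name are placeholders, and the corosync port assumes the default 5405):

# Verify the SBD device and send a harmless test message to the peer's slot
sbd -d /dev/disk/by-id/dm-uuid-part1-mpath-abcedf12345 list
sbd -d /dev/disk/by-id/dm-uuid-part1-mpath-abcedf12345 message NODE2 test
# To simulate a split-brain case, corosync traffic can be blocked on one node (test setups only)
iptables -A INPUT -p udp --dport 5405 -j DROP

Only attempt the corosync block in a disposable test environment, since the expected outcome is that the blocked node gets fenced.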
From 98728c6af8b52303dd2f966025ebe9d2b9094936 Mon Sep 17 00:00:00 2001
From: Tahlia Richardson <3069029+tahliar@users.noreply.github.com>
Date: Tue, 13 Sep 2022 16:52:00 +1000
Subject: [PATCH 2/7] Move whole commands inside command tags
See Style Guide chapter 7.4.1 Commands
---
xml/article_installation.xml | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/xml/article_installation.xml b/xml/article_installation.xml
index 8ac39a3b9..5b1970b9d 100644
--- a/xml/article_installation.xml
+++ b/xml/article_installation.xml
@@ -313,7 +313,7 @@
Install it via command line using Zypper:
-&prompt.root;zypper install -t pattern ha_sles
+&prompt.root;zypper install -t pattern ha_sles
@@ -384,14 +384,14 @@
Enable the softdog watchdog:
- &prompt.root;echo softdog > /etc/modules-load.d/watchdog.conf
-&prompt.root;systemctl restart systemd-modules-load
+ &prompt.root;echo softdog > /etc/modules-load.d/watchdog.conf
+&prompt.root;systemctl restart systemd-modules-load
Test if the softdog module is loaded correctly:
- &prompt.root;lsmod | grep dog
-softdog 16384 1
+ &prompt.root;lsmod | grep dog
+softdog 16384 1
@@ -428,7 +428,7 @@ softdog 16384 1
Start the bootstrap script by executing:
- &prompt.root;crm cluster init --name CLUSTERNAME
+ &prompt.root;crm cluster init --name CLUSTERNAME
Replace the CLUSTERNAME
placeholder with a meaningful name, like the geographical location of your
cluster (for example, &cluster1;).
@@ -551,7 +551,7 @@ softdog 16384 1
Secure password
Replace the default password with a secure one as soon as possible:
- &prompt.root;passwd hacluster
+ &prompt.root;passwd hacluster
@@ -683,7 +683,7 @@ softdog 16384 1
Open a terminal and ping &subnetII;.1,
your virtual IP address:
- &prompt.root;ping &subnetII;.1
+ &prompt.root;ping &subnetII;.1
@@ -805,7 +805,7 @@ Active resources:
stonith-sbd (stonith:external/sbd): Started &node1;
-&prompt.root;&cmd.test.script; --fence-node &node2;
+&prompt.root;&cmd.test.script; --fence-node &node2;
==============================================
Testcase: Fence node &node2;
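As a hedged aside to the zypper command wrapped above (standard zypper options, not part of this patch), the installation can be verified afterwards:

# Confirm the High Availability pattern is installed
zypper search -i -t pattern ha_sles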
From 4f3063b930efcabe3d31ed2205f146cf26ed1b98 Mon Sep 17 00:00:00 2001
From: Tahlia Richardson <3069029+tahliar@users.noreply.github.com>
Date: Wed, 14 Sep 2022 17:36:35 +1000
Subject: [PATCH 3/7] Adjusted node fencing/stonith requirements section
---
xml/article_installation.xml | 30 +++++++++++++++++-------------
xml/phrases-decl.ent | 4 ++--
2 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/xml/article_installation.xml b/xml/article_installation.xml
index 5b1970b9d..8d09f03cc 100644
--- a/xml/article_installation.xml
+++ b/xml/article_installation.xml
@@ -110,9 +110,21 @@
Node fencing/&stonith;
- &sys-req-hw-stonith;
- To use SBD, the following requirements must be met:
+
+ A node fencing (&stonith;) device to avoid split brain scenarios. This can
+ be either a physical device (a power switch) or a mechanism like SBD
+ (&stonith; by disk) in combination with a watchdog.
+
+ This document describes using SBD for node fencing. To use SBD, the
+ following requirements must be met:
+
+
+ Shared storage. For information on setting up shared storage, see the
+
+ &storage_guide; for &sls;.
+
+ The path to the shared storage device must be persistent and
consistent across all nodes in the cluster. Use stable device names
@@ -126,9 +138,8 @@
- For details of how to set up shared storage, refer to the
-
- &storage_guide; for &sls; &productnumber;.
+ For more information on &stonith;, see .
+ For more information on SBD, see .
@@ -340,8 +351,7 @@
- Using SBD as fencing mechanism
-
+ Using SBD for node fencing
If you have shared storage, for example, a SAN (Storage Area Network),
you can use it to avoid split brain scenarios. To do so, configure SBD
@@ -373,12 +383,6 @@
Enabling the softdog watchdog for SBD
-
-
- Create a persistent, shared storage as described in . Don't need this step if it's a prereq
-
-
Enable the softdog watchdog:
diff --git a/xml/phrases-decl.ent b/xml/phrases-decl.ent
index 93662ebbf..e7de26771 100644
--- a/xml/phrases-decl.ent
+++ b/xml/phrases-decl.ent
@@ -384,7 +384,7 @@
connection). A fencing mechanism isolates the node in question
(usually by resetting or powering off the node). This is also called
&stonith; (Shoot the other node in the head). A node fencing
- mechanism can be either a physical device (a power switch) or a mechanism
+ mechanism can be either a physical device (a power switch) or a mechanism
like SBD (&stonith; by disk) in combination with a watchdog. Using SBD
requires shared storage.
" >
@@ -640,7 +640,7 @@
working even if all CPUs are stuck.
Before using the cluster in a production environment, we highly
- recommend to replace the softdog module with the
+ recommend replacing the softdog module with the
hardware module that best fits your hardware.
However, if no watchdog matches your hardware,
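To act on the recommendation above about preferring a hardware watchdog over softdog, a hedged way to inspect what is available (wdctl comes from util-linux; availability may vary):

# Show the currently active watchdog device and its driver, if any
wdctl
# List loaded modules that look like watchdog drivers
lsmod | grep -E 'wdt|dog'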
From f62f40b691bda54b3c1f1d6b7e976e74c1efe443 Mon Sep 17 00:00:00 2001
From: Tahlia Richardson <3069029+tahliar@users.noreply.github.com>
Date: Thu, 15 Sep 2022 17:41:03 +1000
Subject: [PATCH 4/7] Streamline SBD section in quickstart
Also removed the testing para as there's a whole testing section
---
xml/article_installation.xml | 42 ++++++++++--------------------------
1 file changed, 11 insertions(+), 31 deletions(-)
diff --git a/xml/article_installation.xml b/xml/article_installation.xml
index 8d09f03cc..5cab6ea31 100644
--- a/xml/article_installation.xml
+++ b/xml/article_installation.xml
@@ -120,7 +120,7 @@
- Shared storage. For information on setting up shared storage, see the
+ A shared storage device. For information on setting up shared storage, see the
&storage_guide; for &sls;.
@@ -353,23 +353,8 @@
Using SBD for node fencing
- If you have shared storage, for example, a SAN (Storage Area Network),
- you can use it to avoid split brain scenarios. To do so, configure SBD
- as node fencing mechanism. SBD uses watchdog support
- and the external/sbd &stonith; resource agent.
-
-
-
- During setup of the first node with crm cluster init, you
- can decide whether to use SBD. If yes, you need to enter the path to the shared
- storage device. By default, crm cluster init will automatically
- create a small partition on the device to be used for SBD.
-
-
-
-
- In &sls;, watchdog support in the kernel is enabled by default: it ships
- with several kernel modules that provide hardware-specific
+ Before you can configure SBD with the bootstrap script, you must enable a watchdog.
+ &sls; ships with several kernel modules that provide hardware-specific
watchdog drivers. The &hasi; uses the SBD daemon as the software component
that feeds the watchdog.
@@ -398,18 +383,6 @@
softdog 16384 1
-
- toms 2018-04-05: we need to add a bit more info here how you do
- the tests and what to do when it fails.
- However, this needs some further info from our developers. Some info can
- be found in ha_storage_protection.xml.
- Usually it boils down to "sbd -d DEV list" and "sbd -d DEV message &node2; test"
-
-
- We highly recommend testing the SBD fencing mechanism for proper function
- to prevent a split brain scenario. Such a test can be done by blocking the &corosync;
- cluster communication.
-
@@ -444,7 +417,7 @@ softdog 16384 1
communication, use the option (or ).
- The scripts checks for NTP configuration and a hardware watchdog service.
+ The script checks for NTP configuration and a hardware watchdog service.
It generates the public and private SSH keys used for SSH access and
&csync; synchronization and starts the respective services.
@@ -480,6 +453,7 @@ softdog 16384 1
Enter a persistent path to the partition of your block device that
you want to use for SBD, see .
The path must be consistent across all nodes in the cluster.
+ The script creates a small partition on the device to be used for SBD.
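The NTP check mentioned a few lines up can be reproduced by hand; a hedged sketch assuming chrony is the NTP implementation (adjust for your setup):

# The bootstrap script warns if time synchronization is not set up to start at boot
systemctl is-enabled chronyd
chronyc sources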
@@ -673,6 +647,12 @@ softdog 16384 1
to Lars Pinnes iptable rules
-->
+
Testing resource failover
From 8def1199e3e15f27b10efbd2ebfc3e6dabbd0f24 Mon Sep 17 00:00:00 2001
From: Tahlia Richardson <3069029+tahliar@users.noreply.github.com>
Date: Fri, 16 Sep 2022 16:37:11 +1000
Subject: [PATCH 5/7] Restructure testing chapter intro
---
xml/article_installation.xml | 52 +++++++++++++++++++-----------------
1 file changed, 28 insertions(+), 24 deletions(-)
diff --git a/xml/article_installation.xml b/xml/article_installation.xml
index 5cab6ea31..627151d3e 100644
--- a/xml/article_installation.xml
+++ b/xml/article_installation.xml
@@ -632,34 +632,39 @@ softdog 16384 1
Testing the cluster
- is a simple test to check if the
- cluster moves the virtual IP address to the other node if the node
- that currently runs the resource is set to standby.
+ The following tests can help you identify issues with the cluster setup.
+ However, a realistic test involves specific use cases and scenarios.
+ Before using the cluster in a production environment, test it thoroughly
+ according to your use cases.
- However, a realistic test involves specific use cases and scenarios,
- including testing of your fencing mechanism to avoid a split brain
- situation. If you have not set up your fencing mechanism correctly, the cluster
- will not work properly.
- Before using the cluster in a production environment, test it thoroughly
- according to your use cases or by using the &cmd.test.script;
- command.
-
-
-
+
+
+
+ The command sbd -d DEVICE_NAME list
+ lists all nodes visible to SBD (see the sketch after this list). For the setup
+ described in this document, the output should show both &node1;
+ and &node2;.
+
+
+
+
+ is a simple test
+ to check if the cluster moves the virtual IP address to the other node if
+ the node that currently runs the resource is set to standby.
+
+
+
+
+ simulates cluster
+ failures and reports the results.
+
+
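A hedged sketch of the first check in the list above (the device path is a placeholder; the node names in the output depend on your cluster):

# Both cluster nodes should appear in the SBD slot listing, one line per node
sbd -d /dev/disk/by-id/dm-uuid-part1-mpath-abcedf12345 list

If only one node shows up, revisit the device path on the missing node before relying on SBD for fencing.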
+ Testing resource failover
As a quick test, the following procedure checks on resource failovers:
- toms 2016-07-27: Fate#321073
- Tool for Standardize Testing of Basic Cluster Functionality
Testing resource failover
@@ -682,8 +687,7 @@ softdog 16384 1
Resources
,
check which node the virtual IP address (resource
- admin_addr) is running on.
+ admin_addr) is running on.
We assume the resource is running on &node1;.
From 837b91aa5cb5c85a6adde9723a9a008833e4e5b6 Mon Sep 17 00:00:00 2001
From: Tahlia Richardson <3069029+tahliar@users.noreply.github.com>
Date: Fri, 16 Sep 2022 17:56:19 +1000
Subject: [PATCH 6/7] Apply some minimalism
Also changed the link for installing extensions as the old one is no
longer valid in 15SP4
---
xml/article_installation.xml | 113 ++++++++++++-----------------------
1 file changed, 38 insertions(+), 75 deletions(-)
diff --git a/xml/article_installation.xml b/xml/article_installation.xml
index 627151d3e..491762920 100644
--- a/xml/article_installation.xml
+++ b/xml/article_installation.xml
@@ -48,10 +48,9 @@
- A floating, virtual IP address (&subnetII;.1) which
- allows clients to connect to the service no matter which physical node it
- is running on.
+ A floating, virtual IP address (&subnetII;.1)
+ that allows clients to connect to the service no matter which node it is running on.
+ This IP address is used to connect to the graphical management tool &hawk2;.
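As a hedged aside (standard iproute2, not part of the patch), whichever node currently holds the floating IP shows it on one of its interfaces:

# The virtual IP appears as a secondary address on the active node only
ip addr show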
@@ -66,13 +65,6 @@
-
- After setup of the cluster with the bootstrap scripts, we will monitor
- the cluster with the graphical &hawk2;. It is one of the cluster management
- tools included with &productnamereg;. As a basic test of whether failover of resources
- works, we will put one of the nodes into standby mode and check if the
- virtual IP address is migrated to the second node.
-
You can use the two-node cluster for testing purposes or as a minimal
cluster configuration that you can extend later on. Before using the
@@ -301,29 +293,21 @@
- Installing &productname;
-
- The packages for configuring and managing a cluster with the
- &hasi; are included in the &ha; installation
- pattern (named sles_ha on the command line).
- This pattern is only available after &productname; has been
- installed as an extension to &slsreg;.
-
+ Installing the &productname;
- For information on how to install
- extensions, see the
- &deploy; for &sls; &productnumber;.
+ The packages for configuring and managing a cluster
+ are included in the &ha; installation pattern.
+ This pattern is only available after the &productname; has been installed.
+ For information on how to install extensions, see the
+
+ &modulesquick;.
Installing the &ha; pattern
-
- If the pattern is not installed yet, proceed as follows:
-
- Install it via command line using Zypper:
+ Install the &ha; pattern via command line using Zypper:
&prompt.root;zypper install -t pattern ha_sles
@@ -342,7 +326,7 @@
- Register the machines at &scc;. Find more information in the
&upgrade_guide; for &sls; &productnumber;.
@@ -403,7 +387,7 @@ softdog 16384 1
- Start the bootstrap script by executing:
+ Start the bootstrap script:
&prompt.root;crm cluster init --name CLUSTERNAME
Replace the CLUSTERNAME
@@ -413,7 +397,7 @@ softdog 16384 1
as it simplifies the identification of a site.
- If you need multicast instead of unicast (the default) for your cluster
+ If you need to use multicast instead of unicast (the default) for your cluster
communication, use the option (or ).
@@ -444,23 +428,21 @@ softdog 16384 1
- Set up SBD as node fencing mechanism:
+ Set up SBD as the node fencing mechanism:
Confirm with y that you want to use SBD.
Enter a persistent path to the partition of your block device that
- you want to use for SBD, see .
+ you want to use for SBD.
The path must be consistent across all nodes in the cluster.
The script creates a small partition on the device to be used for SBD.
- Configure a virtual IP address for cluster administration with
- &hawk2;. (We will use this virtual IP resource for testing successful
- failover later on).
+ Configure a virtual IP address for cluster administration with &hawk2;:
Confirm with y that you want to configure a
@@ -499,11 +481,8 @@ softdog 16384 1
cookies are enabled.
- As URL, enter the IP address or host name of any cluster node running
- the &hawk; Web service. Alternatively, enter the address of the virtual
- IP address that you configured in
- of :
- https://HAWKSERVER:7630/
+ As URL, enter the virtual IP address that you configured with the bootstrap script:
+ https://192.168.2.1:7630/
Certificate warning
If a certificate warning appears when you try to access the URL for
@@ -513,16 +492,12 @@ softdog 16384 1
certificate. To proceed anyway, you can add an exception in the browser to bypass
the warning.
-
On the &hawk2; login screen, enter the
Username and Password of the
- user that has been created during the bootstrap procedure (user hacluster, password
linux).
@@ -534,9 +509,8 @@ softdog 16384 1
- Click Log In. After login, the &hawk2; Web interface
- shows the Status screen by default, displaying the current cluster
- status at a glance:
+ Click Log In. The &hawk2; Web interface
+ shows the Status screen by default:
Status of the one-node cluster in &hawk2;
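If the URL above does not load, a hedged first check on the cluster node (service and port as typically used by &hawk2;; names may vary by version):

# The Hawk2 web service should be running and listening on port 7630
systemctl status hawk
ss -tlnp | grep 7630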
@@ -553,37 +527,31 @@ softdog 16384 1
Adding the second node
- If you have a one-node cluster up and running, add the second cluster
- node with the crm cluster join bootstrap
- script, as described in .
+ Add a second node to the cluster with the crm cluster join
+ bootstrap script.
The script only needs access to an existing cluster node and
will complete the basic setup on the current machine automatically.
- For details, refer to the crm cluster join man page.
- The bootstrap scripts take care of changing the configuration specific to
- a two-node cluster, for example, SBD, &corosync;.
+ For more information, see the crm cluster join man page.
Adding the second node (&node2;) with
crm cluster join
- Log in as &rootuser; to the physical or virtual machine supposed to
- join the cluster.
+ Log in as &rootuser; to the physical or virtual machine you want to add to the cluster.
- Start the bootstrap script by executing:
+ Start the bootstrap script:
&prompt.root;crm cluster join
If NTP has not been configured to start at boot time, a message
- appears. The script also checks for a hardware watchdog device (which
- is important in case you want to configure SBD). You are warned if none
- is present.
+ appears. The script also checks for a hardware watchdog device.
+ You are warned if none is present.
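A minimal sketch of the join step described above, assuming crm cluster join's -c option for naming an existing cluster node (check your crmsh version; the host name is a placeholder):

# Run on the machine being added; it contacts the existing node and copies the configuration
crm cluster join -c NODE1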
@@ -596,15 +564,15 @@ softdog 16384 1
- If you have not already configured a passwordless SSH access between
+ If you have not already configured passwordless SSH access between
both machines, you will be prompted for the &rootuser; password
of the existing node.
After logging in to the specified node, the script will copy the
- &corosync; configuration, configure SSH, &csync;, and will
- bring the current machine online as new cluster node. Apart from that,
- it will start the service needed for &hawk2;.
@@ -615,8 +583,7 @@ softdog 16384 1
Check the cluster status in &hawk2;. Under Status > Nodes
- you should see two nodes with a green status (see
- ).
+ you should see two nodes with a green status:
@@ -676,25 +643,21 @@ softdog 16384 1
- Log in to your cluster as described in .
+ Log in to &hawk2;.
- In &hawk2;
- Status
- Resources
- ,
+ Under Status > Resources,
check which node the virtual IP address (resource
admin_addr) is running on.
- We assume the resource is running on &node1;.
+ This procedure assumes the resource is running on &node1;.
Put &node1; into
- Standby mode (see ).
+ Standby mode:
Node &node1; in standby mode
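The same failover check can also be driven from the command line; a hedged equivalent of the Hawk2 steps above (the node name is a placeholder):

# Put the node holding admin_addr into standby, watch the IP move, then bring the node back
crm node standby NODE1
crm status
crm node online NODE1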
From 6c7bd6cb99d239e941795bab39d5a1c754e3845c Mon Sep 17 00:00:00 2001
From: Tahlia Richardson <3069029+tahliar@users.noreply.github.com>
Date: Wed, 21 Sep 2022 10:53:23 +1000
Subject: [PATCH 7/7] split brain -> split-brain
Co-authored-by: Daria Vladykina
---
xml/article_installation.xml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xml/article_installation.xml b/xml/article_installation.xml
index 491762920..ca7d58703 100644
--- a/xml/article_installation.xml
+++ b/xml/article_installation.xml
@@ -103,7 +103,7 @@
Node fencing/&stonith;
- A node fencing (&stonith;) device to avoid split brain scenarios. This can
+ A node fencing (&stonith;) device to avoid split-brain scenarios. This can
be either a physical device (a power switch) or a mechanism like SBD
(&stonith; by disk) in combination with a watchdog.