
Commit 2b3ee71

documentation improvements from code review, changelog
1 parent 9df4858 commit 2b3ee71

3 files changed

+96
-57
lines changed

.changelog/11572.txt

+7
@@ -0,0 +1,7 @@
1+
```release-note:improvement
2+
raft: The default raft protocol version is now 3.
3+
```
4+
5+
```release-note:deprecation
6+
Raft protocol version 2 is deprecated and will be removed in Nomad 1.4.0.
7+
```

website/content/docs/upgrade/index.mdx

+80
@@ -153,3 +153,83 @@ differences may require specific steps.
153153
[node-status]: /docs/commands/node/status
154154
[server-members]: /docs/commands/server/members
155155
[upgrade-specific]: /docs/upgrade/upgrade-specific
156+
157+
## Upgrading to Raft Protocol 3
158+
159+
This section provides details on upgrading to Raft Protocol 3. Raft
160+
protocol version 3 requires all servers to run Nomad 0.8.0 or newer.
161+
Raft protocol version 2 will be removed in
162+
Nomad 1.4.0.
163+
164+
To see the version of the Raft protocol in use on each server, use the
165+
`nomad operator raft list-peers` command.
166+
167+
Note that the format of `peers.json` used for outage recovery is
168+
different when running with the latest Raft protocol. See [Manual
169+
Recovery Using
170+
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
171+
for a description of the required format.
172+
173+
When using Raft protocol version 3, servers are identified by their
174+
`node-id` instead of their IP address when Nomad makes changes to its
175+
internal Raft quorum configuration. This means that once a cluster has
176+
been upgraded with servers all running Raft protocol version 3, it
177+
will no longer allow servers running any older Raft protocol versions
178+
to be added.
179+
180+
### Upgrading a Production Cluster to Raft Version 3
181+
182+
For production raft clusters with 3 or more members, the easiest way
183+
to upgrade servers is to have each server leave the cluster, upgrade
184+
its [`raft_protocol`] version in the `server` stanza, and then add it
185+
back. Make sure the new server joins successfully and that the cluster
186+
is stable before rolling the upgrade forward to the next server. It's
187+
also possible to stand up a new set of servers, and then slowly stand
188+
down each of the older servers in a similar fashion.
189+
190+
For in-place raft protocol upgrades, perform the following for each
191+
server, leaving the leader until last to reduce the chance of leader
192+
elections that will slow down the process:
193+
194+
* Stop the server
195+
* Run `nomad server force-leave $server_name`
196+
* Update the `raft_protocol` in the server's configuration file to 3.
197+
* Restart the server
198+
* Run `nomad operator raft list-peers` to verify that the `raft_vsn`
199+
for the server is now 3.
200+
* On the server, run `nomad agent-info` and check that the
201+
`last_log_index` is of a similar value to the other servers. This
202+
step ensures that raft is healthy and changes are replicating to the
203+
new server.
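
The last step above leaves "similar" open to interpretation. As a minimal sketch, assuming you have already read the `last_log_index` value from `nomad agent-info` on two servers, a small helper can make the comparison explicit. The function name `log_index_close` and the default tolerance of 100 entries are illustrative assumptions, not part of Nomad; choose a tolerance that suits your cluster's write rate.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Nomad): succeeds when two last_log_index
# values differ by no more than a tolerance (default 100, an assumed value).
log_index_close() {
  local a=$1 b=$2 tolerance=${3:-100}
  local diff=$(( a - b ))
  # Take the absolute value of the difference.
  if (( diff < 0 )); then
    diff=$(( -diff ))
  fi
  (( diff <= tolerance ))
}

# Example with made-up index values copied by hand from two servers.
if log_index_close 48213 48209; then
  echo "indexes are close; replication looks healthy"
else
  echo "indexes diverge; wait before upgrading the next server"
fi
```

A failing check is only a prompt to wait and look again; a busy cluster can show a brief gap even when replication is working.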
204+
205+
### Upgrading a Single Server Cluster to Raft Version 3
206+
207+
If you are running a single Nomad server, restarting it in-place will
208+
result in that server not being able to elect itself as a leader. To
209+
avoid this, create a new [`peers.json`][peers-json] file before
210+
restarting the server with the new configuration. If you have `jq`
211+
installed, you can run the following script on the server's host to
212+
write the correct `peers.json` file:
213+
214+
```
215+
#!/usr/bin/env bash
216+
217+
NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
218+
NOMAD_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
219+
NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")
220+
221+
cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
222+
[
223+
{
224+
"id": "$NODE_ID",
225+
"address": "$NOMAD_ADDR",
226+
"non_voter": false
227+
}
228+
]
229+
EOF
230+
```
231+
232+
After running this script, update the `raft_protocol` in the server's
233+
configuration to 3 and restart the server.
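
Before restarting, it can help to confirm that the configuration file really sets the new version. The following is a minimal sketch, assuming an HCL configuration file in which `raft_protocol` appears on its own line. The function name `sets_raft_protocol_3` and the demo file are hypothetical, and a JSON configuration would need a different check.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when the given HCL file contains a line
# of the form `raft_protocol = 3`. It does not parse HCL, so it cannot
# tell which stanza the setting belongs to.
sets_raft_protocol_3() {
  grep -Eq '^[[:space:]]*raft_protocol[[:space:]]*=[[:space:]]*3[[:space:]]*$' "$1"
}

# Demo against a throwaway file standing in for the real server config.
demo_config=$(mktemp)
cat <<'EOF' > "$demo_config"
server {
  enabled       = true
  raft_protocol = 3
}
EOF

if sets_raft_protocol_3 "$demo_config"; then
  echo "raft_protocol is set to 3"
else
  echo "raft_protocol is not set to 3"
fi
rm -f "$demo_config"
```

To use it for real, point the function at your own server configuration file instead of the demo file.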
234+
235+
[peers-json]: https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson

website/content/docs/upgrade/upgrade-specific.mdx

+9-57
@@ -15,12 +15,15 @@ used to document those details separately from the standard upgrade flow.
1515

1616
## Nomad 1.3.0
1717

18-
#### Default Raft Protocol Version
18+
#### Raft Protocol Version 2 Deprecation
1919

20-
In Nomad 1.3.0, the default raft protocol version has been updated
21-
to 3. If the [`raft_protocol_version`] is not explicitly set,
22-
upgrading a server will automatically upgrade that server's raft
23-
protocol. See the [Upgrading to Raft Protocol 3] guide below.
20+
Raft protocol version 2 will be removed in the next major release,
21+
Nomad 1.4.0.
22+
23+
In Nomad 1.3.0, the default raft protocol version has been updated to
24+
3. If the [`raft_protocol`] is not explicitly set, upgrading a
25+
server will automatically upgrade that server's raft protocol. See the
26+
[Upgrading to Raft Protocol 3] guide.
2427

2528
## Nomad 1.2.2
2629

@@ -973,57 +976,6 @@ In order to enable all
973976
servers in a Nomad cluster must be running with Raft protocol version 3 or
974977
later.
975978

976-
#### Upgrading to Raft Protocol 3
977-
978-
This section provides details on upgrading to Raft Protocol 3 in Nomad 0.8 and
979-
higher. Raft protocol version 3 requires Nomad running 0.8.0 or newer on all
980-
servers in order to work. See [Raft Protocol Version
981-
Compatibility](/docs/upgrade/upgrade-specific#raft-protocol-version-compatibility)
982-
for more details. Also the format of `peers.json` used for outage recovery is
983-
different when running with the latest Raft protocol. See [Manual Recovery Using
984-
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
985-
for a description of the required format.
986-
987-
Please note that the Raft protocol is different from Nomad's internal protocol
988-
as shown in commands like `nomad server members`. To see the version of the Raft
989-
protocol in use on each server, use the `nomad operator raft list-peers`
990-
command.
991-
992-
When using Raft protocol version 3, servers are identified by their `node-id`
993-
instead of their IP address when Nomad makes changes to its internal Raft quorum
994-
configuration. This means that once a cluster has been upgraded with servers all
995-
running Raft protocol version 3, it will no longer allow servers running any
996-
older Raft protocol versions to be added.
997-
998-
~> **Warning:** If you are running a single Nomad server, restarting it
999-
in-place will result in that server not being able to elect itself as
1000-
a leader. To avoid this, either set the Raft protocol back to 2, or
1001-
use [Manual Recovery Using
1002-
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
1003-
to map the server to its node ID in the Raft quorum configuration.
1004-
1005-
The easiest way to upgrade servers is to have each server leave the cluster,
1006-
upgrade its [`raft_protocol`] version in the `server` stanza, and then add it
1007-
back. Make sure the new server joins successfully and that the cluster is stable
1008-
before rolling the upgrade forward to the next server. It's also possible to
1009-
stand up a new set of servers, and then slowly stand down each of the older
1010-
servers in a similar fashion.
1011-
1012-
For in-place raft protocol upgrades, perform the following for each
1013-
server, leaving the leader until last to reduce the chance of leader
1014-
elections that will slow down the process:
1015-
1016-
* Stop the server
1017-
* Run `nomad server force-leave $server_name`
1018-
* Update the `raft_protocol` in the server's configuration file to 3.
1019-
* Restart the server
1020-
* Run `nomad operator raft list-peers` to verify that the `raft_vsn`
1021-
for the server is now 3.
1022-
* On the server, run `nomad agent-info` and check that the
1023-
`last_log_index` is of a similar value to the other servers. This
1024-
step ensures that raft is healthy and changes are replicating to the
1025-
new server.
1026-
1027979
### Node Draining Improvements
1028980

1029981
Node draining via the [`node drain`][drain-cli] command or the [drain
@@ -1243,4 +1195,4 @@ deleted and then Nomad 0.3.0 can be launched.
12431195
[cap_add_exec]: /docs/drivers/exec#cap_add
12441196
[cap_drop_exec]: /docs/drivers/exec#cap_drop
12451197
[`log_file`]: /docs/configuration#log_file
1246-
[Upgrading to Raft Protocol 3]: /docs/upgrade/upgrade-specific#upgrading-to-raft-protocol-3
1198+
[Upgrading to Raft Protocol 3]: /docs/upgrade#upgrading-to-raft-protocol-3
