Add troubleshooting of node maintenance mode #619

Open
wants to merge 3 commits into base: main
6 changes: 6 additions & 0 deletions docs/advanced/storageclass.md
@@ -44,6 +44,12 @@ The number of replicas created for each volume in Longhorn. Defaults to `3`.

![](/img/v1.2/storageclass/create_storageclasses_replicas.png)

:::info important

When the value is `1`, volumes created from this `StorageClass` have only one replica. Such volumes may block [Node Maintenance](../host/host.md#node-maintenance). For details, see [Single-Replica Volumes](../troubleshooting/host.md#single-replica-volumes).

:::
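
For reference, a `StorageClass` of this kind can also be created from the CLI. The following is a minimal sketch; the object name and parameter values are illustrative and not part of this change:

```
# Creates a Longhorn-backed StorageClass whose volumes keep a single replica.
# Single-replica volumes may block Node Maintenance (see the linked section above).
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: single-replica-example   # illustrative name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "30"
EOF
```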

#### Stale Replica Timeout

Determines when Longhorn cleans up an error replica after the replica's status becomes ERROR. The unit is minutes. Defaults to `30` minutes in Harvester.
32 changes: 30 additions & 2 deletions docs/host/host.md
@@ -21,10 +21,36 @@ Because Harvester is built on top of Kubernetes and uses etcd as its database, t

## Node Maintenance

For admin users, you can click **Enable Maintenance Mode** to evict all VMs from a node automatically. It will leverage the `VM live migration` feature to migrate all VMs to other nodes automatically. Note that at least two active nodes are required to use this feature.
Migrating or shutting down workloads (and in some cases, also shutting down the underlying node) may be necessary during the following activities:

- Replacing, adding, and removing hardware

- Changing network settings

- Troubleshooting issues

- Removing a node from the cluster

If your cluster has two or more active nodes, you can enable **Maintenance Mode** on nodes that are affected by the planned changes. Maintenance Mode runs a series of checks and leverages **Live Migration** functionality to automatically migrate all VMs to other nodes.

You can enable Maintenance Mode on the **Hosts** screen of the Harvester UI. Select the target node, and then select **⋮** > **Enable Maintenance Mode**.

![node-maintenance.png](/img/v1.2/host/node-maintenance.png)

After some time, the state of the node changes to *Maintenance*.

![node-enter-maintenance-mode.png](/img/v1.3/troubleshooting/node-enter-maintenance-mode.png)

:::info important

Check the list of [known limitations and workarounds](../troubleshooting/host.md#node-in-maintenance-mode-becomes-stuck-in-cordoned-state) before enabling Maintenance Mode and whenever you encounter related issues.

Volumes that are [manually attached](../troubleshooting/host.md#manually-attached-volumes) to the node may prevent you from enabling Maintenance Mode.

Single-replica volumes may also block Node Maintenance. For details, see [Single-Replica Volumes](../troubleshooting/host.md#single-replica-volumes).

:::

## Cordoning a Node

Cordoning a node marks it as unschedulable. This feature is useful for performing short tasks on the node during small maintenance windows, like reboots, upgrades, or decommissions. When you’re done, power back on and make the node schedulable again by uncordoning it.
@@ -42,6 +68,8 @@ Before removing a node from a Harvester cluster, determine if the remaining node

If the remaining nodes do not have enough resources, VMs might fail to migrate and volumes might degrade when you remove a node.

To ensure that [single-replica](../advanced/storageclass.md#number-of-replicas) volumes can be restored or rebuilt after the node is deleted, either back up those volumes or redeploy the related workloads to other nodes in advance so that the volumes are scheduled to other nodes.

:::

### 1. Check if the node can be removed from the cluster.
@@ -522,4 +550,4 @@ status:
```

The `harvester-node-manager` pod(s) in the `harvester-system` namespace may also contain some hints as to why it is not rendering a file to a node.
This pod is part of a daemonset, so it may be worth checking the pod that is running on the node of interest.
139 changes: 139 additions & 0 deletions docs/troubleshooting/host.md
@@ -0,0 +1,139 @@
---
sidebar_position: 6
sidebar_label: Host
title: "Host"
---

<head>
<link rel="canonical" href="https://docs.harvesterhci.io/v1.3/troubleshooting/host"/>
</head>

## Node in Maintenance Mode Becomes Stuck in Cordoned State

When you enable `Maintenance Mode` on a node using the Harvester UI, the node can become stuck in the `Cordoned` state, and the menu continues to show the **Enable Maintenance Mode** option instead of **Disable Maintenance Mode**.

![node-stuck-cordoned.png](/img/v1.3/troubleshooting/node-stuck-cordoned.png)

The Harvester pod logs contain messages similar to the following:

```
time="2024-08-05T19:03:02Z" level=info msg="evicting pod longhorn-system/instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7"
time="2024-08-05T19:03:02Z" level=info msg="error when evicting pods/\"instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7\" -n \"longhorn-system\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget."

time="2024-08-05T19:03:07Z" level=info msg="evicting pod longhorn-system/instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7"
time="2024-08-05T19:03:07Z" level=info msg="error when evicting pods/\"instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7\" -n \"longhorn-system\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget."

time="2024-08-05T19:03:12Z" level=info msg="evicting pod longhorn-system/instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7"
time="2024-08-05T19:03:12Z" level=info msg="error when evicting pods/\"instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7\" -n \"longhorn-system\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget."
```

The Longhorn Instance Manager uses a PodDisruptionBudget (PDB) to protect itself from accidental eviction, which would result in loss of volume data. The error above indicates that the `instance-manager` pod is still serving volumes or replicas.
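
If you have `kubectl` access to the cluster, a quick check along the following lines (a sketch; the pod name is taken from the log messages above) shows which PodDisruptionBudgets exist and where the affected `instance-manager` pod is running:

```
# List the PodDisruptionBudgets that protect the Longhorn instance managers.
kubectl get pdb -n longhorn-system

# Show the node that hosts the instance-manager pod named in the eviction error.
kubectl get pod instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7 \
  -n longhorn-system -o wide
```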

The following sections describe the known causes and their corresponding workarounds.

### Manually Attached Volumes

A volume that is attached to a node using the [embedded Longhorn UI](./harvester.md#access-embedded-rancher-and-longhorn-dashboards) can cause this error because the volume is attached to a node name instead of a pod name.

You can verify this in the embedded Longhorn UI.

![attached-volume.png](/img/v1.3/troubleshooting/attached-volume.png)

You can also use the CLI to retrieve the details of the CRD object `VolumeAttachment`.
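
For example, a command along the following lines prints every Longhorn `VolumeAttachment` object, including its attachment tickets (a sketch; adjust the output format as needed):

```
kubectl get volumeattachments.longhorn.io -n longhorn-system -o yaml
```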

Example of a volume that was attached using the Longhorn UI:

```
- apiVersion: longhorn.io/v1beta2
  kind: VolumeAttachment
  ...
  spec:
    attachmentTickets:
      longhorn-ui:
        id: longhorn-ui
        nodeID: node-name
    ...
    volume: pvc-9b35136c-f59e-414b-aa55-b84b9b21ff89
```

Example of a volume that was attached using the Longhorn CSI driver:

```
- apiVersion: longhorn.io/v1beta2
  kind: VolumeAttachment
  spec:
    attachmentTickets:
      csi-b5097155cddde50b4683b0e659923e379cbfc3873b5b2ee776deb3874102e9bf:
        id: csi-b5097155cddde50b4683b0e659923e379cbfc3873b5b2ee776deb3874102e9bf
        nodeID: node-name
    ...
    volume: pvc-3c6403cd-f1cd-4b84-9b46-162f746b9667
```

:::note

Manually attaching a volume to a node is not recommended.

Harvester automatically attaches and detaches volumes during operations such as VM creation and migration.

:::

#### Workaround 1: Set `Detach Manually Attached Volumes When Cordoned` to `True`

When enabled, the Longhorn setting [Detach Manually Attached Volumes When Cordoned](https://longhorn.io/docs/1.6.0/references/settings/#detach-manually-attached-volumes-when-cordoned) allows Longhorn to detach manually attached volumes from a cordoned node so that node draining can proceed. When the setting is disabled, manually attached volumes block node draining.

The default value of this setting depends on the embedded Longhorn version:

| Harvester version | Embedded Longhorn version | Default value |
| --- | --- | --- |
| v1.3.1 | v1.6.0 | `true` |
| v1.4.0 | v1.7.0 | `false` |

Set the value to `true` using the [embedded Longhorn UI](./harvester.md#access-embedded-rancher-and-longhorn-dashboards).
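
The setting can typically also be changed from the CLI by patching the corresponding Longhorn `Setting` object. The resource name used below is an assumption, so verify it in your cluster first:

```
# Confirm the exact setting name (assumed to be detach-manually-attached-volumes-when-cordoned).
kubectl get settings.longhorn.io -n longhorn-system | grep detach

# Set the value to true.
kubectl patch settings.longhorn.io detach-manually-attached-volumes-when-cordoned \
  -n longhorn-system --type merge -p '{"value":"true"}'
```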

#### Workaround 2: Manually Detach the Volume

Detach the volume using the [embedded Longhorn UI](./harvester.md#access-embedded-rancher-and-longhorn-dashboards).

![detached-volume.png](/img/v1.3/troubleshooting/detached-volume.png)

Once the volume is detached, you can successfully enable Maintenance Mode on the node.

![node-enter-maintenance-mode.png](/img/v1.3/troubleshooting/node-enter-maintenance-mode.png)

### Single-Replica Volumes

Harvester allows you to create customized StorageClasses that describe how Longhorn must provision volumes. If necessary, you can create a StorageClass with the [Number of Replicas](../advanced/storageclass.md#number-of-replicas) parameter set to `1`.

When a volume is created using such a StorageClass and is attached to a node using the CSI driver or other methods, the single replica stays on that node even after the volume is detached.

You can check this using the CRD object `Volume`.

```
- apiVersion: longhorn.io/v1beta2
  kind: Volume
  ...
  spec:
    ...
    numberOfReplicas: 1 # the replica number
    ...
  status:
    ...
    ownerID: nodeName
    ...
    state: attached
```
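
To find single-replica volumes across the cluster in one step, a query similar to the following can be used (a sketch; the field paths match the excerpt above):

```
# List Longhorn volumes with their replica count, state, and owning node.
kubectl get volumes.longhorn.io -n longhorn-system \
  -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.numberOfReplicas,STATE:.status.state,NODE:.status.ownerID
```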

#### Workaround: Set `Node Drain Policy`

The Longhorn [Node Drain Policy](https://longhorn.io/docs/1.6.0/references/settings/#node-drain-policy) is set to `block-if-contains-last-replica` by default. This option forces Longhorn to block node draining when the node contains the last healthy replica of a volume.

To address the issue, change the value to `allow-if-replica-is-stopped` using the [embedded Longhorn UI](./harvester.md#access-embedded-rancher-and-longhorn-dashboards).
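
As with the previous workaround, the policy can also be changed from the CLI by patching the Longhorn `node-drain-policy` setting (a sketch; verify the setting name in your cluster):

```
kubectl patch settings.longhorn.io node-drain-policy \
  -n longhorn-system --type merge -p '{"value":"allow-if-replica-is-stopped"}'
```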

:::info important

If you plan to remove the node after Maintenance Mode is enabled, back up those single-replica volumes or redeploy the related workloads to other nodes in advance so that the volumes are scheduled elsewhere.

:::

Starting with Harvester v1.4.0, the Node Drain Policy is set to `allow-if-replica-is-stopped` by default.
8 changes: 8 additions & 0 deletions docs/volume/create-volume.md
@@ -29,6 +29,14 @@ description: Create a volume from the Volume page.

![create-empty-volume](/img/v1.2/volume/create-empty-volume.png)

:::info important

Harvester automatically attaches and detaches volumes during operations such as VM creation and migration.

Manually attaching a volume to a node is not recommended because it may prevent you from enabling [Maintenance Mode](../host/host.md#node-maintenance). For troubleshooting information, see [Manually Attached Volumes](../troubleshooting/host.md#manually-attached-volumes).

:::

</TabItem>
<TabItem value="api" label="API">

8 changes: 7 additions & 1 deletion versioned_docs/version-v1.3/advanced/storageclass.md
@@ -41,6 +41,12 @@ The number of replicas created for each volume in Longhorn. Defaults to `3`.

![](/img/v1.2/storageclass/create_storageclasses_replicas.png)

:::info important

When the value is `1`, volumes created from this `StorageClass` have only one replica. Such volumes may block [Node Maintenance](../host/host.md#node-maintenance). For details, see [Single-Replica Volumes](../troubleshooting/host.md#single-replica-volumes).

:::

#### Stale Replica Timeout

Determines when Longhorn cleans up an error replica after the replica's status becomes ERROR. The unit is minutes. Defaults to `30` minutes in Harvester.
@@ -148,4 +154,4 @@ Then, create a new `StorageClass` for the HDD (use the above disk tags). For har

You can now create a volume using the above `StorageClass` with HDDs, mostly for cold storage or archiving purposes.

![](/img/v1.2/storageclass/create_volume_hdd.png)
32 changes: 30 additions & 2 deletions versioned_docs/version-v1.3/host/host.md
@@ -21,10 +21,36 @@ Because Harvester is built on top of Kubernetes and uses etcd as its database, t

## Node Maintenance

For admin users, you can click **Enable Maintenance Mode** to evict all VMs from a node automatically. It will leverage the `VM live migration` feature to migrate all VMs to other nodes automatically. Note that at least two active nodes are required to use this feature.
Migrating or shutting down workloads (and in some cases, also shutting down the underlying node) may be necessary during the following activities:

- Replacing, adding, and removing hardware

- Changing network settings

- Troubleshooting issues

- Removing a node from the cluster

If your cluster has two or more active nodes, you can enable **Maintenance Mode** on nodes that are affected by the planned changes. Maintenance Mode runs a series of checks and leverages **Live Migration** functionality to automatically migrate all VMs to other nodes.

You can enable Maintenance Mode on the **Hosts** screen of the Harvester UI. Select the target node, and then select **⋮** > **Enable Maintenance Mode**.

![node-maintenance.png](/img/v1.2/host/node-maintenance.png)

After some time, the state of the node changes to *Maintenance*.

![node-enter-maintenance-mode.png](/img/v1.3/troubleshooting/node-enter-maintenance-mode.png)

:::info important

Check the list of [known limitations and workarounds](../troubleshooting/host.md#node-in-maintenance-mode-becomes-stuck-in-cordoned-state) before enabling Maintenance Mode and whenever you encounter related issues.

Volumes that are [manually attached](../troubleshooting/host.md#manually-attached-volumes) to the node may prevent you from enabling Maintenance Mode.

Single-replica volumes may also block Node Maintenance. For details, see [Single-Replica Volumes](../troubleshooting/host.md#single-replica-volumes).

:::

## Cordoning a Node

Cordoning a node marks it as unschedulable. This feature is useful for performing short tasks on the node during small maintenance windows, like reboots, upgrades, or decommissions. When you’re done, power back on and make the node schedulable again by uncordoning it.
@@ -42,6 +68,8 @@ Before removing a node from a Harvester cluster, determine if the remaining node

If the remaining nodes do not have enough resources, VMs might fail to migrate and volumes might degrade when you remove a node.

To ensure that [single-replica](../advanced/storageclass.md#number-of-replicas) volumes can be restored or rebuilt after the node is deleted, either back up those volumes or redeploy the related workloads to other nodes in advance so that the volumes are scheduled to other nodes.

:::

### 1. Check if the node can be removed from the cluster.
@@ -522,4 +550,4 @@ status:
```

The `harvester-node-manager` pod(s) in the `harvester-system` namespace may also contain some hints as to why it is not rendering a file to a node.
This pod is part of a daemonset, so it may be worth checking the pod that is running on the node of interest.