[BUG] No validation on Maintenance Strategy when Edit as YAML #6835
Comments
The Pod deletion filter identifies the Pods belonging to VMs that need to be deleted during a node drain by looking at their maintenance-mode strategy label. The default value for this label is `Migrate`, which indicates that the Pod should be deleted during the node drain.

Pods with an invalid label, i.e. where the value of the maintenance-mode strategy label is not one of:
- Migrate
- ShutdownAndRestartAfterEnable
- ShutdownAndRestartAfterDisable
- Shutdown

are now also treated with the default behavior.

This fixes the problem that, if a VM contains an invalid value in this label, nodes can become stuck in `Cordoned` state when transitioning to maintenance mode: the node drain controller does not shut down the VM, and the VM is not migrated away from the node either. The VM therefore keeps running, preventing the node from completely transitioning into maintenance mode.

related-to: harvester#6835
Signed-off-by: Moritz Röhrich <[email protected]>
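The fallback described in this commit message can be sketched roughly as follows. This is only an illustration, not the actual Harvester code: the helper name `effectiveStrategy` and the constant names are hypothetical, and the label key is taken from the reproduction steps in this issue.

```go
package main

import "fmt"

// Label key as written in the issue's reproduction steps.
const maintainModeStrategyLabel = "harvesterhci.io/maintain-mode-strategy"

// Default strategy named in the commit message.
const defaultStrategy = "Migrate"

// The four valid strategies listed in the commit message.
var validStrategies = map[string]bool{
	"Migrate":                        true,
	"ShutdownAndRestartAfterEnable":  true,
	"ShutdownAndRestartAfterDisable": true,
	"Shutdown":                       true,
}

// effectiveStrategy (hypothetical helper) returns the strategy the drain
// logic should act on. A missing or invalid label value falls back to the
// default, so such a Pod is handled like a Pod labeled Migrate.
func effectiveStrategy(podLabels map[string]string) string {
	value, ok := podLabels[maintainModeStrategyLabel]
	if !ok || !validStrategies[value] {
		return defaultStrategy
	}
	return value
}

func main() {
	invalid := map[string]string{maintainModeStrategyLabel: "xxx"}
	valid := map[string]string{maintainModeStrategyLabel: "Shutdown"}
	fmt.Println(effectiveStrategy(invalid))
	fmt.Println(effectiveStrategy(valid))
}
```

With this kind of fallback, a Pod carrying the value `xxx` no longer falls outside every branch of the drain logic, which is what left the node stuck in `Cordoned`.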
There are several problems here:
Add checks for maintenance-mode strategy to VM webhooks.

The admission webhooks for the VirtualMachine resource need to make sure that the maintenance-mode strategy for the VM is set to a sane value. To do this, there are two checks:

The first one is in the mutating webhook, which is executed first, and it just makes sure that the label which defines the maintenance-mode strategy is set. If the label is not set, the mutating webhook will set it to the default value of `Migrate`. If the label is set, the mutating webhook will not modify it, even if it has an invalid value. This ensures that the maintenance-mode strategy is never set unintentionally to a wrong value. The second check will ensure that in this case the request is rejected with an error message, so the user can correct the value of the maintenance-mode strategy label.

The mutating webhook will also ensure that the maintenance-mode strategy label is copied from `.metadata.labels` to `.spec.template.metadata.labels`. This is necessary to ensure that the Pod in which the virtual machine will run is labeled correctly.

The second check happens in the validating webhook. This check ensures that the maintenance-mode strategy label is set to a valid maintenance-mode strategy, i.e. one of the values:
- Migrate
- ShutdownAndRestartAfterEnable
- ShutdownAndRestartAfterDisable
- Shutdown

This webhook will deny a CREATE or UPDATE if the new VirtualMachine resource does not contain the maintenance-mode strategy label at all, or if it contains an invalid value. The only exception is an UPDATE to a VirtualMachine resource that already contains an invalid value in the maintenance-mode strategy label and where the value does not change. In this case, the webhook will accept the request. This is crucial in case the controller needs to deal with an existing VM that has an invalid value in this label (e.g. on a cluster that has been upgraded from an old version, before this label was checked by the admission webhook). In this case, the controller still needs to be able to perform UPDATE operations on the resource in order to operate the VM.

Together, these two checks ensure that no VirtualMachine resource can be created with an invalid maintenance-mode strategy, or with no maintenance-mode strategy at all. They also make sure that the maintenance-mode strategy cannot be removed or changed to an invalid value for existing VirtualMachine resources.

related-to: harvester#6835
Signed-off-by: Moritz Röhrich <[email protected]>
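The two checks from this commit message could look roughly like the sketch below. It is a simplified model rather than the real webhook code: the `vm` struct, `mutate`, and `validate` are hypothetical names, the struct only carries the two label maps mentioned in the message, and copying the value to the template labels unchanged (even when invalid) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Label key as written in the issue's reproduction steps.
const strategyLabel = "harvesterhci.io/maintain-mode-strategy"
const defaultStrategy = "Migrate"

var validStrategies = map[string]bool{
	"Migrate":                        true,
	"ShutdownAndRestartAfterEnable":  true,
	"ShutdownAndRestartAfterDisable": true,
	"Shutdown":                       true,
}

// vm is a hypothetical, stripped-down stand-in for a VirtualMachine.
type vm struct {
	Labels         map[string]string // .metadata.labels
	TemplateLabels map[string]string // .spec.template.metadata.labels
}

// mutate models the first check: set the default only when the label is
// absent, never overwrite an existing value, then copy it to the template.
func mutate(v *vm) {
	if v.Labels == nil {
		v.Labels = map[string]string{}
	}
	if _, ok := v.Labels[strategyLabel]; !ok {
		v.Labels[strategyLabel] = defaultStrategy
	}
	if v.TemplateLabels == nil {
		v.TemplateLabels = map[string]string{}
	}
	v.TemplateLabels[strategyLabel] = v.Labels[strategyLabel]
}

// validate models the second check. oldVM is nil for a CREATE request.
func validate(oldVM, newVM *vm) error {
	value, ok := newVM.Labels[strategyLabel]
	if !ok {
		return errors.New("maintenance-mode strategy label is missing")
	}
	if validStrategies[value] {
		return nil
	}
	// Exception: an UPDATE that leaves an already-invalid value unchanged.
	if oldVM != nil {
		if oldValue, had := oldVM.Labels[strategyLabel]; had && oldValue == value {
			return nil
		}
	}
	return fmt.Errorf("invalid maintenance-mode strategy %q", value)
}

func main() {
	fresh := &vm{}
	mutate(fresh)
	fmt.Println(fresh.Labels[strategyLabel], validate(nil, fresh))

	bad := &vm{Labels: map[string]string{strategyLabel: "xxx"}}
	mutate(bad)
	fmt.Println(validate(nil, bad)) // CREATE with an invalid value is denied
	fmt.Println(validate(bad, bad)) // unchanged invalid value on UPDATE is accepted
}
```

Keeping the defaulting and the rejection in separate steps matches the reasoning in the message: the mutating step never silently rewrites a user-supplied value, and the validating step is the only place that can fail a request.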
The maintenance-mode strategy was introduced in v1.4.0. @innobead should this be fixed on
Yes, fix this in master and add a backport label; the corresponding backport issues will then be created.
added
Describe the bug
While validating #5069, we found that there is no validation of the Maintenance Strategy when using Edit as YAML.
A VM with an unexpected configuration is therefore created successfully.
Warning
If the misconfigured VM is located on the first node, node-0, entering Maintenance Mode on node-0 will leave the node stuck in `Cordoned`. See Additional context.
To Reproduce
Steps to reproduce the behavior:
- Image: ubuntu-22.04-server-cloudimg-amd64.img
- Network: mgmt-vlan1
- In Edit as YAML, set `harvesterhci.io/maintain-mode-strategy: xxx` in `metadata.labels` => Create
- The VM becomes Running (it should NOT) and shows an empty Maintenance Strategy
Expected behavior
Creating the VM should FAIL with a validation error, since there are only 4 valid values:
- Migrate (default)
- ShutdownAndRestartAfterEnable
- ShutdownAndRestartAfterDisable
- Shutdown
Support bundle
Environment
v1.4.0-rc3
Additional context
If the misconfigured VM is located on the first node, node-0, entering Maintenance Mode on node-0 will leave the node stuck in `Cordoned`.