KEP 284: Add PRR for volume expansion feature #3195

Merged: 5 commits, Feb 2, 2022
3 changes: 3 additions & 0 deletions keps/prod-readiness/sig-storage/284.yaml
@@ -0,0 +1,3 @@
kep-number: 284
stable:
approver: "@deads2k"
260 changes: 260 additions & 0 deletions keps/sig-storage/284-enable-volume-expansion/README.md
@@ -21,6 +21,17 @@
- [PVC API Change](#pvc-api-change)
- [StorageClass API change](#storageclass-api-change)
- [Other API changes](#other-api-changes)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
- [Monitoring Requirements](#monitoring-requirements)
- [Dependencies](#dependencies)
- [Scalability](#scalability)
- [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
<!-- /toc -->

## Release Signoff Checklist
@@ -344,3 +355,252 @@ type StorageClass struct {

This proposal relies on the ability to update PVC status from the kubelet: the kubelet must issue a
PATCH request to update the PVC's status.
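As a sketch of that status-update path, the JSON merge-patch body below is illustrative only, not the exact payload the kubelet sends; `data-pvc` is a placeholder PVC name.

```shell
# Example shape of a PVC status patch recording the post-expansion capacity.
PVC=data-pvc                                        # placeholder name
PATCH='{"status":{"capacity":{"storage":"20Gi"}}}'  # illustrative body

# Applying such a patch by hand against a live cluster would need kubectl
# with --subresource support (added in kubectl 1.24):
#   kubectl patch pvc "$PVC" --subresource=status --type=merge -p "$PATCH"
echo "$PATCH"
```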

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

Volume expansion has been in beta for a long time and, as a result, has accumulated
several feature gates that control various aspects of expansion.

- [x] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name: ExpandPersistentVolumes
- description: |
This feature is required for `pvc.Spec.Resources` to be editable and must be
enabled for other expansion related feature gates to work.
- Components depending on the feature gate:
- kube-apiserver
- kubelet
- kube-controller-manager
- Feature gate name: ExpandInUsePersistentVolumes
- description: Enables online expansion. Requires ExpandPersistentVolumes feature gate.
- Components depending on the feature gate:
- kube-apiserver
- kubelet
- kube-controller-manager
- Feature gate name: ExpandCSIVolumes
- description: Enables CSI expansion.
- Components depending on the feature gate:
- kube-apiserver
- kubelet
- kube-controller-manager
- [ ] Other
- Describe the mechanism:
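For illustration, the three gates above can be enabled together with the standard `--feature-gates` flag; the same syntax applies to kube-apiserver, kubelet, and kube-controller-manager. This is a sketch of the flag value, not a complete component invocation.

```shell
# All three expansion-related gates, comma-separated as --feature-gates expects.
GATES="ExpandPersistentVolumes=true,ExpandInUsePersistentVolumes=true,ExpandCSIVolumes=true"

# e.g. (remaining flags elided):
#   kube-apiserver --feature-gates="$GATES" ...
echo "--feature-gates=$GATES"
```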
- Will enabling / disabling the feature require downtime of the control
plane?
Enabling or disabling this feature does not require complete downtime of the control plane;
the feature gates can be enabled progressively on different control-plane nodes.
- Will enabling / disabling the feature require downtime or reprovisioning
of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
This feature can be enabled progressively on nodes without downtime or reprovisioning; once
expansion is enabled on a given node, the kubelet on that node performs volume expansion.

###### Does enabling the feature change any default behavior?

Enabling the feature gate allows users to increase the size of a PVC by editing `pvc.Spec.Resources`.
Kubernetes then tries to fulfill the request by actually expanding the volume in the controller and
performing file-system (or any other required) expansion on the node.
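The user-facing action described above can be sketched as a merge patch that raises `spec.resources.requests.storage`; `data-pvc` is a placeholder name and the patch is shown rather than applied here.

```shell
# Request expansion by raising the storage request on an existing PVC.
PATCH='{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

# Against a live cluster:
#   kubectl patch pvc data-pvc --type=merge -p "$PATCH"
echo "$PATCH"
```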

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes - it can be disabled. It just means users can no longer expand their PVCs.

###### What happens if we reenable the feature if it was previously rolled back?

It should be safe to do that. It will just re-enable the feature.

###### Are there any tests for feature enablement/disablement?

There are no e2e tests, but unit tests cover this behaviour.

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

The feature gate should not impact existing workloads, but since we try to expand the
file system (or perform node expansion) during volume mount, a terminal error during
expansion may prevent the mount operation from succeeding.

###### What specific metrics should inform a rollback?

The `volume_mount` operation failure metric - `storage_operation_duration_seconds{operation_name=volume_mount, status=fail-unknown}`
combined with `storage_operation_duration_seconds{operation_name=volume_fs_resize, status=fail-unknown}` should tell us
if expansion is failing on the node and if it is causing mount failures.

High failure rates in the `csi_sidecar_operations_seconds` and `csi_operations_seconds` metrics for the expansion
operation should also indicate that expansion is not working in the cluster and hence that the feature should be rolled back.
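As a sketch of how an operator might spot the failure signal above, the metrics excerpt below is fabricated for illustration; against a live cluster the same grep could run over the kubelet's metrics endpoint (e.g. `kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics"`).

```shell
# Fabricated metrics sample in Prometheus exposition format.
cat <<'EOF' > /tmp/metrics.txt
storage_operation_duration_seconds_count{operation_name="volume_mount",status="fail-unknown"} 4
storage_operation_duration_seconds_count{operation_name="volume_fs_resize",status="fail-unknown"} 3
storage_operation_duration_seconds_count{operation_name="volume_fs_resize",status="success"} 57
EOF

# Surface only the failing-operation series.
grep 'status="fail-unknown"' /tmp/metrics.txt
```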

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

There are no e2e upgrade->downgrade->upgrade tests for this specific feature, but since volume expansion has been
in beta since 1.11, the feature has been tested manually.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

This feature does not deprecate any existing features.

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

A PVC that is being expanded should have `pvc.Status.Conditions` set.

###### How can someone using this feature know that it is working for their instance?

- [x] Events
- Resizing (on PVC)
- Event Reason: External resizer is resizing volume pvc-a71483ed-a5bc-48fa-9151-ca41e7e6634e
- VolumeResizeSuccessful (on PVC)
- Event Reason: Volume resize is successful
- FileSystemResizeSuccessful (on PVC)
- Event Reason: Volume resize is successful. This event is emitted when resizing finishes on kubelet.
- [x] API .status
- Condition name:
- Other field:
- [x] Other (treat as last resort)
- Details: `pvc.Status.Capacity` should reflect user requested size after expansion is complete.
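To illustrate checking the events listed above, the excerpt below is a fabricated `kubectl get events` extract (names and the success message are examples); the commented command shows how to fetch the real thing for a placeholder PVC `data-pvc`.

```shell
# Against a live cluster:
#   kubectl get events -n default --field-selector involvedObject.name=data-pvc
cat <<'EOF' > /tmp/pvc-events.txt
LAST SEEN   TYPE     REASON                       OBJECT                          MESSAGE
12s         Normal   Resizing                     persistentvolumeclaim/data-pvc  External resizer is resizing volume pvc-a71483ed
5s          Normal   FileSystemResizeSuccessful   persistentvolumeclaim/data-pvc  Volume resize is successful
EOF

# A successful expansion shows the Resizing -> *ResizeSuccessful sequence.
grep -E 'Resizing|ResizeSuccessful' /tmp/pvc-events.txt
```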

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

Enabling this feature should not negatively impact volume mount timings in general, so the percentiles derived from the `storage_operation_duration_seconds{operation_name=volume_mount}` metric should not change.

That said, if a file system requires expansion during mount, the mount operation will naturally take longer to finish.
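A sketch of a query an operator might use to watch the p99 mount latency around enabling the feature: the label names follow the metric cited above, while the 0.99 quantile and 5m window are arbitrary example choices.

```shell
# Prometheus query for p99 volume_mount duration; histograms expose
# per-bucket series under the _bucket suffix.
QUERY='histogram_quantile(0.99,
  sum(rate(storage_operation_duration_seconds_bucket{operation_name="volume_mount"}[5m])) by (le))'
echo "$QUERY"
```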

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [x] Metrics
- controller expansion operation duration:
- Metric name: storage_operation_duration_seconds{operation_name=expand_volume, status=success|fail-unknown}
- [Optional] Aggregation method: percentile
- Components exposing the metric: kube-controller-manager
- node expansion operation duration:
- Metric name: storage_operation_duration_seconds{operation_name=volume_fs_resize, status=success|fail-unknown}
- [Optional] Aggregation method: percentile
- Components exposing the metric: kubelet
- CSI operation metrics in controller:
- Metric name: csi_sidecar_operations_seconds
- [Optional] Aggregation method: percentile
- Components exposing the metric: external-resizer
- CSI operation metrics in kubelet:
- Metric Name: csi_operations_seconds
- [Optional] Aggregation method: percentile
- Components exposing the metric: kubelet

- [ ] Other (treat as last resort)
- Details:

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

All the in-tree operations from the control plane emit the `storage_operation_duration_seconds{operation_name=expand_volume, status=success|fail-unknown}` metric. The CSI equivalent emitted by external-resizer is `csi_sidecar_operations_seconds`, which will be documented as the alternative when CSI migration is enabled or the driver in use is a CSI driver.
We do not need to emit new metrics, but we do need to document this change in metric names.

### Dependencies

<!--
This section must be completed when targeting beta to a release.
-->

###### Does this feature depend on any specific services running in the cluster?

This feature requires external-resizer running in the cluster for CSI volume expansion.

### Scalability

###### Will enabling / using this feature result in any new API calls?

Yes, enabling this feature requires new API calls.

- Updates to PVs
- API operations
- PATCH PV
- GET PV
- List PVs
- originating components: kubelet, kube-controller-manager, external-resizer
- resync duration: 10mins (also user configurable)
- Update to PVCs:
- API operations
- PATCH PVC
- GET PVC
- List PVC
- originating components: kubelet, kube-controller-manager, external-resizer
- resync duration: 10mins (also user configurable)

If user enables protection for not expanding PVCs that are in-use, external-resizer will
also watch *all* pods in the cluster. This is an optional flag in external-resizer and generally
only needed when some CSI drivers don't want to handle expansion calls for volumes which are potentially in-use by a pod.

###### Will enabling / using this feature result in introducing new API types?

No

###### Will enabling / using this feature result in any new calls to the cloud provider?

Yes, we expect new calls to modify existing volume objects.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

- API type(s): PVC
- Estimated increase in size: a PVC with conditions grows by roughly 100 to 250 bytes.
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
- API type(s): StorageClass
- Estimated increase in size: a StorageClass with `AllowVolumeExpansion` set grows by roughly 26 bytes.
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

If a pending file-system expansion must be performed during mount, it will increase the volume's mount time.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

Enabling this feature should not increase resource usage by a significant margin, but since it involves a new controller and an external resize controller for CSI, the usage is not negligible either. That said, this feature has been in beta since 1.11, enabled by default, and used in production; we do not expect resource usage to be a problem.

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

Since this feature is user driven, users simply cannot expand PVCs while the API server or etcd is unavailable.
If the API server becomes unavailable midway through the expansion process, the expansion controller may not be able
to save the updated PVC to the API server, but the control flow is designed to retry and recover from such failures.

###### What are other known failure modes?

- Expansion can be permanently stuck:
- Detection: Check conditions on `pvc.status`
- Mitigations: If expansion is stuck permanently because of issues in the storage backend and cannot be recovered automatically, manual intervention is required. Steps to recover from expansion failures are documented at https://kubernetes.io/docs/concepts/storage/persistent-volumes/#recovering-from-failure-when-expanding-volumes
> **Reviewer:** Can the recover-from-resize feature eliminate these manual steps?
>
> **Author:** In some cases no recovery is possible (say the volume was expanded in the controller but keeps failing on the node), so the recover-from-resize-failure feature will not help, and admins may still have to take some action.
- Diagnostics: Conditions on `pvc.Status` and events on PVC should clearly indicate that expansion is failing.
- Testing: There are some unit tests for failure mode but no e2e.


###### What steps should be taken if SLOs are not being met to determine the problem?

If expansion is affecting pod startup time or causing other issues, it can be disabled per class by editing the StorageClass and setting `allowVolumeExpansion` to `false`.
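The StorageClass change above can be sketched as a merge patch; `fast-ssd` is a placeholder class name, and the patch is shown rather than applied here.

```shell
# Disable further expansion for volumes of this class.
SC=fast-ssd                                # placeholder StorageClass name
PATCH='{"allowVolumeExpansion": false}'

# Against a live cluster:
#   kubectl patch storageclass "$SC" --type=merge -p "$PATCH"
echo "patch for $SC: $PATCH"
```

Note this only stops new expansion requests; it does not affect volumes already expanded.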

## Implementation History

- 1.8: Alpha
- 1.11: Beta
- 1.24: GA

## Drawbacks

<!--
Why should this KEP _not_ be implemented?
-->

## Alternatives

<!--
What other approaches did you consider, and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->

## Infrastructure Needed (Optional)

<!--
Use this section if you need things from the project/SIG. Examples include a
new subproject, repos requested, or GitHub details. Listing these here allows a
SIG to get the process for these resources started right away.
-->
16 changes: 14 additions & 2 deletions keps/sig-storage/284-enable-volume-expansion/kep.yaml
@@ -19,13 +19,25 @@ see-also:
replaces:
superseded-by:

latest-milestone: "v1.19"
stage: "alpha"
latest-milestone: "v1.24"
stage: "stable"
milestone:
alpha: "v1.8"
beta: "v1.11"
stable: "v1.24"
feature-gates:
- name: ExpandPersistentVolumes
components:
- kube-apiserver
- kubelet
- kube-controller-manager
- name: ExpandInUsePersistentVolumes
components:
- kube-apiserver
- kubelet
- kube-controller-manager
- name: ExpandCSIVolumes
components:
- kube-apiserver
- kubelet
- kube-controller-manager