Implement CSI migration logic for volume resize #39
@leakingtapan is there anything I can do to help unblock this PR?
A couple of comments added. Further, for looking up secrets around resizing, I think we need to call something along the lines of TranslateInTreeStorageClassToCSI to make sure any in-tree secrets (if necessary) are converted to their CSI equivalents. The storage class parameter names csi.storage.k8s.io/resizer-secret-name and csi.storage.k8s.io/resizer-secret-namespace are not expected to be present for the in-tree storage classes.
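To make the concern concrete, here is a minimal sketch of a lookup that reads the two parameter names quoted above from a storage class parameter map. The function name resizerSecretRef and the sample values are hypothetical, and this is not the external-resizer implementation; it only shows why an in-tree storage class that lacks these parameters yields no secret reference unless it is translated first.

```go
package main

import "fmt"

// Parameter names as quoted in the review comment above.
const (
	secretNameKey      = "csi.storage.k8s.io/resizer-secret-name"
	secretNamespaceKey = "csi.storage.k8s.io/resizer-secret-namespace"
)

// resizerSecretRef is a hypothetical helper: it returns the secret name and
// namespace only when both parameters are present in the storage class.
func resizerSecretRef(params map[string]string) (string, string, bool) {
	name, hasName := params[secretNameKey]
	namespace, hasNamespace := params[secretNamespaceKey]
	if !hasName || !hasNamespace {
		return "", "", false
	}
	return name, namespace, true
}

func main() {
	// An in-tree style parameter map (sample values) has neither key.
	inTree := map[string]string{"type": "gp2"}
	_, _, ok := resizerSecretRef(inTree)
	fmt.Println(ok) // false

	// A CSI style parameter map (sample values) carries both keys.
	csi := map[string]string{
		secretNameKey:      "resize-secret",
		secretNamespaceKey: "default",
	}
	name, namespace, ok := resizerSecretRef(csi)
	fmt.Println(name, namespace, ok) // resize-secret default true
}
```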
586e2e0 to 8f75a0f
Finished the rest of the logic and implemented unit tests. PTAL
Small comments on testing and on code comments. Implementation LGTM.
Added a suggestion around code structure. LGTM otherwise (modulo David's comments)
8f75a0f to 11eb8af
/hold
After testing the new change with synchronization, I found a design flaw in using the resizer key from the annotation. When the new resizer key The current workaround is to fall back to the provisioner key. WDYT?
83d5dff to 59fb87a
@gnufied addressed the comment and updated the condition to add the PVC into the work queue when the resizer annotation is added from
Mostly looks good. Just requested some minor changes in tests.
59fb87a to b1b48db
Updated PR
/hold cancel
/hold cancel
/assign @msau42
pkg/controller/controller.go
Outdated
//
// We add the PVC into work queue when the new size is larger than the old size
// or when the resizer name annotation is missing from old PVC
// but present in the new PVC. This is needed for CSI migration
Can you describe more in detail why it is needed for migration? Even when migration is enabled, wouldn't the new and old sizes be different?
The problem is this: the first time a migrated PVC is expanded, it does not yet have the annotation, because the annotation is only added by the in-tree resizer when it receives a volume expansion request. So the first update event received by external-resizer will be ignored, because it won't know how to support resizing of an "un-annotated" in-tree PVC.
When the in-tree resizer does add the annotation, a second update event will be received and we add the PVC to the workqueue. If the annotation matches the registered driver name in the csi_resizer object, we proceed with expansion internally; otherwise we discard the PVC.
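The enqueue condition described in this explanation can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the pvc struct is a hypothetical stand-in for a PersistentVolumeClaim, the annotation key is assumed, and the driver name is a made-up sample value.

```go
package main

import "fmt"

// resizerAnnotation is an assumed key used for illustration only; verify the
// key the in-tree resizer actually writes before relying on it.
const resizerAnnotation = "volume.kubernetes.io/storage-resizer"

// pvc is a hypothetical, stripped-down stand-in for a PersistentVolumeClaim.
type pvc struct {
	RequestedBytes int64
	Annotations    map[string]string
}

// shouldEnqueue reports whether an update event should put the PVC on the
// work queue: either the requested size grew, or the resizer annotation was
// absent on the old object and is present on the new one.
func shouldEnqueue(oldPVC, newPVC pvc) bool {
	if newPVC.RequestedBytes > oldPVC.RequestedBytes {
		return true
	}
	_, hadBefore := oldPVC.Annotations[resizerAnnotation]
	_, hasNow := newPVC.Annotations[resizerAnnotation]
	return !hadBefore && hasNow
}

func main() {
	unannotated := pvc{RequestedBytes: 2 << 30}
	annotated := pvc{
		RequestedBytes: 2 << 30,
		Annotations:    map[string]string{resizerAnnotation: "example.csi.driver"},
	}

	// Second update event: same size, annotation newly added.
	fmt.Println(shouldEnqueue(unannotated, annotated)) // true
	// Unrelated update: same size, annotation already present.
	fmt.Println(shouldEnqueue(annotated, annotated)) // false
}
```

Matching the annotation value against the registered driver name would happen later, when the queued PVC is processed, as the comment above describes.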
To determine if we should process the PVC, should we be comparing the spec/status sizes instead of the new/old PVC sizes? How do we retry a resizing operation if it fails?
Regardless, if this is the way it needs to be, let's explain it in a comment.
Thx @gnufied for the detailed explanation, added your reply into the comment.
We do compare spec/status size later - https://github.com/kubernetes-csi/external-resizer/blob/master/pkg/controller/controller.go#L213 but having new vs old size check in update prevents processing of events which aren't size related updates for PVC.
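The two checks mentioned here serve different purposes, which a small sketch may make clearer. The claim struct and both function names are hypothetical, and sizes are plain byte counts rather than Kubernetes quantities: the first function filters update events, the second decides whether a resize is still outstanding.

```go
package main

import "fmt"

// claim is a hypothetical, simplified view of a PVC's sizes in bytes.
type claim struct {
	SpecBytes   int64 // size requested in the spec
	StatusBytes int64 // capacity currently reported in the status
}

// sizeChanged is the update-event filter: it skips events where the
// requested size did not grow (labels, unrelated annotations, and so on).
func sizeChanged(oldClaim, newClaim claim) bool {
	return newClaim.SpecBytes > oldClaim.SpecBytes
}

// needsResize is the later check: the requested size exceeds what the
// status currently reports, so an expansion is still outstanding.
func needsResize(c claim) bool {
	return c.SpecBytes > c.StatusBytes
}

func main() {
	before := claim{SpecBytes: 1 << 30, StatusBytes: 1 << 30}
	after := claim{SpecBytes: 2 << 30, StatusBytes: 1 << 30}

	fmt.Println(sizeChanged(before, after)) // true
	fmt.Println(needsResize(after))         // true
	fmt.Println(sizeChanged(after, after))  // false
}
```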
pkg/resizer/csi_resizer.go
Outdated
return *resource.NewQuantity(newSizeBytes, resource.BinarySI), nodeResizeRequired, err
}

return *resource.NewQuantity(newSizeBytes, resource.BinarySI), false, err
Are there any block plugins that only implement node resize?
Not that I'm aware of. Could it be a case for the local volume plugin? And how is this going to affect the highlighted line?
There is nothing in the CSI spec that limits node expansion to only filesystem volumes. For a block plugin that only supports node resize, is setting nodeResizeRequired=false going to cause nodeResize to never be called?
Should we instead always call node expansion (based off of nodeResizeRequired)? If the plugin gets a node expand call for a block device, can they just return success immediately?
I agree that the responsibility of determining whether node expansion is needed should be delegated to the plugin, but in the control plane the plugin may not have the information it needs to know that the volume is being used in block device mode.
For now, I think it makes sense to return whatever nodeResizeRequired value was returned by the plugin. In the future, I am thinking we may want to update the CSI spec to pass the volume capability with the ControllerExpandVolume request.
I think the question here is more related to node expand. It looks like we don't pass volume capability to node expansion, and maybe we need to add that?
I guess what a plugin could do now is check if the volume has a filesystem on it. If not, then assume it's raw block, otherwise assume it's filesystem.
Shouldn't a plugin be able to determine whether the given target_path in the NodeExpandVolume RPC call is a device or a mounted filesystem?
Agree with @gnufied here: ControllerExpand should have the volume capability in the future so the driver can determine nodeResizeRequired better. NodeExpandVolume may be determinable by the target_path, but it might be nice to know the volume capability too.
Either way, for this PR I think just returning the plugin's nodeResizeRequired makes sense instead of overwriting it with false.
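The behavior agreed on here, passing the plugin's answer through rather than forcing false, might look like the following sketch. The expandFunc type and the expand wrapper are hypothetical stand-ins for the driver call and the resizer logic; sizes are plain byte counts instead of resource.Quantity values.

```go
package main

import (
	"errors"
	"fmt"
)

// expandFunc is a hypothetical stand-in for the driver's controller expand
// call. It returns the new size and whether the driver asks for node expansion.
type expandFunc func(requestBytes int64) (newBytes int64, nodeResizeRequired bool, err error)

// expand returns the driver's nodeResizeRequired value unchanged, regardless
// of whether the volume is used in block or filesystem mode.
func expand(call expandFunc, requestBytes int64) (int64, bool, error) {
	newBytes, nodeResizeRequired, err := call(requestBytes)
	if err != nil {
		return 0, false, err
	}
	return newBytes, nodeResizeRequired, nil
}

func main() {
	// A fake driver that grants the request and asks for node expansion.
	driver := func(requestBytes int64) (int64, bool, error) {
		if requestBytes <= 0 {
			return 0, false, errors.New("invalid size")
		}
		return requestBytes, true, nil
	}

	size, nodeResize, err := expand(driver, 4<<30)
	fmt.Println(size, nodeResize, err) // 4294967296 true <nil>
}
```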
Unsure if we should be dynamically determining whether or not to call node expand. It is simpler to base it strictly on the plugin's capabilities. Especially across controller and node services, this makes testing/supporting version skew more complicated.
b1b48db to 0870251
Added more comments @msau42
0870251 to 74df227
* Using PVC annotation for synchronization between in-tree resizer and external resizer
* Modify enqueue condition for resizer migration
* Add unit tests
74df227 to a380b39
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: leakingtapan, msau42
What type of PR is this?
What this PR does / why we need it:
This is a continuation of the CSI migration for volume resizing. It implements migration logic for in-tree plugin to use CSI driver for volume resize:
See kubernetes/community#3624 for WIP migration design for volume resize.
Feature tracking issue: kubernetes/enhancements#625
/cc @ddebroy @davidz627 @gnufied @msau42
Which issue(s) this PR fixes:
Fixes #41
Special notes for your reviewer:
Does this PR introduce a user-facing change?: