Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CSI migration logic for volume resize #39

Merged
merged 1 commit into from
Jun 25, 2019

Conversation

leakingtapan
Copy link

@leakingtapan leakingtapan commented Apr 25, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
This is a continuation of the CSI migration for volume resizing. It implements migration logic for in-tree plugin to use CSI driver for volume resize:

  • updated external-resizer to support resizing in-tree volumes types as well as CSI volume.
  • updated external-resizer to handle raw block volume

See kubernetes/community#3624 for WIP migration design for volume resize.

Feature tracking issue: kubernetes/enhancements#625

/cc @ddebroy @davidz627 @gnufied @msau42

Which issue(s) this PR fixes:

Fixes #41

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Implement CSI migration logic for volume resizing

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Apr 25, 2019
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 25, 2019
@davidz627
Copy link

@leakingtapan is there anything I can do to help unblock this PR?

Copy link
Contributor

@ddebroy ddebroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments added. Further, for looking up secrets around resizing, I think we need to call something along the lines of TranslateInTreeStorageClassToCSI to make sure any in-tree secrets (if necessary) is converted to the CSI equivalents. The storage class parameters names csi.storage.k8s.io/resizer-secret-name and csi.storage.k8s.io/resizer-secret-namespace are not expected to be present for the in-tree storage classes.

@leakingtapan leakingtapan changed the title [WIP] Implement CSI migration logic for volume resize Implement CSI migration logic for volume resize Jun 9, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2019
@leakingtapan
Copy link
Author

Finished the rest of the logic and implemented unit test. PTAL

Copy link

@davidz627 davidz627 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small testing and comment comments. Implementation LGTM

Copy link
Contributor

@ddebroy ddebroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a suggestion around code structure. LGTM otherwise (modulo David's comments)

@leakingtapan
Copy link
Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 14, 2019
@leakingtapan
Copy link
Author

After testing the new change with synchronization, I found a design flaw of using resizer key from annotation. When the new resizer key volume.kubernetes.io/storage-resizer is used, since both in-tree resizer and external resizer is trying to resize the volume at the same time, the external-resizer won't be able to read the resizer key since it's not even written by in-tree resizer yet.

The current workaround is fallback to provisioner key. WDYT?

/cc @davidz627 @ddebroy @gnufied

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 23, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 23, 2019
@leakingtapan
Copy link
Author

leakingtapan commented Jun 23, 2019

@gnufied addressed comment and updated the condition to add PVC into work queue when the reszier annotation is added from update event. PTAL

Copy link
Contributor

@gnufied gnufied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good. Just requested some minor changes in tests.

@leakingtapan
Copy link
Author

Updated PR

@leakingtapan
Copy link
Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 24, 2019
@gnufied
Copy link
Contributor

gnufied commented Jun 24, 2019

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2019
@gnufied
Copy link
Contributor

gnufied commented Jun 24, 2019

/assign @msau42

//
// We add the PVC into work queue when the new size is larger then the old size
// or when the resizer name annotation is missing from old PVC
// but present in the new PVC. This is needed for CSI migration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe more in detail why it is needed for migration? Even when migration is enabled, wouldn't the new and old sizes be different?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is - when first time a migrated PVC is expanded, it does not yet have the annotation because annotation is only added by in-tree resizer when it receives a volume expansion request. So first update event that will be received by external-resizer will be ignored because it won't know how to support resizing of a "un-annotated" in-tree PVC.

When in-tree resizer does add the annotation, a second update even will be received and we add the pvc to workqueue. If annotation matches the registered driver name in csi_resizer object, we proceeds with expansion internally or we discard the PVC.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To determine if we should process the PVC, should we be comparing spec/status of size, instead of new/old PVC size? How do we retry resizing operation if it fails?

Regardless, if this is the way it needs to be, let's explain it in a comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @gnufied for detailed explanation, added you reply into the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do compare spec/status size later - https://github.com/kubernetes-csi/external-resizer/blob/master/pkg/controller/controller.go#L213 but having new vs old size check in update prevents processing of events which aren't size related updates for PVC.

return *resource.NewQuantity(newSizeBytes, resource.BinarySI), nodeResizeRequired, err
}

return *resource.NewQuantity(newSizeBytes, resource.BinarySI), false, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any block plugins that only implement node resize?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that Im aware of, could it be a case for local volume plugin? And how is this going to affect this highlighted line?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing in the csi spec that limits node expansion to only filesystem volumes. For a block plugin that only supports node resize, then is setting nodeResizeRequired=false going to cause nodeResize to never be called?

Should we instead always call node expansion (based off of nodeResizeRequired)? If the plugin gets a node expand call for a block device, can they just return success immediately?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the responsibility of determining whether node expansion is needed or not should be delegated to the plugin, but in control-plane the plugin may not have the necessary information to know that volume is being used in block device mode.

For now - I think it makes sense to return whatever nodeResizeRequired value was returned by the plugin. In future I am thinking we may want to update CSI spec to pass volume capability with ControllerExpandVolume request.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the question here is more related to node expand. It looks like we don't pass volume capability to node expansion, and maybe we need to add that?

I guess what a plugin could do now is check if the volume has a filesystem on it. If not, then assume it's raw block, otherwise assume it's filesystem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't a plugin be able to determine that given target_path in NodeExpandVolume RPC call is a device or a mounted filesystem?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with @gnufied here, ControllerExpand should have volume capability in the future so the driver can determine nodeResizeRequired better.

NodeExpandVolume may be determinable by the target_path but it might be nice to know the volume capability too.

Either way, for this PR I think just returning the plugins nodeResizeRequired makes sense instead of overwriting with false.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure if we should be dynamically determining whether or not to call node expand. It is simpler to base it strictly on the plugin's capabilities. Especially across controller and node services, this makes testing/supporting version skew more complicated.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2019
@leakingtapan
Copy link
Author

Added more comments @msau42

* Using PVC annotation for sychronication between in-tree resizer and extternal resizer
* Modify enqueue condition for resizer migration
* Add unit tests
@msau42
Copy link
Collaborator

msau42 commented Jun 25, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 25, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: leakingtapan, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 25, 2019
@k8s-ci-robot k8s-ci-robot merged commit 24780ba into kubernetes-csi:master Jun 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Perform resizing of migrated PVCs
6 participants