storage capacity: object ownership #583
Conversation
/hold Let's merge this one last, once all other changes for storage capacity tracking are in.
/assign
Just curious: how does a label selector increase API server CPU load?
factoryForNamespace = informers.NewSharedInformerFactoryWithOptions(clientset,
	ctrl.ResyncPeriodOfCsiNodeInformer,
	informers.WithNamespace(namespace),
	informers.WithTweakListOptions(func(lo *metav1.ListOptions) {
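For readers following along, a tweak like this typically just sets a label selector so that the API server filters objects before sending them; a minimal sketch under assumed label keys and helper names (not necessarily what external-provisioner actually uses):

```go
package sketch

// A sketch, not the actual external-provisioner code: build an informer
// factory whose list/watch requests carry a label selector, so that the
// API server filters CSIStorageCapacity objects server-side.
import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

func newFilteredFactory(clientset kubernetes.Interface, namespace string, resync time.Duration, selectorLabels map[string]string) informers.SharedInformerFactory {
	return informers.NewSharedInformerFactoryWithOptions(clientset,
		resync,
		informers.WithNamespace(namespace),
		informers.WithTweakListOptions(func(lo *metav1.ListOptions) {
			// Only objects matching these labels are listed and watched,
			// e.g. {"csi.storage.k8s.io/drivername": "..."} (assumed key).
			lo.LabelSelector = labels.Set(selectorLabels).String()
		}),
	)
}
```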
I'm wondering how easy it is to detect if the informer doesn't fetch objects we expect, and whether we need to unit test this.
If informer logic is wrong, two things could happen:
- Missing items that are supposed to be included
  - May try to create an existing CSIStorageCapacity object. In the logic today this is fine, because nothing is done after object creation (besides logging / error reporting), and the creation error would give us a hint.
- Includes more things than it should
  - No failures, just more network traffic.
The latter case is hard to detect. So maybe a unit test is worthwhile (if it's not too much effort)?
Actually since this is just an options configuration, the logic is simple enough, so it's probably low risk. Totally up to you if you want to unit test this.
I'd prefer to not test this. I suspect that it wouldn't be easy to test comprehensively and in the end is more about correct implementation of the informer and apiserver filtering than it is about correct configuration.
I had exactly the same concern and asked on #sig-scalability. The response was that spending more CPU time on server-side filtering can be assumed to be a net win when it decreases the number of objects that need to be transferred. That costs both network bandwidth and CPU time for conversion.
When you have a moment, can you please address this merge conflict? We are planning to release this tomorrow to align with the Kubernetes 1.21 release.
Since I'm not familiar with API server code, I was actually wondering exactly how computationally heavy server-side filtering is. What does it do under the hood to perform this filtering?
Optionally, all CSIStorageCapacity objects created by an instance of
the external-provisioner can have an
[owner](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents).
This ensures that the objects get removed automatically when uninstalling |
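For illustration, attaching such an owner reference amounts to setting metadata on the CSIStorageCapacity object; a minimal sketch, with the StatefulSet owner kind and the helper name chosen here purely as assumptions:

```go
package sketch

import (
	storagev1alpha1 "k8s.io/api/storage/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// withOwner is an illustrative helper (not part of external-provisioner):
// it marks a CSIStorageCapacity object as controlled by some owner, so that
// Kubernetes garbage collection deletes it when that owner goes away.
func withOwner(capacity *storagev1alpha1.CSIStorageCapacity, ownerName string, ownerUID types.UID) {
	controller := true
	capacity.OwnerReferences = []metav1.OwnerReference{{
		APIVersion: "apps/v1",
		Kind:       "StatefulSet", // assumed owner kind for this example
		Name:       ownerName,
		UID:        ownerUID,
		Controller: &controller,
	}}
}
```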
Add: "Setting an owner reference is highly recommended whenever possible (i.e. in the most common case that drivers are run inside containers)."
What's the worst case scenario if orphaned objects aren't cleaned up?
For the case when all orphaned CSIStorageCapacity objects report not enough capacity for the volume to be provisioned, even though the driver is no longer there: the pod will fail to schedule and stay pending, whereas if objects are cleaned up properly, the pod is scheduled and the volume fails provisioning, also causing the pod to stay pending. Once the driver comes back, CSIStorageCapacity objects will reflect the actual state again. So I think we are good in this case.
When we video chatted, I think we talked about a case where orphaned objects could cause the scheduler to hang? But I totally forgot which case it was :(. Maybe it's actually OK?
Added.
As you said, orphaned and incorrect objects cause no permanent damage for pods with normal PVCs (which includes generic ephemeral volumes), so it's not too bad. It's worse for CSI ephemeral volumes because then a pod might get scheduled onto a node where the driver then cannot provide the volume and the pod is stuck there (no rescheduling). I think that was what we chatted about, not the scheduler itself hanging.
Ah yeah. Could you maybe mention this in the design doc somewhere (if it's not there already)?
It is covered by the generic ephemeral volume KEP.
README.md
determine with the `POD_NAME/NAMESPACE` environment variables and
the `--capacity-ownerref-level` parameter.

If ownership is disabled, then the following command can be used to clean |
Add: "If the ownership is disabled, the storage admin is responsible for removing orphaned CSIStorageCapacity objects, and the following..."
Added.
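As an aside, such a cleanup boils down to a delete-by-label-selector; a hypothetical client-go sketch (the selector value shown in the comment is an assumption, the command documented in the README is authoritative):

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupOrphaned is a hypothetical helper: it deletes all CSIStorageCapacity
// objects in a namespace that match a label selector, e.g.
// "csi.storage.k8s.io/managed-by=external-provisioner" (assumed key/value).
func cleanupOrphaned(ctx context.Context, clientset kubernetes.Interface, namespace, selector string) error {
	return clientset.StorageV1alpha1().CSIStorageCapacities(namespace).DeleteCollection(ctx,
		metav1.DeleteOptions{},
		metav1.ListOptions{LabelSelector: selector},
	)
}
```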
@@ -92,7 +92,7 @@ var (
 	enableCapacity = flag.Bool("enable-capacity", false, "This enables producing CSIStorageCapacity objects with capacity information from the driver's GetCapacity call.")
 	capacityImmediateBinding = flag.Bool("capacity-for-immediate-binding", false, "Enables producing capacity information for storage classes with immediate binding. Not needed for the Kubernetes scheduler, maybe useful for other consumers or for debugging.")
 	capacityPollInterval = flag.Duration("capacity-poll-interval", time.Minute, "How long the external-provisioner waits before checking for storage capacity changes.")
-	capacityOwnerrefLevel = flag.Int("capacity-ownerref-level", 1, "The level indicates the number of objects that need to be traversed starting from the pod identified by the POD_NAME and POD_NAMESPACE environment variables to reach the owning object for CSIStorageCapacity objects: 0 for the pod itself, 1 for a StatefulSet, 2 for a Deployment, etc.")
+	capacityOwnerrefLevel = flag.Int("capacity-ownerref-level", 1, "The level indicates the number of objects that need to be traversed starting from the pod identified by the POD_NAME and POD_NAMESPACE environment variables to reach the owning object for CSIStorageCapacity objects: -1 for no owner, 0 for the pod itself, 1 for a StatefulSet, 2 for a Deployment, etc.")
1 for a StatefulSet or DaemonSet
added
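For context on the flag's semantics, resolving the owner at level N means walking N controlling-ownerReference hops up from the pod; a rough sketch of the idea under assumed helper names (not the actual implementation, and only levels 0 and 1 are spelled out):

```go
package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ownerAtLevel sketches the idea behind --capacity-ownerref-level: level 0
// returns a reference to the pod itself, level 1 the pod's controlling owner
// (e.g. a StatefulSet or DaemonSet). Higher levels repeat the "fetch object,
// follow its controlling ownerReference" step, which needs a dynamic client
// and is elided here. A negative level means "no owner".
func ownerAtLevel(ctx context.Context, clientset kubernetes.Interface, namespace, podName string, level int) (*metav1.OwnerReference, error) {
	if level < 0 {
		return nil, nil // ownership disabled
	}
	pod, err := clientset.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	if level == 0 {
		return &metav1.OwnerReference{APIVersion: "v1", Kind: "Pod", Name: pod.Name, UID: pod.UID}, nil
	}
	for i := range pod.OwnerReferences {
		ref := pod.OwnerReferences[i]
		if ref.Controller != nil && *ref.Controller {
			return &ref, nil // level 1: the pod's controlling owner
		}
	}
	return nil, fmt.Errorf("pod %s/%s has no controlling owner", namespace, podName)
}
```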
@@ -551,13 +561,17 @@ func (c *Controller) syncCapacity(ctx context.Context, item workItem) error {
 		// would race with receiving that object through the event handler. In the unlikely
 		// scenario that we end up creating two objects for the same work item, the second
 		// one will be recognized as duplicate and get deleted again once we receive it.
-	} else if capacity.Capacity.Value() == quantity.Value() {
+	} else if capacity.Capacity.Value() == quantity.Value() &&
+		(c.owner == nil || c.isOwnedByUs(capacity)) {
Should this be `isManaged()`? If a driver running in binary mode manages an object which is unchanged, it should still skip, right?
This really is an ownerref check. It covers the scenario where a driver was deployed without ownerref and then gets redeployed with ownerref. The existing objects then need to be updated even when the capacity is the same, which is achieved here by falling through to the final else clause.
I added "same capacity %v and correct owner" to the log message to make this more clear.
pkg/capacity/capacity.go
		if owner.Controller != nil && *owner.Controller && owner.UID == c.owner.UID {
			return true
		}
	}
	return false
}

// isManaged checks the labels to determine whether this capacity object is managed by
// the controller instance.
Since the informer is already filtered by labels in the next commit, and all callers of `isManaged()` fetch from the informer, is this mostly here as a safeguard? If so, nit: could you add a comment to clarify?
Correct. I added a comment.
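To make the safeguard concrete, a label-based check like this reduces to comparing a couple of labels on the object; a sketch with assumed label keys and function name, not taken from the actual code:

```go
package sketch

import storagev1alpha1 "k8s.io/api/storage/v1alpha1"

// Assumed label keys, for illustration only.
const (
	driverNameLabel = "csi.storage.k8s.io/drivername"
	managedByLabel  = "csi.storage.k8s.io/managed-by"
)

// isManagedSketch reports whether a CSIStorageCapacity object carries the
// labels that mark it as belonging to this controller instance. With the
// label-filtered informer in place this is mostly a safeguard, because the
// informer should only ever deliver matching objects.
func isManagedSketch(capacity *storagev1alpha1.CSIStorageCapacity, driverName, managedBy string) bool {
	return capacity.Labels[driverNameLabel] == driverName &&
		capacity.Labels[managedByLabel] == managedBy
}
```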
@@ -587,9 +601,8 @@ func (c *Controller) deleteCapacity(ctx context.Context, capacity *storagev1alph
 	// ensures that it gets deleted if no longer needed. Foreign objects
nit: update function name in comment
Fixed.
pkg/capacity/capacity_test.go
@@ -1217,7 +1264,8 @@ func fakeController(ctx context.Context, client *fakeclientset.Clientset, storag
 		driverName,
 		client,
 		queue,
-		defaultOwner,
+		&defaultOwner,
Should there be a test case for "managed by us but not owned by us"?
There is "reuse one capacity object, add owner" for that.
The inverse "reuse one capacity object, remove owner" doesn't work because the driver instance without owner configured doesn't know which owner references it is expected to remove. This shouldn't be needed in practice and the documentation mentions it.
I was actually referring to a test that captures the container-less driver case, where ownerref stays nil always. IIUC, all the tests so far assume that capacity objects are eventually managed by a driver running in containers (i.e. owner is always set to `defaultOwner`).
If you agree this test is necessary I would prefer that the test is added in this PR rather than delaying it after the release, because it is a use case we plan to support. Hopefully it's not a lot of work.
You are right, that is a supported use case and wasn't covered. I've added two test cases for it which cover the relevant difference (create objects without owner reference, keep existing references).
It has an in-memory representation of the object and then simply checks the labels in the metadata. In that sense it's simply brute force. But the in-memory representation is needed anyway, and the check is still faster than the rest of the processing that would need to happen when sending the object. At least that is my understanding...
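In code terms, that per-object check is essentially a selector match against the object's labels; a minimal illustration with apimachinery (not the apiserver's actual code path, and the selector string is an assumed example):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// The server evaluates the request's label selector against each object's
	// labels in memory; conceptually it is just this kind of match.
	selector, err := labels.Parse("csi.storage.k8s.io/drivername=hostpath.csi.k8s.io") // assumed selector
	if err != nil {
		panic(err)
	}
	objLabels := labels.Set{"csi.storage.k8s.io/drivername": "hostpath.csi.k8s.io"}
	fmt.Println(selector.Matches(objLabels)) // true: the object would be returned
}
```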
I pushed an updated version. As before, it is based on PR #586.
What about updating client-go in external-provisioner to 1.21.0? Kubernetes 1.21 must be released first, then I can update client-go, and only then can we release external-provisioner. Or we release external-provisioner with the current code. None of the differences seem relevant. I created the update for rc0 here: #599
When the provisioner has created or deleted a volume, storage capacity is likely to be different than before. We have some information (storage class, topology) which allows us to schedule a subset of the CSIStorageCapacity objects for an update sooner than the normal periodic update.
@pohly I had planned on updating to 1.21 when it was released. Thank you for taking care of this.
Yeah, that makes sense. Some sort of index could maybe speed up processing, but maybe that's overkill 🤷
One more potential change, everything else lgtm
-	} else if capacity.Capacity.Value() == quantity.Value() {
-		klog.V(5).Infof("Capacity Controller: no need to update %s for %+v, same capacity %v", capacity.Name, item, quantity)
+	} else if capacity.Capacity.Value() == quantity.Value() &&
+		(c.owner == nil || c.isOwnedByUs(capacity)) {
I see, and `c.owner == nil` covers the case that a driver stays owner-less. Sgtm!
/lgtm /assign @msau42
/approve
just some nits
README.md
[owner](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents).
This ensures that the objects get removed automatically when uninstalling
the CSI driver. The owner is
determine with the `POD_NAME/NAMESPACE` environment variables and
typo: determined
README.md
@@ -76,7 +76,7 @@ See the [storage capacity section](#capacity-support) below for details.
 
 * `--enable-capacity`: This enables producing CSIStorageCapacity objects with capacity information from the driver's GetCapacity call. The default is to not produce CSIStorageCapacity objects.
 
-* `--capacity-ownerref-level <levels>`: The level indicates the number of objects that need to be traversed starting from the pod identified by the POD_NAME and POD_NAMESPACE environment variables to reach the owning object for CSIStorageCapacity objects: 0 for the pod itself, 1 for a StatefulSet, 2 for a Deployment, etc. Defaults to `1` (= StatefulSet).
+* `--capacity-ownerref-level <levels>`: The level indicates the number of objects that need to be traversed starting from the pod identified by the POD_NAME and NAMESPACE environment variables to reach the owning object for CSIStorageCapacity objects: 0 for the pod itself, 1 for a StatefulSet, 2 for a Deployment, etc. Defaults to `1` (= StatefulSet). Ownership is optional and can be disabled with -1.
Update this to also include DaemonSet
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: msau42, pohly.
The mandatory ownership reference prevented deploying external-provisioner outside of the Kubernetes cluster and made it impossible to reuse objects when rolling out an updated CSI driver which used distributed provisioning. It's better to identify managed objects with labels. This also makes it possible to optimize the watch (to be implemented in a separate commit).
This reduces network traffic and is beneficial for:
- multiple CSI drivers in the same namespace (filtered by driver name)
- distributed provisioning (filtered by manager ID)

CPU load on the kube-apiserver will be higher, but the reduction in network traffic is expected to make this worthwhile.
When the controller runs without an owner configured, it
- must not add one (obviously...)
- must preserve existing owner references
New changes are detected. LGTM label has been removed.
Both fixed, please lgtm again. /hold cancel
What type of PR is this?
/kind api-change
What this PR does / why we need it:
This promotes "storage capacity tracking" to beta and changes its usage as agreed upon during the KEP review for beta (optional ownership, labels used to identify the objects managed by each instance).
Which issue(s) this PR fixes:
Fixes #465
Special notes for your reviewer:
Based on #579
Does this PR introduce a user-facing change?: