storage capacity: object ownership #583
@@ -34,6 +34,7 @@ import (
 	v1 "k8s.io/api/core/v1"
 	storagev1 "k8s.io/api/storage/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/apimachinery/pkg/labels"
 	"k8s.io/apimachinery/pkg/runtime/schema"
 	utilfeature "k8s.io/apiserver/pkg/util/feature"
 	"k8s.io/client-go/informers"
@@ -92,7 +93,7 @@ var (
 	enableCapacity = flag.Bool("enable-capacity", false, "This enables producing CSIStorageCapacity objects with capacity information from the driver's GetCapacity call.")
 	capacityImmediateBinding = flag.Bool("capacity-for-immediate-binding", false, "Enables producing capacity information for storage classes with immediate binding. Not needed for the Kubernetes scheduler, maybe useful for other consumers or for debugging.")
 	capacityPollInterval = flag.Duration("capacity-poll-interval", time.Minute, "How long the external-provisioner waits before checking for storage capacity changes.")
-	capacityOwnerrefLevel = flag.Int("capacity-ownerref-level", 1, "The level indicates the number of objects that need to be traversed starting from the pod identified by the POD_NAME and POD_NAMESPACE environment variables to reach the owning object for CSIStorageCapacity objects: 0 for the pod itself, 1 for a StatefulSet, 2 for a Deployment, etc.")
+	capacityOwnerrefLevel = flag.Int("capacity-ownerref-level", 1, "The level indicates the number of objects that need to be traversed starting from the pod identified by the POD_NAME and POD_NAMESPACE environment variables to reach the owning object for CSIStorageCapacity objects: -1 for no owner, 0 for the pod itself, 1 for a StatefulSet or DaemonSet, 2 for a Deployment, etc.")
 
 	enableNodeDeployment = flag.Bool("node-deployment", false, "Enables deploying the external-provisioner together with a CSI driver on nodes to manage node-local volumes.")
 	nodeDeploymentImmediateBinding = flag.Bool("node-deployment-immediate-binding", true, "Determines whether immediate binding is supported when deployed on each node.")
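The new value -1 disables the owner reference entirely. For illustration, the traversal that the level describes can be sketched roughly as follows (a simplified sketch, not the real owner package; ownerAtLevel and its get callback are made up for this example):

package example

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ownerAtLevel illustrates what --capacity-ownerref-level means: starting
// from the pod's metadata, follow the controlling ownerReference "level"
// times. get is a hypothetical callback that fetches the object an
// OwnerReference points to.
func ownerAtLevel(start metav1.Object, level int, get func(ref *metav1.OwnerReference) (metav1.Object, error)) (metav1.Object, error) {
	if level < 0 {
		// -1: CSIStorageCapacity objects get no owner at all.
		return nil, nil
	}
	obj := start // level 0: the pod itself
	for i := 0; i < level; i++ {
		// Level 1 reaches e.g. a StatefulSet or DaemonSet directly; for a
		// Deployment pod, level 1 is the ReplicaSet and level 2 the Deployment.
		ref := metav1.GetControllerOf(obj)
		if ref == nil {
			return nil, fmt.Errorf("%q has no controlling owner at level %d", obj.GetName(), i)
		}
		next, err := get(ref)
		if err != nil {
			return nil, err
		}
		obj = next
	}
	return obj, nil
}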
@@ -393,39 +394,30 @@ func main() {
 		nodeDeployment,
 	)
 
-	provisionController = controller.NewProvisionController(
-		clientset,
-		provisionerName,
-		csiProvisioner,
-		serverVersion.GitVersion,
-		provisionerOptions...,
-	)
-
-	csiClaimController := ctrl.NewCloningProtectionController(
-		clientset,
-		claimLister,
-		claimInformer,
-		claimQueue,
-		controllerCapabilities,
-	)
-
 	var capacityController *capacity.Controller
 	if *enableCapacity {
-		podName := os.Getenv("POD_NAME")
-		namespace := os.Getenv("POD_NAMESPACE")
-		if podName == "" || namespace == "" {
-			klog.Fatalf("need POD_NAMESPACE/POD_NAME env variables, have only POD_NAMESPACE=%q and POD_NAME=%q", namespace, podName)
+		namespace := os.Getenv("NAMESPACE")
+		if namespace == "" {
+			klog.Fatal("need NAMESPACE env variable for CSIStorageCapacity objects")
 		}
-		controller, err := owner.Lookup(config, namespace, podName,
-			schema.GroupVersionKind{
-				Group:   "",
-				Version: "v1",
-				Kind:    "Pod",
-			}, *capacityOwnerrefLevel)
-		if err != nil {
-			klog.Fatalf("look up owner(s) of pod: %v", err)
+		var controller *metav1.OwnerReference
+		if *capacityOwnerrefLevel >= 0 {
+			podName := os.Getenv("POD_NAME")
+			if podName == "" {
+				klog.Fatal("need POD_NAME env variable to determine CSIStorageCapacity owner")
+			}
+			var err error
+			controller, err = owner.Lookup(config, namespace, podName,
+				schema.GroupVersionKind{
+					Group:   "",
+					Version: "v1",
+					Kind:    "Pod",
+				}, *capacityOwnerrefLevel)
+			if err != nil {
+				klog.Fatalf("look up owner(s) of pod: %v", err)
+			}
+			klog.Infof("using %s/%s %s as owner of CSIStorageCapacity objects", controller.APIVersion, controller.Kind, controller.Name)
 		}
-		klog.Infof("using %s/%s %s as owner of CSIStorageCapacity objects", controller.APIVersion, controller.Kind, controller.Name)
 
 		var topologyInformer topology.Informer
 		if nodeDeployment == nil {
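Because the owner is now optional, downstream code receives a *metav1.OwnerReference that may be nil. A minimal sketch of how such an optional owner could be attached to new objects (applyOwner is a hypothetical helper, not code from this PR):

package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// applyOwner attaches the optional owner to an object's metadata. A nil owner
// corresponds to --capacity-ownerref-level=-1: the object is created without
// an ownerReference and has to be cleaned up by other means.
func applyOwner(objMeta *metav1.ObjectMeta, owner *metav1.OwnerReference) {
	if owner == nil {
		return
	}
	objMeta.OwnerReferences = append(objMeta.OwnerReferences, *owner)
}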
@@ -447,11 +439,23 @@ func main() {
 			topologyInformer = topology.NewFixedNodeTopology(&segment)
 		}
 
+		managedByID := "external-provisioner"
+		if *enableNodeDeployment {
+			managedByID += "-" + node
+		}
+
 		// We only need objects from our own namespace. The normal factory would give
-		// us an informer for the entire cluster.
+		// us an informer for the entire cluster. We can further restrict the
+		// watch to just those objects with the right labels.
 		factoryForNamespace = informers.NewSharedInformerFactoryWithOptions(clientset,
 			ctrl.ResyncPeriodOfCsiNodeInformer,
 			informers.WithNamespace(namespace),
+			informers.WithTweakListOptions(func(lo *metav1.ListOptions) {
Reviewer: I'm wondering how easy it is to detect if the informer doesn't fetch the objects we expect, and whether we need to unit test this. If the informer logic is wrong, two things could happen, and the latter case is hard to detect. So maybe a unit test is worthwhile (if it's not too much effort)?

Reviewer: Actually, since this is just an options configuration, the logic is simple enough, so it's probably low risk. Totally up to you if you want to unit test this.

Author: I'd prefer not to test this. I suspect that it wouldn't be easy to test comprehensively, and in the end it is more about correct implementation of the informer and apiserver filtering than about correct configuration.
+				lo.LabelSelector = labels.Set{
+					capacity.DriverNameLabel: provisionerName,
+					capacity.ManagedByLabel:  managedByID,
+				}.AsSelector().String()
+			}),
 		)
 
 		capacityController = capacity.NewCentralCapacityController(
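On the unit-test question above: a test of just the tweak function would be small and self-contained. A rough sketch, using stand-in label keys instead of the real capacity.DriverNameLabel and capacity.ManagedByLabel constants (all names and values below are illustrative):

package example

import (
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// Stand-ins for capacity.DriverNameLabel and capacity.ManagedByLabel.
const (
	driverNameLabel = "example.com/drivername"
	managedByLabel  = "example.com/managed-by"
)

// TestTweakListOptions verifies that the tweak function produces a label
// selector that matches objects with exactly the expected labels and rejects
// objects belonging to a different driver.
func TestTweakListOptions(t *testing.T) {
	provisionerName := "hostpath.csi.k8s.io"
	managedByID := "external-provisioner-node-1"

	tweak := func(lo *metav1.ListOptions) {
		lo.LabelSelector = labels.Set{
			driverNameLabel: provisionerName,
			managedByLabel:  managedByID,
		}.AsSelector().String()
	}

	var lo metav1.ListOptions
	tweak(&lo)

	selector, err := labels.Parse(lo.LabelSelector)
	if err != nil {
		t.Fatalf("selector %q does not parse: %v", lo.LabelSelector, err)
	}
	if !selector.Matches(labels.Set{driverNameLabel: provisionerName, managedByLabel: managedByID}) {
		t.Errorf("selector %q does not match the expected labels", lo.LabelSelector)
	}
	if selector.Matches(labels.Set{driverNameLabel: "other.example.com", managedByLabel: managedByID}) {
		t.Errorf("selector %q unexpectedly matches a different driver", lo.LabelSelector)
	}
}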
@@ -460,7 +464,8 @@ func main() {
 			clientset,
 			// Metrics for the queue is available in the default registry.
 			workqueue.NewNamedRateLimitingQueue(rateLimiter, "csistoragecapacity"),
-			*controller,
+			controller,
+			managedByID,
 			namespace,
 			topologyInformer,
 			factory.Storage().V1().StorageClasses(),
@@ -469,8 +474,27 @@ func main() {
 			*capacityImmediateBinding,
 		)
 		legacyregistry.CustomMustRegister(capacityController)
+
+		// Wrap Provision and Delete to detect when it is time to refresh capacity.
+		csiProvisioner = capacity.NewProvisionWrapper(csiProvisioner, capacityController)
 	}
+
+	provisionController = controller.NewProvisionController(
+		clientset,
+		provisionerName,
+		csiProvisioner,
+		serverVersion.GitVersion,
+		provisionerOptions...,
+	)
+
+	csiClaimController := ctrl.NewCloningProtectionController(
+		clientset,
+		claimLister,
+		claimInformer,
+		claimQueue,
+		controllerCapabilities,
+	)
 
 	// Start HTTP server, regardless whether we are the leader or not.
 	if addr != "" {
 		// To collect metrics data from the metric handler itself, we
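The NewProvisionController and NewCloningProtectionController calls are moved below the capacity block so that the provision controller is constructed with the wrapped csiProvisioner. The wrapper itself follows a plain decorator pattern; a rough sketch of the idea with a deliberately simplified interface (the real controller.Provisioner interface and capacity controller API differ):

package example

import "context"

// provisioner is a simplified stand-in for the real controller.Provisioner
// interface.
type provisioner interface {
	Provision(ctx context.Context, name string) error
	Delete(ctx context.Context, name string) error
}

// refresher stands in for the part of the capacity controller that is told
// when capacity may have changed.
type refresher interface {
	RefreshCapacity()
}

// provisionWrapper passes calls through to the wrapped provisioner and
// triggers a capacity refresh after each successful Provision or Delete.
type provisionWrapper struct {
	provisioner
	r refresher
}

func (w *provisionWrapper) Provision(ctx context.Context, name string) error {
	err := w.provisioner.Provision(ctx, name)
	if err == nil {
		w.r.RefreshCapacity()
	}
	return err
}

func (w *provisionWrapper) Delete(ctx context.Context, name string) error {
	err := w.provisioner.Delete(ctx, name)
	if err == nil {
		w.r.RefreshCapacity()
	}
	return err
}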
Reviewer: Add: "Setting an owner reference is highly recommended whenever possible (i.e. in the most common case that drivers are run inside containers)."

Reviewer: What's the worst case scenario if orphaned objects aren't cleaned up?

For the case when all orphaned CSIStorageCapacity objects report not enough capacity for the volume to be provisioned, even though the driver is no longer there: the pod will fail to schedule and stay pending, whereas if objects are cleaned up properly, the pod is scheduled and the volume fails provisioning, also causing the pod to stay pending. Once the driver comes back, CSIStorageCapacity objects will reflect the actual state again. So I think we are good in this case.

When we video chatted, I think we talked about a case where orphaned objects could cause the scheduler to hang? But I totally forgot which case it was :(. Maybe it's actually OK?

Author: Added.

As you said, orphaned and incorrect objects cause no permanent damage for pods with normal PVCs (which includes generic ephemeral volumes), so it's not too bad. It's worse for CSI ephemeral volumes, because then a pod might get scheduled onto a node where the driver cannot provide the volume, and the pod is stuck there (no rescheduling). I think that was what we chatted about, not the scheduler itself hanging.

Reviewer: Ah yeah. Could you maybe mention this in the design doc somewhere (if it's not there already)?

Author: It is covered by the generic ephemeral volume KEP.