
cluster-driver-registrar permissions #44

Closed
kfox1111 opened this issue Mar 25, 2019 · 38 comments

@kfox1111

The cluster-driver-registrar design feels like it has some issues with respect to permissions, and encourages what I think is an antipattern.

The trend of loading controllers into k8s and giving them enough access to self-register CRDs and other things has always felt a little bad. If you're deploying with helm, helm is already dealing with registering objects and has all the permissions needed to register them. Those permissions stop applying after it's installed, so they can't be used to do further things to the cluster. This seems good. But if you give a long-running process permission to do such things, then it can be used to do bad things later. Another facet is auditing. If it's in helm, it can easily be parsed by a human and verified to be safe before running. That is very hard to do if it's buried inside code.

@saad-ali
Member

Spoke with @msau42 about this.

Sounds like the proposal is to keep the CSIDriver object, but instead of having it generated by the cluster-driver-registrar sidecar container, just have the driver author provide the manifest for the CSIDriver and have it created as part of driver deployment. This will also make cleanup easier.

Overall seems reasonable and will reduce complexity. So I support it.
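
For illustration, a minimal driver-provided manifest might look like the following sketch (the driver name is a placeholder and the field values are assumptions; attachRequired and podInfoOnMount are the fields the CSIDriver object carried at the time):

  apiVersion: storage.k8s.io/v1beta1
  kind: CSIDriver
  metadata:
    name: example.csi.vendor.com   # placeholder driver name
  spec:
    attachRequired: true           # driver needs the external attacher
    podInfoOnMount: false          # driver does not need pod info on mount

Deploying and deleting this alongside the rest of the driver's manifests is what makes cleanup straightforward.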

@kfox1111
Author

Yeah. That's what I'm thinking.

@msau42
Collaborator

msau42 commented Mar 25, 2019

This also eliminates the need to run a long-running sidecar that uses up memory but isn't actively reconciling anything.

@lpabon
Member

lpabon commented Mar 27, 2019

I'm ok with this as long as we document somewhere what to do.

@msau42
Collaborator

msau42 commented Apr 1, 2019

We decided in today's CSI Meeting to go ahead and discontinue development of the cluster-driver-registrar. We need to deprecate and update docs:

  • update README with deprecation
  • update csi-docs: CSINode, CSIDriver, and cluster-driver-registrar pages
  • update hostpath deployment example to create objects directly
  • update e2es to not use cluster-driver-registrar

@msau42
Collaborator

msau42 commented Apr 1, 2019

/assign @lpabon

@j-griffith

One thing I started looking at was the need to be able to query a plugin's capabilities (i.e. from an external controller; perhaps I want to see if the provided storage supports Snapshots, Cloning, etc.). I think it's a fairly important usage feature. I was chatting with @lpabon and @msau42 about this and where it would live; it sounds like removing this may or may not have some implications.

@kfox1111
Author

Being able to inspect the driver in order to build an appropriate csidriver object would be useful. But having a whole agent for that running all the time sounds like too much to me. What about a Go CLI tool that does the inspection and dumps a first stab at a csidriver object to stdout?
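
For illustration, such a hypothetical one-shot tool (the tool name, flag, and output below are all made up) might call the driver's GetPluginInfo and ControllerGetCapabilities RPCs over the CSI socket and print a starting-point manifest:

  # Hypothetical: csi-inspect --csi-address unix:///csi/csi.sock
  apiVersion: storage.k8s.io/v1beta1
  kind: CSIDriver
  metadata:
    name: example.csi.vendor.com   # from the GetPluginInfo response
  spec:
    attachRequired: true           # derived from the PUBLISH_UNPUBLISH_VOLUME capability
    podInfoOnMount: false          # not discoverable from the driver; packager fills this in

The packager could review the output, adjust it, and commit it into the chart or static manifests.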

@lpabon
Member

lpabon commented Apr 23, 2019

@kfox1111 I think it would be tough to ask customers to have a separate tool. I think the only CLI we can depend on is kubectl. We could have a command instead of an object, but in the end, kubectl/the Kube API is the only control plane.

@kfox1111
Author

kfox1111 commented Apr 23, 2019

I think there is a disconnect here...

I don't expect customers to build the kubernetes packaging themselves. They will deploy static manifests or helm charts to load the k8s objects that load the drivers into their cluster. As such, they will have an already-built csidriver object as part of the deployment.

The separate tool could optionally be used by the person building the k8s packaging for the driver.

@lpabon
Member

lpabon commented Apr 23, 2019

@kfox1111 Yeah, I see where you are going. Do you mind attending tomorrow's kube-csi meeting to discuss it with the team?

@j-griffith

I think there is a disconnect here...

I don't expect customers to build the kubernetes packaging themselves. They will deploy static manifests or helm charts to load the k8s objects that load the drivers into their cluster. As such, they will have an already-built csidriver object as part of the deployment.

The separate tool could optionally be used by the person building the k8s packaging for the driver.

Ahh, yeah, that doesn't help my use case. Say, for example, I have a controller running that periodically takes snapshots; it would be useful for that controller to be able to determine whether the PVC assigned to it even supports snapshots. Anyway, this seems like a good primer for discussing it in tomorrow's meeting.

@kfox1111
Author

I can try. It's been historically hard for me to make it, though.

@kfox1111
Author

@j-griffith that's what csidriver is for, though, I think? You can query k8s to see what features the csidriver supports.

@kfox1111
Author

Spoke too soon. Looks like it doesn't have a flag today to say whether it supports snapshots.

Maybe it should, though?

@j-griffith

Spoke too soon. Looks like it doesn't have a flag today to say whether it supports snapshots.

Maybe it should, though?

Yep, I was getting at proposing something to actually reflect the response of a full GetCapabilities call. I'm not sure how to request a describe on the csidriver object; anyway, that's why I raised the point: gather all the pieces and bits of knowledge and see what we can come up with. It may not have any impact on the registrar removal at all.

@msau42
Collaborator

msau42 commented Apr 23, 2019

@j-griffith to support your use case, I think there would be two pieces to implement:

  • Add capabilities to the CSIDriver object
  • Add a tool that can automatically generate a CSIDriver object

I agree with @kfox1111 that we shouldn't need a long-running process to do this; some outside utility or a short-lived sidecar should be sufficient.

Now there's also the issue that some volume features don't actually have a specific capability (like raw block), and that capability can only be determined by trying to use it and getting back a runtime error.
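
For illustration, the first bullet might look something like this sketch of an extended CSIDriver object (the capabilities field does not exist in the real API; it and its values are hypothetical):

  apiVersion: storage.k8s.io/v1beta1
  kind: CSIDriver
  metadata:
    name: example.csi.vendor.com
  spec:
    attachRequired: true
    podInfoOnMount: false
    capabilities:                # hypothetical field, not part of the CSIDriver API
      - CREATE_DELETE_SNAPSHOT   # discoverable from the driver's reported capabilities
      - EXPAND_VOLUME            # discoverable from the driver's reported capabilities
      - RAW_BLOCK                # no CSI capability for this; would have to be asserted by hand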

@msau42
Collaborator

msau42 commented Apr 23, 2019

@j-griffith can you describe your controller in more detail and what it does? It watches PVCs and then calls CreateSnapshot on the csi driver?

@kfox1111
Author

Now there's also the issue that some volume features don't actually have a specific capability (like raw block), and that capability can only be determined by trying to use it and getting back a runtime error.

This could still be made a flag in CSIDriver and annotated by the packager. It just can't be done automatically.

@j-griffith

j-griffith commented Apr 23, 2019

@msau42 That was an arbitrary example, but I'll expand on the idea a bit; I think this deserves its own issue (not sure which project it would belong to, but I lean towards the Kubernetes API). There are two things that stand out IMO:

  1. In general, a user should be able to determine what capabilities the underlying storage they have access to provides. Whether they have multiple SCs or a single SC, the only option I currently see for checking whether optional capabilities are supported is "try it and see if it works", whether that be Snapshot, Extend, Block Mode, Cloning (when it lands), etc.

  2. More specifically, in my contrived use case: say I have a controller that creates PVCs and launches some application pod that persists its data to the assigned PVC. Let's also say one of the things that controller does is monitor the capacity usage of that PVC and expand it as necessary; or, back to the Snapshot case, maybe I just have a simple Snapshot scheduler running. Ideally I'd be able to check with Kubernetes to determine whether the features I require are even available BEFORE creating volumes, deploying my app, and running for some period of time, only to then try to execute some operation and have it fail.

All of this is more about usability for a standard cluster user, not a developer or a cluster admin. Don't know if that helps or not; I can be more precise if needed. The problem with "optional" capabilities is that they're "optional", so as a user I would like to have an authoritative source to determine what capabilities are or aren't available to me.

@kfox1111
Author

Part 1 is interesting in that csidriver wouldn't necessarily solve it either, as there is a disconnect between driver and storageclass. Users can't necessarily see the storage class details they are using.

So part 2 sounds like something like an operator, automating what the person might do in part 1.

This is an interesting set of use cases for sure. Definitely bigger than this container, though. This almost feels like it's big enough to need a KEP?

@msau42
Collaborator

msau42 commented Apr 23, 2019

Thanks for the details, this helps me understand what perspectives we're looking at this from.

I'd like to expand a little bit on each persona:

  • A platform admin that wants to decide what drivers to use in their cluster and maybe is doing initial investigation of the storage landscape and all the options available. At this point the driver is not installed and so there's nothing to query. I think the best option for discovering capabilities in this case is through documentation. We're trying to do something like this in our csi drivers page
  • A Kubernetes user wants to figure out what storage options are available to them in the cluster. Ideally, the platform team should have documented and described their supported options and what StorageClasses are available, but if not, then querying the StorageClass and/or CSIDriver objects could be a possible solution. However, since those objects are admin-scoped, those users must be granted read access to them. And just because those options exist doesn't mean that the users can use them; the admin may have set things like StorageClass quota per namespace.
  • A Kubernetes controller wants to automate some volume operations but wants to fail fast if the underlying driver doesn't support it. This is something that I think CSIDriver can help with, although without seeing a specific example, I see it more of an optimization. The operations themselves should already be failing with appropriate error messages.

@j-griffith

Part 1 is interesting in that csidriver wouldn't necessarily solve it either, as there is a disconnect between driver and storageclass. Users can't necessarily see the storage class details they are using.

So part 2 sounds like something like an operator, automating what the person might do in part 1.

This is an interesting set of use cases for sure. Definitely bigger than this container, though. This almost feels like it's big enough to need a KEP?

Yeah, I think the gist of what I'm thinking here is to be able to use the Kubernetes API to say "hey, what capabilities are available for StorageClass foo, bar, and biz?" Then, if I need/want to check a Volume for that same info, I can easily derive it from the PVC object.

I apologize that I'm not completely familiar with what's being suggested WRT the CSIDriver object; I believe folks are talking about an internal object that again isn't exposed via the Kube API.

I'm inclined to agree that the registrar isn't where something like this should live.

@msau42
Collaborator

msau42 commented Apr 23, 2019

CSIDriver is a K8s API object that currently contains some information about an installed driver: https://kubernetes-csi.github.io/docs/csi-driver-object.html

@j-griffith

j-griffith commented Apr 23, 2019

CSIDriver is a K8s API object that currently contains some information about an installed driver: https://kubernetes-csi.github.io/docs/csi-driver-object.html

  kubectl get csidrivers.storage.k8s.io
  error: the server doesn't have a resource type "csidrivers"

@msau42
Collaborator

msau42 commented Apr 23, 2019

Are you running in a 1.14 cluster?

We're currently recommending that drivers create the CSIDriver object manually as part of their deployment specs, instead of running the cluster-driver-registrar.

@kfox1111
Author

kfox1111 commented Apr 24, 2019

Yeah, with my user persona hat on, I can see the problem. Both storageclass and csidriver are kind of scoped to be cluster-admin tools. There isn't really a user-facing object with the required info in it. Usually that would be stuck in documentation somewhere provided by the admin, saying what storageclasses exist and what they support: the gold storage class provides better HA, the fast storage class provides x IOPS... But being able to ask the cluster which storageclasses support snapshotting is really important too...

The csidriver should provide the needed info, but the user can't see it, plus they don't use it directly anyway. The storageclass is what they use, but they typically can't see that either...

The problem, I guess, is that you need to see some of the csidriver info through the lens of a user looking at the storage class... Maybe an option would be to extend the storageclass API to have a functionality endpoint?

So, in RBAC, something like:

  - apiGroups:
    - "storage.k8s.io"
    resources:
    - storageclasses/functionality
    verbs:
    - get

That would allow the user to ask for the safe bits of the associated csidriver object via the corresponding storageclass?
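
For illustration, a hypothetical response from such a subresource might look like the sketch below (neither the subresource nor any of these fields exist today):

  # Hypothetical: GET /apis/storage.k8s.io/v1/storageclasses/fast/functionality
  storageClass: fast
  driver: example.csi.vendor.com   # resolved from the class's provisioner field
  capabilities:
    snapshots: true
    expansion: true
    rawBlock: false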

Definitely into needing a KEP though I think.

@lpabon
Member

lpabon commented Apr 26, 2019

I think we may be overengineering this. At its simplest, we use cluster-driver-registrar as it is today: one container creating a CSIDriver object that provides information about the driver to the caller (user/admin).

The issue is that this is a running process wasting memory, since once the CSIDriver object is created, it no longer needs to update it.

We have discussed the possibility of changing cluster-driver-registrar from a running loop to a begin-end program so that it can be run in a Job. That way it will:

  1. Create/update the CSIDriver object after communicating with the csi driver, then stop.
  2. Not waste memory.

If this is not desired, we can have another sidecar container (the provisioner?) register the CSIDriver. It is already running and it has leader support.
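
For illustration, the Job variant might look like the following sketch (the image, service account, socket path, and the --run-once flag are all assumptions; only --csi-address mirrors the existing sidecar's flags):

  apiVersion: batch/v1
  kind: Job
  metadata:
    name: csi-driver-register
  spec:
    template:
      spec:
        serviceAccountName: csi-driver-registrar   # needs RBAC to create/update CSIDriver objects
        restartPolicy: OnFailure
        containers:
          - name: cluster-driver-registrar
            image: registry.example.com/cluster-driver-registrar:run-once   # illustrative image
            args:
              - --csi-address=/csi/csi.sock
              - --run-once                          # hypothetical begin-end mode
            volumeMounts:
              - name: socket-dir
                mountPath: /csi
        volumes:
          - name: socket-dir
            hostPath:
              path: /var/lib/csi/sockets/pluginproxy/   # illustrative socket location
              type: Directory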

@msau42
Collaborator

msau42 commented Apr 26, 2019

I spoke with @liggitt about the self-registration and permissions aspects of the original concern. The fact that this is a controller-level deployment, and can therefore be run on dedicated/restricted nodes (which can control specifically what workloads run on them), means that we're not as concerned about this level of self-reporting as we would be about something like self-reporting Nodes.

@kfox1111's concern about auditing I think still applies. It's not easy to see the spec before deployment, since it's created on demand. But I think manually crafting the CSIDriver object also makes it more likely that it will get out of date (and maybe that's ok). Maybe we can support both models? The sidecar creates the object on demand, but it also has a "dry run" or "offline" mode that only writes the spec out to a file.
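
For illustration, the "offline" mode might be wired up like this fragment of a container spec (neither --dry-run nor --output exists in the sidecar today; both are hypothetical):

  containers:
    - name: cluster-driver-registrar
      image: registry.example.com/cluster-driver-registrar:offline   # illustrative
      args:
        - --csi-address=/csi/csi.sock
        - --dry-run                      # hypothetical: don't create the object in the API
        - --output=/out/csidriver.yaml   # hypothetical: write the spec to a file for audit

The emitted file could then be reviewed and checked into the deployment manifests, addressing the auditing concern while keeping the spec derived from the driver itself.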

@lpabon
Member

lpabon commented Apr 26, 2019

@kfox1111's concern about auditing I think still applies. It's not easy to see the spec before deployment, since it's created on demand. But I think manually crafting the CSIDriver object also makes it more likely that it will get out of date (and maybe that's ok). Maybe we can support both models? The sidecar creates the object on demand, but it also has a "dry run" or "offline" mode that only writes the spec out to a file.

I think we can leave it up to the vendors, then, to decide which model to use.

@kfox1111
Author

It's also up to the operators, which is my primary hat. I'd much rather see a CSIDriver object up front than give a driver access to register it itself. If you let the vendors do too much, it makes it harder for the operators to validate. So no easy answer there.

@kfox1111
Author

Also, the risk of getting out of date applies to all of the packaging, not just the csidriver object. For example, what if the driver starts saying "I support snapshots" but the packaging doesn't yet add the sidecar for snapshot support? Having the csidriver packaging decoupled from the driver allows the specific csidriver info to live at the package level, stating what the whole k8s driver (csi driver + k8s packaging) supports.

@liggitt

liggitt commented Apr 26, 2019

Note that I'm not opposed to the admin-created object; I just wasn't especially concerned about drivers self-reporting.

I do think it needs to be clearer whether the CSIDriver spec is:

  • a tool for the storage-admin to control what features are enabled for a specific driver (which, to work, must be a subset of the features supported by the driver)
  • a recording of the observed or reported features supported by the driver

@kfox1111
Author

Hmm, I wasn't thinking of it that way, but that is interesting. I did not see it as an admin object but a packager one. So maybe there really are 3 sets of different information:

  • the information from the csi driver on what features it supports
  • the information from the k8s packager that wraps the csi driver, stating what the packaged version supports
  • a way for the admin to turn off support for a feature they don't want supported (disable snapshot support, for example)

Support for the admin to override it might be useful.

I still think the csidriver object should be a packager object, though. If the driver were fully self-describing, it would have enough info that you could run a command-line tool against the csi driver and fully generate all the kubernetes packaging automatically: statefulsets/daemonsets/csidriver/rbac/etc. Doing just csidriver but not the rest feels like it could cause issues.

@jsafrane
Contributor

jsafrane commented Jul 2, 2019

Did we reach any conclusion here? Our documentation says:

As of Kubernetes 1.14, this side car container is being refactored.

It has been two releases without any progress in the refactoring / redesign. The documentation also says that CSIDriver instances should be created "by installation manifest".

I am proposing deprecation of this sidecar in #48.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 30, 2019
@msau42
Collaborator

msau42 commented Oct 9, 2019

All documentation has been updated to mark this as deprecated and e2e tests updated.
/close

@k8s-ci-robot
Contributor

@msau42: Closing this issue.

In response to this:

All documentation has been updated to mark this as deprecated and e2e tests updated.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
