Running kubelet in a container: mounts #6848
Comments
It might work. Where's the proof-of-concept? I don't see why you need a `MounterFactory` - a single injected …
@thockin no POC yet; wanted to get a write-up out there first.
Didn't mean to be curt. I really meant "it might work, it's worth investing …". Quick test bears fruit: `$ sudo ls -l /proc/1/ns/mnt` and `$ docker run -ti --privileged -v /proc:/realproc busybox ls -l …`
Natch 👍 I did think you were talking about a POC of kubelet running in a container with the above changes. I've done the same test you did:
Well then, the next step is to hack kubelet to do volume mounts through an …
@thockin yep, agree
For the record, this also works:
I didn't realize you could …
Yeah, when you said Factory I thought … (interface?). It's true, Java does unpleasant things to your brain…
The mount helpers are indeed a tricky issue. I implemented a POC (aka hack) by intercepting the mount(8) call with LD_PRELOAD. You can find the details in my code here.
Nice hack, I like it! An alternative to fork/exec-ing …
@vmarmol Yep, this is definitely a short-term hack.
👍
I would say that getting mount to live in a container, but be able to mount things on the host, would be the holy grail here. So, for example, you could have a container with /usr/sbin/mount.glusterfs instead of it needing to live in the root filesystem as is required today (or with this hack). I'm not arguing against this hack, but more progress in line with what @rootfs is talking about could make it unnecessary (some day).
Agree re: holy grail @eparis
The downside to putting mount.glusterfs into a container is that container …
But isn't updating such a container easier than updating binaries on your host/VM (which also have to track some upstream source)?
Also, if this is something like FUSE, where you must have a daemon running for the mount to function, wouldn't it be nice if that daemon were in a container, rather than running on the host itself?
The host mount namespace should have access to the container's filesystem. Kinda dirty (break out of the container's mount, find the container, use that FS for some things), but possible. It'd save us from depending on the host's.
Yeah, but that is Someone Else's Problem :)
Well, yes, of course, but I am not sure I see the connection. Maybe I …
I'm suggesting a completely separate mount_gluster container, which the kubelet could use to get a glusterfs mounted on the host (which docker can then put into another unrelated container). Now both the host and the kubelet container can be completely ignorant of gluster. It also wouldn't then matter whether the kubelet was in a container or not, since the thing doing the mount would always be in a container.

Paul's trick here works as long as the host has what it needs to mount the filesystem in question. But given a system with, say, the functionality of boot2docker, his trick cannot solve the problem: he escapes into the host, but the host doesn't have the functionality. Even putting the binaries in the kubelet container doesn't help a ton. You could, I guess, escape to the host, do some crazy LD_PRELOAD and PATH magic such that you ran the stuff back in the kubelet container, and execute the mount that way. But I'm not certain how to really make that work when you need a daemon running (like all FUSE filesystems). Nothing about mounting + mount namespaces is easy :)

This way, everything related to mounting gluster can be delivered by the "gluster" team. Same for other filesystems. (Although a generic mount(8) container could work for ext[2-4], tmpfs, and xfs, since they don't need helpers.)
Makes sense. I wish we had a per-node pod controller…
@erictune @thockin @smarterclayton @eparis @vmarmol @rootfs One thing I ran into today is that the NFS plugin uses its own mounter abstraction with a different API that calls mount in a shell instead of making the syscall. I'm factoring that into its own mounter implementation that conforms to the interface, but it got me thinking (and I think the discussion here headed in the same direction) about the need to line up different mounter implementations with different plugins under different circumstances. Presumably there will be a need to differentiate the mounters different plugins need when running under a container.

I think if we can make all the plugins use and be injected with the same mount interface, it will be a good start toward allowing the above. We can differentiate the mounter used for each plugin type at the call site.

Really, the specific method I've suggested here using the super privileged container approach is a temporary means. The long-term value we'll get from this is probably more in the dimension of making it easier to implement new strategies for dealing with running in a container on a plugin-by-plugin basis.
Addendum: all of the strategies we've discussed in this issue so far would IMO be implementable behind …
Why can't we have it? We deserve nice things!
There's already a PR open for `mount.Interface` in terms of exec.
@thockin thanks for the heads up. I like the nascent interface in #6400 better; I would love for this PR to go in on top of that. So, that said, I'm not going to try to rework all the volume stuff now -- just get the basic volumes working in a container. Once #6400 goes in, I'll rebase on top of it and pick up the new interface.
Throwing something out there: if you run boot2docker, could you not run a service on the boot2docker VM that performs the mount and has a RESTful interface? In that case the mounter implementation can make a REST call to the mounter service.
I think we can close this out now that PRs are going into master, and open new issues as things develop.
Related to: #4869
This looks related: moby/moby#17034
Currently when the kubelet is run in a container, the mounts that the kubelet performs are not visible to the containers in pods because the kubelet runs in its own mount namespace with a private shared-subtree mode. In order to run the kubelet in a container, we must find a way to have the kubelet perform mounts in the host's mount namespace.
How to do the mount

The mount must be performed from the root mount namespace. Ideally, it would be possible to run a container in the host's root mount namespace with something like `docker run --mnt='host'`. However, this is currently not possible, although it has been requested. That being the case, one option is the super privileged container concept. The basic formula for a super privileged container to execute a command in the host's mount namespace is:

1. The container image contains the `nsenter` bits
2. `docker run -v /:/host` so that the host's `/proc` is visible inside the container
3. `host_mnt_ns=$(</host/proc/1/ns/mnt)`
4. `nsenter --mount=$host_mnt_ns <some command>`
Wrinkle: mount helpers

When `mount -t <fstype>` is invoked, it looks for a mount helper named `mount.<fstype>` to delegate the mount operation to. In order for a containerized kubelet to be able to mount all filesystem types in the manner described here, the mount helpers would need to be installed on the host.

Factoring kubernetes

Currently the volume plugins use the `mount.Interface` interface to perform mounts, which serves our purposes for keeping volume plugin code orthogonal from what is performing the mount. We can make an implementation of this interface which handles execing a subprocess that `nsenter`s into the host's mount namespace to perform the mount, without requiring any changes to the interface itself.

Currently the volume plugins create the instance of `mount.Interface` directly using the `mount.New()` method. In order to facilitate injecting an `nsenter`ing mounter, we could instead provide volume plugins with a new `MounterFactory` (via the `Host` interface) which they can use to get a mounter.

The `MounterFactory` should be an exported field on the Kubelet so that the creator of the Kubelet can provide whatever `MounterFactory` implementation they want. This will facilitate using alternate implementations in downstream projects, integration tests, etc.