This repository has been archived by the owner on Dec 13, 2018. It is now read-only.

Propagate container's mountpoint to the host #632

Closed
wants to merge 1 commit into from

Conversation


@rootfs rootfs commented Jun 15, 2015

Currently a container has to nsenter the host's mount namespace to mount a filesystem and
share it with other containers. This approach doesn't work if the filesystem mount
calls a helper utility (/sbin/mount.XXX). This limitation makes a containerized kubelet unable to mount certain filesystems.

This commit provides a new flag to make the rootfs shareable. Since moving a shared rootfs is semantically confusing for pivot_root(2) and MS_MOVE, a new function changeRoot() is provided to switch the rootfs to the new destination.

Signed-off-by: Huamin Chen [email protected]
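The distinction the commit draws can be made concrete: whether the current rootfs is shared is visible in the optional fields of /proc/self/mountinfo. A minimal sketch of such a check in Go follows; the function name is mine, not from the patch:

```go
package main

import (
	"fmt"
	"strings"
)

// isSharedMount reports whether a /proc/self/mountinfo line carries a
// "shared:N" peer-group tag. Optional fields sit between the mount
// options (the 7th field) and the "-" separator.
func isSharedMount(line string) bool {
	fields := strings.Fields(line)
	for i := 6; i < len(fields); i++ {
		if fields[i] == "-" {
			break
		}
		if strings.HasPrefix(fields[i], "shared:") {
			return true
		}
	}
	return false
}

func main() {
	shared := "36 35 98:0 / / rw shared:1 - ext4 /dev/sda1 rw"
	private := "36 35 98:0 / / rw - ext4 /dev/sda1 rw"
	fmt.Println(isSharedMount(shared), isSharedMount(private)) // true false
}
```

With a check like this, root-switching code could fall back from pivot_root(2) to a chroot-style changeRoot when the rootfs turns out to be shared.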

@dqminh
Contributor

dqminh commented Jun 15, 2015

Will #609 help if we allow containers to share namespaces with each other? The referenced PR implements the pid namespace, but I believe the mount namespace is implementable with some small changes.

@rootfs
Author

rootfs commented Jun 15, 2015

@dqminh I believe the pid namespace is a stretch for the use case here. For example, if the container joins the host's mount namespace (/proc/{host_pid}/ns/mnt), then paths inside the container will be invisible, and the container is unable to invoke the mount helpers in /sbin.

@dqminh
Copy link
Contributor

dqminh commented Jun 15, 2015

@rootfs ahh, so sorry for not reading the attached use case clearly. So the use case is about containerizing the kubelet, i.e., the kubelet can create and mount a directory on the host and instruct docker to use that directory as a docker volume? If so, then yes, I doubt that sharing the mount namespace is going to help.

@rootfs
Author

rootfs commented Jun 15, 2015

@dqminh you are right, one use case is for a containerized kubelet to create and mount a directory on the host. But some filesystems (e.g. glusterfs and cephfs) invoke mount helpers; if these helpers are not installed on the host, the containerized kubelet cannot make the mount point show up in the host namespace.

Sharing the mount namespace solves this problem: the container installs the mount helpers, bind-mounts the host's rootfs, and mounts filesystems inside a shared rootfs.

@mrunalp
Contributor

mrunalp commented Jun 15, 2015

This is similar to #623, with the difference being SLAVE/SHARED. Maybe it makes sense to have a single config field and set it to private/slave/shared as required.

@rhatdan
Contributor

rhatdan commented Jun 15, 2015

@mrunalp I agree, although I think you would want this at docker run time rather than setting it up on the daemon.

@mrunalp
Contributor

mrunalp commented Jun 15, 2015

@rhatdan Do you mean as a flag to docker run?

@rootfs
Author

rootfs commented Jun 15, 2015

love the flag idea

@rhatdan
Contributor

rhatdan commented Jun 15, 2015

Yes we need to do something like:

docker run --rootmount=shared fedora
docker run --rootmount=private fedora
docker run --rootmount=slave fedora

And set the default:

docker -d --rootmount=shared
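If a flag like this were adopted, validating its value is straightforward. A sketch in Go; the flag name and accepted values come from the examples above, and none of this is shipped Docker code:

```go
package main

import "fmt"

// parseRootMount validates a value for the proposed --rootmount flag,
// accepting the three propagation modes discussed in this thread.
func parseRootMount(v string) (string, error) {
	switch v {
	case "shared", "slave", "private":
		return v, nil
	}
	return "", fmt.Errorf("invalid --rootmount value %q: want shared, slave, or private", v)
}

func main() {
	for _, v := range []string{"shared", "slave", "private", "bogus"} {
		if mode, err := parseRootMount(v); err != nil {
			fmt.Println(err)
		} else {
			fmt.Println(mode)
		}
	}
}
```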

@mrunalp
Contributor

mrunalp commented Jun 15, 2015

@rhatdan Yes, that should work. It needs to be proposed to docker.

@rhvgoyal

When containers are launched with "shared", I think any mount done by the container will be visible only to the docker daemon and not on the host. So this will still work if you want to bind mount that mount point into other containers (since the docker daemon sees it), but any utilities on the host still can't see it.

@mrunalp
Contributor

mrunalp commented Jun 16, 2015

Yes, that is true since the daemon is started in its own mount namespace. The utilities would have to join that mount namespace to see the mounts.


@crosbymichael
Contributor

I think we need to take a step back and explain what expected outcomes you think are correct in certain situations.

Examples:

  1. If I mount a dir on the host and use it as a volume, I expect the same data to be inside the container and on the host.
  2. If I mount a dir inside a container, in an existing volume, I expect the host not to see the mount.

If we can figure out the expected outcomes, it's easier to find a solution.

Currently a container has to nsenter the host's mount namespace to mount a filesystem and
share it with other containers. This approach doesn't work if the filesystem mount
calls a helper utility (/sbin/mount.XXX). This commit provides a new flag and makes the rootfs shareable.

Signed-off-by: Huamin Chen <[email protected]>
@rootfs
Author

rootfs commented Jun 16, 2015

@crosbymichael

following your examples, here are my views, based on my understanding of [1]:

| # | Propagation | Container mounts dir | Host mounts dir | Use case |
|---|---|---|---|---|
| 1 | Shared | Both container and host see the dir | Both container and host see the dir | Use a container to mount filesystems on the host so other containers can see them |
| 2 | Slave | Container sees the dir but the host cannot | Both container and host see the dir | automount, etc. |
| 3 | Private | Container sees the dir but the host doesn't | Host sees the dir but the container doesn't | Current default |

Reference
[1] https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
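To connect the table to the kernel interface: each mode corresponds to a mount(2) propagation flag from sharedsubtree.txt, applied to an existing mountpoint (usually together with MS_REC so the whole subtree is affected). A sketch with the flag values copied from &lt;linux/fs.h&gt;; the mapping helper is illustrative:

```go
package main

import "fmt"

// Propagation flag values from <linux/fs.h>, defined locally so the
// sketch stands alone (they match syscall.MS_* on Linux).
const (
	msPrivate = 1 << 18 // MS_PRIVATE
	msSlave   = 1 << 19 // MS_SLAVE
	msShared  = 1 << 20 // MS_SHARED
)

// propagationFlag maps a propagation mode name to its mount(2) flag.
func propagationFlag(mode string) (uintptr, bool) {
	switch mode {
	case "shared":
		return msShared, true
	case "slave":
		return msSlave, true
	case "private":
		return msPrivate, true
	}
	return 0, false
}

func main() {
	for _, m := range []string{"shared", "slave", "private"} {
		f, _ := propagationFlag(m)
		fmt.Printf("%s -> 0x%x\n", m, f)
	}
}
```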

@rhatdan
Contributor

rhatdan commented Jun 17, 2015

Yes, I agree with this, and we can ignore the "daemon" problem for now; let's get libcontainer to work correctly and allow us to run containers in different modes depending on the user's goal.

If we go with use cases:

Shared:

On a Project Atomic host, I want to install an SPC container which contains the gluster or cephfs userspace tools. I want to use these containerized tools to mount filesystems that can be used on the host and by other containers.

Slave:

I want to run a container with a volume (bind) mount from the host, on top of which the admin can later mount filesystems, and have these mounts be visible inside the container. A standard use case of this would be autofs.

Private:

As a user I want to be able to run a container, which has a volume mount. I do not want any mount changes on this volume to be seen within the container.

I would argue that Private is the least likely to be requested.

@tjdett

tjdett commented Jun 30, 2015

Looking forward to this feature, as I'd like easier Docker filesystem mounting too.

It looks like this isn't the home of libcontainer anymore though, now that there's opencontainers/runc.

@rootfs Perhaps you should submit a PR over there now? I'd hate to see this feature abandoned because it got lost in the move.

@rootfs
Author

rootfs commented Jun 30, 2015

@tjdett sure, will submit to runc soon. thank you for the information.

@pmorie
Contributor

pmorie commented Jun 30, 2015

Just want to note that the specific use-case we have, running the kubelet in a container, is currently implemented as follows:

  1. Enter the root mount namespace via nsenter in order to perform mounts.
  2. The kubelet also bind-mounts a volumes directory, under which we need to be able to see new mountpoints from the container.

So, the current solution we have for this would work with the slave propagation mode. However, as @rootfs has pointed out, there are use-cases we currently don't support for the containerized kubelet, such as gluster and cephfs, which depend on the mount helpers / daemons being present on the host system and so aren't really appropriate to go out to the host for. For these use-cases, we would require the shared mode in order to run the mount helpers / daemons inside the container and eliminate dependencies on the host's setup.

Personally I think this should be a flag on the bind-mount spec itself, since there will probably be different requirements for different volumes. I could imagine:

[host-path]:[container-path]:[Z]:[propagation-mode]

as a possible syntax. I think most admins would probably prefer to use private propagation modes wherever possible (but that is just my gut feeling).
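A parser for such a spec might look like the following. The four-field syntax is hypothetical, straight from the comment above; Docker's real -v parsing differs:

```go
package main

import (
	"fmt"
	"strings"
)

// volumeSpec models the hypothetical
// [host-path]:[container-path]:[Z]:[propagation-mode] syntax.
type volumeSpec struct {
	HostPath, ContainerPath, Label, Propagation string
}

func parseVolumeSpec(s string) (volumeSpec, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 4 {
		return volumeSpec{}, fmt.Errorf("want 4 colon-separated fields, got %d", len(parts))
	}
	switch parts[3] {
	case "shared", "slave", "private":
	default:
		return volumeSpec{}, fmt.Errorf("unknown propagation mode %q", parts[3])
	}
	return volumeSpec{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	v, err := parseVolumeSpec("/mnt/gluster:/data:Z:slave")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", v)
}
```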

@rhatdan
Contributor

rhatdan commented Jul 1, 2015

I am fine with this, although I feel that a single flag for docker run would be fine in almost all cases. I would argue that most admins would expect SLAVE, and most admins and developers would have no idea what we are talking about and have a hard time understanding what is going on.

Admins would expect that if a volume is mounted into a container and I later mount on top of the directory, this new mount point would show up inside the container. That has been our experience with using mount namespaces all the way back to RHEL 5.

@LK4D4
Contributor

LK4D4 commented Jul 1, 2015

Was ported to runc as opencontainers/runc#77

@LK4D4 LK4D4 closed this Jul 1, 2015