This repository has been archived by the owner on Dec 13, 2018. It is now read-only.

Propagate container's mountpoint to the host #632

Closed
wants to merge 1 commit into from

Conversation


@rootfs rootfs commented Jun 15, 2015

Currently a container has to nsenter the host's mount namespace to mount a filesystem and
share it with other containers. This approach doesn't work if the filesystem mount
calls a helper utility (/sbin/mount.XXX). This limitation makes a containerized kubelet unable to mount certain filesystems.

This commit provides a new flag to make the rootfs shareable. Since moving a shared rootfs is semantically confusing for pivot_root(2) and MS_MOVE, a new function changeRoot() is provided to switch the rootfs to the new destination.

Signed-off-by: Huamin Chen [email protected]
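The distinction the commit draws can be made concrete: whether the current rootfs is shared is visible in the optional fields of /proc/self/mountinfo. A minimal sketch of such a check in Go follows; the function name is mine, not from the patch:

```go
package main

import (
	"fmt"
	"strings"
)

// isSharedMount reports whether a /proc/self/mountinfo line carries a
// "shared:N" peer-group tag. Optional fields sit between the mount
// options (the 7th field) and the "-" separator.
func isSharedMount(line string) bool {
	fields := strings.Fields(line)
	for i := 6; i < len(fields); i++ {
		if fields[i] == "-" {
			break
		}
		if strings.HasPrefix(fields[i], "shared:") {
			return true
		}
	}
	return false
}

func main() {
	shared := "36 35 98:0 / / rw shared:1 - ext4 /dev/sda1 rw"
	private := "36 35 98:0 / / rw - ext4 /dev/sda1 rw"
	fmt.Println(isSharedMount(shared), isSharedMount(private)) // true false
}
```

With a check like this, root-switching code could fall back from pivot_root(2) to a chroot-style changeRoot when the rootfs turns out to be shared.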

@dqminh
Contributor

dqminh commented Jun 15, 2015

Will #609 help if we allow containers to share namespaces with each other? The referenced PR implements the pid namespace, but I believe the mount namespace is implementable with some small changes.

@rootfs
Author

rootfs commented Jun 15, 2015

@dqminh I believe the pid namespace is a stretch for the use case here. For example, if the container joins the host's mount namespace (/proc/{host_pid}/ns/mnt), then paths inside the container will be invisible, and the container is unable to invoke the mount helpers in /sbin.

@dqminh
Copy link
Contributor

dqminh commented Jun 15, 2015

@rootfs ahh, so sorry for not reading the attached use case clearly. So the use case is about containerizing the kubelet, i.e., the kubelet can create and mount a directory on the host and instruct docker to use that directory as a docker volume? If so, then yes, I doubt that sharing the mount namespace is going to help.

@rootfs
Author

rootfs commented Jun 15, 2015

@dqminh you are right, one use case is for a containerized kubelet to create and mount a directory on the host. But some filesystems (e.g. glusterfs and cephfs) invoke mount helpers; if these helpers are not installed on the host, the containerized kubelet cannot make the mount point show up in the host namespace.

Sharing the mount namespace solves this problem: the container installs the mount helpers, bind-mounts the host's rootfs, and mounts filesystems inside a shared rootfs.

@mrunalp
Contributor

mrunalp commented Jun 15, 2015

This is similar to #623, with the difference being SLAVE/SHARED. Maybe it makes sense to have a single config field and set it to private/slave/shared as required.

@rhatdan
Contributor

rhatdan commented Jun 15, 2015

@mrunalp I agree, although I think you would want this at docker run time rather than setting it up on the daemon.

@mrunalp
Contributor

mrunalp commented Jun 15, 2015

@rhatdan Do you mean as a flag to docker run?

@rootfs
Author

rootfs commented Jun 15, 2015

love the flag idea

@rhatdan
Contributor

rhatdan commented Jun 15, 2015

Yes we need to do something like:

docker run --rootmount=shared fedora
docker run --rootmount=private fedora
docker run --rootmount=slave fedora

And set the default:

docker -d --rootmount=shared
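If a flag like this were adopted, validating its value is straightforward. A sketch in Go; the flag name and accepted values come from the examples above, and none of this is shipped Docker code:

```go
package main

import "fmt"

// parseRootMount validates a value for the proposed --rootmount flag,
// accepting the three propagation modes discussed in this thread.
func parseRootMount(v string) (string, error) {
	switch v {
	case "shared", "slave", "private":
		return v, nil
	}
	return "", fmt.Errorf("invalid --rootmount value %q: want shared, slave, or private", v)
}

func main() {
	for _, v := range []string{"shared", "slave", "private", "bogus"} {
		if mode, err := parseRootMount(v); err != nil {
			fmt.Println(err)
		} else {
			fmt.Println(mode)
		}
	}
}
```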

@mrunalp
Contributor

mrunalp commented Jun 15, 2015

@rhatdan Yes, that should work. It needs to be proposed to docker.

@rhvgoyal

When containers are launched with "shared", I think any mount done by the container will be visible only to the docker daemon and not on the host. So this will still work if you want to bind mount that mount point into other containers (since the docker daemon sees it), but any utilities on the host still can't see it.

@mrunalp
Contributor

mrunalp commented Jun 16, 2015

Yes, that is true since the daemon is started in its own mount namespace. The utilities would have to join that mount namespace to see the mounts.


@crosbymichael
Contributor

I think we need to take a step back and explain what expected outcomes you think are correct in certain situations.

Examples:

  1. If I mount a dir on the host and use it as a volume, I expect the same data to be inside the container and on the host.
  2. If I mount a dir inside a container, in an existing volume, I expect the host not to see the mount.

If we can figure out the expected outcomes, it's easier to find a solution.

Currently a container has to nsenter the host's mount namespace to mount a filesystem and
share it with other containers. This approach doesn't work if the filesystem mount
calls a helper utility (/sbin/mount.XXX). This commit provides a new flag and makes the rootfs shareable.

Signed-off-by: Huamin Chen <[email protected]>
@rootfs
Author

rootfs commented Jun 16, 2015

@crosbymichael

following your examples, here are my views, based on my understanding of [1]:

| # | Propagation | Container mounts dir | Host mounts dir | Use case |
|---|---|---|---|---|
| 1 | Shared | Both container and host see the dir | Both container and host see the dir | Use a container to mount filesystems on the host so other containers can see them |
| 2 | Slave | Container sees the dir but the host cannot | Both container and host see the dir | automount, etc. |
| 3 | Private | Container sees the dir but the host doesn't | Host sees the dir but the container doesn't | Current default |

Reference
[1] https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
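To connect the table to the kernel interface: each mode corresponds to a mount(2) propagation flag from sharedsubtree.txt, applied to an existing mountpoint (usually together with MS_REC so the whole subtree is affected). A sketch with the flag values copied from &lt;linux/fs.h&gt;; the mapping helper is illustrative:

```go
package main

import "fmt"

// Propagation flag values from <linux/fs.h>, defined locally so the
// sketch stands alone (they match syscall.MS_* on Linux).
const (
	msPrivate = 1 << 18 // MS_PRIVATE
	msSlave   = 1 << 19 // MS_SLAVE
	msShared  = 1 << 20 // MS_SHARED
)

// propagationFlag maps a propagation mode name to its mount(2) flag.
func propagationFlag(mode string) (uintptr, bool) {
	switch mode {
	case "shared":
		return msShared, true
	case "slave":
		return msSlave, true
	case "private":
		return msPrivate, true
	}
	return 0, false
}

func main() {
	for _, m := range []string{"shared", "slave", "private"} {
		f, _ := propagationFlag(m)
		fmt.Printf("%s -> 0x%x\n", m, f)
	}
}
```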

@rhatdan
Contributor

rhatdan commented Jun 17, 2015

Yes, I agree with this, and we can ignore the "daemon" problem for now; let's get libcontainer to work correctly and allow us to run containers in different modes depending on the user's goal.

If we go with use cases:

Shared:

On a Project Atomic host, I want to install an SPC container which contains the gluster or cephfs userspace tools. I want to use these containerized tools to mount filesystems that can be used on the host and by other containers.

Slave:

I want to run a container with a volume (bind) mount from the host, on top of which the admin can later mount filesystems, and have these mounts be visible inside the container. A standard use case of this would be autofs.

Private:

As a user I want to be able to run a container, which has a volume mount. I do not want any mount changes on this volume to be seen within the container.

I would argue that Private is the least likely to be requested.

@tjdett

tjdett commented Jun 30, 2015

Looking forward to this feature, as I'd like easier Docker filesystem mounting too.

It looks like this isn't the home of libcontainer anymore though, now that there's opencontainers/runc.

@rootfs Perhaps you should submit a PR over there now? I'd hate to see this feature abandoned because it got lost in the move.

@rootfs
Author

rootfs commented Jun 30, 2015

@tjdett sure, will submit to runc soon. thank you for the information.

@pmorie
Contributor

pmorie commented Jun 30, 2015

Just want to note that the specific use-case we have, running the kubelet in a container, is currently implemented as follows:

  1. Enter the root mount namespace via nsenter in order to perform mounts.
  2. The kubelet also bind-mounts a volumes directory, under which we need to be able to see new mountpoints from the container.

So, the current solution we have for this would work with the slave propagation mode. However, as @rootfs has pointed out, there are use-cases we currently don't support for the containerized kubelet, such as gluster and cephfs, which depend on the mount helpers / daemons being present on the host system and so aren't really appropriate to go out to the host for. For these use-cases, we would require the shared mode in order to run the mount helpers / daemons inside the container and eliminate dependencies on the host's setup.

Personally I think this should be a flag on the bind-mount spec itself, since there will probably be different requirements for different volumes. I could imagine:

[host-path]:[container-path]:[Z]:[propagation-mode]

as a possible syntax. I think most admins would probably prefer to use private propagation modes wherever possible (but that is just my gut feeling).
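A parser for such a spec might look like the following. The four-field syntax is hypothetical, straight from the comment above; Docker's real -v parsing differs:

```go
package main

import (
	"fmt"
	"strings"
)

// volumeSpec models the hypothetical
// [host-path]:[container-path]:[Z]:[propagation-mode] syntax.
type volumeSpec struct {
	HostPath, ContainerPath, Label, Propagation string
}

func parseVolumeSpec(s string) (volumeSpec, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 4 {
		return volumeSpec{}, fmt.Errorf("want 4 colon-separated fields, got %d", len(parts))
	}
	switch parts[3] {
	case "shared", "slave", "private":
	default:
		return volumeSpec{}, fmt.Errorf("unknown propagation mode %q", parts[3])
	}
	return volumeSpec{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	v, err := parseVolumeSpec("/mnt/gluster:/data:Z:slave")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", v)
}
```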

@rhatdan
Contributor

rhatdan commented Jul 1, 2015

I am fine with this, although I feel that a single flag for docker run would be fine in almost all cases. I would argue that most admins would expect SLAVE, and most admins and developers would have no idea what we are talking about and have a hard time understanding what is going on.

Admins would expect that if a volume is mounted into a container and I later mount on top of the directory, this new mount point would show up inside the container. That has been our experience with using mount namespaces all the way back to RHEL 5.

@LK4D4
Contributor

LK4D4 commented Jul 1, 2015

Was ported to runc as opencontainers/runc#77

@LK4D4 LK4D4 closed this Jul 1, 2015