Make parent mount private before bind mounting rootfs #1148

Merged: 1 commit, Oct 26, 2016


rhvgoyal
Contributor

This reverts part of commit eb0a144.

That commit introduced two issues:

  • We need to make the parent mount of rootfs private before bind mounting
    rootfs. Otherwise, the bind mount of rootfs can propagate into other
    mount namespaces (if the parent mount is shared).

  • It broke the test TestRootfsPropagationSharedMount() on Fedora.

    On Fedora, /tmp is a mount point with "shared" propagation. I think
    you should be able to reproduce it on other distributions as well,
    as long as you mount tmpfs on /tmp and make its propagation "shared".

    The test fails because pivot_root() fails, and pivot_root() fails
    because the kernel performs the following check:

    IS_MNT_SHARED(new_mnt->mnt_parent)

    Say /tmp/foo is the new rootfs. Since we have bind mounted rootfs,
    new_mnt is /tmp/foo and new_mnt->mnt_parent is /tmp, which is "shared"
    on Fedora, so the above check fails.
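The kernel check above can be approximated from user space by reading /proc/self/mountinfo (format described in proc(5)): find the deepest mount point strictly above the new rootfs and see whether its optional fields include shared:N. The Go sketch below does that under simplifying assumptions (octal escapes such as \040 are not decoded, and a later mount at the same mount point is assumed to shadow an earlier one). parseMountinfo and parentMountOf are hypothetical helpers, not runc code, and the sample mountinfo is made up to resemble the Fedora layout described here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// mountEntry holds the few /proc/self/mountinfo fields this sketch needs.
type mountEntry struct {
	MountPoint string
	Shared     bool // true if an optional field "shared:N" is present
}

// parseMountinfo parses mountinfo text. Simplification: octal escapes in
// mount points (e.g. \040 for a space) are left undecoded.
func parseMountinfo(text string) ([]mountEntry, error) {
	var entries []mountEntry
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if len(fields) < 7 {
			return nil, fmt.Errorf("short mountinfo line: %q", sc.Text())
		}
		e := mountEntry{MountPoint: fields[4]}
		// Optional fields run from index 6 up to the "-" separator.
		for _, f := range fields[6:] {
			if f == "-" {
				break
			}
			if strings.HasPrefix(f, "shared:") {
				e.Shared = true
			}
		}
		entries = append(entries, e)
	}
	return entries, sc.Err()
}

// isAncestor reports whether mount point mp strictly contains path p.
func isAncestor(mp, p string) bool {
	if mp == "/" {
		return p != "/"
	}
	return strings.HasPrefix(p, mp+"/")
}

// parentMountOf returns the mount that would become new_mnt->mnt_parent if
// path were bind mounted onto itself: the deepest mount point strictly above
// path. Later entries win ties, assuming they shadow earlier mounts.
func parentMountOf(entries []mountEntry, path string) (mountEntry, bool) {
	var best mountEntry
	found := false
	for _, e := range entries {
		if isAncestor(e.MountPoint, path) && (!found || len(e.MountPoint) >= len(best.MountPoint)) {
			best, found = e, true
		}
	}
	return best, found
}

func main() {
	// Hypothetical mountinfo in which /tmp is a shared tmpfs, as on Fedora.
	sample := "1 0 253:0 / / rw,relatime shared:1 - xfs /dev/root rw\n" +
		"40 1 0:35 / /tmp rw,nosuid shared:17 - tmpfs tmpfs rw\n"
	entries, err := parseMountinfo(sample)
	if err != nil {
		panic(err)
	}
	if p, ok := parentMountOf(entries, "/tmp/foo"); ok {
		fmt.Printf("parent of /tmp/foo: %s (shared=%v)\n", p.MountPoint, p.Shared)
	}
}
```

If the parent reported here is shared, a pivot_root() into the bind-mounted rootfs is expected to fail with EINVAL unless the parent is made private first.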

Since this change broke a few things, it is a good idea to revert part of it.

Signed-off-by: Vivek Goyal [email protected]

@rhvgoyal
Contributor Author

The following commit introduced the regression:

commit eb0a144
Author: Tatsushi Inagaki [email protected]
Date: Mon Feb 29 17:22:45 2016 +0900

Rootfs: reduce redundant parsing of mountinfo

Postpone parsing mountinfo until pivot_root() actually failed

Signed-off-by: Tatsushi Inagaki <[email protected]>

@rhvgoyal
Contributor Author

OK, it looks like the following is the PR for the above-mentioned commit:

#608

@rhvgoyal
Contributor Author

cc @inatatsu @mrunalp

@mrunalp
Contributor

mrunalp commented Oct 25, 2016

LGTM


@rhvgoyal
Contributor Author

rhvgoyal commented Oct 25, 2016

Another way to reproduce the problem manually is as follows:

  • Prepare a directory for runc tests, say runc-tests.
    $ mkdir runc-tests

  • Extract an image's filesystem into runc-tests/rootfs/.

  • Generate a config.json with runc spec.

  • Set rootfsPropagation to private in the spec file:

    "rootfsPropagation" : "private",

  • Bind mount runc-tests onto itself:
    $ mount --bind runc-tests runc-tests

  • Make sure runc-tests is "shared":
    $ mount --make-shared runc-tests

  • Now run a container:
    $ cd runc-tests
    $ runc run foo

This fails for me with the following error message:

container_linux.go:247: starting container process caused "process_linux.go:359: container init caused "rootfs_linux.go:89: jailing process inside rootfs caused \"pivot_root invalid argument\"""

@cyphar
Member

cyphar commented Oct 25, 2016

@rhvgoyal This PR doesn't fix this problem for me:

% sudo unshare -m
# mount --bind . .
# mount --make-shared .
# runc run foo
rootfs_linux.go:89: jailing process inside rootfs caused "pivot_root invalid argument"

It also doesn't work if you use runc-tests instead of the current directory (.).

Rejected.


@rhvgoyal
Contributor Author

@cyphar It is working for me. I tried the steps you mentioned, and it works with the current directory (.) too.

Does it work without my patch?

Are you using any specialized config.json? I am using the one generated by runc spec, with just "rootfsPropagation": "private" added to it.

What distro are you using?

I'm trying to get close to your configuration so that I can reproduce and debug the issue.

@cyphar
Member

cyphar commented Oct 25, 2016

Here's my config.json. I'm on openSUSE Tumbleweed with Linux gordon 4.7.6-1-default #1 SMP PREEMPT Fri Sep 30 12:22:14 UTC 2016 (fb37fcc) x86_64 x86_64 x86_64 GNU/Linux. It doesn't work without your patch either, but my point is that your test case wasn't fixed (for me).

{
    "ociVersion": "1.0.0-rc2-dev",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "process": {
        "terminal": true,
        "consoleSize": {
            "height": 0,
            "width": 0
        },
        "user": {
            "uid": 0,
            "gid": 0
        },
        "args": [
            "sh"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": [
            "CAP_AUDIT_WRITE",
            "CAP_KILL",
            "CAP_NET_BIND_SERVICE"
        ],
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "noNewPrivileges": true
    },
    "root": {
        "path": "rootfs",
        "readonly": true
    },
    "hostname": "runc",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620",
                "gid=5"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "relatime",
                "ro"
            ]
        }
    ],
    "hooks": {},
    "linux": {
        "rootfsPropagation": "private",
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ]
        },
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "network"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            }
        ],
        "maskedPaths": [
            "/proc/kcore",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/sys/firmware"
        ],
        "readonlyPaths": [
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
        ]
    }
}

@rhvgoyal
Contributor Author

@cyphar I tried your config.json and it works for me too (on Fedora 24). I guess I will try installing openSUSE Tumbleweed.

@rhvgoyal
Contributor Author

rhvgoyal commented Oct 26, 2016

@cyphar I just tested on openSUSE Tumbleweed and it works for me.

Linux linux-e0t5 4.8.3-1-default #1 SMP PREEMPT Thu Oct 20 09:18:45 UTC 2016 (94eb9fb) x86_64 x86_64 x86_64 GNU/Linux

One thing that surprised me, though, was the path to the runc binary. I ran make and make install, and the runc binary got installed in /usr/local/sbin/.

But the path was still resolving to /usr/sbin/runc, so the new code was not being tested. So I ran /usr/local/sbin/runc run foo and it worked.

Are you sure you are using the new runc binary?

@cyphar
Member

cyphar commented Oct 26, 2016

I must've forgotten to recompile runc. It works now. I'll LGTM once I get home and test it on the machine which actually failed.

@cyphar
Member

cyphar commented Oct 26, 2016

Okay, I think I'm starting to go insane. I've managed to trigger a third failure mode, where runc just doesn't return (this happens on both master and your PR). The strace logs tell me that pivot_root(".", ".") fails with EINVAL. However, it looks like a runC bug (not a kernel bug):

% cat /proc/$(pgrep -f 'runc run')/task/*/stack
[<ffffffff820fbbab>] futex_wait_queue_me+0xbb/0x110
[<ffffffff820fc995>] futex_wait+0x105/0x240
[<ffffffff820fe93f>] do_futex+0x1ff/0x510
[<ffffffff820fecbf>] SyS_futex+0x6f/0x140
[<ffffffff826d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff820fbbab>] futex_wait_queue_me+0xbb/0x110
[<ffffffff820fc995>] futex_wait+0x105/0x240
[<ffffffff820fe93f>] do_futex+0x1ff/0x510
[<ffffffff820fecbf>] SyS_futex+0x6f/0x140
[<ffffffff826d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff820fbbab>] futex_wait_queue_me+0xbb/0x110
[<ffffffff820fc995>] futex_wait+0x105/0x240
[<ffffffff820fe93f>] do_futex+0x1ff/0x510
[<ffffffff820fecbf>] SyS_futex+0x6f/0x140
[<ffffffff826d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff820fbbab>] futex_wait_queue_me+0xbb/0x110
[<ffffffff820fc995>] futex_wait+0x105/0x240
[<ffffffff820fe93f>] do_futex+0x1ff/0x510
[<ffffffff820fecbf>] SyS_futex+0x6f/0x140
[<ffffffff826d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff820bec52>] wait_woken+0x42/0x80
[<ffffffff824909a0>] n_tty_read+0x5b0/0x860
[<ffffffff8248ab8d>] tty_read+0x8d/0xf0
[<ffffffff82217cd3>] __vfs_read+0x23/0x130
[<ffffffff82218d41>] vfs_read+0x91/0x130
[<ffffffff8221a082>] SyS_read+0x42/0x90
[<ffffffff826d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff820bec52>] wait_woken+0x42/0x80
[<ffffffff824909a0>] n_tty_read+0x5b0/0x860
[<ffffffff8248ab8d>] tty_read+0x8d/0xf0
[<ffffffff82217cd3>] __vfs_read+0x23/0x130
[<ffffffff82218d41>] vfs_read+0x91/0x130
[<ffffffff8221a082>] SyS_read+0x42/0x90
[<ffffffff826d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff82672dd9>] unix_stream_read_generic+0x6b9/0x880
[<ffffffff82673060>] unix_stream_recvmsg+0x40/0x50
[<ffffffff825b3519>] sock_read_iter+0x89/0xd0
[<ffffffff82217d72>] __vfs_read+0xc2/0x130
[<ffffffff82218d41>] vfs_read+0x91/0x130
[<ffffffff8221a082>] SyS_read+0x42/0x90
[<ffffffff826d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff820fbbab>] futex_wait_queue_me+0xbb/0x110
[<ffffffff820fc995>] futex_wait+0x105/0x240
[<ffffffff820fe93f>] do_futex+0x1ff/0x510
[<ffffffff820fecbf>] SyS_futex+0x6f/0x140
[<ffffffff826d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[<ffffffffffffffff>] 0xffffffffffffffff

@hqhq
Contributor

hqhq commented Oct 26, 2016

I followed the steps without this PR, and I got a panic:

$ sudo runc run hq
^Cpanic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x38 pc=0x4cd269]

goroutine 1 [running]:
panic(0x7d0080, 0xc82000e0f0)
        /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/urfave/cli.HandleAction.func1(0xc8202332e8)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/urfave/cli/app.go:478 +0x38e
panic(0x7d0080, 0xc82000e0f0)
        /usr/local/go/src/runtime/panic.go:443 +0x4e9
github.com/opencontainers/runc/libcontainer.(*genericError).Error(0x0, 0x0, 0x0)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/generic_error.go:93 +0x39
github.com/opencontainers/runc/libcontainer.createSystemError(0x7f115ea94260, 0x0, 0x857d80, 0xe, 0x0, 0x0)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/generic_error.go:78 +0x16e
github.com/opencontainers/runc/libcontainer.newSystemErrorWithCause(0x7f115ea94260, 0x0, 0x857d80, 0xe, 0x0, 0x0)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/generic_error.go:63 +0x4b
github.com/opencontainers/runc/libcontainer.(*initProcess).start(0xc8200dc100, 0x0, 0x0)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/process_linux.go:359 +0x9d0
github.com/opencontainers/runc/libcontainer.(*linuxContainer).start(0xc820112180, 0xc8200e4500, 0x1, 0x0, 0x0)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/container_linux.go:242 +0x126
github.com/opencontainers/runc/libcontainer.(*linuxContainer).Run(0xc820112180, 0xc8200e4500, 0x0, 0x0)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/container_linux.go:204 +0xf2
github.com/opencontainers/runc/libcontainer.(Container).Run-fm(0xc8200e4500, 0x0, 0x0)
        /home/qhuang/runc/utils_linux.go:235 +0x4a
main.(*runner).run(0xc820232cd8, 0xc82011e030, 0x0, 0x0, 0x0)
        /home/qhuang/runc/utils_linux.go:238 +0x835
main.startContainer(0xc8200e43c0, 0xc82011e000, 0x0, 0x0, 0x0, 0x0)
        /home/qhuang/runc/utils_linux.go:313 +0x416
main.glob.func13(0xc8200e43c0, 0x0, 0x0)
        /home/qhuang/runc/run.go:66 +0x77
reflect.Value.call(0x7527e0, 0x9043c0, 0x13, 0x84a580, 0x4, 0xc820233268, 0x1, 0x1, 0x0, 0x0, ...)
        /usr/local/go/src/reflect/value.go:435 +0x120d
reflect.Value.Call(0x7527e0, 0x9043c0, 0x13, 0xc820233268, 0x1, 0x1, 0x0, 0x0, 0x0)
        /usr/local/go/src/reflect/value.go:303 +0xb1
github.com/urfave/cli.HandleAction(0x7527e0, 0x9043c0, 0xc8200e43c0, 0x0, 0x0)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/urfave/cli/app.go:487 +0x2ee
github.com/urfave/cli.Command.Run(0x84e410, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8a2740, 0x1a, 0x0, ...)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/urfave/cli/command.go:191 +0xfec
github.com/urfave/cli.(*App).Run(0xc8200b0480, 0xc82000a120, 0x3, 0x3, 0x0, 0x0)
        /home/qhuang/runc/Godeps/_workspace/src/github.com/urfave/cli/app.go:240 +0xaa4
main.main()
        /home/qhuang/runc/main.go:137 +0xe24

And my host mounts are messed up: all of the container's mounts propagated to the host. :(

I'm on ubuntu 14.04 with the latest runc master code.

@cyphar
Member

cyphar commented Oct 26, 2016

@hqhq That's the same panic I get, but the panic only comes after you hit <C-c> in runC. For me this happens on both this PR and master.

@rhvgoyal
Contributor Author

@hqhq I get that backtrace without this PR: runc hangs, and when I press Ctrl-C, I get that backtrace. But with this PR, I see neither the hang nor the backtrace.

@rhvgoyal
Contributor Author

@hqhq I reverted the pivot_root(".", ".") patch, and now the runc hang goes away and I see the error message instead, as expected. (This is without my PR.)

$ runc run foo
container_linux.go:247: starting container process caused "process_linux.go:359: container init caused "rootfs_linux.go:90: jailing process inside rootfs caused \"pivot_root invalid argument\"""

@cyphar
Member

cyphar commented Oct 26, 2016

Eugh, that's not good. Now there's an issue with the pivot_root(".", ".") stuff... Hmmm...

I'll take a look at this next week (this week is pretty packed).

@rhvgoyal
Contributor Author

@cyphar I can reproduce the problem without my PR, but with my PR I don't see the issue you mentioned. Please make sure you are testing with the right binary.

It looks like when pivot_root(".", ".") fails, it leaves behind some changes that can make the calling process misbehave, and that's what we seem to be facing. My PR just avoids the failure, so I don't see the issue.

So, IMO, my PR is a safe change irrespective of pivot_root(".", "."). It fixes a regression introduced a few months back.

We should still debug why a pivot_root(".", ".") failure leaves runc in a hung state.

@rhvgoyal
Contributor Author

I think the problem might be that we do Fchdir(newroot) but don't restore the original state on failure. Should we store the cwd of the calling process and restore it if pivot_root(".", ".") fails?

@cyphar
Member

cyphar commented Oct 26, 2016

@rhvgoyal

Okay, it looks like my previous issues were caused by not rebuilding the binary properly (not sure). I just tried it again and it works. I'm going to LGTM this, and we can investigate the pivot_root weirdness separately. Maybe we should have a test case for this? Just do:

% unshare -m
% mount --bind . .
% mount --make-shared .
% runc run test

Up to you though, it's probably too special-purpose to justify.

LGTM.


@rhvgoyal
Contributor Author

@cyphar I think the previous issue is that rootfsParentMountPrivate(".") does something wrong. I wrote this function, and I think it expects an absolute path.

Changing it to rootfsParentMountPrivate(rootfs) solves the other problem you are seeing (without my PR).

Given that my PR gets rid of this code entirely, it automatically fixes the other issue too.

@mrunalp mrunalp merged commit 4599e70 into opencontainers:master Oct 26, 2016
@rhvgoyal
Contributor Author

@cyphar I created a PR that adds an integration test for the above configuration. Hopefully it protects us from such regressions for this configuration in the future.

#1151
