rootfs: make pivot_root not use a temporary directory #1125
cool, testing on RHEL.
@rhvgoyal could you also try this out?
okay, this messed up the mounts on my host while testing.
After running the tests on the host, this is all that remained.
Ah, fun. Yeah, this is kinda what I was worried about. I'll need to play around with this more -- the code for |
@mrunalp Does this patch help?
diff --git a/libcontainer/rootfs_linux.go b/libcontainer/rootfs_linux.go
index 4f21416ce233..0014cb82e6e6 100644
--- a/libcontainer/rootfs_linux.go
+++ b/libcontainer/rootfs_linux.go
@@ -10,6 +10,7 @@ import (
"os/exec"
"path"
"path/filepath"
+ "runtime"
"strings"
"syscall"
"time"
@@ -566,6 +567,13 @@ func setupPtmx(config *configs.Config, console *linuxConsole) error {
// pivotRoot will call pivot_root such that rootfs becomes the new root
// filesystem, and everything else is cleaned up.
func pivotRoot(rootfs string) error {
+ // We change the cwd during this function. Because Go is multithreaded and
+ // it doesn't appear to correctly implement POSIX thread semantics, we have
+ // to ensure we don't switch threads here. Otherwise we start napalming the
+ // host's filesystem.
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+
// While the documentation may claim otherwise, pivot_root(".", ".") is
// actually valid. What this results in is / being the new root but
// /proc/self/cwd being the old root. Since we can play around with the cwd |
@cyphar Nope, that patch doesn't help :/
// Make pivotDir rprivate to make sure any of the unmounts don't
// propagate to parent.
if err := syscall.Mount("", pivotDir, "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
if err := syscall.Unmount(".", syscall.MNT_DETACH); err != nil {
You have to mark all old mounts as private before unmounting them:
syscall.Mount("", ".", "", syscall.MS_PRIVATE|syscall.MS_REC, "")
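The ordering suggested above can be sketched in isolation as follows. This is a minimal, Linux-only illustration, not code from the PR: the helper name detachPrivately and the small command-line wrapper are hypothetical. The point it shows is that the propagation type is changed recursively to private first, so the later lazy unmount cannot propagate to a peer mount namespace such as the host's.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// detachPrivately is a hypothetical helper (not part of runC). It first marks
// the mount tree at target as recursively private, then lazily detaches it.
// Without the first step, an unmount on a shared mount can propagate to peers.
func detachPrivately(target string) error {
	// MS_PRIVATE|MS_REC only changes the propagation type; source, fstype
	// and data are ignored for this kind of call, so they are left empty.
	if err := syscall.Mount("", target, "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
		return fmt.Errorf("make %s rprivate: %w", target, err)
	}
	// MNT_DETACH performs a lazy unmount, which works even if the mount is busy.
	if err := syscall.Unmount(target, syscall.MNT_DETACH); err != nil {
		return fmt.Errorf("detach %s: %w", target, err)
	}
	return nil
}

func main() {
	// Deliberately does nothing without an explicit argument. Only run this
	// inside a scratch mount namespace, never directly on a host.
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: detach <mountpoint>")
		return
	}
	if err := detachPrivately(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Both calls need the privileges required for mount(2), so this is only meaningful inside a container setup path or a throwaway namespace.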
@avagin Yes, we were making the old mounts private before but it got removed in this PR.
Heh, whoops. I'll fix this.
I've applied this fix. PTAL.
This looks like a cool idea. Will try tomorrow morning.
return err
}
syscall.Close(oldroot)
You already have a defer syscall.Close(oldroot). I am wondering why this call is required.
My thinking was to make it clear that we didn't need that fd after that point, but I've removed it.
// We cannot umount(".") because currently our . is newroot. So we need to
// switch back to oldroot before doing a MNT_DETACH (since we still have an
// open fd to it).
So it looks like you are changing the cwd of the process, and the root of the process continues to be rootfs. IOW, . being the new root is not the problem. The problem is probably that . is the cwd of the process, and that will keep the mounts busy. Hence you are changing the cwd to oldroot so that the unmount succeeds.
If this is correct, it might be a good idea to modify the comment a bit to reflect that.
MNT_DETACH should allow us to unmount it without changing directories. From my reading of pivot_root, it does a lot of tomfoolery that I believe might change the cwd. But I'll double check.
I'm confused -- I just removed this line and the code still works. So I think you're right (that . is still oldroot) but you're wrong that MNT_DETACH will fail if we unmount without changing directories.
Code looks good. Is it safe to test on my machine? I'm scared...
I tested it on my machine and it worked for me.
Namely, use an undocumented feature of pivot_root(2) where pivot_root(".", ".") is actually a feature and allows you to make the old_root be tied to your /proc/self/cwd in a way that makes unmounting easy. Thanks a lot to the LXC developers which came up with this idea first. This is the first step of many to allowing runC to work with a completely read-only rootfs. Signed-off-by: Aleksa Sarai <[email protected]>
@rhvgoyal While the |
if err := syscall.Chdir("/"); err != nil {
return fmt.Errorf("chdir / %s", err)
// Currently our "." is oldroot (according to the current kernel code).
I am not sure what this comment means. I suspect you are saying that the cwd of the process has not changed and oldroot continues to be the cwd of the process?
If yes, the old root is "/" and that was not necessarily the cwd of the process at the time of the call to pivot_root().
Maybe say something like: pivot_root() does not guarantee what will happen to the cwd of the calling process. It is possible that pivot_root() changed the cwd to the new root, and in that case the following umount() will fail. So to be safe, change dir to oldroot.
I don't think you're right (I thought about it some more). pivot_root(a, b) will make a the new root and mount the old root at b. What we're doing here is unmounting the old root, because the old root is also on b. Since b is ., it matters whether or not pivot_root has changed your cwd. It turns out that this does change your cwd in the current Linux implementation, but I don't want to depend on that (the mount logic is very dodgy in the kernel).
Namely, use an undocumented feature of pivot_root(2) where
pivot_root(".", ".") is actually a feature and allows you to make the
old_root be tied to your /proc/self/cwd in a way that makes unmounting
easy. Thanks a lot to the LXC developers which came up with this idea
first.
This is the first step of many to allowing runC to work with a
completely read-only rootfs.
Fixes #1122.
Signed-off-by: Aleksa Sarai [email protected]
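For readers following the thread, the sequence discussed above can be sketched roughly as follows. This is an illustration assembled from the fragments and review comments in this conversation, not the code that was merged: the function name pivotRootSketch and the command-line wrapper are hypothetical, and the block is Linux-only. It assumes the caller is already in its own mount namespace with rootfs set up as a mount point.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// pivotRootSketch is a hypothetical illustration of the pivot_root(".", ".")
// approach from this thread. It needs no temporary directory for the old root.
func pivotRootSketch(rootfs string) error {
	// Keep fds to both roots so we can fchdir between them later.
	oldroot, err := syscall.Open("/", syscall.O_DIRECTORY|syscall.O_RDONLY, 0)
	if err != nil {
		return fmt.Errorf("open old root: %w", err)
	}
	defer syscall.Close(oldroot)

	newroot, err := syscall.Open(rootfs, syscall.O_DIRECTORY|syscall.O_RDONLY, 0)
	if err != nil {
		return fmt.Errorf("open new root %s: %w", rootfs, err)
	}
	defer syscall.Close(newroot)

	// Make "." the new root so that pivot_root acts on it.
	if err := syscall.Fchdir(newroot); err != nil {
		return fmt.Errorf("fchdir new root: %w", err)
	}
	if err := syscall.PivotRoot(".", "."); err != nil {
		return fmt.Errorf("pivot_root: %w", err)
	}

	// As argued in the review, do not rely on what the kernel leaves in
	// /proc/self/cwd after pivot_root; explicitly go back to the old root.
	if err := syscall.Fchdir(oldroot); err != nil {
		return fmt.Errorf("fchdir old root: %w", err)
	}

	// Mark the old root rprivate first so the unmount does not propagate,
	// then lazily detach it.
	if err := syscall.Mount("", ".", "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
		return fmt.Errorf("make old root rprivate: %w", err)
	}
	if err := syscall.Unmount(".", syscall.MNT_DETACH); err != nil {
		return fmt.Errorf("detach old root: %w", err)
	}

	// Finally switch the cwd to the new root.
	if err := syscall.Chdir("/"); err != nil {
		return fmt.Errorf("chdir /: %w", err)
	}
	return nil
}

func main() {
	// Does nothing without an explicit argument. Running this outside a
	// dedicated mount namespace can damage the host's mounts.
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: pivot <rootfs>")
		return
	}
	if err := pivotRootSketch(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The rprivate step is the one whose absence led to the broken host mounts reported earlier in the thread, so it is kept immediately before the unmount here.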