-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nsenter: guarantee correct user namespace ordering #977
Conversation
Alright @opencontainers/runc-maintainers this is the next PR in the "rewriting nsenter" series of patches. PTAL, the changes here are quite a bit more intrusive than #950. |
|
||
/* This *must* be called before we touch gid_map. */ | ||
static void update_setgroups(int pid, bool setgroup) | ||
static void update_setgroups(int pid, enum policy_t setgroup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think we ever set this to deny or noop when we go into here ?, seems like something like allow_setgroup(pid)
is simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For rootless containers, we need to set deny. This patch is based on the rootless containers patch, and I don't see why we have to restrict it (it'll be useful once I rebase the rootless containers PR on top of the nsenter cleanup -- something I'm not looking forward to). If we remove it now, I'll just have to re-add it later.
Rebase'd. |
getting a ton of issues building this patch.
|
None of that is meant to happen. I'll fix those up today. :P |
Alright, I've fixed them up. Really the warnings aren't an issue (they're happening in an error path and we can't do anything if the writes fail). |
@cyphar Janky failed seems related. |
I remember fixing the same issue a week ago. It might be a dodgy rebase. :/ |
@hqhq Alright, I fixed that issue. It was because I had misplaced brackets in the PTAL /cc @opencontainers/runc-maintainers |
@@ -0,0 +1,49 @@ | |||
/* | |||
* Copyright 2014, 2015 Docker Inc. | |||
* Copyright 2015, 2016 The Linux Foundation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct, the original copyright in the license stays as it is, it doesn't just stop because the repo changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm also not a fan of starting to add these on the files, we have a overall license and it applies to all files in the repo, we don't need them per file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, it's a matter of taste I guess. I'll drop the headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyphar thanks
@@ -87,7 +87,7 @@ func TestNsenterInvalidPaths(t *testing.T) { | |||
|
|||
namespaces := []string{ | |||
// join pid ns of the current process | |||
fmt.Sprintf("/proc/%d/ns/pid", -1), | |||
fmt.Sprintf("pid:/proc/%d/ns/pid", -1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should try "pid:/proc/%d/ns/net" here to test the ns path validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea. I've added an extra test for this, so we can test both cases (invalid type specifier and invalid path).
This avoids us from running into cases where libcontainer thinks that a particular namespace file is a different type, and makes it a fatal error rather than causing broken functionality. Signed-off-by: Aleksa Sarai <[email protected]>
Depending on your SELinux setup, the order in which you join namespaces can be important. In general, user namespaces should *always* be joined and unshared first because then the other namespaces are correctly pinned and you have the right priviliges within them. This also is very useful for rootless containers, as well as older kernels that had essentially broken unshare(2) and clone(2) implementations. This also includes huge refactorings in how we spawn processes for complicated reasons that I don't want to get into because it will make me spiral into a cloud of rage. The reasoning is in the giant comment in clone_parent. Have fun. In addition, because we now create multiple children with CLONE_PARENT, we cannot wait for them to SIGCHLD us in the case of a death. Thus, we have to resort to having a child kindly send us their exit code before they die. Hopefully this all works okay, but at this point there's not much more than we can do. Signed-off-by: Aleksa Sarai <[email protected]>
Pinging this since it's needed for #975 (which is blocking -rc3). /cc @opencontainers/runc-maintainers |
I tested on RHEL 7.2. It is failing :/
|
Pretty sure I had run the tests with 975 on RHEL once earlier hence this surprised me. I will debug further.
|
@mrunalp Any update? |
I played around a bit and I can get past this issue with this diff. I don't think this fixes #959. It will most likely need changes similar to what I posted in #959. I will test that and get back in a bit.
|
:/ I really don't like the code in #959, because it breaks up unsharing of |
@mrunalp Since the SELinux issue still happens on |
I will run other tests on RHEL7 and report back here to check that there are no regressions vs master. |
@cyphar I ran the tests on master vs this PR + my diff and they are the same (There are 4 failures) unrelated to this PR. If you can update the PR to include my suggested changes, we can merge this. |
@mrunalp Can this issue be reproduced on a stock Fedora / CentOS VM? I'd like to play around with it a bit to see if I can get another solution to work. If you also know approximately where in the kernel the relevant checks are (presumably in the bowels of SELinux) that'd be great. |
I meant the diff I posted for combining the unshare of all namespaces without which we don't get test parity with master on RHEL 7. For 959 which we can handle separately you can use a Fedora VM to debug.
|
Without this patch applied, RHEL's SELinux policies cause container creation to not really work. Unfortunately this might be an issue for rootless containers (#774) but we'll cross that bridge when we come to it. Signed-off-by: Aleksa Sarai <[email protected]>
@mrunalp I've added that patch as a separate commit so that we can track the old way of doing it in the future. PTAL, it should be good to merge now. |
@crosbymichael This needs LGTM again if you have no more comments. |
/* | ||
* Okay, so this is quite annoying. | ||
* | ||
* In order to make sure that deal with older kernels (when CLONE_NEWUSER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What version of old kernels are we talking about here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mentioned later in the comment.
This was fixed in upstream commit 1f7f4dde5c945f41a7abc2285be43d918029ecc5, and was introduced by 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that's for CLONE_PATENT
and CLONE_NEWPID
thing, is that the same thing as CLONE_NEWUSER
not be handled first?
I was thinking if it's not a too new version, maybe we can claim it as a limitation, making the logic much more complicated to handle such a legacy issue seems not a worthy tradeoff to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right. The actual reason for the split is the various issues described underneath the section. This split is the whole reason for this PR (so removing the split would make this PR kinda pointless), as there are several legitimate non-legacy issues fixed by this split such as #975.
I can fix the comment, but I don't really want to wait another two weeks for two LGTMs because people are quite busy. I can fix the comment in a later PR if it's not a huge issue for you right now.
@hqhq ❤️ I can open a PR with some of your nits later. |
Namespaces do not need repeated entries and the ordering is handled by the runtime regardless of the spec ordering (e.g. in runC [1]). Using an object leans on the new wording from eeaccfa (glossary: Make objects explicitly unordered and forbid duplicate names, 2016-09-27, opencontainers#584) to make both of those points explicit. [1]: opencontainers/runc#977 Subject: nsenter: guarantee correct user namespace ordering Signed-off-by: W. Trevor King <[email protected]>
Namespaces do not need repeated entries and the ordering is handled by the runtime regardless of the spec ordering (e.g. in runC [1]). Using an object leans on the new wording from eeaccfa (glossary: Make objects explicitly unordered and forbid duplicate names, 2016-09-27, opencontainers#584) to make both of those points explicit. [1]: opencontainers/runc#977 Subject: nsenter: guarantee correct user namespace ordering Signed-off-by: W. Trevor King <[email protected]>
Namespaces do not need repeated entries and the ordering is handled by the runtime regardless of the spec ordering (e.g. in runC [1]). Using an object leans on the new wording from eeaccfa (glossary: Make objects explicitly unordered and forbid duplicate names, 2016-09-27, opencontainers#584) to make both of those points explicit. [1]: opencontainers/runc#977 Subject: nsenter: guarantee correct user namespace ordering Signed-off-by: W. Trevor King <[email protected]>
* way anyway. | ||
*/ | ||
if (unshare(config.cloneflags) < 0) | ||
bail("failed to unshare namespaces"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyphar I didn't notice you append this commit before, does this mean we actually still can't guarantee correct user namespace ordering for old kernels which don't create user namespace first when it's specified along with other CLONE_NEW* flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was an SELinux issue that meant that splitting it would cause some issues. Ask @mrunalp for more information. So, while this currently doesn't fix the actual issue I listed in the PR description there are a bunch of plumbing changes that make rootless containers as well as #975 and #976 possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I see that after revisiting comments, so looks like we don't have any better ideas to fix the issue (user namespace joining ordering on old kernels) for now. Can you open a PR to update comments accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I've opened #1165 accordingly. PTAL.
Depending on your SELinux setup, the order in which you join namespaces
can be important. In general, user namespaces should always be joined
and unshared first because then the other namespaces are correctly
pinned and you have the right priviliges within them. This also is very
useful for rootless containers, as well as older kernels that had
essentially broken unshare(2) and clone(2) implementations.
This also includes huge refactorings in how we spawn processes for
complicated reasons that I don't want to get into because it will make
me spiral into a cloud of rage. The reasoning is in the giant comment in
clone_parent. Have fun.
In addition, because we now create multiple children with CLONE_PARENT,
we cannot wait for them to SIGCHLD us in the case of a death. Thus, we
have to resort to having a child kindly send us their exit code before
they die. Hopefully this all works okay, but at this point there's not
much more than we can do.
TODO
:Signed-off-by: Aleksa Sarai [email protected]
This is based on #950.Base has been merged.