-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multiple users/groups mapped for the rootless case #1529
Support multiple users/groups mapped for the rootless case #1529
Conversation
aa29e64
to
98ff8cf
Compare
98ff8cf
to
4a40d13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of comments, as well as some semantic questions that we need to discuss.
// mount verifies that the user isn't trying to set up any mounts they don't have | ||
// the rights to do. In addition, it makes sure that no mount has a `uid=` or | ||
// `gid=` option that doesn't resolve to root. | ||
func rootlessMount(config *configs.Config) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that dropping this makes sense. Even in the multiple-uid case, really we should be making it so that we emit an error if the uid=...
or gid=...
options reference unmapped users. The errors are far too cryptic if you ignore it.
config.Mounts[0].Data = "gid=0" | ||
if err := validator.Validate(config); err != nil { | ||
t.Errorf("Expected error to not occur when setting gid=0 in mount options: %+v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... so all of these removed tests should stay, and we should add new tests making sure that multiple-mapping cases are still treated sanely.
libcontainer/nsenter/nsexec.c
Outdated
@@ -192,22 +194,74 @@ static void update_setgroups(int pid, enum policy_t setgroup) | |||
} | |||
} | |||
|
|||
static void update_uidmap(int pid, char *map, size_t map_len) | |||
static int attempt_mapping_tool(const char *app, int pid, char *map, size_t map_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please format this (all of your changes) in the Linux kernel style (sans the 80-column restriction).
libcontainer/nsenter/nsexec.c
Outdated
size_t i; | ||
int child; | ||
|
||
sprintf (pid_fmt, "%d", pid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asprintf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be wasteful to allocate dynamic memory when we know it will be just to format an int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just because I don't like that you've hard-coded the length of pid_fmt
to be 11
.
Personally I like @teddyking's approach to the formatting a bit more (we send most of the information from Go with a netlink message rather than formatting everything in C). I prefer that because the context in which this C code is run is quite fragile and I get less worried if we do things in Go when possible. I'll read through his PR again and point out which bits seem good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICS #1403 is doing a very similar thing for preparing the argv for newuidmap/newgidmap
The main difference is that here I am not using strtok
and parsing the string inline without allocating any memory. I've avoided input validation here as it would be superfluous since newuidmap/newgidmap do that already.
libcontainer/nsenter/nsexec.c
Outdated
} | ||
else { | ||
int status; | ||
while (waitpid (child, &status, 0) < 0 && errno == EINTR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do
waitpid(child, &status, 0);
while (errno == EINTR);
If not, just put the ;
on a bare newline so it's more clear it's an empty loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, fixed and add an error check
libcontainer/nsenter/nsexec.c
Outdated
{ | ||
if (map == NULL || map_len <= 0) | ||
return; | ||
|
||
if (write_file(map, map_len, "/proc/%d/uid_map", pid) < 0) | ||
if (write_file(map, map_len, "/proc/%d/uid_map", pid) < 0) { | ||
if(errno == EPERM && is_rootless && attempt_mapping_tool ("/usr/bin/newuidmap", pid, map, map_len) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I feel about hard-coding the /usr/bin/newuidmap
. At this point, $PATH
should still be trustworthy (so execvp
could work), but maybe @crosbymichael knows better than me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it to execvp
so that PATH is honored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we'd quite like to bring over from #1403 is the ability to specify --newuidmap
and --newgidmap
binary locations. It's seems to be in line with the --criuPath
and it's our preferred way to be explicit in Garden. We can also wait for this to get merged and PR the flags separately if y'all want to keep this PR smaller, but it would be good to get buy in for that approach now.
main.go
Outdated
@@ -60,6 +60,14 @@ func main() { | |||
} | |||
v = append(v, fmt.Sprintf("spec: %s", specs.Version)) | |||
app.Version = strings.Join(v, "\n") | |||
|
|||
root := os.Getenv("XDG_RUNTIME_DIR") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct, as it will mean that if you run runc
as root with XDG_RUNTIME_DIR
set, then the spawned containers won't be visible by other runc
management processes. Instead, this should be conditional based on whether you're an unprivileged user. Even then I'm still not convinced, because what if someone sets up /run/runc
to be like /tmp
(sticky-bit and world-writeable) in order for all users to access the same store of container state?
I agree that runc
should obey XDG_RUNTIME_DIR
, but I'm not entirely convinced that this is correct (especially since changing XDG_RUNTIME_DIR
will make it more difficult for other people to manage the runc
containers spawned by someone with XDG_RUNTIME_DIR
set.
@crosbymichael do you have any ideas on how this should work nicely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it to be used only when geteuid() !=0
. I think a shared directory for all users is a very uncommon use case, while most users would probably prefer to have runc working out of the box and behave like other applications that use XDG_RUNTIME_DIR
. The first case is still possible though by overriding the root directory, and I think it is preferable to require extra configuration for an uncommon use like a shared directory for all users.
libcontainer/nsenter/nsexec.c
Outdated
update_uidmap(child, config.uidmap, config.uidmap_len); | ||
update_gidmap(child, config.gidmap, config.gidmap_len); | ||
update_uidmap(child, config.is_rootless, config.uidmap, config.uidmap_len); | ||
update_gidmap(child, config.is_rootless, config.gidmap, config.gidmap_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not always try to use newuidmap
and newgidmap
rather than making it conditional on is_rootless
? (Though maybe you're right here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using newuidmap
and newgidmap
only after an EPERM
. When running as root
that should never happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so then the is_rootless
check is unecessary (which is what I meant)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure I can remove that. I've pushed a new version where I drop it.
4d4023d
to
77f8908
Compare
@teddyking yes, we got to something very similar :) An extra point for considering this the right approach, and for the maintainers to pick what they prefer from both implementations |
07b33c0
to
be19b80
Compare
be19b80
to
e89e473
Compare
e89e473
to
a0270be
Compare
is there anything I can do to get this moving forward? |
Yeah, there's a few things left that we can take from #1403. I'll list them as a checklist below.
I'll also add some in-line comments. Sorry for not reviewing this in a while, I've been looking at the other rootless-container-related PR for cgroup handling most recently. |
libcontainer/nsenter/nsexec.c
Outdated
return -1; | ||
} | ||
|
||
static void update_uidmap(int pid, uint8_t is_rootless, char *map, size_t map_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_rootless
is still here, even though it isn't used anymore.
libcontainer/nsenter/nsexec.c
Outdated
} | ||
|
||
static void update_gidmap(int pid, char *map, size_t map_len) | ||
static void update_gidmap(int pid, uint8_t is_rootless, char *map, size_t map_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and it's still here too.
libcontainer/nsenter/nsexec.c
Outdated
bail("failed to update /proc/%d/gid_map", pid); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely stylistic, but I think this is more readable (same applies for update_uidmap
):
if (write_file(map, map_len, "/proc/%d/gid_map", pid) < 0) {
if (errno != EPERM)
bail("failed to update /proc/%d/gid_map", pid);
if (try_mapping_tool("newgidmap", pid, map, map_len))
bail("failed to use newgid map on %d", pid);
}
main.go
Outdated
if os.Geteuid() != 0 { | ||
runtimeDir := os.Getenv("XDG_RUNTIME_DIR") | ||
root = runtimeDir + "/runc" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't handle the case where XDG_RUNTIME_DIR
is unspecified safely. If it's not set, just use /run/runc
as usual.
if strings.HasPrefix(opt, "uid=") && opt != "uid=0" { | ||
return fmt.Errorf("cannot specify uid= mount options in rootless containers where argument isn't 0") | ||
if strings.HasPrefix(opt, "uid=") { | ||
uid, _ := strconv.Atoi(strings.Replace(opt, "uid=", "", 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the ignoring of errors, nor strings.Replace
here. Can we just do something like
var uid int
n, err := fmt.Sscanf(opt, "uid=%d", &uid)
if n != 1 || err != nil {
// Ignore unknown mount options.
continue
}
if !hasIDMapping(uid, config.UidMappings) {
return fmt.Errorf("cannot specify uid= mount options for unmapped uid in rootless containers")
}
And the same for the other option.
libcontainer/nsenter/nsexec.c
Outdated
* Convert the map string into a list of argument that | ||
* newuidmap/newgidmap can understand. | ||
*/ | ||
for (i = 0; i < map_len - 1; ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to do this with strtok?
libcontainer/nsenter/nsexec.c
Outdated
int argc = 0; | ||
size_t i; | ||
|
||
sprintf (pid_fmt, "%d", pid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it would never be an issue, use snprintf(pid_fmt, 16, ...)
.
Also I believe this needs a rebase, because the test failures are happening due to |
7b00799
to
4769977
Compare
@cyphar Thanks for the fixes! They LGTM |
5cb0e9c
to
959a954
Compare
Signed-off-by: Giuseppe Scrivano <[email protected]>
Take advantage of the newuidmap/newgidmap tools to allow multiple users/groups to be mapped into the new user namespace in the rootless case. Signed-off-by: Giuseppe Scrivano <[email protected]> [ rebased to handle intelrdt changes. ] Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
After quite a bit of debugging, I found that previous versions of this patchset did not include newgidmap in a rootless setting. Fix this by passing it whenever group mappings are applied, and also providing some better checking for try_mapping_tool. This commit also includes some stylistic improvements. Signed-off-by: Aleksa Sarai <[email protected]>
With the addition of our new{uid,gid}map support, we used to call execvp(3) from inside nsexec. This would mean that the path resolution for the binaries would happen in nsexec. Move the resolution to the initial setup code, and pass the absolute path to nsexec. Signed-off-by: Aleksa Sarai <[email protected]>
Now that rootless containers have support for multiple uid and gid mappings, allow --user to work as expected. If the user is not mapped, an error occurs (as usual). Signed-off-by: Aleksa Sarai <[email protected]>
This is necessary in order to add proper opportunistic tests, and is a placeholder until we add tests for new{uid,gid}map configurations. Signed-off-by: Aleksa Sarai <[email protected]>
Enable several previously disabled tests (for the idmap execution mode) for rootless containers, in addition to making all tests use the additional mappings. At the moment there's no strong need to add any additional tests purely for rootless_idmap. Signed-off-by: Aleksa Sarai <[email protected]>
959a954
to
eb5bd4f
Compare
Added |
Take advantage of the newuidmap/newgidmap tools to allow multiple users/groups to be mapped into the new user namespace in the rootless case.
This is what I've tried here (only the relevant bits showed for config.json):