libcontainer: call Prestart, Poststart hooks from correct places #1811
libcontainer/container_linux.go
Outdated
@@ -276,6 +276,21 @@ func (c *linuxContainer) Exec() error {
func (c *linuxContainer) exec() error {
	path := filepath.Join(c.root, execFifoFilename)

	defer func() {
		if c.config.Hooks != nil && len(c.config.Hooks.Poststart) > 0 {
I don't know how cheap goroutines are, but you may want:
if c.config.Hooks != nil && len(c.config.Hooks.Poststart) > 0 {
	defer func() {
		...
	}()
}
To avoid spawning one when there are no hooks.
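For reference, `defer` in Go does not spawn a goroutine: it registers a call that runs when the enclosing function returns, even if the `defer` statement sits inside an `if` block. A minimal sketch of the suggested guard follows; `hook`, `hooks`, and `container` are hypothetical stand-ins, not the real libcontainer types.

```go
package main

import "fmt"

// hook and hooks are hypothetical stand-ins for libcontainer's hook types.
type hook func() error

type hooks struct {
	Poststart []hook
}

type container struct {
	hooks *hooks
}

// exec registers the deferred poststart call only when hooks are configured.
// The deferred function fires when exec returns, not at the end of the if block.
func (c *container) exec(log *[]string) error {
	if c.hooks != nil && len(c.hooks.Poststart) > 0 {
		defer func() {
			for i, h := range c.hooks.Poststart {
				if err := h(); err != nil {
					*log = append(*log, fmt.Sprintf("poststart hook %d: %v", i, err))
					continue
				}
				*log = append(*log, fmt.Sprintf("poststart hook %d ok", i))
			}
		}()
	}
	*log = append(*log, "container started")
	return nil
}

func main() {
	var log []string
	c := &container{hooks: &hooks{Poststart: []hook{func() error { return nil }}}}
	_ = c.exec(&log)
	fmt.Println(log)
}
```

With no hooks configured, nothing is deferred and the function body runs unchanged.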
@wking Ok, it sounds reasonable. Will do it.
libcontainer/container_linux.go
Outdated
	}
	for i, hook := range c.config.Hooks.Poststart {
		if err := hook.Run(s); err != nil {
			fmt.Printf("%v", newSystemErrorWithCausef(err, "running poststart hook %d", i))
Printing (to stdout?) is not a great API for callers. Can we collect these and return any failed-hook warnings to the caller?
@wking Right. I'll make it return warnings to the caller.
libcontainer/container_linux.go
Outdated
	s, err := c.currentOCIState()
	if err != nil {
		return err
	}
	for i, hook := range c.config.Hooks.Poststart {
		s.Status = "created"
Do we still need this clobber now that these are called in the right place and no longer need to work around #1715?
0c199c4 to 3eadeb7
Rebased. Addressed review comments.
9ceed31 to 9800ca8
Dusting this off a little bit. The situation of hooks in the OCI runtime spec and in
@dongsupark with your change, this comment becomes false: runc/libcontainer/rootfs_linux.go, lines 80 to 87 in 70ca035
It means no OCI hook would be able to do runtime's namespace -> container mounts, since the switch to the new root is done as part of create, before any hook is called (according to the spec). We rely on this feature to enable GPU support as a pre-start hook today.
Can't you stick to behavior laid out in the runtime spec and handle this sort of thing with mount propagation? For example, mount a shared directory into your container, and then use your hooks to fill content into that shared directory.
That forces you to shove everything inside a single directory. We want to bind-mount at standard locations instead. Also, it won't work for bind-mounting devices in
You can mount multiple directories. And you may be able to make bind chains like:
I haven't actually tested that, maybe it doesn't work. [edit: tested, and it works]
Our CUDA stack expects devices to be in
It also means the OCI spec must be modified to include a bind mount with shared propagation. Thus, instead of one prestart hook that enables GPU support for everyone, everyone would need to modify their OCI spec to get our prestart hook to work.
That doesn't seem like a critical issue, just adjust the paths in my example to match whatever you need.
Agreed. But this seems like it would be straightforward enough to handle at a higher level, with a config similar to what libpod uses for hooks. And an approach like this would avoid having to update the spec to say "runtimes which implement containerization with a pivot, chroot or similar have to run the pre-start hooks somewhere inside the
I think it is, unless
Possible, but then we had better standardize this higher-level construct. We don't want to have to integrate with each different container runtime. In other words, CNI but for devices :)
I think the spec is vague here. @alban mentioned to me that
From https://github.com/opencontainers/runtime-spec/blob/master/runtime.md#create
If the rootfs is configured with propagation mode "slave"/"rslave", the OCI hook, running in the runtime's namespace, could add bind mounts from the host. As you noted in opencontainers/runtime-spec#973 (comment), it seems to be the default in runc. systemd-nspawn also uses "rslave".
@flx42 Thanks for the comments. I almost forgot about this PR. ;-) BTW I'm still curious what happened to the other PR, #1741, on which this PR depends.
9800ca8 to 943d36d
I'll review and get back on this tomorrow.
On Nov 13, 2018, at 8:00 PM, Aleksa Sarai ***@***.***> wrote:
LGTM, though I would like input from the cri-o folks.
Sure @mrunalp. This is the final patch I'm going to be pushing for in 1.0. As soon as this is merged I will prepare the 1.0 branch and send out the vote email.
943d36d to 94059a5
If this series is expected to land soon (hooray :), it may be time to remove the `[WIP]` from the PR subject ;).
I'm no longer familiar enough with the runc codebase to comment on where these hooks should live, so I haven't reviewed that portion of this PR (despite it being the most important portion ;).
libcontainer/process_linux.go
Outdated
@@ -364,21 +379,6 @@ func (p *initProcess) start() error {
			return newSystemErrorWithCause(err, "setting Intel RDT config for ready process")
		}
	}

	if p.config.Config.Hooks != nil && len(p.config.Config.Hooks.Prestart) > 0 {
Removing the pre-start hooks here makes the earlier `// Setup cgroup before prestart hook...` and `// call prestart hooks` comments obsolete. Now that the pre-start hooks are firing before the start signal (after all creation has finished), maybe we can just drop those comments?
libcontainer/process_linux.go
Outdated
@@ -395,20 +395,6 @@ func (p *initProcess) start() error {
			return newSystemErrorWithCause(err, "setting Intel RDT config for procHooks process")
		}
	}
	if p.config.Config.Hooks != nil && len(p.config.Config.Hooks.Prestart) > 0 {
Here again, this removal obsoletes some of the preceding comments. And maybe `procHooks` as a whole can go away now?
Reading the PR again, the changes have stopped making sense. I think I'm going to have to properly carry this PR...
> I think I'm going to have to properly carry this PR...
Any chance we can just land #1741? I think we all agree on that part of this anyway.
I mean, we can land that too (this PR includes it as the first commit). But my reason for wanting this in 1.0 is because currently runc doesn't correctly implement hooks -- and that's a bit of an issue if we want to be at least mostly spec compliant.
> I mean, we can land that too (this PR includes it as the first commit). But my reason for wanting this in 1.0 is because currently runc doesn't correctly implement hooks -- and that's a bit of an issue if we want to be at least mostly spec compliant.
Right. I just don't think we have to jump straight to compliant in a single PR. #1741 moves us a tiny bit closer, and doesn't seem to break anything, so let's land it. Then you or @dongsupark can pick up the timing fix from this PR and get that landed after you've worked it out.
My main reasoning is that time-to-merge for time-critical PRs is usually the main blocker due to timezones. The plan for runc 1.0 is that I send out the release notification this week, and I am at KiwiCon from Thursday on.
But if you reopen the PR and rebase it now I will LGTM it.
> But if you reopen the PR and rebase it now I will LGTM it.
I cannot re-open on my own (I need someone with write access to the repo to do that). Once you re-open, I'll rebase (does it even need a rebase? When I rebase locally I see no conflicts).
libcontainer/process_linux.go
Outdated
@@ -345,6 +345,21 @@ func (p *initProcess) start() error {
		sentResume bool
	)

	// runc start
	if !p.IsCreate {
		if p.config.Config.Hooks != nil && len(p.config.Config.Hooks.Poststart) > 0 {
nit: you can flatten these two to remove one level of nesting:
if !p.IsCreate && p.config.Config.Hooks != nil && len(p.config.Config.Hooks.Poststart) > 0 {
I think it's necessary to rewrite some pretty large parts of this (in particular,
@cyphar Right.
I will play with it today and if I can't get it working with a cleaner set of changes, we can just use this as a stop-gap. Don't get me wrong, I really appreciate you working on this and I'm sorry that discussion and merging of this was delayed for so long.
Hello! As my colleague @flx42 mentioned, we rely on this feature to enable GPU support, and this patch will break our users (nvidia-docker, CRIO, containerd). The issue we are facing is that we rely on the ability to mount a number of things from the runtime's namespace to the container's namespace:
@wking suggested a solution for devices above; however, that would break all our users today. Is there a solution here that wouldn't break us?
Wouldn't injecting resources via bind mounts be identical for your users vs. your current approach? Of course, you'd need to deploy updated hooks before deploying updated runcs, but that seems doable. Am I missing something?
There's been a lot of discussion over in [1] about how to support the NVIDIA folks and others who want to be able to create devices (possibly after having loaded kernel modules) and bind userspace libraries into the container. Currently that's happening in the middle of runc's create-time mount handling before the container pivots to its new root directory with runc's incorrectly-timed prestart hook trigger [2]. With this commit, we extend hooks with a 'precreate' stage to allow trusted parties to manipulate the config JSON before calling the runtime's 'create'.

I'm recycling the existing Hook schema from pkg/hooks for this, because we'll want Timeout for reliability and When to avoid the expense of fork/exec when a given hook does not need to make config changes [3].

[1]: opencontainers/runc#1811
[2]: opencontainers/runc#1710
[3]: containers#1828 (comment)

Signed-off-by: W. Trevor King <[email protected]>
needs rebase
can we have integration tests?
@AkihiroSuda Thanks, Akihiro. Though I have not been following recent discussions regarding the potential regressions that Nvidia folks mentioned.
AFAICT, @RenaudWasTaken from Nvidia has been pinging for feedback in opencontainers/runtime-spec#1008, but there hasn't been any (publicly visible) progress for two months.
Move CtAct and its corresponding constants `CT_ACT_*` from the top-level directory to `libcontainer`. For that, every call site needs to be updated. This is a preparation for the next commit, where Prestart, Poststart hooks are moved to correct places.

Signed-off-by: Dongsu Park <[email protected]>

Since the API of `container.Run()` has changed, we need to also update every call site of `container.Run()` under libcontainer/integration.

Signed-off-by: Dongsu Park <[email protected]>

So far Prestart, Poststart hooks have been called from the context of create, which does not satisfy the runtime spec. That's why the runtime-tools validation tests like `hooks_stdin.t` have failed. Let's move call sites of Prestart, Poststart to correct places.

Unfortunately, for the Poststart hook, it's not possible to tell whether a specific call site is from create context or run context. That's why we needed to allow the Create and Run methods to accept another parameter, `action` (of type `CtAct`). With that, it's possible to set a variable `initProcess.IsCreate` that allows us to distinguish Create from Run.

See also opencontainers#1710.

Signed-off-by: Dongsu Park <[email protected]>
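The `action` parameter described in these commit messages can be pictured roughly as follows. This is a simplified sketch, not the PR's code: the constant names are assumed (the commit message only says `CT_ACT_*`), and `initProcess`, `newInitProcess`, and `shouldRunPoststart` are trimmed-down hypothetical stand-ins.

```go
package main

import "fmt"

// CtAct mirrors the type name used in the commit message. The constant
// names below are assumed for illustration.
type CtAct uint8

const (
	CT_ACT_CREATE CtAct = iota + 1
	CT_ACT_RUN
)

// initProcess is a hypothetical, trimmed-down stand-in for the real type.
type initProcess struct {
	IsCreate bool
}

// newInitProcess records whether the caller is "create" or "run".
func newInitProcess(action CtAct) *initProcess {
	return &initProcess{IsCreate: action == CT_ACT_CREATE}
}

// shouldRunPoststart reports whether poststart hooks may fire from this
// call site: only outside the create context, and only if hooks exist.
func (p *initProcess) shouldRunPoststart(numHooks int) bool {
	return !p.IsCreate && numHooks > 0
}

func main() {
	for _, action := range []CtAct{CT_ACT_CREATE, CT_ACT_RUN} {
		p := newInitProcess(action)
		fmt.Printf("action=%d poststart=%v\n", action, p.shouldRunPoststart(1))
	}
}
```

Threading the action through the call makes the create/run distinction explicit at the call site, at the cost of an API change for every caller of `Create` and `Run`.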
ba70db6 to b8b88a3
@cyphar
Is this patchset up to date?
Source code used: master + this PR:
@RenaudWasTaken @flx42 @cyphar
I can carry it if @dongsupark is tired of constantly rebasing this PR. :P
I was waiting for a new version of the runtime spec before pushing a PR to runc (at least the create-runtime hook is trivial).
How should we deprecate the old ones?
@RenaudWasTaken The prestart hooks should keep their old behaviour so old NVIDIA hooks (from before the merge to the newer hooks) don't break on newer runc versions. We should keep them in runc until they're dropped from the spec (which isn't going to happen any time soon).
In that case I think this PR can be closed, I can probably open PRs to add the other hooks so that you can review them and we will rebase them once the spec has a new release.
Yeah, I'll close in favour of newer PRs being opened that implement the now-in-the-spec semantics. Thanks for all your hard work on this @dongsupark -- I know it's taken us a long time to solve this problem, but hopefully it'll all be over quite soon. 😸
So far Prestart, Poststart hooks have been called from the context of create, which does not satisfy the runtime spec. That's why the runtime-tools validation tests like `hooks_stdin.t` have failed. Let's move call sites of Prestart, Poststart to correct places.

Unfortunately, for the Poststart hook, it is in practice not possible to tell whether a specific call site is from create context or run context. That's why we needed to allow the Create and Run methods to accept another parameter, `action` (of type `CtAct`). With that, it's possible to set a variable `initProcess.IsCreate` that allows us to distinguish Create from Run. That's why the first commit, `libcontainer: move CtAct to libcontainer`, is needed.

I tested runtime-tools' `validation/hooks_stdin.t` with this PR of runc, and it worked fine. If this change causes any unexpected breakage, please let me know.

It depends on a pending PR #1741.
See also #1710.
/cc @wking @liangchenye