Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcontainer: call Prestart, Poststart hooks from correct places #1811

Closed

Conversation

dongsupark
Copy link

So far Prestart, Poststart hooks have been called from the context of create, which does not satisfy the runtime spec. That's why the runtime-tools validation tests like hooks_stdin.t have failed. Let's move call sites of Prestart, Poststart to correct places.

Unfortunately as for the Poststart hook, in practice it's not possible to tell whether a specific call site is from create context or run context. That's why we needed to allow Create and Run methods to accept another parameter action (of type CtAct). Doing that, it's possible to set a variable initProcess.IsCreate that allows us distinguish Create from Run. That's why the first commit libcontainer: move CtAct to libcontainer is needed.

I tested runtime-tools' validation/hooks_stdin.t with this PR of runc, and it worked fine. Though if there would be unexpected breakage from changing like this, please let me know.

It depends on a pending PR #1741.
See also #1710.

/cc @wking @liangchenye

@@ -276,6 +276,21 @@ func (c *linuxContainer) Exec() error {
func (c *linuxContainer) exec() error {
path := filepath.Join(c.root, execFifoFilename)

defer func() {
if c.config.Hooks != nil && len(c.config.Hooks.Poststart) > 0 {
Copy link
Contributor

@wking wking May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how cheap goroutines are, but you may want:

if c.config.Hooks != nil && len(c.config.Hooks.Poststart) > 0 {
  defer func() {
    ...
  }()
}

To avoid spawning one when there are no hooks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking Ok, it sounds reasonable. Will do it.

}
for i, hook := range c.config.Hooks.Poststart {
if err := hook.Run(s); err != nil {
fmt.Printf("%v", newSystemErrorWithCausef(err, "running poststart hook %d", i))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printing (to stdout?) is not a great API for callers. Can we collect these and return any failed-hook warnings to the caller?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking Right. I'll make it return warnings to the caller.

s, err := c.currentOCIState()
if err != nil {
return err
}
for i, hook := range c.config.Hooks.Poststart {
s.Status = "created"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this clobber now that these are called in the right place and no longer need to work around #1715?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking Yes, it makes sense. Will remove this line.
Thanks for pointing to #1715, which looks like another can of worms. ;-)

@dongsupark dongsupark force-pushed the dongsu/oci-hook-state branch from 0c199c4 to 3eadeb7 Compare May 31, 2018 09:29
@dongsupark
Copy link
Author

Rebased.
Added another commit to fix build errors under libcontainer/integration.

Addressed review comments.

  • moved the defer function to inside of the if clause.
  • made the defer function return warnings to the caller of exec().
  • removed unnecessary setting of status.

@dongsupark dongsupark force-pushed the dongsu/oci-hook-state branch 2 times, most recently from 9ceed31 to 9800ca8 Compare June 4, 2018 10:27
@flx42
Copy link

flx42 commented Sep 11, 2018

Dusting this off a little bit. The situation of hooks in the OCI runtime spec and in runc still seems messy today.

@dongsupark with your change, this comment becomes false:

// Signal the parent to run the pre-start hooks.
// The hooks are run after the mounts are setup, but before we switch to the new
// root, so that the old root is still available in the hooks for any mount
// manipulations.
// Note that iConfig.Cwd is not guaranteed to exist here.
if err := syncParentHooks(pipe); err != nil {
return err
}

It means no OCI hook would be able to do runtime's namespace -> container mounts since the switch to the new root is done as part of create, before any hook is called (according to the spec). We rely on this feature to enable GPU support as a pre-start hook today.

@wking
Copy link
Contributor

wking commented Sep 11, 2018

We rely on this feature to enable GPU support as a pre-start hook today.

Can't you stick to behavior laid out in the runtime spec and handle this sort of thing with mount propagation? For example, mount a shared directory into your container, and then use your hooks to fill content into that shared directory.

@flx42
Copy link

flx42 commented Sep 11, 2018

That forces you to shove everything inside a single directory. We want to bind-mount at standard locations instead. Also, it won't work for bind-mounting devices in /dev.

@wking
Copy link
Contributor

wking commented Sep 11, 2018

That forces you to shove everything inside a single directory.

You can mount multiple directories. And you may be able to make bind chains like:

  1. Bind host /dev/a/b/c to /some/share/a/b/c in the host mount namespace.
  2. Bind /share/a/b/c to /dev/a/b/c in the container mount namespace.

I haven't actually tested that, maybe it doesn't work.

[edit: tested, and it works]

@flx42
Copy link

flx42 commented Sep 11, 2018

Our CUDA stack expects devices to be in /dev, not in subdirectories.

@flx42
Copy link

flx42 commented Sep 11, 2018

Also, it also mean the OCI spec must have been modified with a bind-mount with shared propagation. Thus, instead of one prestart hook to enable GPU support for everyone, we need everyone to modify their OCI spec in order to get our prestart hook to work.

@wking
Copy link
Contributor

wking commented Sep 11, 2018

Our CUDA stack expects devices to be in /dev, not in subdirectories.

That doesn't seem like a critical issue, just adjust the paths in my example to match whatever you need.

Also, it also mean the OCI spec must have been modified with a bind-mount with shared propagation. Thus, instead of one prestart hook to enable GPU support for everyone, we need everyone to modify their OCI spec in order to get our prestart hook to work.

Agreed. But this seems like it would be straightforward enough to handle at a higher level, with a config similar to what libpod uses for hooks. And an approach like this would avoid having to update the spec to say "runtimes which implement containerization with a pivot, chroot or similar have to run the pre-start hooks somewhere inside the create workflow when ${THESE_THINGS} have happened but ${THESE_OTHER_THINGS} have not. Runtimes based on VMs and other things are off the hook". That sounds hard to specify clearly ;).

@flx42
Copy link

flx42 commented Sep 11, 2018

That doesn't seem like a critical issue, just adjust the paths in my example to match whatever you need.

I think it is, unless /dev in the container is a bind-mount with shared propagation, how can I make /dev/nvidia appear inside the container without a subdirectory?

Agreed. But this seems like it would be straightforward enough to handle at a higher level, with a config similar to what libpod uses for hooks.

Possible, but then we would better standardize this higher level construct. We don't want to have to integrate in each different container runtime. In other words, CNI but for devices :)

"runtimes which implement containerization with a pivot, chroot or similar have to run the pre-start hooks somewhere inside the create workflow when ${THESE_THINGS} have happened but ${THESE_OTHER_THINGS} have not. Runtimes based on VMs and other things are off the hook".

I think the spec is vague here. @alban mentioned to me that create should do the pivot/chroot per the following wording in the spec:

All of the properties configured in config.json except for process MUST be applied

From https://github.com/opencontainers/runtime-spec/blob/master/runtime.md#create
But what does configured means for the Root struct? Does that mean setting up the mount-point? Or does that mean pivot_root/chroot?

@alban
Copy link
Contributor

alban commented Sep 12, 2018

It means no OCI hook would be able to do runtime's namespace -> container mounts since the switch to the new root is done as part of create, before any hook is called (according to the spec). We rely on this feature to enable GPU support as a pre-start hook today.

If the rootfs is configured as propagation mode "slave"/"rslave", the OCI hook, running in the runtime's namespace could add bind mounts from the host /dev/foo into the $ROOTFS_CONTAINER/my-dev-foo and it will be propagated into /my-dev-foo in the container mount namespace.

As you noted in opencontainers/runtime-spec#973 (comment), it seems to be the default in runc. systemd-nspawn also uses "rslave".

@dongsupark
Copy link
Author

@flx42 Thanks for comments. I almost forgot about this PR. ;-)
Like @wking @alban mentioned, I suppose you can probably figure out how to deal with your issue via mount propagation.

BTW I'm still curious, what happened to the other PR, #1741, on which this PR depends on.
I also have not heard of any feedback from maintainers, whether I'm doing the right thing with this PR or not.

@cyphar
Copy link
Member

cyphar commented Nov 13, 2018

I think this might be needed for 1.0 purely because it's a pretty big spec-related change. However I know that the cri-o folks use this already so maybe we should be careful about changing it to be spec-compliant.

I will fix the conflict.

/cc @opencontainers/runc-maintainers @mrunalp @rhatdan

@cyphar
Copy link
Member

cyphar commented Nov 14, 2018

LGTM, though I would like input from the cri-o folks.

Approved with PullApprove

@mrunalp
Copy link
Contributor

mrunalp commented Nov 14, 2018 via email

@cyphar
Copy link
Member

cyphar commented Nov 14, 2018

Sure @mrunalp. This is the final patch I'm going to be pushing for in 1.0. As soon as this is merged I will prepare the 1.0 branch and send out the vote email.

@cyphar cyphar changed the title [WIP] libcontainer: call Prestart, Poststart hooks from correct places libcontainer: call Prestart, Poststart hooks from correct places Nov 14, 2018
@cyphar cyphar force-pushed the dongsu/oci-hook-state branch from 943d36d to 94059a5 Compare November 14, 2018 05:07
Copy link
Contributor

@wking wking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this series is expected to land soon (hooray :), it may be time to remove the [WIP] from the PR subject ;).

I'm no longer familiar enough with the runc codebase to comment on where these hooks should live, so I haven't reviewed that portion of this PR (despite it being the most important portion ;).

@@ -364,21 +379,6 @@ func (p *initProcess) start() error {
return newSystemErrorWithCause(err, "setting Intel RDT config for ready process")
}
}

if p.config.Config.Hooks != nil && len(p.config.Config.Hooks.Prestart) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the pre-start hooks here makes the earlier // Setup cgroup before prestart hook... and // call prestart hooks comments obsolete. Now that the pre-start hooks are firing before the start signal (after all creation has finished), maybe we can just drop those comments?

@@ -395,20 +395,6 @@ func (p *initProcess) start() error {
return newSystemErrorWithCause(err, "setting Intel RDT config for procHooks process")
}
}
if p.config.Config.Hooks != nil && len(p.config.Config.Hooks.Prestart) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here again, this removal obsoletes some of the preceding comments. And maybe procHooks as a whole can go away now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the PR again, the changes have stopped making sense. I think I'm going to have to properly carry this PR...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm going to have to properly carry this PR...

Any chance we can just land #1741? I think we all agree on that part of this anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we can land that too (this PR includes it as the first commit). But my reason for wanting this in 1.0 is because currently runc doesn't correctly implement hooks -- and that's a bit of an issue if we want to be at least mostly spec compliant.

Copy link
Contributor

@wking wking Nov 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we can land that too (this PR includes it as the first commit). But my reason for wanting this in 1.0 is because currently runc doesn't correctly implement hooks -- and that's a bit of an issue if we want to be at least mostly spec compliant.

Right. I just don't think we have to jump straight to compliant in a single PR. #1741 moves us a tiny bit closer, and doesn't seem to break anything, so let's land it. Then you or @dongsupark can pick up the timing fix from this PR and get that landed after you've worked it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main reasoning is that time-to-merge for time-critical PRs is usually the main blocker due to timezones. The plan for runc 1.0 is that I send out the release notification this week, and I am at KiwiCon from Thursday on.

But if you reopen the PR and rebase it now I will LGTM it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you reopen the PR and rebase it now I will LGTM it.

I cannot re-open on my own (I need someone with write access to the repo to do that). Once you re-open, I'll rebase (does it even need a rebase? When I rebase locally I see no conflicts).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way I'm fine.

If #1741 gets first merged, then I can rebase this PR #1811 on top of that. Also I can fix other issues with obsolete comments.

Or if you want to properly carry this PR with the other one altogether, then you can do so. Go ahead. :-)

@@ -345,6 +345,21 @@ func (p *initProcess) start() error {
sentResume bool
)

// runc start
if !p.IsCreate {
if p.config.Config.Hooks != nil && len(p.config.Config.Hooks.Poststart) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can flatten these two to remove one level of nesting:

if !p.IsCreate && p.config.Config.Hooks != nil && len(p.config.Config.Hooks.Poststart) > 0 {

@cyphar cyphar added this to the 1.0.0 milestone Nov 14, 2018
@cyphar
Copy link
Member

cyphar commented Nov 14, 2018

I think it's necessary to rewrite some pretty large parts of this (in particular, procHooks needs to be removed and a few other such synchronisation bits needs to be cleaned up significantly). I will try to reuse some of these patches but this particular patchset is pretty hard to keep track of (I've never really been a fan of process_linux.go and container_linux.go -- they're really hard to understand).

@dongsupark
Copy link
Author

@cyphar Right.
I also think that process_linux.go and container_linux.go are too hard to understand. Though I didn't want to change many things there, because I was not familiar with background of the code. All I want was only to make it spec-compliant with minimal code changes.
But if you want to clean up more, sure, go ahead. 👍

@cyphar
Copy link
Member

cyphar commented Nov 14, 2018

I will play with it today and if I can't get it working with a cleaner set of changes, we can just use this as a stop-gap. Don't get me wrong, I really appreciate you working on this and I'm sorry that discussion and merging of this was delayed for so long.

@RenaudWasTaken
Copy link
Contributor

Hello!

As my colleague @flx42 mentioned, We rely on this feature to enable GPU support and this patch will break our users (nvidia-docker, CRIO, containerd).

The issue we are facing is that we rely on the ability to mount a number of things from the runtime's namespace to the container's namespace:

@wking suggested a solution for devices above however, that would break all our users today.
We would also have no real way to cleanup what's on the host, since unmount doesn't propagate.

Is there a solution here that wouldn't break us?

@wking
Copy link
Contributor

wking commented Nov 16, 2018

@wking suggested a solution for devices above however, that would break all our users today

Wouldn't injecting resources via bind mounts be identical for your users vs. your current approach? Of course, you'd need to deploy updated hooks before deploying updated runcs, but that seems doable. Am I missing something?

@cyphar cyphar modified the milestones: 1.0.0-rc6, 1.0.0 Nov 21, 2018
wking added a commit to wking/libpod that referenced this pull request Jan 9, 2019
There's been a lot of discussion over in [1] about how to support the
NVIDIA folks and others who want to be able to create devices
(possibly after having loaded kernel modules) and bind userspace
libraries into the container.  Currently that's happening in the
middle of runc's create-time mount handling before the container
pivots to its new root directory with runc's incorrectly-timed
prestart hook trigger [2].  With this commit, we extend hooks with a
'precreate' stage to allow trusted parties to manipulate the config
JSON before calling the runtime's 'create'.

I'm recycling the existing Hook schema from pkg/hooks for this,
because we'll want Timeout for reliability and When to avoid the
expense of fork/exec when a given hook does not need to make config
changes [3].

[1]: opencontainers/runc#1811
[2]: opencontainers/runc#1710
[3]: containers#1828 (comment)

Signed-off-by: W. Trevor King <[email protected]>
@AkihiroSuda
Copy link
Member

needs rebase

@AkihiroSuda
Copy link
Member

can we have integration tests?

@dongsupark
Copy link
Author

@AkihiroSuda Thanks, Akihiro.
Sure, I can rebase it again.
I will also try to add integration tests probably in this week.

Though I have not been following recent discussions regarding the potential regressions that Nvidia folks mentioned.
Have we somehow reached a consensus?

@h-vetinari
Copy link

Though I have not been following recent discussions regarding the potential regressions that Nvidia folks mentioned.
Have we somehow reached a consensus?

AFAICT, @RenaudWasTaken from Nvidia has been pinging for feedback in opencontainers/runtime-spec#1008, but there hasn't been any (publically visible) progress since two months.

@cyphar @vbatts @alban

Dongsu Park added 3 commits July 10, 2019 08:26
Move CtAct and its corresponding constants `CT_ACT_*` from the top-level
directory to `libcontainer`. For that, every call site needs to be
updated.

This is a preparation for the next commit, where Prestart, Poststart
hooks are moved to correct places.

Signed-off-by: Dongsu Park <[email protected]>
Since the API of `container.Run()` has changed, we need to also update
every call site of `container.Run()` under libcontainer/integration.

Signed-off-by: Dongsu Park <[email protected]>
So far Prestart, Poststart hooks have been called from the context
of create, which does not satisfy the runtime spec. That's why the
runtime-tools validation tests like `hooks_stdin.t` have failed.
Let's move call sites of Prestart, Poststart to correct places.

Unfortunately as for the Poststart hook, it's not possible to tell
whether a specific call site is from create context or run context.
That's why we needed to allow Create and Run methods to accept
another parameter `action` (of type `CtAct`). Doing that, it's possible
to set a variable `initProcess.IsCreate` that allows us distinguish
Create from Run.

See also opencontainers#1710.

Signed-off-by: Dongsu Park <[email protected]>
@dongsupark
Copy link
Author

@cyphar
As discussed, we should somehow review and merge opencontainers/runtime-spec#1008.

@kad
Copy link

kad commented Oct 23, 2019

Is this patchset up to date?

root@nuc1/mycontainer # ls -l /tmp/hook.log
ls: cannot access '/tmp/hook.log': No such file or directory
root@nuc1/mycontainer # ./runc-pr1811 create myid
root@nuc1/mycontainer # ls -l /tmp/hook.log
-rw-r--r-- 1 root root 141 Oct 23 23:19 /tmp/hook.log
root@nuc1/mycontainer # ./runc-pr1811 state myid
{
  "ociVersion": "1.0.1-dev",
  "id": "myid",
  "pid": 93793,
  "status": "created",
  "bundle": "/mycontainer",
  "rootfs": "/mycontainer/rootfs",
  "created": "2019-10-23T20:19:46.68684718Z",
  "owner": ""
}
root@nuc1/mycontainer # rm /tmp/hook.log
root@nuc1/mycontainer # ./runc-pr1811 start myid
root@nuc1/mycontainer # ./runc-pr1811 state myid
{
  "ociVersion": "1.0.1-dev",
  "id": "myid",
  "pid": 93793,
  "status": "running",
  "bundle": "/mycontainer",
  "rootfs": "/mycontainer/rootfs",
  "created": "2019-10-23T20:19:46.68684718Z",
  "owner": ""
}
root@nuc1/mycontainer # ls -l /tmp/hook.log
ls: cannot access '/tmp/hook.log': No such file or directory
root@nuc1/mycontainer # ./runc-pr1811 delete myid
root@nuc1/mycontainer # ./runc-pr1811 -v
runc version 1.0.0-rc9+dev
commit: e27dc4fbc2395fd30fddefcbcc8192f13eec5385
spec: 1.0.1-dev
root@nuc1/mycontainer #

Source code used: mater+this PR:

kad@nuc1~/go/src/github.com/opencontainers/runc $ git describe
v1.0.0-rc9-17-ge27dc4fb
kad@nuc1~/go/src/github.com/opencontainers/runc $ git log --oneline origin/master..
e27dc4fb (HEAD -> hooks) libcontainer: call Prestart, Poststart hooks from correct places
15708e5d libcontainer/integration: update call sites of container.Run()
a8dcfb98 libcontainer: move CtAct to libcontainer
kad@nuc1~/go/src/github.com/opencontainers/runc $

@vbatts
Copy link
Member

vbatts commented Feb 5, 2020

@RenaudWasTaken @flx42 @cyphar
This topic came up on the call today. The runtime-spec conversation has merged opencontainers/runtime-spec#1008, and we'll have a spec version shortly (like 1.0.2).
Can we rebase this and push it through?
@opencontainers/runc-maintainers PTAL

@cyphar
Copy link
Member

cyphar commented Feb 5, 2020

I can carry it if @dongsupark is tired of constantly rebasing this PR. :P

@RenaudWasTaken
Copy link
Contributor

I was waiting for a new version of the runtime spec before pushing a PR to runc (at least the create-runtime hook is trivial).
@cyphar since prestart hooks are deprecated now do we want to change them?

@vbatts
Copy link
Member

vbatts commented Feb 6, 2020 via email

@cyphar
Copy link
Member

cyphar commented Feb 6, 2020

@RenaudWasTaken The prestart hooks should keep their old behaviour so old NVIDIA hooks (from before the merge to the newer hooks) don't break on newer runc versions. We should keep them in runc until they're dropped from the spec (which isn't going to happen any time soon).

@RenaudWasTaken
Copy link
Contributor

The prestart hooks should keep their old behaviour so old NVIDIA hooks (from before the merge to the newer hooks) don't break on newer runc versions

In that case I think this PR can be closed, I can probably open PRs to add the other hooks so that you can review them and we will rebase them once the spec has a new release.

@cyphar
Copy link
Member

cyphar commented Feb 21, 2020

Yeah, I'll close in favour of newer PRs being opened that implement the now-in-the-spec semantics.

Thanks for all your hard work on this @dongsupark -- I know it's taken us a long time to solve this problem, but hopefully it'll all be over quite soon. 😸

@cyphar cyphar closed this Feb 21, 2020
@cyphar cyphar mentioned this pull request May 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.