Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass back the pid of runc:[1:CHILD] so we can wait on it #1506

Conversation

LittleLightLittleFire
Copy link
Contributor

@LittleLightLittleFire LittleLightLittleFire commented Jul 5, 2017

Fixes #1443


// Clean up the zombie parent process
firstChildProcess, err := os.FindProcess(pid.PidFirstChild)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that propagating the error here makes sense (especially since you might get ECHILD -- I haven't checked the Go source yet). Maybe we should do something like this

if firstChildProcess, err := os.FindProcess(pid.PidFirstChild); err == nil {
	if _, err := firstChildProcess.Wait(); err != nil {
		return err
	}
}

Ditto for the pid.Pid. @mrunalp @dqminh @hqhq WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also are we sure it's not possible to get PidFirstChild = -1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.FindProcess can't fail on Unix systems. https://godoc.org/os#FindProcess

On Unix systems, FindProcess always succeeds and returns a Process for the given pid, regardless of whether the process exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also are we sure it's not possible to get PidFirstChild = -1?

Pretty sure it is impossible since it'd mean the grandchild'd pid has been received. This implies that the child has already been created.

I can just change the nsenter function to differentiate between child and grandchild.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyways, let me know what changes you want to see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would process.Wait return if the process doesn't exist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case the Wait is what will fail if the process doesn't exist -- so maybe some thing more like

if _, err := firstChildProcess.Wait(); err != nil {
	if err != unix.ECHILD && err != unix.ESRCH {
		return err
	}
}

Maybe?

Copy link
Contributor Author

@LittleLightLittleFire LittleLightLittleFire Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the concern that runc:[1:CHILD] has already been reaped by an external reaper? Otherwise it should always exist (and wait(2) can't fail).

if err != nil {
return err
}
if _, err := firstChildProcess.Wait(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed IRL, basically this should either just ignore the error:

_, _ = firstChildProcess.Wait()

Or explicitly special-case unix.ESRCH or unix.ECHILD, but I'm not convinced that makes much more sense than just ignoring the error (if you grep for Wait you'll find it).

if err != nil {
return err
}
if _, err := firstChildProcess.Wait(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here.

This allows the libcontainer to automatically clean up runc:[1:CHILD]
processes created as part of nsenter.

Signed-off-by: Alex Fang <[email protected]>
@LittleLightLittleFire LittleLightLittleFire force-pushed the 1443-runc-reap-child-process branch from 0d21c21 to e92add2 Compare August 5, 2017 03:44
@LittleLightLittleFire
Copy link
Contributor Author

Pull request updated

@cyphar
Copy link
Member

cyphar commented Aug 5, 2017

LGTM.

/cc @opencontainers/runc-maintainers

Approved with PullApprove

@crosbymichael
Copy link
Member

crosbymichael commented Aug 7, 2017

LGTM

Approved with PullApprove

@crosbymichael crosbymichael merged commit d40db12 into opencontainers:master Aug 7, 2017
@LittleLightLittleFire LittleLightLittleFire deleted the 1443-runc-reap-child-process branch August 8, 2017 10:46
@c4milo
Copy link

c4milo commented Aug 10, 2017

does this mean we can run runc --detach using systemd with Type=notify, without leaving zombie processes around?

@cyphar
Copy link
Member

cyphar commented Aug 11, 2017

@c4milo Probably, I would have to test it to find out, but this was intended to solve the more general intermediate zombie issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants