Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

send slavefd to the other side when terminal=true #1446

Closed
wants to merge 1 commit into from

Conversation

dqminh
Copy link
Contributor

@dqminh dqminh commented May 11, 2017

Fix #1434

Currently when terminal=true and container side decides to close stdio and also
eventually close /dev/console, it effectively looses the slave side of the pty
pair, and IO on the master results in EIO

If you use EPOLL on all IO we can pretend EIO is ok and continue, but we dont.
And moving to EPOLL is not really easy or straighforward.

Instead when terminal=true, this patch attempts to send over the valid slavefd
when it was opened, and the master side will hold on to it, so no matter what
the container side is doing we will always have a valid pty pair.

The following Go program simulate the container behavior:

package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
	"time"
)

func main() {
	console, err := os.OpenFile("/dev/console", os.O_RDWR, 0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintln(console, "write before dup /dev/null to stdio")
	makeNullStdio()
	fmt.Fprintln(console, "write after dup /dev/null to stdio")
	console.Close()

	logf, _ := os.Create("/tmp/logfile")
	for {
		console, err = os.OpenFile("/dev/console", os.O_RDWR, 0)
		if err != nil {
			fmt.Fprint(logf, err)
			os.Exit(1)
		}
		fmt.Fprintln(console, "open, write to console, and close")
		console.Close()
		time.Sleep(time.Second)
	}
}

func makeNullStdio() {
	nullFd, _ := os.OpenFile("/dev/null", os.O_RDWR, 0)
	defer nullFd.Close()
	fd := int(nullFd.Fd())
	syscall.Dup2(fd, 0)
	syscall.Dup2(fd, 1)
	syscall.Dup2(fd, 2)
}

Without the patch, we have the following output:

hanamura# runc run hello
write before dup /dev/null to stdio
write after dup /dev/null to stdio

After the patch

hanamura# runc run hello
write before dup /dev/null to stdio
write after dup /dev/null to stdio
open, write to console, and close
open, write to console, and close
open, write to console, and close

Signed-off-by: Daniel Dao [email protected]

@dqminh
Copy link
Contributor Author

dqminh commented May 11, 2017

This is hacky IMO, but i sent it anyway so we can discuss.

cc @crosbymichael @cyphar @TomasTomecek

@dqminh dqminh force-pushed the systemd-console branch from ad67fc5 to 113b719 Compare May 11, 2017 22:21
Currently when terminal=true and container side decides to close stdio and also
eventually close /dev/console, it effectively looses the slave side of the pty
pair, and IO on the master results in EIO

If you use EPOLL on all IO we can pretend EIO is ok and continue, but we dont.
And moving to EPOLL is not really easy or straighforward.

Instead when terminal=true, this patch attempts to send over the valid slavefd
when it was opened, and the master side will hold on to it, so no matter what
the container side is doing we will always have a valid pty pair.

The following Go program simulate the container behavior:

```
package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
	"time"
)

func main() {
	console, err := os.OpenFile("/dev/console", os.O_RDWR, 0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Fprintln(console, "write before dup /dev/null to stdio")
	makeNullStdio()
	fmt.Fprintln(console, "write after dup /dev/null to stdio")
	console.Close()

	logf, _ := os.Create("/tmp/logfile")
	for {
		console, err = os.OpenFile("/dev/console", os.O_RDWR, 0)
		if err != nil {
			fmt.Fprint(logf, err)
			os.Exit(1)
		}
		fmt.Fprintln(console, "open, write to console, and close")
		console.Close()
		time.Sleep(time.Second)
	}
}

func makeNullStdio() {
	nullFd, _ := os.OpenFile("/dev/null", os.O_RDWR, 0)
	defer nullFd.Close()
	fd := int(nullFd.Fd())
	syscall.Dup2(fd, 0)
	syscall.Dup2(fd, 1)
	syscall.Dup2(fd, 2)
}
```

Without the patch, we have the following output:

```
hanamura# runc run hello
write before dup /dev/null to stdio
write after dup /dev/null to stdio
```

After the patch

```
hanamura# runc run hello
write before dup /dev/null to stdio
write after dup /dev/null to stdio
open, write to console, and close
open, write to console, and close
open, write to console, and close
```

Signed-off-by: Daniel Dao <[email protected]>
@dqminh dqminh force-pushed the systemd-console branch from 113b719 to 380eb67 Compare May 11, 2017 22:45
@hqhq
Copy link
Contributor

hqhq commented May 12, 2017

To me it feels like a better approach than #1434 , but it'll still make upper level receivers confusing, can we keep this kind of hack inside of runc, like bind mounting that pty slave to one more random place that will never be closed by normal applications? Or reserve one more fd in container and dup the pty slave fd to it?

@dqminh
Copy link
Contributor Author

dqminh commented May 12, 2017

like bind mounting that pty slave to one more random place that will never be closed by normal applications

Hmm this doesnt quite work because the application will have to open the mountpoint ( like how /dev/console is opened )

reserve one more fd in container and dup the pty slave fd to it

This is interesting and should also work as long as the application dont close it. I feel like its also weird, maybe even more than our current slavefd passing but it does require no changes to master side which is a plus as well.

@TomasTomecek
Copy link

@dqminh oh wow, thank you for figuring out a much simpler solution! what do you think is hacky about this PR?

Since you managed to produce a working implementation, I'm going to close my wip PR.

@dqminh
Copy link
Contributor Author

dqminh commented May 15, 2017

what do you think is hacky about this PR?

usually in the traditional way, i think we would only require each side to hold its part of the pty pair (i.e. master should not care or know about the slave side). This patch changes that assumption, therefore also the way we send fds over to the master side ). all consumers of runc therefore also require updates to how pty is handled.

Or reserve one more fd in container and dup the pty slave fd to it?

@hqhq i still kinda dislike this approach, mostly because there's no good fd that we can dup too.

@cyphar
Copy link
Member

cyphar commented May 15, 2017

@dqminh

mostly because there's no good fd that we can dup too.

Not to mention that might cause some issues in cases where a process in the container wants to drop all abilities to write to the terminal (if you dup the child the process can't easily know that that is the case aside from doing some magic with /proc/self/fd/...).

This is hacky IMO

Yeah unfortunately I agree. Not just because it changes the --console-socket API but also because it means that now the caller has to manage a file descriptor (that they must not use) in addition to the master they actually do need to use. Also it's possible (not sure though) that sending a PTY file descriptor might change your controlling terminal (see O_NOCTTY) but I would have to check.

@hqhq
Copy link
Contributor

hqhq commented May 16, 2017

@dqminh @cyphar

i still kinda dislike this approach, mostly because there's no good fd that we can dup too.

fd 3 would be the first option to me, unfortunately we already supported LISTEN_FDS for on-demand socket activation for systemd, which starts with fd 3 by default, and we already supported preserve-fds option, so an extra fd between LISTEN_FDS and preserve-fds would be good.

Not to mention that might cause some issues in cases where a process in the container wants to drop all abilities to write to the terminal

In that case if it causes issue like moby/moby#27202 we can definitely blame the application in container from doing such insane operation, applications should never close any random fds which they don't know, otherwise it'll be problematic for other preserved fds as well.

The main reason I prefer this approach is that it'll be hacky anyway, and we better keep this in runc than exposing it up out.
@crosbymichael WDYT about this?

@ijc
Copy link
Contributor

ijc commented May 16, 2017

applications should never close any random fds which they don't know,

It is quite common for applications to close all fds between fork+exec for a variety of reasons and it is not uncommon to implement this as a simple loop up to some random number, since sadly that's the only really portable way.

Even with Go if len(Cmd.ExtraFiles) is sufficiently large an application can easily close an arbitrary fd as a consequence of the dup over it.

Expecting (as the parent) the child of a fork to do (or not do) anything in particular with its own fds is not really possible, it's certainly not possible to require that it not touch any relatively low numbered fds (so 3 is definitely out). You might be able to get away 99.9% of the time with something up in the 1025+ range or even higher.

This is Linux specific code, so can we not rely on inotify or one of the more modern poll type things to detect when the slave is reopened?

@crosbymichael
Copy link
Member

I think the proper fix is to use epoll. we have some experience with it in Go inside containerd so I think it should work but I think you said you tried it and ran into some issues.

@hqhq
Copy link
Contributor

hqhq commented May 17, 2017

@crosbymichael We've tried epoll as moby/moby#27202 (comment) but didn't work out, maybe we didn't do it properly.

@TomasTomecek
Copy link

@hqhq Lennart explained the trick with epoll nicely here:

Now, I think the right fix for all this is to make docker deal properly with POLLHUP on the pty and just leave the pty open. Implementing this properly isn't obvious on UNIX however, as one cannot mask POLLHUP watching when using poll(). However, if epoll() is used using edge-triggered behaviour this can be done nicely and cleanly. And that's what docker should really do here: use epoll to watch the pty master, and when seeing EPOLLHUP it should treat that as a temporary hangup, but not a complete destruction.

@cyphar
Copy link
Member

cyphar commented May 18, 2017

Really since we're using io.Copy we could just make a fork of the stdlib's io.Copy and handle POLLHUP/EPOLLHUP. Though we should check if the golang.org/x/... packages have something like this for us already.

moby/moby#27202 (comment) also shows how ugly the code can get and why we should definitely make this a separate function just based on io.Copy...

@mrunalp
Copy link
Contributor

mrunalp commented May 18, 2017

This requires an undocumented trick where we first close the slave to set HUP on the master.
See socat source here: https://github.com/craSH/socat/blob/c20699fced66696e243d785fdfcd2a94cf11e4cc/xio-pty.c#L205

Another option would be to use inotify as suggested earlier in this thread.

And for some inspiration (or horror) to handle this correctly in go, please take a look at: https://github.com/lxc/lxd/blob/master/shared/util_linux.go#L540

@crosbymichael
Copy link
Member

@mrunalp the horror... ;)

@dqminh
Copy link
Contributor Author

dqminh commented May 18, 2017

Yah i will see again if i can make epoll work.

But this trick is so niceee and clean though 😢

@cyphar
Copy link
Member

cyphar commented May 19, 2017

Oh Christ, look how huge those comments are. 💫

@dqminh dqminh closed this May 26, 2017
dqminh added a commit to dqminh/console that referenced this pull request Jun 29, 2017
This adds a console implementation for linux based on epoll and its
epoll manager (EpollConsole and Epoller). We need this in cases the
slave side repeatedly open then closes the slave console when doing I/O.

For more information, refers to:
- systemd/systemd#4262
- moby/moby#27202
- opencontainers/runc#1446

Signed-off-by: Daniel Dao <[email protected]>
dqminh added a commit to dqminh/console that referenced this pull request Jun 30, 2017
This adds a console implementation for linux based on epoll and its
epoll manager (EpollConsole and Epoller). We need this in cases the
slave side repeatedly open then closes the slave console when doing I/O.

For more information, refers to:
- systemd/systemd#4262
- moby/moby#27202
- opencontainers/runc#1446

Signed-off-by: Daniel Dao <[email protected]>
dqminh added a commit to dqminh/console that referenced this pull request Jun 30, 2017
This adds a console implementation for linux based on epoll and its
epoll manager (EpollConsole and Epoller). We need this in cases the
slave side repeatedly open then closes the slave console when doing I/O.

For more information, refers to:
- systemd/systemd#4262
- moby/moby#27202
- opencontainers/runc#1446

Signed-off-by: Daniel Dao <[email protected]>
dqminh added a commit to dqminh/console that referenced this pull request Jun 30, 2017
This adds a console implementation for linux based on epoll and its
epoll manager (EpollConsole and Epoller). We need this in cases the
slave side repeatedly open then closes the slave console when doing I/O.

For more information, refers to:
- systemd/systemd#4262
- moby/moby#27202
- opencontainers/runc#1446

Signed-off-by: Daniel Dao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants