-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
send slavefd to the other side when terminal=true #1446
Conversation
This is hacky IMO, but i sent it anyway so we can discuss. |
Currently when terminal=true and container side decides to close stdio and also eventually close /dev/console, it effectively looses the slave side of the pty pair, and IO on the master results in EIO If you use EPOLL on all IO we can pretend EIO is ok and continue, but we dont. And moving to EPOLL is not really easy or straighforward. Instead when terminal=true, this patch attempts to send over the valid slavefd when it was opened, and the master side will hold on to it, so no matter what the container side is doing we will always have a valid pty pair. The following Go program simulate the container behavior: ``` package main import ( "fmt" "log" "os" "syscall" "time" ) func main() { console, err := os.OpenFile("/dev/console", os.O_RDWR, 0) if err != nil { log.Fatal(err) } fmt.Fprintln(console, "write before dup /dev/null to stdio") makeNullStdio() fmt.Fprintln(console, "write after dup /dev/null to stdio") console.Close() logf, _ := os.Create("/tmp/logfile") for { console, err = os.OpenFile("/dev/console", os.O_RDWR, 0) if err != nil { fmt.Fprint(logf, err) os.Exit(1) } fmt.Fprintln(console, "open, write to console, and close") console.Close() time.Sleep(time.Second) } } func makeNullStdio() { nullFd, _ := os.OpenFile("/dev/null", os.O_RDWR, 0) defer nullFd.Close() fd := int(nullFd.Fd()) syscall.Dup2(fd, 0) syscall.Dup2(fd, 1) syscall.Dup2(fd, 2) } ``` Without the patch, we have the following output: ``` hanamura# runc run hello write before dup /dev/null to stdio write after dup /dev/null to stdio ``` After the patch ``` hanamura# runc run hello write before dup /dev/null to stdio write after dup /dev/null to stdio open, write to console, and close open, write to console, and close open, write to console, and close ``` Signed-off-by: Daniel Dao <[email protected]>
To me it feels like a better approach than #1434 , but it'll still make upper level receivers confusing, can we keep this kind of hack inside of runc, like bind mounting that pty slave to one more random place that will never be closed by normal applications? Or reserve one more fd in container and dup the pty slave fd to it? |
Hmm this doesnt quite work because the application will have to open the mountpoint ( like how /dev/console is opened )
This is interesting and should also work as long as the application dont close it. I feel like its also weird, maybe even more than our current slavefd passing but it does require no changes to master side which is a plus as well. |
@dqminh oh wow, thank you for figuring out a much simpler solution! what do you think is hacky about this PR? Since you managed to produce a working implementation, I'm going to close my wip PR. |
usually in the traditional way, i think we would only require each side to hold its part of the pty pair (i.e. master should not care or know about the slave side). This patch changes that assumption, therefore also the way we send fds over to the master side ). all consumers of runc therefore also require updates to how pty is handled.
@hqhq i still kinda dislike this approach, mostly because there's no good fd that we can dup too. |
Not to mention that might cause some issues in cases where a process in the container wants to drop all abilities to write to the terminal (if you dup the child the process can't easily know that that is the case aside from doing some magic with
Yeah unfortunately I agree. Not just because it changes the |
fd 3 would be the first option to me, unfortunately we already supported
In that case if it causes issue like moby/moby#27202 we can definitely blame the application in container from doing such insane operation, applications should never close any random fds which they don't know, otherwise it'll be problematic for other preserved fds as well. The main reason I prefer this approach is that it'll be hacky anyway, and we better keep this in runc than exposing it up out. |
It is quite common for applications to close all fds between fork+exec for a variety of reasons and it is not uncommon to implement this as a simple loop up to some random number, since sadly that's the only really portable way. Even with Go if Expecting (as the parent) the child of a fork to do (or not do) anything in particular with its own fds is not really possible, it's certainly not possible to require that it not touch any relatively low numbered fds (so 3 is definitely out). You might be able to get away 99.9% of the time with something up in the 1025+ range or even higher. This is Linux specific code, so can we not rely on |
I think the proper fix is to use epoll. we have some experience with it in Go inside containerd so I think it should work but I think you said you tried it and ran into some issues. |
@crosbymichael We've tried epoll as moby/moby#27202 (comment) but didn't work out, maybe we didn't do it properly. |
@hqhq Lennart explained the trick with epoll nicely here:
|
Really since we're using moby/moby#27202 (comment) also shows how ugly the code can get and why we should definitely make this a separate function just based on |
This requires an undocumented trick where we first close the slave to set HUP on the master. Another option would be to use inotify as suggested earlier in this thread. And for some inspiration (or horror) to handle this correctly in go, please take a look at: https://github.com/lxc/lxd/blob/master/shared/util_linux.go#L540 |
@mrunalp the horror... ;) |
Yah i will see again if i can make epoll work. But this trick is so niceee and clean though 😢 |
Oh Christ, look how huge those comments are. 💫 |
This adds a console implementation for linux based on epoll and its epoll manager (EpollConsole and Epoller). We need this in cases the slave side repeatedly open then closes the slave console when doing I/O. For more information, refers to: - systemd/systemd#4262 - moby/moby#27202 - opencontainers/runc#1446 Signed-off-by: Daniel Dao <[email protected]>
This adds a console implementation for linux based on epoll and its epoll manager (EpollConsole and Epoller). We need this in cases the slave side repeatedly open then closes the slave console when doing I/O. For more information, refers to: - systemd/systemd#4262 - moby/moby#27202 - opencontainers/runc#1446 Signed-off-by: Daniel Dao <[email protected]>
This adds a console implementation for linux based on epoll and its epoll manager (EpollConsole and Epoller). We need this in cases the slave side repeatedly open then closes the slave console when doing I/O. For more information, refers to: - systemd/systemd#4262 - moby/moby#27202 - opencontainers/runc#1446 Signed-off-by: Daniel Dao <[email protected]>
This adds a console implementation for linux based on epoll and its epoll manager (EpollConsole and Epoller). We need this in cases the slave side repeatedly open then closes the slave console when doing I/O. For more information, refers to: - systemd/systemd#4262 - moby/moby#27202 - opencontainers/runc#1446 Signed-off-by: Daniel Dao <[email protected]>
Fix #1434
Currently when terminal=true and container side decides to close stdio and also
eventually close /dev/console, it effectively looses the slave side of the pty
pair, and IO on the master results in EIO
If you use EPOLL on all IO we can pretend EIO is ok and continue, but we dont.
And moving to EPOLL is not really easy or straighforward.
Instead when terminal=true, this patch attempts to send over the valid slavefd
when it was opened, and the master side will hold on to it, so no matter what
the container side is doing we will always have a valid pty pair.
The following Go program simulate the container behavior:
Without the patch, we have the following output:
After the patch
Signed-off-by: Daniel Dao [email protected]