
Tight container limits may cause "read init-p: connection reset by peer" #1914

Open
danail-branekov opened this issue Oct 19, 2018 · 5 comments


@danail-branekov
Contributor

Steps to reproduce:

  1. Create a container via runc create <id>
  2. Set the pid limit of the container to 1 via echo 1 > /sys/fs/cgroup/pids/.../<id>/pids.max
  3. Run a process: runc exec <id> /bin/echo hi. The following error occurs:
runtime/cgo: pthread_create failed: Resource temporarily unavailable
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f4cadc230bb m=3 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7f4cadc230bb
stack: frame={sp:0x7f4cad3e9830, fp:0x0} stack=[0x7f4cacbea2a0,0x7f4cad3e9ea0)
00007f4cad3e9730:  0000000000000000  0000000000000000
00007f4cad3e9740:  0000000000000000  0000000000000000
00007f4cad3e9750:  0000000000000000  0000000000000000
00007f4cad3e9760:  0000000000000000  0000000000000000
00007f4cad3e9770:  0000000000000000  0000000000000000
00007f4cad3e9780:  0000000000000000  0000000000000000
00007f4cad3e9790:  0000000000000000  0000000000000000
00007f4cad3e97a0:  0000000000000000  0000000000000000
00007f4cad3e97b0:  0000000000000000  0000000000000000
00007f4cad3e97c0:  0000000000000000  0000000000000000
00007f4cad3e97d0:  0000000000000000  0000000000000000
00007f4cad3e97e0:  0000000000000000  0000000000000000
00007f4cad3e97f0:  0000000000000000  0000000000000000
00007f4cad3e9800:  0000000000000000  0000000000000000
00007f4cad3e9810:  0000000000000000  0000000000000000
00007f4cad3e9820:  0000000000000000  0000000000000000
00007f4cad3e9830: <0000000000000000  0000000000000000
00007f4cad3e9840:  0000000000000000  0000000000000000
00007f4cad3e9850:  0000000000000000  0000000000000000
00007f4cad3e9860:  0000000000000000  0000000000000000
00007f4cad3e9870:  0000000000000000  0000000000000000
00007f4cad3e9880:  0000000000000000  0000000000000000
00007f4cad3e9890:  0000000000000000  0000000000000000
00007f4cad3e98a0:  0000000000000000  0000000000000000
00007f4cad3e98b0:  fffffffe7fffffff  ffffffffffffffff
00007f4cad3e98c0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98d0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98e0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98f0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9900:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9910:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9920:  ffffffffffffffff  ffffffffffffffff
runtime: unknown pc 0x7f4cadc230bb
stack: frame={sp:0x7f4cad3e9830, fp:0x0} stack=[0x7f4cacbea2a0,0x7f4cad3e9ea0)
00007f4cad3e9730:  0000000000000000  0000000000000000
00007f4cad3e9740:  0000000000000000  0000000000000000
00007f4cad3e9750:  0000000000000000  0000000000000000
00007f4cad3e9760:  0000000000000000  0000000000000000
00007f4cad3e9770:  0000000000000000  0000000000000000
00007f4cad3e9780:  0000000000000000  0000000000000000
00007f4cad3e9790:  0000000000000000  0000000000000000
00007f4cad3e97a0:  0000000000000000  0000000000000000
00007f4cad3e97b0:  0000000000000000  0000000000000000
00007f4cad3e97c0:  0000000000000000  0000000000000000
00007f4cad3e97d0:  0000000000000000  0000000000000000
00007f4cad3e97e0:  0000000000000000  0000000000000000
00007f4cad3e97f0:  0000000000000000  0000000000000000
00007f4cad3e9800:  0000000000000000  0000000000000000
00007f4cad3e9810:  0000000000000000  0000000000000000
00007f4cad3e9820:  0000000000000000  0000000000000000
00007f4cad3e9830: <0000000000000000  0000000000000000
00007f4cad3e9840:  0000000000000000  0000000000000000
00007f4cad3e9850:  0000000000000000  0000000000000000
00007f4cad3e9860:  0000000000000000  0000000000000000
00007f4cad3e9870:  0000000000000000  0000000000000000
00007f4cad3e9880:  0000000000000000  0000000000000000
00007f4cad3e9890:  0000000000000000  0000000000000000
00007f4cad3e98a0:  0000000000000000  0000000000000000
00007f4cad3e98b0:  fffffffe7fffffff  ffffffffffffffff
00007f4cad3e98c0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98d0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98e0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e98f0:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9900:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9910:  ffffffffffffffff  ffffffffffffffff
00007f4cad3e9920:  ffffffffffffffff  ffffffffffffffff

goroutine 1 [runnable, locked to thread]:
runtime.chanrecv(0xc42005a000, 0x0, 0x1, 0x5653f5a1b1fa)
        /usr/local/go/src/runtime/chan.go:415 +0x6a0
runtime.chanrecv1(0xc42005a000, 0x0)
        /usr/local/go/src/runtime/chan.go:400 +0x2b
runtime.gcenable()
        /usr/local/go/src/runtime/mgc.go:217 +0x71
runtime.main()
        /usr/local/go/src/runtime/proc.go:161 +0x126
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2361 +0x1

rax    0x0
rbx    0x7f4cadfc7800
rcx    0x7f4cadc230bb
rdx    0x0
rdi    0x2
rsi    0x7f4cad3e9830
rbp    0x5653f5d15152
rsp    0x7f4cad3e9830
r8     0x0
r9     0x7f4cad3e9830
r10    0x8
r11    0x246
r12    0x5653f76c8480
r13    0xf1
r14    0x11
r15    0x0
rip    0x7f4cadc230bb
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
exec failed: container_linux.go:336: starting container process caused "read init-p: connection reset by peer"
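
For anyone scripting the reproduction, steps 1 and 2 can also be driven from Go. A minimal sketch, assuming cgroup v1 with the pids controller mounted at /sys/fs/cgroup/pids; the cgroup name "repro" is made up, since the real path depends on how the container was created:

```go
package main

import (
	"os"
	"path/filepath"
)

func main() {
	// Hypothetical cgroup path; with runc the real path depends on
	// the cgroupsPath configured for the container.
	cg := "/sys/fs/cgroup/pids/repro"
	if err := os.MkdirAll(cg, 0o755); err != nil {
		panic(err)
	}
	// Equivalent of: echo 1 > .../pids.max
	if err := os.WriteFile(filepath.Join(cg, "pids.max"), []byte("1"), 0o644); err != nil {
		panic(err)
	}
}
```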

After some debugging we found out what causes this error:

  1. Runc starts the new process without waiting for it here
  2. In parallel, runc puts the process into the limited cgroup here
  3. The two conditions above create a race: the process can join the restricted cgroup too early, while the Go runtime is still initialising and creating its internal threads

To prove this, we added a 100ms sleep before the process is joined to the cgroup, which significantly reduced the failure rate. Removing the code that joins the cgroup "fixed" it entirely.
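
To illustrate the race, here is a hedged sketch of the same start-then-join pattern in plain Go -- this is not runc's actual code, and the cgroup path is made up. If the binary passed as the argument is itself a Go program, its runtime initialisation races against the cgroup join exactly as described above:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Pass any Go binary: its runtime creates several OS threads
	// while initialising, each charged against pids.max.
	cmd := exec.Command(os.Args[1])
	cmd.Stdout = os.Stdout

	// 1. Start the process without waiting for it.
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// The experimental 100ms sleep went here: it gives the child's
	// runtime time to finish spawning threads before the limit
	// applies, which is why it reduced the failure rate.
	// time.Sleep(100 * time.Millisecond) // requires importing "time"

	// 2. In parallel with the child's startup, join the limited cgroup.
	pid := []byte(fmt.Sprintf("%d", cmd.Process.Pid))
	if err := os.WriteFile("/sys/fs/cgroup/pids/repro/cgroup.procs", pid, 0o644); err != nil {
		panic(err)
	}
	fmt.Println("wait:", cmd.Wait())
}
```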

We realise that such a tight limit has quite limited practical use, but we wanted to share the knowledge with the community. We believe that this error may also occur when exceeding any container cgroup limit (e.g. memory, cpu, pids).

Cheers, CF Garden Team

danail-branekov added a commit to cloudfoundry/garden-integration-tests that referenced this issue Oct 30, 2018
danail-branekov added a commit to cloudfoundry/garden-runc-release that referenced this issue Oct 30, 2018
[#159069922]

Submodule src/code.cloudfoundry.org/garden-integration-tests 289117c..e2d8121:
  > Increase pids limits to fix connection reset flake See opencontainers/runc#1914
@kkallday
Contributor

Hi @danail-branekov,

I'm trying to get a better understanding of the series of actions that lead to this error.

  1. The process /bin/echo hi is created but hasn't started execution when this line is executed
  2. The process from step (1) is placed into the limited cgroup
  3. The process from step (1) begins execution.
  4. Error occurs: the Go runtime fails to create a goroutine/system thread to execute step (3). A goroutine other than the main goroutine is created because cmd.Start() doesn't wait for the process to exit.

Is this happening because goroutines are assigned their own pids?

@danail-branekov
Contributor Author

Hi @kkallday
Kind of. The pids.max cgroup file, despite its name, limits not only the number of process identifiers (PIDs) but also the number of thread identifiers (TIDs). When a process starts a new thread via e.g. pthread_create (which the Go runtime does while initialising, to back its goroutines), the new TID counts towards the pids cgroup as well, so the cgroup limit is reached.

As noted above, if you artificially wait for the Go runtime's initialisation goroutines to finish, the issue goes away because the user process /bin/echo hi then consists of a single PID/TID.
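
You can watch this from Go itself, since the runtime lets you force extra OS threads. A hedged sketch, assuming the shell that launches the binary has already been placed into a made-up pids cgroup at /sys/fs/cgroup/pids/demo:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

func pidsCurrent() string {
	b, err := os.ReadFile("/sys/fs/cgroup/pids/demo/pids.current")
	if err != nil {
		return err.Error()
	}
	return strings.TrimSpace(string(b))
}

func main() {
	fmt.Println("pids.current before:", pidsCurrent())

	started := make(chan struct{})
	block := make(chan struct{})
	for i := 0; i < 8; i++ {
		go func() {
			// Pin the goroutine to its OS thread and keep it busy,
			// so the runtime must create fresh threads (fresh TIDs)
			// to run the remaining goroutines.
			runtime.LockOSThread()
			started <- struct{}{}
			<-block
		}()
	}
	for i := 0; i < 8; i++ {
		<-started
	}
	// Each new thread shows up as an extra entry in the pids counter.
	fmt.Println("pids.current after:", pidsCurrent())
	close(block)
}
```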

@kkallday
Contributor

@danail-branekov gotcha - I didn't know that TIDs count towards the pids cgroup limit. That is good to know.

One more question: when you sleep for some time, does that mean the command might execute outside of the cgroup for a while? By the time the process is placed into the cgroup, it might have already exited, which would defeat the purpose of putting it in a cgroup (I'm assuming my understanding here is wrong). Or is there some type of "freeze" on the command before it gets executed in the cgroup?

I'm new here - trying to get a better understanding of the project. Thanks in advance! 😄

@danail-branekov
Contributor Author

Well, yes, sleeping is just a hack/workaround to prove that we get the connection reset error because we hit the pids limit. AFAIK there is no process freezing; the process just starts and its pid is added to the cgroup. Therefore there is some tiny theoretical interval (from here to here) in which the process could terminate. I am not sure what the behaviour would be in that case, maybe running the process would fail...
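
As for what the join would do if the process had already exited: one hedged experiment (the cgroup path is again made up) is to write a PID that almost certainly does not exist into cgroup.procs and look at the error. The kernel rejects PIDs it cannot find, so the join step would presumably fail loudly rather than silently succeed:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// A PID that is very unlikely to be in use; adjust for your
	// system (see /proc/sys/kernel/pid_max).
	bogus := []byte("32767")
	err := os.WriteFile("/sys/fs/cgroup/pids/demo/cgroup.procs", bogus, 0o644)
	// Expected: a "no such process" (ESRCH) error from the kernel.
	fmt.Println(err)
}
```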

@cyphar
Member

cyphar commented Nov 13, 2018

Ideally we would know when to contain the process via a liveness check of the Go runtime, but that's not really doable (you could try to approximate it with even more synchronisation -- but that's what #1916 will do implicitly).
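
For the curious, one hedged way to get that extra synchronisation from plain Go -- purely an illustration, not what runc or #1916 actually does -- is to start the child ptrace-stopped, join the cgroup while it is still parked at execve, and only then detach. Note this only removes the race window; with pids.max set to 1 a Go child would still fail, just deterministically:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
)

func main() {
	// ptrace requests must come from the thread that started the child.
	runtime.LockOSThread()

	cmd := exec.Command(os.Args[1])
	cmd.Stdout = os.Stdout
	// Ptrace stops the child with SIGTRAP at execve, before the new
	// program (and, for a Go binary, its runtime) has run at all.
	cmd.SysProcAttr = &syscall.SysProcAttr{Ptrace: true}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	pid := cmd.Process.Pid

	// Reap the exec stop so we know the child is parked.
	var ws syscall.WaitStatus
	if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil {
		panic(err)
	}

	// Join the (made-up) limited cgroup while the child still has
	// exactly one task and cannot create more.
	procs := "/sys/fs/cgroup/pids/demo/cgroup.procs"
	if err := os.WriteFile(procs, []byte(fmt.Sprint(pid)), 0o644); err != nil {
		panic(err)
	}

	// From its first instruction onwards the child runs inside the
	// cgroup, so no thread can be created outside the limit.
	if err := syscall.PtraceDetach(pid); err != nil {
		panic(err)
	}
	fmt.Println("wait:", cmd.Wait())
}
```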
