Signal 23 (SIGURG) Docker event flood #315

Closed
pothos opened this issue Jan 11, 2021 · 3 comments
Labels
kind/bug Something isn't working

Comments

@pothos
Member

pothos commented Jan 11, 2021

Description
Beginning with release 2605.9.0, Docker containers generate many signal 23 events, which flood monitoring systems (example: kubernetes/kops#10388). The SIGURG signal does not kill the process; it is generated by Go runtime scheduling (https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md). Because the Go runtime does not know whether the process expects external SIGURG signals, the signal is not filtered out but reported to the process (golang/go#37942). The process has to filter this signal out itself before forwarding it to, e.g., child processes or logs.
This behavior arrived in Flatcar with the Go 1.15 update (it originates in Go 1.14, but Flatcar skipped that version for Stable). The Go 1.15 compiler was used for the Docker, containerd, and runc binaries. While containerd has a workaround in place (containerd/containerd#4532), the workaround does not cover every part of it.
I suggest trying to downgrade the Docker/containerd binaries to use the Go 1.13 compiler.

Impact

For example, many entries appear in the output of docker events -f container=ID

Environment and steps to reproduce

  1. Set-up: Run an alpine container and, in a second terminal, monitor the events as described above (the -f … filter can be omitted).
  2. Task: Hit Ctrl-C a few times, then Enter.
  3. Action(s): ↑
  4. Error: Observe the log entries. Optionally attach with sudo strace -tt -ff -p PID to the container process and to the containerd-shim process. For the container process, strace shows:
       12:27:44.156649 poll([{fd=0, events=POLLIN}], 1, -1) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
       12:27:44.206100 --- SIGURG {si_signo=SIGURG, si_code=SI_USER, si_pid=0, si_uid=0} ---
     and for the shim process:
       [pid 2120] futex(0xa70750, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=100000} <unfinished ...>
       [pid 2189] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=2114, si_uid=0} ---

Expected behavior
Signal 23 from the Go runtime no longer appears in the logs.

Additional information
Created an upstream issue at containerd/containerd#4935

@pothos pothos added the kind/bug Something isn't working label Jan 11, 2021
@pothos
Member Author

pothos commented Jan 11, 2021

To compare against the behavior of the upstream binaries from the Docker GitHub release I updated the guide to install them via Ignition: flatcar-archive/flatcar-docs#148

@pothos
Member Author

pothos commented Jan 12, 2021

It seems that containerd-shim is not yet filtering the signal. runc is not really involved because it is no longer running at that point.

Edit: It seems the signal actually comes from the Docker CLI client, not from the shim itself.

pothos added a commit to flatcar-archive/coreos-overlay that referenced this issue Jan 12, 2021
When Docker/containerd binaries are compiled with Go 1.15 the
containers generate many signal 23 (SIGURG) events which flood
monitoring systems:
  kubernetes/kops#10388
The SIGURG signal does not kill the process but is generated by Go
runtime scheduling:
  https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
Because the Go runtime does not know if the process expects external
SIGURG signals, the signal is not filtered out but reported to the
process: golang/go#37942
The process has to filter this signal out itself before forwarding it
to, e.g., child processes or logs.
This change was introduced with the Go 1.15 update (actually Go 1.14
but Flatcar skipped that for Stable). While containerd has some
workarounds in place, e.g., in containerd/containerd#4532, there are
still areas where the signal is not handled correctly.
Until these are fixed, downgrade to use the Go 1.13 compiler for
Docker/containerd binaries.

See flatcar/Flatcar#315
pothos added a commit to flatcar-archive/coreos-overlay that referenced this issue Jan 12, 2021
pothos added a commit to flatcar-archive/coreos-overlay that referenced this issue Jan 13, 2021
pothos added a commit to flatcar-archive/coreos-overlay that referenced this issue Jan 18, 2021
pothos added a commit to flatcar-archive/coreos-overlay that referenced this issue Jan 18, 2021
pothos added a commit to flatcar-archive/coreos-overlay that referenced this issue Jan 18, 2021
@margamanterola
Contributor

This change will be part of the next set of releases (alpha, beta and stable).

At least one of the missing signals is now handled here:
docker/cli#2929

It's tagged with the 20.10.3 milestone, so we can probably revert this fix once we upgrade Docker to 20.10.3.

pothos added a commit to flatcar-archive/coreos-overlay that referenced this issue Feb 1, 2021
@pothos pothos closed this as completed Feb 3, 2021