systemd service files as blogged have issues on reboot #3148
Comments
On reboot, as in as the system shuts down, or when it comes back up?
As the system shuts down.
So could we be talking about https://bugzilla.redhat.com/show_bug.cgi?id=1394937 ?
Potentially, yes, I guess. What caught my attention was this line: 'systemd[1]: libpod-conmon-7bfbd4ae7b10b983bc65c8a55450a2294d6e7d0b659a2d0de0d8ba2f205def55.scope: Killing process 1168 (conmon) with signal SIGKILL.' I.e., I am wondering whether systemd somehow has issues with (ignores deps of?) the .scope that podman creates.
@mrunalp PTAL
Hello there! Would love to get @mrunalp's feedback on whether CRI-O has already dealt with this issue :).
There's an upstream fix proposed for runc here: opencontainers/runc#2062
Yeah, the runc PR should probably help. Could you test with that change and see if it helps?
So I tried with an installed runc that has the 2062 change applied, to no avail. Is updating runc, restarting the box, and then rerunning the test sufficient? I left some details about my testing here: https://bugzilla.redhat.com/show_bug.cgi?id=1710871#c11
If I'm correct, we also need to teach Podman to take advantage of the new
So just an update here. Since I got back from PTO I have been unable to reproduce this downstream. I'll try a bit more over the next few days, but it is certainly not seen as consistently as before. Not sure what changed under me, though.
I believe this is fixed and am therefore closing.
This is not fixed.
What needs to be done, @mheon, so we can get it assigned?
Core issue: CGroup scopes are created by Podman and the OCI runtime: one for Conmon, one for the container process. These scopes have no dependencies, so during shutdown systemd will whack them with a SIGTERM and then, after a brief timeout, a SIGKILL. Under some circumstances we don't want this to happen. For example, the OpenStack team wants an ordered shutdown, dictated by systemd and their orchestration service.
The Conmon one is probably simple to deal with: #3474 should fix it once I figure out why the tests are red. Just set a long kill timeout to give the container time to stop.
The container is a lot more complicated. We don't have control of the application in the container, so we don't know what signals are safe to send it, and we can't use the same approach as Conmon. We could use systemd dependencies if the container was run from a unit file, but the chain here needs to be inverted: the systemd unit that launched the container needs to depend on the container's CGroup scope, yet the unit file exists before the container is launched and the container's CGroup doesn't exist until it's launched.
My initial proposal was to ditch CGroups entirely (optionally): #3581. Limitations of that approach:
I view this as a long-term enhancement, as OpenStack has (mostly) resolved the issue by adding an intermediate systemd unit to handle dependencies. This is 1.6.x timeframe or later.
Oh, I should mention that OpenStack rejected the initial approach - they need
We believe this is fixed and as such are closing it; please re-open if the problem persists.
** BUG REPORT **
/kind bug
Description
After configuring a service file to start a podman container as documented at https://podman.io/blogs/2018/09/13/systemd.html, we noticed that on reboot all processes inside the container get SIGKILLed and the container has no chance to terminate gracefully.
Steps to reproduce the issue:
sudo podman pull docker.io/redis
sudo podman run -d --name redis -p 6379:6379 redis
/etc/systemd/system/redis.service:
[Unit]
Description=Redis Podman container
Wants=syslog.service
[Service]
Restart=always
ExecStart=/usr/bin/podman start -a redis
ExecStop=/usr/bin/podman stop -t 10 redis
[Install]
WantedBy=multi-user.target
sudo systemctl enable redis.service
sudo systemctl start redis.service
sudo reboot
Describe the results you received:
May 16 10:05:22 podmanreboot systemd[1]: Stopping Restore /run/initramfs on shutdown...
May 16 10:05:22 podmanreboot systemd[1]: libpod-conmon-7bfbd4ae7b10b983bc65c8a55450a2294d6e7d0b659a2d0de0d8ba2f205def55.scope: Killing process 1168 (conmon) with signal SIGKILL.
May 16 10:05:22 podmanreboot systemd[1]: Stopped libpod-conmon-7bfbd4ae7b10b983bc65c8a55450a2294d6e7d0b659a2d0de0d8ba2f205def55.scope.
May 16 10:05:22 podmanreboot systemd[1]: Stopping Authorization Manager...
May 16 10:05:22 podmanreboot systemd[1]: libpod-7bfbd4ae7b10b983bc65c8a55450a2294d6e7d0b659a2d0de0d8ba2f205def55.scope: Killing process 1201 (redis-server) with signal SIGKILL.
May 16 10:05:22 podmanreboot systemd[1]: Stopped libcontainer container 7bfbd4ae7b10b983bc65c8a55450a2294d6e7d0b659a2d0de0d8ba2f205def55.
May 16 10:05:22 podmanreboot systemd[1]: libpod-7bfbd4ae7b10b983bc65c8a55450a2294d6e7d0b659a2d0de0d8ba2f205def55.scope: Consumed 133ms CPU time
May 16 10:05:22 podmanreboot systemd[1]: Removed slice machine.slice.
May 16 10:05:22 podmanreboot systemd[1]: Removed slice system-sshd\x2dkeygen.slice.
May 16 10:05:22 podmanreboot systemd[1]: Stopping irqbalance daemon...
May 16 10:05:22 podmanreboot systemd[1]: Stopped target Login Prompts.
May 16 10:05:22 podmanreboot systemd[1]: Stopping Serial Getty on ttyS0...
May 16 10:05:22 podmanreboot systemd[7088]: Stopped target Default.
May 16 10:05:22 podmanreboot systemd[7088]: Stopped target Basic System.
May 16 10:05:22 podmanreboot systemd[7088]: Stopped target Paths.
May 16 10:05:22 podmanreboot systemd[7088]: Stopped target Sockets.
May 16 10:05:22 podmanreboot systemd[1]: Stopping Redis Podman container...
Notice how the processes in the redis container got a SIGKILL first thing after the reboot was issued, and only later does systemd try to shut down the podman container.
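The ordering problem described above can be checked mechanically on a saved journal excerpt. The following is only a rough sketch: `check_kill_order` is a hypothetical helper (not part of podman or systemd), and it assumes the log is in chronological order, uses the message format shown in the excerpt above, and that the unit's Description is passed as the second argument.

```shell
#!/bin/sh
# check_kill_order LOGFILE DESCRIPTION
# Hypothetical helper: reports whether a libpod scope was SIGKILLed before
# systemd logged "Stopping DESCRIPTION" for the service unit.
# Assumption: LOGFILE is a chronological journal excerpt in the format above.
check_kill_order() {
    awk -v desc="$2" '
        # Remember the first line where a process in a libpod scope is SIGKILLed.
        /libpod-.*\.scope: Killing process .* with signal SIGKILL/ { if (!k) k = NR }
        # Remember the first line where systemd begins stopping the service.
        index($0, "Stopping " desc) { if (!s) s = NR }
        END {
            # Problem case: a kill was seen and no earlier service stop exists.
            if (k && (!s || k < s)) {
                print "scope killed before unit stop"
                exit 1
            }
            print "ok"
        }
    ' "$1"
}
```

One possible way to use it, assuming a persistent journal is available, is to save the previous boot's log with `journalctl -b -1 > shutdown.log` and then run `check_kill_order shutdown.log "Redis Podman container"`. A non-zero exit status indicates the ordering seen in this report.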
Describe the results you expected:
We expected the processes to receive a graceful SIGTERM, followed by the 'Stopping Redis Podman container' message.
Additional information you deem important (e.g. issue happens only occasionally):
Seems pretty reproducible so far.
Output of podman version:
Output of podman info --debug:
Additional environment details (AWS, VirtualBox, physical, etc.):
Issue first observed with podman from rhel8: podman-1.0.0-2.git921f98f.module+el8+2785+ff8a053f.x86_64
We also observed the problem with podman-1.2.0-1.git3bd528e.module+el8+3135+c5113def.x86_64