Long-running migrations can cause systemd to terminate daemon on start #7269
Can we notify systemd the same way while we open the database on startup (which is basically the reason for the slow startup documented here), and also on shutdown, when we, for example, run database compaction jobs, flush data to disk, or clean up sockets?
However, we don't do database compactions on stop (explicitly disabled for this reason), so we shouldn't delay there. If IPFS gets stuck on shutdown, we should just die and clean up when we next restart.
I've seen ipfs hit the 1 minute 30 second limit dozens of times on different machines, without Badger DB. My guess is that it's a TCP cleanup issue. I still think it's nicer to clean everything up properly and, as long as we're making progress, keep notifying systemd. Maybe add a hard limit after which we stop notifying on shutdown, to make sure we don't hang indefinitely.
Odd. We shouldn't be spending any time cleaning up TCP connections or anything like that, unless we have a bug somewhere. Can you reproduce this?
Well, I converted all my nodes to badgerds with the 0.5.0 release. I'd need to convert one back to see whether it can be reproduced.
I can report that this bug seems to be gone. I'm not sure when it was fixed, but I haven't noticed it at all on 0.8. Apart from this, wouldn't it make sense for systemd to send go-ipfs a SIGABRT instead of a SIGKILL by default when this happens? That would give the user a stack trace to share with us.
I agree. Want to file a PR?
OK, I'm actually just going to disable the startup timeout. It's not helping.
Version information:
Description:
Now that go-ipfs supports systemd's "notification" system, we need to tell systemd to repeatedly extend the startup timeout while performing repo migrations. Otherwise, systemd may kill the daemon, thinking it timed out on startup.
We can do this by repeatedly sending
EXTEND_TIMEOUT_USEC=...
to systemd's notification socket using github.com/coreos/go-systemd/v22/daemon. See cmd/ipfs/daemon_linux.go for how we interact with systemd's notification service.