Old issue back again? PiGPIOd hanging count down on shutdown? #519
Comments
This is weird because |
Right, completely weird. I am doing more testing. But it seems so far that something is being lost, ignored, or what have you during the shutdown sequence. I even wondered if it was maybe a queue issue or something, where the daemon was just busy and ignoring things, but SIGTERM is more aggressive than SIGINT, no? The only thing that comes to mind is that maybe some change between buster and bullseye is part of the scenario. Once we finalized the unit file (on buster) I never saw the issue again. It was only after I started creating Pi OS 11 images and installing the daemon that I noticed shutdowns/reboots doing the hang/countdown thing. |
Still seeing the long shutdown time. Should I have the ExecStop task send an explicit SIGINT or SIGTERM as a test, or simply kill the PID? Or try something else? |
I've spent a good deal of time looking into this behavior and these are my observations: Running the (first) service unit without The signals sent by systemd are consistent with the documentation:
In the second case, the command In limited testing so far, if I configure sigHandler to handle SIGCONT, the timeout is avoided with either unit file. |
Wow, this is really cool. Which scenario would you consider the more graceful? That would be the one I would suggest. I am not that familiar with SIGCONT, so I had to look up the details on it: "...You can send a SIGCONT signal to a process to make it continue. This signal is special—it always makes the process continue if it is stopped, before the signal is delivered. The default behavior is to do nothing else." It is the comment "Note that, right after sending the signal specified in this setting, systemd will always send SIGCONT, to ensure that even suspended tasks can be terminated cleanly." that I struggle with. That feels like a potential race condition? SIGTERM is sent, and while that is being handled, SIGCONT is sent... and the result is that the process may not actually stop? Given your testing... I don't recall ever having to handle SIGCONT explicitly. I have several python3 scripts that handle SIGINT and SIGTERM, running as defined in unit files under systemd, and I have never had them hang thus far. I don't handle SIGCONT at all (as yet, maybe I should look at that as well). Really good work, truly appreciate it. :) |
I too am concerned about a potential race between two signals (SIGTERM, SIGCONT) calling I have more testing to do ... |
Yup. If you need anything that I can help with, let me know. I have almost every model of Pi (excluding compute modules, and I am still trying to find a Pi Zero 2). |
Thanks, I will tag your name when I need your input. In the meantime, I'm just using this thread to document some of my findings and to remind myself later what I need to do. |
Observation: Sometimes systemd only sends SIGCONT (without SIGTERM). In this situation, pigpio daemon will terminate in an unclean state - /dev/pigerr may be left open, Hypothesis: systemd is confused as to which process is the main PID.
The pigpiod parent process exits immediately after forking. On occasion, I have seen a journal entry complaining that systemd can't open/read the pid file. Need to delay exit of parent process until pid file has been created. |
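One way to close the window described above is for the parent to block on a pipe until the child reports that the PID file exists. The following is a minimal Python sketch of that idea, not pigpiod's actual code: `daemonize_with_pidfile`, its parameters, and the one-byte pipe handshake are all names and choices assumed here for illustration, and a real daemon would normally also call `setsid()` and often fork twice.

```python
import os
import tempfile


def daemonize_with_pidfile(pid_path, child_main):
    """Fork; the parent returns only after the child has written pid_path.

    Returns (child_pid, ready). `ready` is False if the child died before
    it could report that the PID file was written.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write the PID file first, then tell the parent it is safe to exit.
        os.close(read_fd)
        with open(pid_path, "w") as f:
            f.write(f"{os.getpid()}\n")
        os.write(write_fd, b"1")
        os.close(write_fd)
        child_main()          # hypothetical daemon body
        os._exit(0)
    # Parent: block until the child writes one byte, or until the pipe closes (EOF).
    os.close(write_fd)
    ready = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    return pid, ready


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.pid")
    child_pid, ok = daemonize_with_pidfile(path, lambda: None)
    print("child", child_pid, "pid file ready:", ok)
```

With `Type=forking`, the parent would exit right after `daemonize_with_pidfile` returns, so by the time systemd sees the parent exit the PID file is already readable.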
@Jibun-no-Kage , in my limited testing, I have observed positive results with the change described below. Can you try this on your system and report your findings? On version 79 in file pigpio.c, replace line 5630 with: |
Sure, will do, will get back to you in a few days. |
Downloaded the latest master zip. It appears the code change is already in place? Line 5630 is "_exit(-1);" |
v79? |
I am seeing a few issues that have been opened over the past few years about the mishandling of signals, e.g. this one and #449, and fundamentally IMVHO there are some shortcomings in the code that (referring to, e.g. signal(7)):
I had a go at trying to fix things in #58 in a "safe" manner - but that wasn't used (by anyone other than myself!) - apart from anything else it needed to move some of the unsafe things out of the signal handling into a separate function that needed to be inserted into the daemon's main event loop so it wasn't perfect either... |
Yeah, per the -v results, version 79. |
@SlySven, I agree that pigpio should be more discriminating in assigning handlers - currently all signals in range 0-63 are treated equally. (There is, of course, the option for no signal handling.) Using the signal(7) signal disposition categories for default actions, I have the following recommendations: Disposition / Action For "Term" disposition, pigpio must use Likewise, for "Core" we want to stop the hardware. What I'm unclear on is whether, without the default handler, the core dump will happen without taking some other explicit action. And does this matter? For "Stop", I think the correct course of action is to stop the hardware. For "Cont", I would otherwise have taken the position of ignoring it, but now that I have seen systemd send SIGCONT when stopping a service, I think the action should also be to terminate. Avoid any stream operations in the signal handler (i.e. DBG). I am also mulling over refactoring pigpiod so that it simply polls a sig_atomic flag and then runs Your comments are appreciated. |
Would looking at the code logic under RPi.GPIO cleanup() not be applicable? Just a thought. For Python, the best practice for closing/releasing GPIO state is calling RPi.GPIO cleanup(), either per pin or to return all pins to a default state. Calling cleanup is considered a safe end state. |
I'm not in a position to give a full reply right now, but in passing I would point out that for the "stop" case behaviour - I'm not sure you can do anything then - because I think you (the process) are stopped (maybe that happens when execution leaves the handler or maybe before it is entered - I'm not sure). I think you won't be able to do anything until you get SIGCONT to continue - and you do not want to terminate on THAT precisely because it is used to recover from those signals that stopped you. However, you might then want to handle the fact that you may have lost data in the intervening period... Also, bear in mind that if you do get some of the term/core type signals you may not even be able to shut down gracefully because, for instance, the memory assigned to your process has been corrupted, or you have invalid pointers being dereferenced (SIGSEGV springs to mind in that scope!) - indeed the purpose of dumping core is so that a post-mortem can be carried out on what was in the memory at the time - so trying to clean things up could be counterproductive. In my attempt I tried to handle the "core" cases by restoring the default - kernel built-in - behaviour to do that core dumping and then returning - so the signal would be immediately re-raised and then fire that default behaviour. |
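The "restore the default, then let the signal fire again" approach from the last sentence can be illustrated with a small Python sketch. It is not the code from #58: `reraise_with_default` and `child_wait_status` are hypothetical names, SIGQUIT is chosen only because its default disposition is "Core", and the explicit `os.kill` is needed here because the signal is delivered asynchronously (for a synchronous fault such as SIGSEGV, returning from the handler re-executes the faulting instruction, which raises the signal again). The demonstration runs in a forked child so that the caller survives, and it sets the core size limit to zero so no core file is left behind.

```python
import os
import resource
import signal


def reraise_with_default(signum, frame):
    """Restore the kernel's default action for signum, then deliver it again."""
    signal.signal(signum, signal.SIG_DFL)
    os.kill(os.getpid(), signum)


def child_wait_status(signum=signal.SIGQUIT):
    """Fork a child that handles signum as above; return the child's raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Demo only: suppress the core file the default action would write.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
        signal.signal(signum, reraise_with_default)
        signal.raise_signal(signum)
        os._exit(1)  # only reached if the default action did not terminate the child
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    status = child_wait_status()
    print("killed by signal:", os.WIFSIGNALED(status), os.WTERMSIG(status))
```

The parent can tell from the wait status that the child was terminated by the signal itself rather than by a normal exit, which is what a supervisor such as systemd would also observe.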
What are the odds or frequency of such corruption? It should be rare, no? |
@SlySven , please comment on this draft PR #532 to improve signal handling. Note, I'll be adding comments so you don't have to go through the source line by line. @Jibun-no-Kage , can you test the latest |
On it... I should be able to test this tomorrow. |
@Jibun-no-Kage can this issue now be closed? |
Oh, I did not realize it was still open. Yes. |
Old issue back again? PiGPIOd hanging count down on shutdown? Suspect the default unit file from the pigpiod master zip (download) is not correct?
Pi OS 11 Bullseye, updated and upgraded as of this post date https://www.raspberrypi.com/news/raspberry-pi-os-debian-bullseye/
Downloaded pigpiod-master.zip from website https://abyz.me.uk/rpi/pigpio/pigpiod.html
Noticed that the old issue of pigpiod.service stop is hanging the system shutdown/restart process.
Unit file from master zip, that does the shutdown/restart hang countdown...
`[Unit]
Description=Pigpio daemon
[Service]
Type=forking
PIDFile=pigpio.pid
ExecStart=/usr/local/bin/pigpiod
[Install]
WantedBy=multi-user.target
`
Unit file that does not exhibit the issue...
`[Unit]
Description=Pigpio daemon
[Service]
Type=forking
ExecStart=/usr/local/bin/pigpiod -l
ExecStop=/bin/systemctl kill pigpiod
[Install]
WantedBy=multi-user.target
`
Pretty much the key line is the ExecStop entry, of course.