Fix multiple qmemman issues #331
First, the main bug: when the meminfo xenstore watch fires, in some cases (just after a domain has started) XS_Watcher refreshes its internal list of domains before processing the event. This is done specifically to include the new domain. But the opposite can happen too: the domain may have been destroyed. In that case the refresh_meminfo() function raises an exception, which isn't handled and interrupts the whole xenstore watch loop. This issue is most likely to be triggered by killing a domain, because the domain can then disappear shortly after writing an updated meminfo entry. On a proper shutdown, meminfo-writer is stopped earlier and does not write updates just before the domain is destroyed.

Fix this by checking that the requested domain is still there right after refreshing the list.

Then, catch exceptions in the xenstore watch handling functions so they do not interrupt the xenstore watch loop. If the loop gets interrupted, qmemman basically stops memory balancing.

And finally, clear the force_refresh_domain_list flag after refreshing the domain list. That missing line caused a domain list refresh at every meminfo change, costing some extra CPU time.

While at it, change the "EOF" log message to something a bit more meaningful.

Thanks @conorsch for capturing valuable logs.

Fixes QubesOS/qubes-issues#4890
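For readers who want to see the three fixes side by side, here is a toy model of the pattern. It is only a sketch and not the actual qmemman code: `ToyWatcher`, `list_domains`, `dispatch`, `known`, and `refresh_count` are hypothetical names made up for illustration, and the flag and `refresh_meminfo()` merely borrow their names from the description above.

```python
import logging

log = logging.getLogger("qmemman-sketch")


class ToyWatcher:
    """Simplified stand-in for XS_Watcher (illustration only)."""

    def __init__(self, list_domains):
        # list_domains: hypothetical callable returning the live domain IDs
        self.list_domains = list_domains
        self.known = set()
        self.meminfo = {}
        self.force_refresh_domain_list = True
        self.refresh_count = 0

    def refresh_domain_list(self):
        self.known = set(self.list_domains())
        self.refresh_count += 1
        # Fix 3: clear the flag, otherwise every meminfo event refreshes again
        self.force_refresh_domain_list = False

    def refresh_meminfo(self, domid, value):
        # Mimics the failure mode: an unknown domain raises
        if domid not in self.known:
            raise KeyError(domid)
        self.meminfo[domid] = value

    def meminfo_changed(self, domid, value):
        if self.force_refresh_domain_list:
            self.refresh_domain_list()
        # Fix 1: the domain may have been destroyed in the meantime
        if domid not in self.known:
            log.debug("domain %s is gone, ignoring meminfo event", domid)
            return
        self.refresh_meminfo(domid, value)

    def dispatch(self, handler, *args):
        # Fix 2: a failing handler must not interrupt the watch loop
        try:
            handler(*args)
        except Exception:
            log.exception("watch handler failed, continuing")


live = {"1", "2"}
watcher = ToyWatcher(lambda: live)
watcher.dispatch(watcher.meminfo_changed, "1", "100")
live.discard("2")                          # domain 2 is killed
watcher.force_refresh_domain_list = True   # e.g. the domain list changed
watcher.dispatch(watcher.meminfo_changed, "2", "300")  # ignored, no crash
```

The existence check handles the expected race quietly, while the `try`/`except` in `dispatch` is a safety net for anything unexpected, so the loop keeps running either way.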
Codecov Report

Coverage diff, master vs. #331:

| | master | #331 | +/- |
|---|---|---|---|
| Coverage | 63.58% | 63.51% | -0.08% |
| Files | 50 | 50 | |
| Lines | 8978 | 8988 | +10 |
| Hits | 5709 | 5709 | |
| Misses | 3269 | 3279 | +10 |
Looks great! Have patched locally, will keep an eye on behavior and report back.
OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/7491#dependencies

Failed tests

New failures
Compared to: https://openqa.qubes-os.org/tests/6362#dependencies

Fixed failures
Compared to: https://openqa.qubes-os.org/tests/6362#dependencies
For what it's worth: I've been running this patch locally for nearly a week, and I've yet to observe a single failure of the So, 👍 on merge from me!