Issue Identified with Daemon and CentOS 7.4 #821
Comments
I got this issue with SkyNode as well on one of our nodes, and we are running Ubuntu 16.04 with the same kernel log. Our solution was a re-installation of the daemon, but recently another of our nodes became unresponsive again.
A re-installation is not required.
Yeah, this doesn't look like something that would be solved by reinstalling the daemon; it appears to be an issue with how docker is being utilized somewhere. @vipesz, can you provide docker logs from the times you have above, as well as any daemon logs if they appear to contain information? There is a similar docker issue, moby/moby#19758; maybe give that a read-through and see if it matches what you're seeing?
Kernel Log Analysis
Grepped through /var/log/messages* and found a docker container causing the issue, with the following log output:
Both of these timestamps are the same, associating the two issues. A few moments later, with the same docker container, as requested, this is the output docker had:
Afterwards, this container was no longer problematic for this specific node and has been stable for 3 days (this was after the cleanup process, which involved stopping the daemon, then stopping and rebuilding all server containers). Update: Additional Information
As indicated in the previous report:
Please use ``` around multiline source code/logs for better formatting ;)
Overlay fs on CentOS is still very much experimental; I recommend against it. I have seen fewer issues with XFS and no issues with ZFS. This appears to be a Docker issue, though.
I'll see if changing the storage driver to overlay2, along with a kernel upgrade, resolves this issue. I'll report back my findings after the holidays.
Updated to overlay2 with XFS d_type enabled and kernel version 4.14; no problems since. I suggest putting this into the documentation for future reference. Docker versions:
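For reference, the switch described above is typically made in /etc/docker/daemon.json (the standard dockerd configuration file); treat this as a sketch, since changing storage drivers makes existing images and containers inaccessible until they are rebuilt, so back up or recreate them first. d_type support on the backing XFS filesystem can be checked with `xfs_info /var/lib/docker | grep ftype` (ftype=1 is required for overlay2; filesystems formatted with older mkfs.xfs defaults may have ftype=0 and need reformatting).

```json
{
  "storage-driver": "overlay2"
}
```

After editing the file, restart the daemon (`systemctl restart docker`) and confirm with `docker info`, which reports the active storage driver and whether d_type is supported.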
Add Details Below:
This issue was identified after experiencing stability problems with the daemon following 90 days of uptime. Severe performance degradation took place, effectively rendering the daemon unusable without a reboot. However, upon further investigation, and after a report from another host, the issue has been identified as much more severe, and appears to be related to the way the daemon and docker inter-operate with each other. It's worth noting the following:
Kernel Logs from /var/log/messages
It is worth noting that, throughout this time, the server is not loaded anywhere near the thresholds at which this should appear. Also, the server becomes completely unresponsive.
Based on the logs, I have assumed that this is primarily an issue with docker and, specifically, the daemon (as seen here, [du:15885]; presumably the daemon uses du to determine the directory size of the container upon server initialization).
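The du behavior referenced above can be illustrated with a minimal sketch (the path is a stand-in, and the assumption that the daemon shells out to du is inferred only from the process name in the soft-lockup entry):

```shell
# du walks the directory tree and stats every file. On a healthy
# filesystem this returns quickly; on a wedged overlayfs mount the same
# walk blocks inside the kernel, which is consistent with a du process
# appearing in soft-lockup messages like [du:15885].
# A scratch directory is used here purely as a safe stand-in path.
mkdir -p /tmp/du-demo
echo "sample data" > /tmp/du-demo/file
du -s /tmp/du-demo
```

The same command pointed at a container's overlay layer directory is what would hang when the filesystem is in the broken state.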
It is also worth noting that Java is not installed on the host machine, ruling out the possibility of this being an issue with the host machine itself.
Upon further investigation and collaboration, we also discovered the following in the kernel logs from the time the server becomes unresponsive; it is placed in a hastebin for easier readability.
https://hastebin.com/lituniledo.sql
Within this, one thing that appears to pop out most is this:
```
kernel: cache_from_obj: Wrong slab cache. kmalloc-256 but object is from kmem_cache(935:9cd527dd5e1a39ddc876e23563ac23e13244e42530eccbd9e3df1843d6433225)
```
We decided to investigate further, as the issues appear right after docker initializes a container.
Additional Information
This is an active docker container.
I will be doing my own investigation into this to see if I can come up with any more information, and I will post it here. However, this is an issue that I feel should be given some attention, because it has caused stability problems.