As of this morning, I'm unable to start my synapse homeserver.
Until today I was running 0.18.2, and this morning I woke up unable to send messages. The logs showed:
Jan 23 00:05:44 hypervisor01 docker[64033]: Logged from file matrixfederationclient.py, line 181
Jan 23 00:05:45 hypervisor01 docker[64033]: Traceback (most recent call last):
Jan 23 00:05:45 hypervisor01 docker[64033]: File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
Jan 23 00:05:45 hypervisor01 docker[64033]: msg = self.format(record)
Jan 23 00:05:45 hypervisor01 docker[64033]: File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
Jan 23 00:05:45 hypervisor01 docker[64033]: return fmt.format(record)
Jan 23 00:05:45 hypervisor01 docker[64033]: File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
Jan 23 00:05:45 hypervisor01 docker[64033]: s = self._fmt % record.__dict__
Jan 23 00:05:45 hypervisor01 docker[64033]: KeyError: 'request'
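For what it's worth, here's a minimal sketch of what I assume is going on (an illustration, not pulled from synapse's actual log config): if the configured format string references a `%(request)s` field and a record is emitted without it, Python 2.7's `Formatter` fails at `s = self._fmt % record.__dict__`, and the handler prints the "Logged from file ..." traceback above.

```python
# Hedged sketch (my assumption, not synapse's actual configuration): a log
# format referencing %(request)s blows up whenever a record lacks that extra
# field; the handler catches the error and prints the traceback to stderr.
import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s - %(request)s - %(name)s - %(levelname)s - %(message)s"
))
logger = logging.getLogger("demo")
logger.addHandler(handler)

# Fine when the extra field is supplied...
logger.warning("inside a request", extra={"request": "GET-12345"})
# ...but this reproduces the KeyError: 'request' traceback on stderr:
logger.warning("outside any request context")
```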
These errors repeated in a tight loop over and over. After a restart they continued, so I updated
to 0.19.0. After that there was no exception, but my homeserver began chewing through CPU and
memory pretty aggressively, without logging anything to file or stdout. I `strace`d the process,
which showed a tight loop of the same syscalls over and over:
I would guess that implies it's stuck in the same tight loop as above, but now it's dying before even
logging the exception. I've dug a bit more and discovered some OOM kills:
[Mon Feb 6 09:56:44 2017] postgres invoked oom-killer: gfp_mask=0x2000d0, order=2, oom_score_adj=0
[Mon Feb 6 09:56:44 2017] postgres cpuset=13cdc1a32895a63fbd8a94bf859c4a3cb8d4f578af0f9063372eeec61a35301e mems_allowed=0
...
[Mon Feb 6 09:56:44 2017] Out of memory: Kill process 29061 (python) score 844 or sacrifice child
[Mon Feb 6 09:56:44 2017] Killed process 29061 (python) total-vm:13481740kB, anon-rss:8917388kB, file-rss:0kB
[Mon Feb 6 10:27:22 2017] Out of memory: Kill process 44763 (python) score 845 or sacrifice child
[Mon Feb 6 10:27:22 2017] Killed process 44763 (python) total-vm:12339696kB, anon-rss:9074316kB, file-rss:0kB
My dmesg timestamps are inexplicably off by about 1:30 (per `[Mon Feb 6 11:58:55 2017] Mon Feb 6 10:23:53 PST 2017`
from `echo $(date) > /dev/kmsg`), but basically the timeline works out such that my synapse went off
the rails after my update, not at ~10:00 UTC. That made me look back at my logs to see when that first
exception started to crop up, and it turns out it has been happening for a long time, so maybe this is
all a red herring.
This is very reminiscent of the issues I had in 0.16.x involving the events table, but it is
acting like this right out of the gate – there is no /sync call to be done; it doesn't even bind to a
port.
Interestingly, it sits in disk-sleep for a while, but I was eventually able to kill it. But, yeah,
basically I don't know what my homeserver is doing, and I'm unsure where to start with it. I've attached
a pyflame trace (pyflame.zip) which implies a loop somewhere; it's missing the startup since I can't
sanely make pyflame trace synapse as a child process when synapse is running inside Docker. I'm going
to turn on postgres logging tonight to see if I can figure out where in the startup it's going south.
It's not even getting to the point where it's running any database queries; I'm not sure at all what's going on. I'm going to try to strace it from startup and see if I can make sense of it.
rrix changed the title from "Synapse not starting, runaway memory/cpu usage" to "Synapse not starting, runaway memory/cpu usage when using incorrect logging configuration" on Feb 8, 2017.
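Given where the title ended up, the root cause appears to be a logging configuration whose format string references `%(request)s` without anything populating it. Purely as a hedged sketch (not synapse's actual fix), one way to make such a config tolerant is a filter that backfills the missing field before the formatter runs:

```python
import logging

class DefaultRequestFilter(logging.Filter):
    """Hypothetical filter: give records logged outside a request context a
    placeholder 'request' attribute so %(request)s never raises KeyError."""
    def filter(self, record):
        if not hasattr(record, "request"):
            record.request = "-"
        return True

handler = logging.StreamHandler()
handler.addFilter(DefaultRequestFilter())
handler.setFormatter(logging.Formatter(
    "%(asctime)s - %(request)s - %(name)s - %(levelname)s - %(message)s"
))
logging.getLogger("demo").addHandler(handler)
logging.getLogger("demo").warning("no request context, but formats fine")
```

Since the filter is attached to the handler, it runs in `Handler.handle()` before `emit()`, so every record that reaches the formatter already has a `request` attribute.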