Frequent crashes and spikes in RAM use, several critical errors in logs #3188
Comments
Thanks for the report - as a first step can you upgrade to 0.28.1 and confirm that the problem persists?
@neilisfragile thanks, will do.
@neilisfragile Since the upgrade to 0.28.1, things have indeed improved significantly. Still, I'm getting frequent connection problems (Riot complains "Connection to server has been lost"). However it's at least usable again now. When I grep for critical errors, I still get 3 kinds in the last 2 days:
There are also several ERRORs, e.g.
I forgot to say that Synapse periodically eats up all my RAM (4 GiB) and then apparently crashes and gets restarted every few minutes. (I've watched it for a while with
I'm guessing that your errors are the symptom rather than the cause. Synapse can be very memory hungry, but your box spec should be fine (depending on exactly what you are doing with it). Can you send me logs of the time leading up to a crash?
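To narrow down which log lines are worth sending, the grep for critical errors mentioned earlier in the thread can be extended into a tally per log level. This is a minimal sketch, not part of Synapse: it assumes each log line carries its level as a ` - LEVEL - ` field, and the function names `tally_levels` and `lines_with_level` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the level appears as a " - LEVEL - " field in each log line.
# Adjust the pattern if your log formatter is configured differently.
LEVEL_RE = re.compile(r" - (DEBUG|INFO|WARNING|ERROR|CRITICAL) - ")


def tally_levels(lines):
    """Count how many lines were logged at each level."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def lines_with_level(lines, level):
    """Return only the lines logged at the given level."""
    selected = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and match.group(1) == level:
            selected.append(line)
    return selected


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python tally.py homeserver.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        log_lines = fh.readlines()
    for level, count in tally_levels(log_lines).most_common():
        print(level, count)
```

Running it against the log file covering the minutes before a crash gives a quick overview of whether the CRITICAL and ERROR entries cluster around the restarts.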
@neilisfragile thanks for reading and your input. Now that I watched it more carefully, I'm certain that the "connection problems" in the issue's title are due to Synapse crashing or spiking into >90% of RAM use. The funny thing is that in the last crashes I've observed now (with My box doesn't do too much else. Without Synapse running, RAM use is at about 525 MB, and I set the One curious WARNING I just found is this, not sure what to make of it:
Maybe to provide more context, sorry for not mentioning it yet: Synapse is running against postgres, and the client port behind apache2.
Changed the title to more closely reflect what's going on.
Okay thanks, let me take a closer look, I'll come back to you. I agree the Cloudflare error is odd - I'm assuming it is coming from matrix.org (we use CF for DDoS protection), though there is no reason to think it relates to your instability. We have some performance improvements about to land very soon in response to resource constraints in running matrix.org - these will likely help the situation.
For the record, it got worse again: I can't connect to my Synapse anymore in any meaningful way; Riot is constantly showing that its connectivity with the server has been lost.
Hi, thanks for the update. My hunch is that the errors are symptoms of high load rather than a cause. It's not clear to me why you are seeing such extreme perf problems though - can you give a sense of the sort of rooms you are federating against? I'm interested in size and number rather than the specific room names. Our perf improvements have sat behind recent GDPR work, but are still very much in the works. Best I can suggest at this point is further reduction of SYNAPSE_CACHE_FACTOR.
Looking into SYNAPSE_CACHE_FACTOR a bit more closely, I'd actually suggest that you do the opposite of my advice and see what happens if you increase it. For well spec'd instances (such as yours) increasing the factor can actually reduce overall RAM load since Synapse spends less effort calculating results it could have retrieved from the cache. Suggest trying 1 and see what happens.
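For readers unfamiliar with the setting, the idea of an environment-driven cache factor can be sketched roughly as follows. This is an illustration only, not Synapse's actual code: the helper names `cache_factor` and `scaled_size`, the fallback value of 0.5, and the base size are all assumptions made for the example.

```python
import os


def cache_factor(environ=None, default=0.5):
    """Read a cache scaling factor from SYNAPSE_CACHE_FACTOR.

    The default of 0.5 is a placeholder for this sketch, not a statement
    about what any particular Synapse release uses.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("SYNAPSE_CACHE_FACTOR")
    if raw is None:
        return default
    value = float(raw)
    if value <= 0:
        raise ValueError("cache factor must be positive")
    return value


def scaled_size(base_size, factor):
    """Scale a cache's nominal entry count by the factor (at least 1 entry)."""
    return max(1, int(base_size * factor))


if __name__ == "__main__":
    factor = cache_factor()
    # 10000 is an arbitrary nominal cache size chosen for the example.
    print("factor:", factor, "-> entries:", scaled_size(10000, factor))
```

A larger factor means more entries are kept per cache, which costs memory per entry but can save the repeated work of recomputing results, which is the trade-off described above.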
@neilisfragile thanks for your help. I'm the only user on my server, so everything I do is in federated rooms. The breakdown of my joined rooms by user count is: about 20 1:1 chats, 3x with 10-100 users, 7x with 100-1k users, 3x with 2k-7k, and 1x with 15k. I'd be happy to leave the bigger ones to see if it helps; probably there's a server command to do so? I've set Due to the lack of other reports like mine, I suspect it's the unique state of my server. It was working fine for months, and I didn't add big rooms recently.
You are not the only one ;) We have a federated Synapse running; a room with ~45 users (in total, Matrix + IRC) is bridged to IRC, and some users also use the IRC bridge to connect to ~10 IRC channels. A month ago or so, Synapse got very memory hungry. It eats up to 5 GB RAM (on a 6 GB VM) and gets killed; after increasing the memory to 8 GB it gets killed after consuming 7 GB. Playing around with SYNAPSE_CACHE_FACTOR with values between 0.25 and 1 did not have any useful effect.
@bjo81 ok great to know that this isn't just my own problem. Based on your experiments with adding RAM, I will now refrain from attempting to add swap space to my btrfs partition, thanks for the info.
@JoKeyser We use Ubuntu Xenial as distro, and there are no real errors in the log:
Inspired by #1884 I changed the logging to only console, not journal or file (after adding Edit 1: And as a short-term workaround, I'm considering following the suggestion in #3038 to leave some bigger rooms and then use the synapse janitor script from https://github.com/xwiki-labs/synapse_scripts... Edit 2: Unfortunately, it's back to unusable after all... I haven't been able to log in for a few hours.
Interesting, I also used a config from #1884 now and the memory usage bounces between 1 and 5 GB, but no OOM yet.
Yesterday I used the synapse janitor script, which deleted 28 unused rooms (that I forgot about). I also upgraded to Synapse 0.30.1. Unfortunately still no improvement after 12+ hours. Only the OOMs now repeat faster again, about every 2-3 minutes... Edit: Things actually improved eventually after several days; I can log in about 10% of the time, receive some new messages, and send. OOMs still persist though.
We are now running our instance in a VM with 12 GB RAM, and Synapse keeps running stably with a usage of 5 GB.
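Memory figures like the ones reported in this thread can be sampled directly from the kernel on Linux. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line in kB; the function names are made up for the example, and the PID has to be supplied by whoever runs it.

```python
import sys
import time


def parse_vmrss_kib(status_text):
    """Extract the resident set size in KiB from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return None


def rss_mib(pid):
    """Return the resident memory of a process in MiB, or None if unavailable."""
    with open(f"/proc/{pid}/status", encoding="utf-8") as fh:
        kib = parse_vmrss_kib(fh.read())
    return None if kib is None else kib / 1024


if __name__ == "__main__":
    # Usage: python watch_rss.py <pid>; prints one sample every 10 seconds.
    pid = int(sys.argv[1])
    while True:
        print(time.strftime("%H:%M:%S"), rss_mib(pid), "MiB", flush=True)
        time.sleep(10)
```

Logging these samples next to the homeserver log makes it easier to see how quickly memory grows before the process is killed.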
For me, there was a relapse: constant OOMs and no ability to log in anymore in between. Who knows, maybe it would work if I had more RAM, but as is, it's unusable. I'm at my wits' end... is there a good way to back up the entire database and start fresh, with the same server keys?
@JoKeyser did this ever get resolved?
@richvdh thanks for asking. Unfortunately no, but I resorted to restoring from an old backup, and it has worked fine for a few weeks since. RAM use is much lower than it used to be, great work. I think this issue can be closed.
Description
For a few days now, I have noticed frequent disconnects from my homeserver; mainly on Riot Android, but also some on Riot desktop.
In Synapse's logs, I find several errors that may hint at the problem(s), see below. Sorry that it's a fairly long dump, but I can't judge what parts are useful. Let me know if I can check/add more.
There are a bunch of ones like this:
Steps to reproduce
Running Synapse... :/.
Version information
If not matrix.org: