Document how to back up a synapse server #2046
definitely interested in this; we're currently doing the things you mention (well, a little more 'verbose', in that I'm pulling /etc/synapse/*)
Absolutely agree. Also interesting is Discourse's self-backup function, which simply asks you for your S3 credentials and does everything by itself. That would be perfect. Thanks for the great work.
@richvdh any updates?
PRs welcome...
I'm about to do this; moving a Synapse installation from a FreeBSD jail to a Linux container (same CPU architecture, so the data should be compatible). The app configuration and logging configuration are already managed by Ansible, so that part is easy, as are the NGINX proxy in front of it and the TLS configuration. That leaves the database, media repository, and any keys for the server itself. Has anyone done this?
this is a very interesting question...
Hey guys, I just moved my synapse instance and everything seems to work including message history and images uploaded in the past. I transferred:
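For anyone attempting the same migration, the pieces usually involved can be sketched as below. Every path here is an assumption for a Debian-style install (adjust to your own layout), and the actual dump/archive commands are left commented out as a sketch:

```shell
# All paths are assumptions for a Debian-style install; adjust to your setup.
CONFIG_DIR="/etc/matrix-synapse"                   # homeserver.yaml, log config
SIGNING_KEY="$CONFIG_DIR/homeserver.signing.key"   # the server's federation identity
MEDIA_STORE="/var/lib/matrix-synapse/media"        # uploaded and cached media
DB_DUMP="/var/backups/synapse.sql"

# Dump the database and archive everything in one tarball (sketch, not run here):
#   pg_dump -U synapse_user synapse > "$DB_DUMP"
#   tar czf synapse-migration.tar.gz "$CONFIG_DIR" "$MEDIA_STORE" "$DB_DUMP"
echo "would archive: $CONFIG_DIR $SIGNING_KEY $MEDIA_STORE $DB_DUMP"
```

Arguably the most important single item is the signing key, since the server's identity on the federation depends on it.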
Hi, is there any progress with this? I'm setting up Synapse + Postgres with docker-compose, and I'm not sure how to create self-consistent, live, automated backups. In my understanding, to obtain consistent backups the Synapse server should be put in read-only mode or stopped while the backup is taken, so that no file changes while the backup is ongoing. Is this correct? If yes, how do I do that for a docker-compose-based setup? I cannot run backup scripts on the host machine and must do everything from within the container.
That's not correct, actually: you don't need to turn off Synapse to make a consistent backup if you use pg_dump.
Basically it runs the entire backup in a single transaction, so Postgres gives it a consistent view the entire time. If you're operating at a large scale, then making SQL dumps of your database is probably inefficient and too slow to restore, so you would probably be considering replication for your Postgres server (including having a hot standby). I can't really advise there myself as I'm not a database expert :-).
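A minimal sketch of that approach, assuming the database is called synapse and is owned by synapse_user (both assumptions; check your homeserver.yaml). The dump command itself is left commented out:

```shell
# Sketch: consistent online dump of the Synapse Postgres database.
# "synapse" and "synapse_user" are assumptions; match your homeserver.yaml.
DB_NAME="synapse"
OUT="synapse-$(date +%F).sql.gz"

# pg_dump reads from a single consistent snapshot for the whole run,
# so Synapse can keep serving traffic while the dump is in progress:
#   pg_dump -U synapse_user -h localhost "$DB_NAME" | gzip > "$OUT"
echo "dump target: $OUT"
```

For the docker-compose case above, the same command can be run in the database container, e.g. via docker-compose exec on the Postgres service.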
Curious; why not? At some level you're going to need to be able to
@reivilibre thanks for the quick and very detailed answer. Let me add some points and clarify some others:
My two cents: if you want to avoid most inconsistencies while keeping the server running, you probably need to configure replication and/or snapshots for both the volume holding the media and the Postgres database. I.e. take a filesystem snapshot (supported by e.g. btrfs), back that up, then keep or throw away the snapshot; set up replication for Postgres and take a backup of the replica at the exact same time (with replication stopped, obviously, or just stop the replica and take a filesystem snapshot). Different cloud platforms have different ways to aid with the process (e.g. Amazon Fargate, RDS). One could also think about using something like https://github.com/matrix-org/synapse-s3-storage-provider with S3 or something compatible, e.g. a min.io cluster, to achieve maximum data availability and integrity. There's a plethora of ways to solve the problem to different degrees, all of which are out of scope for Synapse itself. Even if it all works, there's still a probability that some buffered/cached data hasn't been written or replicated yet. The question when it comes to backups is "what is good enough". Ideally you avoid ever needing a backup to begin with, which would require HA capabilities, which Synapse doesn't have (yet).
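The filesystem-snapshot half of that idea can be sketched as follows on btrfs. Paths are assumptions, and the snapshot commands are shown as comments since they need root and an actual btrfs subvolume:

```shell
MEDIA_VOL="/var/lib/matrix-synapse"   # assumed media volume (a btrfs subvolume)
SNAP="$MEDIA_VOL/.backup-snap"

# Take an instant read-only snapshot, copy it off at leisure, then drop it:
#   btrfs subvolume snapshot -r "$MEDIA_VOL" "$SNAP"
#   rsync -a "$SNAP/" backup-host:/backups/media/
#   btrfs subvolume delete "$SNAP"
echo "snapshot path: $SNAP"
```

The point of the snapshot is that the copy sees a frozen, point-in-time view of the media store even while Synapse keeps writing to the live volume.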
This is true, but it's not a big deal — the only cost there is the wasted disk space if you restore from this backup and don't clean it out. You'll probably find it much easier to keep it simple. Even if you lose some media that are tracked in the database, it's not going to be the end of the world — you might get errors downloading that piece of media, but other than that, nothing too bad will happen. @Iruwen makes some good points, but I'd argue these are probably a lot more fiddly and complicated than many 'home users' care about — e.g. a loss of a few hours' worth of data isn't likely a big problem to me personally, so frequent
@reivilibre thanks for the insights. I also understand and appreciate @Iruwen's point of view, but I'd definitely keep it simple unless that risks badly breaking everything. Possibly losing some media is not an issue for me, so I'd go with the plain pgsql dump (media are on MinIO, so I don't explicitly back them up).
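For completeness, restoring such a plain-SQL dump might look roughly like the following; stop Synapse first, and note that Synapse requires a UTF8 database with the C locale. The names and the dump filename are assumptions, and the commands are commented out as a sketch:

```shell
DB_NAME="synapse"              # assumed database name
DUMP="synapse-backup.sql.gz"   # assumed dump filename

# Recreate an empty database with the settings Synapse expects, then load the dump:
#   sudo -u postgres createdb --encoding=UTF8 --locale=C --template=template0 \
#       --owner=synapse_user "$DB_NAME"
#   gunzip -c "$DUMP" | sudo -u postgres psql "$DB_NAME"
echo "would restore $DUMP into $DB_NAME"
```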
One should maybe note that the system doesn't fall apart when there are inconsistencies between media and its references stored in the database; the media will just be missing. Otherwise things like event/media retention policies would be a much bigger issue. PS: replication is not a backup method - if you face any kind of data corruption, you'll end up with a distributed mess.
I'm currently backing up The disk requirements are growing faster than I'd anticipated, so I was looking for documentation to tell me:
(I appreciate that remote resources can be withdrawn at any time, but I'm more interested in ensuring resources used for backups are used to be able to reestablish the local service.) Does the database similarly contain remote server content, and if so, is there a way to take a selective dump of local content in such a way that remote content would be repopulated on demand?
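One knob that helps here is Synapse's admin API for purging cached remote media, which keeps the media store (and hence backups) from accumulating federation content indefinitely; purged media are re-fetched from the origin server on demand. A sketch, where the homeserver URL and admin token are placeholders and the curl call is commented out:

```shell
HOMESERVER="https://localhost:8008"      # placeholder homeserver URL
ADMIN_TOKEN="REPLACE_WITH_ADMIN_TOKEN"   # access token of a server admin

# Purge remote media last accessed more than 30 days ago (timestamp in ms):
BEFORE_TS=$(( ( $(date +%s) - 30*24*3600 ) * 1000 ))
#   curl -X POST -H "Authorization: Bearer $ADMIN_TOKEN" \
#       "$HOMESERVER/_synapse/admin/v1/purge_media_cache?before_ts=$BEFORE_TS"
echo "before_ts=$BEFORE_TS"
```

Running this shortly before taking a backup shrinks the remote-media portion of the media store without touching locally uploaded media.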
I am still new to the topic. I simply started to back up the listed and relevant files, including the full Postgres DB, using pg_dumpall. The whole process, including the locations to back up, will differ depending on how Synapse is installed (from a repository, Docker, or in a virtualenv). I am not sure if and when I will need to restore the backups; I am afraid that will be quite some manual work. I found these pages with some useful details: https://www.gibiris.org/eo-blog/posts/2022/01/21_containterise-synapse-postgres.html and https://ems-docs.element.io/books/element-cloud-documentation/page/import-database-and-media-dump
@gwire have you found an answer yet? Because remote_content is crazy large on my server, and it doesn't make sense at all to back it up.
We should give users some guidance on what they need to do to effectively back up and restore a Synapse server.
Off the top of my head: