This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Document how to back up a synapse server #2046

Open
richvdh opened this issue Mar 22, 2017 · 20 comments
Labels: A-Docs (things relating to the documentation), T-Task (refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks), Z-Help-Wanted (we know exactly how to fix this issue, and would be grateful for any contribution)

@richvdh
Member

richvdh commented Mar 22, 2017

We should give users some guidance on what they need to do to effectively back up and restore a Synapse server.

Off the top of my head:

  • database
  • media repo
  • homeserver.yaml
  • log config, where present
  • signing key (maybe, but it's fine just to use a new one)
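
A minimal sketch of what backing those items up might look like, assuming a Postgres database and using hypothetical paths (locations vary between installations, and pg_dump will also need credentials, e.g. via ~/.pgpass or PGPASSWORD):

```python
#!/usr/bin/env python3
"""Rough sketch of a full Synapse backup: database dump plus config,
signing key and media store. Paths are hypothetical and depend on how
Synapse was installed."""
import subprocess
import tarfile
from datetime import datetime
from pathlib import Path

# Hypothetical locations; adjust to your installation.
CONFIG_ITEMS = [
    "/etc/matrix-synapse/homeserver.yaml",
    "/etc/matrix-synapse/log.yaml",
    "/etc/matrix-synapse/example.com.signing.key",
]
MEDIA_STORE = "/var/lib/matrix-synapse/media"
BACKUP_DIR = Path("/var/backups/synapse")

def main() -> None:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")

    # 1. Dump the database. pg_dump produces a consistent snapshot even
    #    while Synapse keeps writing, so Synapse can stay up.
    with (BACKUP_DIR / f"synapse-{stamp}.sql").open("wb") as out:
        subprocess.run(["pg_dump", "-U", "synapse_user", "synapse"],
                       stdout=out, check=True)

    # 2. Archive config, signing key and the media store.
    with tarfile.open(BACKUP_DIR / f"synapse-files-{stamp}.tar.gz", "w:gz") as tar:
        for item in CONFIG_ITEMS + [MEDIA_STORE]:
            tar.add(item)

if __name__ == "__main__":
    main()
```
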
@seanenck
Contributor

Definitely interested in this; we're currently doing the things you mention (well, a little more 'verbose' in that I'm pulling all of /etc/synapse/*).

@nordurljosahvida

Absolutely agree. Also interesting is Discourse's self-backup function, which simply asks for your S3 credentials and does everything by itself. That would be perfect. Thanks for the great work.

@neilisfragile added the A-Docs label on Mar 20, 2018
@ghost

ghost commented Apr 4, 2020

@richvdh any updates?

@richvdh
Member Author

richvdh commented Apr 6, 2020

PRs welcome...

@kpfleming

I'm about to do this: moving a Synapse installation from a FreeBSD jail to a Linux container (same CPU architecture, so the data should be compatible). The app configuration and logging configuration are already managed by Ansible, so that part is easy, as are the NGINX proxy in front of it and the TLS configuration.

That leaves the database, media repository, and any keys for the server itself. Has anyone done this?

@DamianoP

DamianoP commented Jul 8, 2020

This is a very interesting question...

@krystiancha

Has anyone done this?

Hey guys, I just moved my synapse instance and everything seems to work including message history and images uploaded in the past.

I transferred:

  • config: /etc/synapse/homeserver.yaml and /etc/synapse/log_config.yaml
  • database: /var/lib/postgresql
  • signing key: /etc/synapse/foo.bar.signing.key
  • media repo: /var/lib/synapse
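
For the record, transferring /var/lib/postgresql as plain files like this only gives a consistent copy if Postgres is stopped while the files are read. A rough sketch of such a cold copy, using the paths from the list above; the systemd unit names are assumptions:

```python
#!/usr/bin/env python3
"""Sketch of a cold copy of the paths listed above. Copying the raw Postgres
data directory (/var/lib/postgresql) is only consistent if Postgres is stopped
first; the systemd unit names here are assumptions. Run as root."""
import subprocess
import tarfile

PATHS = [
    "/etc/synapse/homeserver.yaml",
    "/etc/synapse/log_config.yaml",
    "/etc/synapse/foo.bar.signing.key",
    "/var/lib/postgresql",
    "/var/lib/synapse",
]

# Stop Synapse and Postgres so nothing changes underneath the copy.
subprocess.run(["systemctl", "stop", "synapse", "postgresql"], check=True)
try:
    with tarfile.open("/var/backups/synapse-cold-copy.tar.gz", "w:gz") as tar:
        for path in PATHS:
            tar.add(path)
finally:
    # Bring everything back up even if the archive step failed.
    subprocess.run(["systemctl", "start", "postgresql", "synapse"], check=True)
```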

@reivilibre self-assigned this on Aug 2, 2021
@reivilibre added the T-Task label on Aug 3, 2021
@nicolamori

Hi, is there any progress on this? I'm setting up Synapse + Postgres with docker-compose, and I'm not sure how to create self-consistent, live, automated backups. My understanding is that, to obtain consistent backups, the Synapse server should be put into read-only mode or stopped while the backup is taken, so that no files change while it is in progress. Is this correct? If so, how can that be done in a docker-compose-based setup? I cannot run backup scripts on the host machine and must do everything from within the container.
Sorry if this is a dumb question, but I'm a newcomer and I can't find any clear guidance or examples for this.

@reivilibre
Contributor

That's not correct, actually: you don't need to turn off Synapse to make a consistent backup.

If you use pg_dump, you'll note that its manual (https://www.postgresql.org/docs/12/app-pgdump.html) says:

pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users accessing the database (readers or writers).

Basically it runs the entire backup in a single transaction, so Postgres gives it a consistent view the entire time.
(However I do recommend restoring the backups offline and into a fresh, empty database with the correct locale settings. Be very careful not to restore into a database that already has tables present as this has led to issues in the past.)
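
To make the "fresh, empty database" advice concrete, a restore might look roughly like the sketch below. The database name, user and dump path are placeholders; the UTF8 encoding and C locale are, as far as I know, what Synapse's Postgres setup documentation asks for:

```python
#!/usr/bin/env python3
"""Sketch of restoring a plain-format pg_dump into a brand-new database.
Names and paths are placeholders; run this offline, and never against a
database that already contains tables."""
import subprocess

DB_NAME = "synapse"
DB_USER = "synapse_user"
DUMP_FILE = "/var/backups/synapse/synapse-dump.sql"

# Create an empty database with the settings Synapse expects:
# UTF8 encoding and the C locale, created from template0.
subprocess.run(
    ["createdb", "--encoding=UTF8", "--locale=C", "--template=template0",
     f"--owner={DB_USER}", DB_NAME],
    check=True,
)

# Replay the SQL dump into the fresh database.
with open(DUMP_FILE, "rb") as dump:
    subprocess.run(["psql", "-U", DB_USER, DB_NAME], stdin=dump, check=True)
```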

If you're operating at a large scale, then making SQL dumps of your database is probably inefficient and too slow to restore, so you would probably be considering replication for your Postgres server (including having a hot standby). I can't really advise there myself as I'm not a database expert :-).

I cannot run backup scripts on the host machine and must do everything from within the container.

Curious; why not?

At some level you're going to need to be able to pg_dump your database and make some copies of your media store (and then probably put those backups somewhere so that you're not going to get messed up by a disk failure).
I don't run databases in Docker so I'm not really sure, but I imagine the Docker way here is to have a container whose job is to run pg_dump and save the output somewhere.
Maybe someone can chime in with how they do this in their docker-compose setup? Or perhaps you can find some example online; backing up a Postgres database is not Synapse-specific.
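
As one possible shape for that "container whose job is to run pg_dump": a small script run in a sidecar container that dumps the database over the compose network and writes dated dumps to a mounted volume. The service name, credentials and paths below are assumptions, not anything Synapse-specific:

```python
#!/usr/bin/env python3
"""Sketch of a backup sidecar: connects to a 'postgres' service defined in
docker-compose and writes dated dumps to a bind-mounted volume. The service
name, credentials and paths are assumptions."""
import os
import subprocess
from datetime import datetime
from pathlib import Path

BACKUP_DIR = Path("/backups")          # bind-mounted volume (assumption)
BACKUP_DIR.mkdir(parents=True, exist_ok=True)

# Pass the database password to pg_dump via the standard PGPASSWORD variable.
env = dict(os.environ, PGPASSWORD=os.environ["POSTGRES_PASSWORD"])
dump_path = BACKUP_DIR / f"synapse-{datetime.now():%Y%m%d-%H%M%S}.sql"

with dump_path.open("wb") as out:
    subprocess.run(
        ["pg_dump", "-h", "postgres", "-U", "synapse_user", "synapse"],
        stdout=out, env=env, check=True,
    )
```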

@nicolamori

@reivilibre thanks for the quick and very detailed answer. Let me add some points and clarify some others:

  • I know about pg_dump, but what about e.g. the media files? Suppose that for a full backup I first run pg_dump and then make a tarball of the media folder, and a user uploads an image from their client in the middle. The backup will then contain the media file but not the database entry (I assume that uploaded images are registered in the DB). Would restoring from this backup lead to an inconsistent Synapse state?
  • Thanks for the tip about the possible DB dump inefficiency; I'll start simple and revisit the procedure should it become too slow at my scale.
  • Strictly speaking it's not that I cannot run scripts on the host VM; it's just that I prefer a fully-dockerized deployment so I can quickly migrate it to another "lightly configured" VM (i.e. one with just Docker installed and no Matrix-specific configuration).
  • Currently I run a modified postgres image with a cron job executing regular pg_dump runs and uploading the dumps to S3 via rclone (sketched below). All of this happens inside the postgres container, so there's no need to touch the host.
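
For illustration, the kind of script such a cron job might run (the rclone remote name and bucket are made up and would need to exist in rclone.conf):

```python
#!/usr/bin/env python3
"""Sketch of a dump-and-upload script for a cron job inside the postgres
container: dump the database, push the dump to S3 with rclone, then clean up.
The rclone remote name ('s3remote') and bucket path are assumptions."""
import subprocess
from datetime import datetime
from pathlib import Path

dump_path = Path(f"/tmp/synapse-{datetime.now():%Y%m%d-%H%M%S}.sql")

# Dump the database to a local file.
with dump_path.open("wb") as out:
    subprocess.run(["pg_dump", "-U", "synapse_user", "synapse"],
                   stdout=out, check=True)

# Upload the dump, then remove the local copy.
subprocess.run(["rclone", "copy", str(dump_path), "s3remote:my-synapse-backups"],
               check=True)
dump_path.unlink()
```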

@Iruwen

Iruwen commented Jun 15, 2022

My two cents: if you want to avoid most inconsistencies while keeping the server running, you probably need to configure replication and/or snapshots for both the volume holding the media and the Postgres database. I.e. take a filesystem snapshot (supported by e.g. btrfs), back that up, then keep or throw away the snapshot; set up replication for Postgres and take a backup of the replica at the exact same time (with replication stopped, obviously, or just stop the replica and take a filesystem snapshot). Different cloud platforms have different ways to aid with the process (e.g. Amazon Fargate, RDS).

One could also think about using something like https://github.com/matrix-org/synapse-s3-storage-provider with S3 or something compatible, e.g. a min.io cluster, to achieve maximum data availability and integrity. There's a plethora of ways to solve the problem to different degrees, all of which are out of scope for Synapse itself. Even if it all works, there's still a probability that some buffered/cached data hasn't been written or replicated yet. The question when it comes to backups is "what is good enough". Ideally you avoid ever needing a backup to begin with, which would require HA capabilities, which Synapse doesn't have (yet).
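
If you go the snapshot route described above, the general shape (using btrfs as an example, with made-up paths) is: take a read-only snapshot, archive it, then delete it. A sketch:

```python
#!/usr/bin/env python3
"""Sketch of the snapshot-then-archive approach for the media volume,
assuming a btrfs subvolume at a made-up path. Run as root."""
import subprocess
import tarfile
from datetime import datetime

MEDIA_SUBVOL = "/var/lib/synapse/media"        # btrfs subvolume (assumption)
SNAPSHOT = "/var/lib/synapse/.media-snapshot"
ARCHIVE = f"/var/backups/media-{datetime.now():%Y%m%d}.tar.gz"

# Take a read-only snapshot so the backup sees a frozen view of the media.
subprocess.run(["btrfs", "subvolume", "snapshot", "-r", MEDIA_SUBVOL, SNAPSHOT],
               check=True)
try:
    with tarfile.open(ARCHIVE, "w:gz") as tar:
        tar.add(SNAPSHOT, arcname="media")
finally:
    # Throw the snapshot away once the archive exists (or the attempt failed).
    subprocess.run(["btrfs", "subvolume", "delete", SNAPSHOT], check=True)
```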

@reivilibre
Contributor

@nicolamori

The backup will then contain the media file but not the database entry (I assume that uploaded images are registered in the DB). Would restoring from this backup lead to an inconsistent Synapse state?

This is true, but it's not a big deal — the only cost there is the wasted disk space if you restore from this backup and don't clean it out.
If you back up your database first and then 'rsync' your media directory somewhere, your database will be consistent and Synapse may never even notice.
If you do it the other way around, you might lose some media files that still have DB entries, but that may not be a big deal for your use case.

You'll probably find it much easier to keep it simple. Even if you lose some media that are tracked in the database, it's not going to be the end of the world — you might get errors downloading that piece of media but other than that, nothing too bad will happen.

@Iruwen makes some good points but I'd argue these are probably a lot more fiddly and complicated than many 'home users' care about — e.g. a loss of a few hours' worth of data isn't likely a big problem to me personally, so frequent pg_dumps are fine for me and I haven't bothered with database replication or storing media on a redundant cluster like minio.
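
To make the ordering concrete, a sketch of "database first, media second" with placeholder paths and destination; the database side stays consistent, at the cost of possibly capturing a few media files the dump never references:

```python
#!/usr/bin/env python3
"""Sketch of the 'database first, then media' ordering described above.
All paths and the destination host are placeholders."""
import subprocess

# 1. Dump the database first, so every media row in the dump refers to a file
#    that already existed when the dump started.
with open("/var/backups/synapse.sql", "wb") as out:
    subprocess.run(["pg_dump", "-U", "synapse_user", "synapse"],
                   stdout=out, check=True)

# 2. Then copy the media store; anything uploaded in the meantime just wastes
#    a little space in the backup and is never referenced by the dump.
subprocess.run(
    ["rsync", "-a", "/var/lib/synapse/media/", "backup-host:/srv/backups/media/"],
    check=True,
)
```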

@nicolamori

@reivilibre thanks for the insights. I also understand and appreciate @Iruwen's point of view, but I'd definitely keep it simple unless that risks badly breaking things. Losing some media is not an issue for me, so I'll go with the plain pgsql dump (media are on Minio, so I don't explicitly back them up).

@Iruwen

Iruwen commented Jun 16, 2022

One should maybe note that the system doesn't fall apart when there are inconsistencies between media and its references stored in the database; the media will just be missing. Otherwise things like event/media retention policies would be a much bigger issue.

PS: replication is not a backup method - if you face any kind of data corruption, you'll end up with a distributed mess.

@richvdh added the Z-Help-Wanted label on Jul 29, 2022
@gwire

gwire commented Aug 2, 2022

I'm currently backing up /etc/matrix-synapse/, a dump of the database, and the media directories.

The disk requirements are growing faster than I'd anticipated, so I was looking for documentation to tell me:

  • is it safe to skip backing up url_cache_thumbnails and url_cache? (the 'cache' in the names suggests so) Will these be repopulated when requested by clients?

  • is it safe to skip backing up *_thumbnails? If absent, will the server recalculate these files on demand?

  • is it safe to skip backing up remote_content? If absent, will the server repopulate these files on demand?

(I appreciate that remote resources can be withdrawn at any time, but I'm more interested in making sure that the resources spent on backups go towards being able to re-establish the local service.)

Does the database similarly contain remote server content, and if so is there a way to take a selective dump of local content in such a way that remote content would be repopulated on demand?
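
For reference only: if those directories do turn out to be safe to skip, excluding them from an rsync-based media backup could look like the sketch below. The directory names are taken from the questions above, and whether skipping them is actually safe is exactly the open question here:

```python
#!/usr/bin/env python3
"""Sketch of excluding the cache-like directories from a media backup with
rsync. This only shows the mechanics; whether skipping url_cache,
url_cache_thumbnails, *_thumbnails and remote_content is safe is the open
question in this thread. Paths are placeholders."""
import subprocess

subprocess.run(
    [
        "rsync", "-a",
        "--exclude=url_cache", "--exclude=url_cache_thumbnails",
        "--exclude=*_thumbnails", "--exclude=remote_content",
        "/var/lib/synapse/media/",            # source (placeholder path)
        "backup-host:/srv/backups/media/",    # destination (placeholder)
    ],
    check=True,
)
```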

@youphyun

youphyun commented Aug 31, 2022

I am still new to the topic. I simply started backing up the relevant files listed above, including a full Postgres dump using pg_dumpall. The whole process, including which locations to back up, will differ depending on how Synapse is installed (from a distribution package, in Docker, or in a virtualenv). I am not sure if and when I will need to restore the backups; I am afraid that will involve quite a bit of manual work. I found these pages with some useful details: https://www.gibiris.org/eo-blog/posts/2022/01/21_containterise-synapse-postgres.html and https://ems-docs.element.io/books/element-cloud-documentation/page/import-database-and-media-dump
One additional question about the media repo: will simply restoring the /media_store folder work, or should it rather be done using the Synapse export and import API calls?

@FarisZR

FarisZR commented Oct 14, 2023

is it safe to skip backing up remote_content? If absent, will the server repopulate these files on demand?

@gwire have you found an answer yet? remote_content is crazy large on my server, and it doesn't make sense at all to back it up.

