Resuming sync is slow and utilises a large amount of disk space #267
Comments
…anges and pg_logical_slot_get_changes #267
I have fixed this in the master branch so that we only fetch records from the logical replication slot in batches whose size is defined by PG_LOGICAL_SLOT_UPTO_NCHANGES, which defaults to 5000.
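For readers following along, the batching idea described above can be sketched roughly as follows. This is only an illustration, not PGSync's actual code: `drain_slot`, `fetch_batch` and `make_fake_slot` are hypothetical names, and the fake slot stands in for a real database call. The SQL string uses the real `pg_logical_slot_get_changes(slot_name, upto_lsn, upto_nchanges)` function; note that Postgres treats `upto_nchanges` as a soft limit that is only checked at transaction boundaries, so a real batch can be larger than requested.

```python
from typing import Callable, Iterator, List, Tuple

Change = Tuple[int, str]  # (xid, data), as returned by the slot functions

# Query a real fetch_batch would run; parameters are (slot_name, upto_nchanges).
GET_CHANGES_SQL = (
    "SELECT xid, data "
    "FROM pg_logical_slot_get_changes(%s, NULL, %s)"
)


def drain_slot(
    fetch_batch: Callable[[int], List[Change]],
    upto_nchanges: int = 5000,
) -> Iterator[List[Change]]:
    """Yield batches until the slot has nothing left to return.

    fetch_batch is a hypothetical callable that executes GET_CHANGES_SQL
    with the given limit and returns the resulting rows.
    """
    while True:
        batch = fetch_batch(upto_nchanges)
        if not batch:
            return
        yield batch


def make_fake_slot(changes: List[Change]) -> Callable[[int], List[Change]]:
    """In-memory stand-in for a replication slot, for illustration only."""
    pending = list(changes)

    def fetch_batch(limit: int) -> List[Change]:
        batch = pending[:limit]
        del pending[:limit]  # consumed changes are gone, like get_changes
        return batch

    return fetch_batch


if __name__ == "__main__":
    fake = make_fake_slot([(i, "change") for i in range(12)])
    for batch in drain_slot(fake, upto_nchanges=5):
        print(len(batch))
```

Because each batch is processed and released before the next one is fetched, memory use on the client is bounded by the batch size rather than by the total amount of pending WAL.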
This fix does reduce the disk usage, though I'm not sure if the
This query returns numerous results:
Whereas this query returns no results:
I believe this causes the
I think you are right. I changed the logic on this one. Can you give it another try, please?
Hi @toluaina, Sorry for the delay, I've tested but I'm still having issues with this. I believe the problem might be with versions of Postgres <
The filtering done by PGSync on the logical slot seems to read through all WAL segments in order to filter by XID. I plan on upgrading my Postgres to 14 in the near future and will test and report back once done.
Cool. Do let me know the outcome, thanks.
PGSync version: 2.1.11
Postgres version: 12.7
Elasticsearch version: 7.10
Redis version: 5.0.6
Python version: 3.8.7
Problem Description:
When resuming PGSync after it was stopped for a while, a query runs that looks like the following:
When a replication slot isn't synced for a while on a busy database and the WAL grows, the above query can take a very long time to complete (in my case, around 20 minutes). On top of the query being slow, it also utilises a large amount of disk space (via temp storage) on the Postgres server:
This large dip in disk space begins when the above query executes. In the above case, it caused PGSync to crash with the following error:
If I change the query to only return results for one xid, it still takes more than 20 minutes to return. My uneducated guess is that Postgres must read through the entirety of the WAL on disk in order to filter out the xids that the query requires.
In addition to the above, if the query does execute (after increasing the database disk space), PGSync might crash with the following exception, depending on how much data is sitting in the WAL:
All 3.79GB of memory allocated to my PGSync container is exhausted as there are simply too many records returned at once.
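To get a feel for how much WAL a stalled slot is holding back before resuming, something like the sketch below might help. It is an assumption-laden illustration rather than anything PGSync provides: `lsn_to_int` and `retained_wal_bytes` are hypothetical helper names, and the two LSN values would come from a query such as the one in `SLOT_LAG_SQL` (which uses the real `pg_replication_slots` view and `pg_current_wal_lsn()` function, available in Postgres 10 and later).

```python
# Query whose results would feed the helpers below (Postgres 10+).
SLOT_LAG_SQL = (
    "SELECT slot_name, restart_lsn, pg_current_wal_lsn() "
    "FROM pg_replication_slots"
)


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position.

    An LSN is a 64-bit value printed as two hexadecimal numbers:
    the high 32 bits, a slash, then the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between the slot's restart point and the current position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


if __name__ == "__main__":
    # Made-up LSNs, purely for illustration: exactly 4 GiB apart.
    print(retained_wal_bytes("2/0", "1/0"))  # 4294967296
```

A large value here suggests that the first fetch after resuming will have a lot of WAL to decode, which is consistent with the slow query and temp-storage growth described above.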