Don't write WAL to local disk before it's streamed to a synchronous standby #250

hlinnaka · 2021-06-08T10:52:04Z

Currently, WAL is always flushed to local disk first, and only then streamed to standby servers. If synchronous replication is used, we just don't acknowledge the commit to the client until the commit record has been streamed. If you kill the backend, the commit goes ahead, the locks are freed, and its effects become visible to other backends, even though it hasn't been streamed out yet. That's a bit sketchy.

knizhnik · 2022-02-15T13:39:39Z

Sorry, I do not understand how postponing write WAL to the local disk can help to solve the problem with "unapproved" commits. At the moment when we wait for sync replicas, transaction is already marked as committed in CLOG and releasing locks caused by backend termination will make it visible for everybody. How it relates with WAL writing?

We can postpone or even completely disable writing WAL to local disks. Instead of it, we can stream it directly to safekeepers. I have done such prototype in PgPro. But performance benefits were not so large. And amount of changes in Postgres core - not so small. And there will be much stronger dependency on speed of consuming WAL by safekeeper/pageserver. So this optimization is non-trivial and not obviously lead to better performance.

May be the easiest way to prevent described problem is to prohibit pg_terminate_backend. Looks like it is the only way to terminate one backend without restarting the whole cluster.

hlinnaka · 2022-02-15T13:53:32Z

Yeah, you're right, I mixed up two slightly different issues:

If you use pg_terminate_backend to kill a backend that has already written the commit record and is waiting for synchronous standby to acknowledge it, the transaction becomes visible to others even though it hasn't been replicated yet.
If you kill the server, e.g. with SIGQUIT, and restart it, when a transaction is in that same state, it becomes visible to others after startup, even though it hasn't been replicated yet.

Prohibiting pg_terminate_backend() fixes 1. Postponing the local WAL write fixes 2. We might need to do both.

knizhnik · 2022-02-15T14:30:47Z

Right now we are always restarting server from scratch, aren't we?
There is no way to restart server (using CLI) without loosing its local data.
What will happen if server is crashed or killed?
First of all it may not restart at all because local pg_wal contains invalid first segment.
Right now local restart is possible only in case of normal server termination - when checkpoint is completed.
If no checkpoint was performed since compute node start, then server will not be able to restart.
It seems to be the primary problem, which make problem 2) described in this ticket non-relevant.

So from my point of view the actual problem is how to detect server crashes and prevent local restarts rather than restart done by console. May be I am missing something, because I am not quite familiar with how console works.

hlinnaka · 2022-02-15T14:33:35Z

Yeah, this issue was purely about PostgreSQL. It's not a problem with zenith currently, because as you said, we always rebuild the server from scratch.

hlinnaka added the c/PostgreSQL Component: PostgreSQL features and bugs label Jun 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't write WAL to local disk before it's streamed to a synchronous standby #250

Don't write WAL to local disk before it's streamed to a synchronous standby #250

hlinnaka commented Jun 8, 2021

knizhnik commented Feb 15, 2022 •

edited

Loading

hlinnaka commented Feb 15, 2022

knizhnik commented Feb 15, 2022

hlinnaka commented Feb 15, 2022

Don't write WAL to local disk before it's streamed to a synchronous standby #250

Don't write WAL to local disk before it's streamed to a synchronous standby #250

Comments

hlinnaka commented Jun 8, 2021

knizhnik commented Feb 15, 2022 • edited Loading

hlinnaka commented Feb 15, 2022

knizhnik commented Feb 15, 2022

hlinnaka commented Feb 15, 2022

knizhnik commented Feb 15, 2022 •

edited

Loading