Add support for PostgreSQL node failover #2293
Local Test
A local test of the DNS-based failover was done as follows (on top of #2580):
diff --git a/edb/server/pgcon/pgcon.pyx b/edb/server/pgcon/pgcon.pyx
index 202c3b0e1..f46fa305b 100644
--- a/edb/server/pgcon/pgcon.pyx
+++ b/edb/server/pgcon/pgcon.pyx
@@ -207,6 +207,7 @@ async def _connect(connargs, dbname, ssl):
             lambda: PGConnection(dbname, loop, connargs),
             host=host, port=port)
         _set_tcp_keepalive(trans)
+        print("connected to", trans.get_extra_info("peername"))
     try:
         await pgcon.connect()
import asyncio

import edgedb


async def job(pool):
    # Query in a loop; each transaction is retried for up to 60 attempts
    # with a constant 1-second backoff, so it can ride out a failover.
    while True:
        async with pool.acquire() as conn:
            async for tx in conn.with_retry_options(
                edgedb.RetryOptions(60, lambda x: 1)
            ).retrying_transaction():
                async with tx:
                    print(await tx.query("SELECT datetime_current()"))
                    await tx.execute("SELECT sys::_sleep(1)")


async def main():
    async with edgedb.create_async_pool(
        "_localdev", min_size=0, max_size=5
    ) as pool:
        try:
            # Run 5 concurrent jobs, matching the pool's max_size.
            jobs = [asyncio.create_task(job(pool)) for _ in range(5)]
            await asyncio.wait(jobs)
        finally:
            print('end')


asyncio.run(main())
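While the script above runs, it can help to watch which address the Postgres hostname currently resolves to, since a DNS-based failover shows up as a change in that set. The following is only a sketch: `resolve_addresses` and `detect_switch` are hypothetical helper names, not part of EdgeDB or the test above, and the host and port are whatever the local setup uses.

```python
import socket


def resolve_addresses(host, port):
    """Return the set of IP addresses the host currently resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def detect_switch(previous, current):
    """True if a previously known address set has changed."""
    # An empty `previous` means there is no baseline yet, so no switch.
    return bool(previous) and previous != current
```

Calling `resolve_addresses` periodically and feeding consecutive results to `detect_switch` gives a rough signal that the DNS record moved; it says nothing about whether existing connections are still healthy.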
Conclusion
EdgeDB can mostly handle DNS-based Postgres failover, because:
The missing part is:
AWS RDS
Following basically the same steps as above, AWS Multi-AZ RDS failover triggered by the AWS "Reboot With Failover" feature works fine with EdgeDB with #2580 plus a small fix for macOS (which requires Python 3.10rc1 for a TCP keepalive flag, bpo-34932); Linux should be fine. In short, "Reboot With Failover" simulates the master Postgres going down without sending a TCP RST packet on any live connection. In this case TCP keepalive is necessary to detect the "failover". The term is in quotes because keepalive only tells us that some connection to the master node is unhealthy, not that a true failover has happened. It is sufficient, though, for this test, where all connections go down and we can cleanly fail over to the replica. One possible way for RDS to detect a failover is to use the AWS API, but that requires a lot more configuration.
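To illustrate the keepalive point, here is a minimal sketch of enabling TCP keepalive on a socket in a way that works on both Linux and macOS. It is not EdgeDB's actual `_set_tcp_keepalive`; the function name `set_tcp_keepalive` and the timing values are assumptions chosen for illustration. The macOS branch relies on `socket.TCP_KEEPALIVE`, which Python exposes from 3.10 (bpo-34932).

```python
import socket


def set_tcp_keepalive(sock, idle=30, interval=5, count=3):
    """Enable TCP keepalive and return the options that were applied.

    The timing values are illustrative defaults, not EdgeDB's settings.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}

    # Linux names the idle-time option TCP_KEEPIDLE; macOS names it
    # TCP_KEEPALIVE (available in the socket module from Python 3.10).
    idle_name = (
        "TCP_KEEPIDLE" if hasattr(socket, "TCP_KEEPIDLE") else "TCP_KEEPALIVE"
    )
    options = (
        (idle_name, idle),            # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    )
    for name, value in options:
        opt = getattr(socket, name, None)
        if opt is not None:  # skip options this platform does not expose
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied
```

With settings like these, a peer that disappears without sending RST is noticed after roughly `idle + interval * count` seconds, instead of the connection hanging indefinitely.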
In case of an RDS failover (based on the Reboot With Failover test), TCP keepalive is needed to detect unhealthy connections to the failing master, and new connection attempts may raise TimeoutError before the new master is in place. This fixes the TCP keepalive feature on macOS and requires Python 3.10 (refs bpo-34932). Refs geldata#2293.
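Since new connection attempts may time out until the new master is in place, a caller typically needs to retry them. The sketch below shows one way to do that with plain asyncio; `connect_with_retry` and its parameters are hypothetical names for illustration and do not reflect how EdgeDB itself handles this.

```python
import asyncio


async def connect_with_retry(connect, attempts=5, delay=1.0, timeout=5.0):
    """Call the zero-argument coroutine function `connect` until it succeeds.

    Each attempt is bounded by `timeout` seconds; timeouts and other
    OS-level connection errors trigger a retry after `delay` seconds.
    """
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(connect(), timeout)
        except (asyncio.TimeoutError, OSError) as exc:
            # The new master may not be reachable yet; remember the
            # error and try again unless this was the last attempt.
            last_exc = exc
            if attempt < attempts:
                await asyncio.sleep(delay)
    raise last_exc
```

A constant delay keeps the sketch short; a real client would more likely use a capped exponential backoff so that many clients do not reconnect in lockstep.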
Fixed in #2920
See #1859 for related discussion.