[YSQL] Ensure the node is unavailable whenever max_clock_skew_usec bound may be violated
#23279
Closed
1 task done
Labels
area/ysql
Yugabyte SQL (YSQL)
kind/bug
This issue is a bug
priority/medium
Medium priority issue
status/awaiting-triage
Issue awaiting triage
Jira Link: DB-12206
Description
Motivation
We use a high default value of 500ms for max_clock_skew_usec. Part of the reason is that the tserver is slow to detect that the clock skew is out of range. In particular, the skew between two nodes goes undetected when they do not communicate. While all tservers communicate with the master, the round-trip time to the master is long (on the order of 100ms in a geo-distributed cluster), and a lot of time may pass between two heartbeats. Moreover, without leases, tservers can still serve clients without heartbeating to the master.
Objective
Figure out a mechanism to ensure that no two available tservers in the cluster have clocks that differ by more than max_clock_skew_usec. One approach is to leverage https://github.com/aws/clock-bound. A node is only available when 2 * |clock_error| < max_clock_skew_usec. This means that each available clock is within max_clock_skew_usec / 2 of the true time, so no two clocks on available nodes can skew beyond max_clock_skew_usec.
Issue Type
kind/bug
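The availability condition proposed under Objective can be sketched roughly as follows. This is only an illustration, not the tserver implementation: ClockReading, is_node_available, and worst_case_skew_usec are hypothetical names, and the error bound is assumed to come from some source such as clock-bound (no real clock-bound API is called here). It also assumes the true time always lies within the reported error bound of each node's clock.

```python
from dataclasses import dataclass

# Default mentioned in the issue: 500ms, expressed in microseconds.
MAX_CLOCK_SKEW_USEC = 500_000


@dataclass(frozen=True)
class ClockReading:
    """Hypothetical local clock reading, in microseconds.

    Assumption: true time lies in
    [now_usec - error_bound_usec, now_usec + error_bound_usec].
    """
    now_usec: int
    error_bound_usec: int


def is_node_available(reading: ClockReading,
                      max_clock_skew_usec: int = MAX_CLOCK_SKEW_USEC) -> bool:
    """Node is available only when 2 * |clock_error| < max_clock_skew_usec."""
    return 2 * abs(reading.error_bound_usec) < max_clock_skew_usec


def worst_case_skew_usec(a: ClockReading, b: ClockReading) -> int:
    """Upper bound on the skew between two clocks read at the same instant.

    Both clocks are within their own error bound of the same true time,
    so by the triangle inequality they differ by at most the sum of bounds.
    """
    return abs(a.error_bound_usec) + abs(b.error_bound_usec)


if __name__ == "__main__":
    healthy = ClockReading(now_usec=1_000_000, error_bound_usec=100_000)
    drifting = ClockReading(now_usec=1_000_000, error_bound_usec=250_000)
    print(is_node_available(healthy))   # error bound well under half the limit
    print(is_node_available(drifting))  # 2 * 250_000 is not < 500_000
```

Under these assumptions, any two nodes that both pass the check have error bounds below max_clock_skew_usec / 2 each, so their worst-case mutual skew stays strictly below max_clock_skew_usec, without the nodes needing to communicate with each other or with the master.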
Warning: Please confirm that this issue does not contain any sensitive information