[YSQL] Ensure the node is unavailable whenever max_clock_skew_usec bound may be violated #23279

Closed
1 task done
pao214 opened this issue Jul 24, 2024 · 1 comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage

Comments

@pao214
Contributor

pao214 commented Jul 24, 2024

Jira Link: DB-12206

Description

Motivation

We use a high default value of 500ms for max_clock_skew_usec. One reason is that a tserver is slow to detect when the clock skew goes out of range.

In particular, the skew between two nodes goes undetected when they do not communicate. While all tservers communicate with the master, the round-trip time to the master is long (on the order of 100ms in a geo-distributed cluster), and considerable time may pass between two heartbeats. Moreover, without leases, tservers can still serve clients without heartbeating to the master.

Objective

Find a mechanism that ensures no two available tservers in the cluster have clocks that differ by more than max_clock_skew_usec.

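One approach is to leverage https://github.com/aws/clock-bound. A node is available only when 2 * |clock_error| < max_clock_skew_usec. Assuming clock_error bounds each clock's deviation from true time, every available clock is within max_clock_skew_usec / 2 of true time, so no two clocks on available nodes can differ by more than max_clock_skew_usec.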
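
A minimal sketch of the availability check described above, in Python for illustration only. The function names `is_node_available` and `worst_case_skew_usec` are hypothetical and are not part of clock-bound or YugabyteDB; the sketch assumes the clock error bound is already available in microseconds and does not show how to obtain it.

```python
def is_node_available(clock_error_usec: int, max_clock_skew_usec: int) -> bool:
    """Hypothetical check: serve only if 2 * |clock_error| < max_clock_skew_usec."""
    return 2 * abs(clock_error_usec) < max_clock_skew_usec


def worst_case_skew_usec(error_a_usec: int, error_b_usec: int) -> int:
    """Upper bound on the difference between two clocks, assuming each clock
    is within its own error bound of true time (triangle inequality)."""
    return abs(error_a_usec) + abs(error_b_usec)


# 500ms default mentioned in the issue, expressed in microseconds.
MAX_CLOCK_SKEW_USEC = 500_000

# Two nodes that both pass the check cannot differ by more than the bound.
a_err, b_err = 200_000, 249_999
assert is_node_available(a_err, MAX_CLOCK_SKEW_USEC)
assert is_node_available(b_err, MAX_CLOCK_SKEW_USEC)
assert worst_case_skew_usec(a_err, b_err) < MAX_CLOCK_SKEW_USEC

# A node whose error bound is too loose must become unavailable.
assert not is_node_available(250_000, MAX_CLOCK_SKEW_USEC)
```

The strict inequality matters: a node whose error bound is exactly half of max_clock_skew_usec is treated as unavailable, which keeps the worst-case pairwise skew strictly below the configured limit.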

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@pao214 pao214 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jul 24, 2024
@pao214 pao214 self-assigned this Jul 24, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jul 24, 2024
@pao214
Contributor Author

pao214 commented Aug 15, 2024

NOTE: This isn't really possible unless we are using a time synchronization service such as AWS Time Sync Service.
Reason: TBD.

Projects
Status: Done