[YSQL] Leverage AWS Clock Bound to reduce the number of read restarts. #21963

pao214 · 2024-04-13T19:28:19Z

Description

Motivation

Using timestamps to order events in a distributed system is tricky because of the inherent clock skew between machines. YB uses a very conservative bound on the clock skew (a constant 500 ms, governed by `max_clock_skew_usec`). This makes it infeasible to wait out the clock skew to resolve event ordering issues.

Proposal

AWS provides very tight (microsecond-level) error bounds on EC2 instance clocks; see https://aws.amazon.com/blogs/compute/its-about-time-microsecond-accurate-clocks-on-amazon-ec2-instances/.

Roadmap [TBD]

- Document the impact of changing the hybrid time mechanism to use an NTP-based physical clock.
- Add an example workload to test read-after-commit visibility (see [YSQL] Add yb-sample-apps workload(s) to benchmark restart read requests metric. #22537).
- [YSQL] Ensure the node is unavailable whenever max_clock_skew_usec bound may be violated #23279
- Crash when a received hybrid time exceeds the global limit (see [YSQL] Detect violation of clock error bounds as determined by the NTP clock. #22538).
- Create a YBA ticket to optimize the system for tighter clock skew bounds.
- Metrics are already published for:
  - current hybrid time
  - current error bound
  - current clock skew

Impact on hybrid time

NtpClock picks the earliest time in the uncertainty interval as the physical clock.
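A minimal C++ sketch of this choice, assuming times in microseconds and a symmetric error bound around the local clock reading. `UncertaintyInterval`, `MakeInterval`, `PhysicalTime`, and `NtpStyleNow` are hypothetical names used for illustration only; they are not the actual NtpClock API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration; not YugabyteDB's actual API.
// All times are in microseconds since the Unix epoch.
struct UncertaintyInterval {
  int64_t earliest;  // lower bound on the true time
  int64_t latest;    // upper bound on the true time
};

// Builds the interval [reading - error, reading + error] around a local
// clock reading, given an error bound reported by the time daemon.
inline UncertaintyInterval MakeInterval(int64_t reading_us, int64_t error_us) {
  assert(error_us >= 0);
  return {reading_us - error_us, reading_us + error_us};
}

struct PhysicalTime {
  int64_t time_point;    // value used as the physical clock
  int64_t global_limit;  // upper bound on any hybrid time in the cluster
};

// Sketch of the NtpClock choice: report the earliest possible time as the
// physical clock and the latest possible time as the global limit.
inline PhysicalTime NtpStyleNow(const UncertaintyInterval& interval) {
  return {interval.earliest, interval.latest};
}
```

Reporting the earliest time keeps every node's physical clock at or behind the true time, as long as the reported error bounds hold.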

Properties of new hybrid time

The true time is an upper bound on the hybrid times across all nodes in the cluster. This follows because every hybrid time ultimately originates as the earliest possible time on some node, and that earliest time is no later than the true time at the moment it was read. Since then, the true time has only advanced.
This means that the true time is also a global limit. Because the latest possible time in any node's uncertainty interval is at least the true time, the latest possible time is an easily computable global limit. Thus, we have a global limit without requiring any explicit coordination across the nodes in the cluster (other than NTP itself, of course).
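The argument above can be checked with a small simulation. This is a hypothetical sketch, not YugabyteDB code: time is passed in explicitly as `true_time`, each `Node` has a clock offset and a reported error bound, and only the physical component of hybrid time is modeled. It also illustrates the roadmap item about detecting a received hybrid time that exceeds the global limit: while all error bounds hold, a received hybrid time can never exceed the receiver's latest possible time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch (not YugabyteDB code): a node whose clock reads
// true_time + offset_us and whose daemon reports an error bound error_us.
// The bound holds when |offset_us| <= error_us.
struct Node {
  int64_t offset_us;
  int64_t error_us;
  int64_t last_ht = 0;  // physical component of the node's hybrid time

  int64_t Earliest(int64_t true_time) const { return true_time + offset_us - error_us; }
  int64_t Latest(int64_t true_time) const { return true_time + offset_us + error_us; }

  // Local event: hybrid time never goes backwards and tracks the earliest time.
  int64_t Now(int64_t true_time) {
    last_ht = std::max(last_ht, Earliest(true_time));
    return last_ht;
  }

  // Message receipt: returns false if the received hybrid time exceeds this
  // node's global limit, which is only possible if some node's error bound
  // was violated. A real node would crash here instead of returning false.
  bool Receive(int64_t received_ht, int64_t true_time) {
    if (received_ht > Latest(true_time)) return false;
    last_ht = std::max(last_ht, received_ht);
    return true;
  }
};
```

Here `Receive` rejects the message where a real node would crash; that simplification keeps the sketch testable.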

Issue Type

kind/enhancement


…t errors.

Summary:

### Motivation

Prior to this revision, the physical clock used a constant 500 ms window for the possible clock skew between any two nodes in the cluster. This skew bound is very conservative, since it is a constant that must account for worst-case scenarios, and it leads to an excessive number of read restart errors; see https://docs.yugabyte.com/preview/architecture/transactions/read-restart-error/.

A better approach handles the clock error dynamically, which can be done by leveraging the AWS clockbound library. Since the clock error is several orders of magnitude lower than the conservative constant bound, we raise far fewer read restart errors. In fact, the read latency improves significantly for the SqlStaleReadDetector yb-sample-apps workload.

This revision improves clock precision. It also limits the impact of faulty clocks on the cluster, since only the nodes that are out of sync crash.

### About Clockbound

As mentioned above, we use the clockbound library to retrieve the uncertainty intervals for timestamps. Clockbound uses a server-client architecture in which a clock-bound-d daemon is registered as a systemd service. This daemon queries chronyd for timestamp-related information and publishes the clock accuracy information and clock synchronization status to shared memory. The clockbound client then computes the current timestamp uncertainty interval from the information in shared memory.

NOTE: chronyd does not have sufficient information when using PTP. In such cases, clockbound augments the clock error with error information from special device files.

### Configuration

Configuring clockbound is a two-step process.

1. Configure the system to set up precise timestamps.
2. Configure the database to use these precise timestamps.
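The client-side computation described under About Clockbound can be sketched as follows. This is a simplified model under stated assumptions: the daemon is assumed to publish an error bound measured at some instant together with a maximum drift rate, and the client is assumed to widen that bound linearly with the time elapsed since. `ErrorBoundSnapshot`, its fields, and `ComputeInterval` are hypothetical and do not reflect clockbound's actual shared-memory layout or API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of what a time daemon might publish to shared
// memory; field names are illustrative, not clockbound's actual layout.
struct ErrorBoundSnapshot {
  int64_t as_of_ns;       // monotonic time at which bound_ns was measured
  int64_t void_after_ns;  // snapshot must not be used after this time
  int64_t bound_ns;       // clock error bound at as_of_ns
  int64_t max_drift_ppb;  // assumed worst-case drift, parts per billion
  bool synchronized;      // daemon's view of the clock sync status
};

struct Interval {
  int64_t earliest_ns;
  int64_t latest_ns;
};

// Widens the published bound by the maximum drift accumulated since the
// snapshot was taken. Returns false if the snapshot cannot be trusted.
inline bool ComputeInterval(const ErrorBoundSnapshot& s, int64_t monotonic_now_ns,
                            int64_t realtime_now_ns, Interval* out) {
  assert(out != nullptr);
  if (!s.synchronized || monotonic_now_ns < s.as_of_ns ||
      monotonic_now_ns > s.void_after_ns) {
    return false;
  }
  const int64_t elapsed_ns = monotonic_now_ns - s.as_of_ns;
  const int64_t error_ns = s.bound_ns + elapsed_ns * s.max_drift_ppb / 1000000000;
  out->earliest_ns = realtime_now_ns - error_ns;
  out->latest_ns = realtime_now_ns + error_ns;
  return true;
}
```

For example, a 50 us bound with an assumed 15 ppm drift grows to 65 us one second after the snapshot. Refusing to produce an interval for an unsynchronized clock or a stale snapshot is what lets the database fail loudly instead of using an untrustworthy bound.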
#### System Configuration

```
[PHC available] sudo bash ./bin/configure_ptp.sh
sudo bash ./bin/configure_clockbound.sh
```

#### Database Configuration

Set the tserver and master gFlag `time_source=clockbound`.

#### yugabyted Configuration

Autodetects AWS clusters and recommends configuring clockbound. Provides the `--enhance_time_sync_via_clockbound` flag in the `yugabyted start` command, which:

1. Prechecks the chrony and clockbound configuration.
2. Configures the database with `time_source=clockbound`.
3. Autodetects PTP and sets `clockbound_clock_error_estimate` to an appropriate value.

### Design

#### Clockbound Client

The clockbound client library is compiled and packaged in the third-party library repo. It is a library written in Rust that is linked into the tserver and accessed through its C interface.

#### Clockbound Clock

Uses the clockbound library to get the uncertainty intervals. See the comment in clockbound_clock.cc for more information.

#### Fault Tolerance

When its clock goes out of sync, the node crashes and, as a result, is temporarily removed from the Raft groups it belongs to. This prevents stale read anomalies. Crashing also prevents the node from bringing down other nodes in the cluster, since it no longer propagates extremely skewed timestamps.

#### Utilities

Includes the following additional utilities:

1. configure_ptp.sh
   - Installs the network driver compiled with PHC support.
   - Configures chrony to use PHC as a refclock.
2. configure_clockbound.sh
   - Sets up chrony to give accurate timestamp uncertainty intervals.
   - Sets up the clockbound agent.
   - Sets up permissions.
3. clockbound_dump
   - Dumps the result of the clockbound_now client-side API.
   - Useful for computing the clock error in external applications such as YBA.

Jira: DB-10879

Test Plan: Jenkins: urgent, compile only

### Quick Benchmark (Not statistically significant)

Ran the SqlStaleReadDetector workload, which

1. increments random counters in write threads, and
2. aggregates the counter values in the read thread,

for 5 minutes and measured the number of restart read requests and the read latency per operation.

| Measurement | WallClock | NtpClock | ClockboundClock | EST_ERROR=0 | NTP/PHC | PTP/PHC |
|--------------------------|------------|----------------|------------------|--------------|----------|-----|
| Restart Read Requests | ~5k | ~380 | ~70 | ~36 | ~5 | ~5 |
| Latency (ms/op) | ~430 | ~150 | ~120 | ~105 | ~140* | ~150* |

The latencies are measured on the client side.

| Column | Description |
|--------|-------------|
| **Wall Clock** | Current clock implementation. |
| **Clockbound Clock** | Proposed wall-clock-compatible clock implementation. |
| **EST_ERROR=0** | Uses now=earliest and global_limit=latest, where the reference clock lies in the interval [earliest, latest]. |
| **NTP/PHC** | Same, but with the database running in the US N. Virginia region, where PHC is available. |
| **PTP/PHC** | Same, but using PTP for timestamps. |

*Higher latency is expected with PHC since the client is in Oregon and the database is running in N. Virginia.

### Other benchmarks

Developed a few realistic apps in yb-sample-apps:

1. SqlEventCounter
2. SqlBankTransfers
3. SqlWarehouseStock
4. SqlMessageQueue
5. SqlConsistentHashing

They all demonstrate a reduction of several orders of magnitude in read restart errors, reinforcing the value of using AWS Time Sync Service and clockbound.

### Failure Scenarios

1. When clockbound is not set up and the user configures `time_source=clockbound`, the database fails to start with an error in the tserver.err log.
```
F0826 17:47:53.453330 4432 hybrid_clock.cc:157] Couldn't get the current time: Clock unsynchronized. Status: IO error (yb/util/clockbound_time.cc:145): clockbound API failed with error: No such file or directory, and detail: open ...
```
2. When SELinux permissions are not set correctly for clockbound to access the chronyd socket, `systemctl status` shows an error:
```
Aug 26 17:55:57 ip-10-9-10-243.us-west-2.compute.internal clockbound[32122]: 2024-08-26T17:55:57.318518Z ERROR ThreadId(02) /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/clock-bound-d-1.0.0/src/chrony_poller.rs:73: No reply from chronyd. Is it running? Error: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
```

Backport-through: 2024.2
Reviewers: sergei, mbautin, pjain
Reviewed By: sergei, mbautin, pjain
Subscribers: svc_phabricator, mbautin, sergei, rthallam, smishra, yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D37365
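Why a tighter clock bound reduces read restarts can be sketched with a simplified model (an assumption, not the exact YugabyteDB logic): a read at `read_time` must restart when it sees a value whose commit time falls in the window (`read_time`, `global_limit`], because that value may have committed before the read in real time. With a constant skew the window is always 500 ms wide, whereas with an NTP or clockbound clock it shrinks to the width of the uncertainty interval. `NeedsReadRestart` and `ConstantSkewLimit` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a read at read_time_us must restart if it finds a
// value committed after read_time_us but no later than global_limit_us,
// since such a value may have committed before the read in real time.
inline bool NeedsReadRestart(int64_t read_time_us, int64_t global_limit_us,
                             int64_t commit_time_us) {
  assert(read_time_us <= global_limit_us);
  return commit_time_us > read_time_us && commit_time_us <= global_limit_us;
}

// Global limit under a constant maximum clock skew (e.g. 500 ms).
inline int64_t ConstantSkewLimit(int64_t now_us, int64_t max_skew_us) {
  return now_us + max_skew_us;
}
```

In this model, a value committed 2 ms after the read time forces a restart under a 500 ms constant skew, but not when the uncertainty interval is only a few hundred microseconds wide.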

…educe read restart errors.

Summary: Original commit: 28f27ee / D37365

Jira: DB-10879

Test Plan: Jenkins: urgent

Backport-through: 2024.2
Reviewers: sergei, mbautin, pjain
Reviewed By: pjain
Subscribers: ybase, yql, smishra, rthallam, sergei, mbautin, svc_phabricator
Differential Revision: https://phorge.dev.yugabyte.com/D38858

pao214 · 2024-10-11T18:47:40Z

Landed on master (2.23.1) and 2024.2

1. Changes to the manual deployment configuration, with additional details added to the clock sync setup.
2. Changes to the database configuration after the system is set up with the clockbound systemd service.
3. Changes to the read restart error doc, with an additional recommendation to use the new clock.

Summary: _exported libs are not used anywhere. Remove them. Jira: DB-10879 Test Plan: Jenkins Reviewers: mbautin Reviewed By: mbautin Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D38526

Summary: time_source does not have any secrets. Call home info on time_source is useful. Also, time_source is a non-runtime flag. Jira: DB-10879 Test Plan: Jenkins Backport-through: 2024.2 Reviewers: hsunder, smishra Reviewed By: hsunder Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D39031

Summary:

### Azure PHC Issue

Azure VMs have hardware clocks too. However, we haven't figured out how to use them yet. Currently, the clockbound configuration script fails with the following fatal error:

```
PHC is not available on eth0
```

**Fix:** Configure PTP only when the script runs on an AWS machine.

### Missing policycoreutils package

Install policycoreutils-devel explicitly.

### Yugabyted changes

clockbound can now be used on any cloud provider, so also alert users with a warning when using Azure or GCP.

Jira: DB-10879

Test Plan: Jenkins: compile only

Ran
```
sudo bash ./bin/configure_clockbound.sh
```
on AWS, Azure, and GCP.

Reviewers: nikhil, sanketh
Reviewed By: sanketh
Differential Revision: https://phorge.dev.yugabyte.com/D39224

Summary: Original commit: d5c096f / D39031 time_source does not have any secrets. Call home info on time_source is useful. Also, time_source is a non-runtime flag. Jira: DB-10879 Test Plan: Jenkins: compile only Backport-through: 2024.2 Reviewers: hsunder, smishra Reviewed By: hsunder Subscribers: ybase Differential Revision: https://phorge.dev.yugabyte.com/D39361

…viders

Summary: Original commit: 689117b / D39224

### Azure PHC Issue

Azure VMs have hardware clocks too. However, we haven't figured out how to use them yet. Currently, the clockbound configuration script fails with the following fatal error:

```
PHC is not available on eth0
```

**Fix:** Configure PTP only when the script runs on an AWS machine.

### Missing policycoreutils package

Install policycoreutils-devel explicitly.

Jira: DB-10879

Test Plan: Jenkins: urgent, compile only

Ran
```
sudo bash ./bin/configure_clockbound.sh
```
on AWS, Azure, and GCP.

Reviewers: nikhil, sanketh
Reviewed By: sanketh
Differential Revision: https://phorge.dev.yugabyte.com/D39464

pao214 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Apr 13, 2024

yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue labels Apr 13, 2024

pao214 mentioned this issue Apr 13, 2024

[YSQL] Provide user with option to avoid kReadRestart error with extra cost even if statement's output exceeds ysql_output_buffer_size #20336

Closed


pao214 added this to Wait-Queue Based Locking Apr 15, 2024

robertsami moved this to Pending in Wait-Queue Based Locking Apr 15, 2024

robertsami assigned pao214 Apr 15, 2024

pao214 mentioned this issue May 14, 2024

[YSQL] Use clock drift rate to upper bound hybrid timestamp across all tserver nodes. #21962

Closed


pao214 mentioned this issue May 24, 2024

[YSQL] Detect violation of clock error bounds as determined by the NTP clock. #22538

Closed


pao214 mentioned this issue Jun 19, 2024

[YSQL] Roadmap for mitigating read restart errors #22917

Open


pao214 mentioned this issue Jul 24, 2024

[YSQL] Use AWS clockbound daemon to get more precise error bounds for the clock skew #22540

Closed


pao214 changed the title ~~[YSQL] Use AWS Time Sync Service to get better clock error bounds.~~ [YSQL] Use AWS Clock Bound to reduce the number of read restarts. Aug 15, 2024

pao214 changed the title ~~[YSQL] Use AWS Clock Bound to reduce the number of read restarts.~~ [YSQL] Leverage AWS Clock Bound to reduce the number of read restarts. Aug 15, 2024

yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Aug 21, 2024

pao214 mentioned this issue Sep 4, 2024

[YSQL] Support rolling restart for NTP Clock #22539

Closed


pao214 moved this from Pending to In Progress in Wait-Queue Based Locking Sep 13, 2024

sushantrmishra added the 2024.2 Backport Required label Oct 1, 2024

yugabyte-ci added the 2.23.1_blocker label Oct 1, 2024

pao214 moved this from In Progress to Backporting in Wait-Queue Based Locking Oct 9, 2024

pao214 moved this from Backporting to Done in Wait-Queue Based Locking Oct 9, 2024

pao214 mentioned this issue Oct 11, 2024

[YSQL] Simulate clockbound daemon. #23475

Open


pao214 closed this as completed Oct 11, 2024

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 14, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

e423950

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 14, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

5389ee5

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 15, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

164240a

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 15, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

bb3f099

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 17, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

f76f199

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 17, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

c37b2aa

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 17, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

cffc1af

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 18, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

14efb03

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 18, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

80f79b4

pao214 added a commit to pao214/yugabyte-db that referenced this issue Oct 18, 2024

[yugabyte#21963] YSQL: Configure clockbound agent

2aec49b
