[#21963] YSQL: Leverage aws-clock-bound library to reduce read restart errors

Summary:
### Motivation
Prior to this revision, the physical clock uses a constant 500ms time window for the possible clock skew between any two nodes in the cluster. The window is very conservative: because it is a constant, it must account for the worst-case scenario. This leads to an excessive number of read restart errors, see https://docs.yugabyte.com/preview/architecture/transactions/read-restart-error/.

A better approach handles the clock error dynamically. This can be done by leveraging the AWS clockbound library. Since the clock error is several orders of magnitude lower than the conservative constant bound, we raise far fewer read restart errors. In fact, the read latency improves significantly for the SqlStaleReadDetector yb-sample-apps workload.

This revision improves clock precision. It also limits the impact of faulty clocks on the cluster, since only the nodes that are out of sync crash.

### About Clockbound
As mentioned above, we use the clockbound library to retrieve the uncertainty intervals for timestamps. Clockbound works in a server-client architecture where a clock-bound-d daemon is registered as a systemd service. This daemon queries chronyd for timestamp-related information and publishes the clock accuracy information and clock synchronization status to shared memory. The clockbound client then computes the current timestamp uncertainty interval based on the information in the shared memory.

NOTE: chronyd does not have sufficient information when using PTP. In such cases, clockbound augments the clock error with error information from special device files.

### Configuration
Configuring clockbound is a two-step process.
1. Configure the system to set up precise timestamps.
2. Configure the database to use these precise timestamps.
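To make the motivation concrete, the following Python sketch models a read restart check under a deliberately simplified assumption: a read restarts whenever it encounters a record whose timestamp lies after the read time but inside the clock uncertainty window. It compares the constant 500ms window against a window derived from an [earliest, latest] interval, using now=earliest and global_limit=latest as described for the EST_ERROR=0 configuration in the benchmark. All names (`ReadWindow`, `fixed_skew_window`, `bounded_window`) and the 50µs clock error are hypothetical; none of them come from the YugabyteDB code base or the clockbound API.

```python
from dataclasses import dataclass

# Constant 500 ms skew window mentioned in the commit message, in microseconds.
FIXED_MAX_SKEW_US = 500_000


@dataclass(frozen=True)
class ReadWindow:
    """Hypothetical model: a read time plus the latest time another node may report."""
    read_time_us: int
    global_limit_us: int

    def needs_restart(self, record_time_us: int) -> bool:
        # Simplified rule (an assumption): a record written after the read time
        # but within the uncertainty window is ambiguous, so the read restarts.
        return self.read_time_us < record_time_us <= self.global_limit_us


def fixed_skew_window(now_us: int) -> ReadWindow:
    """Window under the constant worst-case skew bound."""
    return ReadWindow(now_us, now_us + FIXED_MAX_SKEW_US)


def bounded_window(earliest_us: int, latest_us: int) -> ReadWindow:
    """Window from an uncertainty interval: now=earliest, global_limit=latest."""
    return ReadWindow(earliest_us, latest_us)


if __name__ == "__main__":
    now_us = 1_000_000_000
    error_us = 50  # assumed clock error bound, for illustration only
    record_us = now_us + 200  # record written 200 µs after the read time

    fixed = fixed_skew_window(now_us)
    bounded = bounded_window(now_us - error_us, now_us + error_us)

    print("fixed 500 ms window restarts:", fixed.needs_restart(record_us))
    print("dynamic window restarts:", bounded.needs_restart(record_us))
```

In this model, the record written 200µs after the read falls inside the 500ms window but outside the much narrower dynamic window, which is the effect the commit relies on to reduce read restarts.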
#### System Configuration
```
[PHC available] sudo bash ./bin/configure_ptp.sh
sudo bash ./bin/configure_clockbound.sh
```

#### Database Configuration
Set tserver and master gFlag `time_source=clockbound`.

#### yugabyted Configuration
Autodetects AWS clusters and recommends configuring clockbound. Provides the `--enhance_time_sync_via_clockbound` flag in the `yugabyted start` command, which:
1. Prechecks for chrony and clockbound configuration.
2. Configures the database with time_source=clockbound.
3. Autodetects PTP and configures clockbound_clock_error_estimate to an appropriate value.

### Design
#### Clockbound Client
The clockbound client library is compiled and packaged in the third party library repo. It is a library written in Rust that is linked to tserver and accessed through its C interface.

#### Clockbound Clock
Uses the clockbound library to get the uncertainty intervals. See the comment on clockbound_clock.cc for more information.

#### Fault Tolerance
When its clock goes out of sync, a node crashes and, as a result, is temporarily removed from the Raft groups it belongs to. This prevents stale read anomalies. Crashing also prevents the node from killing other nodes in the cluster, since it no longer propagates extremely skewed timestamps.

#### Utilities
Includes the following additional utilities:
1. configure_ptp.sh
   - Installs network driver compiled with PHC.
   - Configures chrony to use PHC as refclock.
2. configure_clockbound.sh
   - Sets up chrony to give accurate timestamp uncertainty intervals.
   - Sets up the clockbound agent.
   - Sets up permissions.
3. clockbound_dump
   - Dumps the result of the clockbound_now client side API.
   - Useful for computing clock error in external applications such as YBA.

Jira: DB-10879

Test Plan: Jenkins: urgent, compile only

### Quick Benchmark (Not statistically significant)
Ran the SqlStaleReadDetector workload that
1. Increments random counters in write threads.
2. Aggregates the counter values in the read thread.
for 5 minutes and measures the number of restart read requests and the read latency per operation.

| Measurement           | WallClock | NtpClock | ClockboundClock | EST_ERROR=0 | NTP/PHC | PTP/PHC |
|-----------------------|-----------|----------|-----------------|-------------|---------|---------|
| Restart Read Requests | ~5k       | ~380     | ~70             | ~36         | ~5      | ~5      |
| Latency (ms/op)       | ~430      | ~150     | ~120            | ~105        | ~140*   | ~150*   |

The latencies are measured on the client side.

| Configuration        | Description |
|----------------------|-------------|
| **Wall Clock**       | Current clock implementation. |
| **Clockbound Clock** | Proposed wall-clock-compatible clock implementation. |
| **EST_ERROR=0**      | When using now=earliest, global_limit=latest where reference clock is in interval [earliest, latest]. |
| **NTP/PHC**          | Same, but running the database in the US N. Virginia region, where PHC is available. |
| **PTP/PHC**          | Same, but using PTP for timestamps. |

*Higher latency is expected with PHC since the client is in Oregon and the database is running in N. Virginia.

### Other benchmarks
Developed a few realistic apps in yb-sample-apps.
1. SqlEventCounter
2. SqlBankTransfers
3. SqlWarehouseStock
4. SqlMessageQueue
5. SqlConsistentHashing

They all demonstrate a reduction of several orders of magnitude in read restart errors, reinforcing the value of using AWS Time Sync Service and clockbound.

### Failure Scenarios
1. When clockbound is not set up and the user configures time_source=clockbound, the database fails to start with an error in the tserver.err log.
```
F0826 17:47:53.453330 4432 hybrid_clock.cc:157] Couldn't get the current time: Clock unsynchronized. Status: IO error (yb/util/clockbound_time.cc:145): clockbound API failed with error: No such file or directory, and detail: open ...
```
2. When SELinux permissions are not set correctly for clockbound to access the chronyd socket, `systemctl status` shows an error.
```
Aug 26 17:55:57 ip-10-9-10-243.us-west-2.compute.internal clockbound[32122]: 2024-08-26T17:55:57.318518Z ERROR ThreadId(02) /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/clock-bound-d-1.0.0/src/chrony_poller.rs:73: No reply from chronyd. Is it running? Error: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
```

Backport-through: 2024.2
Reviewers: sergei, mbautin, pjain
Reviewed By: sergei, mbautin, pjain
Subscribers: svc_phabricator, mbautin, sergei, rthallam, smishra, yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D37365