[YSQL] Leverage AWS Clock Bound to reduce the number of read restarts. #21963
Labels
2.23.1_blocker
2024.2 Backport Required
area/ysql
Yugabyte SQL (YSQL)
kind/enhancement
This is an enhancement of an existing feature
priority/medium
Medium priority issue
Comments
pao214 added the area/ysql and status/awaiting-triage labels on Apr 13, 2024
yugabyte-ci added the kind/enhancement and priority/medium labels on Apr 13, 2024
pao214 changed the title from "[YSQL] Use AWS Time Sync Service to get better clock error bounds." to "[YSQL] Use AWS Clock Bound to reduce the number of read restarts." on Aug 15, 2024
pao214 changed the title from "[YSQL] Use AWS Clock Bound to reduce the number of read restarts." to "[YSQL] Leverage AWS Clock Bound to reduce the number of read restarts." on Aug 15, 2024
pao214 added a commit that referenced this issue on Oct 9, 2024

…t errors.

Summary:

### Motivation

Prior to this revision, the physical clock used a constant 500ms time window for the possible clock skew between any two nodes in the cluster. This bound is very conservative: because it is a constant, it has to account for worst-case scenarios. It leads to an excessive number of read restart errors, see https://docs.yugabyte.com/preview/architecture/transactions/read-restart-error/.

A better approach handles the clock error dynamically, which can be done by leveraging the AWS clockbound library. Since the clock error is several orders of magnitude lower than the conservative constant bound, far fewer read restart errors are raised. In fact, the read latency improves significantly for the SqlStaleReadDetector yb-sample-apps workload.

This revision improves clock precision. It also limits the impact of faulty clocks on the cluster, since only the nodes that are out of sync crash.

### About Clockbound

As mentioned above, we use the clockbound library to retrieve the uncertainty intervals for timestamps.

Clockbound works in a server-client architecture where a clock-bound-d daemon is registered as a systemd service. This daemon queries chronyd for timestamp-related information and publishes the clock accuracy information and clock synchronization status to shared memory.

The clockbound client then computes the current timestamp uncertainty interval based on the information in the shared memory.

NOTE: chronyd does not have sufficient information when using PTP. In such cases, clockbound augments the clock error with error information from special device files.

### Configuration

Configuring clockbound is a two-step process.

1. Configure the system to set up precise timestamps.
2. Configure the database to use these precise timestamps.

#### System Configuration

```
[PHC available] sudo bash ./bin/configure_ptp.sh
sudo bash ./bin/configure_clockbound.sh
```

#### Database Configuration

Set the tserver and master gFlag `time_source=clockbound`.

#### yugabyted Configuration

Autodetects AWS clusters and recommends configuring clockbound. Provides the `--enhance_time_sync_via_clockbound` flag in the `yugabyted start` command, which:

1. Prechecks for chrony and clockbound configuration.
2. Configures the database with time_source=clockbound.
3. Autodetects PTP and configures clockbound_clock_error_estimate to an appropriate value.

### Design

#### Clockbound Client

The clockbound client library is compiled and packaged in the third party library repo. This is a library written in Rust that is linked to tserver and accessed through its C interface.

#### Clockbound Clock

Uses the clockbound library to get the uncertainty intervals. See the comment on clockbound_clock.cc for more information.

#### Fault Tolerance

When clocks go out of sync, the node crashes and is, as a result, temporarily removed from the Raft groups it belongs to. This prevents stale read anomalies.

Crashing also prevents the node from killing other nodes in the cluster, since it no longer propagates extremely skewed timestamps.

#### Utilities

Includes the following additional utilities:

1. configure_ptp.sh
   - Installs a network driver compiled with PHC.
   - Configures chrony to use PHC as refclock.
2. configure_clockbound.sh
   - Sets up chrony to give accurate timestamp uncertainty intervals.
   - Sets up the clockbound agent.
   - Sets up permissions.
3. clockbound_dump
   - Dumps the result of the clockbound_now client side API.
   - Useful for computing clock error in external applications such as YBA.

Jira: DB-10879

Test Plan: Jenkins: urgent, compile only

### Quick Benchmark (Not statistically significant)

Ran the SqlStaleReadDetector workload for 5 minutes and measured the number of restart read requests and the read latency per operation. The workload:

1. Increments random counters in write threads.
2. Aggregates the counter values in the read thread.

| Measurement | WallClock | NtpClock | ClockboundClock | EST_ERROR=0 | NTP/PHC | PTP/PHC |
|--------------------------|------------|----------------|------------------|--------------|----------|-----|
| Restart Read Requests | ~5k | ~380 | ~70 | ~36 | ~5 | ~5 |
| Latency (ms/op) | ~430 | ~150 | ~120 | ~105 | ~140* | ~150* |

The latencies are measured on the client side.

| Column | Description |
|--------|-------------|
| **Wall Clock** | Current clock implementation. |
| **Clockbound Clock** | Proposed wall clock compatible clock implementation. |
| **EST_ERROR=0** | When using now=earliest, global_limit=latest where reference clock is in interval [earliest, latest]. |
| **NTP/PHC** | Same but when running the database in the US N. Virginia region where PHC is available. |
| **PTP/PHC** | Same but using PTP for timestamps. |

\*Higher latency is expected with PHC since the client is present in Oregon and the database is running in N. Virginia.

### Other benchmarks

Developed a few realistic apps in yb-sample-apps.

1. SqlEventCounter
2. SqlBankTransfers
3. SqlWarehouseStock
4. SqlMessageQueue
5. SqlConsistentHashing

They all demonstrate a reduction of several orders of magnitude in read restart errors, reinforcing the value of using AWS Time Sync Service and clockbound.

### Failure Scenarios

1. When clockbound is not set up and the user configures time_source=clockbound, the database fails to start with an error in the tserver.err log.

```
F0826 17:47:53.453330 4432 hybrid_clock.cc:157] Couldn't get the current time: Clock unsynchronized. Status: IO error (yb/util/clockbound_time.cc:145): clockbound API failed with error: No such file or directory, and detail: open ...
```

2. When selinux permissions are not set correctly for clockbound to access the chronyd socket, the systemctl status shows an error.

```
Aug 26 17:55:57 ip-10-9-10-243.us-west-2.compute.internal clockbound[32122]: 2024-08-26T17:55:57.318518Z ERROR ThreadId(02) /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/clock-bound-d-1.0.0/src/chrony_poller.rs:73: No reply from chronyd. Is it running? Error: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
```

Backport-through: 2024.2
Reviewers: sergei, mbautin, pjain
Reviewed By: sergei, mbautin, pjain
Subscribers: svc_phabricator, mbautin, sergei, rthallam, smishra, yql, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D37365
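For readers following along, here is a minimal Python sketch of the idea behind the Clockbound Clock, Fault Tolerance, and EST_ERROR=0 notes in the commit message above. It is not the code in clockbound_clock.cc. The `Interval` type, the function names, and the sample numbers are all hypothetical and chosen only for illustration. It assumes the clock source hands back an interval [earliest, latest] plus a synchronization status, and that the read window is taken as now=earliest, global_limit=latest.

```python
from dataclasses import dataclass

# The constant 500ms skew bound mentioned in the Motivation section.
FIXED_SKEW_US = 500_000


class ClockUnsynchronized(RuntimeError):
    """Raised when the (hypothetical) clock source reports it is out of sync."""


@dataclass(frozen=True)
class Interval:
    """Illustrative uncertainty interval [earliest, latest] in microseconds."""
    earliest_us: int
    latest_us: int
    synchronized: bool = True


def read_window(interval: Interval) -> tuple[int, int]:
    """Return (now, global_limit) using now=earliest, global_limit=latest."""
    if not interval.synchronized:
        # Stand-in for the Fault Tolerance behavior: refuse to hand out
        # timestamps instead of propagating a skewed clock. The real server
        # crashes; this sketch only raises.
        raise ClockUnsynchronized("clock unsynchronized")
    if interval.latest_us < interval.earliest_us:
        raise ValueError("latest must not precede earliest")
    return interval.earliest_us, interval.latest_us


def fixed_window(now_us: int) -> tuple[int, int]:
    """Return (now, global_limit) under the constant skew bound."""
    return now_us, now_us + FIXED_SKEW_US


def window_width(window: tuple[int, int]) -> int:
    """Width of the window in which other nodes' writes are ambiguous."""
    now_us, limit_us = window
    return limit_us - now_us


# Made-up numbers, not measurements: a 400us-wide uncertainty interval.
sample = Interval(earliest_us=1_000_000, latest_us=1_000_400)
print(window_width(read_window(sample)))      # 400
print(window_width(fixed_window(1_000_000)))  # 500000
```

The narrower the window, the fewer concurrent writes can land inside it, which is the intuition behind the lower restart counts in the benchmark table.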
pao214 added a commit that referenced this issue on Oct 9, 2024

…educe read restart errors.

Summary: Original commit: 28f27ee / D37365

Jira: DB-10879

Test Plan: Jenkins: urgent

Backport-through: 2024.2
Reviewers: sergei, mbautin, pjain
Reviewed By: pjain
Subscribers: ybase, yql, smishra, rthallam, sergei, mbautin, svc_phabricator
Differential Revision: https://phorge.dev.yugabyte.com/D38858
Landed on master (2.23.1) and 2024.2.
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 14, 2024
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 14, 2024
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 15, 2024
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 15, 2024
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 17, 2024
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 17, 2024
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 17, 2024
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 18, 2024
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 18, 2024
pao214 added a commit to pao214/yugabyte-db that referenced this issue on Oct 18, 2024
pao214 added a commit that referenced this issue on Oct 18, 2024

1. Changes to the manual deployment configuration, with additional details added to the clock sync setup.
2. Changes to the database configuration to apply after the system is set up with the clockbound systemd service.
3. Changes to the read restart error doc, with an additional recommendation to use the new clock.
pao214 added a commit that referenced this issue on Oct 22, 2024

Summary: _exported libs are not used anywhere. Remove them.

Jira: DB-10879

Test Plan: Jenkins

Reviewers: mbautin
Reviewed By: mbautin
Subscribers: ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38526
pao214 added a commit that referenced this issue on Oct 23, 2024

Summary: time_source does not have any secrets. Call home info on time_source is useful. Also, time_source is a non-runtime flag.

Jira: DB-10879

Test Plan: Jenkins

Backport-through: 2024.2
Reviewers: hsunder, smishra
Reviewed By: hsunder
Subscribers: ybase
Differential Revision: https://phorge.dev.yugabyte.com/D39031
pao214 added a commit that referenced this issue on Oct 25, 2024

Summary:

### Azure PHC Issue

Azure VMs have hardware clocks too. However, we haven't figured out how to use them yet. Currently, the clockbound configuration script fails with the following fatal error.

```
PHC is not available on eth0
```

**Fix:** Configure PTP only when the script runs on an AWS machine.

### Missing policycoreutils package

Install policycoreutils-devel explicitly.

### Yugabyted changes

clockbound can now be used on any cloud provider. So, alert users with a warning when using Azure or GCP as well.

Jira: DB-10879

Test Plan: Jenkins: compile only

Ran

```
sudo bash ./bin/configure_clockbound.sh
```

on AWS, Azure, and GCP

Reviewers: nikhil, sanketh
Reviewed By: sanketh
Differential Revision: https://phorge.dev.yugabyte.com/D39224
pao214 added a commit that referenced this issue on Oct 30, 2024

Summary: Original commit: d5c096f / D39031

time_source does not have any secrets. Call home info on time_source is useful. Also, time_source is a non-runtime flag.

Jira: DB-10879

Test Plan: Jenkins: compile only

Backport-through: 2024.2
Reviewers: hsunder, smishra
Reviewed By: hsunder
Subscribers: ybase
Differential Revision: https://phorge.dev.yugabyte.com/D39361
pao214 added a commit that referenced this issue on Dec 4, 2024

…viders

Summary: Original commit: 689117b / D39224

### Azure PHC Issue

Azure VMs have hardware clocks too. However, we haven't figured out how to use them yet. Currently, the clockbound configuration script fails with the following fatal error.

```
PHC is not available on eth0
```

**Fix:** Configure PTP only when the script runs on an AWS machine.

### Missing policycoreutils package

Install policycoreutils-devel explicitly.

Jira: DB-10879

Test Plan: Jenkins: urgent, compile only

Ran

```
sudo bash ./bin/configure_clockbound.sh
```

on AWS, Azure, and GCP

Reviewers: nikhil, sanketh
Reviewed By: sanketh
Differential Revision: https://phorge.dev.yugabyte.com/D39464
Jira Link: DB-10879
Description
Motivation
Using timestamps to decide the order between events in a distributed system is tricky because there is an inherent clock skew between machines. YB uses a very conservative constant (500ms) as the bound on this clock skew. With a bound that large, it is infeasible to wait out the clock skew to resolve event ordering issues.
Proposal
AWS provides a really tight error bound on clocks, see https://aws.amazon.com/blogs/compute/its-about-time-microsecond-accurate-clocks-on-amazon-ec2-instances/.
Roadmap [TBD]
max_clock_skew_usec bound may be violated #23279
Impact on hybrid time
NtpClock picks the earliest time in the uncertainty interval as the physical clock.
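To make the impact concrete, the sketch below counts how many records would force a read restart under two different limits. It is a toy model, not YugabyteDB code. It assumes the rule from the read restart error doc: a read restarts when it finds a record whose commit time falls after the read time but at or before the global limit. The function names and all timestamps are hypothetical, and the 200us interval width is a made-up example rather than a measured value.

```python
FIXED_SKEW_US = 500_000  # constant 500ms bound used by the existing clock


def needs_restart(read_time_us: int, global_limit_us: int, commit_time_us: int) -> bool:
    """True when the record's commit time is ambiguous relative to the read."""
    return read_time_us < commit_time_us <= global_limit_us


def count_restarts(read_time_us: int, global_limit_us: int, commit_times_us: list[int]) -> int:
    """Number of records that fall inside the ambiguity window."""
    return sum(
        needs_restart(read_time_us, global_limit_us, commit_time_us)
        for commit_time_us in commit_times_us
    )


# Read time taken as the earliest bound of the uncertainty interval.
earliest_us = 1_000_000
latest_us = earliest_us + 200  # illustrative interval width only

# Commit times of concurrent writes as stamped by other nodes (made up).
commits_us = [1_000_050, 1_000_300, 1_020_000, 1_400_000, 1_600_000]

print(count_restarts(earliest_us, earliest_us + FIXED_SKEW_US, commits_us))  # 4
print(count_restarts(earliest_us, latest_us, commits_us))                    # 1
```

With the constant bound, four of the five writes are ambiguous. With the limit set to the latest bound of the interval, only the write inside the 200us interval is.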
Properties of new hybrid time
Issue Type
kind/enhancement