From 28f27ee22bdff7e1460301a634eb786eff512256 Mon Sep 17 00:00:00 2001 From: Bvsk Patnaik Date: Fri, 16 Aug 2024 21:43:18 +0000 Subject: [PATCH] [#21963] YSQL: Leverage aws-clock-bound library to reduce read restart errors. Summary: ### Motivation Prior to this revision, the physical clock uses a constant 500ms time window for the possible clock skew between any two nodes in the cluster. The skew is very conservative since it is a constant and we need to account for the worst case scenarios. This leads to an excessive number of read restart errors, see https://docs.yugabyte.com/preview/architecture/transactions/read-restart-error/. A better approach handles the clock error dynamically. This can be done by leveraging the AWS clockbound library. Since, the clock error is several orders of magnitude lower than the conservative constant bound, we raise much fewer read restart errors. In fact, the read latency improves significantly for the SQLStaleReadDetector yb-sample-apps workload. This revision improves clock precision. It also limits the impact of faulty clocks on the cluster since only those nodes that are out of sync crash. ### About Clockbound As mentioned above, we use the clockbound library to retrieve the uncertainty intervals for timestamps. Clockbound works in a server-client architecture where a clock-bound-d daemon is registered as a systemd service. This daemon requests chronyd for timestamp related information and publishes the clock accuracy information and clock synchronization status to shared memory. The clockbound client then computes the current timestamp uncertainty interval based on the information in the shared memory. NOTE: chronyd does not have sufficient information when using PTP. In such cases, clockbound augments clock error with error information from special device files. ### Configuration Configuring clockbound is a two-step process. 1. Configure the system to setup precise timestamps. 2. Configure the database to use these precise timestamps. #### System Configuration ``` [PHC available] sudo bash ./bin/configure_ptp.sh sudo bash ./bin/configure_clockbound.sh ``` #### Database Configuration Set tserver and master gFlag `time_source=clockbound`. #### yugabyted Configuration Autodetects AWS clusters and recommends configuring clockbound. Provides `--enhance_time_sync_via_clockbound` flag in `yugabyetd start` command. 1. Prechecks for chrony and clockbound configuration. 2. Configures the database with time_source=clockbound. 3. Autodetects PTP and configures clockbound_clock_error_estimate to an appropriate value. ### Design #### Clockbound Client The clockbound client library is compiled and packaged in the third party library repo. This is a library written in Rust that is linked to tserver and accessed through its C interface. #### Clockbound Clock Uses the clockbound library to get the uncertainty intervals. See the comment on clockbound_clock.cc for more information. #### Fault Tolerance Crash and, as a result, temporarily remove the node from Raft groups it is in when clocks go out of sync. This will prevent stale read anomalies. Crashing also prevents the node from killing other nodes in the cluster since it no longer propagates extremely skewed timestamps. #### Utilities Includes the following additional utilities 1. configure_ptp.sh - Installs network driver compiled with PHC. - Configures chrony to use PHC as refclock. 2. configure_clockbound.sh - Setup chrony to give accurate timestamp uncertainty intervals. - Setup clockbound agent. - Setup permissions. 3. clockbound_dump - Dumps the result of clockbound_now client side API. - Useful for computing clock error in external applications such as YBA. Jira: DB-10879 Test Plan: Jenkins: urgent, compile only ### Quick Benchmark (Not statistically significant) Ran the SqlStaleReadDetector workload that 1. Increments random counters in write threads. 2. Aggregates the counter values in the read thread. for 5mins and measures the number of restart read requests and the read latency per operation. | Measurement | WallClock | NtpClock | ClockboundClock | EST_ERROR=0 | NTP/PHC | PTP/PHC | |--------------------------|------------|----------------|------------------|--------------|----------|-----| | Restart Read Requests | ~5k | ~380 | ~70 | ~36 | ~5 | ~5 | | Latency (ms/op) | ~430 | ~150 | ~120 | ~105 | ~140* | ~150* | The latencies are measured on the client side. | **Wall Clock** | Current clock implementation. | | **Clockbound Clock** | Proposed wall clock compatible clock implementation. | | **EST_ERROR=0** | When using now=earliest, global_limit=latest where reference clock is in interval [earliest, latest]. | | **NTP/PHC** | Same but when running the database in the US N Virginia region where PHC is available. | | **PTP/PHC** | Same but using PTP for timestamps. | *Higher latency is expected with PHC since the client is present in Oregon and the database is running in N. Virginia. ### Other benchmarks Developed a few realistic apps in yb-sample-apps. 1. SqlEventCounter 2. SqlBankTransfers 3. SqlWarehouseStock 4. SqlMessageQueue 5. SqlConsistentHashing They all demonstrate a reduction of several orders of magnitude in read restart errors, reinforcing the value of using AWS Time Sync Service and clockbound. ### Failure Scenarios 1. When clockbound is not setup and user configures time_source=clockbound, The database fails to start with an error in tserver.err log. ``` F0826 17:47:53.453330 4432 hybrid_clock.cc:157] Couldn't get the current time: Clock unsynchronized. Status: IO error (yb/util/clockbound_time.cc:145): clockbound API failed with error: No such file or directory, and detail: open ... ``` 2. When selinux permissions are not set correctly for clockbound to access chronyd socket, The systemctl status shows an error ``` Aug 26 17:55:57 ip-10-9-10-243.us-west-2.compute.internal clockbound[32122]: 2024-08-26T17:55:57.318518Z ERROR ThreadId(02) /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/clock-bound-d-1.0.0/src/chrony_poller.rs:73: No reply from chronyd. Is it running? Error: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" } ``` Backport-through: 2024.2 Reviewers: sergei, mbautin, pjain Reviewed By: sergei, mbautin, pjain Subscribers: svc_phabricator, mbautin, sergei, rthallam, smishra, yql, ybase Differential Revision: https://phorge.dev.yugabyte.com/D37365 --- CMakeLists.txt | 7 +- bin/configure_clockbound.sh | 412 +++++++++++++++ bin/configure_ptp.sh | 168 ++++++ bin/yugabyted | 110 +++- cmake_modules/FindClockbound.cmake | 38 ++ cmake_modules/YugabyteFindThirdParty.cmake | 18 +- src/yb/server/CMakeLists.txt | 2 + src/yb/server/clockbound_clock.cc | 484 ++++++++++++++++++ src/yb/server/clockbound_clock.h | 24 + src/yb/server/hybrid_clock.cc | 14 +- src/yb/tools/CMakeLists.txt | 4 + src/yb/tools/clockbound_dump.cc | 153 ++++++ src/yb/tserver/server_main_util.cc | 2 + src/yb/util/monotime.h | 6 + src/yb/util/physical_time.cc | 6 + src/yb/util/physical_time.h | 6 +- src/yb/yql/pggate/ybc_pggate.cc | 2 + src/yb/yql/pgwrapper/CMakeLists.txt | 1 + src/yb/yql/pgwrapper/clockbound_clock-test.cc | 198 +++++++ yb_release_manifest.json | 3 + .../cmd/server/handlers/api_cluster_info.go | 3 + .../clusters/details/alerts/alerts.ts | 5 + 22 files changed, 1653 insertions(+), 13 deletions(-) create mode 100644 bin/configure_clockbound.sh create mode 100644 bin/configure_ptp.sh create mode 100644 cmake_modules/FindClockbound.cmake create mode 100644 src/yb/server/clockbound_clock.cc create mode 100644 src/yb/server/clockbound_clock.h create mode 100644 src/yb/tools/clockbound_dump.cc create mode 100644 src/yb/yql/pgwrapper/clockbound_clock-test.cc diff --git a/CMakeLists.txt b/CMakeLists.txt index 78548cefc6ee..acb85cf4a97d 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -627,7 +627,7 @@ endif() function(ADD_THIRDPARTY_LIB LIB_NAME) set(options) set(one_value_args SHARED_LIB STATIC_LIB) - set(multi_value_args DEPS) + set(multi_value_args DEPS INCLUDE_DIRS) cmake_parse_arguments(ARG "${options}" "${one_value_args}" "${multi_value_args}" ${ARGN}) if(ARG_UNPARSED_ARGUMENTS) message(SEND_ERROR "Error: unrecognized arguments: ${ARG_UNPARSED_ARGUMENTS}") @@ -653,6 +653,11 @@ function(ADD_THIRDPARTY_LIB LIB_NAME) PROPERTIES IMPORTED_LINK_INTERFACE_LIBRARIES "${ARG_DEPS}") endif() + if (ARG_INCLUDE_DIRS) + target_include_directories(${LIB_NAME} + SYSTEM INTERFACE "${ARG_INCLUDE_DIRS}") + endif() + # Set up an "exported variant" for this thirdparty library (see "Visibility" # above). It's the same as the real target, just with an "_exported" suffix. # We prefer the static archive if it exists (as it's akin to an "internal" diff --git a/bin/configure_clockbound.sh b/bin/configure_clockbound.sh new file mode 100644 index 000000000000..d0a7106586f2 --- /dev/null +++ b/bin/configure_clockbound.sh @@ -0,0 +1,412 @@ +#!/usr/bin/env bash + +# Copyright (c) YugabyteDB, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +# in compliance with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software distributed under the License +# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +# or implied. See the License for the specific language governing permissions and limitations +# under the License. +# + +# This script configures chronyd and clockbound for AWS EC2 instances. +# Supported operating systems: AlmaLinux, Red Hat, and Ubuntu. +# Requires internet access and root privileges during configuration. +# No internet or root access necesary for validation. + +# shellcheck disable=SC2086 + +# Exit on errors. +set -euo pipefail + +# Default values +VERBOSE=0 +VALIDATE_ONLY=0 +OS_RELEASE="" +CHRONY_USER="" + +# Parse arguments +while [[ $# -gt 0 ]]; do + case "$1" in + --verbose) + VERBOSE=1 + ;; + --validate) + VALIDATE_ONLY=1 + ;; + *) + echo "Usage: $0 [--verbose] [--validate]" + exit 1 + ;; + esac + shift +done + +# Log functions. +log_empty_line() { + echo >&2 +} + +log_warn() { + local _log_level="warn" + log "$@" +} + +log_error() { + local _log_level="error" + log "$@" +} + +get_timestamp() { + date +%Y-%m-%d_%H_%M_%S +} + +# This just logs to stderr. +log() { + BEGIN_COLOR='\033[0;32m' + END_COLOR='\033[0m' + + case ${_log_level:-info} in + error) + BEGIN_COLOR='\033[0;31m' + shift + ;; + warn) + BEGIN_COLOR='\033[0;33m' + shift + ;; + esac + echo -e "${BEGIN_COLOR}[$( get_timestamp ) ${BASH_SOURCE[1]##*/}:${BASH_LINENO[0]} \ + ${FUNCNAME[1]}]${END_COLOR}" "$@" >&2 + +} + +fatal() { + log "$@" + exit 1 +} + +# Prints messages only in verbose mode. +log_verbose() { + if [[ "${VERBOSE}" -eq 1 ]]; then + log "$@" + fi +} + +# Detects OS and stores in OS_RELEASE variable. +detect_os() { + ETC_RELEASE_FILE=/etc/os-release + if [[ -f "${ETC_RELEASE_FILE}" ]]; then + OS_RELEASE=$(grep '^ID=' "${ETC_RELEASE_FILE}" | cut -d '=' -f 2 | tr -d '"') + else + fatal "${ETC_RELEASE_FILE} does not exist." + fi +} + +# Checks script arguments and user permissions. +prechecks() { + # Enable verbose mode if requested + if [[ "${VERBOSE}" -eq 1 ]]; then + set -x + fi + + # Ensure that the OS is supported. + case "${OS_RELEASE}" in + almalinux|centos|rhel|ubuntu) + log_verbose "Detected OS: ${OS_RELEASE}" + ;; + *) + fatal "Unsupported operating system: ${OS_RELEASE}" + ;; + esac + + # Check if the script is run as root when --validate is NOT specified. + if [[ "${VALIDATE_ONLY}" -ne 1 ]] && [[ "$(id -u)" -ne 0 ]]; then + fatal "Configuration requires root access. Run the script with sudo?" + fi +} + +# Restarts a systemd service, whose name is passed as first argument. +# Assumes that the service is already configured. +restart_service() { + local service_name="$1" + + log_verbose "Reloading systemd daemon and enabling ${service_name} service..." + + if ! systemctl daemon-reload; then + fatal "Failed to reload systemd." + fi + + if ! systemctl is-enabled --quiet "${service_name}"; then + if ! systemctl enable "${service_name}"; then + fatal "Failed to enable ${service_name}. Please check the service status for details." + fi + fi + + if ! systemctl start "${service_name}"; then + fatal "Failed to start ${service_name}. Please check the service status for details." + fi + + if ! systemctl is-active --quiet "${service_name}"; then + fatal "${service_name} failed to start. Please check the service status for details." + fi + + log_verbose "${service_name} service has been configured and started." +} + +validate_chrony_config() { + # Check if chrony.conf exists in either location + if [[ -f /etc/chrony.conf ]]; then + CHRONY_CONF="/etc/chrony.conf" + elif [[ -f /etc/chrony/chrony.conf ]]; then + CHRONY_CONF="/etc/chrony/chrony.conf" + else + fatal "chrony.conf not found." + fi + + if ! grep -q '^maxclockerror 50' "${CHRONY_CONF}"; then + fatal "${CHRONY_CONF} does not have 'maxclockerror 50' set." + fi + + if ! systemctl is-active --quiet chronyd; then + fatal "chronyd service is not running." + fi +} + +# Configures and restarts chronyd. +# Assumes chronyd is installed and running. +configure_chrony() { + # Check if chrony.conf exists in either location + if [[ -f /etc/chrony.conf ]]; then + CHRONY_CONF="/etc/chrony.conf" + elif [[ -f /etc/chrony/chrony.conf ]]; then + CHRONY_CONF="/etc/chrony/chrony.conf" + else + fatal "chrony.conf does not exist in /etc/chrony.conf or /etc/chrony/chrony.conf." + fi + + if ! grep -q '^maxclockerror' "${CHRONY_CONF}"; then + echo "maxclockerror 50" >> "${CHRONY_CONF}" + log_verbose "maxclockerror 50 has been added to /etc/chrony.conf." + log_verbose "Restarting chronyd service to apply configuration changes..." + fi + + if ! grep -q '^maxclockerror 50' "${CHRONY_CONF}"; then + fatal "${CHRONY_CONF} has 'maxclockerror' set to a value other than 50." + fi + + restart_service "chronyd" +} + +validate_clockbound_config() { + if ! systemctl is-active --quiet clockbound; then + fatal "clockbound is not enabled." + fi + + if ! pgrep -f 'clockbound --max-drift-rate 50' > /dev/null; then + fatal "clockbound is not configured with --max-drift-rate 50." + fi +} + +# Installs a C linker. +# Rust does not install the linker on its own. +install_C_linker() { + log_verbose "Installing C linker on ${OS_RELEASE}..." + case $OS_RELEASE in + almalinux|centos|rhel) + dnf update -y + dnf install -y gcc + ;; + ubuntu) + apt-get update -y + apt-get install -y gcc + ;; + *) + echo "Unsupported operating system ${OS_RELEASE}." + exit 1 + ;; + esac +} + +# Installs rust +install_rust() { + # Check if Rust is installed + if ! command -v rustc >/dev/null 2>&1; then + log_verbose "Rust is not installed. Installing Rust..." + + # Install Rust using rustup + install_C_linker + curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y + + # Add Rust to the current shell session's PATH + # shellcheck disable=SC1090,SC1091 + source "$HOME/.cargo/env" + + log_verbose "Rust installation complete." + else + # Add Rust to the current shell session's PATH + # shellcheck disable=SC1090,SC1091 + source "$HOME/.cargo/env" + + log_verbose "Rust is already installed." + fi +} + +ensure_selinux_perms() { + # Explicit check to ensure that the selinux type is not unexpected. + # shellcheck disable=SC2012 + selinux_type=$(ls -Z "/usr/local/bin/clockbound" | awk '{print $4}') + if [[ "$selinux_type" == *"user_home_t"* ]]; then + fatal "Incorrect selinux type for clockbound." + fi + + cat < chrony_uds_access.te +module chrony_uds_access 1.0; + +require { + type bin_t; + type chronyd_t; + type unconfined_service_t; + class unix_dgram_socket { sendto }; +} + +allow bin_t chronyd_t:unix_dgram_socket sendto; +allow chronyd_t unconfined_service_t:unix_dgram_socket sendto; +EOF + checkmodule -M -m -o chrony_uds_access.mod chrony_uds_access.te + semodule_package -o chrony_uds_access.pp -m chrony_uds_access.mod + semodule -i chrony_uds_access.pp +} + +install_clockbound() { + # Check if clockbound is installed + if [[ ! -f "/usr/local/bin/clockbound" ]]; then + log_verbose "clockbound is not installed. Installing clockbound..." + + install_rust + cargo install clock-bound-d + # Always copy the file instead of moving the file. + # Moving the file will retain the selinux type, usually that + # of the home directory. This will prevent systemd from starting + # the service because of incorrect selinux type. + cp "${HOME}/.cargo/bin/clockbound" /usr/local/bin/ + chown "${CHRONY_USER}:${CHRONY_USER}" /usr/local/bin/clockbound + case "${OS_RELEASE}" in + almalinux|centos|rhel) + ensure_selinux_perms + ;; + ubuntu) + ;; + esac + + log_verbose "clockbound installation complete." + else + log_verbose "clockbound is already installed." + fi +} + +configure_clockbound() { + if ! systemctl is-active --quiet clockbound; then + # Configure and start clockbound + log_verbose "clockbound is not enabled. Configuring and starting clockbound service..." + + # Check systemd version + SYSTEMD_VERSION=$(systemctl --version | head -n 1 | awk '{print $2}') + + if id "chrony" &>/dev/null; then + CHRONY_USER="chrony" + elif id "_chrony" &>/dev/null; then + CHRONY_USER="_chrony" + else + fatal "Neither 'chrony' nor '_chrony' user exists. Exiting." + fi + + # Pick ETH_DEVICE as the first non-loopback device. + for iface in /sys/class/net/*; do + iface=$(basename "$iface") + if [[ "${iface}" != "lo" ]]; then + ETH_DEVICE="${iface}" + break + fi + done + + EXTRA_ARGS="" + if chronyc sources | grep "#.\s*PHC" > /dev/null 2>&1; then + # Check if PHC is available on ETH_DEVICE. + if ethtool -T "${ETH_DEVICE}" | grep -q "PTP Hardware Clock: none"; then + fatal "PHC is not available on ${ETH_DEVICE}." + fi + + # Check whether a PHC source is selected. + if ! chronyc sources | grep "#\*\s*PHC" > /dev/null 2>&1; then + fatal "PHC source is not selected as the clock soruce." + fi + + PHC_ID=$(chronyc sources | grep "#\*\s*PHC" | awk '{print $2}') + EXTRA_ARGS="-r ${PHC_ID} -i ${ETH_DEVICE}" + fi + + # Create the clockbound service file based on systemd version + if [[ "${SYSTEMD_VERSION}" -ge 235 ]]; then + cat < /usr/lib/systemd/system/clockbound.service +[Unit] +Description=ClockBound + +[Service] +Type=simple +Restart=always +RestartSec=10 +ExecStart=/usr/local/bin/clockbound --max-drift-rate 50 ${EXTRA_ARGS} +RuntimeDirectory=clockbound +RuntimeDirectoryPreserve=yes +WorkingDirectory=/run/clockbound +User=${CHRONY_USER} +Group=${CHRONY_USER} + +[Install] +WantedBy=multi-user.target +EOF + else + cat < /usr/lib/systemd/system/clockbound.service +[Unit] +Description=ClockBound + +[Service] +Type=simple +Restart=always +RestartSec=10 +PermissionsStartOnly=true +ExecStartPre=/bin/mkdir -p /run/clockbound +ExecStartPre=/bin/chmod 775 /run/clockbound +ExecStartPre=/bin/chown ${CHRONY_USER}:${CHRONY_USER} /run/clockbound +ExecStartPre=/bin/cd /run/clockbound +ExecStart=/usr/local/bin/clockbound --max-drift-rate 50 ${EXTRA_ARGS} +User=${CHRONY_USER} +Group=${CHRONY_USER} + +[Install] +WantedBy=multi-user.target +EOF + fi + fi + + restart_service "clockbound" +} + +detect_os +prechecks + +if [[ "${VALIDATE_ONLY}" -ne 1 ]]; then + configure_chrony +fi +validate_chrony_config + +if [[ "${VALIDATE_ONLY}" -ne 1 ]]; then + install_clockbound + configure_clockbound +fi +validate_clockbound_config diff --git a/bin/configure_ptp.sh b/bin/configure_ptp.sh new file mode 100644 index 000000000000..4cda96636d81 --- /dev/null +++ b/bin/configure_ptp.sh @@ -0,0 +1,168 @@ +#!/usr/bin/env bash + +# Copyright (c) YugabyteDB, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +# in compliance with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software distributed under the License +# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +# or implied. See the License for the specific language governing permissions and limitations +# under the License. +# + +# Compile network driver ena with PTP and install it. + +PTP_DEVICE="" +OS_RELEASE="" + +set -euo pipefail + +# Log functions. +get_timestamp() { + date +%Y-%m-%d_%H_%M_%S +} + +# This just logs to stderr. +log() { + BEGIN_COLOR='\033[0;32m' + END_COLOR='\033[0m' + + case ${_log_level:-info} in + error) + BEGIN_COLOR='\033[0;31m' + shift + ;; + warn) + BEGIN_COLOR='\033[0;33m' + shift + ;; + esac + echo -e "${BEGIN_COLOR}[$( get_timestamp ) ${BASH_SOURCE[1]##*/}:${BASH_LINENO[0]} \ + ${FUNCNAME[1]}]${END_COLOR}" "$@" >&2 + +} + +fatal() { + log "$@" + exit 1 +} + +detect_os() { + ETC_RELEASE_FILE=/etc/os-release + if [[ -f "${ETC_RELEASE_FILE}" ]]; then + OS_RELEASE=$(grep '^ID=' "${ETC_RELEASE_FILE}" | cut -d '=' -f 2 | tr -d '"') + else + fatal "${ETC_RELEASE_FILE} does not exist." + fi +} + +prechecks() { + if [[ "$(id -u)" -ne 0 ]]; then + fatal "Configuration requires root access. Run the script with sudo?" + fi + + # Ensure that the OS is supported. + case "${OS_RELEASE}" in + almalinux|centos|rhel|ubuntu) + log "Detected OS: ${OS_RELEASE}" + ;; + *) + fatal "Unsupported operating system: ${OS_RELEASE}" + ;; + esac + + if ! grep -w '^CONFIG_PTP_1588_CLOCK=[y]' "/boot/config-$(uname -r)" \ + > /dev/null 2>&1; then + fatal "CONFIG_PTP_1588_CLOCK is not enabled in the kernel." + fi +} + +install_prereqs() { + case "${OS_RELEASE}" in + almalinux|centos|rhel) + dnf update -y + dnf install "kernel-devel-$(uname -r)" git -y + ;; + ubuntu) + apt update -y + apt install "linux-headers-$(uname -r)" -y + apt install make gcc -y + ;; + esac +} + +compile_ena() { + git clone https://github.com/amzn/amzn-drivers.git + cd amzn-drivers + cd kernel/linux/ena + ENA_PHC_INCLUDE=1 make +} + +reload_ena() { + rmmod ena + insmod ena.ko phc_enable=1 + PTP_DEVICE=$(find /sys/class/ptp -mindepth 1 -maxdepth 1 | \ + head -n 1 | xargs basename) +} + +configure_chrony() { + # Check if chrony.conf exists in either location + if [[ -f /etc/chrony.conf ]]; then + CHRONY_CONF="/etc/chrony.conf" + elif [[ -f /etc/chrony/chrony.conf ]]; then + CHRONY_CONF="/etc/chrony/chrony.conf" + else + fatal "chrony.conf not found." + fi + + if ! grep -q "^refclock PHC /dev/${PTP_DEVICE} poll 0 delay 0.000010 prefer" \ + "${CHRONY_CONF}"; then + echo "refclock PHC /dev/${PTP_DEVICE} poll 0 delay 0.000010 prefer" \ + >> "${CHRONY_CONF}" + fi + + if ! systemctl daemon-reload; then + fatal "Failed to reload systemd." + fi + + if ! systemctl restart chronyd; then + fatal "Failed to restart chronyd." + fi +} + +validate() { + if ! ls /dev/"${PTP_DEVICE}" > /dev/null 2>&1; then + fatal "/dev/${PTP_DEVICE} does not exist." + fi + + PTP_ID=${PTP_DEVICE##ptp} + for iface in /sys/class/net/*; do + iface=$(basename "$iface") + if [[ "${iface}" != "lo" ]]; then + ETH_DEVICE="${iface}" + break + fi + done + if ! ethtool -T "${ETH_DEVICE}" | grep -q "PTP Hardware Clock: ${PTP_ID}"; then + fatal "${ETH_DEVICE} does not support /dev/${PTP_DEVICE}." + fi + + if ! systemctl is-active --quiet chronyd; then + fatal "chronyd service is not running." + fi + + if ! chronyc sources | grep -q '#.\s*PHC' > /dev/null 2>&1; then + fatal "PHC source not configured in chrony." + fi +} + +detect_os +prechecks +install_prereqs +compile_ena +reload_ena +configure_chrony +validate diff --git a/bin/yugabyted b/bin/yugabyted index 0ccec4904922..b2087cd4ab98 100755 --- a/bin/yugabyted +++ b/bin/yugabyted @@ -124,6 +124,9 @@ PREREQS_ERROR_MSGS = { ' please free the port and restart the node.', 'ycql_metric_port': 'YCQL metrics port {} is already in use. For accessing the YCQL metrics,' \ ' please free the port and restart the node.', + 'clockbound': 'Clockbound is recommended on AWS clusters. It can reduce read restart errors' \ + ' significantly in concurrent workloads.' \ + ' Relevant flag: --enhance_time_sync_via_clockbound.', } QUICK_START_LINKS = { 'mac' : 'https://docs.yugabyte.com/preview/quick-start/', @@ -219,7 +222,9 @@ EXAMPLE = { "that are part of the same cluster:\n" + "yugabyted start --join=host:port,[host:port]\n\n" + "# Create a secure cluster:\n" + - "yugabyted start --secure --certs_dir=\n\n", + "yugabyted start --secure --certs_dir=\n\n" + + "# Use precise clocks:\n" + + "yugabyted start --enhance_time_sync_via_clockbound\n\n", 'stop' : "", 'destroy' : "", 'status' : "", @@ -661,6 +666,31 @@ def get_cli_title(): cli_title += div_line return cli_title +def has_aws_time_sync_service(): + try: + # Run the chronyc sources command and capture the output + result = subprocess.run(['chronyc', 'sources'], capture_output=True, text=True, timeout=1) + + # Check if 169.254.169.123 is in the output + if result.returncode == 0 and '169.254.169.123' in result.stdout: + return True + except (subprocess.TimeoutExpired, FileNotFoundError): + return False + + return False + +def is_phc_configured(): + try: + # Run the chronyc sources command and capture the output + result = subprocess.run(['systemctl', 'status', 'clockbound'], + capture_output=True, text=True, timeout=1) + + # Check if PHC is in the output + if result.returncode == 0 and 'PHC' in result.stdout: + return True + except (subprocess.TimeoutExpired, FileNotFoundError): + return False + class ControlScript(object): def __init__(self): self.configs = None @@ -690,6 +720,9 @@ class ControlScript(object): atexit.register(self.kill_children) Output.script_exit_func = self.kill_children + if self.configs.temp_data.get("enhance_time_sync_via_clockbound"): + self.assert_system_configured_for_clockbound() + if self.configs.saved_data.get("read_replica"): self.start_rr_process() else: @@ -2761,6 +2794,12 @@ class ControlScript(object): prereqs_warn.add('ntp/chrony') prereqs_warn_flag = True + # Configuring clockbound is strongly recommended for AWS clusters. + if has_aws_time_sync_service() and not self.configs.temp_data[ + "enhance_time_sync_via_clockbound"]: + prereqs_warn.add('clockbound') + prereqs_warn_flag = True + (failed_ports, warning_ports, mandatory_port_available, recommended_port_available) = self.check_ports() @@ -3559,6 +3598,45 @@ class ControlScript(object): return common_gflags + # Returns value of flag_name in flags. + # Returns None is none exists. + def get_flag_value(self, flags, flag_name): + for flag in flags.split(','): + if flag.startswith(flag_name): + return flag.split("=")[1] + return None + + # Requires that flag_name not already be in flags. + # Returns flag_name=flag_value appended to flags. + def append_flag(self, flags, flag_name, flag_value): + if flags: + flags += ',' + return flags + f"{flag_name}={flag_value}" + + def config_time_source_clockbound(self, flags): + # Configure tserver flag time_source=clockbound + # when --enhance_time_sync_via_clockbound is set. + if self.configs.temp_data["enhance_time_sync_via_clockbound"]: + # Check database configuration. + time_source = self.get_flag_value(flags, "time_source") + if time_source and time_source != "clockbound": + raise ValueError( + "Cannot configure time_source with" + " --enhance_time_sync_via_clockbound.") + + # Configure time_source=clockbound if not already. + if not time_source: + flags = self.append_flag(flags, "time_source", "clockbound") + + # 100us clock error is a good estimate for PTP configurations. + if is_phc_configured(): + clock_error_estimate = self.get_flag_value( + flags, "clockbound_clock_error_estimate_usec") + if not clock_error_estimate: + flags = self.append_flag(flags, + "clockbound_clock_error_estimate_usec", 100) + + return flags def get_master_cmd(self, common_flags): advertise_ip = self.advertise_ip() @@ -3612,6 +3690,9 @@ class ControlScript(object): # Update the master_flags in self.configs.saved_data with pg_parity flags self.configs.saved_data["master_flags"] = master_flags + master_flags = self.config_time_source_clockbound(master_flags) + self.configs.saved_data["master_flags"] = master_flags + if master_flags: # This will split the simple and complex flags # Now we can parse and append the master flags @@ -3767,6 +3848,9 @@ class ControlScript(object): # Update the tserver_flags in self.configs.saved_data with pg_parity flags self.configs.saved_data["tserver_flags"] = tserver_flags + tserver_flags = self.config_time_source_clockbound(tserver_flags) + self.configs.saved_data["tserver_flags"] = tserver_flags + if tserver_flags: # This will split the simple and complex flags # Now we can parse and append the tserver flags @@ -3997,6 +4081,23 @@ class ControlScript(object): # Sets YW metrics to use local database. os.environ["USE_NATIVE_METRICS"] = "true" + def assert_system_configured_for_clockbound(self): + Output.init_animation("Validating system config for clockbound...") + configure_clockbound_path = find_binary_location("configure_clockbound.sh") + cmd = ["bash", configure_clockbound_path, "--validate"] + try: + subprocess.check_call(cmd) + Output.update_animation("Clockbound configured successfully.") + except subprocess.CalledProcessError as e: + exit_code = e.returncode + Output.update_animation("Failed to validate clockbound configuration.", + status=Output.ANIMATION_FAIL) + Output.log_error_and_exit( + Output.make_red("ERROR") + f": Exit code: {exit_code}." + " Did you run configure_clockbound.sh script?" + ) + + # Runs post_install script for linux computers. def post_install_yb(self): if not sys.platform.startswith('linux'): @@ -7658,6 +7759,8 @@ class ControlScript(object): self.configs.saved_data["ui"] = args.ui self.configs.saved_data["backup_daemon"] = args.backup_daemon self.configs.temp_data["initial_scripts_dir"] = args.initial_scripts_dir + self.configs.temp_data[ + "enhance_time_sync_via_clockbound"] = args.enhance_time_sync_via_clockbound if has_errors: sys.exit(1) @@ -8212,6 +8315,10 @@ class ControlScript(object): cur_parser.add_argument( "--enable_pg_parity_early_access", help="Enable PostgreSQL compatibility features." " Default value is False.", action="store_true", default=False) + cur_parser.add_argument( + "--enhance_time_sync_via_clockbound", + help="Enable clock bound for the node. Default value is False.", + action="store_true", default=False) # Hidden commands for development/advanced users cur_parser.add_argument( @@ -8413,6 +8520,7 @@ class Configs(object): "xcluster_databases": "", "xcluster_target_addresses": "", "xcluster_bootstrap_done": "", + "enhance_time_sync_via_clockbound": False, } self.config_file = config_file diff --git a/cmake_modules/FindClockbound.cmake b/cmake_modules/FindClockbound.cmake new file mode 100644 index 000000000000..9db3cb381d9d --- /dev/null +++ b/cmake_modules/FindClockbound.cmake @@ -0,0 +1,38 @@ +# - Find clockbound (clockbound.h, libclockbound.a) +# This module defines +# CLOCKBOUND_INCLUDE_DIR, directory containing headers +# CLOCKBOUND_STATIC_LIB, path to clockbound's static library +# CLOCKBOUND_FOUND, whether clockbound has been found + +# +# Copyright (c) YugabyteDB, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +# in compliance with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software distributed under the License +# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +# or implied. See the License for the specific language governing permissions and limitations +# under the License. +# + +find_path(CLOCKBOUND_INCLUDE_DIR clockbound.h + # make sure we don't accidentally pick up a different version + NO_CMAKE_SYSTEM_PATH + NO_SYSTEM_ENVIRONMENT_PATH) + +# The AWS clockbound source exports both shared and static lib. +# Use a static lib to simplify packaging. +# +# Moreover, this lib is exported using a C interface. This means that +# the exported symbols in clockbound.h are not name mangled. +# Wrap clockbound.h in extern "C" before use in C++. +find_library(CLOCKBOUND_STATIC_LIB libclockbound.a + NO_CMAKE_SYSTEM_PATH + NO_SYSTEM_ENVIRONMENT_PATH) + +include(FindPackageHandleStandardArgs) +find_package_handle_standard_args(CLOCKBOUND REQUIRED_VARS + CLOCKBOUND_STATIC_LIB CLOCKBOUND_INCLUDE_DIR) diff --git a/cmake_modules/YugabyteFindThirdParty.cmake b/cmake_modules/YugabyteFindThirdParty.cmake index 51151b9e3a82..3f22bf6afbd2 100644 --- a/cmake_modules/YugabyteFindThirdParty.cmake +++ b/cmake_modules/YugabyteFindThirdParty.cmake @@ -225,9 +225,9 @@ macro(yb_find_third_party_dependencies) SHARED_LIB "${ABSEIL_SHARED_LIB}") ADD_CXX_FLAGS("-DYB_ABSL_ENABLED") endif() - # ------------------------------------------------------------------------------------------------- + # ------------------------------------------------------------------------------------------------ # Deciding whether to use tcmalloc - # ------------------------------------------------------------------------------------------------- + # ------------------------------------------------------------------------------------------------ # Do not use tcmalloc for ASAN/TSAN. if ("${YB_TCMALLOC_ENABLED}" STREQUAL "") @@ -276,7 +276,7 @@ macro(yb_find_third_party_dependencies) endif() # - # ------------------------------------------------------------------------------------------------- + # ------------------------------------------------------------------------------------------------ ## curl find_package(CURL REQUIRED) @@ -317,8 +317,8 @@ macro(yb_find_third_party_dependencies) set(Boost_NO_SYSTEM_PATHS ON) if("${YB_COMPILER_TYPE}" MATCHES "^gcc[0-9]+$") - # TODO: display this only if using a devtoolset compiler on CentOS, and ideally only if the error - # actually happens. + # TODO: display this only if using a devtoolset compiler on CentOS, and ideally only if the + # error actually happens. message("Note: if Boost fails to find Threads, you might need to install the " "gcc-toolset-N-libatomic-devel package, or devtoolset-N-libatomic-devel package for " "older RedHat/CentOS versions, where N is the toolset version number.") @@ -396,4 +396,12 @@ macro(yb_find_third_party_dependencies) find_package(HdrHistogram REQUIRED) include_directories(SYSTEM ${LIBHDR__HISTOGRAM_INCLUDE_DIR}) ADD_THIRDPARTY_LIB(hdr_histogram STATIC_LIB "${LIBHDR_HISTOGRAM_STATIC_LIB}") + + ## AWS clockbound + find_package(Clockbound REQUIRED) + # Adding the include directory using target_include_directories since + # include_directories does not work here for some odd reason. + ADD_THIRDPARTY_LIB(clockbound + STATIC_LIB ${CLOCKBOUND_STATIC_LIB} + INCLUDE_DIRS ${CLOCKBOUND_INCLUDE_DIR}) endmacro() diff --git a/src/yb/server/CMakeLists.txt b/src/yb/server/CMakeLists.txt index c456ef257435..402c35a35b7e 100644 --- a/src/yb/server/CMakeLists.txt +++ b/src/yb/server/CMakeLists.txt @@ -37,6 +37,7 @@ set(YB_PCH_PREFIX server) ######################################### set(SERVER_COMMON_SRCS + clockbound_clock.cc hybrid_clock.cc logical_clock.cc skewed_clock.cc @@ -44,6 +45,7 @@ set(SERVER_COMMON_SRCS add_library(server_common ${SERVER_COMMON_SRCS}) target_link_libraries(server_common + clockbound yb_common gutil yb_fs diff --git a/src/yb/server/clockbound_clock.cc b/src/yb/server/clockbound_clock.cc new file mode 100644 index 000000000000..d3e26e353295 --- /dev/null +++ b/src/yb/server/clockbound_clock.cc @@ -0,0 +1,484 @@ +// Copyright (c) YugabyteDB, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +// in compliance with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software distributed under the License +// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +// or implied. See the License for the specific language governing permissions and limitations +// under the License. +// + +extern "C" { +#include +} + +#include +#include +#include +#include + +#include "yb/server/clockbound_clock.h" + +#include "yb/gutil/port.h" +#include "yb/gutil/sysinfo.h" +#include "yb/server/hybrid_clock.h" +#include "yb/util/flags.h" +#include "yb/util/logging.h" +#include "yb/util/math_util.h" +#include "yb/util/monotime.h" +#include "yb/util/result.h" + +static constexpr auto kAutoConfigNumClockboundCtxs = 0; + +// There are multiple levels of time synchronization in increasing order +// of accuracy: +// +// 1. Random NTP servers for time synchronization: +// If the cluster nodes use this method for time sync, do NOT use +// "clockbound" as a time source (which internally uses AWS +// clockbound agent). Instead, use WallClock, which conservatively +// assumes that the clock skew between any two nodes in the database +// cluster will not exceed `max_clock_skew_usec` (which is 500ms +// by default). This assumption may or may not hold in practice. +// +// 2. AWS Time Sync Service: +// AWS time sync service uses GPS and atomic clocks in each data +// center to provide accurate timestamps. Starting from this level, +// AWS clockbound provides an upper bound on the clock error. The +// bound varies with time and is typically under 500us. When the +// clockbound lib cannot provide an upper bound, it returns an +// error (tserver crashes in this case). +// +// 3. AWS Time Sync Service with PHC: +// PHC is a special hardware clock on the local node in addition to +// AWS time sync service. The hardware clock is more accurate and +// supports both NTP and PTP. With NTP, the clock error is typically +// under 100us. The 100us estimate is not an upper bound. +// +// 4. Precision Time Protocol (PTP) for time synchronization. +// PTP requires PHC (and other infrastructure provided by AWS). +// This is strictly better than NTP and should be used whenever +// possible. However, this requires installing new network drivers. +// PTP provides a clock error typically under 40us. +// +// AWS Time Sync Clock Error Summary: +// 1. Old hardware: ~500us +// 2. New hardware, PHC, with NTP: ~100us +// 3. New hardware, PHC, with PTP: ~40us +// +// Picking a conservative default of 1ms. +DEFINE_NON_RUNTIME_uint64(clockbound_clock_error_estimate_usec, 1000, + "An estimate of the clock error in microseconds." + " When the estimate is too low and the reported clock error exceeds the estimate," + " the database timestamps fall behind real time." + " When the estimate is too high, the database is more prone to false read restarts."); + +// See the comment on CreateClockboundClock for cautious upgrade/rollback path. +DEFINE_RUNTIME_bool(clockbound_mixed_clock_mode, false, + "When true, use max_clock_skew_usec as read time uncertainty interval (same as WallClock)." + " When false, use low clock errors reported by AWS clockbound to compute the read time" + " uncertainty interval."); + +DEFINE_NON_RUNTIME_uint32(clockbound_num_ctxs, kAutoConfigNumClockboundCtxs, + "Number of clockbound contexts to open." + " When set to 0, the number of contexts is automatically determined." + " This is a performance optimization to reduce contention on the clockbound context."); + +// Use this bound for global limit in mixed clock mode. +DECLARE_uint64(max_clock_skew_usec); + +namespace yb::server { + +namespace { + +const std::string kClockboundClockName = "clockbound"; + +// ClockboundClock +// +// Uses clockbound_ctx and provides a thread-safe interface +// to the clockbound API. +// +// This class is made thread-safe by serializing access to clockbound_ctx +// objects. +// +// clockbound_ctx is not thread-safe. +class ClockboundClock : public PhysicalClock { + public: + ClockboundClock() : ctxs_( + FLAGS_clockbound_num_ctxs == kAutoConfigNumClockboundCtxs + ? base::MaxCPUIndex() + 1 : FLAGS_clockbound_num_ctxs) { + auto num_ctxs = ctxs_.size(); + LOG_WITH_FUNC(INFO) << "Opening " << num_ctxs << " clockbound_ctx ..."; + for (size_t i = 0; i < num_ctxs; i++) { + clockbound_err open_err; + auto ctx = clockbound_open(CLOCKBOUND_SHM_DEFAULT_PATH, &open_err); + // clockbound_open returns nullptr on failure. + if (ctx == nullptr) { + LOG_WITH_FUNC(FATAL) + << "Opening clockbound_ctx failed with error " + << ClockboundErrorToStatus(open_err); + } + ctxs_[i].ctx = ctx; + } + } + + // Returns PhysicalTime{real_time, clock error} on success. + // Returns an error status on failure. + Result Now() override { + clockbound_now_result result; + // Thread Safety: + // 1. clockbound_ctx is not thread safe. + // 2. Return value of clockbound_now is not thread safe. + auto num_ctxs = ctxs_.size(); +#if defined(__APPLE__) + int ctx_index = static_cast( + std::hash()(std::this_thread::get_id()) % num_ctxs); +#else + int ctx_index = sched_getcpu(); + if (ctx_index < 0) { + ctx_index = static_cast( + std::hash()(std::this_thread::get_id()) % num_ctxs); + } +#endif + for (size_t i = 0;; i++) { + auto &padded_ctx = ctxs_[(ctx_index + i) % num_ctxs]; + std::unique_lock lock(padded_ctx.mutex, std::defer_lock); + if (i >= num_ctxs - 1) { + // Last iteration, lock the mutex. + lock.lock(); + } else { + // Lock the mutex without contention. + if (!lock.try_lock()) { + continue; + } + } + auto err = clockbound_now(padded_ctx.ctx, &result); + if (err != nullptr) { + RETURN_NOT_OK(ClockboundErrorToStatus(*err)); + } + break; + } + + // Always check whether the clock went out of sync. + SCHECK_NE(result.clock_status, CLOCKBOUND_STA_UNKNOWN, + ServiceUnavailable, + "Clock status is unknown, time cannot be trusted."); + + auto earliest = MonoTime::TimespecToMicros(result.earliest); + auto latest = MonoTime::TimespecToMicros(result.latest); + + auto error = ceil_div(latest - earliest, MicrosTime(2)); + // This check is not mandatory. However, when the clock error + // is unreasonably high such as 250ms, it indicates a serious + // infrastructure failure and we report that scenario. + // + // Use half of max_clock_skew_usec as the maximum allowed clock error. + MicrosTime max_clock_error = + ANNOTATE_UNPROTECTED_READ(FLAGS_max_clock_skew_usec) / 2; + SCHECK_LE(error, max_clock_error, ServiceUnavailable, + Format("Clock error: $0 exceeds maximum allowed clock error: $1." + " This indicates a serious infrastructure failure." + " Ensure that the clocks are synchronized properly.", + error, max_clock_error)); + + auto real_time = earliest + error; + return PhysicalTime{ real_time, error }; + } + + ~ClockboundClock() { + for (auto &padded_ctx : ctxs_) { + auto close_err = clockbound_close(padded_ctx.ctx); + LOG_IF(WARNING, close_err != nullptr) + << "Failed to close clockbound_ctx with error: " + << ClockboundErrorToStatus(*close_err); + } + } + + private: + static Status ClockboundErrorToStatus(const clockbound_err &err) { + switch (err.kind) { + case CLOCKBOUND_ERR_NONE: + return STATUS_FORMAT(IllegalState, "This state is not reachable"); + case CLOCKBOUND_ERR_SYSCALL: + return STATUS_FORMAT( + IOError, "clockbound API failed with error: $0, and detail: $1", + strerror(err.sys_errno), err.detail); + case CLOCKBOUND_ERR_SEGMENT_NOT_INITIALIZED: + return STATUS_FORMAT(IOError, "Segment not initialized"); + case CLOCKBOUND_ERR_SEGMENT_MALFORMED: + return STATUS_FORMAT(IOError, "Segment malformed"); + case CLOCKBOUND_ERR_CAUSALITY_BREACH: + return STATUS_FORMAT(IOError, "Segment and clock reads out of order"); + } + return STATUS_FORMAT(NotSupported, "Unknown error code: $0", err.kind); + } + + MicrosTime MaxGlobalTime(PhysicalTime time) override { + LOG_WITH_FUNC(FATAL) + << "Internal Error: MaxGlobalTime must not be called" + << " for ClockboundClock"; + } + + // Padded to avoid false sharing. + struct alignas(CACHELINE_SIZE) PaddedClockboundCtx { + std::mutex mutex; + clockbound_ctx *ctx; + char padding[ + CACHELINE_SIZE - + (sizeof(std::mutex) + sizeof(clockbound_ctx*)) % CACHELINE_SIZE]; + }; + static_assert(sizeof(PaddedClockboundCtx) % CACHELINE_SIZE == 0); + + std::vector ctxs_; +}; + +// RealTimeAlignedClock +// a. Leverages AWS clockbound as the underlying physical time source. +// In steady state, combines the best properties of both NtpClock and +// WallClock. +// b. NtpClock drastically reduces the length of the uncertainty interval. +// This decreases the number of read restart requests. However, +// NtpClock::Now() falls behind real time. +// c. WallClock always picks real time. However, it relies on a large +// uncertainty interval. This leads to a large number of false read restart +// errors. +// +// This clock has the following desirable properties, +// a. Less restart read requests compared to WallClock. +// b. Prevents stale reads (see below for proof outline of why this is true). +// c. Provides better availability than WallClock since only the +// problematic nodes crash. +// d. Does NOT deviate from real time in steady state. +// +// Thread-safe. +// +// clockbound API +// ============== +// +// Clock accuracy information can be gathered reliably using the client +// library at github.com/aws/clock-bound. Remember that this is accurate only +// when using a Time Sync Service. The library provides three key pieces of +// information: +// 1. The status of the clock: cannot rely on clock information when +// when the clocks are out of sync. +// 2. EARLIEST: the minimum possible value of the reference clock. +// 3. LATEST: the maximum possible value of the reference clock. +// +// Clock Logic +// =========== +// +// Let's call +// a. User specified estimate for clock error as EST_ERROR. +// EST_ERROR = FLAGS_clockbound_clock_error_estimate_usec. +// b. clock error as clock_error. +// clock_error = (latest - earliest)/2. +// +// NOW = earliest + MIN(EST_ERROR, clock_error) +// GLOBAL_LIMIT = latest + EST_ERROR +// +// Reducing Read Restart Requests +// ============================== +// +// The length of uncertainty interval is: +// clock_error + MAX(EST_ERROR, clock_error). +// +// In non-PHC cases, EST_ERROR = 1000us, clock_error = 500us. +// The uncertainty interval is ~= 1000us + 500us = 1500us = 1.5ms. +// This is a significant improvement over the 500ms uncertainty interval +// in WallClock. +// +// Preventing Stale Reads +// ====================== +// +// Setup: +// +// LOCAL_EARLIEST LOCAL_LATEST LOCAL_GLOBAL_LIMIT +// local node: [----LOCAL_NOW-------------] <|> +// +// REMOTE_EARLIEST +// remote node: [----REMOTE_NOW-------] +// +// a. Our node is the local node. Any other node in the cluster is +// a remote node. +// b. The time uncertainty interval returned by AWS clockbound library +// is represented by [---------]. Left bound is EARLIEST and the +// right bound is LATEST. +// EARLIEST LATEST +// c. The time returned by this clock is called NOW. [---NOW---]. +// +// Objective: Prove that NOW on any remote node is within the GLOBAL_LIMIT of +// the local node. +// +// Proof Outline: +// +// Observation1: NOW is no more than EST_ERROR away from EARLIEST. +// This follows from the definition, +// NOW = earliest + MIN(EST_ERROR, clock_error) <= earliest + EST_ERROR. +// +// Observation2: Local and remote uncertainty intervals overlap. +// The reference time must be in both local and remote uncertainty intervals +// because AWS clockbound guarantees that the reference time is within +// the uncertainty interval. So, the intervals at least overlap at +// reference time. This means that REMOTE_EARLIEST <= LOCAL_LATEST. +// +// From observation1 and observation2, it follows that the remote node's +// NOW is atmost EST_ERROR away from the local node's LATEST. This proves +// GLOBAL_LIMIT can be LOCAL_LATEST + EST_ERROR. +// +// Availability +// ============ +// +// This clock has better availability than WallClock despite crashing +// in unsynchronized state. WallClock may also crash in unsynchronized +// state through use of HybridClock when it detects that the clock skew +// is too high. Moreover, it only crashes nodes that are behind in time. +// It has no mechanism to detect which node is out of sync. This is unlike +// clockbound backed clocks which only crash the problematic node. In +// case of WallClock, simply, one out of sync node with a high clock is +// enough to crash every other node in the cluster. This is not a problem +// with this clock. +// +// Reliability +// =========== +// +// Time synchronization is a hard problem. It is already impressive that +// the underlying infrastructure lets us compute an upper bound on the +// clock error. However, this is not without assumptions. The system +// must be configured correctly. AWS recommends that chrony be configured +// with maxclockerror set to 50. The clockbound must also be run with +// --max-drift-rate 50. +// +// ntp_gettime also returns a maxerror value. However, this is less +// reliable than clockbound since +// 1. clockbound accounts for max drift rate using --max-drift-rate flag. +// 2. clockbound also computes PHC Error Bound. This is important for PTP. +// +// Compatibility with Real Time +// ============================ +// +// Terminology: Real Time = CLOCK_REALTIME +// +// Why don't we simply use NtpClock? +// +// In steady state, real time is a really good approximation of the +// reference time. Falling behind real time is not ideal. +// In particular, the external timestamps such as +// such as yb_read_time, yb_lock_status, and timestamps originating in +// postgres, all assume real time. Real time is a better +// approximation to these external timestamps than earliest. +// +// Example: the yb_locks_min_txn_age only displays locks of transactions +// older than the specified age. When Now() falls behind real time, +// some transactions are ommitted from the pg_locks output since they +// are considered too new, even when yb_locks_min_txn_age is zero. +class RealTimeAlignedClock : public PhysicalClock { + public: + explicit RealTimeAlignedClock(PhysicalClockPtr time_source) + : time_source_(time_source) {} + + // Returns earliest + min(clock_error, EST_ERROR). + // Returns an error in Unsynchronized state. + // Also returns clock_error for metrics. + // + // See class comment for the correctness argument. + Result Now() override { + auto [real_time, error] = VERIFY_RESULT(time_source_->Now()); + auto earliest = real_time - error; + auto timepoint = earliest + std::min( + error, FLAGS_clockbound_clock_error_estimate_usec); + return PhysicalTime{timepoint, error}; + } + + // Returns latest + EST_ERROR. + // + // See class comment for the correctness argument. + MicrosTime MaxGlobalTime(PhysicalTime time) override { + // time_point = earliest + min(clock_error, EST_ERROR). + auto earliest = time.time_point - std::min( + time.max_error, FLAGS_clockbound_clock_error_estimate_usec); + auto real_time = earliest + time.max_error; + // For safety, do not use the tighter bound unless + // everyone else is also using the same physical clock. + auto bound = ANNOTATE_UNPROTECTED_READ(FLAGS_clockbound_mixed_clock_mode) + ? FLAGS_max_clock_skew_usec + : time.max_error + FLAGS_clockbound_clock_error_estimate_usec; + return real_time + bound; + } + + private: + PhysicalClockPtr time_source_; +}; + +} // anonymous namespace + +// Returns a physical clock backed by AWS clockbound. +// +// Requires that all the nodes in the cluster are synchronized using a +// Time Sync Service. +// +// Cloud Support +// ============= +// +// Official support is available for AWS. +// Support for Azure/GCP will be planned post AWS adoption. +// However, no additional database changes are expected. +// +// Multi-cloud/Hybrid-cloud support is not planned, so use it at your own risk. +// +// Subject to the above conditions, prefer this clock over WallClock. +// +// Configuration +// ============= +// +// Start cluster with gFlag: time_source = clockbound +// Ensure that clockbound_mixed_clock_mode is false. +// +// Mixed Clock Mode +// ================ +// +// Sometimes, the nodes are not well synchronized. Poorly synchronized and +// unsynchronized clocks are handled transparently by this clock. However, +// this assumes that all the nodes in the cluster are using this clock. When +// at least one of the nodes is using WallClock, and that node is not well +// synchronized, this clock will not be able to provide +// the same guarantees. +// +// The workaround involves setting clockbound_mixed_clock_mode = true. +// +// Please remember to reset clockbound_mixed_clock_mode = false after all +// the nodes in the cluster are using this clock. +// +// Cautious Upgrade +// ================ +// +// 1. Rolling restart with +// a. clockbound_mixed_clock_mode = true (runtime flag) +// b. time_source = clockbound +// 2. Set clockbound_mixed_clock_mode = false. +// +// Cautious Rollback +// ================= +// +// 1. Set clockbound_mixed_clock_mode = true. +// 2. Rolling restart with time_source is empty. +// +// Notice that the cautious upgrade/rollback path is not standard. +PhysicalClockPtr CreateClockboundClock(PhysicalClockPtr time_source) { + // Fake time sources are useful for testing purposes where + // a clockbound agent is not available. + return std::make_shared( + time_source ? time_source : std::make_shared()); +} + +void RegisterClockboundClockProvider() { + HybridClock::RegisterProvider( + kClockboundClockName, + [](const std::string &options) { + return CreateClockboundClock(); + }); +} + +} // namespace yb::server diff --git a/src/yb/server/clockbound_clock.h b/src/yb/server/clockbound_clock.h new file mode 100644 index 000000000000..b74bc90644ab --- /dev/null +++ b/src/yb/server/clockbound_clock.h @@ -0,0 +1,24 @@ +// Copyright (c) YugabyteDB, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +// in compliance with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software distributed under the License +// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +// or implied. See the License for the specific language governing permissions and limitations +// under the License. +// + +#pragma once + +#include "yb/util/physical_time.h" + +namespace yb::server { + +PhysicalClockPtr CreateClockboundClock(PhysicalClockPtr time_source = {}); + +void RegisterClockboundClockProvider(); + +} // namespace yb::server diff --git a/src/yb/server/hybrid_clock.cc b/src/yb/server/hybrid_clock.cc index f41977af064c..70f3e09db84e 100644 --- a/src/yb/server/hybrid_clock.cc +++ b/src/yb/server/hybrid_clock.cc @@ -65,10 +65,10 @@ METRIC_DEFINE_gauge_int64(server, hybrid_clock_skew, "Server clock skew."); DEFINE_UNKNOWN_string(time_source, "", - "The clock source that HybridClock should use (for tests only). " - "Leave empty for WallClock, other values depend on added clock providers and " + "The clock source that HybridClock should use. " + "Leave empty for WallClock, clockbound for ClockboundClock, " + "and other values depend on added clock providers and " "specific for appropriate tests, that adds them."); -TAG_FLAG(time_source, hidden); DEFINE_UNKNOWN_bool(fail_on_out_of_range_clock_skew, true, "In case transactional tables are present, crash the process if clock skew greater " @@ -165,9 +165,13 @@ void HybridClock::NowWithError(HybridTime *hybrid_time, uint64_t *max_error_usec if (now->time_point < current_components.last_usec) { auto delta_us = current_components.last_usec - now->time_point; - if (delta_us > FLAGS_max_clock_skew_usec) { + // Propagated hybrid time cannot be higher than the global limit. + auto max_global_time = clock_->MaxGlobalTime(*now); + if (delta_us > FLAGS_max_clock_skew_usec || + current_components.last_usec > max_global_time) { auto delta = MonoDelta::FromMicroseconds(delta_us); - auto max_allowed = MonoDelta::FromMicroseconds(FLAGS_max_clock_skew_usec); + auto max_allowed = MonoDelta::FromMicroseconds( + std::min(FLAGS_max_clock_skew_usec, max_global_time - now->time_point)); if ((ANNOTATE_UNPROTECTED_READ(FLAGS_fail_on_out_of_range_clock_skew) || (ANNOTATE_UNPROTECTED_READ(FLAGS_clock_skew_force_crash_bound_usec) > 0 && delta_us > ANNOTATE_UNPROTECTED_READ(FLAGS_clock_skew_force_crash_bound_usec))) && diff --git a/src/yb/tools/CMakeLists.txt b/src/yb/tools/CMakeLists.txt index 0be9dd4666f3..80d5efcf0269 100644 --- a/src/yb/tools/CMakeLists.txt +++ b/src/yb/tools/CMakeLists.txt @@ -238,3 +238,7 @@ ADD_YB_TEST_LIBRARY(yb-backup-test_base add_executable(hnsw_tool hnsw_tool.cc) target_link_libraries(hnsw_tool boost_program_options yb_util yb_docdb yb_vector) + +add_executable(clockbound_dump clockbound_dump.cc) +target_link_libraries(clockbound_dump + clockbound yb_common yb_util) diff --git a/src/yb/tools/clockbound_dump.cc b/src/yb/tools/clockbound_dump.cc new file mode 100644 index 000000000000..284bd972a9d8 --- /dev/null +++ b/src/yb/tools/clockbound_dump.cc @@ -0,0 +1,153 @@ +// Copyright (c) YugabyteDB, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +// in compliance with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software distributed under the License +// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +// or implied. See the License for the specific language governing permissions and limitations +// under the License. +// + +// Dumps clockbound's clock information as json. +// +// chronyc does not have enough information to accurately +// determine the clock error in case of PTP. However, +// clockbound fetches information from the PHC device file +// for the same. + +extern "C" { +#include +} + +#include +#include + +#include + +#include "yb/common/json_util.h" + +#include "yb/util/format.h" +#include "yb/util/scope_exit.h" + +namespace yb::tools { + +namespace { + +// Returns exit code 1 +// +// Example output: +// { +// "error": "Segment not initialized" +// } +int DumpError(const std::string& error) { + rapidjson::Document json_error; + json_error.SetObject(); + auto& allocator = json_error.GetAllocator(); + common::AddMember("error", error, &json_error, &allocator); + std::cout << common::PrettyWriteRapidJsonToString(json_error) << std::endl; + return 1; +} + +std::string ClockboundErrToString(const clockbound_err& err) { + switch (err.kind) { + case CLOCKBOUND_ERR_NONE: + return "Internal error: clockbound err cannot be none"; + case CLOCKBOUND_ERR_SYSCALL: + return Format( + "clockbound syscall failed with error: $0, and detail: $1", + strerror(err.sys_errno), err.detail); + case CLOCKBOUND_ERR_SEGMENT_NOT_INITIALIZED: + return "Segment not initialized"; + case CLOCKBOUND_ERR_SEGMENT_MALFORMED: + return "Segment malformed"; + case CLOCKBOUND_ERR_CAUSALITY_BREACH: + return "Segment and clock reads out of order"; + default: + return "Internal error: unrecognized clockbound err"; + } +} + +std::string ClockStatusToString(const clockbound_clock_status& status) { + switch (status) { + case CLOCKBOUND_STA_UNKNOWN: + return "UNKNOWN"; + case CLOCKBOUND_STA_SYNCHRONIZED: + return "SYNCHRONIZED"; + case CLOCKBOUND_STA_FREE_RUNNING: + return "FREE_RUNNING"; + default: + return "INVALID"; + } +} + +void AddTimespecField( + rapidjson::Value& obj, const char* name, const timespec& ts, + rapidjson::Document::AllocatorType& allocator) { + rapidjson::Value timespec_obj(rapidjson::kObjectType); + timespec_obj.AddMember("tv_sec", ts.tv_sec, allocator); + timespec_obj.AddMember("tv_nsec", ts.tv_nsec, allocator); + obj.AddMember(rapidjson::StringRef(name), timespec_obj, allocator); +} + +// Returns exit code 0 +// Dumps the result of clockbound_now to stdout as json. +int DumpNowResult(const clockbound_now_result& now_result) { + rapidjson::Document json_now; + json_now.SetObject(); + auto& allocator = json_now.GetAllocator(); + AddTimespecField(json_now, "earliest", now_result.earliest, allocator); + AddTimespecField(json_now, "latest", now_result.latest, allocator); + std::string clock_status = ClockStatusToString(now_result.clock_status); + common::AddMember("clock_status", clock_status, &json_now, &allocator); + std::cout << common::PrettyWriteRapidJsonToString(json_now) << std::endl; + return 0; +} + +int DumpClockboundNow() { + // Open clockbound ctx + clockbound_err open_err; + auto ctx = clockbound_open(CLOCKBOUND_SHM_DEFAULT_PATH, &open_err); + if (ctx == nullptr) { + return DumpError(ClockboundErrToString(open_err)); + } + auto scope_exit = ScopeExit([ctx] { clockbound_close(ctx); }); + // Fetch clockbound now. + clockbound_now_result now_result; + auto err = clockbound_now(ctx, &now_result); + if (err != nullptr) { + return DumpError(ClockboundErrToString(*err)); + } + return DumpNowResult(now_result); +} + +} // anonymous namespace + +} // namespace yb::tools + +// Example output: +// { +// "earliest": { +// "tv_sec": 1633097600, +// "tv_nsec": 123456789 +// }, +// "latest": { +// "tv_sec": 1633097700, +// "tv_nsec": 987654321 +// }, +// "clock_status": "SYNCHRONIZED" +// } +// +// Example error: +// { +// "error": "Usage: ./clockbound_dump" +// } +int main(int argc, char** argv) { + // No arguments. + if (argc != 1) { + return yb::tools::DumpError("Usage: " + std::string(argv[0])); + } + return yb::tools::DumpClockboundNow(); +} diff --git a/src/yb/tserver/server_main_util.cc b/src/yb/tserver/server_main_util.cc index dc31e8d3998d..773bcd52c37a 100644 --- a/src/yb/tserver/server_main_util.cc +++ b/src/yb/tserver/server_main_util.cc @@ -24,6 +24,7 @@ #include "yb/consensus/consensus_queue.h" #include "yb/consensus/log_util.h" +#include "yb/server/clockbound_clock.h" #include "yb/server/skewed_clock.h" #include "yb/util/debug/trace_event.h" @@ -135,6 +136,7 @@ Status MasterTServerParseFlagsAndInit( FLAGS_stderrthreshold = google::FATAL; server::SkewedClock::Register(); + server::RegisterClockboundClockProvider(); // These are the actual defaults for these gflags for master and TServer. FLAGS_memory_limit_hard_bytes = 0; diff --git a/src/yb/util/monotime.h b/src/yb/util/monotime.h index 1fdb52ba1aab..cad1da237a04 100644 --- a/src/yb/util/monotime.h +++ b/src/yb/util/monotime.h @@ -190,6 +190,12 @@ class MonoTime { static const MonoTime kMax; static const MonoTime kUninitialized; + // Connvert timespec to microseconds. + static uint64_t TimespecToMicros(const struct timespec& ts) { + return ts.tv_sec * kMicrosecondsPerSecond + + ts.tv_nsec / kNanosecondsPerMicrosecond; + } + // The coarse monotonic time is faster to retrieve, but "only" accurate to within a millisecond or // two. The speed difference will depend on your timer hardware. static MonoTime Now(); diff --git a/src/yb/util/physical_time.cc b/src/yb/util/physical_time.cc index c8dffaad189a..ad30e97d0d37 100644 --- a/src/yb/util/physical_time.cc +++ b/src/yb/util/physical_time.cc @@ -138,11 +138,17 @@ const PhysicalClockPtr& AdjTimeClock() { #endif Result MockClock::Now() { + RETURN_NOT_OK(mock_status_); return CheckClockSyncError(value_.load(boost::memory_order_acquire)); } void MockClock::Set(const PhysicalTime& value) { value_.store(value, boost::memory_order_release); + mock_status_ = Status::OK(); +} + +void MockClock::Set(Status status) { + mock_status_ = status; } PhysicalClockPtr MockClock::AsClock() { diff --git a/src/yb/util/physical_time.h b/src/yb/util/physical_time.h index 99791a4d093e..be2a383e95bb 100644 --- a/src/yb/util/physical_time.h +++ b/src/yb/util/physical_time.h @@ -18,7 +18,7 @@ #include -#include "yb/util/status_fwd.h" +#include "yb/util/status.h" namespace yb { @@ -52,6 +52,9 @@ class MockClock : public PhysicalClock { void Set(const PhysicalTime& value); + // Set this to return an error in Now(). + void Set(Status status); + // Constructs PhysicalClockPtr from this object. PhysicalClockPtr AsClock(); @@ -62,6 +65,7 @@ class MockClock : public PhysicalClock { // Set by calls to SetMockClockWallTimeForTests(). // For testing purposes only. boost::atomic value_{{0, 0}}; + Status mock_status_; }; const PhysicalClockPtr& WallClock(); diff --git a/src/yb/yql/pggate/ybc_pggate.cc b/src/yb/yql/pggate/ybc_pggate.cc index ea700e6199fd..f2c6aa8ff9e7 100644 --- a/src/yb/yql/pggate/ybc_pggate.cc +++ b/src/yb/yql/pggate/ybc_pggate.cc @@ -46,6 +46,7 @@ #include "yb/gutil/casts.h" +#include "yb/server/clockbound_clock.h" #include "yb/server/skewed_clock.h" #include "yb/util/atomic.h" @@ -471,6 +472,7 @@ void YBCInitPgGateEx(const YBCPgTypeEntity *data_type_table, int count, PgCallba // However, this is added to allow simulating and testing of some known bugs until we remove // HybridClock usage. server::SkewedClock::Register(); + server::RegisterClockboundClockProvider(); InitThreading(); diff --git a/src/yb/yql/pgwrapper/CMakeLists.txt b/src/yb/yql/pgwrapper/CMakeLists.txt index 028116c44478..b9dbd0b5172b 100644 --- a/src/yb/yql/pgwrapper/CMakeLists.txt +++ b/src/yb/yql/pgwrapper/CMakeLists.txt @@ -109,6 +109,7 @@ set(YB_TEST_LINK_LIBS yb_pgwrapper ql-dml-test-base pg_wrapper_test_base tools_t geo_transactions_test_base pg_locks_test_base rocksdb_tools ${YB_MIN_TEST_LIBS}) ADD_YB_TEST(alter_schema_abort_txn-test) ADD_YB_TEST(alter_table_with_concurrent_txn-test) +ADD_YB_TEST(clockbound_clock-test) ADD_YB_TEST(colocation-test) ADD_YB_TEST(geo_transactions-test) ADD_YB_TEST(geo_transactions_promotion-test) diff --git a/src/yb/yql/pgwrapper/clockbound_clock-test.cc b/src/yb/yql/pgwrapper/clockbound_clock-test.cc new file mode 100644 index 000000000000..927106d34257 --- /dev/null +++ b/src/yb/yql/pgwrapper/clockbound_clock-test.cc @@ -0,0 +1,198 @@ +// Copyright (c) YugabyteDB, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +// in compliance with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software distributed under the License +// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +// or implied. See the License for the specific language governing permissions and limitations +// under the License. +// + +#include +#include + +#include "yb/common/hybrid_time.h" +#include "yb/common/read_hybrid_time.h" + +#include "yb/server/clockbound_clock.h" +#include "yb/util/math_util.h" +#include "yb/util/physical_time.h" +#include "yb/util/status.h" +#include "yb/util/test_util.h" + +DECLARE_uint64(clockbound_clock_error_estimate_usec); +DECLARE_bool(clockbound_mixed_clock_mode); +DECLARE_uint64(max_clock_skew_usec); + +namespace yb::pgwrapper { + +class ClockboundClockTest : public YBTest { + protected: + void SetUp() override { + YBTest::SetUp(); + + // Create a ClockboundClock and register + // a fake underlying clock to test its logic. + fake_clock_ = std::make_shared(); + clockbound_clock_ = server::CreateClockboundClock(fake_clock_); + + // Random number generator for clock error. + std::random_device rd; + auto random_value = rd(); + LOG_WITH_FUNC(INFO) << "Seed: " << random_value; + gen_.emplace(random_value); + } + + // Assert that each global limit is higher than the read point + // of all other hybrid times. + void AssertOverlap(ReadHybridTime ht0, ReadHybridTime ht1) { + ASSERT_LE(ht0.read, ht0.global_limit); + ASSERT_LE(ht1.read, ht0.global_limit); + ASSERT_LE(ht0.read, ht1.global_limit); + ASSERT_LE(ht1.read, ht1.global_limit); + } + + // Clock error varies between 1/4 and 3/4 of the estimated error. + PhysicalTime SampleWellSynchronizedClock(MicrosTime reference_time) { + std::uniform_int_distribution dist{ + FLAGS_clockbound_clock_error_estimate_usec / 4, + 3 * FLAGS_clockbound_clock_error_estimate_usec / 4 + }; + auto earliest = reference_time - dist(*gen_); + auto latest = reference_time + dist(*gen_); + auto error = ceil_div(latest - earliest, MicrosTime(2)); + + auto real_time = earliest + error; + return PhysicalTime{ real_time, error }; + } + + // Clock error varies between 2 and 4 times the estimated error. + PhysicalTime SamplePoorlySynchronizedClock(MicrosTime reference_time) { + std::uniform_int_distribution dist{ + 2 * FLAGS_clockbound_clock_error_estimate_usec, + 4 * FLAGS_clockbound_clock_error_estimate_usec + }; + auto earliest = reference_time - dist(*gen_); + auto latest = reference_time + dist(*gen_); + auto error = yb::ceil_div(latest - earliest, MicrosTime(2)); + + auto real_time = earliest + error; + return PhysicalTime{ real_time, error }; + } + + // Returns time such that earliest = reference_time. + PhysicalTime RefTimeEarliest(MicrosTime reference_time, bool well_sync) { + MicrosTime error = FLAGS_clockbound_clock_error_estimate_usec / 2; + if (!well_sync) { + error *= 3; + } + + auto real_time = reference_time + error; + return PhysicalTime{ real_time, error }; + } + + // Returns time such that latest = reference_time. + PhysicalTime RefTimeLatest(MicrosTime reference_time, bool well_sync) { + auto error = FLAGS_clockbound_clock_error_estimate_usec / 2; + if (!well_sync) { + error *= 3; + } + auto earliest = reference_time - 2 * error; + + auto real_time = earliest + error; + return PhysicalTime{ real_time, error }; + } + + Result GetReadTime(PhysicalTime reference_time) { + fake_clock_->Set(reference_time); + auto now = VERIFY_RESULT(clockbound_clock_->Now()); + auto window = ReadHybridTime::FromMicros(now.time_point); + window.global_limit = + HybridTime::FromMicros(clockbound_clock_->MaxGlobalTime(now)); + return window; + } + + PhysicalClockPtr clockbound_clock_; + std::shared_ptr fake_clock_; + std::optional gen_; +}; + +TEST_F(ClockboundClockTest, PropagateError) { + fake_clock_->Set(STATUS(IOError, "Clock error")); + ASSERT_NOK_STR_CONTAINS(clockbound_clock_->Now(), "Clock error"); +} + +TEST_F(ClockboundClockTest, NoClockError) { + fake_clock_->Set(PhysicalTime{ 100, 0 }); + auto time = ASSERT_RESULT(clockbound_clock_->Now()); + ASSERT_EQ(100, time.time_point); + ASSERT_EQ(0, time.max_error); +} + +TEST_F(ClockboundClockTest, MixedClockMode) { + ANNOTATE_UNPROTECTED_WRITE( + FLAGS_clockbound_mixed_clock_mode) = true; + const auto maxskew = FLAGS_max_clock_skew_usec; + fake_clock_->Set(PhysicalTime{ maxskew , 0 }); + auto time = ASSERT_RESULT(clockbound_clock_->Now()); + ASSERT_EQ(2 * maxskew, clockbound_clock_->MaxGlobalTime(time)); +} + +// Write happens on a node thats ahead of reference clock. +// Read happens on a node thats behind reference clock. +// Ensures that the read window overlaps the write window. +// +// Write: [-----------] +// Read: [-----------] +TEST_F(ClockboundClockTest, ReadAfterWrite) { + for (auto read_sync : { true, false }) { + for (auto write_sync : { true, false }) { + LOG_WITH_FUNC(INFO) << "Read sync: " << read_sync + << ", Write sync: " << write_sync; + auto reference_time = GetCurrentTimeMicros(); + auto read_window = ASSERT_RESULT(GetReadTime( + RefTimeLatest(reference_time, read_sync))); + auto write_window = ASSERT_RESULT(GetReadTime( + RefTimeEarliest(reference_time, write_sync))); + AssertOverlap(read_window, write_window); + } + } +} + +TEST_F(ClockboundClockTest, TwoWellSynchronizedClocks) { + for (int i = 0; i < 10000; i++) { + auto reference_time = GetCurrentTimeMicros(); + auto window0 = ASSERT_RESULT(GetReadTime( + SampleWellSynchronizedClock(reference_time))); + auto window1 = ASSERT_RESULT(GetReadTime( + SampleWellSynchronizedClock(reference_time))); + AssertOverlap(window0, window1); + } +} + +TEST_F(ClockboundClockTest, TwoPoorlySynchronizedClocks) { + for (int i = 0; i < 10000; i++) { + auto reference_time = GetCurrentTimeMicros(); + auto window0 = ASSERT_RESULT(GetReadTime( + SamplePoorlySynchronizedClock(reference_time))); + auto window1 = ASSERT_RESULT(GetReadTime( + SamplePoorlySynchronizedClock(reference_time))); + AssertOverlap(window0, window1); + } +} + +TEST_F(ClockboundClockTest, DifferentlySynchronizedClocks) { + for (int i = 0; i < 10000; i++) { + auto reference_time = GetCurrentTimeMicros(); + auto window0 = ASSERT_RESULT(GetReadTime( + SampleWellSynchronizedClock(reference_time))); + auto window1 = ASSERT_RESULT(GetReadTime( + SamplePoorlySynchronizedClock(reference_time))); + AssertOverlap(window0, window1); + } +} + +} // namespace yb::pgwrapper diff --git a/yb_release_manifest.json b/yb_release_manifest.json index f17b315e8bee..094926fa0b5a 100644 --- a/yb_release_manifest.json +++ b/yb_release_manifest.json @@ -11,6 +11,7 @@ "$BUILD_ROOT/version_metadata.json" ], "bin": [ + "$BUILD_ROOT/bin/clockbound_dump", "$BUILD_ROOT/bin/ldb", "$BUILD_ROOT/bin/log-dump", "$BUILD_ROOT/bin/odyssey", @@ -29,6 +30,8 @@ "bin/bulk_load_cleanup.sh", "bin/bulk_load_helper.sh", "bin/configure", + "bin/configure_clockbound.sh", + "bin/configure_ptp.sh", "bin/log_cleanup.sh", "bin/yb-check-consistency.py", "bin/yugabyted", diff --git a/yugabyted-ui/apiserver/cmd/server/handlers/api_cluster_info.go b/yugabyted-ui/apiserver/cmd/server/handlers/api_cluster_info.go index 6362283bb710..ee8d4f4d9e71 100644 --- a/yugabyted-ui/apiserver/cmd/server/handlers/api_cluster_info.go +++ b/yugabyted-ui/apiserver/cmd/server/handlers/api_cluster_info.go @@ -66,6 +66,9 @@ var WARNING_MSGS = map[string]string{ "insecure" :"Cluster started in an insecure mode without " + "authentication and encryption enabled. For non-production use only, " + "not to be used without firewalls blocking the internet traffic.", + "clockbound": "Clockbound is recommended on AWS clusters. It can reduce read restart errors" + + " significantly in concurrent workloads." + + " Relevant flag: --enhance_time_sync_via_clockbound.", } type SlowQueriesFuture struct { diff --git a/yugabyted-ui/ui/src/features/clusters/details/alerts/alerts.ts b/yugabyted-ui/ui/src/features/clusters/details/alerts/alerts.ts index dc36513c0adb..8ab2b6bb0812 100644 --- a/yugabyted-ui/ui/src/features/clusters/details/alerts/alerts.ts +++ b/yugabyted-ui/ui/src/features/clusters/details/alerts/alerts.ts @@ -55,6 +55,11 @@ export const alertList: AlertListItem[] = [ key: "version mismatch", status: BadgeVariant.Warning, hideConfiguration: true + }, + { + name: "Leverage AWS Time Sync Service", + key: "clockbound", + status: BadgeVariant.Warning, } ];