[PLAT-15563] health check and alert errors when system clock is not in sync.

Summary:
Updated the "node ntp service status" alert and health check to validate the
system clock being reported as "in sync". This is useful if it is not possible to reach
the remote ntp server, as no drift will be reported in these cases, but there is still
an issue.
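
The rule described in the summary can be sketched as a small pure function. This is only an illustration of the "in sync AND service running" logic, not the code from the commit: `ntp_status` and its two callable parameters are hypothetical names, introduced so the rule can be exercised without touching a real host.

```python
from typing import Callable


def ntp_status(clock_synced: Callable[[], bool],
               service_running: Callable[[], bool]) -> int:
    """Return 1 only when the clock is in sync AND the time service runs.

    Both parameters are hypothetical stand-ins for the real probes
    (a timedatectl check and a systemd service check).
    """
    # Check sync first: a running daemon that cannot reach its server
    # reports no drift, yet the clock may still be out of sync.
    if not clock_synced():
        return 0
    return 1 if service_running() else 0


# Daemon is up, but the remote NTP server is unreachable and the
# clock is not reported as in sync.
print(ntp_status(lambda: False, lambda: True))  # prints 0
```

Checking the sync flag before the service state means an unsynchronized clock always fails, whichever time service is installed.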

Test Plan: validated health checks and alerts are thrown

Reviewers: yash.priyam, patnaik.balivada, muthu

Reviewed By: patnaik.balivada, muthu

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D38652
shubin-yb committed Oct 7, 2024
1 parent ecfc1f9 commit d42f4f5
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions managed/src/main/resources/health/node_health.py.template
@@ -2144,7 +2144,13 @@ def _timesyncd_get_clock_drift_ms():
     # to be correct. We will return 0 here and handle a not-synced system with other errors.
     return 0

+# return 1 if ntp service status is good, 0 otherwise.
+# A good status is both having timedatectl show the system clock is "in sync" and
+# having the specific ntp service (chrony, ntpd, or timesyncd) be running.
 def get_ntp_service_status():
+    # First we check if the clock is synced, and fail if its not
+    if get_timedatectl_sync() == 0:
+        return 0
     if chrony_exists():
         return 1 if is_service_running("chronyd.service") else 0
     elif ntp_exists():
@@ -2153,7 +2159,7 @@ def get_ntp_service_status():
         return 1 if ntp_running or ntpd_running else 0
     elif timesyncd_exists():
         return get_timedatectl_status()
-    logging.error("unknown time service: must be ntp(d) or chrony")
+    logging.error("unknown time service: must be ntp(d), chrony, or systemd-timesyncd")
     return 0

 def chrony_exists():
@@ -2171,12 +2177,15 @@ def timesyncd_exists():
     timesyncd_out = check_output("systemctl status systemd-timesyncd", env)
     return "Error" not in timesyncd_out

-# Returns 1 if timesyncd is running and synced, 0 otherwise.
+# Returns 1 if timesyncd is running 0 otherwise.
 def get_timedatectl_status():
     # timesyncd is not running
     if not is_service_running("systemd-timesyncd.service"):
         return 0
+    return 1

+# Returns 1 if system clock is synchronized 0 otherwise.
+def get_timedatectl_sync():
     env = os.environ.copy()
     out = check_output("timedatectl status", env)
     if "System clock synchronized: yes" in out:
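
The diff view is cut off before the end of `get_timedatectl_sync()`, so the rest of that function is not shown here. As a rough, standalone sketch of the same idea (not the commit's code), the sync check can be split into a parser that takes the `timedatectl status` text as an argument and a thin wrapper that runs the command. The names `clock_synced_from_text`, `clock_synced`, and `SYNC_LABELS` are hypothetical. Accepting the `NTP synchronized` label is an added assumption for hosts whose systemd prints that label instead; the commit only matches `System clock synchronized: yes`. The wrapper assumes Python 3.7 or newer.

```python
import subprocess

# Field labels accepted by this sketch. The commit matches only the first;
# the second is an assumption for hosts that print the older label.
SYNC_LABELS = ("System clock synchronized", "NTP synchronized")


def clock_synced_from_text(timedatectl_out: str) -> int:
    """Return 1 if the given timedatectl output reports a synced clock, else 0."""
    for line in timedatectl_out.splitlines():
        # Split "   System clock synchronized: yes" into label and value.
        label, sep, value = line.partition(":")
        if sep and label.strip() in SYNC_LABELS:
            return 1 if value.strip() == "yes" else 0
    # Field not present at all: treat as not synced.
    return 0


def clock_synced() -> int:
    """Run `timedatectl status` and parse it; 0 if the command is unavailable."""
    try:
        result = subprocess.run(["timedatectl", "status"],
                                capture_output=True, text=True, check=False)
    except OSError:
        return 0
    return clock_synced_from_text(result.stdout)
```

Keeping the parser separate from the subprocess call lets the string matching be tested with canned output on machines that have no systemd.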
