Enhance condition to take full snapshot during startup. #570
Labels: area/backup, area/monitoring, kind/enhancement, priority/2, status/closed
How to categorize this issue?
/area monitoring
/area backup
/kind enhancement
What would you like to be added:
During startup, etcd-backup-restore should also consider the configured `FullSnapshotSchedule` along with the timestamp of the last full snapshot, so that it does not miss any full snapshot.
Why is this needed:
Currently, the alert `KubeEtcdFullBackupFailed` is calculated from the last full snapshot timestamp: it checks whether backup-restore has taken a full snapshot within the last 24h. However, we have observed false `KubeEtcdFullBackupFailed` alerts for some shoots when a shoot is hibernated and woken up within 24h, and `determineBackupSchedule` calculates the `FullSnapshotSchedule` on the basis of the maintenance window.

Take this scenario:
1. Suppose the shoot's maintenance window is at `m1`.
2. `etcd-backup-restore` took a full snapshot at timestamp `t1`.
3. The shoot was hibernated at `t2` (before the maintenance window, i.e. `t2 < m1`) and woken up again at `t3` (`t3 > t2 > t1` and `t3 > m1`).
4. `t3 - t1 < 24h` (24h has not passed yet), so backup-restore takes no new full snapshot, because it calculates 23.5h from the timestamp of the last full snapshot.
5. After `24h`, at `t4`, the alert `KubeEtcdFullBackupFailed` checks the timestamp of the last full snapshot and finds `t4 - t1 > 24h`. The alert is raised, while backup-restore is still waiting to take a full snapshot according to the `--schedule` passed to it, which was calculated from the shoot's maintenance window (`m1` of the next day) and has not been reached yet.

/cc @timuthy @shreyas-s-rao