Enhance condition to take full snapshot during startup. #570

Closed
ishan16696 opened this issue Jan 4, 2023 · 3 comments · Fixed by #574
Assignees: ishan16696
Labels: area/backup, area/monitoring, kind/enhancement, priority/2, status/closed
Milestone: v0.22.0

Comments

@ishan16696 (Member)

How to categorize this issue?
/area monitoring
/area backup
/kind enhancement

What would you like to be added:
During startup, etcd-backup-restore should also consider the configured FullSnapshotSchedule along with the timestamp of the last full snapshot, so that it doesn't miss any full snapshot.
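
A minimal sketch of the proposed condition, assuming Go and a standard 5-field cron parser such as github.com/robfig/cron/v3; the package and function names below are illustrative, not the actual etcd-backup-restore code:

```go
package startup

import (
	"time"

	cron "github.com/robfig/cron/v3"
)

// requireFullSnapshotAtStartup is a hypothetical helper: it reports whether a
// full snapshot should be taken immediately on startup, considering both the
// age of the last full snapshot and the configured FullSnapshotSchedule.
func requireFullSnapshotAtStartup(fullSnapshotSchedule string, lastFullSnapshot, now time.Time) (bool, error) {
	// No full snapshot exists yet: take one right away.
	if lastFullSnapshot.IsZero() {
		return true, nil
	}
	// Existing behaviour: take one if ~23.5h have passed since the last full snapshot.
	if now.Sub(lastFullSnapshot) >= 23*time.Hour+30*time.Minute {
		return true, nil
	}
	// Proposed addition: also consult the cron schedule. If the next scheduled
	// run after the last full snapshot already lies in the past, a scheduled
	// full snapshot was missed (e.g. while the cluster was hibernated).
	schedule, err := cron.ParseStandard(fullSnapshotSchedule)
	if err != nil {
		return false, err
	}
	if schedule.Next(lastFullSnapshot).Before(now) {
		return true, nil
	}
	return false, nil
}
```

With such a check, a wake-up after a missed maintenance-window run would trigger a full snapshot immediately instead of waiting for the next occurrence of the --schedule.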

Why is this needed:
Currently, the alert KubeEtcdFullBackupFailed is calculated based on the timestamp of the last full snapshot, i.e. it checks whether backup-restore has taken a full snapshot within the last 24h. However, we have observed false KubeEtcdFullBackupFailed alerts for some shoots when the shoot is hibernated and woken up within 24h, while determineBackupSchedule calculates the FullSnapshotSchedule on the basis of the maintenance window.

Take this scenario (a concrete timeline with illustrative timestamps follows the list):

  1. Suppose a shoot has a maintenance window m1.
  2. etcd-backup-restore took a full snapshot at timestamp t1.
  3. The cluster was then hibernated at t2 (before the maintenance window, i.e. t2 < m1) and woken up again at t3 (t3 > t2 > t1 && t3 > m1).
  4. When the cluster was woken up, t3 - t1 < 24h (24h had not passed yet), so backup-restore took no new full snapshot, since it only does so 23.5h after the timestamp of the last full snapshot.
  5. Once 24h had passed, at t4 the alert KubeEtcdFullBackupFailed checked the timestamp of the last full snapshot, found t4 - t1 > 24h, and was raised, while backup-restore was still waiting to take a full snapshot according to the --schedule passed to it, which is calculated on the basis of the shoot's maintenance window (m1 of the next day) and had not yet been reached.
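
To make the timeline concrete with illustrative timestamps (not taken from an actual cluster): say t1 = Monday 10:00 (last full snapshot), m1 = Monday 22:00 (maintenance window), t2 = Monday 12:00 (hibernation) and t3 = Tuesday 08:00 (wake-up). At t3, t3 - t1 = 22h < 23.5h, so backup-restore takes no startup snapshot and the next full snapshot per --schedule is only due at Tuesday 22:00 (m1 of the next day). At t4 = Tuesday 10:00, t4 - t1 > 24h and KubeEtcdFullBackupFailed fires, although the next scheduled full snapshot is still roughly 12h away. With the proposed schedule-aware check, the missed Monday 22:00 run would already be detected at t3 and a full snapshot taken right at startup.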

/cc @timuthy @shreyas-s-rao

@gardener-robot added the area/backup, area/monitoring and kind/enhancement labels Jan 4, 2023
@ishan16696 self-assigned this Jan 4, 2023
@abdasgupta added the priority/2 label Jan 6, 2023
@ishan16696 added this to the v0.22.0 milestone Jan 30, 2023
@gardener-robot added the status/closed label Feb 13, 2023
@shreyas-s-rao (Collaborator)

@ishan16696 thanks for resolving this issue. Can you please also raise another issue to calculate the previous full snapshot scheduled time based purely on the cron schedule, as discussed in the #574 (review) thread?
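
One illustrative way to derive the previous scheduled time purely from the cron schedule (a sketch building on the cron.Schedule interface from github.com/robfig/cron/v3, not the implementation chosen in the follow-up): step the schedule forward from a point sufficiently far in the past and keep the last occurrence that is still before "now".

```go
// previousScheduledTime is a hypothetical helper: it returns the most recent
// scheduled occurrence before "now", searching back at most "lookback".
func previousScheduledTime(schedule cron.Schedule, now time.Time, lookback time.Duration) (time.Time, bool) {
	t := schedule.Next(now.Add(-lookback))
	if !t.Before(now) {
		return time.Time{}, false // no occurrence within the lookback window
	}
	for {
		next := schedule.Next(t)
		if !next.Before(now) {
			return t, true
		}
		t = next
	}
}
```

For a daily FullSnapshotSchedule a lookback of 24–25h is enough; the last full snapshot timestamp can then simply be compared against the returned time.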

@ishan16696 (Member, Author)

Hi @shreyas-s-rao,
I have created a follow-up issue: #587

@shreyas-s-rao (Collaborator)

Thanks! 🚀
