What happened:
@unmarshall and I observed that the Druid end-to-end tests were failing because the restoration process encountered an error. Backup-restore applied delta events incorrectly, resulting in the following error:
```
time="2025-02-17T09:45:57Z" level=error msg="Failed initialization: error while restoring corrupt data: failed to restore snapshot: mismatched event revision while applying delta snapshot, expected 11 but applied 15 " actor=backup-restore-server
```
Upon further investigation, we found that PR #29 introduced separate handling for applying the first delta snapshot.
This handling addresses the overlap of events between the full snapshot and the first delta snapshot. However, it does not account for a complete overlap, i.e. the case where every event in the delta snapshot is already contained in the full snapshot.
As a result, backup-restore reapplies etcd events that should not be reapplied, which makes the restoration verification check fail and ultimately causes the restoration to fail:
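```go
// Excerpt from restorer.go; the surrounding lines of the function are not shown.
if err != nil {
	return fmt.Errorf("failed to connect to etcd KV client: %v", err)
}
// Verify that etcd ended up at the delta snapshot's last revision.
etcdRevision := getResponse.Header.GetRevision()
if snap.LastRevision != etcdRevision {
	return fmt.Errorf("mismatched event revision while applying delta snapshot, expected %d but applied %d ", snap.LastRevision, etcdRevision)
}
```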
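One way the complete-overlap case could be handled is sketched below. This is a minimal sketch, not the code in `restorer.go` and not a proposed patch: `Event`, `DeltaSnapshot` and `eventsToApply` are hypothetical, simplified names. The idea is to compare the first delta snapshot with the full snapshot's last revision, skip the whole delta snapshot when it is completely covered, and otherwise drop only the events that are already contained in the full snapshot.

```go
package main

import "fmt"

// Event is a hypothetical, simplified stand-in for one etcd event recorded in
// a delta snapshot; it is not the type used by etcd-backup-restore.
type Event struct {
	Revision int64
	Key      string
	Value    string
}

// DeltaSnapshot is a hypothetical, simplified delta snapshot covering
// revisions StartRevision..LastRevision (inclusive).
type DeltaSnapshot struct {
	StartRevision int64
	LastRevision  int64
	Events        []Event
}

// eventsToApply returns only the events of the first delta snapshot that are
// newer than the full snapshot's last revision.
func eventsToApply(fullSnapLastRevision int64, delta DeltaSnapshot) []Event {
	// Complete overlap: every event is already contained in the full snapshot,
	// so nothing from this delta snapshot must be applied.
	if delta.LastRevision <= fullSnapLastRevision {
		return nil
	}
	var pending []Event
	for _, ev := range delta.Events {
		// Partial overlap: skip events already contained in the full snapshot.
		if ev.Revision > fullSnapLastRevision {
			pending = append(pending, ev)
		}
	}
	return pending
}

func main() {
	// Revisions from the reproduction example in this issue:
	// full snapshot up to revision 11, delta snapshot covering 8..11.
	delta := DeltaSnapshot{StartRevision: 8, LastRevision: 11}
	for rev := int64(8); rev <= 11; rev++ {
		delta.Events = append(delta.Events, Event{Revision: rev, Key: fmt.Sprintf("key-%d", rev)})
	}
	fmt.Println(len(eventsToApply(11, delta)))
}
```

With the example revisions the program prints 0: no event from the delta snapshot is applied, so etcd would stay at revision 11, matching the delta snapshot's `LastRevision`. The early return makes the complete-overlap case explicit; the filtering loop alone would also yield no events here.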
How to reproduce it (as minimally and precisely as possible):
1. Start an etcd server.
2. Put some dummy data.
3. Start backup-restore and make sure it takes a full snapshot and a delta snapshot that completely overlap each other; the timestamp of the overlapping delta snapshot should be later than that of the latest full snapshot.

For example, here a full snapshot covering revisions 0 to 11 completely overlaps with a delta snapshot covering revisions 8 to 11, with the same timestamp.
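The numbers in the reported error are consistent with this scenario, though this does not prove the Druid failure had exactly this snapshot layout. The minimal model below assumes that restoring the full snapshot leaves etcd at revision 11 and that each of the four delta events (revisions 8 to 11) is a separate write that bumps the etcd revision by one when reapplied; `revisionAfterRestore` is a hypothetical helper, not part of etcd-backup-restore. Reapplying the overlapping events gives 11 + 4 = 15, matching "expected 11 but applied 15", while skipping them keeps the revision at 11.

```go
package main

import "fmt"

// revisionAfterRestore is a hypothetical model of the etcd revision after the
// first delta snapshot is applied on top of a restored full snapshot. It
// assumes each applied event is a separate write that increments the etcd
// revision by exactly one.
func revisionAfterRestore(fullSnapLastRevision int64, deltaRevisions []int64, skipOverlap bool) int64 {
	rev := fullSnapLastRevision
	for _, r := range deltaRevisions {
		if skipOverlap && r <= fullSnapLastRevision {
			continue // already contained in the full snapshot
		}
		rev++ // each (re)applied event becomes a new etcd revision
	}
	return rev
}

func main() {
	// Delta snapshot covering revisions 8..11.
	delta := []int64{8, 9, 10, 11}
	// LastRevision of the delta snapshot, i.e. what the verification check expects.
	const expected = 11

	buggy := revisionAfterRestore(11, delta, false)
	fixed := revisionAfterRestore(11, delta, true)
	fmt.Printf("reapplying overlap: expected %d but applied %d\n", expected, buggy)
	fmt.Printf("skipping overlap: expected %d but applied %d\n", expected, fixed)
}
```

The first line of output reproduces the revision pair from the error message (11 and 15); the second shows that skipping the fully overlapped events lets the revision check pass.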
ishan16696 changed the title from "Apply of overlapping of delta snapshot is wrong" to "Handling of overlapping of first delta snapshot with full snapshot isn't done correctly." on Feb 17, 2025.
How to categorize this issue?
/area disaster-recovery
/kind bug
Relevant code for the first-delta-snapshot handling: `etcd-backup-restore/pkg/snapshot/restorer/restorer.go`, lines 512 to 522 at commit 44b7d1b.
Relevant code for the revision check: `etcd-backup-restore/pkg/snapshot/restorer/restorer.go`, lines 649 to 658 at commit 44b7d1b.
Anything else we need to know?:
We have seen several occurrences of this issue in the past, for example #763.