Etcd v3.4 broke snapshot.Metadata.Index < db.consistentIndex invariant
#17146
Bug report criteria
What happened?
We had an overloaded etcd member running in a cluster. It started failing liveness probes and was killed. When it was restarted, it immediately failed with
failed to recover v3 backend from snapshot
and started crashlooping.
The warning
failed to find [SNAPSHOT-INDEX].snap.db
comes from etcd/server/etcdserver/api/snap/db.go
Lines 76 to 95 in f7be2df
That code path is executed when the
snapshot.Metadata.Index < db.consistentIndex
invariant is broken, which causes etcd to assume that it crashed while persisting a snapshot downloaded from the leader.
etcd/server/storage/backend.go
Lines 98 to 112 in f7be2df
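To make the failure mode easier to follow, here is a minimal, self-contained sketch of the recovery decision described above. It is not etcd's code: `needsSnapshotDB`, `snapshotDBName`, `recoverBackend`, and `errNoSnapshotDB` are hypothetical names, the 16-digit hex file name is an assumption about how `[SNAPSHOT-INDEX]` is rendered, and treating equal indexes as "db is up to date" is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errNoSnapshotDB is a hypothetical error used only in this sketch.
var errNoSnapshotDB = errors.New("snapshot db file not found")

// snapshotDBName renders "[SNAPSHOT-INDEX].snap.db".
// Assumption: the index is formatted as 16 hex digits.
func snapshotDBName(index uint64) string {
	return fmt.Sprintf("%016x.snap.db", index)
}

// needsSnapshotDB reports whether the raft snapshot is ahead of the
// index persisted in the db, i.e. whether the invariant is broken.
// Assumption: equal indexes count as "db is up to date".
func needsSnapshotDB(snapshotIndex, consistentIndex uint64) bool {
	return snapshotIndex > consistentIndex
}

// recoverBackend returns the path of the snapshot db to open, or an
// empty path when the existing db can be kept as is.
func recoverBackend(snapDir string, snapshotIndex, consistentIndex uint64) (string, error) {
	if !needsSnapshotDB(snapshotIndex, consistentIndex) {
		return "", nil // existing db already covers the snapshot
	}
	// The db is behind the snapshot, so a downloaded snapshot db is
	// expected on disk. If it is missing, recovery cannot proceed.
	name := snapshotDBName(snapshotIndex)
	path := filepath.Join(snapDir, name)
	if _, err := os.Stat(path); err != nil {
		return "", fmt.Errorf("failed to find %s: %w", name, errNoSnapshotDB)
	}
	return path, nil
}

func main() {
	dir, err := os.MkdirTemp("", "snap")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Healthy case: the db is ahead of the snapshot.
	path, err := recoverBackend(dir, 100, 150)
	fmt.Printf("healthy: path=%q err=%v\n", path, err)

	// Reported case: the db is behind the snapshot and no
	// [SNAPSHOT-INDEX].snap.db file exists in the snap directory.
	path, err = recoverBackend(dir, 200, 150)
	fmt.Printf("broken:  path=%q err=%v\n", path, err)
}
```

In the second call the db's index lags the snapshot and no snapshot db file exists, which corresponds to the crashloop reported here.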
However, based on the logs, this was just a normal snapshot, and nothing in the logs points to a snapshot being downloaded. I suspect that the db was somehow not properly flushed to disk.
etcd/server/etcdserver/server.go
Line 2104 in f7be2df
What did you expect to happen?
Etcd should:
How can we reproduce it (as minimally and precisely as possible)?
No repro at this time. I would like to add dedicated failpoints to reproduce this.
Anything else we need to know?
No response
Etcd version (please run commands below)
v3.4.21
Etcd configuration (command line flags or environment variables)
N/A
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
n/a
Relevant log output