Etcd v3.4 broke snapshot.Metadata.Index < db.consistentIndex invariant #17146

serathius commented Dec 19, 2023

What happened?

We had an overloaded etcd member running in a cluster. It started failing liveness probes and was killed. When it was restarted, it immediately failed with "failed to recover v3 backend from snapshot" and started crashlooping.

WARNING 2023-12-16T11:46:09.613Z failed to find [SNAPSHOT-INDEX].snap.db {"error":"snap: snapshot file doesn't exist","snapshot-file-path":"/var/etcd/data/member/snap/0000000050e5a1a3.snap.db","snapshot-index":1357226403}
DEFAULT 2023-12-16T11:46:09.619706767Z panic: failed to recover v3 backend from snapshot

The warning "failed to find [SNAPSHOT-INDEX].snap.db" comes from:

// DBFilePath returns the file path for the snapshot of the database with
// given id. If the snapshot does not exist, it returns error.
func (s *Snapshotter) DBFilePath(id uint64) (string, error) {
	if _, err := fileutil.ReadDir(s.dir); err != nil {
		return "", err
	}
	fn := s.dbFilePath(id)
	if fileutil.Exist(fn) {
		return fn, nil
	}
	if s.lg != nil {
		s.lg.Warn(
			"failed to find [SNAPSHOT-INDEX].snap.db",
			zap.Uint64("snapshot-index", id),
			zap.String("snapshot-file-path", fn),
			zap.Error(ErrNoDBSnapshot),
		)
	}
	return "", ErrNoDBSnapshot
}
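
For reference, the [SNAPSHOT-INDEX].snap.db path in the warning is just the raft snapshot index rendered as 16 zero-padded hex digits. A minimal sketch of that mapping (the helper name is mine; etcd derives the name internally in the snap package):

package main

import (
	"fmt"
	"path/filepath"
)

// snapDBPath is a hypothetical helper that mirrors how the .snap.db file
// name is derived: the snapshot index as 16 zero-padded hex digits.
func snapDBPath(dir string, index uint64) string {
	return filepath.Join(dir, fmt.Sprintf("%016x.snap.db", index))
}

func main() {
	// Index 1357226403 from the "saved snapshot" log line maps to the
	// exact path reported in the warning above.
	fmt.Println(snapDBPath("/var/etcd/data/member/snap", 1357226403))
	// /var/etcd/data/member/snap/0000000050e5a1a3.snap.db
}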

DBFilePath is executed when the snapshot.Metadata.Index < db.consistentIndex invariant is broken, which causes etcd to assume it crashed while persisting a snapshot downloaded from the leader.

// RecoverSnapshotBackend recovers the DB from a snapshot in case etcd crashes
// before updating the backend db after persisting raft snapshot to disk,
// violating the invariant snapshot.Metadata.Index < db.consistentIndex. In this
// case, replace the db with the snapshot db sent by the leader.
func RecoverSnapshotBackend(cfg config.ServerConfig, oldbe backend.Backend, snapshot raftpb.Snapshot, beExist bool, hooks *BackendHooks) (backend.Backend, error) {
	consistentIndex := uint64(0)
	if beExist {
		consistentIndex, _ = schema.ReadConsistentIndex(oldbe.ReadTx())
	}
	if snapshot.Metadata.Index <= consistentIndex {
		return oldbe, nil
	}
	oldbe.Close()
	return OpenSnapshotBackend(cfg, snap.New(cfg.Logger, cfg.SnapDir()), snapshot, hooks)
}

However, based on the logs this was just a normal local snapshot, and there is no log line that would point to a snapshot being downloaded from the leader. I expect that somehow the db was not properly flushed to disk, violating the documented guarantee:

// This guarantees that Backend's consistent_index is >= index of last snapshot.
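
A quick offline check of that guarantee (a sketch, not an official tool): open the db file read-only with bbolt and compare the stored consistent index against the snapshot index from the logs. This assumes the backend stores the consistent index as an 8-byte big-endian value under the meta bucket's consistent_index key, which is my understanding of the backend schema.

package main

import (
	"encoding/binary"
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Open the backend read-only so the check cannot mutate anything.
	db, err := bolt.Open("/var/etcd/data/member/snap/db", 0o400, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const snapshotIndex uint64 = 1357226403 // from the "saved snapshot" log line

	if err := db.View(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return fmt.Errorf("meta bucket not found")
		}
		v := meta.Get([]byte("consistent_index"))
		if len(v) != 8 {
			return fmt.Errorf("unexpected consistent_index value: %x", v)
		}
		ci := binary.BigEndian.Uint64(v)
		fmt.Printf("consistent_index=%d snapshot_index=%d invariant_held=%v\n",
			ci, snapshotIndex, snapshotIndex <= ci)
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}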

What did you expect to happen?

Etcd should:

  • Provide more detailed logging to track this invariant (a rough sketch follows this list).
  • Double-check that the db is properly flushed before saving the snapshot.
  • Consider recovering even if the invariant is broken.
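
As a rough illustration of the first point (a sketch, not a proposed patch), the recovery path could log both indexes before deciding, so a broken invariant leaves a trace before the later panic. This assumes the same packages as the RecoverSnapshotBackend snippet above, plus zap; the function name is mine:

// Sketch only: same logic as RecoverSnapshotBackend above, with both indexes
// logged so a broken invariant is visible in the logs before etcd panics.
func recoverSnapshotBackendWithLogging(cfg config.ServerConfig, oldbe backend.Backend, snapshot raftpb.Snapshot, beExist bool, hooks *BackendHooks) (backend.Backend, error) {
	consistentIndex := uint64(0)
	if beExist {
		consistentIndex, _ = schema.ReadConsistentIndex(oldbe.ReadTx())
	}
	cfg.Logger.Info(
		"recovering v3 backend from snapshot",
		zap.Uint64("snapshot-index", snapshot.Metadata.Index),
		zap.Uint64("backend-consistent-index", consistentIndex),
		zap.Bool("invariant-held", snapshot.Metadata.Index <= consistentIndex),
	)
	if snapshot.Metadata.Index <= consistentIndex {
		return oldbe, nil
	}
	oldbe.Close()
	return OpenSnapshotBackend(cfg, snap.New(cfg.Logger, cfg.SnapDir()), snapshot, hooks)
}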

How can we reproduce it (as minimally and precisely as possible)?

No repro at this time; I would like to add dedicated failpoints to reproduce this (see the sketch below).
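
For context, etcd failpoints are declared as gofail comments in the code and injected when the binary is built with failpoints enabled. The sketch below is purely illustrative: the failpoint name and the stub functions are hypothetical, standing in for the real snapshot-save and backend-commit steps.

package main

import "fmt"

// Stubs standing in for the real snapshot-save and backend-commit steps
// (hypothetical, for illustration only).
func persistRaftSnapshot(index uint64) error { return nil }
func commitBackend(index uint64) error       { return nil }

func saveSnapshot(index uint64) error {
	if err := persistRaftSnapshot(index); err != nil {
		return err
	}
	// gofail: var raftAfterSaveSnapBeforeBackendCommit struct{}
	// With gofail enabled and this failpoint activated with a panic action,
	// the process would die right here: the raft snapshot is already on
	// disk, but consistent_index in the backend has not advanced yet.
	return commitBackend(index)
}

func main() {
	fmt.Println("saveSnapshot:", saveSnapshot(1357226403))
}

If I recall the gofail interface correctly, such a failpoint could then be activated at startup, e.g. GOFAIL_FAILPOINTS='raftAfterSaveSnapBeforeBackendCommit=panic("crash in invariant window")', to force a crash in exactly that window.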

Anything else we need to know?

No response

Etcd version (please run commands below)

v3.4.21

Etcd configuration (command line flags or environment variables)

N/A

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

n/a

Relevant log output

INFO 2023-12-16T11:43:28.194Z triggering snapshot {"local-member-applied-index":1357226403,"local-member-id":"83e6b03f7b740b2f","local-member-snapshot-count":10000,"local-member-snapshot-index":1357216402}
INFO 2023-12-16T11:43:28.203Z saved snapshot {"snapshot-index":1357226403}
INFO 2023-12-16T11:43:28.203Z compacted Raft logs {"compact-index":1357221403}
INFO 2023-12-16T11:43:29.192Z created a new WAL segment {"path":"/var/etcd/data/member/wal/00000000000263cb-0000000050e5a1cd.wal"}
INFO 2023-12-16T11:43:30.407Z purged {"path":"/var/etcd/data/member/snap/0000000000000291-0000000050e4de4e.snap"}
INFO 2023-12-16T11:43:30.469Z purged {"path":"/var/etcd/data/member/wal/00000000000263c6-0000000050e5165c.wal"}
INFO 2023-12-16T11:43:30.557Z trace[2127799171] linearizableReadLoop {"detail":"{readStateIndex:1357226506; appliedIndex:1357226506; }","duration":"114.486174ms","end":"2023-12-16T11:43:30.557Z","start":"2023-12-16T11:43:30.442Z","steps":["trace[2127799171] 'read index received' (duration: 114.481517ms)","trace[2127799171] 'applied index is now lower than readState.Index' (duration: 3.785µs)"]}
INFO 2023-12-16T11:43:31.270Z trace[1399730895] linearizableReadLoop {"detail":"{readStateIndex:1357226530; appliedIndex:1357226530; }","duration":"127.802313ms","end":"2023-12-16T11:43:31.270Z","start":"2023-12-16T11:43:31.142Z","steps":["trace[1399730895] 'read index received' (duration: 127.796686ms)","trace[1399730895] 'applied index is now lower than readState.Index' (duration: 4.432µs)"]}
WARNING 2023-12-16T11:43:46.447Z peer became inactive (message send to peer failed) {"error":"failed to write d2d82cb901dee70e on stream Message (write tcp 192.168.55.3:2380-\u003e192.168.55.10:39104: write: broken pipe)","peer-id":"d2d82cb901dee70e"}
WARNING 2023-12-16T11:43:46.460Z lost TCP streaming connection with remote peer {"local-member-id":"83e6b03f7b740b2f","remote-peer-id":"d2d82cb901dee70e","stream-writer-type":"stream Message"}
INFO 2023-12-16T11:43:46.462Z set message encoder {"from":"83e6b03f7b740b2f","stream-type":"stream Message","to":"83e6b03f7b740b2f"}
INFO 2023-12-16T11:43:46.462Z peer became active {"peer-id":"d2d82cb901dee70e"}
WARNING 2023-12-16T11:43:46.462Z established TCP streaming connection with remote peer {"local-member-id":"83e6b03f7b740b2f","remote-peer-id":"d2d82cb901dee70e","stream-writer-type":"stream Message"}
WARNING 2023-12-16T11:43:46.468Z established TCP streaming connection with remote peer {"local-member-id":"83e6b03f7b740b2f","remote-peer-id":"76c6cefe750f4c59","stream-writer-type":"stream MsgApp v2"}
WARNING 2023-12-16T11:43:46.468Z closed TCP streaming connection with remote peer {"local-member-id":"83e6b03f7b740b2f","remote-peer-id":"76c6cefe750f4c59","stream-writer-type":"stream MsgApp v2"}
INFO 2023-12-16T11:43:46.468Z set message encoder {"from":"83e6b03f7b740b2f","stream-type":"stream MsgApp v2","to":"83e6b03f7b740b2f"}
WARNING 2023-12-16T11:43:46.469Z closed TCP streaming connection with remote peer {"local-member-id":"83e6b03f7b740b2f","remote-peer-id":"d2d82cb901dee70e","stream-writer-type":"stream MsgApp v2"}
INFO 2023-12-16T11:43:46.469Z set message encoder {"from":"83e6b03f7b740b2f","stream-type":"stream Message","to":"83e6b03f7b740b2f"}
WARNING 2023-12-16T11:43:46.469Z established TCP streaming connection with remote peer {"local-member-id":"83e6b03f7b740b2f","remote-peer-id":"d2d82cb901dee70e","stream-writer-type":"stream MsgApp v2"}
WARNING 2023-12-16T11:43:46.469Z closed TCP streaming connection with remote peer {"local-member-id":"83e6b03f7b740b2f","remote-peer-id":"76c6cefe750f4c59","stream-writer-type":"stream Message"}
WARNING 2023-12-16T11:43:46.469Z established TCP streaming connection with remote peer {"local-member-id":"83e6b03f7b740b2f","remote-peer-id":"76c6cefe750f4c59","stream-writer-type":"stream Message"}
INFO 2023-12-16T11:43:46.469Z set message encoder {"from":"83e6b03f7b740b2f","stream-type":"stream MsgApp v2","to":"83e6b03f7b740b2f"}
WARNING 2023-12-16T11:43:46.941Z waiting for ReadIndex response took too long, retrying {"retry-timeout":"500ms","sent-request-id":806017280953788417}
INFO 2023-12-16T11:43:46.943Z trace[137674103] linearizableReadLoop {"detail":"{readStateIndex:1357227804; appliedIndex:1357227804; }","duration":"505.144572ms","end":"2023-12-16T11:43:46.943Z","start":"2023-12-16T11:43:46.438Z","steps":["trace[137674103] 'read index received' (duration: 505.129536ms)","trace[137674103] 'applied index is now lower than readState.Index' (duration: 12.92µs)"]}
INFO 2023-12-16T11:44:36.983Z created a new WAL segment {"path":"/var/etcd/data/member/wal/00000000000263cc-0000000050e5b55d.wal"}
WARNING 2023-12-16T11:44:39.201974Z /health error; QGET failed etcdserver: request timed out (status code 503)
WARNING 2023-12-16T11:44:41.968676Z /health error; QGET failed etcdserver: request timed out (status code 503)
WARNING 2023-12-16T11:44:42.983626Z /health error; QGET failed etcdserver: request timed out (status code 503)
INFO 2023-12-16T11:45:00.473Z purged {"path":"/var/etcd/data/member/wal/00000000000263c7-0000000050e53024.wal"}
INFO 2023-12-16T11:45:01.801Z 83e6b03f7b740b2f [logterm: 659, index: 1357233199, vote: 0] cast MsgPreVote for 76c6cefe750f4c59 [logterm: 659, index: 1357233199] at term 659
INFO 2023-12-16T11:45:01.804Z 83e6b03f7b740b2f became follower at term 660
INFO 2023-12-16T11:45:01.804Z raft.node: 83e6b03f7b740b2f lost leader d2d82cb901dee70e at term 660
INFO 2023-12-16T11:45:01.804Z 83e6b03f7b740b2f [term: 659] received a MsgVote message with higher term from 76c6cefe750f4c59 [term: 660]
INFO 2023-12-16T11:45:01.804Z 83e6b03f7b740b2f [logterm: 659, index: 1357233199, vote: 0] cast MsgVote for 76c6cefe750f4c59 [logterm: 659, index: 1357233199] at term 660
INFO 2023-12-16T11:45:01.806Z raft.node: 83e6b03f7b740b2f elected leader 76c6cefe750f4c59 at term 660
INFO 2023-12-16T11:45:07.688Z received signal; shutting down {"signal":"terminated"}
INFO 2023-12-16T11:45:07.690Z closing etcd server {"advertise-client-urls":["https://127.0.0.1:2379"],"advertise-peer-urls":["https://192.168.55.3:2380"],"data-dir":"/var/etcd/data","name":"etcd-192.168.55.3"}
INFO 2023-12-16T11:45:14.694Z skipped leadership transfer; local server is not leader {"current-leader-member-id":"76c6cefe750f4c59","local-member-id":"83e6b03f7b740b2f"}
INFO 2023-12-16T11:45:18.701Z 83e6b03f7b740b2f [logterm: 660, index: 1357234139, vote: 76c6cefe750f4c59] ignored MsgPreVote from 76c6cefe750f4c59 [logterm: 660, index: 1357234139] at term 660: lease is not expired (remaining ticks: 10)
INFO 2023-12-16T11:45:19.854Z 83e6b03f7b740b2f [logterm: 660, index: 1357234139, vote: 76c6cefe750f4c59] ignored MsgPreVote from d2d82cb901dee70e [logterm: 660, index: 1357234139] at term 660: lease is not expired (remaining ticks: 10)
INFO 2023-12-16T11:45:19.857Z 83e6b03f7b740b2f [logterm: 660, index: 1357234139, vote: 76c6cefe750f4c59] ignored MsgVote from d2d82cb901dee70e [logterm: 660, index: 1357234139] at term 660: lease is not expired (remaining ticks: 10)
INFO 2023-12-16T11:45:19.859Z 83e6b03f7b740b2f [term: 660] received a MsgApp message with higher term from d2d82cb901dee70e [term: 661]
INFO 2023-12-16T11:45:19.859Z 83e6b03f7b740b2f became follower at term 661
INFO 2023-12-16T11:45:19.859Z raft.node: 83e6b03f7b740b2f changed leader from 76c6cefe750f4c59 to d2d82cb901dee70e at term 661

Etcd was then restarted:

WARNING 2023-12-16T11:45:41.837711Z check file permission: directory "/var/etcd/data" exist, but the permission is "drwxr-xr-x". The recommended permission is "-rwx------" to prevent possible unprivileged access to the data.
INFO 2023-12-16T11:45:51.839Z db file is flocked by another process, or taking too long {"path":"/var/etcd/data/member/snap/db","took":"10.000919296s"}
INFO 2023-12-16T11:46:05.371Z recovered v2 store from snapshot {"snapshot-index":1357226403,"snapshot-size":"98 kB"}
INFO 2023-12-16T11:46:05.372Z restored last compact revision {"meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":1316246562}
DEFAULT 2023-12-16T11:46:09.613Z failed to recover v3 backend from snapshot {"error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/etcdserver.NewServer\n\tgo.etcd.io/etcd/etcdserver/server.go:473\ngo.etcd.io/etcd/embed.StartEtcd\n\tgo.etcd.io/etcd/embed/etcd.go:218\ngo.etcd.io/etcd/etcdmain.startEtcd\n\tgo.etcd.io/etcd/etcdmain/etcd.go:302\ngo.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/etcdmain/etcd.go:144\ngo.etcd.io/etcd/etcdmain.Main\n\tgo.etcd.io/etcd/etcdmain/main.go:46\nmain.main\n\tgo.etcd.io/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
WARNING 2023-12-16T11:46:09.613Z failed to find [SNAPSHOT-INDEX].snap.db {"error":"snap: snapshot file doesn't exist","snapshot-file-path":"/var/etcd/data/member/snap/0000000050e5a1a3.snap.db","snapshot-index":1357226403}
DEFAULT 2023-12-16T11:46:09.619706767Z panic: failed to recover v3 backend from snapshot
DEFAULT 2023-12-16T11:46:09.619866958Z panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xc1a07e]

goroutine 1 [running]:
go.etcd.io/etcd/etcdserver.NewServer.func2(0xc000452de0, 0xc000451ca0)
	go.etcd.io/etcd/etcdserver/server.go:345 +0x3e
panic(0xed6ba0, 0xc0003261f0)
	/usr/local/go/src/runtime/panic.go:971 +0x499
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00029e630, 0xc007e7c000, 0x1, 0x1)
	/workspace/louhi_ws/go/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:229 +0x565
go.uber.org/zap.(*Logger).Panic(0xc000114c60, 0x1097425, 0x2a, 0xc007e7c000, 0x1, 0x1)
	/workspace/louhi_ws/go/pkg/mod/go.uber.org/[email protected]/logger.go:225 +0x85
go.etcd.io/etcd/etcdserver.NewServer(0x7ffd9ebad365, 0x11, 0x0, 0x0, 0x0, 0x0, 0xc00016fcb0, 0x1, 0x1, 0xc00016fa70, ...)
	go.etcd.io/etcd/etcdserver/server.go:473 +0x3ab3
go.etcd.io/etcd/embed.StartEtcd(0xc000404000, 0xc00015a580, 0x0, 0x0)
	go.etcd.io/etcd/embed/etcd.go:218 +0xa38
go.etcd.io/etcd/etcdmain.startEtcd(0xc000404000, 0x106c356, 0x6, 0xc000126a01, 0x2)
	go.etcd.io/etcd/etcdmain/etcd.go:302 +0x32
go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2()
	go.etcd.io/etcd/etcdmain/etcd.go:144 +0x2cba
go.etcd.io/etcd/etcdmain.Main()
	go.etcd.io/etcd/etcdmain/main.go:46 +0x37
main.main()
	go.etcd.io/etcd/main.go:28 +0x25

serathius (Member, Author) commented:

Not sure how it's possible that the db had an older consistent index than the snapshot (1357226403), which had been saved only about 1.5 minutes before the shutdown.
Without the db file it might be hard to rule out a hardware issue. The original file was purged by the oncall to restore the cluster. Will try to look for it.

For now my main concern would be adding proper logging to etcd to improve debugging of such issues.
