Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing delete event on watch opened on same revision as compaction request #19179

Closed
4 tasks
serathius opened this issue Jan 13, 2025 · 14 comments
Closed
4 tasks
Labels
area/robustness-testing priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release/v3.4 release/v3.5 type/bug

Comments

@serathius
Copy link
Member

serathius commented Jan 13, 2025

Bug report criteria

What happened?

Starting from 9 January we started getting failures on presubmit tests.

Presubmit history goes up to December 31, with failures only starting on implying the issue is new.
image

Failues are due to resumable guarantee being broken

 logger.go:146: 2025-01-10T22:33:35.465Z	ERROR	Broke watch guarantee	{"guarantee": "resumable", "client": 4, "request": {"Key":"/registry/pods/","Revision":409,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":true}, "got-event": {"Type":"delete-operation","Key":"/registry/pods/default/jCocA","Value":{"Value":"","Hash":0},"Revision":410,"IsCreate":false,"PrevValue":{"Value":{"Value":"143","Hash":0},"ModRevision":146}}, "want-event": {"Type":"delete-operation","Key":"/registry/pods/default/OL767","Value":{"Value":"","Hash":0},"Revision":409,"IsCreate":false}}
    validate.go:48: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window

From history visualizations I have seen it follows pattern:

  • Delete operation on rev X
  • Compect on Rev X
  • Etcd crashes on Rev X
  • Watch opened on Rev X

image

What did you expect to happen?

Resumable guarantee should not be broken.

How can we reproduce it (as minimally and precisely as possible)?

Didn't yet managed to reproduce it locally.

Anything else we need to know?

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-amd64/1877585036438409216
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877364764741472256
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877466683589791744
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877575502907052032
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877585037260492800
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877678423757819904
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877842586459181056
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1878101264374435840
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1878113522366287872
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1878196741560340480

Etcd version (please run commands below)

I was not able to reproduce the issue outside of CI, so I haven't confirmed other versions

Etcd configuration (command line flags or environment variables)

N/A

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

N/A

Relevant log output

No response

@serathius serathius added type/bug area/robustness-testing priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jan 13, 2025
@serathius
Copy link
Member Author

@serathius
Copy link
Member Author

@ahrtr
Copy link
Member

ahrtr commented Jan 13, 2025

Failues are due to resumable guarantee being broken

 logger.go:146: 2025-01-10T22:33:35.465Z	ERROR	Broke watch guarantee	{"guarantee": "resumable", "client": 4, "request": {"Key":"/registry/pods/","Revision":409,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":true}, "got-event": {"Type":"delete-operation","Key":"/registry/pods/default/jCocA","Value":{"Value":"","Hash":0},"Revision":410,"IsCreate":false,"PrevValue":{"Value":{"Value":"143","Hash":0},"ModRevision":146}}, "want-event": {"Type":"delete-operation","Key":"/registry/pods/default/OL767","Value":{"Value":"","Hash":0},"Revision":409,"IsCreate":false}}
    validate.go:48: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window
 "got-event": {"Type":"delete-operation","Key":"/registry/pods/default/jCocA","Value":{"Value":"","Hash":0},"Revision":410,"IsCreate":false,"PrevValue":{"Value":{"Value":"143","Hash":0},"ModRevision":146}}, 
 "want-event": {"Type":"delete-operation","Key":"/registry/pods/default/OL767","Value":{"Value":"","Hash":0},"Revision":409,"IsCreate":false}}

Both 409 and 410 are deletion (tombstone) revisions, and 410 was compacted (see log below), so it's expected that the first watched event was 410 instead of 409. Probably a test issue?

/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28628): {"level":"info","ts":"2025-01-10T05:40:37.527993Z","caller":"mvcc/index.go:194","msg":"compact tree index","revision":410}
......
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.582396Z","caller":"mvcc/kvstore.go:407","msg":"kvstore restored","current-rev":410}
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.583902Z","caller":"mvcc/kvstore.go:416","msg":"resume scheduled compaction","scheduled-compact-revision":410}
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.583952Z","caller":"mvcc/index.go:194","msg":"compact tree index","revision":410}
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.587428Z","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":410,"took":"3.394801ms","hash":2280919731,"current-db-size-bytes":61440,"current-db-size":"61 kB","current-db-size-in-use-bytes":36864,"current-db-size-in-use":"37 kB"}
/home/prow/go/src/github.com/etcd-io/etcd/bin/etcd (TestRobustnessRegressionIssue17780-test-0) (28642): {"level":"info","ts":"2025-01-10T05:40:37.587457Z","caller":"mvcc/kvstore.go:250","msg":"previous compaction was interrupted, skip storing compaction hash value"}

@serathius
Copy link
Member Author

Both 409 and 410 are deletion (tombstone) revisions, and 410 was compacted (see log below), so it's expected that the first watched event was 410 instead of 409. Probably a test issue?

I don't follow, compaction is expected to be exclusive (compaction on rev 409, should preserve event with rev 409), while watch is expected to be inclusive (watch from rev 409, should include event on rev 409).

This can be validated by running following commands.

etcd $ ./bin/etcdctl put a 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":2}}
etcd $ ./bin/etcdctl del a -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2},"deleted":1}
etcd $ ./bin/etcdctl compact 3 -w json
compacted revision 3
etcd $ ./bin/etcdctl watch --rev 3 a -w json
{"Header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2},"Events":[{"type":1,"kv":{"key":"YQ==","mod_revision":3}}],"CompactRevision":0,"Canceled":false,"Created":false}

@ahrtr
Copy link
Member

ahrtr commented Jan 13, 2025

I don't follow, compaction is expected to be exclusive (compaction on rev 409, should preserve event with rev 409)

I was saying actually the compacted revision was 410 instead of 409. So it's expected behaviour that the first watched event was 410. Somehow robustness test was expecting to see 409.

To clarify, I got the log from the first link (pasted below) you provided.

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-amd64/1877585036438409216

@ahrtr
Copy link
Member

ahrtr commented Jan 13, 2025

This can be validated by running following commands.

etcd $ ./bin/etcdctl put a 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":2}}
etcd $ ./bin/etcdctl del a -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2},"deleted":1}
etcd $ ./bin/etcdctl compact 3 -w json
compacted revision 3
etcd $ ./bin/etcdctl watch --rev 3 a -w json
{"Header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2},"Events":[{"type":1,"kv":{"key":"YQ==","mod_revision":3}}],"CompactRevision":0,"Canceled":false,"Created":false}

Yes, I am aware of it. It's exactly what we fixed in #18274. See also my previous summary #18089 (comment)

@fuweid
Copy link
Member

fuweid commented Jan 13, 2025

Trying to reproduce this case https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-amd64/1877585036438409216 ("--experimental-compaction-batch-limit=300")

 2            91      norm    header:<ID:15517315581963036513 > txn:<compare:<target:MOD key:"/registry/pods/default/VrSyx" mod_revision:0 > success:<request_put:<key:"/registry/pods/default/VrSyx" value:"86" > > >

2           413      norm    header:<ID:15517315581963036837 > txn:<compare:<target:MOD key:"/registry/pods/default/VrSyx" mod_revision:88 > success:<request_delete_range:<key:"/registry/pods/default/VrSyx" > > failure:<request_range:<key:"/registry/pods/default/VrSyx" > > >

First batch compaction will delete created revision. And second round will trigger panic.
And then etcd restores. If restore code doesn't insert this key, it seems like we will miss this key in index and then compaction will remove it as well.

} else if !isTombstone(rkv.key) {

@fuweid
Copy link
Member

fuweid commented Jan 13, 2025

I think I reproduced it.

# terminal 1
$ GOFAIL_HTTP="127.0.0.1:1234" bin/etcd --experimental-compaction-batch-limit=5
# terminal 2
curl http://127.0.0.1:1234/compactBeforeSetFinishedCompact -XPUT -d'panic()'

$ ./bin/etcdctl put a 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":3}}

➜  ./bin/etcdctl put b 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":3}}

➜  ./bin/etcdctl put c 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":4,"raft_term":3}}

➜  e./bin/etcdctl put d 1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":5,"raft_term":3}}

➜  ./bin/etcdctl del a -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":6,"raft_term":3},"deleted":1}

➜  ./bin/etcdctl compact 6 --physical=true
{"level":"warn","ts":"2025-01-13T16:26:22.138400-0500","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002b90e0/127.0.0.1:2379","method":"/etcdserverpb.KV/Compact","attempt":0,"error":"rpc error: code = Unavailable desc = error reading from server: EOF"}
Error: rpc error: code = Unavailable desc = error reading from server: EOF

# go to terminal 1 to restart etcd

➜  ./bin/etcdctl watch "a" --prefix --rev=6
(empty)

cc @ahrtr @serathius

I think we should update logic at restore

} else if !isTombstone(rkv.key) {

@serathius
Copy link
Member Author

Awesome finding @fuweid, any guess why the issue started showing only recently (9 Jan) and was not visible before? Would be good to know to improve robustness test.

@serathius serathius changed the title TestRobustnessRegression/Issue17780 is failing with broken resumable watch guarantee Missing delete event on watch opened on same revision as compaction request Jan 14, 2025
@serathius
Copy link
Member Author

I was saying actually the compacted revision was 410 instead of 409. So it's expected behaviour that the first watched event was 410. Somehow robustness test was expecting to see 409.

Right, but fact that etcd didn't reject watch on compacted revision means that at that time this revision was not yet compacted.

@ahrtr
Copy link
Member

ahrtr commented Jan 14, 2025

First batch compaction will delete created revision. And second round will trigger panic. And then etcd restores. If restore code doesn't insert this key, it seems like we will miss this key in index and then compaction will remove it as well.

} else if !isTombstone(rkv.key) {

It's a valid point. Thanks for the catch! I think it's an edge case that we missed in #18274. The fix is easy, but it might take some time to add test to prevent any potential impact and fix any broken test cases.

This isn't a regression, also it's low possibility to happen in K8s (see #18089 (comment)). So I suggest that we don't block the release of 3.5.18.

I see another related potential issue. When etcdserver gets started, it will finish previous uncompleted compaction, but it's executed async. In theory, there is a very small window that the finishedCompactRev doesn't match the real compacted revisions. So a watch client may not get an ErrCompacted response, but it should. Accordingly the watch server may silently drop some events. Again, it's hard to reproduce, also usually it's unlikely to see it in practice. It's OK to fix it separately or together with this one.

if _, err := s.compactLockfree(scheduledCompact); err != nil {

serathius added a commit to serathius/etcd that referenced this issue Jan 16, 2025
Signed-off-by: Marek Siarkowicz <[email protected]>
serathius added a commit to serathius/etcd that referenced this issue Jan 16, 2025
Signed-off-by: Marek Siarkowicz <[email protected]>
@ahrtr
Copy link
Member

ahrtr commented Jan 22, 2025

@fuweid could you update the changelog for both 3.5 and. 3.4? thx

cc @ivanvc @jmhbnz

fuweid added a commit to fuweid/etcd that referenced this issue Jan 22, 2025
fuweid added a commit to fuweid/etcd that referenced this issue Jan 22, 2025
fuweid added a commit to fuweid/etcd that referenced this issue Jan 22, 2025
ahrtr added a commit that referenced this issue Jan 22, 2025
@fuweid
Copy link
Member

fuweid commented Jan 22, 2025

@fuweid could you update the changelog for both 3.5 and. 3.4? thx

cc @ivanvc @jmhbnz

Updated.

@ahrtr
Copy link
Member

ahrtr commented Jan 22, 2025

All done. thx

@ahrtr ahrtr closed this as completed Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/robustness-testing priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release/v3.4 release/v3.5 type/bug
Development

No branches or pull requests

3 participants