Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Set up a network partition between the TiKV leader and the TiKV followers/TiFlash, but keep the followers/TiFlash reachable from some TiDB servers.
2. What did you expect to see? (Required)
Follower/learner read SQL should be handled by retrying on another follower or TiFlash peer that is reachable and has caught up.
3. What did you see instead (Required)
Infinite retries, and a "kv unavailable" error is reported.
4. Affected version (Required)
4.0.0.rc2
5. Root Cause Analysis
In #16933 we introduced a mechanism that rechecks store liveness when sending a request fails. It works well for leader-based requests, but for follower or learner requests it may cause infinite retries.
When there is a network partition between the leader and the followers/learners, while the followers and learners remain reachable from the TiDB server, the followers and learners return a timeout error because they cannot catch up with the leader. The store liveness recheck still succeeds in this case, so it does not help. It is better to retry other peers immediately in this situation.