infinite follower/learner retry when network partition only between leader and follower/learner #17442

Closed
lysu opened this issue May 27, 2020 · 0 comments · Fixed by #17441
Labels
severity/major type/bug The issue is confirmed as a bug.

lysu commented May 27, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Set up a network partition between the TiKV leader and the TiKV followers/TiFlash, but keep the followers/TiFlash reachable from some TiDB servers.

2. What did you expect to see? (Required)

Follower/learner read SQL should be handled by retrying on another follower or TiFlash peer that is accessible and caught up.

3. What did you see instead (Required)

Infinite retries, and a KV unavailable error is reported.

4. Affected version (Required)

4.0.0.rc2

5. Root Cause Analysis

In #16933 we introduced a mechanism that rechecks store liveness when sending a request fails. It works well for leader-based requests.

For follower or learner requests, however, it can cause infinite retries.

When there is a network partition between the leader and the followers/learners, but the followers and learners are still accessible from the TiDB server, the followers and learners return a timeout error because they cannot catch up with the leader. The store liveness recheck still succeeds, so the request keeps being retried against the same peer. In this situation it is better to retry other peers immediately.
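The retry decision described above could be sketched roughly as follows. This is only an illustration of the idea, not the actual change: every name here (`peerRole`, `nextAction`, `replica`, `readFromReplicas`) is hypothetical and not taken from the TiDB code base, and the timeout is simulated with a `caughtUp` flag.

```go
package main

// All names below are hypothetical and only illustrate the idea.

// peerRole describes which kind of replica a request targets.
type peerRole int

const (
	roleLeader peerRole = iota
	roleFollower
	roleLearner
)

// action is what the client does after a failed send.
type action int

const (
	retrySamePeer  action = iota // keep using the current peer
	retryOtherPeer               // switch to another replica right away
	markStoreDown                // treat the store as unreachable
)

// nextAction decides how to react to a failed send.
// storeAlive is the result of the store liveness recheck.
func nextAction(role peerRole, storeAlive bool) action {
	if !storeAlive {
		return markStoreDown
	}
	if role == roleLeader {
		// There is only one leader, so retrying it is reasonable.
		return retrySamePeer
	}
	// A live follower/learner that timed out may be partitioned from the
	// leader, so retrying it could loop forever. Try another peer instead.
	return retryOtherPeer
}

// replica is a simulated follower/learner peer.
type replica struct {
	name     string
	caughtUp bool // false when the peer is partitioned from the leader
}

// readFromReplicas tries each replica at most once and returns the name of
// the first one that can serve the read, or "" if none can.
func readFromReplicas(replicas []replica) string {
	for _, r := range replicas {
		if r.caughtUp {
			return r.name
		}
		// Simulated timeout: the store is alive, so consult nextAction and
		// only continue to the next replica if it says to switch peers.
		if nextAction(roleFollower, true) != retryOtherPeer {
			return ""
		}
	}
	return ""
}
```

With replicas `{"tikv-2", false}` and `{"tiflash-1", true}`, `readFromReplicas` skips the partitioned follower after one failed attempt and is served by `tiflash-1`, instead of retrying `tikv-2` indefinitely.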
