store/tikv: fix a concurrency bug that may cause the batchClient timeout #22239
No release note, Please follow https://github.com/pingcap/community/blob/master/contributors/release-note-checker.md
/bench
LGTM
LGTM
/merge
/run-all-tests
Signed-off-by: ti-srebot <[email protected]>
cherry pick to release-3.0 in PR #22335
Signed-off-by: ti-srebot <[email protected]>
cherry pick to release-4.0 in PR #22336
Signed-off-by: ti-srebot <[email protected]>
cherry pick to release-5.0-rc in PR #22337
}

// TiDB will not send batch commands to TiFlash, to resolve the conflict with Batch Cop Request.
enableBatch := req.StoreTp != kv.TiDB && req.StoreTp != kv.TiFlash
c.recycleMu.RLock()
Will this impact the send speed?
…out (#22239) (#22337) Signed-off-by: ti-srebot <[email protected]>
…out (#22239) (#22336) Signed-off-by: ti-srebot <[email protected]>
What problem does this PR solve?
closes #22334
Problem Summary:
The recycleIdleConnArray() logic has a bug: when one goroutine calls getConnArray() while another goroutine recycles the idle connection, the first goroutine may get a stale batchConn that is already closed. sendBatchRequest() using that stale batchConn would block until timeout.
What is changed and how it works?
What's Changed:
This is not enough to protect the conn from being recycled and closed.
Now the whole sending process is protected by the read lock, and modifying the conn map requires the write lock.
How it Works:
As long as a sending operation holds the read lock, the connection recycle operation has to wait before it can obtain the write lock.
Related changes
Maybe we can cherry-pick it to 5.0; this bug is rarely seen in production environments.
Check List
Tests
Release note