Retrying sending requests to a stuck TiKV may cost too much time #50432
Labels
affects-5.4
This bug affects the 5.4.x(LTS) versions.
affects-6.1
This bug affects the 6.1.x(LTS) versions.
affects-6.5
This bug affects the 6.5.x(LTS) versions.
affects-7.1
This bug affects the 7.1.x(LTS) versions.
affects-7.5
This bug affects the 7.5.x(LTS) versions.
affects-7.6
severity/major
sig/transaction
SIG:Transaction
type/bug
The issue is confirmed as a bug.
Bug Report
In client-go, when a request sent to TiKV fails with an RPC error, the client can retry it several times. The number of retries is mostly controlled by a hard-coded constant.
This looks reasonable at first glance. However, an RPC error is sometimes returned only after the request has been blocked for a long time and then interrupted by a timeout. With 10 retries, the total cost can reach 10 times the timeout (`ReadTimeoutShort`, 30s, or `ReadTimeoutMedium`, 60s). We currently suspect that this behavior makes TiDB's service recovery time unnecessarily long when one of the TiKV nodes encounters a problem.