Retrying sending requests to a stuck TiKV may cost too much time #50432

MyonKeminta · 2024-01-15T09:23:26Z

Bug Report

In client-go, when sending a request to TiKV fails with an RPC error, the request can be retried several times. The number of attempts is mostly controlled by a hard-coded constant:

const maxReplicaAttempt = 10

This looks reasonable at first glance. However, sometimes an RPC error is returned only after the request has been blocked for a long time and then interrupted by a timeout. If we retry 10 times, the total can be 10 times the timeout (ReadTimeoutShort, 30s, or ReadTimeoutMedium, 60s). We currently suspect that this behavior makes TiDB's service recovery time unnecessarily long when one of the TiKV nodes encounters a problem.

ti-chi-bot bot added the may-affects-5.4 (This bug maybe affects 5.4.x versions.) and may-affects-6.1 labels Jan 16, 2024

MyonKeminta changed the title ~~Too much retries when sending requests to a stuck TiKV~~ Retrying sending requests to a stuck TiKV may cost too much time Jan 16, 2024

MyonKeminta added the affects-6.1 (This bug affects the 6.1.x (LTS) versions.) and affects-5.4 (This bug affects the 5.4.x (LTS) versions.) labels and removed the may-affects-6.1 and may-affects-5.4 labels Jan 16, 2024

ti-chi-bot bot closed this as completed in #50506 Jan 23, 2024

ti-chi-bot bot pushed a commit that referenced this issue Jan 23, 2024

store/tikv: Update client-go to fix issues about retrying for too long time on RPC timeout and unnecessary backoff on NotLeader errors (#50506) close #50432

8a53c48

jebter added the sig/transaction SIG:Transaction label Jan 31, 2024
