backoff pd api when fails #6556
Comments
ref tikv/pd#6556, close #14964 Signed-off-by: Ryan Leung <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv/pd#6556, close tikv#14964 Signed-off-by: Ryan Leung <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: tonyxuqqi <[email protected]>
And for TiKV's pd client, @rleungx has increased the retry interval in tikv/tikv#14954. If that does not work, we may need to consider adding backoff and increasing the max retry time. cc @rleungx
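For reference, here is a minimal Go sketch of the two knobs being discussed in this comment: a per-attempt retry interval and a maximum retry count. The `retryConfig` type, the `retry` helper, and the defaults are illustrative assumptions, not the actual tikv pd client code.

```go
// Illustrative sketch of the two retry knobs discussed in this thread: a
// fixed per-attempt interval and a maximum retry count. Names and defaults
// are made up; the real settings live in tikv's pd client.
package main

import (
    "errors"
    "fmt"
    "time"
)

type retryConfig struct {
    Interval   time.Duration // "increase the retry interval" tunes this
    MaxRetries int           // "increase the max retry time" tunes this
}

// retry calls fn until it succeeds or the retry budget is exhausted,
// sleeping a fixed interval between attempts.
func retry(cfg retryConfig, fn func() error) error {
    var err error
    for i := 0; i < cfg.MaxRetries; i++ {
        if err = fn(); err == nil {
            return nil
        }
        time.Sleep(cfg.Interval)
    }
    return fmt.Errorf("gave up after %d attempts: %w", cfg.MaxRetries, err)
}

func main() {
    cfg := retryConfig{Interval: 300 * time.Millisecond, MaxRetries: 4}
    err := retry(cfg, func() error { return errors.New("pd not ready") })
    fmt.Println(err)
}
```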
@nolouch thanks for the reply. The behavior we observe is excessive GetMembers calls, a few thousand QPS, from TiDB components to PD when the PD leader is already having issues. It would be great to add some backoff for this particular scenario: since the PD leader is already struggling at that point, any further load could make things worse.
Got it. On the TiDB side, all requests should already have a backoff mechanism via client-go's backoff, but some paths may not be covered, like the one you mentioned.
BTW, with tikv/tikv#13673 (pd-client v2), we do not need to retry inside the client, so no inner backoff is needed to reduce the requests. That should significantly improve this problem.
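For contrast, a rough sketch of the pd-client v2 idea mentioned above: the client performs a single attempt and surfaces the error, leaving any retry or backoff policy to the caller. `memberClient` and its `GetMembers` method are hypothetical stand-ins, not the real v2 API.

```go
// Hypothetical sketch of the "no inner retry" idea: the client does one
// attempt and surfaces the error; retry and backoff policy belong to callers.
package main

import (
    "context"
    "errors"
    "fmt"
)

var errLeaderUnavailable = errors.New("pd leader unavailable")

// memberClient is an illustrative stand-in for a v2-style PD client.
type memberClient struct{ healthy bool }

// GetMembers issues exactly one request; it never loops internally.
func (c *memberClient) GetMembers(ctx context.Context) ([]string, error) {
    if !c.healthy {
        return nil, errLeaderUnavailable
    }
    return []string{"pd-0", "pd-1", "pd-2"}, nil
}

func main() {
    c := &memberClient{healthy: false}
    // The caller decides what to do with the failure: back off, switch
    // endpoints, or give up. The client itself generates no extra load.
    if _, err := c.GetMembers(context.Background()); err != nil {
        fmt.Println("caller handles:", err)
    }
}
```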
…15191) ref tikv/pd#6556, close #15184 The store heartbeat is reported periodically, so there is no need to retry it - do not retry the store heartbeat - rename `remain_reconnect_count` to `remain_request_count` - fix some metrics Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv/pd#6556, close tikv#15184 Signed-off-by: ti-chi-bot <[email protected]>
…15191) (#15231) ref tikv/pd#6556, close #15184 The store heartbeat is reported periodically, so there is no need to retry it - do not retry the store heartbeat - rename `remain_reconnect_count` to `remain_request_count` - fix some metrics Signed-off-by: ti-chi-bot <[email protected]> Signed-off-by: nolouch <[email protected]> Co-authored-by: ShuNing <[email protected]> Co-authored-by: nolouch <[email protected]>
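A hedged Go sketch of the idea behind the #15191 change described above (the real change is in TiKV's Rust pd client; every name and constant below is illustrative): periodic store heartbeats are sent once and never retried, since the next tick will report again, while other requests consume a bounded `remain_request_count`-style budget.

```go
// Illustrative-only sketch: store heartbeats are periodic, so a failed one is
// simply dropped (the next tick replaces it), while other requests consume a
// bounded request budget instead of retrying indefinitely.
package main

import (
    "errors"
    "fmt"
)

const maxRemainRequestCount = 3 // illustrative budget, not the real constant

type request struct {
    name      string
    retryable bool // false for store heartbeat: the next periodic tick replaces it
}

// send runs do at most once for non-retryable requests, and at most
// maxRemainRequestCount times otherwise.
func send(r request, do func() error) error {
    remain := maxRemainRequestCount
    if !r.retryable {
        remain = 1 // one shot only
    }
    var err error
    for ; remain > 0; remain-- {
        if err = do(); err == nil {
            return nil
        }
    }
    return fmt.Errorf("%s failed, not retrying further: %w", r.name, err)
}

func main() {
    fail := func() error { return errors.New("pd unreachable") }
    fmt.Println(send(request{name: "store_heartbeat", retryable: false}, fail))
    fmt.Println(send(request{name: "get_region", retryable: true}, fail))
}
```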
TiDB side: #6978 tries to reduce the GetMembers requests. Preliminary test results: the RPC calls were reduced from 3.22k to 170 ops, which is related to the number of TiDB instances and the client requests that trigger checkLeader. This reduction could be more significant in larger clusters, and more tests are necessary to ensure that no further issues arise.
close #5739, ref #6556 Signed-off-by: Ryan Leung <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
TiKV side: tikv/tikv#15429 tries to reduce the retries on the TiKV side (with no workload on any TiDB). Detailed tests can be found in the PR; the before/after comparison is shown there.
ref #6556 Signed-off-by: husharp <[email protected]>
ref tikv/pd#6556, close #15428 pd_client: add store-level backoff for the reconnect retries Signed-off-by: nolouch <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
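A hedged sketch of what store-level backoff for reconnect retries could look like (the actual change is in TiKV's `pd_client`; the struct, methods, and durations below are assumptions): each store keeps its own next-allowed-reconnect time, doubling the delay on failure and resetting it on success, so a flapping connection cannot produce a tight reconnect loop.

```go
// Illustrative sketch of per-store reconnect backoff: each store tracks when
// it is next allowed to reconnect and how long to wait, doubling the delay on
// failure and resetting it on success. Not the actual tikv pd_client code.
package main

import (
    "fmt"
    "time"
)

const (
    baseDelay = 100 * time.Millisecond
    maxDelay  = 5 * time.Second
)

type reconnectBackoff struct {
    next  time.Time     // earliest time the next reconnect may start
    delay time.Duration // current backoff delay
}

// allow reports whether a reconnect may be attempted now.
func (b *reconnectBackoff) allow(now time.Time) bool { return !now.Before(b.next) }

// onFailure schedules the next attempt further out, up to maxDelay.
func (b *reconnectBackoff) onFailure(now time.Time) {
    if b.delay == 0 {
        b.delay = baseDelay
    } else if b.delay *= 2; b.delay > maxDelay {
        b.delay = maxDelay
    }
    b.next = now.Add(b.delay)
}

// onSuccess clears the backoff so a healthy store reconnects immediately.
func (b *reconnectBackoff) onSuccess() { *b = reconnectBackoff{} }

func main() {
    var b reconnectBackoff
    now := time.Now()
    for i := 0; i < 4; i++ {
        fmt.Println("allowed:", b.allow(now))
        b.onFailure(now) // simulate a failed reconnect attempt
        now = now.Add(50 * time.Millisecond)
    }
    b.onSuccess()
    fmt.Println("after success, allowed:", b.allow(now))
}
```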
ref tikv/pd#6556, close tikv#15428 Signed-off-by: ti-chi-bot <[email protected]>
…15191) (#15232) ref tikv/pd#6556, close #15184 The store heartbeat is reported periodically, so there is no need to retry it - do not retry the store heartbeat - rename `remain_reconnect_count` to `remain_request_count` - fix some metrics Signed-off-by: ti-chi-bot <[email protected]> Signed-off-by: nolouch <[email protected]> Co-authored-by: ShuNing <[email protected]> Co-authored-by: nolouch <[email protected]>
ref tikv/pd#6556, close #15428 pd_client: add store-level backoff for the reconnect retries Signed-off-by: ti-chi-bot <[email protected]> Signed-off-by: nolouch <[email protected]> Co-authored-by: ShuNing <[email protected]> Co-authored-by: nolouch <[email protected]>
ref tikv/pd#6556, close #15428 pd_client: add store-level backoff for the reconnect retries Signed-off-by: ti-chi-bot <[email protected]> Signed-off-by: nolouch <[email protected]> Co-authored-by: ShuNing <[email protected]> Co-authored-by: nolouch <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Enhancement Task
Particularly, the PD GetMembers request should have a backoff; otherwise it could overload PD and prevent it from recovering from temporary issues.
For example, in v6.5.1: https://github.com/tikv/pd/blob/v6.5.1/client/base_client.go#L306
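To make the request concrete, below is a minimal sketch of wrapping the member update in exponential backoff with jitter. `getMembers` and `updateMemberWithBackoff` are hypothetical stand-ins, not the code at the linked `base_client.go` line, and the durations are assumptions.

```go
// A hedged sketch of backing off the GetMembers retry: exponential backoff
// plus jitter so a degraded PD leader is not hit by a tight retry loop.
package main

import (
    "context"
    "errors"
    "fmt"
    "math/rand"
    "time"
)

// getMembers is a stand-in for the real RPC; it always fails here so the
// backoff behavior is visible.
func getMembers(ctx context.Context) error {
    return errors.New("pd leader not serving")
}

// updateMemberWithBackoff retries getMembers with capped exponential backoff
// and jitter, and stops when the context is cancelled or attempts run out.
func updateMemberWithBackoff(ctx context.Context, maxAttempts int) error {
    delay := 200 * time.Millisecond
    const maxDelay = 3 * time.Second
    var err error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        if err = getMembers(ctx); err == nil {
            return nil
        }
        // Sleep between half the current delay and the full delay (jitter).
        sleep := delay/2 + time.Duration(rand.Int63n(int64(delay/2)))
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(sleep):
        }
        if delay *= 2; delay > maxDelay {
            delay = maxDelay
        }
    }
    return fmt.Errorf("update member failed after %d attempts: %w", maxAttempts, err)
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()
    fmt.Println(updateMemberWithBackoff(ctx, 5))
}
```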