store/copr: support batch coprocessor requests by store #39525
Conversation
Signed-off-by: you06 <[email protected]>
fix missing max value
Signed-off-by: you06 <[email protected]>
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. Reviewers can indicate their review by submitting an approval review. The full list of commands accepted by this bot can be found here.
Signed-off-by: you06 <[email protected]>
Signed-off-by: you06 <[email protected]>
Signed-off-by: you06 <[email protected]>
store/copr/coprocessor.go
Outdated
	}
	task := batchedTask.task
	if regionErr := batchResp.GetRegionError(); regionErr != nil {
		logutil.BgLogger().Info("DBG region error", zap.String("err", regionErr.String()))
Debug log?
store/copr/coprocessor.go
Outdated
	var err error
	resolveLockDetail, err = worker.handleLockErr(bo, lockErr, task)
	if err != nil {
		return nil, err
	}
	return []*copTask{task}, nil
We should still handle the remaining batch responses and merge them.
The same for region error, I think.
It is possible that an error (lock, region miss, or others) is returned in the original response while the batched responses return OK.
All the lock errors may lead to lock resolving, and all the region errors may lead to region-miss retries.
If the order is not required, we could return the results of the successful responses and not execute them again.
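As a rough illustration of that idea (deliver the successful batched results and only retry the failed ones when order is not required), here is a minimal, self-contained Go sketch; batchedResp, task, and splitBatchedResponses are hypothetical stand-ins, not the actual store/copr types:

```go
package main

import "fmt"

// Hypothetical stand-ins for the real coprocessor types.
type batchedResp struct {
	taskID    uint64
	regionErr error // non-nil means region miss / epoch not match, etc.
	lockErr   error // non-nil means a lock was encountered
	data      []byte
}

type task struct{ id uint64 }

// splitBatchedResponses returns the responses that can be delivered to the
// caller immediately and the tasks that must be retried, assuming the caller
// does not require ordered results.
func splitBatchedResponses(resps []batchedResp, tasks map[uint64]task) (deliver []batchedResp, retry []task) {
	for _, r := range resps {
		if r.regionErr != nil || r.lockErr != nil {
			// Region errors trigger a region-cache refresh and retry;
			// lock errors trigger lock resolving and retry.
			retry = append(retry, tasks[r.taskID])
			continue
		}
		deliver = append(deliver, r)
	}
	return deliver, retry
}

func main() {
	resps := []batchedResp{
		{taskID: 1, data: []byte("ok")},
		{taskID: 2, regionErr: fmt.Errorf("epoch not match")},
	}
	tasks := map[uint64]task{1: {id: 1}, 2: {id: 2}}
	deliver, retry := splitBatchedResponses(resps, tasks)
	fmt.Println(len(deliver), "to deliver,", len(retry), "to retry")
}
```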
The error from worker.handleLockErr means that it failed to resolve the lock or hit a backoff timeout; shouldn't we return the error to the client in that case?
No problem in L1148. But if the lock is resolved, I think we shouldn't return only the task itself? BatchResponses may include both success and failure results, and they should either be returned through the channel or retried. (Or is there anything I misunderstand here?)
	taskID := uint64(0)
	var store2Idx map[uint64]int
	if req.StoreBatchSize > 0 {
		store2Idx = make(map[uint64]int, 16)
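For context on what the store2Idx map enables, here is a simplified, hypothetical sketch of folding later tasks into the first task that targets the same store; copTask and batchByStore are illustrative stand-ins rather than the real buildCopTasks code, and the cap implied by req.StoreBatchSize is omitted:

```go
package main

import "fmt"

// Hypothetical simplified task model.
type copTask struct {
	storeID uint64
	region  uint64
	batched []*copTask // tasks piggybacked onto this task's RPC request
}

// batchByStore keeps the first task per store as the "primary" task and
// appends later tasks for the same store to its batched list.
func batchByStore(tasks []*copTask) []*copTask {
	store2Idx := make(map[uint64]int, 16)
	var out []*copTask
	for _, t := range tasks {
		if idx, ok := store2Idx[t.storeID]; ok {
			out[idx].batched = append(out[idx].batched, t)
			continue
		}
		store2Idx[t.storeID] = len(out)
		out = append(out, t)
	}
	return out
}

func main() {
	tasks := []*copTask{
		{storeID: 1, region: 10},
		{storeID: 2, region: 20},
		{storeID: 1, region: 30},
	}
	for _, t := range batchByStore(tasks) {
		fmt.Printf("store %d: primary region %d, %d batched\n", t.storeID, t.region, len(t.batched))
	}
}
```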
After the cache.SplitKeyRangesByBuckets(bo, ranges), the ranges are split by the ordered region range, and the ranges within each region are also ordered. For example:

Region 1          Region 2             Region 3
[1, 2], [3, 4]    [5, 10], [15, 20]    [21, 25]
task1   task2     task3    task4       task5

So if KeepOrder is required, I think the batch processing could still work. The difference is that when the order is required, the coprocessor client could not respond to the caller if task5 has finished while task2 has not.
@sticnarf @you06
What do you think? Please correct me if I missed anything.
I made a tiny change to the example.

Region 1    Region 2    Region 3             Region 4
[1, 2]      [3, 4]      [5, 10], [15, 20]    [21, 25]
task1       task2       task3    task4       task5

Suppose region 1, region 2, and region 4 are located in store1, and region 3 is located in store2. There are two batch methods:
- [task1, task2, task5], [task3, task4]
  In this way, we achieve the maximum batch size, but task5 has to wait until task4 is received.
- [task1, task2], [task3, task4], [task5]
  In this way, we don't need to reorder the responses (a sketch of this option follows below).
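A minimal sketch of that second option, grouping only consecutive tasks that hit the same store so responses never need reordering; copTask and batchConsecutive here are simplified, hypothetical stand-ins, not the real store/copr types:

```go
package main

import "fmt"

type copTask struct {
	id      int
	storeID uint64
}

// batchConsecutive groups only adjacent tasks that share a store, so the
// groups can be sent and answered strictly in task order.
func batchConsecutive(tasks []copTask) [][]copTask {
	var groups [][]copTask
	for _, t := range tasks {
		n := len(groups)
		if n > 0 && groups[n-1][0].storeID == t.storeID {
			groups[n-1] = append(groups[n-1], t)
			continue
		}
		groups = append(groups, []copTask{t})
	}
	return groups
}

func main() {
	// task1..task5 from the example: regions 1, 2, 4 on store1, region 3 on store2.
	tasks := []copTask{{1, 1}, {2, 1}, {3, 2}, {4, 2}, {5, 1}}
	fmt.Println(batchConsecutive(tasks)) // [[{1 1} {2 1}] [{3 2} {4 2}] [{5 1}]]
}
```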
This largely reduces the benefit of batching. It's possible that a batch involves hundreds of regions in one store, and it's very common for region ranges to interleave between stores.
Instead, I think we should store the range or the order index of each response and sort them after receiving all of them. This can be done in a later iteration.
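A small sketch of that deferred approach, tagging each response with its task's order index and sorting once everything has arrived; copResponse and sortByTaskOrder are hypothetical names used only for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

type copResponse struct {
	orderIdx int // index of the originating task in the ordered task list
	data     string
}

// sortByTaskOrder restores the caller-visible order after responses have
// arrived out of order from per-store batches.
func sortByTaskOrder(resps []copResponse) {
	sort.Slice(resps, func(i, j int) bool { return resps[i].orderIdx < resps[j].orderIdx })
}

func main() {
	resps := []copResponse{{orderIdx: 4, data: "task5"}, {orderIdx: 1, data: "task2"}, {orderIdx: 0, data: "task1"}}
	sortByTaskOrder(resps)
	fmt.Println(resps) // printed in task order
}
```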
Tackling the ordering-related work in a later iteration is fine with me; for now we could just disable batching when order is required, to keep it simple.
Signed-off-by: you06 <[email protected]>
/merge
This pull request has been accepted and is ready to merge. Commit hash: 12b3c39
There are some lint errors that need fixing.
There were some mistakes when processing the lock resolve details; the lint is fixed now.
Signed-off-by: you06 <[email protected]>
Signed-off-by: you06 <[email protected]>
Signed-off-by: you06 <[email protected]>
/merge
This pull request has been accepted and is ready to merge. Commit hash: 8ec0efc
Signed-off-by: you06 <[email protected]>
/merge
This pull request has been accepted and is ready to merge. Commit hash: c1011b3
/run-mysql-test
TiDB MergeCI notify 🔴 Bad News! [3] CI still failing after this PR merged.
Signed-off-by: you06 [email protected]
What problem does this PR solve?
Issue Number: ref #39361
Problem Summary:
A fanout query creates too many table reader tasks.
What is changed and how it works?
Batching the tasks by store reduces the number of RPC requests and the serialization/deserialization cost. In the fanout scenario, this mechanism batches the fanout tasks together.
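As a back-of-the-envelope illustration of the saving (not the actual implementation): per-region requests cost one RPC per region task, while per-store batching costs roughly one RPC per involved store. The placement map and helper below are purely hypothetical:

```go
package main

import "fmt"

// countRPCs compares per-region requests against per-store batching for a
// given region -> store placement. Purely illustrative numbers.
func countRPCs(regionStore map[string]string) (perRegion, perStore int) {
	stores := map[string]struct{}{}
	for _, s := range regionStore {
		stores[s] = struct{}{}
	}
	return len(regionStore), len(stores)
}

func main() {
	placement := map[string]string{
		"r1": "store1", "r2": "store1", "r3": "store2", "r4": "store1",
	}
	a, b := countRPCs(placement)
	fmt.Printf("without batching: %d RPCs, with store batching: %d RPCs\n", a, b)
}
```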
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.