store/copr: support batch coprocessor requests by store #39525
Conversation
Signed-off-by: you06 <[email protected]>
fix missing max value
Signed-off-by: you06 <[email protected]>
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. Reviewers can indicate their review by submitting an approval review. The full list of commands accepted by this bot can be found here.
Signed-off-by: you06 <[email protected]>
Signed-off-by: you06 <[email protected]>
Signed-off-by: you06 <[email protected]>
store/copr/coprocessor.go
Outdated
	}
	task := batchedTask.task
	if regionErr := batchResp.GetRegionError(); regionErr != nil {
		logutil.BgLogger().Info("DBG region error", zap.String("err", regionErr.String()))
Debug log?
store/copr/coprocessor.go
Outdated
	var err error
	resolveLockDetail, err = worker.handleLockErr(bo, lockErr, task)
	if err != nil {
		return nil, err
	}
	return []*copTask{task}, nil
We should still handle the remaining batch responses and merge them.
The same for region error, I think.
It is possible that an error (lock, region miss, or others) is returned in the original response while the batched responses return OK.
All the lock errors may lead to lock resolving, and all the region errors may lead to region-miss retries.
If the order is not required, we could return the results of the successful responses and not execute them again.
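As a rough illustration of that idea (deliver the successful batched results and only retry the failed ones when order is not required), here is a minimal, self-contained Go sketch; batchedResp, task, and splitBatchedResponses are hypothetical stand-ins, not the actual store/copr types:

```go
package main

import "fmt"

// Hypothetical stand-ins for the real coprocessor types.
type batchedResp struct {
	taskID    uint64
	regionErr error // non-nil means region miss / epoch not match, etc.
	lockErr   error // non-nil means a lock was encountered
	data      []byte
}

type task struct{ id uint64 }

// splitBatchedResponses returns the responses that can be delivered to the
// caller immediately and the tasks that must be retried, assuming the caller
// does not require ordered results.
func splitBatchedResponses(resps []batchedResp, tasks map[uint64]task) (deliver []batchedResp, retry []task) {
	for _, r := range resps {
		if r.regionErr != nil || r.lockErr != nil {
			// Region errors trigger a region-cache refresh and retry;
			// lock errors trigger lock resolving and retry.
			retry = append(retry, tasks[r.taskID])
			continue
		}
		deliver = append(deliver, r)
	}
	return deliver, retry
}

func main() {
	resps := []batchedResp{
		{taskID: 1, data: []byte("ok")},
		{taskID: 2, regionErr: fmt.Errorf("epoch not match")},
	}
	tasks := map[uint64]task{1: {id: 1}, 2: {id: 2}}
	deliver, retry := splitBatchedResponses(resps, tasks)
	fmt.Println(len(deliver), "to deliver,", len(retry), "to retry")
}
```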
The error from worker.handleLockErr means that it failed to resolve the lock or hit a backoff timeout; shouldn't we return the error to the client in that case?
No problem in L1148. But if the lock is resolved, I think we shouldn't return only the task itself? BatchResponses may include both success and failure results, and they should either be returned through the channel or retried. (Or is there anything I misunderstand here?)
	taskID := uint64(0)
	var store2Idx map[uint64]int
	if req.StoreBatchSize > 0 {
		store2Idx = make(map[uint64]int, 16)
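For context on what the store2Idx map enables, here is a simplified, hypothetical sketch of folding later tasks into the first task that targets the same store; copTask and batchByStore are illustrative stand-ins rather than the real buildCopTasks code, and the cap implied by req.StoreBatchSize is omitted:

```go
package main

import "fmt"

// Hypothetical simplified task model.
type copTask struct {
	storeID uint64
	region  uint64
	batched []*copTask // tasks piggybacked onto this task's RPC request
}

// batchByStore keeps the first task per store as the "primary" task and
// appends later tasks for the same store to its batched list.
func batchByStore(tasks []*copTask) []*copTask {
	store2Idx := make(map[uint64]int, 16)
	var out []*copTask
	for _, t := range tasks {
		if idx, ok := store2Idx[t.storeID]; ok {
			out[idx].batched = append(out[idx].batched, t)
			continue
		}
		store2Idx[t.storeID] = len(out)
		out = append(out, t)
	}
	return out
}

func main() {
	tasks := []*copTask{
		{storeID: 1, region: 10},
		{storeID: 2, region: 20},
		{storeID: 1, region: 30},
	}
	for _, t := range batchByStore(tasks) {
		fmt.Printf("store %d: primary region %d, %d batched\n", t.storeID, t.region, len(t.batched))
	}
}
```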
After the cache.SplitKeyRangesByBuckets(bo, ranges), the ranges are split by the ordered region range, and the ranges within each region are also ordered. For example:

Region 1          Region 2             Region 3
[1, 2], [3, 4]    [5, 10], [15, 20]    [21, 25]
task1   task2     task3    task4       task5

So if KeepOrder is required, I think the batch processing could still work. The difference is that when the order is required, the coprocessor client could not respond to the caller if task5 has finished while task2 has not.
@sticnarf @you06
What do you think? Please correct me if I missed anything.
I made a tiny change to the example.

Region 1    Region 2    Region 3             Region 4
[1, 2]      [3, 4]      [5, 10], [15, 20]    [21, 25]
task1       task2       task3    task4       task5

Suppose region 1, region 2, and region 4 are located in store1, and region 3 is located in store2. There are two batch methods:
- [task1, task2, task5], [task3, task4]
  In this way, we achieve the maximum batch size, but task5 has to wait until task4 is received.
- [task1, task2], [task3, task4], [task5]
  In this way, we don't need to reorder the responses (a sketch of this option follows below).
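A minimal sketch of that second option, grouping only consecutive tasks that hit the same store so responses never need reordering; copTask and batchConsecutive here are simplified, hypothetical stand-ins, not the real store/copr types:

```go
package main

import "fmt"

type copTask struct {
	id      int
	storeID uint64
}

// batchConsecutive groups only adjacent tasks that share a store, so the
// groups can be sent and answered strictly in task order.
func batchConsecutive(tasks []copTask) [][]copTask {
	var groups [][]copTask
	for _, t := range tasks {
		n := len(groups)
		if n > 0 && groups[n-1][0].storeID == t.storeID {
			groups[n-1] = append(groups[n-1], t)
			continue
		}
		groups = append(groups, []copTask{t})
	}
	return groups
}

func main() {
	// task1..task5 from the example: regions 1, 2, 4 on store1, region 3 on store2.
	tasks := []copTask{{1, 1}, {2, 1}, {3, 2}, {4, 2}, {5, 1}}
	fmt.Println(batchConsecutive(tasks)) // [[{1 1} {2 1}] [{3 2} {4 2}] [{5 1}]]
}
```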
This largely reduces the benefit of batching. It's possible that a batch involves hundreds of regions in one store, and it's very common for region ranges to interleave between stores.
Instead, I think we should store the range or the order index of each response and sort them after receiving all of them. This can be done in a later iteration.
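A small sketch of that deferred approach, tagging each response with its task's order index and sorting once everything has arrived; copResponse and sortByTaskOrder are hypothetical names used only for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

type copResponse struct {
	orderIdx int // index of the originating task in the ordered task list
	data     string
}

// sortByTaskOrder restores the caller-visible order after responses have
// arrived out of order from per-store batches.
func sortByTaskOrder(resps []copResponse) {
	sort.Slice(resps, func(i, j int) bool { return resps[i].orderIdx < resps[j].orderIdx })
}

func main() {
	resps := []copResponse{{orderIdx: 4, data: "task5"}, {orderIdx: 1, data: "task2"}, {orderIdx: 0, data: "task1"}}
	sortByTaskOrder(resps)
	fmt.Println(resps) // printed in task order
}
```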
Tackling the ordering-related work in a later iteration is fine with me; for now we could just disable batching when order is required, to keep it simple.
Signed-off-by: you06 <[email protected]>
/merge
This pull request has been accepted and is ready to merge. Commit hash: 12b3c39
There are some lint errors that need fixing.
There were some mistakes when processing the lock resolve details; the lint is fixed now.
Signed-off-by: you06 <[email protected]>
Signed-off-by: you06 <[email protected]>
Signed-off-by: you06 <[email protected]>
/merge
This pull request has been accepted and is ready to merge. Commit hash: 8ec0efc
Signed-off-by: you06 <[email protected]>
/merge
This pull request has been accepted and is ready to merge. Commit hash: c1011b3
/run-mysql-test
TiDB MergeCI notify 🔴 Bad News! [3] CI still failing after this PR merged.
Signed-off-by: you06 [email protected]
What problem does this PR solve?
Issue Number: ref #39361
Problem Summary:
A fanout query creates too many table reader tasks.
What is changed and how it works?
Batching the tasks by store reduces the number of RPC requests and the serialization/deserialization cost. In the fanout scenario, this mechanism batches the fanout tasks together.
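As a back-of-the-envelope illustration of the saving (not the actual implementation): per-region requests cost one RPC per region task, while per-store batching costs roughly one RPC per involved store. The placement map and helper below are purely hypothetical:

```go
package main

import "fmt"

// countRPCs compares per-region requests against per-store batching for a
// given region -> store placement. Purely illustrative numbers.
func countRPCs(regionStore map[string]string) (perRegion, perStore int) {
	stores := map[string]struct{}{}
	for _, s := range regionStore {
		stores[s] = struct{}{}
	}
	return len(regionStore), len(stores)
}

func main() {
	placement := map[string]string{
		"r1": "store1", "r2": "store1", "r3": "store2", "r4": "store1",
	}
	a, b := countRPCs(placement)
	fmt.Printf("without batching: %d RPCs, with store batching: %d RPCs\n", a, b)
}
```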
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.