
sink(ticdc): optimize buffer sink flush from O(N^2) to O(N) #3899

Merged
merged 7 commits into from
Dec 17, 2021

Conversation

overvenus
Member

@overvenus overvenus commented Dec 16, 2021

What problem does this PR solve?

Optimize buffer sink flush from O(N^2) to O(N), where N is the number of tables.

O(N^2): Every flushEvent flushes all tables, and each table periodically generates its own flushEvents.
O(N): Each flushEvent flushes only the table identified by flushEvent.tableID.
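
To make the difference concrete, here is a minimal sketch that only counts per-table flush operations under each strategy. It is not the TiCDC implementation: `flushEvent` is reduced to two fields, and `flushAll`, `flushByTable`, and `onePerTable` are hypothetical helpers written for this illustration.

```go
package main

// flushEvent is a simplified, hypothetical stand-in for the event described
// above: a request to flush one table up to resolvedTs.
type flushEvent struct {
	tableID    int64
	resolvedTs uint64
}

// flushAll models the old behaviour: every event visits every table.
// It returns the number of per-table operations performed.
func flushAll(checkpoints map[int64]uint64, events []flushEvent) int {
	ops := 0
	for _, ev := range events {
		for id := range checkpoints {
			ops++ // each table is touched once per event
			if id == ev.tableID && ev.resolvedTs > checkpoints[id] {
				checkpoints[id] = ev.resolvedTs
			}
		}
	}
	return ops
}

// flushByTable models the new behaviour: an event touches only the table
// named by ev.tableID.
func flushByTable(checkpoints map[int64]uint64, events []flushEvent) int {
	ops := 0
	for _, ev := range events {
		ops++
		if ev.resolvedTs > checkpoints[ev.tableID] {
			checkpoints[ev.tableID] = ev.resolvedTs
		}
	}
	return ops
}

// onePerTable builds n tables with checkpoint 0 and one flush event each,
// which is the "each table generates flushEvents" situation in miniature.
func onePerTable(n int) (map[int64]uint64, []flushEvent) {
	checkpoints := make(map[int64]uint64, n)
	events := make([]flushEvent, 0, n)
	for i := 0; i < n; i++ {
		checkpoints[int64(i)] = 0
		events = append(events, flushEvent{tableID: int64(i), resolvedTs: 100})
	}
	return checkpoints, events
}
```

With N tables and one event per table, `flushAll` performs N*N operations while `flushByTable` performs N, and both leave the same checkpoints behind.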

Close https://github.com/pingcap/ticdc/issues/3900

Benchmark:

$ go test -benchmem -run='^$' -bench '^BenchmarkRun$' github.com/pingcap/ticdc/cdc/sink
goos: linux
goarch: amd64
pkg: github.com/pingcap/ticdc/cdc/sink
cpu: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
BenchmarkRun/1_table(s)-40                745132              1566 ns/op              16 B/op          1 allocs/op
BenchmarkRun/4_table(s)-40                477426              2459 ns/op              64 B/op          4 allocs/op
BenchmarkRun/16_table(s)-40               171915              6228 ns/op             256 B/op         16 allocs/op
BenchmarkRun/64_table(s)-40                57037             22452 ns/op            1024 B/op         64 allocs/op
BenchmarkRun/256_table(s)-40               12612             89534 ns/op            4096 B/op        256 allocs/op
BenchmarkRun/1024_table(s)-40               3826            325067 ns/op           16384 B/op       1024 allocs/op
BenchmarkRun/4096_table(s)-40                854           1312032 ns/op           65536 B/op       4096 allocs/op
BenchmarkRun/16384_table(s)-40               208           5243129 ns/op          262145 B/op      16384 allocs/op
BenchmarkRun/65536_table(s)-40                55          19329650 ns/op         1048577 B/op      65536 allocs/op

$ benchstat bin/master.log bin/opt-sink.log
name                   old time/op    new time/op    delta
Run/1_table(s)-40        1.04µs ± 0%    1.66µs ±14%     ~     (p=0.333 n=1+5)
Run/4_table(s)-40        8.75µs ± 0%    2.44µs ± 3%     ~     (p=0.333 n=1+5)
Run/16_table(s)-40        103µs ± 0%       6µs ± 2%     ~     (p=0.333 n=1+5)
Run/64_table(s)-40       1.48ms ± 0%    0.02ms ± 3%     ~     (p=0.333 n=1+5)
Run/256_table(s)-40      22.8ms ± 0%     0.1ms ± 7%     ~     (p=0.333 n=1+5)
Run/1024_table(s)-40      368ms ± 0%       0ms ± 4%     ~     (p=0.333 n=1+5)
Run/4096_table(s)-40      6.01s ± 0%     0.00s ± 3%     ~     (p=0.333 n=1+5)
Run/16384_table(s)-40     98.0s ± 0%      0.0s ± 5%     ~     (p=0.333 n=1+5)

name                   old alloc/op   new alloc/op   delta
Run/1_table(s)-40         24.0B ± 0%     16.0B ± 0%     ~     (p=1.667 n=1+5)
Run/4_table(s)-40          192B ± 0%       64B ± 0%     ~     (p=1.667 n=1+5)
Run/16_table(s)-40       2.30kB ± 0%    0.26kB ± 0%     ~     (p=1.667 n=1+5)
Run/64_table(s)-40       33.8kB ± 0%     1.0kB ± 0%     ~     (p=1.667 n=1+5)
Run/256_table(s)-40       528kB ± 0%       4kB ± 0%     ~     (p=1.667 n=1+5)
Run/1024_table(s)-40     8.40MB ± 0%    0.02MB ± 0%     ~     (p=1.667 n=1+5)
Run/4096_table(s)-40      134MB ± 0%       0MB ± 0%     ~     (p=1.667 n=1+5)
Run/16384_table(s)-40    2.15GB ± 0%    0.00GB ± 0%  -99.99%  (p=0.000 n=1+5)

name                   old allocs/op  new allocs/op  delta
Run/1_table(s)-40          2.00 ± 0%      1.00 ± 0%     ~     (p=1.667 n=1+5)
Run/4_table(s)-40          20.0 ± 0%       4.0 ± 0%     ~     (p=1.667 n=1+5)
Run/16_table(s)-40          272 ± 0%        16 ± 0%     ~     (p=1.667 n=1+5)
Run/64_table(s)-40        4.16k ± 0%     0.06k ± 0%     ~     (p=1.667 n=1+5)
Run/256_table(s)-40       65.8k ± 0%      0.3k ± 0%     ~     (p=1.667 n=1+5)
Run/1024_table(s)-40      1.05M ± 0%     0.00M ± 0%     ~     (p=1.667 n=1+5)
Run/4096_table(s)-40      16.8M ± 0%      0.0M ± 0%     ~     (p=1.667 n=1+5)
Run/16384_table(s)-40      268M ± 0%        0M ± 0%     ~     (p=1.667 n=1+5)
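
As a rough sanity check on the complexity claim, the growth exponent k in time ∝ N^k can be estimated from two rows of the output above. This is a sketch under assumptions: `growthExponent` is a hypothetical helper, the old timings come from a single sample (n=1), and only the 4096 and 16384 table rows are used.

```go
package main

import "math"

// Timings copied from the benchmark output above, in nanoseconds per op.
const (
	old4096  = 6.01e9  // master, 4096 tables (6.01s)
	old16384 = 98.0e9  // master, 16384 tables (98.0s)
	new4096  = 1312032 // this PR, 4096 tables
	new16384 = 5243129 // this PR, 16384 tables
)

// growthExponent estimates k in time ∝ N^k from two measurements taken at
// table counts n1 < n2.
func growthExponent(n1, t1, n2, t2 float64) float64 {
	return math.Log(t2/t1) / math.Log(n2/n1)
}

// exponents returns the estimated exponent for master and for this PR.
func exponents() (oldK, newK float64) {
	oldK = growthExponent(4096, old4096, 16384, old16384)
	newK = growthExponent(4096, new4096, 16384, new16384)
	return oldK, newK
}
```

Quadrupling the table count multiplies the old time by about 16 and the new time by about 4, so the estimates land close to 2 and 1 respectively.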

Check List

Tests

  • Unit test
  • Integration test

Related changes

  • Need to cherry-pick to the release branch

Release note

Reduce checkpoint lag when capturing many tables.

@overvenus overvenus added component/sink Sink component. subject/performance Denotes an issue or pull request is related to replication performance. needs-cherry-pick-release-4.0 Should cherry pick this PR to release-4.0 branch. needs-cherry-pick-release-5.0 Should cherry pick this PR to release-5.0 branch. needs-cherry-pick-release-5.1 Should cherry pick this PR to release-5.1 branch. needs-cherry-pick-release-5.2 Should cherry pick this PR to release-5.2 branch. needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. labels Dec 16, 2021
@ti-chi-bot
Member

ti-chi-bot commented Dec 16, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • amyangfei
  • asddongmen

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 16, 2021
@overvenus overvenus changed the title from "Opt sink" to "sink(ticdc): optimize buffer sink flush from O(N^2) to O(N)" Dec 16, 2021
@overvenus overvenus requested a review from zhaoxinyu December 16, 2021 02:27
Signed-off-by: Neil Shen <[email protected]>
@codecov-commenter

codecov-commenter commented Dec 16, 2021

Codecov Report

Merging #3899 (2294a28) into master (3873d39) will decrease coverage by 1.8444%.
The diff coverage is 39.8692%.

Flag Coverage Δ
cdc 58.1477% <45.8646%> (-0.0889%) ⬇️
dm 52.7659% <0.0000%> (-3.2687%) ⬇️


@@               Coverage Diff                @@
##             master      #3899        +/-   ##
================================================
- Coverage   57.0741%   55.2297%   -1.8445%     
================================================
  Files           478        480         +2     
  Lines         56551      58445      +1894     
================================================
+ Hits          32276      32279         +3     
- Misses        20978      22871      +1893     
+ Partials       3297       3295         -2     

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 16, 2021
@asddongmen asddongmen added the area/ticdc Issues or PRs related to TiCDC. label Dec 16, 2021
@sdojjy
Member

sdojjy commented Dec 16, 2021

Can you add some description about how it works?

@overvenus
Member Author

> Can you add some description about how it works?

Updated, see description.

@overvenus
Member Author

/run-kafka-integration-test
/run-integration-tests

startEmit := time.Now()
// find all rows before resolvedTs and emit to backend sink
for i := 0; i < batchSize; i++ {
tableID, resolvedTs := batch[i].tableID, batch[i].resolvedTs

If a batch contains many flush events for one table, would it improve performance to find the max resolvedTs per table first, so that we do not need to search and flush multiple times?
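
The suggestion above could look roughly like the following sketch. It is only an illustration of the reviewer's idea, not code from this PR: `flushEvent` is a simplified stand-in for the batch element type, and `coalesce` is a hypothetical helper.

```go
package main

// flushEvent is a simplified, hypothetical stand-in for one element of batch.
type flushEvent struct {
	tableID    int64
	resolvedTs uint64
}

// coalesce reduces a batch to the largest resolvedTs seen for each table, so
// that each table needs to be searched and flushed at most once per batch.
func coalesce(batch []flushEvent) map[int64]uint64 {
	maxTs := make(map[int64]uint64, len(batch))
	for _, ev := range batch {
		// Record the table on first sight, then keep only the maximum.
		if cur, ok := maxTs[ev.tableID]; !ok || ev.resolvedTs > cur {
			maxTs[ev.tableID] = ev.resolvedTs
		}
	}
	return maxTs
}
```

The caller would then iterate over the returned map instead of over `batch`, which helps only when a batch actually contains repeated events for the same table.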

if !atomic.CompareAndSwapInt64(&m.flushing, 0, 1) {
return m.getCheckpointTs(tableID), nil
}
m.flushMu.Lock()

FYI, mq sink is not thread-safe

Member Author

Manager flushes bufferSink concurrently, and bufferSink itself is thread-safe.
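
For readers unfamiliar with the guard in the snippet quoted in this thread, the pattern can be sketched as follows. This is an assumption-laden illustration rather than the Manager code: `guardedFlusher` and its fields are hypothetical, and the real code additionally takes `flushMu`.

```go
package main

import "sync/atomic"

// guardedFlusher is a hypothetical type illustrating a compare-and-swap
// guard: at most one caller flushes at a time, everyone else returns early.
type guardedFlusher struct {
	flushing     int64  // 0 = idle, 1 = a flush is in progress
	checkpointTs uint64 // last checkpoint reported by a completed flush
}

// flush runs doFlush only if no other flush is in progress. Callers that
// lose the race do not block; they get the last known checkpoint instead.
func (g *guardedFlusher) flush(resolvedTs uint64, doFlush func(uint64) uint64) uint64 {
	if !atomic.CompareAndSwapInt64(&g.flushing, 0, 1) {
		return atomic.LoadUint64(&g.checkpointTs)
	}
	// Release the guard when this flush finishes.
	defer atomic.StoreInt64(&g.flushing, 0)

	ts := doFlush(resolvedTs)
	atomic.StoreUint64(&g.checkpointTs, ts)
	return ts
}
```

A guard like this keeps concurrent callers from entering the downstream sink at the same time, which matters when the downstream sink is not thread-safe.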

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Dec 17, 2021
@ti-chi-bot
Member

@zhaoxinyu: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.


@overvenus
Member Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: f7fbff2

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 17, 2021
@ti-chi-bot ti-chi-bot merged commit b5a52ce into pingcap:master Dec 17, 2021
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Dec 17, 2021
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #3947.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Dec 17, 2021
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #3948.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Dec 17, 2021
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #3949.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Dec 17, 2021
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #3950.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Dec 17, 2021
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #3951.

zhaoxinyu pushed a commit to zhaoxinyu/ticdc that referenced this pull request Dec 29, 2021
Labels
area/ticdc Issues or PRs related to TiCDC. component/sink Sink component. needs-cherry-pick-release-4.0 Should cherry pick this PR to release-4.0 branch. needs-cherry-pick-release-5.0 Should cherry pick this PR to release-5.0 branch. needs-cherry-pick-release-5.1 Should cherry pick this PR to release-5.1 branch. needs-cherry-pick-release-5.2 Should cherry pick this PR to release-5.2 branch. needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. subject/performance Denotes an issue or pull request is related to replication performance.
Development

Successfully merging this pull request may close these issues.

Checkpoint lag gets larger as the number captured tables increases
7 participants