[kvexec] merge join #8561

Merged
merged 19 commits into from
Nov 22, 2024

Conversation

max-hoffman
Contributor

@max-hoffman max-hoffman commented Nov 14, 2024

This isn't a big perf win on Linux, but it offsets the sql.Row interface PR, which would otherwise swing merge join about 30% in the wrong direction.

goos: darwin
goarch: arm64
pkg: github.com/dolthub/dolt/go/performance/microsysbench
                │  before.txt  │           after.txt           │
                │    sec/op    │    sec/op     vs base         │
OltpJoinScan-12   680.6µ ± 26%   612.1µ ± 17%  ~ (p=0.240 n=6)

                │  before.txt  │              after.txt              │
                │     B/op     │     B/op      vs base               │
OltpJoinScan-12   163.8Ki ± 0%   123.8Ki ± 0%  -24.42% (p=0.002 n=6)

                │ before.txt  │             after.txt              │
                │  allocs/op  │  allocs/op   vs base               │
OltpJoinScan-12   5.906k ± 0%   4.233k ± 0%  -28.33% (p=0.002 n=6)
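
For anyone checking the deltas by hand, here is a minimal sketch of the relative-change arithmetic behind the "vs base" column, applied to the rounded values printed above. The helper name `percentChange` is made up for illustration. benchstat itself works from the raw samples, so this only approximates what it reports.

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
// Negative values mean the "after" measurement is smaller.
// (Hypothetical helper for illustration; not part of benchstat or Dolt.)
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Rounded values copied from the benchstat output above.
	fmt.Printf("B/op:      %.2f%%\n", percentChange(163.8, 123.8))
	fmt.Printf("allocs/op: %.2f%%\n", percentChange(5906, 4233))
}
```

With the rounded inputs this gives -24.42% for B/op and -28.33% for allocs/op, matching the table. The sec/op row is reported as `~` because the difference is not statistically significant at n=6.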

TODO:

  • left join
  • nulls and other edge cases
  • execute full comparer
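
For context, a generic inner merge join over two key-sorted inputs can be sketched as below. This is not the kvexec implementation from this PR: the `row` and `pair` types and the `mergeJoin` function are assumptions made up for illustration, keys are plain ints, and none of the TODO items above (left join, nulls, full comparer) are handled.

```go
package main

import "fmt"

// row is a hypothetical stand-in for a key/value tuple; it is not a Dolt type.
type row struct {
	key int
	val string
}

// pair is one joined output row.
type pair struct {
	left, right row
}

// mergeJoin performs an inner equi-join over two inputs that are already
// sorted ascending by key. Duplicate keys on either side produce the full
// cross product for that key.
func mergeJoin(left, right []row) []pair {
	var out []pair
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		switch {
		case left[i].key < right[j].key:
			i++ // left is behind, advance it
		case left[i].key > right[j].key:
			j++ // right is behind, advance it
		default:
			k := left[i].key
			// Find the end of the run of equal keys on the right.
			jEnd := j
			for jEnd < len(right) && right[jEnd].key == k {
				jEnd++
			}
			// Pair every matching left row with every row in that run.
			for ; i < len(left) && left[i].key == k; i++ {
				for m := j; m < jEnd; m++ {
					out = append(out, pair{left[i], right[m]})
				}
			}
			j = jEnd
		}
	}
	return out
}

func main() {
	left := []row{{1, "a"}, {2, "b"}, {2, "c"}, {4, "d"}}
	right := []row{{2, "x"}, {2, "y"}, {3, "z"}, {4, "w"}}
	for _, p := range mergeJoin(left, right) {
		fmt.Println(p.left.key, p.left.val, p.right.val)
	}
}
```

The sample inputs yield five joined rows: four for key 2 (two left rows times two right rows) and one for key 4. A left join would additionally emit unmatched left rows in the `left[i].key < right[j].key` branch and after the right side is exhausted.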

@max-hoffman
Contributor Author

#benchmark

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
302ab0b ok 5937457
version total_tests
302ab0b 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
4fa6227 ok 5937457
version total_tests
4fa6227 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

| test_name | from_latency_p95 | to_latency_p95 | percent_change |
|---|---|---|---|
| tpcc-scale-factor-1 | 61.08 | 59.99 | -1.78 |

| test_name | from_server_name | from_server_version | from_tps | to_server_name | to_server_version | to_tps | percent_change |
|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | 0e34d26 | 40.64 | dolt | 4fa6227 | 40.7 | 0.15 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| read_tests | from_latency | to_latency | percent_change |
|---|---|---|---|
| covering_index_scan | 0.62 | 0.62 | 0.0 |
| groupby_scan | 16.41 | 16.41 | 0.0 |
| index_join | 2.26 | 2.26 | 0.0 |
| index_join_scan | 1.79 | 1.64 | -8.38 |
| index_scan | 53.85 | 55.82 | 3.66 |
| oltp_point_select | 0.26 | 0.27 | 3.85 |
| oltp_read_only | 5.28 | 5.37 | 1.7 |
| select_random_points | 0.64 | 0.65 | 1.56 |
| select_random_ranges | 0.63 | 0.64 | 1.59 |
| table_scan | 54.83 | 55.82 | 1.81 |
| types_table_scan | 139.85 | 144.97 | 3.66 |

| write_tests | from_latency | to_latency | percent_change |
|---|---|---|---|
| oltp_delete_insert | 5.77 | 5.88 | 1.91 |
| oltp_insert | 2.91 | 2.91 | 0.0 |
| oltp_read_write | 11.24 | 11.45 | 1.87 |
| oltp_update_index | 2.91 | 2.97 | 2.06 |
| oltp_update_non_index | 2.86 | 2.91 | 1.75 |
| oltp_write_only | 5.88 | 5.99 | 1.87 |
| types_delete_insert | 6.21 | 6.21 | 0.0 |

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
f7225f7 ok 5937457
version total_tests
f7225f7 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
0a7e70a ok 5937457
version total_tests
0a7e70a 5937457
correctness_percentage
100.0

@max-hoffman
Contributor Author

#benchmark

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
e8f4ead ok 5937457
version total_tests
e8f4ead 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

| test_name | from_latency_p95 | to_latency_p95 | percent_change |
|---|---|---|---|
| tpcc-scale-factor-1 | 57.87 | 58.92 | 1.81 |

| test_name | from_server_name | from_server_version | from_tps | to_server_name | to_server_version | to_tps | percent_change |
|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | f4e529a | 41.65 | dolt | e8f4ead | 41.42 | -0.55 |

@coffeegoddd
Contributor

@max-hoffman DOLT

| read_tests | from_latency | to_latency | percent_change |
|---|---|---|---|
| covering_index_scan | 0.62 | 0.69 | 11.29 |
| groupby_scan | 16.71 | 16.41 | -1.8 |
| index_join | 2.26 | 2.26 | 0.0 |
| index_join_scan | 1.82 | 1.44 | -20.88 |
| index_scan | 54.83 | 54.83 | 0.0 |
| oltp_point_select | 0.27 | 0.27 | 0.0 |
| oltp_read_only | 5.37 | 5.37 | 0.0 |
| select_random_points | 0.65 | 0.65 | 0.0 |
| select_random_ranges | 0.64 | 0.64 | 0.0 |
| table_scan | 55.82 | 55.82 | 0.0 |
| types_table_scan | 144.97 | 142.39 | -1.78 |

| write_tests | from_latency | to_latency | percent_change |
|---|---|---|---|
| oltp_delete_insert | 5.88 | 5.88 | 0.0 |
| oltp_insert | 2.91 | 2.91 | 0.0 |
| oltp_read_write | 11.45 | 11.45 | 0.0 |
| oltp_update_index | 2.97 | 2.97 | 0.0 |
| oltp_update_non_index | 2.91 | 2.91 | 0.0 |
| oltp_write_only | 5.99 | 5.99 | 0.0 |
| types_delete_insert | 6.21 | 6.21 | 0.0 |

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
c1e9358 ok 5937457
version total_tests
c1e9358 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
fdabb0a ok 5937457
version total_tests
fdabb0a 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
b76555f ok 5937457
version total_tests
b76555f 5937457
correctness_percentage
100.0

Contributor

@jycor jycor left a comment

I think using opposite logic is more readable:
https://github.com/dolthub/dolt/compare/max/kv-merge-join...james/refactor?expand=1

While gotos aren't the best, I think it's still pretty understandable, so LGTM

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
f67d302 ok 5937457
version total_tests
f67d302 5937457
correctness_percentage
100.0

Member

@zachmu zachmu left a comment

Not as bad as you built it up to be; generally not too hard to understand.

I think readability would be improved by making the loops explicit and keeping goto statements for true jumps rather than "go to beginning of this loop".
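
To illustrate the suggestion in isolation, here is a toy sketch (not code from this PR; both function names are made up) showing the same logic written once with a label and `goto` emulating a loop, and once as an explicit `for` loop with `continue`.

```go
package main

import "fmt"

// sumEvensGoto emulates a loop with a label and goto. Every "goto loop"
// here means "go to the beginning of this loop".
func sumEvensGoto(xs []int) int {
	sum, i := 0, 0
loop:
	if i >= len(xs) {
		return sum
	}
	if xs[i]%2 != 0 {
		i++
		goto loop // skip odd values
	}
	sum += xs[i]
	i++
	goto loop
}

// sumEvensLoop is the same logic with the loop made explicit: the backward
// jumps become the for statement itself and a continue.
func sumEvensLoop(xs []int) int {
	sum := 0
	for _, x := range xs {
		if x%2 != 0 {
			continue // skip odd values
		}
		sum += x
	}
	return sum
}

func main() {
	xs := []int{1, 2, 3, 4}
	fmt.Println(sumEvensGoto(xs), sumEvensLoop(xs))
}
```

Both functions return the same result. In the explicit version, any remaining `goto` would stand out as a real jump, for example to a shared exit or cleanup block.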

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
05c79da ok 5937457
version total_tests
05c79da 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
93cf89d ok 5937457
version total_tests
93cf89d 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
bac6b60 ok 5937457
version total_tests
bac6b60 5937457
correctness_percentage
100.0

@max-hoffman max-hoffman merged commit d5534d1 into main Nov 22, 2024
21 checks passed
@max-hoffman max-hoffman deleted the max/kv-merge-join branch November 22, 2024 23:19

@coffeegoddd DOLT

| test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
|---|---|---|---|---|---|---|
| batching | LOAD DATA | 10000 | 1 | 0.05 | 1.6 | |
| batching | batch sql | 10000 | 1 | 0.08 | 1.5 | |
| batching | by line sql | 10000 | 1 | 0.08 | 1.5 | |
| blob | 1 blob | 200000 | 1 | 0.91 | 3.74 | 3.52 |
| blob | 2 blobs | 200000 | 1 | 0.88 | 4.27 | 4.48 |
| blob | no blob | 200000 | 1 | 0.88 | 2.34 | 2.02 |
| col type | datetime | 200000 | 1 | 0.83 | 2.87 | 2.77 |
| col type | varchar | 200000 | 1 | 0.68 | 3.31 | 2.78 |
| config width | 2 cols | 200000 | 1 | 0.78 | 2.4 | 1.94 |
| config width | 32 cols | 200000 | 1 | 1.82 | 2.02 | 2.44 |
| config width | 8 cols | 200000 | 1 | 0.99 | 2.23 | 1.91 |
| pk type | float | 200000 | 1 | 0.88 | 2.15 | 1.82 |
| pk type | int | 200000 | 1 | 0.78 | 2.37 | 1.96 |
| pk type | varchar | 200000 | 1 | 1.58 | 1.56 | 1.53 |
| row count | 1.6mm | 1600000 | 1 | 5.53 | 2.8 | 2.27 |
| row count | 400k | 400000 | 1 | 1.44 | 2.63 | 2.14 |
| row count | 800k | 800000 | 1 | 2.85 | 2.71 | 2.18 |
| secondary index | four index | 200000 | 1 | 3.43 | 1.41 | 1.09 |
| secondary index | no secondary | 200000 | 1 | 0.89 | 2.29 | 2.01 |
| secondary index | one index | 200000 | 1 | 1.12 | 2.34 | 1.98 |
| secondary index | two index | 200000 | 1 | 1.97 | 1.71 | 1.39 |
| sorting | shuffled 1mm | 1000000 | 0 | 5.6 | 2.52 | 2.21 |
| sorting | sorted 1mm | 1000000 | 1 | 5.57 | 2.49 | 2.32 |


@coffeegoddd DOLT

| name | detail | mean_mult |
|---|---|---|
| dolt_blame_basic | system table | 1.13 |
| dolt_blame_commit_filter | system table | 3.03 |
| dolt_commit_ancestors_commit_filter | system table | 0.59 |
| dolt_commits_commit_filter | system table | 1.11 |
| dolt_diff_log_join_from_commit | system table | 2.45 |
| dolt_diff_log_join_to_commit | system table | 2.37 |
| dolt_diff_table_from_commit_filter | system table | 1.13 |
| dolt_diff_table_to_commit_filter | system table | 1.22 |
| dolt_diffs_commit_filter | system table | 1 |
| dolt_history_commit_filter | system table | 1.39 |
| dolt_log_commit_filter | system table | 1.11 |


@coffeegoddd DOLT

| name | add_cnt | delete_cnt | update_cnt | latency |
|---|---|---|---|---|
| adds_only | 60000 | 0 | 0 | 0.7 |
| adds_updates_deletes | 60000 | 60000 | 60000 | 3.81 |
| deletes_only | 0 | 60000 | 0 | 1.88 |
| updates_only | 0 | 0 | 60000 | 2.44 |


@coffeegoddd DOLT

| test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
|---|---|---|---|---|---|---|
| batching | LOAD DATA | 10000 | 1 | 0.06 | 1.33 | |
| batching | batch sql | 10000 | 1 | 0.09 | 1.33 | |
| batching | by line sql | 10000 | 1 | 0.09 | 1.33 | |
| blob | 1 blob | 200000 | 1 | 0.87 | 3.77 | 3.77 |
| blob | 2 blobs | 200000 | 1 | 0.85 | 4.39 | 4.62 |
| blob | no blob | 200000 | 1 | 0.91 | 2.23 | 1.98 |
| col type | datetime | 200000 | 1 | 0.79 | 2.99 | 2.89 |
| col type | varchar | 200000 | 1 | 0.68 | 3.25 | 2.84 |
| config width | 2 cols | 200000 | 1 | 0.76 | 2.45 | 2.01 |
| config width | 32 cols | 200000 | 1 | 1.84 | 2.01 | 2.46 |
| config width | 8 cols | 200000 | 1 | 0.93 | 2.37 | 2.08 |
| pk type | float | 200000 | 1 | 0.85 | 2.24 | 1.82 |
| pk type | int | 200000 | 1 | 0.8 | 2.35 | 1.91 |
| pk type | varchar | 200000 | 1 | 1.72 | 1.44 | 1.23 |
| row count | 1.6mm | 1600000 | 1 | 5.56 | 2.81 | 2.29 |
| row count | 400k | 400000 | 1 | 1.41 | 2.68 | 2.21 |
| row count | 800k | 800000 | 1 | 2.75 | 2.8 | 2.29 |
| secondary index | four index | 200000 | 1 | 3.53 | 1.38 | 1.07 |
| secondary index | no secondary | 200000 | 1 | 0.88 | 2.34 | 2.03 |
| secondary index | one index | 200000 | 1 | 1.09 | 2.38 | 1.99 |
| secondary index | two index | 200000 | 1 | 1.92 | 1.78 | 1.42 |
| sorting | shuffled 1mm | 1000000 | 0 | 4.9 | 2.78 | 2.31 |
| sorting | sorted 1mm | 1000000 | 1 | 4.94 | 2.75 | 2.3 |


@coffeegoddd DOLT

| name | detail | mean_mult |
|---|---|---|
| dolt_blame_basic | system table | 1.11 |
| dolt_blame_commit_filter | system table | 2.91 |
| dolt_commit_ancestors_commit_filter | system table | 0.61 |
| dolt_commits_commit_filter | system table | 1 |
| dolt_diff_log_join_from_commit | system table | 2.4 |
| dolt_diff_log_join_to_commit | system table | 2.28 |
| dolt_diff_table_from_commit_filter | system table | 1.16 |
| dolt_diff_table_to_commit_filter | system table | 1.22 |
| dolt_diffs_commit_filter | system table | 1 |
| dolt_history_commit_filter | system table | 1.36 |
| dolt_log_commit_filter | system table | 1.11 |


@coffeegoddd DOLT

| name | add_cnt | delete_cnt | update_cnt | latency |
|---|---|---|---|---|
| adds_only | 60000 | 0 | 0 | 0.73 |
| adds_updates_deletes | 60000 | 60000 | 60000 | 3.79 |
| deletes_only | 0 | 60000 | 0 | 1.87 |
| updates_only | 0 | 0 | 60000 | 2.45 |


@coffeegoddd DOLT

| test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
|---|---|---|---|---|---|---|
| batching | LOAD DATA | 10000 | 1 | 0.06 | 1.33 | |
| batching | batch sql | 10000 | 1 | 0.09 | 1.33 | |
| batching | by line sql | 10000 | 1 | 0.12 | 1 | |
| blob | 1 blob | 200000 | 1 | 0.87 | 3.85 | 3.66 |
| blob | 2 blobs | 200000 | 1 | 0.89 | 4.22 | 4.37 |
| blob | no blob | 200000 | 1 | 0.88 | 2.32 | 2.02 |
| col type | datetime | 200000 | 1 | 0.8 | 2.91 | 2.82 |
| col type | varchar | 200000 | 1 | 0.67 | 3.51 | 3 |
| config width | 2 cols | 200000 | 1 | 0.76 | 2.42 | 2.04 |
| config width | 32 cols | 200000 | 1 | 1.86 | 1.99 | 2.37 |
| config width | 8 cols | 200000 | 1 | 0.93 | 2.37 | 2.06 |
| pk type | float | 200000 | 1 | 0.87 | 2.28 | 1.83 |
| pk type | int | 200000 | 1 | 0.78 | 2.68 | 1.92 |
| pk type | varchar | 200000 | 1 | 1.5 | 1.63 | 1.39 |
| row count | 1.6mm | 1600000 | 1 | 5.61 | 2.77 | 2.26 |
| row count | 400k | 400000 | 1 | 1.4 | 2.68 | 2.2 |
| row count | 800k | 800000 | 1 | 2.91 | 2.63 | 2.13 |
| secondary index | four index | 200000 | 1 | 3.58 | 1.34 | 1.03 |
| secondary index | no secondary | 200000 | 1 | 0.9 | 2.29 | 1.99 |
| secondary index | one index | 200000 | 1 | 1.11 | 2.31 | 1.98 |
| secondary index | two index | 200000 | 1 | 1.95 | 1.72 | 1.37 |
| sorting | shuffled 1mm | 1000000 | 0 | 5.24 | 2.7 | 2.3 |
| sorting | sorted 1mm | 1000000 | 1 | 5.3 | 2.61 | 2.22 |


@coffeegoddd DOLT

| name | detail | mean_mult |
|---|---|---|
| dolt_blame_basic | system table | 1.12 |
| dolt_blame_commit_filter | system table | 2.94 |
| dolt_commit_ancestors_commit_filter | system table | 0.61 |
| dolt_commits_commit_filter | system table | 1 |
| dolt_diff_log_join_from_commit | system table | 2.35 |
| dolt_diff_log_join_to_commit | system table | 2.28 |
| dolt_diff_table_from_commit_filter | system table | 1.13 |
| dolt_diff_table_to_commit_filter | system table | 1.19 |
| dolt_diffs_commit_filter | system table | 1 |
| dolt_history_commit_filter | system table | 1.46 |
| dolt_log_commit_filter | system table | 1.16 |


@coffeegoddd DOLT

| name | add_cnt | delete_cnt | update_cnt | latency |
|---|---|---|---|---|
| adds_only | 60000 | 0 | 0 | 0.72 |
| adds_updates_deletes | 60000 | 60000 | 60000 | 3.77 |
| deletes_only | 0 | 60000 | 0 | 1.85 |
| updates_only | 0 | 0 | 60000 | 2.39 |

4 participants