Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distsql: change join reader batch size to be specified in bytes #39471

Closed
asubiotto opened this issue Aug 8, 2019 · 0 comments · Fixed by #48058
Closed

distsql: change join reader batch size to be specified in bytes #39471

asubiotto opened this issue Aug 8, 2019 · 0 comments · Fixed by #48058
Assignees
Labels
A-sql-execution Relating to SQL execution. C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Milestone

Comments

@asubiotto
Copy link
Contributor

asubiotto commented Aug 8, 2019

A prior version of this issue had this description:

This has already been done in the index joiner (commit message from #38622):

To amortize the cost of looking up rows, we create a batch of 100 rows
to use in one lookup request. Since the relationship is 1:1 in the index
joiner (we're doing a lookup on the primary index), the result size is
the same size as the request batch.

This commit increases the size of this batch to 10k, increasing the
result size for each lookup to 10k. This results in some significant
performance gains: e.g. tpch query 6 drops to a fifth of its original
runtime on a scalefactor 10 dataset due to amortizing lookups. Note that
this comes with increased memory usage per request. However, the
tableReader limits results for its scans to 10k as well, and there is no
good reason to allow normal scans to use more memory than index joiner
lookups. In the absence of proper accounting for KV responses, the
strategy of allowing index lookups to use the same resources and have
the same limitations as normal scans makes sense.

However the join reader is different because it is not guaranteed that there is a 1:1 relationship between lookup and result rows. #38614 tracked the addition of information for when this is the case, so we should use this to conditionally increase the join reader batch size in those cases.

Updated issue

The idea is the same but the KV interface now allows us to specify a TargetBytes for the results. This means that we can increase the batch size (or even change it to bytes) for lookups where the equality columns do not form a key. Unfortunately, using TargetBytes in the key case means that the lookup requests won't be parallelized, so we'll have to evaluate this tradeoff.

@asubiotto asubiotto added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-execution Relating to SQL execution. labels Aug 8, 2019
@asubiotto asubiotto added this to the 19.2 milestone Aug 8, 2019
@asubiotto asubiotto changed the title distsql: conditionally increase join reader batch size from 100 to 10k distsql: change join reader batch size to be specified in bytes Apr 14, 2020
@asubiotto asubiotto self-assigned this Apr 14, 2020
@asubiotto asubiotto modified the milestones: 19.2, 20.2 Apr 14, 2020
@craig craig bot closed this as completed in 273f90c May 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant