distsql: change join reader batch size to be specified in bytes #39471

asubiotto · 2019-08-08T20:16:57Z

A prior version of this issue had this description:

This has already been done in the index joiner (commit message from #38622):

To amortize the cost of looking up rows, we create a batch of 100 rows
to use in one lookup request. Since the relationship is 1:1 in the index
joiner (we're doing a lookup on the primary index), the result size is
the same size as the request batch.

This commit increases the size of this batch to 10k, increasing the
result size for each lookup to 10k. This results in some significant
performance gains: e.g. tpch query 6 drops to a fifth of its original
runtime on a scalefactor 10 dataset due to amortizing lookups. Note that
this comes with increased memory usage per request. However, the
tableReader limits results for its scans to 10k as well, and there is no
good reason to allow normal scans to use more memory than index joiner
lookups. In the absence of proper accounting for KV responses, the
strategy of allowing index lookups to use the same resources and have
the same limitations as normal scans makes sense.

However, the join reader is different because a 1:1 relationship between lookup rows and result rows is not guaranteed. #38614 tracked adding the information needed to tell when the relationship is 1:1, so we should use it to conditionally increase the join reader batch size in those cases.
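
A minimal sketch of that conditional choice, for illustration only. The 100 and 10k row counts come from the commit message above; the function name and the `lookupColsAreKey` flag are hypothetical and are not taken from the CockroachDB code.

```go
package main

// Row counts quoted in the commit message above.
const (
	defaultBatchRows   = 100   // original lookup batch size
	keyLookupBatchRows = 10000 // batch size adopted by the index joiner
)

// lookupBatchRows picks how many input rows go into one lookup request.
// lookupColsAreKey is a hypothetical flag: true when the equality columns
// form a key, so each lookup row yields at most one result row.
func lookupBatchRows(lookupColsAreKey bool) int {
	if lookupColsAreKey {
		// Result size is bounded by the request size, so a large batch is safe.
		return keyLookupBatchRows
	}
	// One lookup row may match many result rows, so keep the batch small.
	return defaultBatchRows
}
```

The larger batch is only chosen when the result size cannot exceed the request size, which is the same reasoning the commit message gives for the index joiner.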

Updated issue

The idea is the same, but the KV interface now allows us to specify a TargetBytes for the results. This means that we can increase the batch size (or even change it to bytes) for lookups where the equality columns do not form a key. Unfortunately, using TargetBytes in the key case means that the lookup requests won't be parallelized, so we'll have to evaluate this tradeoff.
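
As a rough illustration of what "batch size in bytes" could look like on the request side, here is a sketch that groups encoded input rows until a byte limit is reached. The `byteBatcher` type and its methods are hypothetical. The sketch does not call the KV API and does not model TargetBytes, which limits the result side.

```go
package main

// byteBatcher groups encoded input rows into batches bounded by total size
// in bytes rather than by row count. All names here are hypothetical.
type byteBatcher struct {
	limitBytes int      // flush once the accumulated size reaches this value
	curBytes   int      // size of the rows currently buffered
	cur        [][]byte // rows currently buffered
}

// add buffers a row. It returns the completed batch once the byte limit is
// reached, and nil while the batch is still filling.
func (b *byteBatcher) add(row []byte) [][]byte {
	b.cur = append(b.cur, row)
	b.curBytes += len(row)
	if b.curBytes < b.limitBytes {
		return nil
	}
	return b.flush()
}

// flush returns whatever is buffered (possibly nothing) and resets the batcher.
func (b *byteBatcher) flush() [][]byte {
	out := b.cur
	b.cur, b.curBytes = nil, 0
	return out
}
```

With a byte limit, wide rows produce small batches and narrow rows produce large ones, so memory per request stays roughly constant regardless of row width.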

asubiotto added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-execution Relating to SQL execution. labels Aug 8, 2019

asubiotto added this to the 19.2 milestone Aug 8, 2019

This was referenced Sep 13, 2019

sql: join reader doesn't parallelize single-key spans #40748

Closed

sql: move indexJoiner into joinReader #40749

Closed

asubiotto mentioned this issue Apr 14, 2020

rowexec: investigate and improve lookup join performance #47472

Closed

5 tasks

asubiotto changed the title ~~distsql: conditionally increase join reader batch size from 100 to 10k~~ distsql: change join reader batch size to be specified in bytes Apr 14, 2020

asubiotto self-assigned this Apr 14, 2020

asubiotto modified the milestones: 19.2, 20.2 Apr 14, 2020

This was referenced Apr 27, 2020

rowexec: change lookup join batch size to be specified in bytes #48058

Merged

rowexec: speed up lookup joins when no ordering is required #48117

Closed

craig bot closed this as completed in 273f90c May 21, 2020
