rowexec: increase the batch size for join reader ordering strategy #84433
Conversation
Force-pushed from 269bbe2 to 4ea7ead.
The micro-benchmarks are here. They show that the only noticeable slowdown is when each input row results in many looked-up rows.
Should we be increasing the batch size in the case when we don't use the streamer? My understanding was that part of the reason we hadn't increased the batch size before is that it might have increased the likelihood of OOMs, since the old path isn't as careful about handling memory as the streamer.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @michae2)
Force-pushed from 4ea7ead to c3c1480.
That's a good question.
I think we should be safe to increase the batch size here regardless of whether the streamer is used. We need to consider two cases:
- the KV requests are parallelized and no memory limits are set on BatchRequests - this is the case when the lookup columns form a key, meaning that each lookup can return at most one result. Given this at-most-one-lookup-row behavior, I think it would be safe to increase the batch size limit even further than the currently proposed 100KiB (e.g. we use 2MiB when ordering doesn't have to be maintained).
- the KV requests are not parallelized, so memory limits are set - this is the case when each input row can result in many lookup rows. Here we do get the protection of the TargetBytes limit, so we should be safe to increase the batch size.
I believe there is no difference between "lookup join with ordering" and "lookup join with no ordering" from the OOM-stability perspective - both strategies have the same KV request parallelization behavior, so we could use exactly the same batch size for both (i.e. 2MiB). However, the difference between the two strategies is that the former has to buffer the looked-up rows in a container, possibly spilling to disk, so if we use too large a batch size, performance will drop. Thus, we chose a much smaller batch size for the ordering case. I believe the value of 10KiB was too conservative and we should bump it up, especially since it is a very simple change to lower it if we do spill to disk (a rough sketch of that lowering follows after this comment), and I don't anticipate an increase in OOMs because of it.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @michae2)
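To make the spill-triggered lowering mentioned above concrete, here is a minimal Go sketch under assumed names: `batchSizer`, `onSpill`, and the halving policy are hypothetical illustrations rather than the actual joinReaderOrderingStrategy code, which only specifies that the limit is lowered toward a 10KiB minimum.

```go
// Hypothetical sketch of lowering the input batch size limit whenever the
// disk-backed container that buffers looked-up rows spills to disk.
package sketch

const (
	minBatchSizeBytes     = 10 << 10  // 10KiB, treated as the floor
	defaultBatchSizeBytes = 100 << 10 // 100KiB, the proposed new default
)

// batchSizer tracks the current input row batch size limit used by the
// ordering strategy.
type batchSizer struct {
	limitBytes int64
}

func newBatchSizer() *batchSizer {
	return &batchSizer{limitBytes: defaultBatchSizeBytes}
}

// onSpill lowers the limit after an observed spill. Halving is an assumed
// policy for illustration; the commit only says the limit is lowered, with
// 10KiB as the minimum.
func (b *batchSizer) onSpill() {
	b.limitBytes /= 2
	if b.limitBytes < minBatchSizeBytes {
		b.limitBytes = minBatchSizeBytes
	}
}
```

Reacting only to observed spills keeps the common case at the larger limit while bounding regressions for inputs that produce many looked-up rows.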
Thanks for the explanation! Do you think it would be worth it to allow the batch size to grow dynamically as well, up to a larger limit? Maybe using some strategy like bumping up the batch size if the row container isn't exceeding half capacity.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @michae2)
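The dynamic-growth idea suggested above is not implemented in this PR (a TODO is added instead, as noted in the next comment), but a hypothetical sketch of such a heuristic could look like the following; the half-capacity threshold, the 2MiB ceiling, and every name here are assumptions for illustration only.

```go
// Hypothetical sketch of growing/shrinking the batch size limit based on
// how the row container behaved while processing the previous batch.
package sketch

const (
	minLimitBytes = 10 << 10 // 10KiB floor
	maxLimitBytes = 2 << 20  // 2MiB ceiling (the no-ordering strategy's size)
)

// containerStats summarizes the row container's behavior for one batch.
type containerStats struct {
	memUsedBytes   int64
	memBudgetBytes int64
	spilledToDisk  bool
}

// nextLimit doubles the limit while the container stayed under half of its
// memory budget, halves it after a spill, and clamps the result.
func nextLimit(cur int64, stats containerStats) int64 {
	switch {
	case stats.spilledToDisk:
		cur /= 2
	case stats.memUsedBytes < stats.memBudgetBytes/2:
		cur *= 2
	}
	if cur < minLimitBytes {
		cur = minLimitBytes
	}
	if cur > maxLimitBytes {
		cur = maxLimitBytes
	}
	return cur
}
```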
Force-pushed from c3c1480 to e9f1c05.
Yes, this is a good point. Growing and shrinking the limit dynamically seems more difficult, so I added a TODO for that idea.
TFTR!
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2)
Build succeeded.
This commit increases the default value of the
sql.distsql.join_reader_ordering_strategy.batch_size
cluster setting (which determines the input row batch size for lookup
joins when ordering needs to be maintained) from 10KiB to 100KiB, since
the previous number is likely to have been too conservative. We have
seen support cases (https://github.com/cockroachlabs/support/issues/1627)
where bumping up this setting was needed to get reasonable performance,
and we also change this setting to 100KiB in our TPC-E setup
(https://github.com/cockroachlabs/tpc-e/blob/d47d3ea5ce71ecb1be5e0e466a8aa7c94af63d17/tier-a/src/schema.rs#L404).
I did some historical digging, and I believe that the number 10KiB was
chosen somewhat arbitrarily, with no real justification, in #48058. That
PR changed how we measure the input row batches from the number of rows
to the memory footprint of the input rows. Prior to that change we had
a 100-row limit, so my guess is that 10KiB was somewhat dependent on
that number.
The reason we don't want this batch size to be too large is that we
buffer all looked-up rows in a disk-backed container, so if too many
responses come back (because we included too many input rows in the
batch), that container has to spill to disk. To make sure we don't
regress in such scenarios, this commit teaches the join reader to lower
the batch size bytes limit if the container does spill to disk, down to
a minimum of 10KiB.
Release note: None
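For reference, a hedged sketch of roughly how the setting and its bumped default could be registered in pkg/sql/rowexec follows; the settings class, the description string, and the exact RegisterByteSizeSetting arguments are assumptions here, not a quote of the actual code.

```go
package rowexec

import "github.com/cockroachdb/cockroach/pkg/settings"

// JoinReaderOrderingStrategyBatchSize caps the memory footprint of the
// input rows included in a single lookup batch when the join reader must
// maintain ordering. Exact registration arguments are assumed.
var JoinReaderOrderingStrategyBatchSize = settings.RegisterByteSizeSetting(
	settings.TenantWritable,
	"sql.distsql.join_reader_ordering_strategy.batch_size",
	"size limit on the input rows for lookup joins that maintain ordering",
	100<<10, /* 100KiB, previously 10KiB */
)
```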