
colexec: retune hash table constants in all of its uses #56968

Merged · 2 commits merged into cockroachdb:master on Dec 1, 2020

Conversation

@yuzefovich (Member) commented Nov 20, 2020

colexec: adjust benchmarks to include cases with a small number of tuples

This commit adjusts several benchmarks in order to run them against
inputs with a small number of tuples. The reasoning for such a change is
that we intend to remove the vectorize_row_count_threshold, so we should
now pay attention to both small and large input sets.

Additionally, this commit removes the "with-nulls" cases from the
aggregation benchmarks because the speeds of the "no-nulls" and
"with-nulls" cases are roughly the same, so having both doesn't provide
more useful information, yet it slows down the benchmark by a factor of 2.

Release note: None
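As a rough illustration of what "paying attention to both small and large input sets" looks like in practice, here is a minimal sketch of a Go sub-benchmark sweep; the function names, row counts, and benchmark body are hypothetical and not taken from this PR.

package colexec_test

import (
	"fmt"
	"testing"
)

// runAggregationBenchmark is a hypothetical stand-in for the real benchmark
// body in pkg/sql/colexec; it only demonstrates the looping structure.
func runAggregationBenchmark(b *testing.B, numInputRows int) {
	for i := 0; i < b.N; i++ {
		_ = numInputRows // aggregate over an input of numInputRows tuples
	}
}

func BenchmarkAggregator(b *testing.B) {
	// Sweep both small and large input sizes, since with
	// vectorize_row_count_threshold going away the vectorized engine will
	// see small inputs too.
	for _, numInputRows := range []int{1, 64, 1024, 64 * 1024} {
		b.Run(fmt.Sprintf("rows=%d", numInputRows), func(b *testing.B) {
			runAggregationBenchmark(b, numInputRows)
		})
	}
}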

colexec: retune hash table constants in all of its uses

This commit changes the load factor and the initial number of buckets of
the hash table based on recent micro-benchmarking and runs of the TPCH
queries. The numbers needed an update since we made the hash table, as
well as the operators that use it, more dynamic. The benchmarks show an
improvement on small inputs, sometimes a regression on inputs roughly
coldata.BatchSize() in size, and mostly improvements on larger inputs.
This trade-off seems beneficial to me.

Release note: None
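For context, the two retuned knobs interact in the usual way: the table grows once the number of stored keys exceeds the load factor times the bucket count. Below is a minimal sketch of that relationship; the load factor value and the helper function are illustrative assumptions, while the 256-bucket initial size matches the hash joiner default discussed later in this PR.

package main

import "fmt"

const (
	// exampleLoadFactor is an assumed value for illustration only; it is not
	// the value tuned in this PR.
	exampleLoadFactor = 1.0
	// exampleInitialNumBuckets mirrors the hash joiner's initial bucket
	// count from this PR (256).
	exampleInitialNumBuckets uint64 = 256
)

// numBucketsFor doubles the bucket count until the load factor constraint
// numKeys <= loadFactor * numBuckets holds.
func numBucketsFor(numKeys int) uint64 {
	numBuckets := exampleInitialNumBuckets
	for float64(numKeys) > exampleLoadFactor*float64(numBuckets) {
		numBuckets *= 2
	}
	return numBuckets
}

func main() {
	for _, n := range []int{10, 300, 5000} {
		fmt.Printf("%d keys -> %d buckets\n", n, numBucketsFor(n))
	}
}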

@yuzefovich yuzefovich requested review from asubiotto and a team November 20, 2020 19:02
@cockroach-teamcity (Member) commented:

This change is Reviewable

@yuzefovich (Member, Author) commented:

Note that I ran the benchmarks with #56935 cherry-picked.

The benchmarks are:

@asubiotto (Contributor) left a comment

Reviewed 4 of 4 files at r1, 8 of 8 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/aggregators_test.go, line 995 at r1 (raw file):

			Vec:              col,
			N:                numInputRows,
			NullProbability:  0,

It's an important change to get rid of null probability in benchmarks, right? What's the reasoning? Could you include it in a commit message?


pkg/sql/colexec/hashjoiner.go, line 240 at r2 (raw file):

// This number was chosen after running the micro-benchmarks and relevant
// TPCH queries using tpchvec/bench.
var HashJoinerInitialNumBuckets = uint64(256)

Why var instead of const? Doesn't look like other packages need to modify this (which I don't think would be good anyway). If they need another value, they just need to pass that into the constructor, right?

@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/colexec/aggregators_test.go, line 995 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

It's an important change to get rid of null probability in benchmarks, right? What's the reasoning? Could you include it in a commit message?

The speeds of the no-nulls and with-nulls cases are roughly the same, so having both doesn't provide much new information, yet it slows down running the benchmark by a factor of 2. Updated the commit message.


pkg/sql/colexec/hashjoiner.go, line 240 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Why var instead of const? Doesn't look like other packages need to modify this (which I don't think would be good anyway). If they need another value, they just need to pass that into the constructor, right?

Good catch. I originally left it like this because at first I had coldata.BatchSize() as the value, but after tuning I chose a different one (which can be a const). Fixed.
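A minimal sketch of the shape this fix likely takes, assuming a simplified type and constructor (the real ones in pkg/sql/colexec differ): the value becomes a const, and callers that want a different bucket count pass it into the constructor instead of mutating a package-level var.

package colexec

// HashJoinerInitialNumBuckets is the initial number of buckets in the hash
// joiner's hash table. It can be a const now that it is no longer derived
// from coldata.BatchSize().
const HashJoinerInitialNumBuckets = uint64(256)

// hashTable is a simplified placeholder for the real colexec hash table.
type hashTable struct {
	numBuckets uint64
}

// newHashTable shows how a caller that needs a different value would pass it
// in explicitly rather than modify a package-level variable.
func newHashTable(initialNumBuckets uint64) *hashTable {
	return &hashTable{numBuckets: initialNumBuckets}
}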

@asubiotto (Contributor) left a comment

:lgtm:

Reviewed 8 of 8 files at r3, 8 of 8 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

@yuzefovich (Member, Author) commented:

TFTR!

bors r+

@craig (craig bot) commented Dec 1, 2020

Build succeeded:

@craig craig bot merged commit e6f9184 into cockroachdb:master Dec 1, 2020
@yuzefovich yuzefovich deleted the vec-tuning branch December 1, 2020 16:58