
colexec: retune hash table constants in all of its uses #56968

Merged · 2 commits merged into cockroachdb:master on Dec 1, 2020

Conversation

@yuzefovich (Member) commented Nov 20, 2020

colexec: adjust benchmarks to include cases with a small number of tuples

This commit adjusts several benchmarks in order to run them against
inputs with a small number of tuples. The reasoning for such a change is
that we intend to remove the vectorize_row_count_threshold, so we should
now pay attention to both small and large input sets.

Additionally, this commit removes the "with-nulls" cases from the
aggregation benchmarks because the speeds of the "no-nulls" and
"with-nulls" cases are roughly the same, so having both doesn't provide
more useful information, yet it slows down the benchmark by a factor of 2.

Release note: None
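As a rough illustration of what "paying attention to both small and large input sets" looks like in practice, here is a minimal sketch of a Go sub-benchmark sweep; the function names, row counts, and benchmark body are hypothetical and not taken from this PR.

package colexec_test

import (
	"fmt"
	"testing"
)

// runAggregationBenchmark is a hypothetical stand-in for the real benchmark
// body in pkg/sql/colexec; it only demonstrates the looping structure.
func runAggregationBenchmark(b *testing.B, numInputRows int) {
	for i := 0; i < b.N; i++ {
		_ = numInputRows // aggregate over an input of numInputRows tuples
	}
}

func BenchmarkAggregator(b *testing.B) {
	// Sweep both small and large input sizes, since with
	// vectorize_row_count_threshold going away the vectorized engine will
	// see small inputs too.
	for _, numInputRows := range []int{1, 64, 1024, 64 * 1024} {
		b.Run(fmt.Sprintf("rows=%d", numInputRows), func(b *testing.B) {
			runAggregationBenchmark(b, numInputRows)
		})
	}
}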

colexec: retune hash table constants in all of its uses

This commit changes the load factor and the initial number of buckets of
the hash table based on recent micro-benchmarking and runs of the TPCH
queries. The numbers needed an update since we made the hash table, as
well as the operators that use it, more dynamic. The benchmarks show an
improvement on small inputs, sometimes a regression on inputs roughly
coldata.BatchSize() in size, and mostly improvements on larger inputs.
This trade-off seems beneficial to me.

Release note: None
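For context, the two retuned knobs interact in the usual way: the table grows once the number of stored keys exceeds the load factor times the bucket count. Below is a minimal sketch of that relationship; the load factor value and the helper function are illustrative assumptions, while the 256-bucket initial size matches the hash joiner default discussed later in this PR.

package main

import "fmt"

const (
	// exampleLoadFactor is an assumed value for illustration only; it is not
	// the value tuned in this PR.
	exampleLoadFactor = 1.0
	// exampleInitialNumBuckets mirrors the hash joiner's initial bucket
	// count from this PR (256).
	exampleInitialNumBuckets uint64 = 256
)

// numBucketsFor doubles the bucket count until the load factor constraint
// numKeys <= loadFactor * numBuckets holds.
func numBucketsFor(numKeys int) uint64 {
	numBuckets := exampleInitialNumBuckets
	for float64(numKeys) > exampleLoadFactor*float64(numBuckets) {
		numBuckets *= 2
	}
	return numBuckets
}

func main() {
	for _, n := range []int{10, 300, 5000} {
		fmt.Printf("%d keys -> %d buckets\n", n, numBucketsFor(n))
	}
}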

@yuzefovich yuzefovich requested review from asubiotto and a team November 20, 2020 19:02
@cockroach-teamcity (Member) commented:

This change is Reviewable

@yuzefovich (Member, Author) commented:

Note that I ran the benchmarks with #56935 cherry-picked.

The benchmarks are:

@asubiotto (Contributor) left a comment

Reviewed 4 of 4 files at r1, 8 of 8 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/aggregators_test.go, line 995 at r1 (raw file):

			Vec:              col,
			N:                numInputRows,
			NullProbability:  0,

It's an important change to get rid of null probability in benchmarks, right? What's the reasoning? Could you include it in a commit message?


pkg/sql/colexec/hashjoiner.go, line 240 at r2 (raw file):

// This number was chosen after running the micro-benchmarks and relevant
// TPCH queries using tpchvec/bench.
var HashJoinerInitialNumBuckets = uint64(256)

Why var instead of const? Doesn't look like other packages need to modify this (which I don't think would be good anyway). If they need another value, they just need to pass that into the constructor, right?

@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/colexec/aggregators_test.go, line 995 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

It's an important change to get rid of null probability in benchmarks, right? What's the reasoning? Could you include it in a commit message?

The speeds of the no-nulls and with-nulls cases are roughly the same, so having both doesn't provide much new information, yet it slows down running the benchmark by a factor of 2. Updated the commit message.


pkg/sql/colexec/hashjoiner.go, line 240 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Why var instead of const? Doesn't look like other packages need to modify this (which I don't think would be good anyway). If they need another value, they just need to pass that into the constructor, right?

Good catch. I originally left it like this because at first I had coldata.BatchSize() as the value, but after tuning I chose a different one (which can be a const). Fixed.
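A minimal sketch of the shape this fix likely takes, assuming a simplified type and constructor (the real ones in pkg/sql/colexec differ): the value becomes a const, and callers that want a different bucket count pass it into the constructor instead of mutating a package-level var.

package colexec

// HashJoinerInitialNumBuckets is the initial number of buckets in the hash
// joiner's hash table. It can be a const now that it is no longer derived
// from coldata.BatchSize().
const HashJoinerInitialNumBuckets = uint64(256)

// hashTable is a simplified placeholder for the real colexec hash table.
type hashTable struct {
	numBuckets uint64
}

// newHashTable shows how a caller that needs a different value would pass it
// in explicitly rather than modify a package-level variable.
func newHashTable(initialNumBuckets uint64) *hashTable {
	return &hashTable{numBuckets: initialNumBuckets}
}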

@asubiotto (Contributor) left a comment

:lgtm:

Reviewed 8 of 8 files at r3, 8 of 8 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

@yuzefovich (Member, Author) commented:

TFTR!

bors r+

@craig (craig bot) commented Dec 1, 2020

Build succeeded:

@craig craig bot merged commit e6f9184 into cockroachdb:master Dec 1, 2020
@yuzefovich yuzefovich deleted the vec-tuning branch December 1, 2020 16:58