colexec: retune hash table constants in all of its uses #56968
Conversation
Note that I ran the benchmarks with #56935 cherry-picked.
Reviewed 4 of 4 files at r1, 8 of 8 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colexec/aggregators_test.go, line 995 at r1 (raw file):
Vec: col, N: numInputRows, NullProbability: 0,
It's an important change to get rid of null probability in benchmarks, right? What's the reasoning? Could you include it in a commit message?
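For context, here is a minimal, self-contained sketch of what a `NullProbability` knob does in a benchmark input helper. The helper name and signature are illustrative, not the actual colexec test utility:

```go
package main

import (
	"fmt"
	"math/rand"
)

// fillColumn is a toy stand-in for the benchmark helper under discussion:
// it generates n values and marks each one NULL with probability p.
// With p == 0 (the change being reviewed), no nulls are produced at all.
func fillColumn(n int, p float64, rng *rand.Rand) (vals []int64, nulls []bool) {
	vals = make([]int64, n)
	nulls = make([]bool, n)
	for i := range vals {
		vals[i] = rng.Int63()
		nulls[i] = rng.Float64() < p
	}
	return vals, nulls
}

func main() {
	rng := rand.New(rand.NewSource(42))
	_, nulls := fillColumn(8, 0, rng)
	fmt.Println(nulls) // all false: NullProbability of 0 disables nulls entirely
}
```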
pkg/sql/colexec/hashjoiner.go, line 240 at r2 (raw file):
// This number was chosen after running the micro-benchmarks and relevant
// TPCH queries using tpchvec/bench.
var HashJoinerInitialNumBuckets = uint64(256)
Why `var` instead of `const`? Doesn't look like other packages need to modify this (which I don't think would be good anyway). If they need another value, they just need to pass that into the constructor, right?
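For illustration, a minimal sketch of the pattern the reviewer is suggesting, with hypothetical names (`hashTable`, `newHashTable`) rather than the actual colexec API: keep the default as a `const` and let callers pass an override to the constructor instead of mutating a package-level `var`.

```go
package main

import "fmt"

// A const default cannot be mutated by other packages, unlike the var
// in the diff under review.
const hashJoinerInitialNumBuckets = uint64(256)

// hashTable is a hypothetical stand-in for the colexec hash table.
type hashTable struct {
	numBuckets uint64
}

// newHashTable lets callers that need a different value pass it in
// explicitly instead of modifying a global.
func newHashTable(initialNumBuckets uint64) *hashTable {
	return &hashTable{numBuckets: initialNumBuckets}
}

func main() {
	ht := newHashTable(hashJoinerInitialNumBuckets)
	fmt.Println(ht.numBuckets) // 256
}
```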
Force-pushed from dc7dbaf to 11e6312.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/colexec/aggregators_test.go, line 995 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
It's an important change to get rid of null probability in benchmarks, right? What's the reasoning? Could you include it in a commit message?
The speeds of the no-nulls and with-nulls cases are roughly the same, so having both doesn't provide us much new information, yet slows down running the benchmark by a factor of 2. Updated the commit message.
pkg/sql/colexec/hashjoiner.go, line 240 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Why `var` instead of `const`? Doesn't look like other packages need to modify this (which I don't think would be good anyway). If they need another value, they just need to pass that into the constructor, right?
Good catch. I left it originally like this because at first I had `coldata.BatchSize()` as the value, but after tuning I chose a different one (that can be a `const`). Fixed.
Reviewed 8 of 8 files at r3, 8 of 8 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)
TFTR! bors r+
Build succeeded.
colexec: adjust benchmarks to include cases with a small number of tuples
This commit adjusts several benchmarks in order to run them against
inputs with a small number of tuples. The reasoning for such a change is
that we intend to remove the vectorize_row_count_threshold, so we now
should be paying attention to both small and large input sets.
Additionally, this commit removes the "with-nulls" cases from the
aggregation benchmarks because the speeds of the "no-nulls" and
"with-nulls" cases are roughly the same, so having both doesn't provide
us more useful info, yet slows down the benchmark by a factor of 2.
Release note: None
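As a hedged sketch of the benchmark shape this commit describes (names, sizes, and the `runAggregation` helper are illustrative, not the actual test code), the same operator measurement is simply repeated across a range of input sizes, small ones included:

```go
package colexec_test

import (
	"fmt"
	"testing"
)

// BenchmarkAggregationSketch runs the same measurement across both small
// and large input sizes, so that performance near the old
// vectorize_row_count_threshold is covered too.
func BenchmarkAggregationSketch(b *testing.B) {
	for _, numInputRows := range []int{1, 32, 1024, 32 * 1024} {
		b.Run(fmt.Sprintf("rows=%d", numInputRows), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				runAggregation(numInputRows) // hypothetical helper for the operator under test
			}
		})
	}
}

// runAggregation is a placeholder for the aggregation being benchmarked.
func runAggregation(numInputRows int) {}
```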
colexec: retune hash table constants in all of its uses
This commit changes the load factor and the initial number of buckets of
the hash table according to the recent micro-benchmarking and running of
TPCH queries. The numbers needed an update since we made the hash table,
as well as the operators that use it, more dynamic. The benchmarks show
an improvement on the inputs with small sizes, sometimes a regression on
inputs roughly of `coldata.BatchSize()` in size, and mostly improvements
on larger inputs. This trade-off seems beneficial to me.
Release note: None
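To make the two tuned knobs concrete, here is a rough, hypothetical sketch of load-factor-driven growth. The 256 initial buckets comes from the diff above; the load factor value and all names are illustrative, not the actual colexec code:

```go
package main

import "fmt"

const (
	initialNumBuckets = uint64(256) // matches HashJoinerInitialNumBuckets in the diff
	loadFactor        = 1.0         // illustrative; the tuned value lives in colexec
)

// sketchTable doubles its bucket count whenever the number of stored
// keys exceeds loadFactor * numBuckets; rehashing of existing keys is
// omitted for brevity.
type sketchTable struct {
	numBuckets uint64
	numKeys    uint64
}

func (t *sketchTable) insertN(n uint64) {
	t.numKeys += n
	for float64(t.numKeys) > loadFactor*float64(t.numBuckets) {
		t.numBuckets *= 2
	}
}

func main() {
	t := &sketchTable{numBuckets: initialNumBuckets}
	t.insertN(10_000)
	fmt.Println(t.numBuckets) // grows to 16384 with these toy constants
}
```

A higher load factor trades fewer, cheaper resizes (and less memory) for longer probe chains, which is exactly the small-input vs. `coldata.BatchSize()`-sized-input trade-off the commit message describes.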