-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unpeel group by 3 ways to enable vectorization #7949
unpeel group by 3 ways to enable vectorization #7949
Conversation
d784614
to
cbe0ffd
Compare
Codecov Report
@@ Coverage Diff @@
## master #7949 +/- ##
=============================================
- Coverage 71.20% 27.54% -43.67%
=============================================
Files 1593 1584 -9
Lines 82477 82144 -333
Branches 12305 12269 -36
=============================================
- Hits 58729 22623 -36106
- Misses 19788 57461 +37673
+ Partials 3960 2060 -1900
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea!
bf263db
to
10670dd
Compare
I am guessing the |
...n/java/org/apache/pinot/core/query/aggregation/groupby/DictionaryBasedGroupKeyGenerator.java
Show resolved
Hide resolved
Yes, the multiplications ( |
...n/java/org/apache/pinot/core/query/aggregation/groupby/DictionaryBasedGroupKeyGenerator.java
Outdated
Show resolved
Hide resolved
...n/java/org/apache/pinot/core/query/aggregation/groupby/DictionaryBasedGroupKeyGenerator.java
Outdated
Show resolved
Hide resolved
10670dd
to
47108d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This rearranges the group key assignment in 2 steps - split the loop to compute the group ids from the loop which marks the existence of the groups in a block, and then specialises the group id calculation loop for cases where there are 2 groups and 3 groups respectively. This simplifies the group id calculation loop to the extent that C2 can vectorize it on JDK11. This speeds up the code up at least 5x compared to a loop which can handle any number of group by expressions in a microbenchmark small enough to examine the generated code:
The difference in code generation can be seen with perfasm (for a 3D group by, cardinalities 100 x 30 x 50):
before:
The vectorized group id assignment is so fast that it's not the bottleneck afterwards - nearly 70% of the time is now spent in group marking, and only 20% computing group ids. The marking can probably be improved but it's harder to parallelise than computing group ids.