Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prune unselected THEN statements in CaseTransformFunction #8138

Merged

Conversation

richardstartin
Copy link
Member

@richardstartin richardstartin commented Feb 5, 2022

This PR prunes evaluation of THEN cases when they are not selected at the block level. This can happen when e.g. the conditions are applied to the values of a sorted column, or if a CASE holding is very rare and the THEN statement is a fallback.

This also includes some reductions in allocations in LiteralTransformFunction and LogicalOperatorTransformFunction, which are often used with CASE statements.

This speeds up query evaluation even when there are no branches to prune because selecting the cases is slightly more efficient, and some unnecessary allocations are removed:

master

Benchmark               (_intBaseValue)  (_numRows)                                                                                                                                                                                                                                                         (_query)  Mode  Cnt      Score      Error  Units
BenchmarkQueries.query                0     1500000  SELECT SUM(CASE WHEN (INT_COL > 123 AND INT_COL < 599999) THEN INT_COL ELSE 0 END) AS total_sum,MAX(CASE WHEN (INT_COL > 123 AND INT_COL < 599999) THEN INT_COL ELSE 0 END) AS total_avg FROM MyTable WHERE NO_INDEX_INT_COL > 5 AND NO_INDEX_INT_COL < 1499999  avgt    5  52684.909 ± 8678.865  us/op

branch

Benchmark               (_intBaseValue)  (_numRows)                                                                                                                                                                                                                                                         (_query)  Mode  Cnt      Score      Error  Units
BenchmarkQueries.query                0     1500000  SELECT SUM(CASE WHEN (INT_COL > 123 AND INT_COL < 599999) THEN INT_COL ELSE 0 END) AS total_sum,MAX(CASE WHEN (INT_COL > 123 AND INT_COL < 599999) THEN INT_COL ELSE 0 END) AS total_avg FROM MyTable WHERE NO_INDEX_INT_COL > 5 AND NO_INDEX_INT_COL < 1499999  avgt    5  49619.396 ± 1900.814  us/op

size arrays to the block size
do not eagerly format exception messages
construct BigDecimal only once in LiteralTransformFunction
@codecov-commenter
Copy link

codecov-commenter commented Feb 5, 2022

Codecov Report

Merging #8138 (2679a3f) into master (8bbf93a) will decrease coverage by 40.72%.
The diff coverage is 41.12%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master    #8138       +/-   ##
=============================================
- Coverage     71.39%   30.67%   -40.73%     
=============================================
  Files          1624     1613       -11     
  Lines         84198    83886      -312     
  Branches      12602    12584       -18     
=============================================
- Hits          60116    25734    -34382     
- Misses        19970    55859    +35889     
+ Partials       4112     2293     -1819     
Flag Coverage Δ
integration1 28.91% <41.12%> (+0.04%) ⬆️
integration2 27.58% <41.12%> (-0.09%) ⬇️
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...orm/function/LogicalOperatorTransformFunction.java 69.56% <0.00%> (-26.27%) ⬇️
...ator/transform/function/CaseTransformFunction.java 41.47% <37.34%> (-23.19%) ⬇️
...r/transform/function/LiteralTransformFunction.java 44.68% <68.42%> (-32.54%) ⬇️
.../java/org/apache/pinot/spi/utils/BooleanUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/table/FSTType.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/data/MetricFieldSpec.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pinot/spi/utils/BigDecimalUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/common/tier/TierFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/spi/config/table/TableType.java 0.00% <0.00%> (-100.00%) ⬇️
.../org/apache/pinot/spi/data/DimensionFieldSpec.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1118 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8bbf93a...2679a3f. Read the comment docs.

@@ -58,6 +57,8 @@

private List<TransformFunction> _whenStatements = new ArrayList<>();
private List<TransformFunction> _elseThenStatements = new ArrayList<>();
private boolean[] _selections;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) brief comment to explain what is _selections. I am guessing this is to track if a statement is selected or not ?

@@ -102,8 +104,9 @@ private TransformResultMetadata calculateResultMetadata() {
for (int i = 0; i < numThenStatements; i++) {
TransformFunction thenStatement = _elseThenStatements.get(i + 1);
TransformResultMetadata thenStatementResultMetadata = thenStatement.getResultMetadata();
Preconditions.checkState(thenStatementResultMetadata.isSingleValue(),
String.format("Unsupported multi-value expression in the THEN clause of index: %d", i));
if (!thenStatementResultMetadata.isSingleValue()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this instead of Preconditions.checkState() ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably to avoid String.format being evaluated unnecessarily?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise you call String.format("Unsupported multi-value expression in the THEN clause of index: %d", i) every time a function is initialised, which will jump out at you fairly quickly in an allocation profile.

numSelections++;
}
}
_numSelections = numSelections;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a bitmap instead of boolean array ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you have fewer than 64 cases (a large case statement) all updates to the bitmap would be to the same word, which creates a data dependency in the loop, which slows the loop down.

}
int numWhenStatements = _whenStatements.size();
for (int i = 0; i < numWhenStatements; i++) {
for (int i = numWhenStatements - 1; i >= 0; i--) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why loop needs to be reversed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows branch-free setting of the highest priority case below (note that the statement numbers increase)

int[] intValues = transformFunction.transformToIntValuesSV(projectionBlock);
if (_numSelections == 1) {
System.arraycopy(intValues, 0, _intResults, 0, numDocs);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is checking for _numSelections == 1 and copy really needed ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the alternative is the loop below which handles the generic case, which is a lot slower.

@siddharthteotia siddharthteotia merged commit df1c268 into apache:master Feb 7, 2022
@richardstartin
Copy link
Member Author

There was some flakiness in the controller tests which should be fixed by @klsince in #8154

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants