-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proper null handling in SELECT, ORDER BY, DISTINCT, and GROUP BY #8927
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8927 +/- ##
============================================
- Coverage 70.13% 63.68% -6.45%
+ Complexity 4741 4719 -22
============================================
Files 1831 1784 -47
Lines 96382 94815 -1567
Branches 14408 14371 -37
============================================
- Hits 67594 60381 -7213
- Misses 24125 30101 +5976
+ Partials 4663 4333 -330
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In high level, we should not decide whether to enable query null handling based on the whether there is null value vector. This flag should be passed from the broker as a query option, and we can introduce another table config to enable it. This can ensure all the segments are executed using the same mode.
pinot-core/src/main/java/org/apache/pinot/core/common/RowBasedBlockValueFetcher.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/combine/GroupByOrderByCombineOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/query/SelectionOrderByOperator.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/pinot/core/operator/streaming/StreamingSelectionOnlyCombineOperator.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/TransformOperator.java
Outdated
Show resolved
Hide resolved
.../org/apache/pinot/core/operator/combine/MinMaxValueBasedSelectionOrderByCombineOperator.java
Outdated
Show resolved
Hide resolved
@@ -50,6 +53,26 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataS | |||
_dataBlockCache = dataBlockCache; | |||
_column = column; | |||
_dataSource = dataSource; | |||
NullValueVectorReader nullValueReader = _dataSource == null ? null : _dataSource.getNullValueVector(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data source should never be null
. The computation should be lazy, and the operator should decide whether to read it based on if null handling is enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to do computation lazily, and removed null check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the logic to getNullBitmap()
method. Please note that this method is called only when nullHandlingEnabled
is set to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are plenty of comments, but the PR is in good shape overall. Good job!
pinot-core/src/main/java/org/apache/pinot/core/common/datablock/RowDataBlock.java
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/util/QueryOptionsUtils.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
Show resolved
Hide resolved
...a/org/apache/pinot/core/query/distinct/raw/RawStringSingleColumnDistinctOrderByExecutor.java
Outdated
Show resolved
Hide resolved
...java/org/apache/pinot/core/query/distinct/raw/RawStringSingleColumnDistinctOnlyExecutor.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GroupByDataTableReducer.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly good.
cc @siddharthteotia to also take a look since this change might have performance impact
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpecUtils.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java
Outdated
Show resolved
Hide resolved
...g/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java
Outdated
Show resolved
Hide resolved
...g/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java
Outdated
Show resolved
Hide resolved
...g/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorService.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/util/QueryOptionsUtils.java
Outdated
Show resolved
Hide resolved
Related to #8697 |
for (Object[] row : rows) { | ||
for (int i = 0; i < numColumns; i++) { | ||
Object columnValue = row[i]; | ||
if (columnValue == null) { | ||
row[i] = colDefaultNullValues[i]; | ||
nullBitmaps[i].add(rowId); | ||
} | ||
} | ||
rowId++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we merge this part with the main loop below? any specific reason we had to loop through the rows one more time?
Proper NULL handling in:
For Single-valued columns of all data types with:
Unit tests:
Updated
BigDecimalQueriesTest
to work with null values.Added
BooleanNullEnabledNoDictQueriesTest
.Added AllNullQueriesTest where all values are null.
Added NullEnabledQueriesTest where some values are nulls.