Fix broken BIG_DECIMAL aggregations (MIN / MAX / SUM / AVG) in the multi-stage query engine #14689
These aggregation functions (`MIN`/`MAX`/`SUM`/`AVG`) produce a `Double` result, and this is attempted to be cast to `BigDecimal` when constructing the data block to be sent over the wire. The cast is attempted because the return type for `MIN`/`MAX`/`SUM`/`AVG` is inferred as `DECIMAL` when the input operand type is `DECIMAL` (or `BIG_DECIMAL` in Pinot) due to the use of the standard operators in Calcite.

We don't want to change this type inference logic because it's backward incompatible (the actual return type for users would change if we change this) and also because we want to move towards actual polymorphic aggregation functions (i.e., not just the appearance of polymorphism through casting). Note that this isn't an issue in the leaf aggregation because the return type there is determined as `DOUBLE` through the `AggregationFunctionType` (see here).
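To illustrate the failure mode, here's a minimal standalone sketch (not Pinot code) of why building the block blows up: a `Double` held as `Object` cannot be downcast to `BigDecimal`.

```java
import java.math.BigDecimal;

public class CastFailureSketch {
  public static void main(String[] args) {
    // The aggregation produces a Double, carried as Object in the row.
    Object aggResult = Double.valueOf(42.5);

    // The block builder sees an inferred DECIMAL column type and casts
    // the value to BigDecimal; Double is not a BigDecimal, so this
    // compiles fine but throws ClassCastException at runtime.
    BigDecimal serialized = (BigDecimal) aggResult;
    System.out.println(serialized);
  }
}
```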
`TypeUtils::convert` (that handles single-stage -> multi-stage type conversion) includes a type check before converting because there are some aggregation functions like `SUMPRECISION` that actually return `BigDecimal` values, and we don't want to lose precision there by converting through doubles.
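As a rough sketch of that check-before-convert idea (a hypothetical helper for illustration, not the actual `TypeUtils::convert` code): values that are already `BigDecimal` pass through untouched, while `Double` results are widened explicitly instead of being cast.

```java
import java.math.BigDecimal;

final class ConvertSketch {
  // Hypothetical helper illustrating the type check described above.
  static BigDecimal toBigDecimal(Object value) {
    if (value instanceof BigDecimal) {
      // e.g. SUMPRECISION genuinely returns BigDecimal; converting it
      // through double would silently drop digits, so pass it through.
      return (BigDecimal) value;
    }
    if (value instanceof Number) {
      // e.g. MIN/MAX/SUM/AVG return Double; convert explicitly rather
      // than casting, which would throw ClassCastException.
      return BigDecimal.valueOf(((Number) value).doubleValue());
    }
    throw new IllegalArgumentException(
        "Cannot convert " + value.getClass() + " to BigDecimal");
  }
}
```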
Eventually, we want these aggregation functions to be actually polymorphic (i.e., for a `BIG_DECIMAL` input the aggregation would compute and return `BIG_DECIMAL` rather than `DOUBLE`). However, this has backward compatibility implications and we need to be careful about not breaking the v1 engine. Right now, the output type to end users will still be the same as the input operand type in the MSQE, but there can be loss of precision (`MAX(BIG_DECIMAL col)` -> `DOUBLE` output -> cast to `BIG_DECIMAL` with loss of precision, for instance).
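A standalone sketch of that precision loss: a `double` carries only about 15-17 significant decimal digits, so a wider `BIG_DECIMAL` value doesn't survive the round trip.

```java
import java.math.BigDecimal;

public class PrecisionLossSketch {
  public static void main(String[] args) {
    // A BIG_DECIMAL column value wider than double's ~15-17 significant digits.
    BigDecimal original = new BigDecimal("12345678901234567890.123456789");

    // MAX(BIG_DECIMAL col) -> DOUBLE output in the MSQE...
    double asDouble = original.doubleValue();

    // ...-> cast back to BIG_DECIMAL as the user-facing type.
    BigDecimal roundTripped = BigDecimal.valueOf(asDouble);

    System.out.println(original);      // 12345678901234567890.123456789
    System.out.println(roundTripped);  // 1.2345678901234567E+19 -- digits lost
  }
}
```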