-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[multistage] Derive SUM return type to be PostgreSQL compatible #11151
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11151 +/- ##
==========================================
+ Coverage 0.00% 0.11% +0.11%
==========================================
Files 2214 2231 +17
Lines 119705 120181 +476
Branches 18163 18227 +64
==========================================
+ Hits 0 137 +137
- Misses 119705 120024 +319
- Partials 0 20 +20
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 31 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
dc7b159
to
8d4c33e
Compare
@@ -21,7 +21,7 @@ where | |||
) | |||
and ps_availqty > ( | |||
select | |||
0.5 * sum(l_quantity) | |||
CAST(0.5 * SUM(l_quantity) AS DECIMAL(19, 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem with this is b/c the 0.5 has DECIMAL value and it is a Literal
this will cause Calcite to explicitly hoist the SUM results by a specific precision and scale.
no one will actually use this in practices, and i suggest we disable this test until further fix can be put in place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is for join schema mismatch from both side, one is ps_availqty
the type is decimal(19, 1)
, the other side is 0.5 * SUM(l_quantity)
, which is decimal(21,1)
.
@@ -313,22 +313,78 @@ | |||
}, | |||
{ | |||
"description": "Correlated join", | |||
"sql": "EXPLAIN PLAN FOR SELECT a.col1 FROM a WHERE a.col4 > (SELECT 0.5 * SUM(b.col3) FROM b WHERE b.col2 = a.col2 AND b.col1 = a.col1)", | |||
"sql": "EXPLAIN PLAN FOR SELECT a.col1 FROM a WHERE a.col4 > (SELECT CAST(0.5 * SUM(b.col3) AS DECIMAL(19, 1)) FROM b WHERE b.col2 = a.col2 AND b.col1 = a.col1)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast as double here and put a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query failed due to a de-correlation issue for type mismatch of DECIMAL(19, 1)
and ``DECIMAL(21, 1)`
java.lang.AssertionError: Cannot add expression of different type to set:
set type is RecordType(VARCHAR NOT NULL col1, VARCHAR NOT NULL col2, INTEGER NOT NULL col3, DECIMAL(1000, 0) NOT NULL col4, BOOLEAN NOT NULL col5, INTEGER NOT NULL col6, BIGINT NOT NULL ts, DECIMAL(21, 1) EXPR$0) NOT NULL
expression type is RecordType(VARCHAR NOT NULL col1, VARCHAR NOT NULL col2, INTEGER NOT NULL col3, DECIMAL(1000, 0) NOT NULL col4, BOOLEAN NOT NULL col5, INTEGER NOT NULL col6, BIGINT NOT NULL ts, DECIMAL(19, 1) EXPR$0) NOT NULL
set is rel#6543:LogicalCorrelate.(left=HepRelVertex#6533,right=HepRelVertex#6542,correlation=$cor0,joinType=left,requiredColumns={0, 1})
expression is LogicalProject(col1=[$0], col2=[$1], col3=[$2], col4=[$3], col5=[$4], col6=[$5], ts=[$6], EXPR$0=[*(0.5:DECIMAL(2, 1), $7)])
LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{0, 1}])
LogicalTableScan(table=[[a]])
LogicalAggregate(group=[{}], agg#0=[SUM($0)])
LogicalProject(col3=[$2])
LogicalFilter(condition=[AND(=($1, $cor0.col2), =($0, $cor0.col1))])
LogicalTableScan(table=[[b]])
at org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:391)
at org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:60)
at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:269)
at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:284)
at org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.onMatch2(RelDecorrelator.java:2703)
at org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.onMatch(RelDecorrelator.java:2610)
at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:337)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:420)
at org.apache.calcite.plan.hep.HepPlanner.executeRuleInstance(HepPlanner.java:243)
at org.apache.calcite.plan.hep.HepInstruction$RuleInstance$State.execute(HepInstruction.java:178)
at org.apache.calcite.plan.hep.HepPlanner.lambda$executeProgram$0(HepPlanner.java:211)
at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422)
at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:210)
at org.apache.calcite.plan.hep.HepProgram$State.execute(HepProgram.java:118)
at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:205)
at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:191)
at org.apache.calcite.sql2rel.RelDecorrelator.decorrelate(RelDecorrelator.java:291)
at org.apache.calcite.sql2rel.RelDecorrelator.decorrelateQuery(RelDecorrelator.java:230)
at org.apache.pinot.query.QueryEnvironment.decorrelateIfNeeded(QueryEnvironment.java:282)
at org.apache.pinot.query.QueryEnvironment.compileQuery(QueryEnvironment.java:275)
at org.apache.pinot.query.QueryEnvironment.explainQuery(QueryEnvironment.java:202)
at org.apache.pinot.query.QueryEnvironment.explainQuery(QueryEnvironment.java:220)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internally the max precision and scale is 1000, per: https://github.com/apache/pinot/blob/master/pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeSystem.java#L32
So the intermediate decimal results type may go pretty wild.
...on-tests/src/test/java/org/apache/pinot/integration/tests/BaseClusterIntegrationTestSet.java
Show resolved
Hide resolved
8d4c33e
to
79a07f9
Compare
We still need to find a way to solve this decimal type hoist problem. |
79a07f9
to
7b3ee2b
Compare
based on https://www.postgresql.org/docs/8.2/functions-aggregate.html and https://learn.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-ver16 it doesn't seem to have a consistent requirement for type hoisting in SQL standards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// SUM result will overflow INTEGER | ||
query = "SELECT SUM(CAST(ActualElapsedTime AS DOUBLE)) FROM mytable"; | ||
|
||
testQuery(query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a runtime test that verifies SUM (and other aggregations) on BIG_DECIMAL column? I remember testing this long back and there were some issues. Not sure if they've been fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really. given the fact that we are currently working around the type-hoisting behavior to not do any automatic conversion. it should be same type in same type out
For reference, here is the latest PostgreSQL document for result types: https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE |
i think what's missing is that
one solution I can think of is to register our own behavior by adding our own OperatorTable and our own return type inference this way we don't need to worry about the TypeFactory/TypeSystem overrides which intricately went through Calcite internal where we don't have direct control. |
7b3ee2b
to
cc4e26d
Compare
621b47b
to
b6c9f2f
Compare
bede511
to
292099d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. thank you for fixing this issue
pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeSystem.java
Show resolved
Hide resolved
292099d
to
cd72e87
Compare
Derive sum return type for v2.
Based on https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE
for query:
Before:
data:image/s3,"s3://crabby-images/24187/24187f1ca0d50d72977d928a3d661b0ae4ba3726" alt="image"
After:
data:image/s3,"s3://crabby-images/dcb82/dcb82e79836095a62aaa63ddb5a65c6fcb0a358d" alt="image"