-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[multistage] adding support for range predicate #9445
Conversation
walterddr
commented
Sep 21, 2022
•
edited
Loading
edited
- adding rules to decompose range predicate into OR joined simple comparison.
- if the look up is point or complemented points, we should still use IN / NOT IN
- if the search result in a constant value (true of false) it might trigger logical value literal node, which is also supported here.
@@ -200,6 +200,13 @@ private Object[][] provideTestSql() { | |||
new Object[]{"SELECT a.col1, b.col2 FROM a JOIN b ON a.col1 = b.col1 " | |||
+ " WHERE a.col1 IN ('foo') AND b.col2 NOT IN ('')"}, | |||
|
|||
// Range conditions with continuous and non-continuous range. | |||
new Object[]{"SELECT a.col1, b.col2 FROM a JOIN b ON a.col1 = b.col1 " | |||
+ " WHERE a.col3 IN (1, 2, 3) OR (a.col3 > 10 AND a.col3 < 50)"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also check the case where someone tries an impossible range condition or conditions with overlaps? I was running into an issue with those edge-cases in my PR. Examples:
(a.col3 > 10 AND a.col3 < 20) AND (a.col3 > 30 AND a.col3 < 40) # Impossible range
(a.col3 > 10 AND a.col3 < 20) AND (a.col3 > 15 AND a.col3 < 25) # Overlapping range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I was planning to split that into a separate PR but yeah sure let me add that to this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is col IN (<single value>)
getting rewritten to col = <val>
?
FWIW, in the current engine we do this rewrite but not really worth it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single IN yes will be rewrite
Codecov Report
@@ Coverage Diff @@
## master #9445 +/- ##
============================================
- Coverage 69.87% 67.10% -2.77%
- Complexity 4790 4912 +122
============================================
Files 1890 1412 -478
Lines 100961 73743 -27218
Branches 15353 11785 -3568
============================================
- Hits 70543 49484 -21059
+ Misses 25451 20676 -4775
+ Partials 4967 3583 -1384
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
super(stageId); | ||
} | ||
|
||
public ValueNode(int currentStageId, DataSchema dataSchema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General suggestion -- for every new Operator
/ StageNode
introduced, I think we should add a corresponding query compilation test to validate the plan, expressions etc. In the current engine, we have done that for quite some time in CalciteSqlCompilerTest.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one yes. ValueNode
is a special node generated by calcite when the query evaluates back to a constant (regardless of underlying table)
value node replaces table scan as a root.
+ " WHERE a.col3 IN (1, 2, 3) OR (a.col3 > 10 AND a.col3 < 50)"}, | ||
|
||
new Object[]{"SELECT col1, SUM(col3) FROM a WHERE a.col3 BETWEEN 23 AND 36 " | ||
+ " GROUP BY col1 HAVING SUM(col3) > 10.0 AND MIN(col3) <> 123 AND MAX(col3) BETWEEN 10 AND 20"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current engine non-literals (function expressions) on RHS get rewritten by moving to LHS. Do we support that here ?
WHERE a > b + 20
-> WHERE a - b > 20
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. this is not done by the range predicate but by the compilation on server.
* adding rules to decompose range predicate into OR joined simple comparison. * if the look up is point or complemented points, we should still use IN / NOT IN * if the search result in a constant value (true of false) it might trigger logical value literal node, which is also supported here. Co-authored-by: Rong Rong <[email protected]>