-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33884][SQL] Simplify CaseWhenclauses with (true and false) and (false and true) #30898
Conversation
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #133252 has finished for PR 30898 at commit
|
# Conflicts: # sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PushFoldableIntoBranchesSuite.scala # sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #133342 has finished for PR 30898 at commit
|
Test build #133369 has finished for PR 30898 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Looks fine if the tests pass. |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #133372 has finished for PR 30898 at commit
|
@@ -475,6 +475,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { | |||
case If(TrueLiteral, trueValue, _) => trueValue | |||
case If(FalseLiteral, _, falseValue) => falseValue | |||
case If(Literal(null, _), _, falseValue) => falseValue | |||
case If(_, TrueLiteral, TrueLiteral) => TrueLiteral | |||
case If(_, FalseLiteral, FalseLiteral) => FalseLiteral | |||
case If(cond, TrueLiteral, FalseLiteral) => cond | |||
case If(cond, FalseLiteral, TrueLiteral) => Not(cond) | |||
case If(cond, trueValue, falseValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wangyum, doesn't this cover the case above by this case match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only for deterministic expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conditions you added here also should only for deterministic ones then because cond
will not be executed, see #21848 (comment) as an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, the tests you added at https://github.com/apache/spark/pull/30898/files#diff-c18cf81d53bb8508802317a785a74f2dadb0487d88e0164efb36f89ce64973e3R210-R213 will pass because of the case match here.
@@ -475,6 +475,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { | |||
case If(TrueLiteral, trueValue, _) => trueValue | |||
case If(FalseLiteral, _, falseValue) => falseValue | |||
case If(Literal(null, _), _, falseValue) => falseValue | |||
case If(_, TrueLiteral, TrueLiteral) => TrueLiteral |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This skips evaluating the cond
, which can be wrong if cond
is non-deterministic.
Kubernetes integration test starting |
Kubernetes integration test status failure |
@@ -484,6 +484,9 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { | |||
case If(cond, FalseLiteral, l @ Literal(null, _)) if !cond.nullable => And(Not(cond), l) | |||
case If(cond, TrueLiteral, l @ Literal(null, _)) if !cond.nullable => Or(cond, l) | |||
|
|||
case CaseWhen(Seq((cond, TrueLiteral)), Some(FalseLiteral)) => cond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we already optimize single-branch CASE WHEN to IF and get this optimized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so this follows #21850 (comment) , LGTM
Test build #133428 has finished for PR 30898 at commit
|
@@ -243,4 +243,40 @@ class SimplifyConditionalSuite extends PlanTest with ExpressionEvalHelper with P | |||
Literal.create(null, IntegerType)) | |||
} | |||
} | |||
|
|||
test("SPARK-33884: simplify CaseWhen clauses with (true and false) and (false and true)") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test adapt from
Lines 203 to 237 in c2eac1d
test("SPARK-33845: remove unnecessary if when the outputs are boolean type") { | |
// verify the boolean equivalence of all transformations involved | |
val fields = Seq( | |
'cond.boolean.notNull, | |
'cond_nullable.boolean, | |
'a.boolean, | |
'b.boolean | |
) | |
val Seq(cond, cond_nullable, a, b) = fields.zipWithIndex.map { case (f, i) => f.at(i) } | |
val exprs = Seq( | |
// actual expressions of the transformations: original -> transformed | |
If(cond, true, false) -> cond, | |
If(cond, false, true) -> !cond, | |
If(cond_nullable, true, false) -> (cond_nullable <=> true), | |
If(cond_nullable, false, true) -> (!(cond_nullable <=> true))) | |
// check plans | |
for ((originalExpr, expectedExpr) <- exprs) { | |
assertEquivalent(originalExpr, expectedExpr) | |
} | |
// check evaluation | |
val binaryBooleanValues = Seq(true, false) | |
val ternaryBooleanValues = Seq(true, false, null) | |
for (condVal <- binaryBooleanValues; | |
condNullableVal <- ternaryBooleanValues; | |
aVal <- ternaryBooleanValues; | |
bVal <- ternaryBooleanValues; | |
(originalExpr, expectedExpr) <- exprs) { | |
val inputRow = create_row(condVal, condNullableVal, aVal, bVal) | |
val optimizedVal = evaluateWithoutCodegen(expectedExpr, inputRow) | |
checkEvaluation(originalExpr, optimizedVal, inputRow) | |
} | |
} |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #133462 has finished for PR 30898 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
This pr simplify
CaseWhen
clauses with (true and false) and (false and true):Why are the changes needed?
Improve query performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.