[SPARK-33884][SQL] Simplify CaseWhen clauses with (true and false) and (false and true) #30898

Closed
wants to merge 9 commits into from

Member

@wangyum wangyum commented Dec 23, 2020

What changes were proposed in this pull request?

This PR simplifies CaseWhen clauses whose branches are (true and false) or (false and true):

| Expression | cond.nullable | After simplify |
| --- | --- | --- |
| case when cond then true else false end | true | cond <=> true |
| case when cond then true else false end | false | cond |
| case when cond then false else true end | true | !(cond <=> true) |
| case when cond then false else true end | false | !cond |
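
The four rows above can be checked against SQL's three-valued logic with a small standalone sketch. It models a nullable boolean as `Option[Boolean]` (with `None` standing for NULL); the object and helper names (`CaseWhenTruthTable`, `caseWhen`, `nullSafeEq`, `not`) are made up for illustration and are not Spark APIs.

```scala
// Toy model of SQL three-valued logic: None stands for NULL.
// All names here are illustrative assumptions, not Spark APIs.
object CaseWhenTruthTable {
  type SqlBool = Option[Boolean]

  // CASE WHEN cond THEN t ELSE e END: the THEN branch is taken only when
  // cond is TRUE; both FALSE and NULL fall through to ELSE.
  def caseWhen(cond: SqlBool, thenValue: SqlBool, elseValue: SqlBool): SqlBool =
    if (cond.contains(true)) thenValue else elseValue

  // Null-safe equality (<=>): never returns NULL.
  def nullSafeEq(a: SqlBool, b: SqlBool): SqlBool = Some(a == b)

  // SQL NOT: NULL stays NULL.
  def not(a: SqlBool): SqlBool = a.map(!_)

  val T: SqlBool = Some(true)
  val F: SqlBool = Some(false)
  val ternary: Seq[SqlBool] = Seq(T, F, None)
  val binary: Seq[SqlBool] = Seq(T, F)

  // Nullable cond: the null-safe forms agree on all three inputs.
  val nullableRowsHold: Boolean = ternary.forall { c =>
    caseWhen(c, T, F) == nullSafeEq(c, T) &&
    caseWhen(c, F, T) == not(nullSafeEq(c, T))
  }

  // Non-nullable cond: the plain forms agree on both inputs.
  val nonNullableRowsHold: Boolean = binary.forall { c =>
    caseWhen(c, T, F) == c && caseWhen(c, F, T) == not(c)
  }

  // For a NULL input the CASE expression yields a non-NULL value, whereas
  // plain `cond` would yield NULL, hence the null-safe form is needed.
  val caseWhenOnNullIsDefined: Boolean = caseWhen(None, T, F).isDefined

  def main(args: Array[String]): Unit = {
    println(s"nullable rows hold: $nullableRowsHold")
    println(s"non-nullable rows hold: $nonNullableRowsHold")
    println(s"CASE WHEN NULL ... is non-NULL: $caseWhenOnNullIsDefined")
  }
}
```

The NULL input is the only one where `cond` and `cond <=> true` differ, which is why the table distinguishes on `cond.nullable`.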

Why are the changes needed?

Improve query performance.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Dec 23, 2020

SparkQA commented Dec 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37850/


SparkQA commented Dec 23, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37850/


SparkQA commented Dec 23, 2020

Test build #133252 has finished for PR 30898 at commit 44733f8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# Conflicts:
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PushFoldableIntoBranchesSuite.scala
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala

SparkQA commented Dec 24, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37933/


SparkQA commented Dec 24, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37933/


SparkQA commented Dec 24, 2020

Test build #133342 has finished for PR 30898 at commit 15c74ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • static class MergeShuffleFile
  • class ContextAwareIterator[+T](val context: TaskContext, val delegate: Iterator[T])
  • implicit class DslAttr(attr: UnresolvedAttribute) extends ImplicitAttribute

@wangyum wangyum changed the title [SPARK-33884][SQL] Simplify CaseWhen when one clause is null and another is boolean [SPARK-33884][SQL] Smplify conditional if all branches are foldable boolean type Dec 25, 2020
@wangyum wangyum changed the title [SPARK-33884][SQL] Smplify conditional if all branches are foldable boolean type [SPARK-33884][SQL] Simplify conditional if all branches are foldable boolean type Dec 25, 2020

SparkQA commented Dec 25, 2020

Test build #133369 has finished for PR 30898 at commit 91ec8b2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Dec 25, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37960/


SparkQA commented Dec 25, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37960/

Member

maropu commented Dec 25, 2020

Looks fine if the tests pass.


SparkQA commented Dec 25, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37963/


SparkQA commented Dec 25, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37963/


SparkQA commented Dec 25, 2020

Test build #133372 has finished for PR 30898 at commit 871d29f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -475,6 +475,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
case If(TrueLiteral, trueValue, _) => trueValue
case If(FalseLiteral, _, falseValue) => falseValue
case If(Literal(null, _), _, falseValue) => falseValue
case If(_, TrueLiteral, TrueLiteral) => TrueLiteral
case If(_, FalseLiteral, FalseLiteral) => FalseLiteral
case If(cond, TrueLiteral, FalseLiteral) => cond
case If(cond, FalseLiteral, TrueLiteral) => Not(cond)
case If(cond, trueValue, falseValue)
Member

@wangyum, isn't the case above already covered by this case match?

Member Author

This is only for deterministic expressions.

Member

Then the conditions you added here should also apply only to deterministic expressions, because cond will not be executed; see #21848 (comment) for an example.

Member

@@ -475,6 +475,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
case If(TrueLiteral, trueValue, _) => trueValue
case If(FalseLiteral, _, falseValue) => falseValue
case If(Literal(null, _), _, falseValue) => falseValue
case If(_, TrueLiteral, TrueLiteral) => TrueLiteral
Contributor

This skips evaluating the cond, which can be wrong if cond is non-deterministic.
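
One way to picture this concern is a toy expression tree in which the condition carries state. The sketch below is only an illustration: `Expr`, `Lit`, `Counting`, `If`, and both `simplify*` functions are hypothetical stand-ins, not Catalyst classes, and the evaluation counter is a stand-in for whatever observable state a non-deterministic expression may have.

```scala
// Hypothetical miniature expression tree; not Catalyst's classes.
object SkippedCondition {
  sealed trait Expr {
    def deterministic: Boolean
    def eval(): Boolean
  }

  case class Lit(value: Boolean) extends Expr {
    val deterministic: Boolean = true
    def eval(): Boolean = value
  }

  // A stateful condition: every evaluation advances a counter.
  final class Counting(threshold: Int) extends Expr {
    var evaluations: Int = 0
    val deterministic: Boolean = false
    def eval(): Boolean = { evaluations += 1; evaluations > threshold }
  }

  case class If(cond: Expr, t: Expr, f: Expr) extends Expr {
    def deterministic: Boolean =
      cond.deterministic && t.deterministic && f.deterministic
    def eval(): Boolean = if (cond.eval()) t.eval() else f.eval()
  }

  // Unguarded rewrite: drops cond even when it is stateful.
  def simplifyUnguarded(e: Expr): Expr = e match {
    case If(_, Lit(true), Lit(true)) => Lit(true)
    case other => other
  }

  // Guarded rewrite: drops cond only when it is deterministic.
  def simplifyGuarded(e: Expr): Expr = e match {
    case If(cond, Lit(true), Lit(true)) if cond.deterministic => Lit(true)
    case other => other
  }

  // Number of times cond is evaluated when the rewritten tree runs once.
  def evaluationsAfter(rewrite: Expr => Expr): Int = {
    val cond = new Counting(threshold = 1)
    rewrite(If(cond, Lit(true), Lit(true))).eval()
    cond.evaluations
  }

  def main(args: Array[String]): Unit = {
    println(s"unguarded: ${evaluationsAfter(simplifyUnguarded)} evaluation(s)")
    println(s"guarded:   ${evaluationsAfter(simplifyGuarded)} evaluation(s)")
  }
}
```

Both rewrites return the same boolean here, but only the guarded one preserves the fact that the condition was evaluated.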

@wangyum wangyum changed the title [SPARK-33884][SQL] Simplify conditional if all branches are foldable boolean type [SPARK-33884][SQL] Simplify CaseWhen clauses with (true and false) and (false and true) Dec 28, 2020

SparkQA commented Dec 28, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38019/


SparkQA commented Dec 28, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38019/

@@ -484,6 +484,9 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
case If(cond, FalseLiteral, l @ Literal(null, _)) if !cond.nullable => And(Not(cond), l)
case If(cond, TrueLiteral, l @ Literal(null, _)) if !cond.nullable => Or(cond, l)

case CaseWhen(Seq((cond, TrueLiteral)), Some(FalseLiteral)) => cond
Contributor

Don't we already convert a single-branch CASE WHEN to IF, so that this gets optimized anyway?

Member Author

Contributor

OK, so this follows #21850 (comment). LGTM.
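
For readers following the thread, the CaseWhen rewrite with the nullability distinction from the PR description can be sketched on a toy tree. This is an assumption-laden illustration: the classes below only borrow names from the diff (`CaseWhen`, `Not`, `EqualNullSafe`), `Attr` and `Lit` are hypothetical, and the code is not the merged Catalyst rule.

```scala
// Hypothetical miniature expression tree; names are borrowed from the diff
// for readability but these are not Catalyst's classes.
object ToyCaseWhenRule {
  sealed trait Expr { def nullable: Boolean }

  case class Attr(name: String, nullable: Boolean) extends Expr
  case class Lit(value: Boolean) extends Expr { val nullable: Boolean = false }
  case class Not(child: Expr) extends Expr {
    def nullable: Boolean = child.nullable
  }
  // Null-safe equality never produces NULL.
  case class EqualNullSafe(left: Expr, right: Expr) extends Expr {
    val nullable: Boolean = false
  }
  case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr])
      extends Expr {
    // A missing ELSE means the result may be NULL.
    def nullable: Boolean =
      branches.exists(_._2.nullable) || elseValue.forall(_.nullable)
  }

  // Rewrites the two single-branch shapes from the PR description's table.
  def simplify(e: Expr): Expr = e match {
    case CaseWhen(Seq((cond, Lit(true))), Some(Lit(false))) =>
      if (cond.nullable) EqualNullSafe(cond, Lit(true)) else cond
    case CaseWhen(Seq((cond, Lit(false))), Some(Lit(true))) =>
      if (cond.nullable) Not(EqualNullSafe(cond, Lit(true))) else Not(cond)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val c = Attr("c", nullable = false)
    val n = Attr("n", nullable = true)
    println(simplify(CaseWhen(Seq((c, Lit(true))), Some(Lit(false)))))
    println(simplify(CaseWhen(Seq((n, Lit(false))), Some(Lit(true)))))
  }
}
```

Expressions that do not match either shape, such as a CaseWhen with two branches or without an ELSE, are returned unchanged by this sketch.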


SparkQA commented Dec 28, 2020

Test build #133428 has finished for PR 30898 at commit ae3b284.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -243,4 +243,40 @@ class SimplifyConditionalSuite extends PlanTest with ExpressionEvalHelper with P
Literal.create(null, IntegerType))
}
}

test("SPARK-33884: simplify CaseWhen clauses with (true and false) and (false and true)") {
Member Author

This test is adapted from:

test("SPARK-33845: remove unnecessary if when the outputs are boolean type") {
  // verify the boolean equivalence of all transformations involved
  val fields = Seq(
    'cond.boolean.notNull,
    'cond_nullable.boolean,
    'a.boolean,
    'b.boolean
  )
  val Seq(cond, cond_nullable, a, b) = fields.zipWithIndex.map { case (f, i) => f.at(i) }
  val exprs = Seq(
    // actual expressions of the transformations: original -> transformed
    If(cond, true, false) -> cond,
    If(cond, false, true) -> !cond,
    If(cond_nullable, true, false) -> (cond_nullable <=> true),
    If(cond_nullable, false, true) -> (!(cond_nullable <=> true)))
  // check plans
  for ((originalExpr, expectedExpr) <- exprs) {
    assertEquivalent(originalExpr, expectedExpr)
  }
  // check evaluation
  val binaryBooleanValues = Seq(true, false)
  val ternaryBooleanValues = Seq(true, false, null)
  for (condVal <- binaryBooleanValues;
       condNullableVal <- ternaryBooleanValues;
       aVal <- ternaryBooleanValues;
       bVal <- ternaryBooleanValues;
       (originalExpr, expectedExpr) <- exprs) {
    val inputRow = create_row(condVal, condNullableVal, aVal, bVal)
    val optimizedVal = evaluateWithoutCodegen(expectedExpr, inputRow)
    checkEvaluation(originalExpr, optimizedVal, inputRow)
  }
}


SparkQA commented Dec 29, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38051/


SparkQA commented Dec 29, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38051/


SparkQA commented Dec 29, 2020

Test build #133462 has finished for PR 30898 at commit d3b072e.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f7bdea3 Dec 29, 2020
@wangyum wangyum deleted the SPARK-33884 branch December 29, 2020 09:46
5 participants