-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand #31933
[SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand #31933
Conversation
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala
Show resolved
Hide resolved
Test build #136359 has finished for PR 31933 at commit
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #136368 has finished for PR 31933 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
allowExisting: Boolean, | ||
replace: Boolean, | ||
viewType: ViewType) | ||
extends RunnableCommand { | ||
|
||
import ViewHelper._ | ||
|
||
override def innerChildren: Seq[QueryPlan[_]] = Seq(child) | ||
override def plansToCheckAnalysis: Seq[LogicalPlan] = Seq(analyzedPlan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to run checkAnalysis on the analyzed plan, otherwise, for the following:
sql("CREATE TABLE view_base_table (key int, data varchar(20)) USING PARQUET")
sql("CREATE VIEW key_dependent_view AS SELECT * FROM view_base_table GROUP BY key")
, view creation works fine, whereas it should have failed with:
org.apache.spark.sql.AnalysisException
expression 'spark_catalog.default.view_base_table.data' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.
@@ -167,6 +167,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { | |||
case _: ShowTableExtended => | |||
throw new AnalysisException("SHOW TABLE EXTENDED is not supported for v2 tables.") | |||
|
|||
case c: Command => | |||
c.plansToCheckAnalysis.foreach(checkAnalysis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems hacky? But we cannot make the analyzed plan as children
for CreateViewCommaned
. The reason is that the View
will be optimized away (in the optimizer), and the verification that checks if a permanent view references temp views will fail.
@@ -28,6 +28,7 @@ trait Command extends LogicalPlan { | |||
override def output: Seq[Attribute] = Seq.empty | |||
override def producedAttributes: AttributeSet = outputSet | |||
override def children: Seq[LogicalPlan] = Seq.empty | |||
def plansToCheckAnalysis: Seq[LogicalPlan] = Seq.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reuse innerChildren
instead of adding this?
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136379 has finished for PR 31933 at commit
|
Test build #136418 has started for PR 31933 at commit |
|
||
// Check analysis on internal nodes. | ||
plan match { | ||
case c: Command => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we move it to L682?
Test build #136471 has finished for PR 31933 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #136486 has finished for PR 31933 at commit
|
…rom_create_temp_view
Kubernetes integration test starting |
Kubernetes integration test status failure |
thanks, merging to master! |
Test build #136496 has finished for PR 31933 at commit
|
@@ -94,14 +94,21 @@ case class CacheTableAsSelectExec( | |||
override lazy val relationName: String = tempViewName | |||
|
|||
override lazy val planToCache: LogicalPlan = { | |||
// CacheTableAsSelectExec.query is not resolved yet (e.g., not a child of CacheTableAsSelect) | |||
// in order to skip optimizing it; note that we need to pass an analyzed plan to | |||
// CreateViewCommand for the cache to work correctly. Thus, the query is analyzed below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we further clean this up by creating CreateViewStatement
instead of CreateViewCommand
below?
// If the plan cannot be analyzed, throw an exception and don't proceed. | ||
val qe = sparkSession.sessionState.executePlan(child) | ||
qe.assertAnalyzed() | ||
val analyzedPlan = qe.analyzed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is a regression. CreateViewCommand
can be created from CreateViewStatement
in ResolveSessionCatalog
. However, ResolveSessionCatalog
is inside extendedResolutionRules
, which means the analyzedPlan
doesn't go through postHocResolutionRules
.
I think we need to make CreateViewCommand.analyzedPlan
a real child, so that it can go through all the analyzer rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. One issue with "make CreateViewCommand.analyzedPlan a real child" is that it will become an optimized plan, which will affect the caching. I can try to make certain command's children to skip optimizer. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be great to have such a mechanism to skip optimizer, useful to CacheTableAsSelect
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will give it a shot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I created #32032 to handle this scenario. Please let me know what you think. TIA!
I've reverted it due to the regression mentioned in #31933 (comment) |
…dren to be removed once the command is marked as analyzed ### What changes were proposed in this pull request? This PR proposes to introduce the `AnalysisOnlyCommand` trait such that a command that extends this trait can have its children only analyzed, but not optimized. There is a corresponding analysis rule `HandleAnalysisOnlyCommand` that marks the command as analyzed after all other analysis rules are run. This can be useful if a logical plan has children where they need to be only analyzed, but not optimized - e.g., `CREATE VIEW` or `CACHE TABLE AS`. This also addresses the issue found in #31933. This PR also updates `CreateViewCommand`, `CacheTableAsSelect`, and `AlterViewAsCommand` to use the new trait / rule such that their children are only analyzed. ### Why are the changes needed? To address the issue where the plan is unnecessarily re-analyzed in `CreateViewCommand`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests should cover the changes. Closes #32032 from imback82/skip_transform. Authored-by: Terry Kim <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR proposes to remove re-analyzing the already analyzed plan for
CreateViewCommand
as discussed https://github.com/apache/spark/pull/31273/files#r581592786.Why are the changes needed?
No need to analyze the plan if it's already analyzed.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests should cover this.