[SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed. #32032
@@ -3370,10 +3370,11 @@ class Dataset[T] private[sql](
       comment = None,
       properties = Map.empty,
       originalText = None,
-      child = logicalPlan,
+      plan = logicalPlan,
       allowExisting = false,
       replace = replace,
-      viewType = viewType)
+      viewType = viewType,
+      isAnalyzed = true)
Review comment: Since
   }

   /**
@@ -29,7 +29,7 @@ import org.apache.spark.sql.catalyst.analysis.{GlobalTempView, LocalTempView, Pe
 import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType, SessionCatalog, TemporaryViewRelation}
 import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, SubqueryExpression, UserDefinedExpression}
 import org.apache.spark.sql.catalyst.plans.QueryPlan
-import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, View}
+import org.apache.spark.sql.catalyst.plans.logical.{AnalysisOnlyCommand, LogicalPlan, Project, View}
 import org.apache.spark.sql.catalyst.util.CharVarcharUtils
 import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.NamespaceHelper
 import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
@@ -48,29 +48,39 @@ import org.apache.spark.sql.util.SchemaUtils
  * @param properties the properties of this view.
  * @param originalText the original SQL text of this view, can be None if this view is created via
  *                     Dataset API.
- * @param child the logical plan that represents the view; this is used to generate the logical
- *              plan for temporary view and the view schema.
+ * @param plan the logical plan that represents the view; this is used to generate the logical
+ *             plan for temporary view and the view schema.
  * @param allowExisting if true, and if the view already exists, noop; if false, and if the view
  *                already exists, throws analysis exception.
  * @param replace if true, and if the view already exists, updates it; if false, and if the view
  *                already exists, throws analysis exception.
  * @param viewType the expected view type to be created with this command.
+ * @param isAnalyzed whether this command is analyzed or not.
  */
 case class CreateViewCommand(
     name: TableIdentifier,
     userSpecifiedColumns: Seq[(String, Option[String])],
     comment: Option[String],
     properties: Map[String, String],
     originalText: Option[String],
-    child: LogicalPlan,
+    plan: LogicalPlan,
     allowExisting: Boolean,
     replace: Boolean,
-    viewType: ViewType)
-  extends LeafRunnableCommand {
+    viewType: ViewType,
+    isAnalyzed: Boolean = false) extends RunnableCommand with AnalysisOnlyCommand {

   import ViewHelper._

-  override def innerChildren: Seq[QueryPlan[_]] = Seq(child)
+  override protected def withNewChildrenInternal(
+      newChildren: IndexedSeq[LogicalPlan]): CreateViewCommand =
+    copy(plan = newChildren.head)
Review comment: ditto
+
+  override def innerChildren: Seq[QueryPlan[_]] = Seq(plan)
Review comment: shall we only include it as inner children only when
Review comment: @cloud-fan which UI are you referring to? If you are referring to the Spark UI, I see the following and it shows the same even if I use
Review comment: how about the "optimized plan" in the EXPLAIN result?
Review comment: I see the following for the both cases. Were you expecting something different?
Review comment: interesting. I thought there will be problems if a plan is in both
Review comment: Ah, the reason is that when explain runs,
Review comment: but the "Parsed Logical Plan" should be unresolved and the
Review comment: "Parsed Logical Plan" is
Review comment: ah now I get it. If there is a place that creates I think it's safer to avoid that to be future-proof. e.g.
Review comment: I agree. I will do a follow up PR.
+
+  // `plan` needs to be analyzed, but shouldn't be optimized so that caching works correctly.
+  override def childrenToAnalyze: Seq[LogicalPlan] = plan :: Nil
+
+  def markAsAnalyzed(): LogicalPlan = copy(isAnalyzed = true)

   if (viewType == PersistedView) {
     require(originalText.isDefined, "'originalText' must be provided to create permanent view")
@@ -96,10 +106,10 @@ case class CreateViewCommand(
   }

   override def run(sparkSession: SparkSession): Seq[Row] = {
-    // If the plan cannot be analyzed, throw an exception and don't proceed.
-    val qe = sparkSession.sessionState.executePlan(child)
-    qe.assertAnalyzed()
-    val analyzedPlan = qe.analyzed
+    if (!isAnalyzed) {
+      throw new AnalysisException("The logical plan that represents the view is not analyzed.")
+    }
+    val analyzedPlan = plan

     if (userSpecifiedColumns.nonEmpty &&
         userSpecifiedColumns.length != analyzedPlan.output.length) {
@@ -233,12 +243,21 @@ case class CreateViewCommand(
 case class AlterViewAsCommand(
     name: TableIdentifier,
     originalText: String,
-    query: LogicalPlan) extends LeafRunnableCommand {
+    query: LogicalPlan,
+    isAnalyzed: Boolean = false) extends RunnableCommand with AnalysisOnlyCommand {

   import ViewHelper._

+  override protected def withNewChildrenInternal(
+      newChildren: IndexedSeq[LogicalPlan]): AlterViewAsCommand =
+    copy(query = newChildren.head)
+
   override def innerChildren: Seq[QueryPlan[_]] = Seq(query)

+  override def childrenToAnalyze: Seq[LogicalPlan] = query :: Nil
+
+  def markAsAnalyzed(): LogicalPlan = copy(isAnalyzed = true)
+
   override def run(session: SparkSession): Seq[Row] = {
     if (session.sessionState.catalog.isTempView(name)) {
       alterTemporaryView(session, query)
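The idea behind the two view commands above (expose the wrapped plan as a child during analysis, then hide it once the command is marked as analyzed) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Plan`, `Relation`, `AnalysisOnly`, `CreateView`, `analyze`, and `reachable` are hypothetical stand-ins invented for illustration, not Spark's classes, and the real `AnalysisOnlyCommand` trait and analyzer rule may differ in detail.

```scala
// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
object AnalysisOnlySketch {
  sealed trait Plan {
    def children: Seq[Plan]
  }

  // A leaf relation; `resolved` stands in for "the analyzer has processed this node".
  final case class Relation(name: String, resolved: Boolean = false) extends Plan {
    def children: Seq[Plan] = Nil
  }

  // Commands mixing this in expose children only until they are marked as analyzed.
  trait AnalysisOnly extends Plan {
    def isAnalyzed: Boolean
    def childrenToAnalyze: Seq[Plan]
    def markAsAnalyzed(): Plan
    final def children: Seq[Plan] = if (isAnalyzed) Nil else childrenToAnalyze
  }

  final case class CreateView(name: String, plan: Plan, isAnalyzed: Boolean = false)
      extends AnalysisOnly {
    def childrenToAnalyze: Seq[Plan] = plan :: Nil
    def markAsAnalyzed(): Plan = copy(isAnalyzed = true)
  }

  // Stand-in analyzer: resolve the wrapped plan, then hide it from later phases.
  def analyze(p: Plan): Plan = p match {
    case r: Relation => r.copy(resolved = true)
    case c: CreateView if !c.isAnalyzed =>
      c.copy(plan = analyze(c.plan)).markAsAnalyzed()
    case other => other
  }

  // Stand-in for a later phase: counts the nodes it can reach through `children`.
  def reachable(p: Plan): Int = 1 + p.children.map(reachable).sum
}
```

In this model, a freshly parsed `CreateView` lets a tree traversal reach the wrapped relation, while the analyzed command looks like a leaf, so a later phase such as an optimizer would never rewrite the wrapped plan.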
@@ -94,19 +94,19 @@ case class CacheTableAsSelectExec(
   override lazy val relationName: String = tempViewName

   override lazy val planToCache: LogicalPlan = {
-    Dataset.ofRows(sparkSession,
-      CreateViewCommand(
-        name = TableIdentifier(tempViewName),
-        userSpecifiedColumns = Nil,
-        comment = None,
-        properties = Map.empty,
-        originalText = Some(originalText),
-        child = query,
-        allowExisting = false,
-        replace = false,
-        viewType = LocalTempView
-      )
-    )
+    CreateViewCommand(
+      name = TableIdentifier(tempViewName),
+      userSpecifiedColumns = Nil,
+      comment = None,
+      properties = Map.empty,
+      originalText = Some(originalText),
+      plan = query,
+      allowExisting = false,
+      replace = false,
+      viewType = LocalTempView,
+      isAnalyzed = true
+    ).run(sparkSession)
Review comment: We can just call
+
     dataFrameForCachedPlan.logicalPlan
   }

Review comment: is it possible that `newChildren` is empty? Probably safer to add `if (isAnalyzed) ... else ...`
Review comment: This should be called only when there exist children. Maybe an assert is better: `assert(!isAnalyzed)`? WDYT? (Reference: spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala, lines 345 to 350 in e40fce9.)
Review comment: assert SGTM
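The guard agreed on in this thread can be sketched in isolation. The snippet below is a hypothetical toy, not the code that was merged: `Plan`, `Relation`, `AlterViewAs`, and the public `withNewChildren` method are illustrative names, and the real command overrides `withNewChildrenInternal` on Spark's `TreeNode`. It only shows why `newChildren.head` is safe once `assert(!isAnalyzed)` is in place: an analyzed command reports no children, so nothing should ever ask it to replace them.

```scala
// Toy model only: names are hypothetical stand-ins, not Spark's classes.
object WithNewChildrenSketch {
  sealed trait Plan
  final case class Relation(name: String) extends Plan

  final case class AlterViewAs(name: String, query: Plan, isAnalyzed: Boolean = false)
      extends Plan {
    // Once analyzed, the command hides its query from tree traversals.
    def children: Seq[Plan] = if (isAnalyzed) Nil else query :: Nil

    def withNewChildren(newChildren: IndexedSeq[Plan]): AlterViewAs = {
      // An analyzed command has no children, so a call here would be a bug in the caller.
      assert(!isAnalyzed, "an analyzed command exposes no children to replace")
      copy(query = newChildren.head)
    }
  }
}
```

Compared with an `if (isAnalyzed) ... else ...` branch that silently returns the command unchanged, the assertion surfaces an unexpected call immediately, which is the trade-off the reviewers settled on.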