-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose sql_to_statement
and statement_to_plan
on SessionState
#4958
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this API makes a lot of sense.
cc @tustvold who I think had some thoughts in this area, but I believe this PR keeps the same spirit of encapsulation.
use crate::catalog::information_schema::INFORMATION_SCHEMA_TABLES; | ||
use datafusion_sql::parser::Statement as DFStatement; | ||
use sqlparser::ast::*; | ||
use std::collections::hash_map::Entry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These use
statements seem out of place 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed? They just "came along for the ride" following the "do no harm" principle. That's not a great excuse though.
I think they are in the function because this file was becoming very long, and I suspect collisions with parser::Statement
and ast::Statement
and probably others was becoming untenable so at some point someone just said "I'll keep them local to this function as to avoid having to deal with all that".
I suspect the ultimate solution would be one or more of:
- Split this file up into multiple more reasonably sized ones
- Rename things such that it is easier to avoid collisions, or keep namespace names short and fully qualify them
sql_to_statement
and statement_to_plan
on SessionContext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍
sql_to_statement
and statement_to_plan
on SessionContext
sql_to_statement
and statement_to_plan
on SessionState
pub async fn create_logical_plan(&self, sql: &str) -> Result<LogicalPlan> { | ||
let statement = self.sql_to_statement(sql)?; | ||
let plan = self.statement_to_plan(statement).await?; | ||
Ok(plan) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be too late by now, but I have a slightly different proposal for this issue (thanks for addressing it btw!).
Namely, while this solution handles the stated case well it doesn't address the general problem of SqlToRel
now being hard/impossible to instantiate properly (with the new trait bound for ContextProvider
, which is only implemented for a private struct, and the entire information_schema
module also being private further complicating a custom implementation of ContextProvider
). This in turn forbids usage of other useful public SqlToRel
methods, such as sql_to_expr
.
My proposal would be to simply rename statement_to_plan
to statement_to_planner
, and then return SqlToRel::new(&provider)
at line 1740. Subsequently, to address the original issue just revise this part:
pub async fn create_logical_plan(&self, sql: &str) -> Result<LogicalPlan> { | |
let statement = self.sql_to_statement(sql)?; | |
let plan = self.statement_to_plan(statement).await?; | |
Ok(plan) | |
} | |
pub async fn create_logical_plan(&self, sql: &str) -> Result<LogicalPlan> { | |
let statement = self.sql_to_statement(sql)?; | |
let planner = self.statement_to_planner(statement).await?; | |
let plan = planner.statement_to_plan(statement); | |
Ok(plan) | |
} |
while anyone else can then get easy access to the planner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gruuya your suggestion makes sense and I'd be amenable to merging a PR containing the changes you described, but in the interest of expedience, given the approvals on this PR, I'd like to merge it as-is.
If you file an issue, I can take a look next time I'm free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namely, while this solution handles the stated case well it doesn't address the general problem of SqlToRel now being hard/impossible to instantiate properly (with the new trait bound for ContextProvider, which is only implemented for a private struct, and the entire information_schema module also being private further complicating a custom implementation of ContextProvider). This in turn forbids usage of other useful public SqlToRel methods, such as sql_to_expr.
@gruuya I don't think we have contemplated using SqlToRel
directly in the past. If that is a usecase that is important, what I found works well is to make an example demonstrating what you are doing. Here are some examples:
- https://github.com/apache/arrow-datafusion/blob/master/datafusion-examples/examples/rewrite_expr.rs
- https://github.com/apache/arrow-datafusion/blob/master/datafusion-examples/examples/custom_datasource.rs
An example has several advantages:
- It ensures your usecase is clearly documented so people who are changing DataFusion know it is used and don't break it accidentally
- It then gives you an example to start from if the interface changes (you can look at the example to see how the interface may have changed)
- It helps others who might have the same need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point; let me see if I can make a self-contained example for the usecase we have and then try my luck with a PR.
Namely, we use sql_to_expr
to convert the WHERE
clauses in UPDATE
and DELETE
statements from sqlparser Expr
s into Datafusion Expr
s during logical planning. Subsequently, during physical planning of our custom logical extension nodes for UPDATE
and DELETE
we employ these expressions in order to achieve the desired action using available Datafusion primitives (pruning, scanning, filtering, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In parallel you may be interested in some work @avantgardnerio is working on here #4902 and likely elsewhere to add better INSERT/UPDATE/DELETE support directly to DataFusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting; will check it out thanks!
Benchmark runs are scheduled for baseline = 7062c2e and contender = 2fdc7b8. 2fdc7b8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4957.
Rationale for this change
Described in issue.
What changes are included in this PR?
Splitting planning and parsing.
Are these changes tested?
Transitively, yes.
Are there any user-facing changes?
New API methods, no breaking changes.