Minor: Avoid cloning as many Ident during SQL planning #4534

Merged: 3 commits merged on Dec 12, 2022

@alamb alamb commented Dec 6, 2022

Draft as it builds on #4530

Which issue does this PR close?

N/A

Rationale for this change

I noticed a lot of redundant copying while working on #4530, but wanted to keep that PR smaller.

What changes are included in this PR?

Remove redundant cloning

Are these changes tested?

covered by existing tests

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels Dec 6, 2022
// Normalize an identifier to a lowercase string unless the identifier is quoted.
pub(crate) fn normalize_ident(id: &Ident) -> String {
    match id.quote_style {
        Some(_) => id.value.clone(),
@alamb (Contributor Author) commented:

Here the value is always cloned, which is unnecessary in most uses, where we already have an owned String.
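
For illustration, the difference between the two signatures can be sketched as follows. This is a minimal sketch, not the code from the PR: the `Ident` struct below is a hypothetical stand-in that mirrors only the `value` and `quote_style` fields seen in the diff, and the function names `normalize_ident_by_ref` and `normalize_ident_owned` are made up for the comparison. The use of ASCII lowercasing in the unquoted branch is also an assumption.

```rust
// Hypothetical stand-in for the parser's `Ident`, reduced to the two
// fields that appear in the diff above.
#[derive(Debug, Clone)]
pub struct Ident {
    pub value: String,
    pub quote_style: Option<char>,
}

// By-reference version: it can only borrow `id`, so it has to allocate a
// new String on every call, even if the caller is about to drop `id`.
pub fn normalize_ident_by_ref(id: &Ident) -> String {
    match id.quote_style {
        Some(_) => id.value.clone(),
        None => id.value.to_ascii_lowercase(),
    }
}

// By-value version: a quoted identifier hands over its existing String
// without copying. Only the unquoted branch allocates (for lowercasing).
pub fn normalize_ident_owned(id: Ident) -> String {
    match id.quote_style {
        Some(_) => id.value,
        None => id.value.to_ascii_lowercase(),
    }
}
```

With the by-value form, a caller that still needs the identifier afterwards writes an explicit `.clone()` at the call site, so the remaining copies are visible where they happen.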

@alamb alamb force-pushed the alamb/remove_normalized_indent branch from d4f1c9c to 895b4cd Compare December 7, 2022 19:13
@github-actions github-actions bot removed sqllogictest SQL Logic Tests (.slt) logical-expr Logical plan and expressions core Core DataFusion crate labels Dec 7, 2022
@@ -446,7 +444,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {

         for cte in with.cte_tables {
             // A `WITH` block can't use the same name more than once
-            let cte_name = normalize_ident(&cte.alias.name);
+            let cte_name = normalize_ident(cte.alias.name.clone());
@alamb (Contributor Author) commented:

Previously, normalize_ident always cloned. Now it clones in only a few places, and most of the time it can reuse the String from the sqlparser AST directly.

@@ -661,7 +659,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 .iter()
                 .any(|x| x.option == ColumnOption::Null);
             fields.push(Field::new(
-                &normalize_ident(&column.name),
+                &normalize_ident(column.name),
@alamb (Contributor Author) commented Dec 7, 2022:

It is unfortunate to simply drop the String immediately, but Field::new requires a &str (it can't take the String). Filed upstream: https://github.com/apache/arrow-rs/pull/3288/files

@jackwener (Member) commented:

Agreed. It looks like we could make an improvement there similar to the one made to SubqueryAlias::try_new().

@alamb alamb changed the title Minor: Avoid cloning some Ident as much during planning Minor: Avoid cloning as many Ident during SQL planning Dec 7, 2022
@alamb alamb marked this pull request as ready for review December 7, 2022 19:36
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Dec 11, 2022
@alamb (Contributor Author) left a comment:

@jackwener I wonder if you have time or interest to review this PR?

)
}
}

fn apply_expr_alias(plan: LogicalPlan, idents: &Vec<Ident>) -> Result<LogicalPlan> {
@alamb (Contributor Author) commented:

This function took a reference, but the caller immediately drops the actual Vec. This PR reuses it instead.
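
As a rough illustration of why taking the Vec by value helps, here is a minimal sketch. The `Ident` struct and the helper names `alias_names_by_ref` and `alias_names_owned` are hypothetical and are not the real `apply_expr_alias`. Normalization is left out so that only the ownership difference is visible.

```rust
// Hypothetical stand-in for the parser's `Ident`.
#[derive(Debug, Clone)]
pub struct Ident {
    pub value: String,
    pub quote_style: Option<char>,
}

// Borrowing version: each name has to be cloned out of the borrowed slice,
// which costs one allocation per identifier.
pub fn alias_names_by_ref(idents: &[Ident]) -> Vec<String> {
    idents.iter().map(|id| id.value.clone()).collect()
}

// Owning version: the Vec is consumed and each String is moved out of its
// `Ident`, so the existing string buffers are reused rather than copied.
pub fn alias_names_owned(idents: Vec<Ident>) -> Vec<String> {
    idents.into_iter().map(|id| id.value).collect()
}
```

When the caller would drop the Vec right after the call anyway, passing it by value costs nothing extra and lets the callee reuse the strings.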

-    pub fn try_new(plan: LogicalPlan, alias: &str) -> datafusion_common::Result<Self> {
+    pub fn try_new(
+        plan: LogicalPlan,
+        alias: impl Into<String>,
@alamb (Contributor Author) commented:

This change allows SubqueryAlias::try_new to take a String if the caller has one, or a &str that will be copied into a new String if needed.

@jackwener (Member) commented:

Nice implementation 👍: it copies only when called with a &str.
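
The `impl Into<String>` pattern discussed in this thread can be sketched in isolation as follows. `SubqueryAliasSketch` and its `new` constructor are hypothetical names used only for illustration. The real `SubqueryAlias::try_new` also takes a `LogicalPlan` and returns a `Result`, both omitted here.

```rust
// Hypothetical stand-in for a plan node that stores an owned alias.
#[derive(Debug)]
pub struct SubqueryAliasSketch {
    pub alias: String,
}

impl SubqueryAliasSketch {
    // `impl Into<String>` accepts a `String`, which is moved in without a
    // copy, and a `&str`, which is copied into a newly allocated String.
    pub fn new(alias: impl Into<String>) -> Self {
        Self { alias: alias.into() }
    }
}
```

A caller holding a `String` writes `SubqueryAliasSketch::new(name)` and gives up ownership, while a caller holding a literal writes `SubqueryAliasSketch::new("t2")` and pays for one allocation.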

@jackwener (Member) left a comment:

I reviewed the changes carefully, and they make sense to me.
Thanks @alamb.

@jackwener (Member) commented:

BTW, I think we can also avoid clones in some methods of LogicalPlanBuilder.
We can do that in a follow-up PR.

@alamb (Contributor Author) commented Dec 12, 2022

Thank you for the review, @jackwener.

@alamb alamb merged commit 2457ce4 into apache:master Dec 12, 2022
@alamb alamb deleted the alamb/remove_normalized_indent branch December 12, 2022 21:15
@ursabot commented Dec 13, 2022

Benchmark runs are scheduled for baseline = 4ecf3e7 and contender = 2457ce4. 2457ce4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels
logical-expr Logical plan and expressions sql SQL Planner
3 participants