fix: from_plan shouldn't use original schema #6595

Merged
merged 1 commit into from
Jun 15, 2023

Conversation

jackwener
Member

@jackwener jackwener commented Jun 8, 2023

Which issue does this PR close?

part of #6596.
Closes #6613

Rationale for this change

What changes are included in this PR?

In the original code, when `from_plan` rebuilds a projection, it keeps the original schema even if the expressions are different.
That is wrong, because different expressions can produce a different schema. This PR fixes it.
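
To illustrate the problem, here is a minimal sketch using a toy model rather than DataFusion's real types. `DataType`, `Expr`, `Projection`, `with_schema`, `derive`, and `rebuild_after_rewrite` are all hypothetical names invented for this example. It assumes a rewrite pass that wraps a column in a cast, and shows that a projection rebuilt with the old schema reports a stale type, while one that derives its schema from the new expressions does not.

```rust
// Toy model, not DataFusion's real types: every name here is hypothetical.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    /// Reference to an input column by index.
    Column(usize),
    /// Cast the inner expression to the given type.
    Cast(Box<Expr>, DataType),
}

impl Expr {
    /// Output type of this expression when evaluated against the input schema.
    fn data_type(&self, input: &[DataType]) -> DataType {
        match self {
            Expr::Column(i) => input[*i].clone(),
            Expr::Cast(_, to) => to.clone(),
        }
    }
}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct Projection {
    exprs: Vec<Expr>,
    schema: Vec<DataType>,
}

#[allow(dead_code)]
impl Projection {
    /// Trusts a caller-supplied schema (the pattern this PR moves away from).
    fn with_schema(exprs: Vec<Expr>, schema: Vec<DataType>) -> Self {
        Projection { exprs, schema }
    }

    /// Derives the schema from the expressions themselves.
    fn derive(exprs: Vec<Expr>, input: &[DataType]) -> Self {
        let schema = exprs.iter().map(|e| e.data_type(input)).collect();
        Projection { exprs, schema }
    }
}

/// Returns (stale, fresh) projections built after a rewrite changes the expression.
#[allow(dead_code)]
fn rebuild_after_rewrite() -> (Projection, Projection) {
    let input = vec![DataType::Int32];
    let original = Projection::derive(vec![Expr::Column(0)], &input);
    // A rewrite pass wraps the column in a cast, which changes its output type.
    let rewritten = vec![Expr::Cast(Box::new(Expr::Column(0)), DataType::Int64)];
    let stale = Projection::with_schema(rewritten.clone(), original.schema.clone());
    let fresh = Projection::derive(rewritten, &input);
    (stale, fresh)
}
```

Both projections hold the same expressions, but only the derived one carries a schema that matches them.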

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jun 8, 2023
@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jun 8, 2023
@jackwener jackwener changed the title expression return wrong expression fix: simplify expression sometimes need to convert type. Jun 8, 2023
Comment on lines -733 to -739
LogicalPlan::Projection(Projection { schema, .. }) => {
    Ok(LogicalPlan::Projection(Projection::try_new_with_schema(
        expr.to_vec(),
        Arc::new(inputs[0].clone()),
        schema.clone(),
    )?))
}
Member Author

There is a bug here: we shouldn't reuse the original schema, because the schema can change.
The original code hid some bugs.

@jackwener jackwener changed the title fix: simplify expression sometimes need to convert type. fix: from_plan shouldn't use original schema & simplify expression need to convert type. Jun 8, 2023
@jackwener jackwener force-pushed the expr branch 2 times, most recently from bd31450 to 0276537 Compare June 10, 2023 06:04
@jackwener jackwener marked this pull request as ready for review June 10, 2023 06:04
Contributor

@alamb alamb left a comment

Thank you @jackwener -- this PR is definitely heading in the right direction. Fixing the projection schema is definitely exposing a bunch of bugs.

There are a few things that don't look right in the PR that I commented on.

Thanks again for trying to make things better. It is really appreciated.

@@ -384,8 +384,7 @@ impl DFSchema {
         let self_fields = self.fields().iter();
         let other_fields = other.fields().iter();
         self_fields.zip(other_fields).all(|(f1, f2)| {
-            f1.qualifier() == f2.qualifier()
-                && f1.name() == f2.name()
+            f1.qualified_name() == f2.qualified_name()
Contributor

I am not sure if it matters, but I believe that qualified_name creates a new String where the previous version avoids that allocation.
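
A rough sketch of the allocation concern, using a hypothetical `Field` struct as a stand-in (it is not DataFusion's actual field type, and `same_by_qualified_name`, `same_by_parts`, and `field` are names made up for this example). Building a qualified name goes through `format!` or `clone`, so each comparison creates two temporary `String`s, while comparing the parts only borrows.

```rust
// Hypothetical stand-in for a schema field; not DataFusion's real type.
struct Field {
    qualifier: Option<String>,
    name: String,
}

impl Field {
    /// Builds "qualifier.name" (or just "name"); allocates a new String on every call.
    fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }
}

/// Compares two fields by building two temporary Strings.
#[allow(dead_code)]
fn same_by_qualified_name(a: &Field, b: &Field) -> bool {
    a.qualified_name() == b.qualified_name()
}

/// Compares the borrowed parts directly, with no allocation.
#[allow(dead_code)]
fn same_by_parts(a: &Field, b: &Field) -> bool {
    a.qualifier == b.qualifier && a.name == b.name
}

/// Convenience constructor for the examples.
#[allow(dead_code)]
fn field(qualifier: Option<&str>, name: &str) -> Field {
    Field {
        qualifier: qualifier.map(str::to_string),
        name: name.to_string(),
    }
}
```

For fields that share a qualifier and a name, both functions agree; the difference is only in the temporary strings the first one creates.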

Member Author

@jackwener jackwener Jun 11, 2023

About this, I had a new idea over the weekend: we may need to handle alias in schema() so that it specifies the schema.

For alias('t1.a'), the field is currently qualifier: none, name: t1.a, but we want it to be qualifier: t1, name: a.

I will do that in a future PR.
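
As a sketch of why the alias case matters for the comparison above: an unqualified field named `t1.a` and a field `a` qualified by `t1` render to the same qualified name but differ part by part. `FieldParts`, `qualified_name`, and `split_alias` are hypothetical names for this illustration only, and the naive split on the first `.` ignores quoted identifiers.

```rust
/// Hypothetical field shape: (qualifier, name).
type FieldParts<'a> = (Option<&'a str>, &'a str);

/// Renders a field as "qualifier.name", or just "name" when unqualified.
#[allow(dead_code)]
fn qualified_name(field: FieldParts<'_>) -> String {
    match field {
        (Some(q), name) => format!("{}.{}", q, name),
        (None, name) => name.to_string(),
    }
}

/// Hypothetical helper: split an alias such as "t1.a" at the first '.'
/// into a qualifier and a name. Aliases without a '.' stay unqualified.
#[allow(dead_code)]
fn split_alias(alias: &str) -> FieldParts<'_> {
    match alias.split_once('.') {
        Some((q, name)) => (Some(q), name),
        None => (None, alias),
    }
}
```

With this helper, `split_alias("t1.a")` yields the `(Some("t1"), "a")` shape the comment asks for, which then compares equal to the column field both by parts and by qualified name.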

Member Author

related issue #6681

@@ -110,10 +111,10 @@ select array_position(['h', 'e', 'l', 'l', 'o'], 'l', 4), array_position([1, 2,
 4 5 2

 # array_positions scalar function
-query III
+query error DataFusion error: SQL error: ParserError\("Expected an SQL statement, found: caused"\)
Contributor

These errors do not look quite right. I think there is something wrong with how sqllogictest --complete handles multi-line errors 🤔

@jackwener jackwener marked this pull request as draft June 12, 2023 09:07
@jackwener jackwener changed the title fix: from_plan shouldn't use original schema & simplify expression need to convert type. fix: from_plan shouldn't use original schema Jun 13, 2023
@github-actions github-actions bot removed the optimizer Optimizer rules label Jun 13, 2023
@jackwener jackwener marked this pull request as ready for review June 13, 2023 08:05
@jackwener jackwener requested a review from alamb June 13, 2023 08:05
Contributor

@alamb alamb left a comment

Thank you @jackwener this looks like a nice improvement to me

However, it does seem to introduce a regression in some of the array functions recently introduced by @izveigor. The #6596 (comment) comment suggests to me that we need to support NULL in arrays, so it is a known issue, but it might be good to get @izveigor's opinion.

@@ -512,15 +512,22 @@ async fn test_regex_expressions() -> Result<()> {

#[tokio::test]
async fn test_cast_expressions() -> Result<()> {
test_expression!("CAST('0' AS INT)", "0");
Contributor

can we move it to slt?

Contributor

Maybe a good follow on PR

Seems like the hope was to move it to slt as part of #6210, but that wasn't completed 🤔

@izveigor
Contributor

I created a PR for the 'nulls' problem: #6662. I think it can fix the regression.

@jackwener
Member Author

jackwener commented Jun 14, 2023

I plan to merge this PR tomorrow unless there are other comments.

I will keep working on follow-ups.

@jackwener jackwener merged commit 36123ee into apache:main Jun 15, 2023
@jackwener jackwener deleted the expr branch June 15, 2023 09:58
jackwener added a commit to jackwener/arrow-datafusion that referenced this pull request Jul 3, 2023
jackwener added a commit that referenced this pull request Jul 4, 2023
* revert: from_plan keep same schema Project in #6595

* revert: from_plan keep same schema Agg/Window in #6820

* revert type coercion

* add comment
2010YOUY01 pushed a commit to 2010YOUY01/arrow-datafusion that referenced this pull request Jul 5, 2023
* revert: from_plan keep same schema Project in apache#6595

* revert: from_plan keep same schema Agg/Window in apache#6820

* revert type coercion

* add comment
yukkit pushed a commit to cnosdb/arrow-datafusion that referenced this pull request Jul 10, 2023
* revert: from_plan keep same schema Project in apache#6595

* revert: from_plan keep same schema Agg/Window in apache#6820

* revert type coercion

* add comment
jayzhan211 added a commit to jayzhan211/datafusion that referenced this pull request Jul 13, 2023
alamb pushed a commit that referenced this pull request Jul 16, 2023
* revert array.slt that changed by #6595

Signed-off-by: jayzhan211 <[email protected]>

* add test for to string

Signed-off-by: jayzhan211 <[email protected]>

* first draft

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
Labels
core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

from_plan shouldn't create projection by using original schema
4 participants