Remove the schema checking when creating CrossJoinExec #4432

Merged

Conversation

HaoYang670
Contributor

@HaoYang670 HaoYang670 commented Nov 30, 2022

Signed-off-by: remzi [email protected]

Which issue does this PR close?

Closes #4431.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Yes, rename CrossJoinExec::try_new to CrossJoinExec::new and update the return type to Self.
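
For callers, the migration might look roughly like the sketch below. It uses a stand-in `Plan` struct and a hypothetical `CrossJoin` type rather than the real DataFusion types, and the validation inside `try_new` is a made-up placeholder (the actual check is not shown in this thread). It only illustrates the shape of the change: a fallible constructor returning `Result<Self, _>` becomes an infallible one returning `Self`.

```rust
use std::sync::Arc;

// Stand-in for an input plan; NOT DataFusion's `ExecutionPlan`.
#[derive(Debug)]
struct Plan {
    columns: Vec<String>,
}

// Hypothetical miniature of a cross join node, for illustration only.
#[derive(Debug)]
struct CrossJoin {
    left: Arc<Plan>,
    right: Arc<Plan>,
    columns: Vec<String>,
}

impl CrossJoin {
    // Old shape: fallible because of an up-front check.
    // The check used here is a placeholder assumption, not the real one.
    fn try_new(left: Arc<Plan>, right: Arc<Plan>) -> Result<Self, String> {
        if left.columns.is_empty() && right.columns.is_empty() {
            return Err("placeholder validation failed".to_string());
        }
        Ok(Self::new(left, right))
    }

    // New shape: no check, so construction cannot fail.
    fn new(left: Arc<Plan>, right: Arc<Plan>) -> Self {
        // Output columns are the left columns followed by the right columns.
        let columns = left
            .columns
            .iter()
            .chain(right.columns.iter())
            .cloned()
            .collect();
        CrossJoin { left, right, columns }
    }
}
```

With this shape, a call site written as `CrossJoin::try_new(l, r)?` would become `CrossJoin::new(l, r)`, dropping the `?`.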

Signed-off-by: remzi <[email protected]>
@github-actions github-actions bot added the core Core DataFusion crate label Nov 30, 2022
Signed-off-by: remzi <[email protected]>
Comment on lines +67 to 75
let all_columns = {
    let left_schema = left.schema();
    let right_schema = right.schema();
    let left_fields = left_schema.fields().iter();
    let right_fields = right_schema.fields().iter();
    left_fields.chain(right_fields).cloned().collect()
};

let schema = Arc::new(Schema::new(all_columns));
Member

Could you use Schema::merge here instead?

Member

You could do something like this. This would preserve schema metadata as well.

        let input_schemas = vec![left.schema().as_ref().clone(), right.schema().as_ref().clone()];
        let schema = Arc::new(Schema::try_merge(input_schemas)?);

Contributor Author

Sorry, I'm afraid try_merge doesn't fit this context.
What we want here is to concatenate the schemas, not merge them. A table can be cross joined with itself, or the two tables can have columns with the same name, and in those cases merge doesn't work.
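
To make the difference concrete, here is a minimal sketch using a stand-in `Field` type, not arrow's `Schema` or `Field`. The `merge_fields` function is a simplified model of name-based merging written for this illustration (an assumption, not arrow's implementation): fields with the same name collapse into one, and conflicting types are an error. `concat_fields` keeps every field from both sides.

```rust
// Stand-in field type for illustration; NOT arrow's `Field`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

// Hypothetical helper to build a field.
fn field(name: &str, data_type: &str) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
    }
}

// Concatenation: left fields followed by right fields, duplicates kept.
fn concat_fields(left: &[Field], right: &[Field]) -> Vec<Field> {
    left.iter().chain(right.iter()).cloned().collect()
}

// Simplified name-based merge (assumed semantics for this sketch):
// same name and type collapse into one field, same name with a
// different type is rejected.
fn merge_fields(left: &[Field], right: &[Field]) -> Result<Vec<Field>, String> {
    let mut merged: Vec<Field> = Vec::new();
    for f in left.iter().chain(right.iter()) {
        // Look up the type of an already-merged field with the same name.
        let existing = merged
            .iter()
            .find(|m| m.name == f.name)
            .map(|m| m.data_type.clone());
        match existing {
            Some(t) if t != f.data_type => {
                return Err(format!("conflicting types for column `{}`", f.name));
            }
            Some(_) => {} // duplicate column silently dropped
            None => merged.push(f.clone()),
        }
    }
    Ok(merged)
}
```

Under this model, cross joining a two-column table with itself yields four output columns with concatenation but only two with merging, and two inputs that share a column name with different types fail to merge at all.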

Contributor Author

For the metadata, we should also do concatenation, instead of merging.

Contributor Author

@andygrove, what's your suggestion?

Contributor

@alamb alamb left a comment

This PR makes sense to me -- thank you @HaoYang670

@alamb
Contributor

alamb commented Dec 9, 2022

As I think all comments have been addressed, I am merging it in

@alamb alamb merged commit 03a6a9f into apache:master Dec 9, 2022
@HaoYang670 HaoYang670 deleted the 4413_remove_schema_checking_cross_join branch December 9, 2022 12:14
Labels
core Core DataFusion crate
Development

Successfully merging this pull request may close these issues.

Remove the schema checking from CrossJoinExec::try_new
3 participants