-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for SQL Natural Join #4863
Conversation
datafusion/sql/src/relation/join.rs
Outdated
Err(DataFusionError::NotImplemented( | ||
"NATURAL JOIN is not supported (https://issues.apache.org/jira/browse/ARROW-10727)".to_string(), | ||
)) | ||
// TODO: what if no common join cols? -> becomes cross join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: what if no common join cols? -> becomes cross join |
Yes, we should convert it to cross-join.
like pg:
-- create
CREATE TABLE a (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
dept TEXT NOT NULL
);
CREATE TABLE b (
id1 INTEGER PRIMARY KEY,
name1 TEXT NOT NULL,
dept1 TEXT NOT NULL
);
-- insert
INSERT INTO a VALUES (0001, 'Clark', 'Sales');
INSERT INTO b VALUES (0001, 'Clark', 'Sales');
INSERT INTO a VALUES (0002, 'Ava', 'Sales');
INSERT INTO b VALUES (0002, 'Ava', 'Sales');
-- fetch
SELECT * FROM a natural join b;
id | name | dept | id1 | name1 | dept1
----+-------+-------+-----+-------+-------
1 | Clark | Sales | 1 | Clark | Sales
1 | Clark | Sales | 2 | Ava | Sales
2 | Ava | Sales | 1 | Clark | Sales
2 | Ava | Sales | 2 | Ava | Sales
(4 rows)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I think that's been implemented, just forgot to remove the TODOs
let keys: Vec<Column> = right | ||
.schema() | ||
.fields() | ||
.iter() | ||
.map(|f| f.field().name()) | ||
.filter(|f| left_cols.contains(f)) | ||
.map(Column::from_name) | ||
.collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please notice qualifier
. Field can own a qualifier
and also don't own a qualifier
For example,
t1: |t1.a|t1.b|c| (we can meet c
instead of t1.c
)
We also need add test to consider it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow, are you suggesting to consider the qualifier when checking for common columns? My understanding is that when joining two different tables they should have different qualifiers anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow, are you suggesting to consider the qualifier when checking for common columns? My understanding is that when joining two different tables they should have different qualifiers anyway.
Look like current implementation is ok. field.name()
don't use qualifier, so this Problem don't exist
@@ -395,6 +395,54 @@ fn join_with_ambiguous_column() { | |||
quick_test(sql, expected); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look great to me, thanks @Jefffrey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thanks @Jefffrey and @JasonLi-cn
Benchmark runs are scheduled for baseline = 1eb46df and contender = 556282a. 556282a is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #117
Rationale for this change
What changes are included in this PR?
Support for planning sql natural join, which simply gets turned into a USING join based on the common columns (checked via name only), and if no common columns found then becomes a cross join.
Are these changes tested?
Added to sql integration_test.rs
Are there any user-facing changes?