-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataframe join_on method #5210
Dataframe join_on method #5210
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,7 +22,10 @@ use crate::expr_rewriter::{ | |
normalize_cols, rewrite_sort_cols_by_aggs, | ||
}; | ||
use crate::type_coercion::binary::comparison_coercion; | ||
use crate::utils::{columnize_expr, compare_sort_expr, exprlist_to_fields, from_plan}; | ||
use crate::utils::{ | ||
columnize_expr, compare_sort_expr, ensure_any_column_reference_is_unambiguous, | ||
exprlist_to_fields, from_plan, | ||
}; | ||
use crate::{and, binary_expr, Operator}; | ||
use crate::{ | ||
logical_plan::{ | ||
|
@@ -502,6 +505,25 @@ impl LogicalPlanBuilder { | |
)); | ||
} | ||
|
||
let filter = if let Some(expr) = filter { | ||
// ambiguous check | ||
ensure_any_column_reference_is_unambiguous( | ||
&expr, | ||
&[self.schema(), right.schema()], | ||
)?; | ||
|
||
// normalize all columns in expression | ||
let using_columns = expr.to_columns()?; | ||
let filter = normalize_col_with_schemas( | ||
expr, | ||
&[self.schema(), right.schema()], | ||
&[using_columns], | ||
)?; | ||
Some(filter) | ||
} else { | ||
None | ||
}; | ||
|
||
Comment on lines
+508
to
+526
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. related to #4196 fix bug where you could do dataframe join with ambiguous column for the filter expr instead of having the check done in both DataFrame join api and SQL planner join mod, unify by having check done inside the logical plan builder this is technically an unrelated fix to the actual issue, so i can extract into separate issue if needed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is fine to include in this PR as long as it also has a test (for ambiguity check using the DataFrame API) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. test added |
||
let (left_keys, right_keys): (Vec<Result<Column>>, Vec<Result<Column>>) = | ||
join_keys | ||
.0 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -68,6 +68,7 @@ execution. The plan is evaluated (executed) when an action method is invoked, su | |
| filter | Filter a DataFrame to only include rows that match the specified filter expression. | | ||
| intersect | Calculate the intersection of two DataFrames. The two DataFrames must have exactly the same schema | | ||
| join | Join this DataFrame with another DataFrame using the specified columns as join keys. | | ||
| join_on | Join this DataFrame with another DataFrame using arbitrary expressions. | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ❤️ |
||
| limit | Limit the number of rows returned from this DataFrame. | | ||
| repartition | Repartition a DataFrame based on a logical partitioning scheme. | | ||
| sort | Sort the DataFrame by the specified sorting expressions. Any expression can be turned into a sort expression by calling its `sort` method. | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM