feat: implement substrait for LIKE/ILIKE expr #6840
Signed-off-by: Ruihang Xia <[email protected]>
@@ -1120,6 +1151,70 @@ fn make_substrait_window_function(
}
}

#[allow(deprecated)]
fn make_substrait_like_expr(
Maybe we can combine `Expr::Like` and `Expr::ILike` by adding a field `ignore_case`? cc @alamb
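To make the suggestion concrete, here is a minimal sketch of what a merged node could look like. `LikeExpr` and its `keyword` method are hypothetical stand-ins and the operands are plain strings; this is not DataFusion's actual `Expr` definition, only an illustration of folding the two variants into one by way of an `ignore_case` flag.

```rust
/// Hypothetical, simplified stand-in for a LIKE-style logical expression.
/// Operands are plain strings here to keep the sketch self-contained.
#[derive(Debug, Clone, PartialEq)]
pub struct LikeExpr {
    pub negated: bool,
    pub expr: String,
    pub pattern: String,
    pub escape_char: Option<char>,
    /// `true` for ILIKE, `false` for LIKE (the proposed extra field).
    pub ignore_case: bool,
}

impl LikeExpr {
    /// SQL keyword this node corresponds to.
    pub fn keyword(&self) -> &'static str {
        match (self.negated, self.ignore_case) {
            (false, false) => "LIKE",
            (false, true) => "ILIKE",
            (true, false) => "NOT LIKE",
            (true, true) => "NOT ILIKE",
        }
    }
}
```

With one struct, producer and consumer code would need a single code path that branches on `ignore_case` only where the function name is chosen.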
if negated {
    let function_anchor = _register_function("not".to_string(), extension_info);

    Ok(Expression {
This pattern is very common. I'm thinking of wrapping it into something like LogicalPlanBuilder
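As a rough sketch of the kind of wrapper meant here: the repeated "register `not`, then wrap the inner expression" step could live in one helper. Everything below (`FunctionRegistry`, `SimpleExpr`, `negate_if`) is hypothetical and deliberately simplified; it does not use the real Substrait protobuf types or the producer's `_register_function` signature.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the producer's extension bookkeeping:
/// maps a function name to a numeric anchor, reusing anchors for repeats.
#[derive(Default)]
pub struct FunctionRegistry {
    anchors: HashMap<String, u32>,
}

impl FunctionRegistry {
    /// Returns the existing anchor for `name`, or assigns the next free one.
    pub fn register(&mut self, name: &str) -> u32 {
        let next = self.anchors.len() as u32;
        *self.anchors.entry(name.to_string()).or_insert(next)
    }
}

/// Simplified expression tree (not the Substrait protobuf types).
#[derive(Debug, PartialEq)]
pub enum SimpleExpr {
    Column(String),
    ScalarFunction {
        function_reference: u32,
        arguments: Vec<SimpleExpr>,
    },
}

/// Wraps `inner` in a `not` call when `negated` is set; otherwise returns it as-is.
pub fn negate_if(
    negated: bool,
    inner: SimpleExpr,
    registry: &mut FunctionRegistry,
) -> SimpleExpr {
    if !negated {
        return inner;
    }
    let function_reference = registry.register("not");
    SimpleExpr::ScalarFunction {
        function_reference,
        arguments: vec![inner],
    }
}
```

Call sites would then reduce to a single `negate_if(negated, expr, &mut registry)` instead of repeating the branch.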
@@ -1329,3 +1272,66 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
))
}
}

async fn make_datafusion_like(
Maybe it's time to refactor the large `producer.rs` and `consumer.rs` into several small files and organize them by functionality (e.g., produce `like` and consume `like`).
IIRC I've run into this issue before as well. I think we should just remove that field and bail out during the SQL->Logical lowering if the SQL parser encounters it (because it seems the logical expr. is modeled after SQL and the physical one after what's possible in arrow at the moment). Or we decide that this is a feature we actually want and fix the logical->physical lowering.
Yes please.
Signed-off-by: Ruihang Xia <[email protected]>
This is also a solution. I only have a little background on this feature, but it is supported in PG (doc), so maybe we are going to implement it in the future?
@nseekhao fyi
Apart from a few minor comments, everything LGTM!
pattern,
*escape_char,
schema,
col_ref_offset,
[For future improvement. Not completely related to this PR]
I think having to carry around `col_ref_offset` for any expression-related functions unnecessarily overcrowds the code. Once we have `SubqueryAlias` support implemented, this should not be necessary anymore. I'll refactor the code when that happens.
Co-authored-by: Nuttiiya Seekhao <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
let mut args: Vec<Expr> = vec![];
for arg in f.arguments.iter() {
ScalarFunctionType::Not => {
    let arg = f.arguments.first().ok_or_else(|| {
Should we check that `f.arguments.len() == 1` is true, since `first().ok_or_else()` would only give us an error if the vector is empty? Or do we not care if there's more than one argument?
I'm not sure about this. In the previous code (like deserializing ReadRel) we don't care about extra information from the proto, i.e., we only care whether we can get the data necessary to construct our plan, and just ignore any extras. But on the other hand, redundant things sometimes indicate an error or unexpected behavior (like finding a leftover screw after reassembling something 🫣).
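For reference, the two behaviors under discussion can be sketched side by side. The helper names `first_arg_lenient` and `single_arg_strict` are hypothetical, and the error type is a plain `String` rather than `DataFusionError`, so this only illustrates the trade-off and is not the code in this PR.

```rust
/// Lenient: succeeds as long as a first argument exists; extras are ignored.
/// This mirrors what `first().ok_or_else(..)` does.
pub fn first_arg_lenient<T>(args: &[T]) -> Result<&T, String> {
    args.first()
        .ok_or_else(|| "`not` expects one argument, got none".to_string())
}

/// Strict: succeeds only when there is exactly one argument.
pub fn single_arg_strict<T>(args: &[T]) -> Result<&T, String> {
    match args {
        [only] => Ok(only),
        _ => Err(format!(
            "`not` expects exactly one argument, got {}",
            args.len()
        )),
    }
}
```

The lenient form tolerates producers that attach extra data, while the strict form surfaces a malformed plan early; which is preferable depends on how much the consumer trusts its input.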
return Err(DataFusionError::NotImplemented(
    format!("Invalid arguments type for `{}` expr", fn_name)
))
};
I'm not sure if
format!("Invalid arguments type for `{}` expr", fn_name)
or
format!("Invalid arguments type for `{fn_name}` expr")
is preferred in Rust, but maybe we should choose one to be consistent here? It may be confusing to the reader if different syntaxes are used to implement the same semantics.
Okay 👍
BTW, clippy has a lint about this style, `uninlined_format_args` (but maybe it's unnecessary to enable it?).
> BTW, clippy has a lint about this style, `uninlined_format_args` (but maybe it's unnecessary to enable it?).

Oh thanks for the reference! Inlining the args makes sense.
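A small sketch confirming that the two spellings from this thread produce the same string. The wrapper functions `positional` and `inlined` are hypothetical names used only for the comparison; inline captures in format strings require Rust 1.58 or newer.

```rust
/// Positional style: the argument is passed after the format string.
#[allow(clippy::uninlined_format_args)]
pub fn positional(fn_name: &str) -> String {
    format!("Invalid arguments type for `{}` expr", fn_name)
}

/// Inlined style: the identifier is captured directly inside the braces.
pub fn inlined(fn_name: &str) -> String {
    format!("Invalid arguments type for `{fn_name}` expr")
}
```

Since the output is identical, the choice is purely stylistic, which is why picking one form for the file is mostly a consistency question.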
Signed-off-by: Ruihang Xia <[email protected]>
LGTM!!
Which issue does this PR close?
Closes #6731.
Rationale for this change
Support more expressions in substrait
What changes are included in this PR?
Implement `Expr::Like` and `Expr::ILike` for datafusion-substrait. This patch doesn't use this definition as our `Expr::Like` has one more field `escape_char` (but it's not used/supported in the physical plan lol).
Are these changes tested?
Yes
Are there any user-facing changes?