Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple ways to express like / ilike / not like / not ilike #4765

Closed
alamb opened this issue Dec 28, 2022 · 1 comment · Fixed by #4792
Closed

Multiple ways to express like / ilike / not like / not ilike #4765

alamb opened this issue Dec 28, 2022 · 1 comment · Fixed by #4792
Labels
bug Something isn't working good first issue Good for newcomers logical-expr Logical plan and expressions

Comments

@alamb
Copy link
Contributor

alamb commented Dec 28, 2022

Describe the bug
There are two ways to express pattern matching in datafusion Exprs.

Expr::Like(Like)
..

https://github.com/apache/arrow-datafusion/blob/3abbffb5c83395e8e6e68e5529835e72e7769d0d/datafusion/expr/src/operator.rs#L55-L61

As well as

BinaryExpr {
 left: ..
 op: Operator::Like,
 right: ..
}

This causes issues such as the simplification in #4646 only affects the BinaryExpr form, not the Expr::Like form

The Expr::Like form is more full featured (can have pattern substitution)

Expected behavior
I would like to have a single way to represent these operators. I believe it should be Expr::Like, Expr::ILike, Expr::NotLike and Expr::NotILike as they have more features.

Thus, I would like to remove Operator::Like, Operator::ILike, Operator::NotLike, and Operator::NotILike and update all tests / code to use Expr::Like

Additional context
Noticed while reviewing #4646 from @crepererum

I am marking this as "good first issue" as it is mostly a software engineering exercise and would be guided by the compiler -- it would be a good exercise to get familiar with DataFusion's codebase

@alamb alamb added bug Something isn't working good first issue Good for newcomers logical-expr Logical plan and expressions labels Dec 28, 2022
@crepererum
Copy link
Contributor

crepererum commented Jan 2, 2023

Note that the expression variant supports specifying escape characters (as the SQL does) but the binary expression doesn't support options / a third input (its binary after all) and the arrow kernels don't support escape characters at the moment. Currently the physical plan construction fails if an escape character is specified and this behavior should be carried over (if the feature gap in the arrow kernels isn't addressed beforehand).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants