Add type coercion from Dictionary to string for regular expressions #5154

Closed
stuartcarnie opened this issue Feb 1, 2023 · 0 comments · Fixed by #5152
Labels
enhancement New feature or request

@stuartcarnie
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

IOx uses the Dictionary(Int32, Utf8) data type to represent tag columns. Filtering tag columns with regular expressions is confusing for users: some queries succeed and others fail, for reasons that may not be obvious. Running EXPLAIN reveals why regular expressions such as the following succeed:

SELECT usage_idle FROM cpu WHERE cpu ~ '9'

In this case, the optimiser rewrites the regex condition as LIKE '%9%', and the LIKE operator has additional code to coerce dictionary types:

https://github.com/apache/arrow-datafusion/blob/031534d94efb305eb26a7c16fd7e06ae3bcd88bb/datafusion/expr/src/type_coercion/binary.rs#L522

However, other cases fail unexpectedly:

SELECT usage_idle FROM cpu WHERE cpu ~ '9$'

because the optimiser does not rewrite this pattern to a LIKE expression¹, and regular expression operators use a string-only coercion rule that does not accept dictionary types:

https://github.com/apache/arrow-datafusion/blob/b6dbb8d8b896861d23dcc17a8a4b3e0e4276db7e/datafusion/expr/src/type_coercion/binary.rs#L141-L144
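To illustrate why the two queries behave differently, here is a minimal Rust sketch of a regex-to-LIKE rewrite that only fires for plain literal patterns. The function name `regex_to_like` and the exact set of patterns it accepts are assumptions for illustration; DataFusion's actual simplification rule may accept more or different patterns.

```rust
/// Hypothetical, simplified rewrite rule (not DataFusion's optimiser code):
/// an unanchored regex that is a plain literal matches the literal anywhere
/// in the value, which is what LIKE '%literal%' expresses. Any other pattern
/// returns None and stays a regex, so the regex coercion rule still applies.
fn regex_to_like(pattern: &str) -> Option<String> {
    // Characters with special meaning in a regular expression.
    const REGEX_META: &str = r".^$*+?()[]{}|\";
    // `%` and `_` are LIKE wildcards; bail out rather than escape them.
    const LIKE_META: &str = "%_";
    let is_plain_literal = pattern
        .chars()
        .all(|c| !REGEX_META.contains(c) && !LIKE_META.contains(c));
    if is_plain_literal {
        Some(format!("%{pattern}%"))
    } else {
        None
    }
}

fn main() {
    // `cpu ~ '9'` can be planned as `cpu LIKE '%9%'`.
    assert_eq!(regex_to_like("9"), Some("%9%".to_string()));
    // `cpu ~ '9$'` is anchored, so it is not rewritten and stays a regex.
    assert_eq!(regex_to_like("9$"), None);
    println!("ok");
}
```

Under this assumed rule, only the first query ever reaches the dictionary-aware LIKE coercion; the second is planned as a regex match and hits the string-only rule.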

Describe the solution you'd like

Teach DataFusion how to coerce compatible Dictionary(_, _) types (those whose value type is a string) to a string type, so that the regular expression condition can succeed.
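A minimal Rust sketch of the proposed rule, under stated assumptions: `DataType` below is a hypothetical, cut-down stand-in for Arrow's `DataType`, and `string_coercion`, `dictionary_value_type` and `regex_coercion` are illustrative, simplified functions rather than DataFusion's actual code. The idea is to unwrap a dictionary to its value type before applying the existing string rule, so `Dictionary(Int32, Utf8)` is treated as `Utf8`.

```rust
/// Hypothetical, cut-down stand-in for Arrow's DataType; only the variants
/// needed for this sketch are modelled.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    LargeUtf8,
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Simplified string-only rule: both sides must already be Utf8 or LargeUtf8.
fn string_coercion(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    use DataType::*;
    match (lhs, rhs) {
        (Utf8, Utf8) => Some(Utf8),
        (LargeUtf8, Utf8) | (Utf8, LargeUtf8) | (LargeUtf8, LargeUtf8) => Some(LargeUtf8),
        _ => None,
    }
}

/// Returns the value type of a dictionary, or the type itself otherwise.
fn dictionary_value_type(t: &DataType) -> &DataType {
    match t {
        DataType::Dictionary(_, value) => value.as_ref(),
        other => other,
    }
}

/// Proposed rule for regex operators: unwrap dictionaries, then reuse the
/// string-only rule, so non-string dictionaries are still rejected.
fn regex_coercion(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    string_coercion(dictionary_value_type(lhs), dictionary_value_type(rhs))
}

fn main() {
    let tag = DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));
    // Current behaviour for `cpu ~ '9$'`: no common type, so planning fails.
    assert_eq!(string_coercion(&tag, &DataType::Utf8), None);
    // With dictionary unwrapping, the tag column is compared as Utf8.
    assert_eq!(regex_coercion(&tag, &DataType::Utf8), Some(DataType::Utf8));
    // A dictionary with non-string values is still rejected.
    let ints = DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Int32));
    assert_eq!(regex_coercion(&ints, &DataType::Utf8), None);
    println!("ok");
}
```

Unwrapping at the value-type level is what keeps the rule limited to *compatible* dictionaries: `Dictionary(Int32, Int32)` still has no common string type.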

Describe alternatives you've considered

None, given that there is prior art for the LIKE operator, which also coerces Dictionary(_, _) types to strings:

https://github.com/apache/arrow-datafusion/blob/031534d94efb305eb26a7c16fd7e06ae3bcd88bb/datafusion/expr/src/type_coercion/binary.rs#L522

Footnotes

  1. Incidentally, the optimiser could also rewrite this pattern to LIKE '%9'.
