-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
supports_filter_pushdown()
receives broken-up filter
s
#5357
Comments
Just for my understanding, what kind of |
yes, it is
Replace "database" with TableProvider and I agree with you there. (We're writing the database, not calling out to an external one)
Only that this is how it happens now. I assume others are attached to the present behavior? It seems way more simple to have the TableProvider split the conjunction (if it wants) than have it re-assemble the original filter from a bunch of separate calls. |
Honestly, I think the fix might just be removing this line: https://github.com/apache/arrow-datafusion/blob/2a5ef432aef45793b60c2847552d6d597589d67d/datafusion/optimizer/src/push_down_filter.rs#L706 @Dandandan @andygrove , thoughts? |
I think the idea of splitting the filter is that a tableprovider might support certain expressions/functions but not others. This way it can be communicated, so filter push down can push down only certain sub-expressions. If you would pass the complete expression, it only could communicate this for the entire filter. |
@Dandandan what do you think about Andy's suggestion to:
? |
@alamb I guess the alternative is going full-bore on what we had talked about and having TableProviders expose something like:
And taking care of this in the planner / optimizer. |
An alternative approach might be something like:
|
@avantgardnerio I wonder if your
Not only does this seem (?) to handle the usecase in this ticket's description, it would also allow you to handle predicates where you only have indexes on part of the predicate. For example, instead of WHERE s_i_id = $1 AND s_w_id = $2 What if you had a predicate like WHERE s_i_id = $1 AND s_w_id = $2 AND s_i_age > 21 Or something? You still probably want to use the multi-column index on |
I think this would work, but not perform optimally - the TableProvider would actually match exactly, but DataFusion would still have a (useless) I think I have a solution that I believe would work for all the cases. Simply change the existing signature of:
to:
Then my TableProvider could see both It's barely a change from the existing signature, and we can even wrap it in an adapter method with a default impl that will keep existing TableProviders working by calling it iteratively, mark the old one Thoughts @alamb ? |
I like that latest suggestion |
done: #5367 |
The API change seems reasonable to me too |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I have a
TableProvider
that supports multi-column indexes and could theoretically support filter pushdown, however when running the query:on the table
I observe the following in my logs:
So I would have to do complicated logic (append each portion of the predicate, always return "true", then "and" them back together during scan()) to properly apply this filter push down in my code.
Describe the solution you'd like
DataFusion would provide raw
Expr
s of the entire filter it is trying to push down, or - as a compromise, try first with the whole filter, then if theTableProvider
returnsUnsupported
try again with each part as suggested by @andygrove .Describe alternatives you've considered
The text was updated successfully, but these errors were encountered: