[FEA] rewrite join conditions where only part of it can fit on the AST #8832

Closed
revans2 opened this issue Jul 27, 2023 · 0 comments · Fixed by #9635 or #9702
Labels: feature request

revans2 (Collaborator) commented Jul 27, 2023

Is your feature request related to a problem? Please describe.
CUDF recently added limited AST string support. We have an issue to enable that support, #8157, but it does not allow any operation that produces a string as an intermediate value. That is just way too complicated for CUDF, because it would need to allocate GPU memory from code already running on the GPU. So to support things like this we need either #8743, which is rather scary and not likely to be all that fast, or some other way to rewrite the queries so that we can split up some of the conditionals.

#8742 would do some of that, but only for inner joins, and not completely.

#8157 (comment) is a request to support `lower(trim(string_col))` as part of a left outer conditional join.

To make that work we would need to look at the query, see that we cannot support that set of expressions on the GPU, and also see that the intermediate value depends on only one side of the join. We could then rewrite it to do part of the processing before the join and drop the unneeded columns afterwards.

#8157 (comment) explains what would need to happen for that specific case. In general, we would want to find all subexpressions in the join condition that cannot be translated to AST, but whose output types are supported by AST and that rely on only one side of the join. If and only if this would allow us to translate the whole join condition so that it runs on the GPU, we would move each of those expression trees into a project that happens before the join, under a temporary name. We would then rewrite the join condition to use those temporary columns, and finally drop the temporary columns from the output after the join.
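To make the proposed rewrite concrete, here is a rough Python sketch over a toy expression tree. None of this is spark-rapids or CUDF API: `Expr`, `AST_OPS`, `AST_OUTPUT_TYPES`, and `rewrite` are hypothetical names invented for illustration, and the set of AST-capable operators is an assumption. It only shows the shape of the idea: pull out the largest one-sided subtrees that AST cannot handle, and give up entirely if anything unsupported is left over.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical, not a Spark or CUDF class)."""
    op: str                          # e.g. "col", "lit", "eq", "lower", "trim"
    children: Tuple["Expr", ...] = ()
    name: Optional[str] = None       # column name when op == "col"
    side: Optional[str] = None       # "left" or "right" when op == "col"
    dtype: str = "string"            # output type of this node


# Assumed capabilities, chosen only for this example.
AST_OPS = {"col", "lit", "eq", "lt", "and", "or"}
AST_OUTPUT_TYPES = {"string", "int", "bool"}


class NotRewritable(Exception):
    """Raised when some part of the condition can neither run in AST nor move."""


def sides(e: Expr) -> frozenset:
    """Which join sides does this subtree reference?"""
    if e.op == "col":
        return frozenset({e.side})
    out = frozenset()
    for c in e.children:
        out |= sides(c)
    return out


def ast_ok(e: Expr) -> bool:
    """True if the whole subtree uses only AST-capable operators."""
    return e.op in AST_OPS and all(ast_ok(c) for c in e.children)


def rewrite(cond: Expr):
    """Return (new_cond, left_projects, right_projects), or None if the
    rewrite would not make the entire condition AST-compatible."""
    pre = {"left": [], "right": []}

    def walk(e: Expr) -> Expr:
        if ast_ok(e):
            return e
        s = sides(e)
        if len(s) == 1 and e.dtype in AST_OUTPUT_TYPES:
            # One-sided subtree with an AST-supported output type:
            # compute it in a project before the join under a temporary name.
            side = next(iter(s))
            tmp = f"_tmp_{side}_{len(pre[side])}"
            pre[side].append((tmp, e))
            return Expr("col", name=tmp, side=side, dtype=e.dtype)
        if e.op not in AST_OPS:
            raise NotRewritable(e.op)
        return replace(e, children=tuple(walk(c) for c in e.children))

    try:
        new_cond = walk(cond)
    except NotRewritable:
        return None
    return new_cond, pre["left"], pre["right"]


# Example: lower(trim(left.name)) = right.key
name = Expr("col", name="name", side="left")
key = Expr("col", name="key", side="right")
cond = Expr("eq", (Expr("lower", (Expr("trim", (name,)),)), key), dtype="bool")
result = rewrite(cond)
```

In this sketch the caller would add the returned `(temporary name, expression)` pairs as a project on the matching side, run the join with `new_cond`, and then drop the temporary columns from the output. An unsupported subexpression that touches both sides makes `rewrite` return `None`, which mirrors the "if and only if" condition above.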

Something like #8831 would be really helpful here too. But I am not sure whether we would want all of this to happen inside the join, or whether it would be better to rewrite the query and have it show up in the plan.
