Optimize EXISTS subquery expressions by rewriting as semi-join #2351

andygrove · 2022-04-27T12:32:53Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like an optimizer rule that rewrites queries with EXISTS subqueries as semi-joins See discussion in #2344.

Describe the solution you'd like
See discussion in #2344 for an example.

Describe alternatives you've considered
None

Additional context
None

jon-chuang · 2022-05-02T23:02:42Z

I'm currently halfway through implementing this, using heuristic rewrite rules, but I've decided to try to make it more generic as per DuckDB/Hyper's (and also apprently Materialize's) approach, which is to pushdown the dependent join through the subquery until all dependent/correlated predicates have been exfiltrated from the subquery.

This was the intuition which occured to me as the most powerful and complete way after a bit of fiddling around, and thankfully there are trailblazers here.

This is the best resource, DuckDB's dependent join pushdown logic for various logical operators:
https://github.com/duckdb/duckdb/blob/bee8017bdcc5e652aee26ce8cfb260990cf6a369/src/planner/subquery/flatten_dependent_join.cpp#L72

Edit: I think the more general strategy (dependent join pushdown) may be more complicated and potentially more costly than the alternative. Hence, I think I will try to do the simpler strategy first (filter pullup + pattern match against Exists(Filter(..))) properly, and leave dependent join pushdown to a future endeavour.

Here is Spark's Catalyst optimizer doing predicate pullup: https://github.com/apache/spark/blob/d3af3e5d73f8fabaa9cb320d485962d49a46f8bf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala

The full subquery decorrelation for Catalyst is found here (same technique as DuckDB/Hyper): https://github.com/apache/spark/blob/9b885ae3ba70cd97489e8768a335c9b9c10e9d87/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala#L308

andygrove · 2022-12-31T16:29:23Z

I think this can be closed now

andygrove added the enhancement New feature or request label Apr 27, 2022

andygrove mentioned this issue Apr 27, 2022

[EPIC] Subquery support #2248

Closed

15 tasks

jon-chuang mentioned this issue May 5, 2022

Add EXISTS and IN subquery rewriting for correlated filters at filter depth 1 #2451

Closed

andygrove closed this as completed Dec 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize EXISTS subquery expressions by rewriting as semi-join #2351

Optimize EXISTS subquery expressions by rewriting as semi-join #2351

andygrove commented Apr 27, 2022

jon-chuang commented May 2, 2022 •

edited

Loading

andygrove commented Dec 31, 2022

Optimize EXISTS subquery expressions by rewriting as semi-join #2351

Optimize EXISTS subquery expressions by rewriting as semi-join #2351

Comments

andygrove commented Apr 27, 2022

jon-chuang commented May 2, 2022 • edited Loading

andygrove commented Dec 31, 2022

jon-chuang commented May 2, 2022 •

edited

Loading