Handle cardinality estimation for disjoint inner and outer joins #3848

Merged
merged 1 commit into from
Oct 18, 2022


isidentical
Contributor

Which issue does this PR close?

Closes #3802.

Rationale for this change

When we know for certain that the two sides of an inner join are disjoint on at least one join column, we can estimate the cardinality of that join as 0. From that estimate, we can derive the cardinality of left, right, and full outer joins.

What changes are included in this PR?

estimate_join_cardinality now takes an is_exact flag, which is only true when both the left and right inputs originate directly from table providers. (This might change in the future if we collect more statistics at materialization points in both DF and Ballista and adaptively optimize the parent plans accordingly.) If is_exact is true and we identify a disjoint column while calculating the inner join cardinality, we now infer the cardinality of the inner join as 0.
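To make the disjointness check concrete, here is a minimal sketch rather than the code in this PR. It assumes that "disjoint" means the min/max ranges of a pair of join columns do not overlap, as the linked issue title suggests. `ColumnRange`, `is_disjoint`, and `disjoint_inner_join_cardinality` are hypothetical names used only for illustration; only the `is_exact` flag comes from the description above.

```rust
/// Hypothetical per-column statistics: the closed range [min, max] of values.
#[derive(Debug, Clone, Copy)]
struct ColumnRange {
    min: i64,
    max: i64,
}

/// Two closed ranges are disjoint when one ends before the other begins.
fn is_disjoint(left: ColumnRange, right: ColumnRange) -> bool {
    left.max < right.min || right.max < left.min
}

/// Returns `Some(0)` when the inner join is provably empty, `None` otherwise.
///
/// `is_exact` mirrors the flag described above: the statistics are only
/// trusted as exact when both inputs come directly from table providers.
/// `key_pairs` holds the (left, right) ranges for each pair of join columns.
fn disjoint_inner_join_cardinality(
    is_exact: bool,
    key_pairs: &[(ColumnRange, ColumnRange)],
) -> Option<usize> {
    // A single disjoint pair of join columns is enough: no row can satisfy
    // every equality condition if one of them can never match.
    if is_exact && key_pairs.iter().any(|(l, r)| is_disjoint(*l, *r)) {
        Some(0)
    } else {
        None
    }
}
```

Returning `None` when the statistics are inexact leaves the caller free to fall back to whatever estimate it would otherwise compute, instead of claiming an empty result it cannot prove.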

Since the cardinality of left, right, and full outer joins can be derived from the inner join's cardinality (e.g. a disjoint left outer join produces exactly the number of rows on the left side), this PR also adds support for inferring their cardinality.
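The derivation for outer joins could be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact formula used in the PR: `JoinKind` and `derive_join_cardinality` are hypothetical names, and the general (non-disjoint) branches use a simple lower-bound heuristic. Only the disjoint case (inner cardinality 0) follows directly from the description above.

```rust
/// Hypothetical join kinds, used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinKind {
    Inner,
    Left,
    Right,
    Full,
}

/// Derives a join cardinality from an inner join estimate plus the row
/// counts of both inputs.
///
/// An outer join keeps every row of its preserved side, so its output can
/// never be smaller than that side's row count. The general formulas below
/// are an assumed heuristic built on that lower bound.
fn derive_join_cardinality(
    kind: JoinKind,
    inner: usize,
    left_rows: usize,
    right_rows: usize,
) -> usize {
    match kind {
        JoinKind::Inner => inner,
        // Every left row appears at least once, matched or padded with NULLs.
        JoinKind::Left => inner.max(left_rows),
        // Symmetric to the left outer case.
        JoinKind::Right => inner.max(right_rows),
        // Combine both bounds, counting the matched rows only once.
        JoinKind::Full => inner.max(left_rows) + inner.max(right_rows) - inner,
    }
}
```

For a disjoint join the inner estimate is 0, so a left outer join yields `left_rows`, a right outer join yields `right_rows`, and a full outer join yields `left_rows + right_rows`.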

Are there any user-facing changes?

No.

@github-actions github-actions bot added the core Core DataFusion crate label Oct 15, 2022
@isidentical isidentical marked this pull request as ready for review October 16, 2022 15:16
@alamb
Contributor

alamb commented Oct 18, 2022

Thanks @isidentical

@alamb alamb merged commit 42f5ff3 into apache:master Oct 18, 2022
Labels
core Core DataFusion crate
Development

Successfully merging this pull request may close these issues.

Improve join cardinality estimation when there is no overlap in the min/max values
3 participants