Implement statistics estimation for FilterExec #3845

Closed
Tracked by #3929
Dandandan opened this issue Oct 15, 2022 · 7 comments · Fixed by #4162
Labels
enhancement (New feature or request), optimizer (Optimizer rules), performance (Make DataFusion faster)

Comments

@Dandandan
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
@isidentical implemented more statistics rules for joins. However, when a FilterExec sits between a TableScan and the join, we do not return a join estimation yet.

Describe the solution you'd like
We should provide an estimation based on the filter expressions and the statistics information that we have.
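
A minimal sketch of what such an estimation could look like, assuming the column has known min/max statistics and its values are uniformly distributed between them. The `ColumnStats` and `Predicate` types and both function names are hypothetical and only for illustration; they are not DataFusion's actual API.

```rust
/// Hypothetical, simplified column statistics (not DataFusion's real type).
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: f64,
    max: f64,
}

/// Hypothetical predicate of the form `column <op> literal`.
#[derive(Debug, Clone, Copy)]
enum Predicate {
    LtEq(f64),
    GtEq(f64),
}

/// Estimate the fraction of rows that pass the predicate, assuming values
/// are uniformly distributed between `min` and `max`.
fn estimate_selectivity(stats: ColumnStats, pred: Predicate) -> f64 {
    let range = stats.max - stats.min;
    if range <= 0.0 {
        // Degenerate or single-valued range: keep all rows or none.
        let keep = match pred {
            Predicate::LtEq(v) => stats.min <= v,
            Predicate::GtEq(v) => stats.min >= v,
        };
        return if keep { 1.0 } else { 0.0 };
    }
    let fraction = match pred {
        Predicate::LtEq(v) => (v - stats.min) / range,
        Predicate::GtEq(v) => (stats.max - v) / range,
    };
    // Literals outside [min, max] give fractions outside [0, 1].
    fraction.clamp(0.0, 1.0)
}

/// Scale the input row count by the estimated selectivity.
fn estimate_filter_rows(input_rows: usize, stats: ColumnStats, pred: Predicate) -> usize {
    (input_rows as f64 * estimate_selectivity(stats, pred)).round() as usize
}
```

For example, with `min = 0` and `max = 100`, the predicate `col <= 25` over 1000 input rows yields an estimate of 250 rows. A real implementation would also need to handle missing statistics, non-numeric types, and compound expressions.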


@Dandandan added the enhancement, performance, and optimizer labels on Oct 15, 2022
@isidentical
Contributor

This was next on my list, so I'd be happy to work on it!

@Dandandan
Contributor Author

Awesome @isidentical, I assigned you to this issue!

@Dandandan
Contributor Author

I would be very interested to see if we're getting any changes in TPC-H queries after this.

@isidentical
Contributor

Will be sure to include some benchmarks 👍🏻

@mingmwang
Contributor

@Dandandan I see there are a couple of stats-related issues open. Is there an umbrella task or task list? Or maybe we can tag all related issues as CBOStats?

@alamb
Contributor

alamb commented Oct 22, 2022

I think @isidentical started a discussion on #3898 -- @mingmwang could you take a look? Perhaps it would be time to start socializing that ticket more broadly (I could send some email / slack messages to bring it to the community's attention)

@mingmwang
Contributor

@alamb Thanks, I will go through all the tickets and take a closer look at the current code base. I can help work on some of the open tasks.

4 participants