Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add need_data_exchange in the ExecutionPlan to indicate whether a physical operator needs data exchange #4585

Closed
yahoNanJing opened this issue Dec 12, 2022 · 5 comments · Fixed by #4586
Labels
enhancement New feature or request

Comments

@yahoNanJing
Copy link
Contributor

yahoNanJing commented Dec 12, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently there's no explicit way to indicate whether an execution plan needs a data exchange, which is not so clear for the caller, like Ballista, to judge whether it's necessary to add an additional data exchange operator to deal with the shuffling.

Describe the solution you'd like

It's better to classify and clarify which execution plans needs data exchange explicitly.

Currently there are 3 kinds of execution plan which needs data exchange

  1. RepartitionExec for changing the partition number between two operators
  2. CoalescePartitionsExec for collapsing all of the partitions into one without ordering guarantee
  3. SortPreservingMergeExec for collapsing all of the sorted partitions into one with ordering guarantee

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@tustvold
Copy link
Contributor

This might be a dumb question, but can this not be inferred from if the ExecutionPlan alters the partitioning of its input?

@alamb
Copy link
Contributor

alamb commented Dec 12, 2022

This might be a dumb question, but can this not be inferred from if the ExecutionPlan alters the partitioning of its input?

I tried to think of a case where checking if input partitioning and output partitioning match was insufficient and I could not 🤔

@yahoNanJing
Copy link
Contributor Author

Hi @tustvold and @alamb, thanks for your suggestion. The PR is ready. One method of need_data_exchange is added to the trait ExecutionPlan, which may be much more clear

@yahoNanJing yahoNanJing changed the title Add a trait DataExchangeExecutionPlan to indicate whether an execution plan needs a data exchange Add need_data_exchange in the ExecutionPlan to indicate whether a physical operator needs data exchange Dec 13, 2022
@mingmwang
Copy link
Contributor

My preference is to keep the ExecutionPlan Trait simple and leverage other methods or pattern matching to decide what kind of actions need to take to adjust the plan tree.

@yahoNanJing
Copy link
Contributor Author

Agree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
4 participants