Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ballista: Fix hacks around concurrency=2 to force hash-partitioned joins #20

Open
andygrove opened this issue Jul 3, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@andygrove
Copy link
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

By default, DataFusion uses hash-partitioned joins if concurrency > 1 which led to me adding this hacky code in a couple of places in Ballista.

let config = ExecutionConfig::new().with_concurrency(2); // TODO: this is hack to enable partitioned joins
let mut ctx = ExecutionContext::with_config(config);

Describe the solution you'd like
I'm actually not sure what the solution should be, but I would like to be able to tell the context to use hash-partitioned joins, separately from specifying concurrency.

Describe alternatives you've considered
None

Additional context
This code is running in the scheduler, not in the executor where the query actually executes. The scheduler concurrency should not impact how the query is planned.

@Dandandan
Copy link
Contributor

Dandandan commented Jul 3, 2021

I am wondering if the current 2 should just be based on a configation setting for the default number of partitions (just like Spark uses 200 partitions as a default).
Maybe we should clean up the terminology wrt concurrency and number of partitions a bit in that case

@andygrove
Copy link
Member Author

💯 That sounds great to me.

@andygrove andygrove transferred this issue from apache/datafusion May 19, 2022
thinkharderdev added a commit that referenced this issue Jan 21, 2023
* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* configure_me_codegen retroactively reserved on our `bind_host` parame… (#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Add ClusterState trait

* Refactor slightly for clarity

* Revert "Use node-level local limit (#20)"

This reverts commit ff96bcd.

* Revert "Public method for stage metrics"

This reverts commit a802315.

* Revert "Public method for getting execution graph"

This reverts commit 490bda5.

* Revert "Add public methods to SchedulerServer"

This reverts commit 5ad27c0.

* Revert "Add queued and completed timestamps to successful job status"

This reverts commit c615fce.

* Revert "Construct Executor with functions"

This reverts commit 24d4830.

* Always forget the apache header

Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
thinkharderdev added a commit that referenced this issue Feb 7, 2023
* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>
thinkharderdev added a commit that referenced this issue Feb 7, 2023
andygrove pushed a commit that referenced this issue Feb 11, 2023
* Customize session builder

* Add setter for executor slots policy

* Construct Executor with functions

* Add queued and completed timestamps to successful job status

* Add public methods to SchedulerServer

* Public method for getting execution graph

* Public method for stage metrics

* Use node-level local limit (#20)

* Use node-level local limit

* serialize limit in shuffle writer

* Revert "Merge pull request #19 from coralogix/sc-5792"

This reverts commit 08140ef, reversing
changes made to a7f1384.

* add log

* make sure we don't forget limit for shuffle writer

* update accum correctly and try to break early

* Check local limit accumulator before polling for more data

* fix build

Co-authored-by: Martins Purins <[email protected]>

* configure_me_codegen retroactively reserved on our `bind_host` parame… (#520)

* configure_me_codegen retroactively reserved on our `bind_host` parameter name

* Add label and pray

* Add more labels why not

* Add ClusterState trait

* Refactor slightly for clarity

* Revert "Use node-level local limit (#20)"

This reverts commit ff96bcd.

* Revert "Public method for stage metrics"

This reverts commit a802315.

* Revert "Public method for getting execution graph"

This reverts commit 490bda5.

* Revert "Add public methods to SchedulerServer"

This reverts commit 5ad27c0.

* Revert "Add queued and completed timestamps to successful job status"

This reverts commit c615fce.

* Revert "Construct Executor with functions"

This reverts commit 24d4830.

* Always forget the apache header

* WIP

* Implement JobState

* Tests and fixes

* do not hold ref across await point

* Fix clippy warnings

* Fix tomlfmt github action

* uncomment test

---------

Co-authored-by: Martins Purins <[email protected]>
Co-authored-by: Brent Gardner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants