Initial query stages read parquet files and repartition them needlessly #243

andygrove · 2022-09-19T00:40:31Z

Describe the bug

To Reproduce

Expected behavior
We should not shuffle the input files like this.

Additional context

andygrove · 2022-09-19T00:41:18Z

@yahoNanJing @thinkharderdev I found this during testing today with #242

andygrove · 2022-09-19T01:20:12Z

I may have been premature in creating this issue. This stage is an input to a join. I will think about this some more.

yahoNanJing · 2022-09-19T07:59:53Z

Hi @andygrove, I don't think it's an issue. For a hash join, we need to repartition each table.
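To illustrate why both inputs get repartitioned, here is a minimal sketch of a partitioned hash join in plain Python. It is not Ballista's or DataFusion's implementation; the function names (`hash_partition`, `partitioned_hash_join`), the sample tables, and the partition count are all hypothetical and chosen only for illustration. The point it shows: once both tables are partitioned by a hash of the join key, rows with equal keys land in the same partition number, so each partition can be joined on its own.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Group rows by hash(join key) % num_partitions (the 'shuffle' step)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def partitioned_hash_join(left, right, key, num_partitions=4):
    """Inner join: repartition both inputs, then join partition by partition."""
    left_parts = hash_partition(left, key, num_partitions)
    right_parts = hash_partition(right, key, num_partitions)

    joined = []
    for p in range(num_partitions):
        # Build side: hash table over the left rows of this partition only.
        build = defaultdict(list)
        for left_row in left_parts.get(p, []):
            build[left_row[key]].append(left_row)
        # Probe side: right rows in the same partition are the only
        # possible matches, because equal keys hash to the same partition.
        for right_row in right_parts.get(p, []):
            for left_row in build.get(right_row[key], []):
                joined.append({**left_row, **right_row})
    return joined


# Hypothetical sample data, for illustration only.
customers = [
    {"customer_id": 1, "name": "a"},
    {"customer_id": 2, "name": "b"},
    {"customer_id": 3, "name": "c"},
]
orders = [
    {"customer_id": 1, "order_id": 10},
    {"customer_id": 1, "order_id": 11},
    {"customer_id": 3, "order_id": 12},
    {"customer_id": 4, "order_id": 13},
]

result = partitioned_hash_join(customers, orders, "customer_id")
```

If only one table were repartitioned, matching keys could sit in different partitions and the per-partition joins would miss rows, which is why a stage that scans parquet files and immediately repartitions them can be expected when it feeds a join.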

andygrove added the bug and performance labels Sep 19, 2022

andygrove mentioned this issue Sep 19, 2022

Improve ballista performance #129


andygrove closed this as completed Sep 19, 2022
