opt: improve farms maintenance performance via parallelization #3354

Open

wants to merge 1 commit into main
Conversation

Contributor
@tediou5 tediou5 commented Jan 20, 2025

There's no need for sequential execution here; parallelizing it will improve performance.

It's even simpler than I imagined: just place the tasks in a FuturesUnordered for background execution, with no changes to the other logic.
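
For illustration, a minimal sketch of that pattern, assuming a hypothetical `maintain_farm` function (not the PR's actual code):

```rust
// Hypothetical sketch: push every per-farm maintenance future into a
// FuturesUnordered and drive them concurrently instead of one after another.
use futures::stream::{FuturesUnordered, StreamExt};

async fn maintain_farm(farm_id: u64) {
    // per-farm maintenance work (mostly network I/O)
    println!("maintaining farm {farm_id}");
}

async fn maintain_all(farm_ids: Vec<u64>) {
    let mut tasks = farm_ids
        .into_iter()
        .map(maintain_farm)
        .collect::<FuturesUnordered<_>>();

    // Futures make progress concurrently; completions arrive in whatever
    // order they finish rather than in submission order
    while tasks.next().await.is_some() {}
}
```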

As for #3309, I believe what's needed there is another background task queue to collect the stream requests, which doesn't conflict with this PR, so I've submitted this one directly.

Code contributor checklist:

@tediou5 tediou5 requested a review from nazar-pc as a code owner January 20, 2025 07:27
Member

@teor2345 teor2345 left a comment
This code looks reasonable, but does it actually improve performance in practice?

If two farms are on the same disk (or otherwise share resources, like a bus or cable), writing pieces at the same time will likely slow them down.

Also, lock contention on shared data structures could slow every farm down.

There’s also a slight chance running more code in parallel will deadlock, which would be a bug that needs to be fixed in this PR.

I’m not familiar with this code in detail, or with why it was designed that way, so I’ll leave it to Nazar to work out how significant these risks are.

@tediou5 tediou5 closed this Jan 22, 2025
@tediou5 tediou5 reopened this Jan 22, 2025
Contributor Author

tediou5 commented Jan 22, 2025

@teor2345 Thank you for your review. Here is my understanding; please correct me if I'm wrong.

If two farms are on the same disk (or otherwise share resources, like a bus or cable), writing pieces at the same time will likely slow them down.

This is just the place where the cluster farms are managed. During initialization it only reads; there is no concurrent writing to the farms here. (In practice it is also rare for two farms to be plotted on the same disk, since that doesn't make much sense.)

Also, lock contention on shared data structures could slow every farm down.

Yes, there will indeed be lock contention (this can be optimized by improving lock usage to avoid it; let me optimize this part of the implementation, roughly as sketched below). However, in my understanding the bottleneck here is still network I/O, and the time spent on locks is negligible by comparison.
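
As a hypothetical illustration of that kind of lock-usage optimization (none of these names come from the actual code): copy what is needed out of the shared state while holding the lock, then do the slow network I/O without it.

```rust
// Hypothetical sketch: hold the mutex only long enough to clone a snapshot,
// so other farms are never blocked on the lock during network I/O.
use std::sync::Arc;
use tokio::sync::Mutex;

async fn maintain(shared_farm_ids: Arc<Mutex<Vec<u64>>>) {
    // Guard is a temporary dropped at the end of this statement
    let snapshot = shared_farm_ids.lock().await.clone();

    for farm_id in snapshot {
        // network I/O happens here without holding the lock
        println!("processing farm {farm_id}");
    }
}
```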

There’s also a slight chance running more code in parallel will deadlock, which would be a bug that needs to be fixed in this PR.

Yes, there is a better implementation for this. Let me optimize it.

Member

@nazar-pc nazar-pc left a comment

This is unfortunately not an equivalent change: you simply ignored the explanation provided in the comments and violated the expectations described in it. I should have provided a more elaborate explanation of why it was needed, but it wasn't an accident.

On the networking level it is not guaranteed that farm shutdown will happen strictly after farm initialization has completed on the controller, so you may end up in a situation where a farm is never removed. That is the reason for the sequential processing: to make sure a farm is removed strictly after it is initialized and never before. As briefly mentioned in #3309, it doesn't actually need to be globally sequential, and parallelizing across independent farms will in fact improve performance, but everything related to a specific farm still needs to be sequential. I think this can be implemented with StreamMap, which we already use in other places for similar purposes.
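
A minimal sketch of how that could look (hypothetical `FarmId`/`FarmEvent` types and channel wiring; the actual controller code will differ):

```rust
// Hypothetical sketch: one ordered event stream per farm, multiplexed into a
// single StreamMap keyed by farm id.
use tokio_stream::wrappers::ReceiverStream;
use tokio_stream::{StreamExt, StreamMap};

type FarmId = u64;

enum FarmEvent {
    Initialized,
    Removed,
}

async fn process_farms(mut farms: StreamMap<FarmId, ReceiverStream<FarmEvent>>) {
    // StreamMap polls all per-farm streams fairly, but events from any single
    // farm are yielded in the order they were sent
    while let Some((farm_id, event)) = farms.next().await {
        match event {
            FarmEvent::Initialized => println!("farm {farm_id} initialized"),
            FarmEvent::Removed => println!("farm {farm_id} removed"),
        }
    }
}
```

Because each farm gets its own ordered stream, initialization and removal of a given farm can never be reordered, while events for different farms still interleave.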
