opt: improve farms maintenance performance via parallelization #3354

Open

wants to merge 1 commit into main
Conversation

Contributor
@tediou5 tediou5 commented Jan 20, 2025

There's no need for sequential execution here; parallelizing it will improve performance.

It's even simpler than I imagined: just place the tasks in a FuturesUnordered for background execution, with no changes to the other logic.
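
For illustration, a minimal sketch of that pattern, assuming a hypothetical `maintain_farm` function (not the PR's actual code):

```rust
// Hypothetical sketch: push every per-farm maintenance future into a
// FuturesUnordered and drive them concurrently instead of one after another.
use futures::stream::{FuturesUnordered, StreamExt};

async fn maintain_farm(farm_id: u64) {
    // per-farm maintenance work (mostly network I/O)
    println!("maintaining farm {farm_id}");
}

async fn maintain_all(farm_ids: Vec<u64>) {
    let mut tasks = farm_ids
        .into_iter()
        .map(maintain_farm)
        .collect::<FuturesUnordered<_>>();

    // Futures make progress concurrently; completions arrive in whatever
    // order they finish rather than in submission order
    while tasks.next().await.is_some() {}
}
```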

As for #3309, I believe what's needed there is another background task queue to collect the stream requests, which doesn't conflict with this PR, so I've submitted this one directly.

Code contributor checklist:

@tediou5 tediou5 requested a review from nazar-pc as a code owner January 20, 2025 07:27
Member

@teor2345 teor2345 left a comment
This code looks reasonable, but does it actually improve performance in practice?

If two farms are on the same disk (or otherwise share resources, like a bus or cable), writing pieces at the same time will likely slow them down.

Also, lock contention on shared data structures could slow every farm down.

There’s also a slight chance running more code in parallel will deadlock, which would be a bug that needs to be fixed in this PR.

I’m not familiar with this code in detail, or with why it was designed that way, so I’ll leave it to Nazar to work out how significant these risks are.

@tediou5 tediou5 closed this Jan 22, 2025
@tediou5 tediou5 reopened this Jan 22, 2025
Contributor Author

tediou5 commented Jan 22, 2025

@teor2345 Thank you for your review. Here is my understanding; please correct me if I'm wrong.

If two farms are on the same disk (or otherwise share resources, like a bus or cable), writing pieces at the same time will likely slow them down.

This is just the place where the cluster farms are managed. During initialization it only reads; there is no concurrent writing to the farms here. (In practice it is also rare for two farms to be plotted on the same disk, since that doesn't make much sense.)

Also, lock contention on shared data structures could slow every farm down.

Yes, there will indeed be lock contention (this can be optimized by improving lock usage to avoid it; let me optimize this part of the implementation, roughly as sketched below). However, in my understanding the bottleneck here is still network I/O, and the time spent on locks is negligible by comparison.
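
As a hypothetical illustration of that kind of lock-usage optimization (none of these names come from the actual code): copy what is needed out of the shared state while holding the lock, then do the slow network I/O without it.

```rust
// Hypothetical sketch: hold the mutex only long enough to clone a snapshot,
// so other farms are never blocked on the lock during network I/O.
use std::sync::Arc;
use tokio::sync::Mutex;

async fn maintain(shared_farm_ids: Arc<Mutex<Vec<u64>>>) {
    // Guard is a temporary dropped at the end of this statement
    let snapshot = shared_farm_ids.lock().await.clone();

    for farm_id in snapshot {
        // network I/O happens here without holding the lock
        println!("processing farm {farm_id}");
    }
}
```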

There’s also a slight chance running more code in parallel will deadlock, which would be a bug that needs to be fixed in this PR.

Yes, there is a better implementation for this. Let me optimize it.

Member

@nazar-pc nazar-pc left a comment

This is unfortunately not an equivalent change: you simply ignored the explanation provided in the comments and violated the expectations described in it. I should have provided a more elaborate explanation of why it was needed, but it wasn't an accident.

On the networking level it is not guaranteed that farm shutdown will happen strictly after farm initialization has completed on the controller, so you may end up in a situation where a farm is never removed. That is the reason for the sequential processing: to make sure a farm is removed strictly after it is initialized and never before. As briefly mentioned in #3309, it doesn't actually need to be globally sequential, and parallelizing across independent farms will in fact improve performance, but everything related to a specific farm still needs to be sequential. I think this can be implemented with StreamMap, which we already use in other places for similar purposes.
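
A minimal sketch of how that could look (hypothetical `FarmId`/`FarmEvent` types and channel wiring; the actual controller code will differ):

```rust
// Hypothetical sketch: one ordered event stream per farm, multiplexed into a
// single StreamMap keyed by farm id.
use tokio_stream::wrappers::ReceiverStream;
use tokio_stream::{StreamExt, StreamMap};

type FarmId = u64;

enum FarmEvent {
    Initialized,
    Removed,
}

async fn process_farms(mut farms: StreamMap<FarmId, ReceiverStream<FarmEvent>>) {
    // StreamMap polls all per-farm streams fairly, but events from any single
    // farm are yielded in the order they were sent
    while let Some((farm_id, event)) = farms.next().await {
        match event {
            FarmEvent::Initialized => println!("farm {farm_id} initialized"),
            FarmEvent::Removed => println!("farm {farm_id} removed"),
        }
    }
}
```

Because each farm gets its own ordered stream, initialization and removal of a given farm can never be reordered, while events for different farms still interleave.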
