Fix task pool hanging with nested future::block_on calls #2725
Conversation
Lovely, thanks.
For the commit history, it might be worth copying something similar to the comment into your PR description.
I'm happy to merge this as-is though
A previous instance of TaskPool::scope tried to optimize the case of a single task by immediately calling future::block_on. However, @willcrichton observed an issue where a scope created within an async task (that was already being executed via future::block_on) would hang on the inner task when recursively calling future::block_on. Therefore this feature is disabled until someone decides it's needed and comes up with a fix.
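For reference, the shape of the problematic call pattern is roughly the following (a minimal sketch, not the actual bevy_tasks code; here the inner future is immediately ready, so nothing hangs, while in the real case the inner block_on waited on a task that only the already-blocked outer call could drive):

```rust
use futures_lite::future;

// Minimal sketch of the nested pattern described above. A scope created
// inside the async block used to call future::block_on again via the
// single-task fast path, while the outer block_on already occupied the
// thread, so neither could make progress.
fn nested_block_on() {
    let value = future::block_on(async {
        // Inner block_on nested inside a future already driven by block_on.
        future::block_on(async { 1 + 1 })
    });
    assert_eq!(value, 2);
}
```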
Force-pushed from 46ecce3 to e4706b9.
I've updated the commit to include the problem description.
bors try
bors cancel
bors try-
Sorry, I was trying to see if that would get the right content - I thought bors pulls from the PR description, but it might be different for single-commit PRs. I think putting it in the description and in the PR body is certain to work though.
Done!
I'm not sure of the best way to implement this, but I think the ideal behavior is that we poll the future once. This way we don't fork out to another thread unless the task actually needs to wait on something. Maybe the future::block_on(future::poll_once(_)) we do later in the function would work? I expect in many cases these futures will not need to wait, and bouncing to another core means the L1 and L2 caches will be cold (on typical hardware today, anyway). It sounds easy to repro if the pool size is 1; it might be worth having a test for it.
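A rough sketch of that idea (not the code in this PR; the helper name is made up): poll the future once inline, and only hand it to the pool if it is still pending.

```rust
use futures_lite::future;
use std::future::Future;

// Hypothetical helper illustrating the suggestion above: poll the future a
// single time on the current thread. poll_once resolves to Some(output) if
// the future finished on that first poll, or None if it returned Pending,
// in which case the caller would fall back to spawning it on the task pool.
fn try_finish_inline<T>(fut: impl Future<Output = T>) -> Option<T> {
    future::block_on(future::poll_once(fut))
}
```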
Yeah, that makes sense. We should write some benchmarks to inform decisions like this (e.g. does the cost of polling the future once every time generally pay off?). However, given that this is a bug that needs solving, and the pull request that introduced these lines had benchmarks showing there weren't significant perf wins, I think it's probably best to merge this (provided we can agree it produces correct results). Then we can open another issue to remind us to experiment with polling once / see if it wins us anything.
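Such an experiment could look something like this (a hypothetical criterion benchmark, not something that exists in the repo):

```rust
use bevy_tasks::TaskPool;
use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical benchmark for the question above: what does a single,
// already-ready task cost when pushed through scope()? A poll-once fast
// path would aim to make exactly this case cheap.
fn scope_single_ready_task(c: &mut Criterion) {
    let pool = TaskPool::new();
    c.bench_function("scope_single_ready_task", |b| {
        b.iter(|| {
            pool.scope(|scope| {
                scope.spawn(async { 42u32 });
            })
        })
    });
}

criterion_group!(benches, scope_single_ready_task);
criterion_main!(benches);
```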
I just realized something. Isn't scope() a blocking (i.e. non-async) function? Calling a blocking function from async code may lead to deadlocks and generally shouldn't be done. If this is scope() being called within a task spawned from scope(), the intended way of doing this is to pass the scope object received from the first call to scope() around, and to use only a single scope object at a time.
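That pattern would look roughly like this (a sketch; Scope's exact signature varies between bevy versions):

```rust
use bevy_tasks::{Scope, TaskPool};

// Sketch of the recommended pattern: thread the one Scope object through
// helper functions and spawn onto it, instead of calling pool.scope() again
// from inside spawned tasks.
fn spawn_work<'scope>(scope: &mut Scope<'scope, u32>) {
    for i in 0..4 {
        scope.spawn(async move { i * 2 });
    }
}

fn run(pool: &TaskPool) {
    let results = pool.scope(|scope| {
        spawn_work(scope); // a single scope, no nesting
    });
    assert_eq!(results.len(), 4);
}
```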
I think the reason the PR avoids the deadlock is that, in this particular repro's case, the executors used in the scope happened to be the same executors the caller was running on, and we are ticking those executors in the scope.spawned.len() > 1 case. If the caller of scope() was some other executor that isn't getting ticked, it would block one of that executor's threads and might prevent forward progress, causing a deadlock.

On the surface, this would seem to suggest that nested scope() calls are ok. However, scopes have thread-local executors. If nested scopes don't use the local executors, all the spawned tasks will end up in the same executor, so we won't end up with a deadlock (ignoring the possible bad consequences of blocking the thread that called the outer scope). I'm not confident that would hold if tasks were spawned on both the thread-local and non-thread-local executors. I tried to think of a case where this would break, but I can't come up with a likely scenario where it would. Even so, I still wouldn't recommend nesting scope calls.
Removed from the 0.6 milestone because there's nuance here that I don't want to navigate this late in the game.
@cart it seems like someone with 4 threads is hitting a lock in the gltf loader. Probably this.
This is reproducible for me with:

```rust
app.insert_resource(DefaultTaskPoolOptions::with_num_threads(4));
```

And a dumb workaround:

```rust
let mut opts = DefaultTaskPoolOptions::default();
opts.io.min_threads = 2;
// ...
app.insert_resource(opts);
```
# Objective

- Fix the case mentioned in #2725 (comment).
- On a machine with 4 cores, so 1 thread for assets, loading a gltf with only one texture hangs all asset loading.

## Solution

- Do not use the task pool when there is only one texture to load.

Co-authored-by: François <[email protected]>
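Illustratively, the fix amounts to something like the following (a sketch of the approach, not the actual gltf loader code; decode is a stand-in for real texture decoding):

```rust
use bevy_tasks::TaskPool;

// Stand-in for real texture decoding.
fn decode(bytes: Vec<u8>) -> usize {
    bytes.len()
}

// Sketch of the approach in #3577: with a single texture, decode it inline so
// a one-thread IO pool is never asked to block on itself.
fn load_textures(buffers: Vec<Vec<u8>>, pool: &TaskPool) -> Vec<usize> {
    if buffers.len() == 1 {
        buffers.into_iter().map(decode).collect()
    } else {
        pool.scope(|scope| {
            for bytes in buffers {
                scope.spawn(async move { decode(bytes) });
            }
        })
    }
}
```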
This specific case should now be fixed by not going full multithreaded to load 1 item (#3577).
@hymm, can I get your review on this?
@alice-i-cecile With #4466 this optimization is no longer valid. You might only have one task initially, but that task can spawn more tasks. So the code that is removed here no longer exists in that PR.

There are a couple of problems with the code that is removed in this PR. One is that it never yields, so code that needs to use the local thread is blocked from ever running. The other is that it doesn't drive the local executor. If a single task is spawned on the local executor it will never complete, because the local executor never runs. If we wanted to keep this optimization, we would need to do something more like:

```rust
} else if scope.spawned.len() == 1 {
    vec![future::block_on(async {
        let run_forever = async move {
            loop {
                // tick the local executor
                local_executor.try_tick();
                // yield to allow threads that need this thread to run to make progress
                future::yield_now().await;
            }
        };
        scope.spawned[0].or(run_forever).await
    })]
} else {
```

^^ this would need to be benchmarked to see if it's faster.

My preference would be to focus on #4466 and close this out if that is merged.
Closed by #4466. |