
Fix task pool hanging with nested future::block_on calls #2725

Closed

Conversation

Contributor

@willcrichton commented Aug 25, 2021

A previous instance of TaskPool::scope tried to optimize the case of a single task by immediately calling
future::block_on. However, @willcrichton observed an issue where a scope created within an async task
(that was already being executed via future::block_on) would hang on the inner task when recursively
calling future::block_on. Therefore this feature is disabled until someone decides it's needed and
comes up with a fix.
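
For illustration, the shape that hangs looks roughly like this (a hypothetical repro sketch, not code from this PR; nested_scopes is a made-up function name):

use bevy_tasks::TaskPool;

fn nested_scopes(pool: &TaskPool) {
    // The outer scope blocks this thread via future::block_on until its tasks finish.
    pool.scope(move |outer| {
        outer.spawn(async move {
            // Inner scope created from inside a task that an outer future::block_on
            // is already driving. With the removed single-task fast path, this
            // recursed into future::block_on and could hang on the inner task.
            pool.scope(move |inner| {
                inner.spawn(async { 1 + 1 });
            });
        });
    });
}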

@github-actions bot added the S-Needs-Triage label Aug 25, 2021
@alice-i-cecile added the A-Tasks and C-Bug labels and removed the S-Needs-Triage label Aug 26, 2021
Member

@DJMcNab left a comment


Lovely, thanks.

For the commit history, it might be worth copying something similar to the comment into your PR description.

I'm happy to merge this as-is, though.

@DJMcNab added the S-Ready-For-Final-Review label Aug 27, 2021
@willcrichton
Contributor Author

I've updated the commit to include the problem description.

Member

DJMcNab commented Aug 27, 2021

bors try

bors bot added a commit that referenced this pull request Aug 27, 2021
Member

DJMcNab commented Aug 27, 2021

bors cancel

Member

DJMcNab commented Aug 27, 2021

bors try-

Sorry, I was trying to see if that would get the right content - I thought bors pulls from the PR description, but it might be different for single-commit PRs.

I think putting it in the description and in the PR body is certain to work though.

@willcrichton
Contributor Author

Done!

Member

cart commented Sep 4, 2021

Cool, this makes sense to me. This was introduced in #932, and the benchmarks there indicate that perf wasn't majorly affected by the change. @aclysma introduced the line being removed. Do you have any thoughts?

Contributor

aclysma commented Sep 4, 2021

I'm not sure the best way to implement this, but I think the ideal behavior is that we poll the future once. This way we don't fork out to another thread unless the task actually needs to wait on something. Maybe the future::block_on(future::poll_once(_)) we do later in the function would work?

I expect that in many cases these futures will not need to wait, and bouncing to another core means the L1 and L2 caches will be cold (on typical hardware today, anyway).

It sounds easy to repro if the pool size is 1; it might be worth having a test for it.
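
For illustration, the poll-once suggestion amounts to roughly the following (a hedged sketch, assuming the task future is Unpin, as async_executor tasks are; poll_once_then_block is a hypothetical helper, not code from this PR):

use futures_lite::future;
use std::future::Future;

// Poll the task once on the current thread; only fall back to a blocking wait
// if it actually returns Pending.
fn poll_once_then_block<T>(mut task: impl Future<Output = T> + Unpin) -> T {
    match future::block_on(future::poll_once(&mut task)) {
        // Finished immediately: no thread hop, caches stay warm.
        Some(value) => value,
        // Genuinely needs to wait: block (or hand off to the pool) as before.
        None => future::block_on(task),
    }
}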

Member

cart commented Sep 8, 2021

Yeah, that makes sense. We should write some benchmarks to inform decisions like this (e.g. does the cost of polling the future once every time generally pay off?).

However, given that this is a bug that needs solving and the pull request that introduced these lines had benchmarks showing there weren't significant perf wins, I think it's probably best to merge this (provided we can agree it produces correct results). Then we can open another issue to remind us to experiment with polling once / see if it wins us anything.

Contributor

aclysma commented Sep 12, 2021

I just realized something. Isn't scope() a blocking (i.e. non-async) function? Calling a blocking function from async code may lead to deadlocks and generally shouldn't be done. If this is scope() being called within a task spawned from scope(), the intended way of doing this is to pass the scope object received from the first call to scope() around and to use only a single scope object at a time.
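
For illustration, that pattern looks roughly like this (a hedged sketch assuming the bevy_tasks API of this era, where Scope::spawn takes &mut self; spawn_subtasks and run are hypothetical names):

use bevy_tasks::{Scope, TaskPool};

// Instead of opening a second scope inside a task, take the existing scope
// and spawn onto it.
fn spawn_subtasks<'scope>(scope: &mut Scope<'scope, u32>) {
    scope.spawn(async { 1 });
    scope.spawn(async { 2 });
}

fn run(pool: &TaskPool) -> Vec<u32> {
    pool.scope(|scope| {
        // A single scope object, threaded through to whatever needs to spawn work.
        spawn_subtasks(scope);
    })
}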

Contributor

aclysma commented Sep 12, 2021

I think the reason the PR avoids the deadlock is that in this particular repro, the executors used in the scope happened to be the same executors that the caller was running on, and we are ticking those executors in the scope.spawned.len() > 1 case. If the caller of scope() was some other executor that isn't getting ticked, it would block one of that executor's threads and might prevent forward progress, causing a deadlock.

On the surface, this would seem to suggest that nested scope() calls are ok. However, scopes have thread-local executors. If nested scopes don't use the local executors, all the spawned tasks will end up in the same executor, so we won't end up with a deadlock (ignoring the possible bad consequences of blocking the thread that called the outer scope). I'm not confident that would hold if tasks were spawned on both the thread-local and non-thread-local executors. I tried to think of a case where this would break, but I can't think of a likely scenario where it would. But I still wouldn't recommend nesting scope calls.

@cart cart modified the milestone: Bevy 0.6 Dec 14, 2021
Member

cart commented Dec 18, 2021

Removed from the 0.6 milestone because there's nuance here I don't want to navigate this late in the game.

@superdump
Contributor

@cart it seems like someone with 4 threads is hitting a lock in the gltf loader. Probably this.


Roba1993 commented Jan 6, 2022

@cart After a longer debug session with the folks here, we figured out that the attached low-poly model with 1 texture hangs all asset loading on any machine with 4 cores.

I hope this PR can be merged into 0.6 to get this fixed.

skull.zip

Contributor

rparrett commented Jan 6, 2022

This is reproducible for me with skull.glb above and the load_gltf example with

app.insert_resource(DefaultTaskPoolOptions::with_num_threads(4))

And a dumb workaround:

let mut opts = DefaultTaskPoolOptions::default();
opts.io.min_threads = 2;
// ...
app.insert_resource(opts);

bors bot pushed a commit that referenced this pull request Jan 7, 2022
# Objective

- Fix the case mentioned in #2725 (comment).
- On a machine with 4 cores, and thus only 1 thread for assets, loading a gltf with only one texture hangs all asset loading

## Solution

- Do not use the task pool when there is only one texture to load


Co-authored-by: François <[email protected]>
Member

mockersf commented Jan 7, 2022

the attached low-poly model with 1 texture hangs all asset loading on any machine with 4 cores.

This specific case should now be fixed by not going fully multithreaded to load 1 item (#3577).
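
For reference, the shape of that fix is roughly the following (a hedged sketch with placeholder types and names, not the actual bevy_gltf code):

use bevy_tasks::TaskPool;

// Placeholder for the real per-item (texture) loading work.
fn load_one(input: u32) -> u32 {
    input * 2
}

fn load_all(pool: &TaskPool, inputs: Vec<u32>) -> Vec<u32> {
    if inputs.len() == 1 {
        // A single item: do the work on the calling thread instead of asking a
        // one-thread IO pool to block on itself.
        inputs.into_iter().map(load_one).collect()
    } else {
        // Several items: fan out across the task pool as before.
        pool.scope(|scope| {
            for input in inputs {
                scope.spawn(async move { load_one(input) });
            }
        })
    }
}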

@alice-i-cecile added this to the Bevy 0.6.1 milestone Jan 15, 2022
@alice-i-cecile removed this from the Bevy 0.6.1 milestone Feb 10, 2022
@alice-i-cecile
Member

@hymm, can I get your review on this?

@alice-i-cecile added the X-Controversial label and removed the S-Ready-For-Final-Review label May 2, 2022
Contributor

hymm commented May 2, 2022

@alice-i-cecile With #4466 this optimization is no longer valid. You might only have one task initially, but that task can spawn more tasks. So the code that is removed here no longer exists in that PR.

There are a couple of problems with the code that is removed in this PR. One is that it never yields, so code that needs to use the local thread is blocked from ever running. The other is that it doesn't drive the local executor. If a single task is spawned on the local executor it will never complete, because the local executor never runs. If we wanted to keep this optimization, we would need to do something more like:

} else if scope.spawned.len() == 1 {
    vec![future::block_on(async {
        let run_forever = async move {
            loop {
                // tick the local executor
                local_executor.try_tick();
                // yield to allow threads that need this thread to run to make progress
                future::yield_now().await;
            }
        };
        scope.spawned[0].or(run_forever).await
    })]
} else {

^^ this would need to be benchmarked to see if it's faster

My preference would be to focus on #4466 and close this out if that is merged.

@alice-i-cecile added the S-Needs-Benchmarking label May 18, 2022
@alice-i-cecile
Member

Closed by #4466.

dekirisu pushed a commit to dekirisu/bevy_gltf_trait that referenced this pull request Jul 7, 2024