
Fix task pool hanging with nested future::block_on calls #2725

Closed

Conversation

Contributor

@willcrichton commented Aug 25, 2021

A previous instance of TaskPool::scope tried to optimize the case of a single task by immediately calling
future::block_on. However, @willcrichton observed an issue where a scope created within an async task
(that was already being executed via future::block_on) would hang on the inner task when recursively
calling future::block_on. Therefore this feature is disabled until someone decides it's needed and
comes up with a fix.
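
For illustration, the shape that hangs looks roughly like this (a hypothetical repro sketch, not code from this PR; nested_scopes is a made-up function name):

use bevy_tasks::TaskPool;

fn nested_scopes(pool: &TaskPool) {
    // The outer scope blocks this thread via future::block_on until its tasks finish.
    pool.scope(move |outer| {
        outer.spawn(async move {
            // Inner scope created from inside a task that an outer future::block_on
            // is already driving. With the removed single-task fast path, this
            // recursed into future::block_on and could hang on the inner task.
            pool.scope(move |inner| {
                inner.spawn(async { 1 + 1 });
            });
        });
    });
}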

@github-actions bot added the S-Needs-Triage label Aug 25, 2021
@alice-i-cecile added the A-Tasks and C-Bug labels and removed the S-Needs-Triage label Aug 26, 2021
Member

@DJMcNab left a comment


Lovely, thanks.

For the commit history, it might be worth copying something similar to the comment into your PR description.

I'm happy to merge this as-is, though.

@DJMcNab added the S-Ready-For-Final-Review label Aug 27, 2021
@willcrichton
Contributor Author

I've updated the commit to include the problem description.

Member

DJMcNab commented Aug 27, 2021

bors try

bors bot added a commit that referenced this pull request Aug 27, 2021
Member

DJMcNab commented Aug 27, 2021

bors cancel

Member

DJMcNab commented Aug 27, 2021

bors try-

Sorry, I was trying to see if that would get the right content - I thought bors pulls from the PR description, but it might be different for single-commit PRs.

I think putting it in the description and in the PR body is certain to work though.

@willcrichton
Contributor Author

Done!

Member

cart commented Sep 4, 2021

Cool, this makes sense to me. This was introduced in #932, and the benchmarks there indicate that perf wasn't majorly affected by the change. @aclysma introduced the line being removed. Do you have any thoughts?

Contributor

aclysma commented Sep 4, 2021

I'm not sure the best way to implement this, but I think the ideal behavior is that we poll the future once. This way we don't fork out to another thread unless the task actually needs to wait on something. Maybe the future::block_on(future::poll_once(_)) we do later in the function would work?

I expect that in many cases these futures will not need to wait, and bouncing to another core means the L1 and L2 caches will be cold (on typical hardware today, anyway).

It sounds easy to repro if the pool size is 1; it might be worth having a test for it.
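
For illustration, the poll-once suggestion amounts to roughly the following (a hedged sketch, assuming the task future is Unpin, as async_executor tasks are; poll_once_then_block is a hypothetical helper, not code from this PR):

use futures_lite::future;
use std::future::Future;

// Poll the task once on the current thread; only fall back to a blocking wait
// if it actually returns Pending.
fn poll_once_then_block<T>(mut task: impl Future<Output = T> + Unpin) -> T {
    match future::block_on(future::poll_once(&mut task)) {
        // Finished immediately: no thread hop, caches stay warm.
        Some(value) => value,
        // Genuinely needs to wait: block (or hand off to the pool) as before.
        None => future::block_on(task),
    }
}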

Member

cart commented Sep 8, 2021

Yeah, that makes sense. We should write some benchmarks to inform decisions like this (e.g. does the cost of polling the future once every time generally pay off?).

However, given that this is a bug that needs solving and the pull request that introduced these lines had benchmarks showing there weren't significant perf wins, I think it's probably best to merge this (provided we can agree it produces correct results). Then we can open another issue to remind us to experiment with polling once / see if it wins us anything.

Contributor

aclysma commented Sep 12, 2021

I just realized something. Isn't scope() a blocking (i.e. non-async) function? Calling a blocking function from async code may lead to deadlocks and generally shouldn't be done. If this is scope() being called within a task spawned from scope(), the intended way of doing this is to pass the scope object received from the first call to scope() around and to use only a single scope object at a time.
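
For illustration, that pattern looks roughly like this (a hedged sketch assuming the bevy_tasks API of this era, where Scope::spawn takes &mut self; spawn_subtasks and run are hypothetical names):

use bevy_tasks::{Scope, TaskPool};

// Instead of opening a second scope inside a task, take the existing scope
// and spawn onto it.
fn spawn_subtasks<'scope>(scope: &mut Scope<'scope, u32>) {
    scope.spawn(async { 1 });
    scope.spawn(async { 2 });
}

fn run(pool: &TaskPool) -> Vec<u32> {
    pool.scope(|scope| {
        // A single scope object, threaded through to whatever needs to spawn work.
        spawn_subtasks(scope);
    })
}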

Contributor

aclysma commented Sep 12, 2021

I think the reason the PR avoids the deadlock is that in this particular repro, the executors used in the scope happened to be the same executors that the caller was running on, and we are ticking those executors in the scope.spawned.len() > 1 case. If the caller of scope() was some other executor that isn't getting ticked, it would block one of that executor's threads and might prevent forward progress, causing a deadlock.

On the surface, this would seem to suggest that nested scope() calls are ok. However, scopes have thread-local executors. If nested scopes don't use the local executors, all the spawned tasks will end up in the same executor, so we won't end up with a deadlock (ignoring the possible bad consequences of blocking the thread that called the outer scope). I'm not confident that would hold if tasks were spawned on both the thread-local and non-thread-local executors. I tried to think of a case where this would break, but I can't think of a likely scenario where it would. But I still wouldn't recommend nesting scope calls.

@cart cart modified the milestone: Bevy 0.6 Dec 14, 2021
Member

cart commented Dec 18, 2021

Removed from the 0.6 milestone because there's nuance here I don't want to navigate this late in the game.

@superdump
Contributor

@cart it seems like someone with 4 threads is hitting a lock in the gltf loader. Probably this.


Roba1993 commented Jan 6, 2022

@cart After a longer debug session with the folks here, we figured out that the attached low-poly model with 1 texture hangs all asset loading on any machine with 4 cores.

I hope this PR can be merged into 0.6 to get this fixed.

skull.zip

Contributor

rparrett commented Jan 6, 2022

This is reproducible for me with skull.glb above and the load_gltf example with

app.insert_resource(DefaultTaskPoolOptions::with_num_threads(4))

And a dumb workaround:

let mut opts = DefaultTaskPoolOptions::default();
opts.io.min_threads = 2;
// ...
app.insert_resource(opts);

bors bot pushed a commit that referenced this pull request Jan 7, 2022
# Objective

- Fix the case mentioned in #2725 (comment).
- On a machine with 4 cores, and thus only 1 thread for assets, loading a gltf with only one texture hangs all asset loading

## Solution

- Do not use the task pool when there is only one texture to load


Co-authored-by: François <[email protected]>
Member

mockersf commented Jan 7, 2022

the attached low-poly model with 1 texture hangs all asset loading on any machine with 4 cores.

This specific case should now be fixed by not going fully multithreaded to load 1 item (#3577).
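
For reference, the shape of that fix is roughly the following (a hedged sketch with placeholder types and names, not the actual bevy_gltf code):

use bevy_tasks::TaskPool;

// Placeholder for the real per-item (texture) loading work.
fn load_one(input: u32) -> u32 {
    input * 2
}

fn load_all(pool: &TaskPool, inputs: Vec<u32>) -> Vec<u32> {
    if inputs.len() == 1 {
        // A single item: do the work on the calling thread instead of asking a
        // one-thread IO pool to block on itself.
        inputs.into_iter().map(load_one).collect()
    } else {
        // Several items: fan out across the task pool as before.
        pool.scope(|scope| {
            for input in inputs {
                scope.spawn(async move { load_one(input) });
            }
        })
    }
}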

@alice-i-cecile added this to the Bevy 0.6.1 milestone Jan 15, 2022
@alice-i-cecile removed this from the Bevy 0.6.1 milestone Feb 10, 2022
@alice-i-cecile
Member

@hymm, can I get your review on this?

@alice-i-cecile added the X-Controversial label and removed the S-Ready-For-Final-Review label May 2, 2022
Contributor

hymm commented May 2, 2022

@alice-i-cecile With #4466 this optimization is no longer valid. You might only have one task initially, but that task can spawn more tasks. So the code that is removed here no longer exists in that PR.

There are a couple of problems with the code that is removed in this PR. One is that it never yields, so code that needs to use the local thread is blocked from ever running. The other is that it doesn't drive the local executor. If a single task is spawned on the local executor it will never complete, because the local executor never runs. If we wanted to keep this optimization, we would need to do something more like:

} else if scope.spawned.len() == 1 {
    vec![future::block_on(async {
        let run_forever = async move {
            loop {
                // tick the local executor
                local_executor.try_tick();
                // yield to allow threads that need this thread to run to make progress
                future::yield_now().await;
            }
        };
        scope.spawned[0].or(run_forever).await
    })]
} else {

^^ this would need to be benchmarked to see if it's faster

My preference would be to focus on #4466 and close this out if that is merged.

@alice-i-cecile added the S-Needs-Benchmarking label May 18, 2022
@alice-i-cecile
Member

Closed by #4466.

dekirisu pushed a commit to dekirisu/bevy_gltf_trait that referenced this pull request Jul 7, 2024