Livelock in poll_proceed when using a futures 0.1 FuturesUnordered on Tokio 0.2.16 #2390
Comments
Are you using old futures because it has FuturesUnordered? It's also available in futures 0.3.
Thanks for replying!
Yep, we're indeed aware that FuturesUnordered is also in futures 0.3. We simply still have some older code that uses futures 0.1 and is being driven by a Tokio 0.2 executor. The repro I shared is definitely a contrived, reduced example (the actual callsite for this is here: https://github.com/facebookexperimental/eden/blob/master/eden/mononoke/pushrebase/src/lib.rs#L281, and goes through layers before finally getting to FuturesUnordered).
Here you go: Line 225 in 58ba45a
The issue is almost certainly the new preemption feature (blog post), which makes IO resources return Pending (after scheduling a wakeup for the current task) once the task has used up its polling budget. This means that since the wakeup is emitted to the FuturesUnordered task, which re-polls the woken future in its own loop without returning control to the executor, the budget is never replenished and the resource keeps returning Pending forever.
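To make that interaction concrete, here is a minimal, self-contained sketch of the two halves. This is not the actual tokio or futures source: the names `Flag`, `task_waker`, and `spins` are illustrative, and futures 0.3 is used only for its `ArcWake` helper. One half is a coop-style resource that schedules a wakeup and returns `Pending` when it is out of budget; the other is a `FuturesUnordered`-style loop that re-polls the woken child without returning to the executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll};

use futures::future::poll_fn; // futures 0.3, used only for its helpers here
use futures::task::{waker, ArcWake};

// Records "a child asked to be polled again", mimicking how the futures 0.1
// FuturesUnordered tracks readiness internally instead of handing the wakeup
// back to the executor.
struct Flag(AtomicBool);

impl ArcWake for Flag {
    fn wake_by_ref(arc_self: &Arc<Self>) {
        arc_self.0.store(true, Ordering::SeqCst);
    }
}

fn main() {
    // Stand-in for a coop-limited IO resource: its budget is exhausted, so it
    // schedules a wakeup for the current task and returns Pending, expecting
    // the task to yield back to the runtime (which would replenish the budget).
    let mut resource = poll_fn(|cx: &mut Context<'_>| -> Poll<()> {
        cx.waker().wake_by_ref();
        Poll::Pending
    });

    let flag = Arc::new(Flag(AtomicBool::new(false)));
    let task_waker = waker(flag.clone());
    let mut cx = Context::from_waker(&task_waker);

    // Stand-in for the old FuturesUnordered poll loop: as long as a child was
    // woken, poll it again *without* returning to the executor. The budget is
    // never replenished, so this spins forever (intentionally, to show the
    // livelock).
    let mut spins: u64 = 0;
    loop {
        if Pin::new(&mut resource).poll(&mut cx).is_ready() {
            break;
        }
        if !flag.0.swap(false, Ordering::SeqCst) {
            // A "real" Pending: here control would go back to the executor.
            break;
        }
        spins += 1;
        if spins % 10_000_000 == 0 {
            println!("still spinning after {} polls", spins);
        }
    }
}
```

The fix discussed below (rust-lang/futures-rs#2049 and its 0.1 backport) addresses the second half: it caps how many times FuturesUnordered loops like this in a single poll before waking itself and returning, so control goes back to the executor.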
Yeah, this is basically rust-lang/futures-rs#2047, which was fixed in rust-lang/futures-rs#2049 for the 0.3 branch.
@jonhoo thanks — sounds like I should submit this backport then. That said, just to make sure I'm following this properly: this means my FuturesUnordered will poll the underlying future 32 times "unsuccessfully" before actually yielding and polling it successfully, right?
This backports rust-lang/futures-rs#2049 to the 0.1 branch. Without this change, polling > 200 futures through a FuturesUnordered on a Tokio 0.2 executor results in a busy loop in Tokio's cooperative scheduling module. See tokio-rs/tokio#2390 for a repro of where this breaks. Tested by running the reproducer I submitted there. Without this change, it hangs forever (spinning on CPU). With the change, it doesn't.
I submitted rust-lang/futures-rs#2122 to backport the fix. Out of curiosity, is this new cooperative scheduling mechanism configurable? (i.e. can it be turned on or off on a per-runtime basis?) Looking at the code, it looks like it isn't, but I'd love to make sure: we were somewhat lucky that we had a test that happened to catch this particular bug, but it'd be nice to be able to canary this individual mechanism in order to make sure we don't have further instances of it lying around in the codebase 😄
@krallin Yes, your observation is (sadly) correct.
The tricky part of this is how we expose that functionality without requiring the implementor to depend on Tokio directly. There isn't currently a way to turn off coop.
Summary: This is needed because the tonic crate (see the diff stack) relies on tokio ^0.2.13. We can't go to a newer version because a bug that affects mononoke was introduced in 0.2.14 (discussion started on T65261126). The issue was reported upstream: tokio-rs/tokio#2390. This diff simply changed the version number in `fbsource/third-party/rust/Cargo.toml` and ran `fbsource/third-party/rust/reindeer/vendor`. Also ran `buck run //common/rust/cargo_from_buck:cargo_from_buck` to fix the tokio version in the generated cargo files.

Reviewed By: krallin

Differential Revision: D21043344

fbshipit-source-id: e61797317a581aa87a8a54e9e2ae22655f22fb97
+1 for being able to turn off coop. I know some people are going to have different opinions on this, but I explicitly don't want fairness in tasks. (For context, my usage is loading assets for games, and possibly in the future executing jobs that may wait on locks, GPU fences, or each other)
I would probably still be of this opinion even if I were writing something more "vanilla" like a web service, but I do understand not everyone is going to share my opinion on this! And long term, maybe my usage isn't precisely what tokio is designed for. But maybe it's not too difficult to allow disabling it (or to allow setting the heuristics to an impossibly high value, effectively disabling it).
My current workaround is to use tokio 0.2.13. Otherwise my application hangs consistently with even a relatively small number of in-flight asset loads.
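For anyone using the same workaround, a minimal Cargo.toml pin would look roughly like the following; the exact `=` requirement is what keeps `cargo update` from pulling in 0.2.14 or later, and the feature list is just an assumption for illustration:

```toml
# Last release before the cooperative scheduling change described above.
tokio = { version = "=0.2.13", features = ["full"] }
```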
Fixed in futures 0.1.30
Version
0.2.16
Platform
Linux, OSX.
Subcrates
sync, coop
Description
We are seeing a livelock in `poll_proceed` when executing a futures 0.1 stream built using `poll_fn` and `FuturesUnordered`, where the futures driven by `FuturesUnordered` call `Semaphore::acquire`. I have implemented a repro here: https://github.com/krallin/repro-tokio-livelock. To repro, just `cargo run`; it'll hang forever. The code is in that repository.
The livelock is in `poll_proceed`, in Tokio's coop module.
We'd love some help understanding and troubleshooting this issue. Thanks! Let us know if there is anything we can do to help root-cause it further.