Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(spsc_fold): potentially missing wakeup when send()ing in state SenderWaitsForReceiverToConsume #10318

Merged
merged 1 commit into from
Jan 10, 2025

Conversation

problame
Copy link
Contributor

@problame problame commented Jan 9, 2025

Problem

Before this PR, there were cases where send() in state
SenderWaitsForReceiverToConsume would never be woken up
by the receiver, because it never registered with wake_sender.

Example Scenario 1: we stop polling a send() future A that was waiting
for the receiver to consume. We drop A and create a new send() future B.
B would return Poll::Pending and never regsister a waker.

Example Scenario 2: a send() future A transitions from HasData
to SenderWaitsForReceiverToConsume. This registers the context X
with wake_sender. But before the Receiver consumes the data,
we poll A from a different context Y.
The state is still SenderWaitsForReceiverToConsume, but we wouldn't
register the new context with wake_sender.
When the Receiver comes around to consume and wake_sender.notify()s,
it wakes the old context X instead of Y.

Fix

Register the waker in the case where we're polled in
state SenderWaitsForReceiverToConsume.

Relation to #10309

I found this bug while investigating #10309.
There was never proof that this bug here is the root cause for #10309.
In the meantime we found a more probably hypothesis
for the root cause than what is being fixed here.
Regardless, let's walk through my thought process about
how it might have been relevant:

There (in page_service), Scenario 1 does not apply because
we poll the send() future to completion.

Scenario 2 (tokio::join!) also does not apply with the
current tokio::join!() impl, because it will just poll each
future every time, each with the same context.
Although if we ever used something like a FuturesUnordered anywhere,
that will be using a different context, so, in that case,
the bug might materialize.

Regarding tokio & spurious poll in general:
@conradludgate is not aware of any spurious wakeup cases in current tokio,
but within a tokio::join!(), any wake meant for one future will poll all
the futures, so that can appear as a spurious wake up to the N-1 futures
of the tokio::join!().

…enderWaitsForReceiverToConsume`

Before this PR, there were cases where send() in state
SenderWaitsForReceiverToConsume would never be woken up
by the receiver, because it never registered with `wake_sender`.

Example Scenario 1: we stop polling a send() future A that was waiting
for the receiver to consume. We drop A and create a new send() future B.
B would return Poll::Pending and never regsister a waker.

Example Scenario 2: a send() future A transitions from HasData
to SenderWaitsForReceiverToConsume. This registers the context X
with `wake_sender`. But before the Receiver consumes the data,
we poll A from a different context Y.
The state is still SenderWaitsForReceiverToConsume, but we wouldn't
register the new context with `wake_sender`.
When the Receiver comes around to consume and `wake_sender.notify()`s,
it wakes the old context X instead of Y.

I found this bug while investigating #10309.
There (in page_service), Scenario 1 does not apply because
we poll the send() future to completion.
I am not certain about Scenario 2; we use tokio::join!.
Need to investigate whether it (or tokio, generally), can
decide to poll, possibly spuriously, without an explicit
wakeup _AND_ a changed context.
Copy link

github-actions bot commented Jan 9, 2025

7286 tests run: 6921 passed, 0 failed, 365 skipped (full report)


Flaky tests (3)

Postgres 17

Postgres 14

Code coverage* (full report)

  • functions: 31.1% (8410 of 27000 functions)
  • lines: 47.8% (66791 of 139616 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
ec900f9 at 2025-01-10T11:16:46.845Z :recycle:

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't have much context on spsc_fold nor tokio::join! internals, but the change makes sense to me.

@conradludgate
Copy link
Contributor

One thing to note: https://docs.rs/diatomic-waker/latest/diatomic_waker/struct.DiatomicWaker.html#method.notify

This automatically unregisters any waker that may have been previously registered.

As for the other questions,

  1. tokio::join!() will just poll each future every time, each with the same context.

  2. A tokio task will currently never change the context (the context is currently a pointer to the task). Although if you ever use something like a FuturesUnordered anywhere, that will replace the context.

  3. I don't think there's currently any spurious wakeup cases, but within a join!() it will poll both futures, so that can appear as a spurious wake up for batcher if the executor task is woken.

@problame problame enabled auto-merge January 10, 2025 10:41
@problame problame added this pull request to the merge queue Jan 10, 2025
Merged via the queue into main with commit db00eb4 Jan 10, 2025
86 checks passed
@problame problame deleted the problame/hung-shutdown/spsc-fold-spurious-wakeup branch January 10, 2025 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants