blog: reducing tail latencies with auto yielding #422

Merged · 18 commits · Apr 1, 2020

183 changes: 183 additions & 0 deletions content/blog/2020-04-preemption.md
@@ -0,0 +1,183 @@
+++
date = "2020-03-31"
title = "Reducing tail latencies with automatic cooperative task yielding"
description = "April 1, 2020"
menu = "blog"
weight = 982
+++

Tokio is a runtime for asynchronous Rust applications. It allows writing code
using `async` & `await` syntax. For example:

```rust
let mut listener = TcpListener::bind(&addr).await?;

loop {
    let (mut socket, _) = listener.accept().await?;

    tokio::spawn(async move {
        // handle socket
    });
}
```

The Rust compiler transforms this code into a state machine. The Tokio runtime
executes these state machines, multiplexing many tasks on a handful of threads.
In order to multiplex tasks, Tokio's scheduler requires that the generated task
state machine yields control back to the scheduler. Each `.await` call is an
opportunity to yield back to the scheduler. In the above example,
`listener.accept().await` will return a socket if one is pending. If there are
no pending sockets, control is yielded back to the scheduler.

This system works well in most cases. However, when a system comes under load,
it is possible for an asynchronous resource to never be "not ready". For
example, consider an echo server:

```rust
tokio::spawn(async move {
    let mut buf = [0; 1024];

    loop {
        let n = socket.read(&mut buf).await?;

        if n == 0 {
            break;
        }

        // Write the data back
        socket.write_all(&buf[..n]).await?;
    }
});
```

If data is received faster than it can be processed, it is possible that, by the
time the processing of a data chunk completes, more data has already been
received. In this case, `.await` will never yield control back to the scheduler,
and other tasks will not be scheduled, resulting in starvation and large latency
variance.

Currently, the answer to this problem is that the user of Tokio is responsible
for adding yield points (e.g., `tokio::task::yield_now()`) every so often. In
practice, very few actually do this, so most applications end up being
vulnerable to this sort of problem.
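
For illustration, here is one way the echo loop above could insert a manual
yield point with `tokio::task::yield_now()`. This is only a sketch: the `echo`
function shape and the interval of 64 iterations are arbitrary choices for the
example, not a recommendation.

```rust
use std::io;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;
use tokio::task;

// Echo loop with a manually inserted yield point.
async fn echo(mut socket: TcpStream) -> io::Result<()> {
    let mut buf = [0u8; 1024];
    let mut iterations: usize = 0;

    loop {
        let n = socket.read(&mut buf).await?;

        if n == 0 {
            return Ok(());
        }

        socket.write_all(&buf[..n]).await?;

        // Every 64 iterations, yield back to the scheduler even if the
        // socket is still ready, so other tasks get a chance to run.
        iterations += 1;
        if iterations % 64 == 0 {
            task::yield_now().await;
        }
    }
}
```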

A common solution to this problem is preemption. With normal OS threads, the
kernel will interrupt thread execution every so often in order to ensure fair
scheduling of all threads. Runtimes that have full control over execution
(golang, erlang, ...) will also use preemption to ensure fair scheduling of
tasks. This is accomplished by injecting yield points when compiling the code;
these check whether the task has been executing for long enough and, if so,
yield back to the scheduler. Unfortunately, Tokio is not able to use this
technique, as `async` Rust does not inject any sort of yield point.
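
Conceptually, a runtime-injected yield point boils down to a check like the
following, inserted at points such as function entries or loop back-edges. This
is only a rough sketch, not actual Go, Erlang, or Tokio code; the `Task` and
`Scheduler` types here are hypothetical.

```rust
use std::time::{Duration, Instant};

// Hypothetical types, for illustration only.
struct Task {
    running_since: Instant,
}

struct Scheduler {
    time_slice: Duration,
}

impl Scheduler {
    fn yield_current_task(&self) {
        // A real runtime would save the task's state here and switch
        // back to the scheduler loop.
    }
}

// What a compiler-injected yield point conceptually checks: has the current
// task exceeded its time slice? If so, hand control back to the scheduler.
fn injected_yield_point(task: &Task, scheduler: &Scheduler) {
    if task.running_since.elapsed() > scheduler.time_slice {
        scheduler.yield_current_task();
    }
}
```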

## Per-task operation budget

Even though Tokio is not able to **preempt**, there is still an opportunity to
nudge a task to yield back to the scheduler. As of [0.2.14], each Tokio task has
an operation budget. This budget is reset when the scheduler switches to the
task. Each Tokio resource (socket, timer, channel, ...) is now aware of this
budget. As long as the task has budget remaining, the resource operates as it
did previously. Each asynchronous operation on a Tokio resource (actions that
users must `.await` on) decrements the task's budget. Once the task is out of
budget, all Tokio resources will become "not ready" until the task yields back
to the scheduler, at which point the budget is reset.

Going back to the echo server example from above: when the task is scheduled,
it is assigned a budget of 128 operations. When `socket.read(..)` and
`socket.write(..)` are called, the budget is decremented. If the budget is zero,
the task yields back to the scheduler. If either `read` or `write` cannot
proceed due to the underlying socket not being ready (no pending data or a full
send buffer), then the task also yields back to the scheduler.
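
To make the mechanism concrete, here is a minimal sketch of how a per-task
budget could be tracked. The names, the thread-local storage, and the budget of
128 are illustrative assumptions for this sketch, not Tokio's actual internals.

```rust
use std::cell::Cell;
use std::task::Poll;

thread_local! {
    // Budget for the task currently being polled on this worker thread.
    static BUDGET: Cell<u8> = Cell::new(128);
}

// Called by the scheduler right before it polls a task.
fn reset_budget() {
    BUDGET.with(|budget| budget.set(128));
}

// Called by a resource before it reports itself as ready. Once the budget is
// exhausted, the resource pretends to be "not ready", which forces the task
// to yield back to the scheduler.
fn poll_proceed() -> Poll<()> {
    BUDGET.with(|budget| {
        let remaining = budget.get();
        if remaining == 0 {
            Poll::Pending
        } else {
            budget.set(remaining - 1);
            Poll::Ready(())
        }
    })
}
```

A real implementation also has to ensure the task is woken again after being
forced to return pending, so the forced yield does not turn into a hang.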

The idea originated from a conversation I was having with [Ryan Dahl][ry]. He is
using Tokio as the underlying runtime for [Deno][deno]. When doing some HTTP
experimentation (a while back) with [Hyper], he was seeing some high tail
latencies in some benchmarks. The problem was due to a loop not yielding back to
the scheduler under load. Hyper ended up fixing the problem by hand in this one
case, but Ryan mentioned that, when he worked on [node.js][node], they handled
the problem by adding **per resource** limits. So, if a TCP socket was always
ready, it would force a yield every so often. I mentioned this conversation to
[Jon Gjengset][jonhoo], and he ended up coming up with the idea of placing the
limit on the task itself instead of on each resource.

The end result is that Tokio should be able to provide more consistent runtime
behavior under load. While the exact heuristics will most likely be tweaked over
time, initial measurements show that, in some cases, tail latencies are reduced
almost 3x.

[![benchmark](https://user-images.githubusercontent.com/176295/73222456-4a103300-4131-11ea-9131-4e437ecb9a04.png)](https://user-images.githubusercontent.com/176295/73222456-4a103300-4131-11ea-9131-4e437ecb9a04.png)

"master" is before the automatic yielding and "preempt" is after. Click for a
bigger version, also see the original [PR comment][pr] for more details.

## A note on blocking

While automatic cooperative task yielding improves many cases, it cannot preempt
tasks: a task only yields when it performs an asynchronous operation, so
CPU-intensive or blocking code that never reaches an `.await` will still
monopolize a worker thread. Users of Tokio must still take care to avoid
CPU-intensive work and blocking APIs. The [`spawn_blocking`][spawn_blocking]
function can be used to "asyncify" these sorts of tasks by running them on a
thread pool.
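
As a minimal sketch, offloading to the blocking pool looks like this; the
checksum computation is just a stand-in for any expensive synchronous work.

```rust
use tokio::task;

// Run an expensive, synchronous computation on the blocking thread pool and
// await its result without tying up the async scheduler threads.
async fn checksum(data: Vec<u8>) -> u64 {
    task::spawn_blocking(move || {
        // Stand-in for CPU-heavy or blocking work.
        data.iter().map(|&b| b as u64).sum::<u64>()
    })
    .await
    .expect("blocking task panicked")
}
```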

Tokio does not, and will not, attempt to detect blocking tasks and automatically
compensate by adding threads to the scheduler. This question has come up a
number of times in the past, so allow me to elaborate.

For context, the idea is for the scheduler to include a monitoring thread. This
thread would poll scheduler threads every so often and check that workers are
making progress. If a worker is not making progress, it is assumed that the
worker is executing a blocking task and a new thread should be spawned to
compensate.

This idea is not new. The first occurrence of this strategy that I am aware of
is in the .NET thread pool, and it was introduced more than ten years ago.
Unfortunately, the strategy has a number of problems, and because of this, it
has not been featured in other thread pools / schedulers (golang, java, erlang,
...).

The first problem is that it is very hard to define "progress". A naive
definition of progress is whether or not a task has been scheduled for over some
unit of time. For example, if a worker has been stuck scheduling the same task
for more than 100ms, then that worker is flagged as blocked and a new thread is
spawned. With this definition, how does one detect scenarios where spawning a
new thread **reduces** throughput? This can happen when the scheduler is
generally under load and adding threads would make the situation much worse. To
combat this, the .NET thread pool uses [hill climbing][hill], periodically
adjusting the number of threads and measuring the impact on throughput.
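
To make the difficulty concrete, here is a sketch of the naive heuristic
described above. The structure and the 100ms interval are assumptions for
illustration, not any real scheduler's implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::Duration;

// Each worker increments its counter every time it switches to a new task.
struct WorkerState {
    tasks_polled: AtomicU64,
}

// Monitoring thread: a worker whose counter has not moved for ~100ms is
// presumed to be blocked on a single task.
fn monitor(workers: &[WorkerState]) {
    let mut last_seen: Vec<u64> = workers
        .iter()
        .map(|w| w.tasks_polled.load(Ordering::Relaxed))
        .collect();

    loop {
        thread::sleep(Duration::from_millis(100));

        for (i, worker) in workers.iter().enumerate() {
            let current = worker.tasks_polled.load(Ordering::Relaxed);
            if current == last_seen[i] {
                // Presumed blocked: a real implementation would spawn a
                // replacement thread here, which is exactly the step that can
                // reduce throughput when the pool is simply under load.
            }
            last_seen[i] = current;
        }
    }
}
```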

The second problem is that any automatic detection strategy will be vulnerable
to bursty or otherwise uneven workloads. This specific problem has been the bane
of the .NET thread pool and is known as the "stuttering" problem. The hill
climbing strategy requires some period of time (hundreds of milliseconds) to
adapt to load changes. This time period is needed, in part, to be able to
determine that adding threads is improving the situation and not making it
worse.

The stuttering problem can be managed with the .NET thread pool, in part,
because the pool is designed to schedule **coarse** tasks, i.e. tasks that
execute in the order of hundreds of milliseconds to multiple seconds. However,
asynchronous task schedulers are designed to schedule tasks that should run in
the order of microseconds to tens of milliseconds at most. In this case, any
stuttering problem from a heuristic-based scheduler will result in far greater
latency variations.

The most common follow-up question I've received after this is "doesn't the Go
scheduler automatically detect blocked tasks?". The short answer is: no. Doing
so would result in the same stuttering problems as mentioned above. Also, Go has
no need for generalized blocked-task detection because Go is able to preempt.
What the Go scheduler **does** do is annotate potentially blocking system calls.
This is roughly equivalent to the Tokio APIs [`spawn_blocking`][spawn_blocking]
and [`block_in_place`][block_in_place].
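
For completeness, here is a sketch of `block_in_place`; the file path is just a
placeholder. Unlike `spawn_blocking`, it runs the closure in place on the
current worker thread while the runtime shifts other queued work to different
threads, and it requires the threaded scheduler.

```rust
use tokio::task;

// Run blocking work inline on the current worker thread; the runtime moves
// other tasks queued on this worker to different threads in the meantime.
async fn read_config() -> String {
    task::block_in_place(|| {
        // Synchronous, blocking I/O is acceptable inside this closure.
        std::fs::read_to_string("config.toml").unwrap_or_default()
    })
}
```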

In short, as of now, the automatic cooperative task yielding strategy that has
just been introduced is the best we have found for reducing tail latencies.

<div style="text-align:right">&mdash;Carl Lerche</div>


[0.2.14]: #
[ry]: https://github.com/ry
[deno]: https://github.com/denoland/deno
[Hyper]: https://github.com/hyperium/hyper/
[node]: https://nodejs.org
[jonhoo]: https://github.com/jonhoo/
[pr]: https://github.com/tokio-rs/tokio/pull/2160#issuecomment-579004856
[spawn_blocking]: https://docs.rs/tokio/0.2/tokio/task/fn.spawn_blocking.html
[block_in_place]: https://docs.rs/tokio/0.2/tokio/task/fn.block_in_place.html
[hill]: https://en.wikipedia.org/wiki/Hill_climbing
2 changes: 1 addition & 1 deletion layouts/partials/header.html
@@ -51,7 +51,7 @@
</li>
<li class="nav-item">
{{ $isCommunity := hasPrefix .URL "/community" }}
<a class="nav-link {{ if $isCommunity }} active {{ end }}" href="{{ ref . "/community.md" }}">Community</a>
<a class="nav-link {{ if $isCommunity }} active {{ end }}" href="{{ ref . "/community" }}">Community</a>
</li>
<li class="nav-item">
{{ $blog := index (.Site.Menus.blog) 0 }}