
Pooling allocator: add a reuse-affinity policy. #3738

Merged
merged 14 commits
Feb 2, 2022

Conversation

cfallin
Member

@cfallin cfallin commented Jan 28, 2022

(Builds on #3697.)

This policy attempts to reuse the same instance slot for subsequent
instantiations of the same module. This is particularly useful when
using a pooling backend such as memfd that benefits from this reuse: for
example, in the memfd case, instantiating the same module into the same
slot allows us to avoid several calls to mmap() because the same
mappings can be reused.

The policy tracks a freelist per "compiled module ID", and when
allocating a slot for an instance, tries these two options in order:

  1. A slot from the freelist for this module (i.e., last used for another
    instantiation of this particular module), or
  2. A slot that was last used by some other module or never before.

The "victim" slot for choice 2 is randomly chosen.

The data structures are carefully designed so that all updates are O(1),
and there is no retry loop in any of the random selections.

This policy is now the default when the memfd backend is selected via
the memfd-allocator feature flag.

@cfallin cfallin requested a review from alexcrichton January 28, 2022 08:04
@cfallin
Member Author

cfallin commented Jan 28, 2022

After writing this, it occurs to me that the reuse policy as stated chooses the module whose freelist we steal from with equal probability, but that does not give every slot an equal probability of being stolen.

In other words, if we have one module with average occupancy of 500 preinitialized slots out of 1000, and 500 others with 1 slot each, and a new module comes along and wants a slot, we have only 1/501 chance of picking one of the 500.

To unbias this I should probably keep a global freelist (whole pool of choices mixed together), randomly pick from that freelist, and keep a reverse-index of slot to last-allocated module to remove the index from that module's freelist (or lazily do so next time we look at that list). I'll take a closer look at this tomorrow!

@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Jan 28, 2022

@cfallin cfallin force-pushed the pooling-affinity branch 5 times, most recently from e319285 to bf67ab2 Compare January 29, 2022 01:29
@cfallin
Member Author

cfallin commented Jan 29, 2022

I've updated this PR now to use a better data structure/algorithm design. It performs a fair random choice of victim slot when no slots with the desired affinity are available, and it has all O(1) updates -- somewhat tricky given the need to maintain two freelists (global and per-module) and remove from both. This is done by keeping Vecs and using swap_remove, and tracking a slot's position in each freelist in a separate reverse-index. Hopefully the comments make this a little more clear.

I've added a randomized test that counts ID-reuse (a little random simulation of sorts) and verifies a reasonable hit rate (at least twice what would be expected with random reuse) as well.

@cfallin cfallin force-pushed the pooling-affinity branch 3 times, most recently from b9265fd to f7d63bf Compare January 29, 2022 03:27
@fitzgen
Member

fitzgen commented Jan 31, 2022

The policy tracks a freelist per "compiled module ID", and when
allocating a slot for an instance, tries these three options in order:

1. A slot from the freelist for this module (i.e., last used for another
  instantiation of this particular module), or
3. A slot that was last used by some other module or never before.

1..3 👀

@cfallin
Member Author

cfallin commented Jan 31, 2022

1..3 👀

Incomplete edit, sorry! The distinction between the last two (empty, and then slot with other affinity) was removed because it made the data structure simpler, and in steady-state (past the first n_slot instantiations in the process) no slots will be empty.

Member

@fitzgen fitzgen left a comment


This is looking really good! A couple questions, suggestions, and nitpicks below.

As first suggested by Jan on the Zulip here [1], a cheap and effective
way to obtain copy-on-write semantics of a "backing image" for a Wasm
memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism
provided by the Linux kernel allows us to create anonymous,
in-memory-only files that we can use for this mapping, so we can
construct the image contents on-the-fly then effectively create a CoW
overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)`
will discard the CoW overlay, returning the mapping to its original
state.

By itself this is almost enough for a very fast
instantiation-termination loop of the same image over and over,
without changing the address space mapping at all (which is
expensive). The only missing bit is how to implement
heap *growth*. But here memfds can help us again: if we create another
anonymous file and map it where the extended parts of the heap would
go, we can take advantage of the fact that a `mmap()` mapping can
be *larger than the file itself*, with accesses beyond the end
generating a `SIGBUS`, and the fact that we can cheaply resize the
file with `ftruncate`, even after a mapping exists. So we can map the
"heap extension" file once with the maximum memory-slot size and grow
the memfd itself as `memory.grow` operations occur.

The above CoW technique and heap-growth technique together allow us a
fastpath of `madvise()` and `ftruncate()` only when we re-instantiate
the same module over and over, as long as we can reuse the same
slot. This fastpath avoids all whole-process address-space locks in
the Linux kernel, which should mean it is highly scalable. It also
avoids the cost of copying data on read, as the `uffd` heap backend
does when servicing pagefaults; the kernel's own optimized CoW
logic (same as used by all file mmaps) is used instead.

[1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772

Testing so far with recent Wasmtime has not been able to show the need
for avoiding the process-wide mmap lock in real-world use-cases. As
such, the technique of using an anonymous file and ftruncate() to extend
it seems unnecessary; instead, memfd can always use anonymous zeroed
memory for heap backing where the CoW image is not present, and
mprotect() to extend the heap limit by changing page protections.
@cfallin
Member Author

cfallin commented Jan 31, 2022

I think I addressed all your comments; thanks @fitzgen ! This is rebased on the latest #3697 as well.

Member

@fitzgen fitzgen left a comment


Thanks! (Note that I haven't looked at any of the earlier commits)

@cfallin cfallin force-pushed the pooling-affinity branch 2 times, most recently from 889fe87 to f65ea00 Compare February 1, 2022 01:13
(This was not a correctness bug, but is an obvious performance bug...)
@cfallin cfallin force-pushed the pooling-affinity branch 2 times, most recently from 2365db0 to 2d28f97 Compare February 1, 2022 06:13
This policy attempts to reuse the same instance slot for subsequent
instantiations of the same module. This is particularly useful when
using a pooling backend such as memfd that benefits from this reuse: for
example, in the memfd case, instantiating the same module into the same
slot allows us to avoid several calls to mmap() because the same
mappings can be reused.

The policy tracks a freelist per "compiled module ID", and when
allocating a slot for an instance, tries these two options in order:

1. A slot from the freelist for this module (i.e., last used for another
   instantiation of this particular module), or
2. A slot that was last used by some other module or never before.

The "victim" slot for choice 2 is randomly chosen.

The data structures are carefully designed so that all updates are O(1),
and there is no retry loop in any of the random selections.

This policy is now the default when the memfd backend is selected via
the `memfd-allocator` feature flag.
@cfallin cfallin merged commit 5deb1f1 into bytecodealliance:main Feb 2, 2022
@cfallin cfallin deleted the pooling-affinity branch February 2, 2022 21:11