Pooling allocator: add a reuse-affinity policy. #3738
Conversation
After writing this, it occurs to me that the reuse policy as stated chooses with equal probability a module whose freelist we steal from, but this does not imply equal probability for any slot to be stolen. In other words, if we have one module with an average occupancy of 500 preinitialized slots out of 1000, and 500 other modules with 1 slot each, then when a new module comes along and wants a slot, there is only a 1/501 chance of picking one of the 500. To unbias this, I should probably keep a global freelist (the whole pool of choices mixed together), randomly pick from that freelist, and keep a reverse-index from slot to last-allocated module so we can remove the slot from that module's freelist (or do so lazily the next time we look at that list). I'll take a closer look at this tomorrow!
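To make the bias concrete, the numbers from the example above work out as follows. This is a small arithmetic sketch only; `slot_probabilities` is a made-up name:

```rust
/// Numbers taken from the comment above: one module with 500 free slots
/// and 500 modules with one free slot each, i.e. 501 non-empty freelists.
/// Choosing a *module* uniformly and then a slot from its freelist is far
/// from uniform over *slots*.
fn slot_probabilities() -> (f64, f64) {
    let nonempty_freelists = 501.0;
    // A slot inside the big module: first pick that module (1/501),
    // then pick the slot among its 500 entries.
    let p_big_module_slot = (1.0 / nonempty_freelists) / 500.0;
    // A slot that is the sole entry of a small module's freelist.
    let p_small_module_slot = 1.0 / nonempty_freelists;
    (p_big_module_slot, p_small_module_slot)
}
```

So a slot in the heavily populated freelist is 500x less likely to be stolen than a slot that is the sole entry of another module's freelist, which is exactly the bias a single mixed global freelist removes.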
Label Action: cc @peterhuene
This issue or pull request has been labeled: "wasmtime:api"; the above users have been cc'd because they subscribe to that label.
Force-pushed from e319285 to bf67ab2.
I've updated this PR now to use a better data structure/algorithm design. It performs a fair random choice of victim slot when no slots with the desired affinity are available, and all updates are O(1) -- somewhat tricky given the need to maintain two freelists (global and per-module) and remove from both. This is done by keeping Vecs and using swap_remove, and tracking a slot's position in each freelist in a separate reverse-index. Hopefully the comments make this a little clearer. I've also added a randomized test that counts ID reuse (a little random simulation of sorts) and verifies a reasonable hit rate (at least twice what would be expected with random reuse).
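A rough sketch of that scheme, under the assumptions stated in the comment above (all names here are hypothetical; the real implementation lives in `index_allocator.rs` and picks the victim slot at random, whereas this sketch deterministically takes the last free slot):

```rust
use std::collections::HashMap;

/// Where a free slot currently sits in the allocator's freelists.
struct SlotPos {
    global_idx: usize,            // index in `global_free`
    module: Option<(u64, usize)>, // (module id, index in that module's list)
}

struct IndexAllocator {
    global_free: Vec<usize>,              // every free slot
    per_module: HashMap<u64, Vec<usize>>, // free slots with affinity to a module
    pos: HashMap<usize, SlotPos>,         // reverse-index: slot -> positions
}

impl IndexAllocator {
    fn new(n: usize) -> Self {
        IndexAllocator {
            global_free: (0..n).collect(),
            per_module: HashMap::new(),
            pos: (0..n)
                .map(|i| (i, SlotPos { global_idx: i, module: None }))
                .collect(),
        }
    }

    /// O(1) removal from the global freelist via `swap_remove`, fixing up
    /// the reverse-index entry of the element swapped into `idx`.
    fn unlink_global(&mut self, slot: usize) {
        let idx = self.pos[&slot].global_idx;
        self.global_free.swap_remove(idx);
        if let Some(&moved) = self.global_free.get(idx) {
            self.pos.get_mut(&moved).unwrap().global_idx = idx;
        }
    }

    /// Same trick for the slot's per-module freelist, if it is on one.
    fn unlink_module(&mut self, slot: usize) {
        if let Some((m, idx)) = self.pos.get_mut(&slot).unwrap().module.take() {
            let list = self.per_module.get_mut(&m).unwrap();
            list.swap_remove(idx);
            if let Some(&moved) = list.get(idx) {
                self.pos.get_mut(&moved).unwrap().module = Some((m, idx));
            }
        }
    }

    /// Prefer a slot last used by `module`; otherwise steal any free slot.
    fn alloc(&mut self, module: u64) -> Option<usize> {
        let slot = self
            .per_module
            .get(&module)
            .and_then(|l| l.last().copied())
            .or_else(|| self.global_free.last().copied())?;
        self.unlink_global(slot);
        self.unlink_module(slot);
        Some(slot)
    }

    /// Return `slot` to both freelists, recording affinity to `module`.
    fn free(&mut self, slot: usize, module: u64) {
        let list = self.per_module.entry(module).or_default();
        let mpos = (module, list.len());
        list.push(slot);
        let p = self.pos.get_mut(&slot).unwrap();
        p.global_idx = self.global_free.len();
        p.module = Some(mpos);
        self.global_free.push(slot);
    }
}
```

The key point is the pairing of `Vec::swap_remove` with the reverse-index: removing an element from the middle of either list is constant-time, at the cost of one position fix-up for whichever element got swapped into the vacated spot.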
Force-pushed from b9265fd to f7d63bf.
1..3 👀
Incomplete edit, sorry! The distinction between the last two (empty, and then a slot with other affinity) was removed because it made the data structure simpler, and in steady state (past the first …
Force-pushed from f7d63bf to 01c044e.
This is looking really good! A couple questions, suggestions, and nitpicks below.
crates/runtime/src/instance/allocator/pooling/index_allocator.rs (several review threads, now outdated/resolved)
As first suggested by Jan on the Zulip [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on the fly and then effectively create a CoW overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)` will discard the CoW overlay, returning the mapping to its original state.

By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address-space mapping at all (which is expensive). The only missing bit is how to implement heap *growth*. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a `mmap()` mapping can be *larger than the file itself*, with accesses beyond the end generating a `SIGBUS`, and the fact that we can cheaply resize the file with `ftruncate()`, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as `memory.grow` operations occur.

Together, the CoW technique and the heap-growth technique give us a fast path of only `madvise()` and `ftruncate()` when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fast path avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the `uffd` heap backend does when servicing page faults; the kernel's own optimized CoW logic (the same as used by all file mmaps) is used instead.

[1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772
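The CoW-and-reset part of this fast path can be demonstrated in isolation. Below is a Linux-only sketch (the libc functions are declared inline so the example stands alone; `cow_reset_demo` is a made-up name, and the constants are the Linux x86-64 values) showing that `madvise(MADV_DONTNEED)` on a `MAP_PRIVATE` memfd mapping discards guest writes and reverts the page to the image:

```rust
use std::ffi::c_void;

extern "C" {
    fn memfd_create(name: *const u8, flags: u32) -> i32;
    fn ftruncate(fd: i32, length: i64) -> i32;
    fn pwrite(fd: i32, buf: *const c_void, count: usize, offset: i64) -> isize;
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32,
            fd: i32, offset: i64) -> *mut c_void;
    fn madvise(addr: *mut c_void, len: usize, advice: i32) -> i32;
}

const PROT_READ: i32 = 1;
const PROT_WRITE: i32 = 2;
const MAP_PRIVATE: i32 = 2;
const MADV_DONTNEED: i32 = 4;
const PAGE: usize = 4096;

/// Returns (initial byte, byte after a "guest" write, byte after reset).
fn cow_reset_demo() -> (u8, u8, u8) {
    unsafe {
        // Build the backing image in an anonymous, in-memory-only file.
        let fd = memfd_create(b"image\0".as_ptr(), 0);
        assert!(fd >= 0);
        assert_eq!(ftruncate(fd, PAGE as i64), 0);
        let image_byte = 0xAAu8;
        assert_eq!(pwrite(fd, &image_byte as *const u8 as *const c_void, 1, 0), 1);

        // Map it copy-on-write: writes land in private pages, never in the memfd.
        let p = mmap(std::ptr::null_mut(), PAGE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0) as *mut u8;
        assert_ne!(p as isize, -1);

        let before = *p; // read straight from the image
        *p = 0x55;       // triggers CoW; the memfd still holds 0xAA
        let dirty = *p;

        // "Re-instantiate": discard the CoW overlay. The next access
        // repopulates the page from the unchanged image.
        assert_eq!(madvise(p as *mut c_void, PAGE, MADV_DONTNEED), 0);
        let after = *p;
        (before, dirty, after)
    }
}
```

Note that the reset costs one `madvise()` call regardless of how many pages were dirtied; the mapping itself is never torn down or re-created.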
Testing so far with recent Wasmtime has not been able to show the need for avoiding the process-wide mmap lock in real-world use-cases. As such, the technique of using an anonymous file and ftruncate() to extend it seems unnecessary; instead, memfd can always use anonymous zeroed memory for heap backing where the CoW image is not present, and mprotect() to extend the heap limit by changing page protections.
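The simpler scheme can likewise be sketched: reserve the whole slot as inaccessible anonymous memory up front, then "grow" the heap by flipping page protections with `mprotect()` — no file and no `ftruncate()` involved. Linux-only, inline libc declarations, and `grow_demo` is a made-up name:

```rust
use std::ffi::c_void;

extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32,
            fd: i32, offset: i64) -> *mut c_void;
    fn mprotect(addr: *mut c_void, len: usize, prot: i32) -> i32;
}

const PROT_NONE: i32 = 0;
const PROT_READ: i32 = 1;
const PROT_WRITE: i32 = 2;
const MAP_PRIVATE: i32 = 2;
const MAP_ANONYMOUS: i32 = 0x20;
const PAGE: usize = 4096;

fn grow_demo() -> u8 {
    unsafe {
        // Reserve the maximum slot size; nothing is accessible yet, so a
        // wild access faults instead of silently reading zeroes.
        let max = 16 * PAGE;
        let base = mmap(std::ptr::null_mut(), max, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0) as *mut u8;
        assert_ne!(base as isize, -1);

        // memory.grow by one page: just change protections. The newly
        // accessible anonymous memory is zero-filled by the kernel.
        assert_eq!(mprotect(base as *mut c_void, PAGE, PROT_READ | PROT_WRITE), 0);
        *base = 7;
        *base
    }
}
```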
Force-pushed from 01c044e to ba56068.
Force-pushed from 739f2e5 to 69312ad.
Thanks! (Note that I haven't looked at any of the earlier commits)
Force-pushed from 69312ad to bf974ad.
Force-pushed from 889fe87 to f65ea00.
(This was not a correctness bug, but it was an obvious performance bug...)
Force-pushed from 2365db0 to 2d28f97.
Force-pushed from 2d28f97 to 6159eae.
Force-pushed from 6159eae to bba70fc.
…ng the initial mmap.
Force-pushed from bba70fc to 9fde5fe.
Force-pushed from 9fde5fe to 2cfcc6e.
Force-pushed from cc245e3 to da84b13.
Force-pushed from da84b13 to 5b32eb2.
This policy attempts to reuse the same instance slot for subsequent instantiations of the same module. This is particularly useful when using a pooling backend such as memfd that benefits from this reuse: for example, in the memfd case, instantiating the same module into the same slot allows us to avoid several calls to mmap() because the same mappings can be reused.

The policy tracks a freelist per "compiled module ID", and when allocating a slot for an instance, tries these two options in order:

1. A slot from the freelist for this module (i.e., last used for another instantiation of this particular module), or
2. A slot that was last used by some other module, or never used before.

The "victim" slot for choice 2 is randomly chosen. The data structures are carefully designed so that all updates are O(1), and there is no retry loop in any of the random selection.

This policy is now the default when the memfd backend is selected via the `memfd-allocator` feature flag.
Force-pushed from 5b32eb2 to 1cbd393.
(Builds on #3697.)
This policy attempts to reuse the same instance slot for subsequent
instantiations of the same module. This is particularly useful when
using a pooling backend such as memfd that benefits from this reuse: for
example, in the memfd case, instantiating the same module into the same
slot allows us to avoid several calls to mmap() because the same
mappings can be reused.
The policy tracks a freelist per "compiled module ID", and when
allocating a slot for an instance, tries these two options in order:

1. A slot from the freelist for this module (i.e., last used for another
   instantiation of this particular module), or
2. A slot that was last used by some other module, or never used before.

The "victim" slot for choice 2 is randomly chosen.
The data structures are carefully designed so that all updates are O(1),
and there is no retry-loop in any of the random selection.
This policy is now the default when the memfd backend is selected via
the `memfd-allocator` feature flag.