
Pooling allocator: add a reuse-affinity policy. #3738

Merged
merged 14 commits
Feb 2, 2022

Conversation

cfallin
Member

@cfallin cfallin commented Jan 28, 2022

(Builds on #3697.)

This policy attempts to reuse the same instance slot for subsequent
instantiations of the same module. This is particularly useful when
using a pooling backend such as memfd that benefits from this reuse: for
example, in the memfd case, instantiating the same module into the same
slot allows us to avoid several calls to mmap() because the same
mappings can be reused.

The policy tracks a freelist per "compiled module ID", and when
allocating a slot for an instance, tries these two options in order:

  1. A slot from the freelist for this module (i.e., last used for another
    instantiation of this particular module), or
  2. A slot that was last used by some other module or never before.

The "victim" slot for choice 2 is randomly chosen.

The data structures are carefully designed so that all updates are O(1),
and there is no retry loop in any of the random selections.

This policy is now the default when the memfd backend is selected via
the memfd-allocator feature flag.

@cfallin cfallin requested a review from alexcrichton January 28, 2022 08:04
@cfallin
Member Author

cfallin commented Jan 28, 2022

After writing this, it occurs to me that the reuse policy as stated chooses the module whose freelist we steal from with equal probability, but that does not give every slot an equal probability of being stolen.

In other words, if we have one module with average occupancy of 500 preinitialized slots out of 1000, and 500 others with 1 slot each, and a new module comes along and wants a slot, we have only 1/501 chance of picking one of the 500.

To unbias this I should probably keep a global freelist (whole pool of choices mixed together), randomly pick from that freelist, and keep a reverse-index of slot to last-allocated module to remove the index from that module's freelist (or lazily do so next time we look at that list). I'll take a closer look at this tomorrow!

@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Jan 28, 2022

@cfallin cfallin force-pushed the pooling-affinity branch 5 times, most recently from e319285 to bf67ab2 Compare January 29, 2022 01:29
@cfallin
Member Author

cfallin commented Jan 29, 2022

I've updated this PR now to use a better data structure/algorithm design. It performs a fair random choice of victim slot when no slots with the desired affinity are available, and it has all O(1) updates -- somewhat tricky given the need to maintain two freelists (global and per-module) and remove from both. This is done by keeping Vecs and using swap_remove, and tracking a slot's position in each freelist in a separate reverse-index. Hopefully the comments make this a little more clear.

I've added a randomized test that counts ID-reuse (a little random simulation of sorts) and verifies a reasonable hit rate (at least twice what would be expected with random reuse) as well.

@cfallin cfallin force-pushed the pooling-affinity branch 3 times, most recently from b9265fd to f7d63bf Compare January 29, 2022 03:27
@fitzgen
Member

fitzgen commented Jan 31, 2022

The policy tracks a freelist per "compiled module ID", and when
allocating a slot for an instance, tries these three options in order:

1. A slot from the freelist for this module (i.e., last used for another
  instantiation of this particular module), or
3. A slot that was last used by some other module or never before.

1..3 👀

@cfallin
Member Author

cfallin commented Jan 31, 2022

1..3 👀

Incomplete edit, sorry! The distinction between the last two (empty, and then slot with other affinity) was removed because it made the data structure simpler, and in steady-state (past the first n_slot instantiations in the process) no slots will be empty.

Member

@fitzgen fitzgen left a comment


This is looking really good! A couple questions, suggestions, and nitpicks below.

As first suggested by Jan on the Zulip here [1], a cheap and effective
way to obtain copy-on-write semantics of a "backing image" for a Wasm
memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism
provided by the Linux kernel allows us to create anonymous,
in-memory-only files that we can use for this mapping, so we can
construct the image contents on-the-fly then effectively create a CoW
overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)`
will discard the CoW overlay, returning the mapping to its original
state.

By itself this is almost enough for a very fast
instantiation-termination loop of the same image over and over,
without changing the address space mapping at all (which is
expensive). The only missing bit is how to implement
heap *growth*. But here memfds can help us again: if we create another
anonymous file and map it where the extended parts of the heap would
go, we can take advantage of the fact that a `mmap()` mapping can
be *larger than the file itself*, with accesses beyond the end
generating a `SIGBUS`, and the fact that we can cheaply resize the
file with `ftruncate`, even after a mapping exists. So we can map the
"heap extension" file once with the maximum memory-slot size and grow
the memfd itself as `memory.grow` operations occur.

The above CoW technique and heap-growth technique together allow us a
fastpath of `madvise()` and `ftruncate()` only when we re-instantiate
the same module over and over, as long as we can reuse the same
slot. This fastpath avoids all whole-process address-space locks in
the Linux kernel, which should mean it is highly scalable. It also
avoids the cost of copying data on read, as the `uffd` heap backend
does when servicing pagefaults; the kernel's own optimized CoW
logic (same as used by all file mmaps) is used instead.

[1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772

Testing so far with recent Wasmtime has not been able to show the need
for avoiding the process-wide mmap lock in real-world use-cases. As
such, the technique of using an anonymous file and ftruncate() to extend
it seems unnecessary; instead, memfd can always use anonymous zeroed
memory for heap backing where the CoW image is not present, and
mprotect() to extend the heap limit by changing page protections.
@cfallin
Member Author

cfallin commented Jan 31, 2022

I think I addressed all your comments; thanks @fitzgen ! This is rebased on the latest #3697 as well.

Member

@fitzgen fitzgen left a comment


Thanks! (Note that I haven't looked at any of the earlier commits)

@cfallin cfallin force-pushed the pooling-affinity branch 2 times, most recently from 889fe87 to f65ea00 Compare February 1, 2022 01:13
(This was not a correctness bug, but is an obvious performance bug...)
@cfallin cfallin force-pushed the pooling-affinity branch 2 times, most recently from 2365db0 to 2d28f97 Compare February 1, 2022 06:13
This policy attempts to reuse the same instance slot for subsequent
instantiations of the same module. This is particularly useful when
using a pooling backend such as memfd that benefits from this reuse: for
example, in the memfd case, instantiating the same module into the same
slot allows us to avoid several calls to mmap() because the same
mappings can be reused.

The policy tracks a freelist per "compiled module ID", and when
allocating a slot for an instance, tries these two options in order:

1. A slot from the freelist for this module (i.e., last used for another
   instantiation of this particular module), or
2. A slot that was last used by some other module or never before.

The "victim" slot for choice 2 is randomly chosen.

The data structures are carefully designed so that all updates are O(1),
and there is no retry loop in any of the random selections.

This policy is now the default when the memfd backend is selected via
the `memfd-allocator` feature flag.
@cfallin cfallin merged commit 5deb1f1 into bytecodealliance:main Feb 2, 2022
@cfallin cfallin deleted the pooling-affinity branch February 2, 2022 21:11