fix: allocate shard metadata lazily #45
Merged
Conversation
hawkw changed the title from "perf: allocate shard metadata lazily" to "fix: allocate shard metadata lazily" on Oct 15, 2020
hawkw added a commit that referenced this pull request on Oct 15, 2020:
Currently, creating a new `Slab` allocates a `shards` array of `Shard` structs. The `Shard` struct itself owns two boxed arrays of local and shared metadata for each page on that shard. Even though we don't allocate the actual storage arrays for those pages until they are needed, allocating the shard metadata eagerly means that a completely empty slab results in a fairly large memory allocation up front. This is especially the case with the default `Config`, which (on 64-bit machines) allows up to 4096 threads. On a 64-bit machine, the `Shared` page metadata is 4 words (32 bytes) and the `Local` metadata is another word (8 bytes), so 40 bytes * 32 pages per shard = 1280 bytes, a little over 1 KB per shard. This means that the default config eagerly allocates 4096 shards * 1280 bytes, or roughly 5 MB, of metadata, even when the program only has one or two threads in it and the remaining 4000-some possible threads will never allocate their shards.

When most of the shards are empty because there are very few threads in the program, most of this allocated memory is not *resident* and gets paged out by the operating system, but it results in a very surprising amount of allocated virtual memory. This is the cause of issues like tokio-rs/tracing#1005.

Furthermore, allocating all of this means that actually _constructing_ a slab takes a pretty long time. In `tracing-subscriber`, this is normally not a major issue, since subscribers tend to be created on startup and live for the entire lifetime of the program. However, in some use cases, like creating a separate subscriber for each test, the performance impact of allocating all that metadata is quite significant. See, for example: rust-lang/rust-analyzer#5792 (comment)

This branch fixes this by allocating the shard metadata only when a new shard is actually needed by a new thread. The shard array is now an array of `AtomicPtr`s to shards, and shards are only allocated the first time they are `insert`ed to. Since each thread can only insert to its own shard, the synchronization logic for this is fairly simple. However, since the shards are morally, although not actually, _owned_ by these `AtomicPtr`s, there is the potential for leaks when a slab is dropped if we don't ensure that all the shards it created are also dropped. Therefore, we use `loom::alloc::Track` for leak detection in tests. Fortunately, the logic for ensuring these are deallocated is not too complex.

Signed-off-by: Eliza Weisman <[email protected]>
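As a rough illustration of the pattern this commit describes, here is a minimal, std-only sketch of lazily allocating shards behind `AtomicPtr` slots and reclaiming them on drop. The `Shard` and `Shards` types here are hypothetical stand-ins, not the crate's actual implementation (which tracks per-page metadata and uses loom's atomics under test):

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Simplified stand-in for the real `Shard`, which owns the per-page
/// `Local` and `Shared` metadata arrays.
struct Shard {
    id: usize,
}

/// A fixed-size array of lazily allocated shards: one pointer-sized slot
/// per possible thread, but no `Shard` until a thread actually needs one.
struct Shards {
    slots: Box<[AtomicPtr<Shard>]>,
}

impl Shards {
    fn new(max_threads: usize) -> Self {
        Self {
            slots: (0..max_threads)
                .map(|_| AtomicPtr::new(ptr::null_mut()))
                .collect(),
        }
    }

    /// Returns the shard for thread `idx`, allocating it on first access.
    fn get_or_create(&self, idx: usize) -> &Shard {
        let slot = &self.slots[idx];
        let mut ptr = slot.load(Ordering::Acquire);
        if ptr.is_null() {
            let new = Box::into_raw(Box::new(Shard { id: idx }));
            // In the real slab only the owning thread creates its shard, so
            // this compare-and-exchange is essentially uncontended; the
            // failure arm is handled here anyway for completeness.
            match slot.compare_exchange(
                ptr::null_mut(),
                new,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => ptr = new,
                Err(existing) => {
                    // Lost a race: free our allocation and use the winner's.
                    unsafe { drop(Box::from_raw(new)) };
                    ptr = existing;
                }
            }
        }
        unsafe { &*ptr }
    }
}

impl Drop for Shards {
    fn drop(&mut self) {
        // The shards are morally owned by these `AtomicPtr`s, so they must
        // be reclaimed here or they would leak when the slab is dropped.
        for slot in self.slots.iter_mut() {
            let ptr = *slot.get_mut();
            if !ptr.is_null() {
                unsafe { drop(Box::from_raw(ptr)) };
            }
        }
    }
}
```

With this layout, an empty slab only pays for one pointer per possible thread; the per-shard metadata is allocated the first time `get_or_create` runs for that thread's index.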
hawkw added a commit to tokio-rs/tracing that referenced this pull request on Oct 22, 2020:
## Motivation

hawkw/sharded-slab#45 changes `sharded-slab` so that the per-shard metadata is allocated only when a new shard is created, rather than all up front when the slab is created. This fixes the very large amount of memory allocated by simply creating a new `Registry` without actually collecting any traces.

## Solution

This branch updates `tracing-subscriber` to depend on `sharded-slab` 0.1.0, which includes the upstream fix. In addition, this branch switches the registry from using `sharded_slab::Slab` to `sharded_slab::Pool`. This allows us to clear the hashmap allocations for extensions in place, retaining the already-allocated maps. This should improve `new_span` performance a bit.

Fixes #1005
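To illustrate the `Pool`-based approach, here is a sketch using a hypothetical `SpanData` type in place of the registry's real per-span data. It assumes sharded-slab 0.1's `Pool` API (`create_with`, `get`, and `clear`) and its `Clear` trait; the actual registry type and initialization logic differ:

```rust
use std::collections::HashMap;

use sharded_slab::{Clear, Pool};

/// Hypothetical stand-in for the registry's per-span data; the real type
/// also stores span metadata, parent IDs, and reference counts.
#[derive(Default)]
struct SpanData {
    extensions: HashMap<String, String>,
}

impl Clear for SpanData {
    fn clear(&mut self) {
        // `HashMap::clear` drops the entries but keeps the allocated
        // capacity, so a reused slot does not have to reallocate its map.
        self.extensions.clear();
    }
}

fn main() {
    let spans: Pool<SpanData> = Pool::new();

    // Check out a slot from the pool and initialize it in place, reusing a
    // previously cleared slot if one is available.
    let id = spans
        .create_with(|data| {
            data.extensions.insert("component".into(), "example".into());
        })
        .expect("slab at capacity");

    assert!(spans.get(id).is_some());

    // When the span closes, the slot is cleared via `Clear` and returned to
    // the pool for reuse instead of being deallocated.
    spans.clear(id);
}
```

Clearing in place rather than removing and re-inserting is what lets the registry keep the extensions maps' allocations around between spans.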
hawkw added a commit to tokio-rs/tracing that referenced this pull request on Oct 22, 2020:
This backports #1062 to v0.1.6. This has already been approved on master.

hawkw/sharded-slab#45 changes `sharded-slab` so that the per-shard metadata is allocated only when a new shard is created, rather than all up front when the slab is created. This fixes the very large amount of memory allocated by simply creating a new `Registry` without actually collecting any traces.

This branch updates `tracing-subscriber` to depend on `sharded-slab` 0.1.0, which includes the upstream fix. In addition, this branch switches the registry from using `sharded_slab::Slab` to `sharded_slab::Pool`. This allows us to clear the hashmap allocations for extensions in place, retaining the already-allocated maps. This should improve `new_span` performance a bit.

Fixes #1005

Signed-off-by: Eliza Weisman <[email protected]>
kaffarell pushed a commit to kaffarell/tracing that referenced this pull request on May 22, 2024:
This backports tokio-rs#1062 to v0.1.6. This has already been approved on master.

hawkw/sharded-slab#45 changes `sharded-slab` so that the per-shard metadata is allocated only when a new shard is created, rather than all up front when the slab is created. This fixes the very large amount of memory allocated by simply creating a new `Registry` without actually collecting any traces.

This branch updates `tracing-subscriber` to depend on `sharded-slab` 0.1.0, which includes the upstream fix. In addition, this branch switches the registry from using `sharded_slab::Slab` to `sharded_slab::Pool`. This allows us to clear the hashmap allocations for extensions in place, retaining the already-allocated maps. This should improve `new_span` performance a bit.

Fixes tokio-rs#1005

Signed-off-by: Eliza Weisman <[email protected]>