-
Notifications
You must be signed in to change notification settings - Fork 36.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use best-fit strategy in Arena, now O(log(n)) instead O(n) #12048
Conversation
This replaces the first-fit algorithm used in the Arena with a best-fit. According to "Dynamic Storage Allocation: A Survey and Critical Review", Wilson et. al. 1995, http://www.scs.stanford.edu/14wi-cs140/sched/readings/wilson.pdf, both startegies work well in practice. The advantage of using best-fit is that we can switch the slow O(n) algorithm to O(log(n)) operations. Additionally, some previously O(log(n)) operations are now replaced with O(1) operations by using a hash map. The end effect is that the benchmark runs about 2.5 times faster on my machine: old: BenchLockedPool, 5, 530, 5.25749, 0.00196938, 0.00199755, 0.00198172 new: BenchLockedPool, 5, 1300, 5.11313, 0.000781493, 0.000793314, 0.00078606 I've run all unit tests and benchmarks.
Context: PR makes sense to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
src/support/lockedpool.cpp
Outdated
if (prev == chunks_free.end() || !extend(prev, freed)) | ||
prev = chunks_free.emplace_hint(next, freed); | ||
if (next != chunks_free.end() && extend(prev, *next)) | ||
// Coalesc freed with previous chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Coalesce
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, fixed
src/support/lockedpool.cpp
Outdated
prev = chunks_free.emplace_hint(next, freed); | ||
if (next != chunks_free.end() && extend(prev, *next)) | ||
// Coalesc freed with previous chunk | ||
auto prev = chunks_free_end.find(freed.first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the chunks_free_end
map required instead of iterating one backwards from next
as is done in the current code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the cunks_free
used to be an std::map
that was sorted by char*
. There it was possible to use std::prev
to just find what memory address came before. I have replaced this sorted map with an std::unordered_map
, then it's not possible any more to just find the previous address quickly within the same map. That's why I have added another map that contains the back links. So this change replaces one O(log(n)) lookup + std::prev() call with two O(1) lookups.
src/support/lockedpool.cpp
Outdated
// This allocation strategy is best-fit. According to "Dynamic Storage Allocation: A Survey and Critical Review", | ||
// Wilson et. al. 1995, http://www.scs.stanford.edu/14wi-cs140/sched/readings/wilson.pdf, best-fit and first-fit | ||
// policies seem to work well in practice. | ||
auto sizePtrIt = size_to_free_chunk.lower_bound(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Variable names are snake_case, per developer guidelines. Fix sizePtrIt
, sizeRemaining
, itRemaining
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
Concept ACK - this is a great improvement, thanks! will need to review in detail that various data structures stay consistent, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, and the code looks correct to me. I will try to test this locally this week.
I can find no problems with this, utACK 5fbf7c4 |
5fbf7c4 fix nits: variable naming, typos (Martin Ankerl) 1e0ee90 Use best-fit strategy in Arena, now O(log(n)) instead O(n) (Martin Ankerl) Pull request description: This replaces the first-fit algorithm used in the Arena with a best-fit. According to "Dynamic Storage Allocation: A Survey and Critical Review", Wilson et. al. 1995, http://www.scs.stanford.edu/14wi-cs140/sched/readings/wilson.pdf, both startegies work well in practice. The advantage of using best-fit is that we can switch the O(n) allocation to O(log(n)). Additionally, some previously O(log(n)) operations are now O(1) operations by using hash maps. The end effect is that the benchmark runs about 2.5 times faster on my machine: # Benchmark, evals, iterations, total, min, max, median old: BenchLockedPool, 5, 530, 5.25749, 0.00196938, 0.00199755, 0.00198172 new: BenchLockedPool, 5, 1300, 5.11313, 0.000781493, 0.000793314, 0.00078606 I've run all unit tests and benchmarks, and increased the number of iterations so that BenchLockedPool takes about 5 seconds again. Tree-SHA512: 6551e384671f93f10c60df530a29a1954bd265cc305411f665a8756525e5afe2873a8032c797d00b6e8c07e16d9827465d0b662875433147381474a44119ccce
Changes in bitcoin#12048 cause compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG
Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG
Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG
Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG
Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG
Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG
30fb598 Fix segfault in allocator_tests/arena_tests (Jeffrey Czyz) 15c84f5 Define ARENA_DEBUG in Travis test runs (Jeffrey Czyz) ad71548 Fix compilation errors in support/lockedpool.cpp (Jeffrey Czyz) Pull request description: Changes in #12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG ACKs for top commit: laanwj: ACK 30fb598 fanquake: ACK 30fb598 - thanks for following up jkczyz. Tree-SHA512: 4eec368a4e9c67e4e2a27bc05608a807c2892d50c60d06ed21490cd274c0369f9671bc05b3006acc2a193316caf4896454c9c299603bfed29bd488f1987ec446
…ol.cpp 30fb598 Fix segfault in allocator_tests/arena_tests (Jeffrey Czyz) 15c84f5 Define ARENA_DEBUG in Travis test runs (Jeffrey Czyz) ad71548 Fix compilation errors in support/lockedpool.cpp (Jeffrey Czyz) Pull request description: Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG ACKs for top commit: laanwj: ACK 30fb598 fanquake: ACK 30fb598 - thanks for following up jkczyz. Tree-SHA512: 4eec368a4e9c67e4e2a27bc05608a807c2892d50c60d06ed21490cd274c0369f9671bc05b3006acc2a193316caf4896454c9c299603bfed29bd488f1987ec446
Summary: PR12048: 5fbf7c4 fix nits: variable naming, typos (Martin Ankerl) 1e0ee90 Use best-fit strategy in Arena, now O(log(n)) instead O(n) (Martin Ankerl) Pull request description: This replaces the first-fit algorithm used in the Arena with a best-fit. According to "Dynamic Storage Allocation: A Survey and Critical Review", Wilson et. al. 1995, http://www.scs.stanford.edu/14wi-cs140/sched/readings/wilson.pdf, both startegies work well in practice. The advantage of using best-fit is that we can switch the O(n) allocation to O(log(n)). Additionally, some previously O(log(n)) operations are now O(1) operations by using hash maps. The end effect is that the benchmark runs about 2.5 times faster on my machine: # Benchmark, evals, iterations, total, min, max, median old: BenchLockedPool, 5, 530, 5.25749, 0.00196938, 0.00199755, 0.00198172 new: BenchLockedPool, 5, 1300, 5.11313, 0.000781493, 0.000793314, 0.00078606 I've run all unit tests and benchmarks, and increased the number of iterations so that BenchLockedPool takes about 5 seconds again. Tree-SHA512: 6551e384671f93f10c60df530a29a1954bd265cc305411f665a8756525e5afe2873a8032c797d00b6e8c07e16d9827465d0b662875433147381474a44119ccce Also included PR16161 which fixes a bug introduced by PR12048: Changes in #12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Second commit fixes segfault in allocator_tests/arena_tests The test uses reinterpret_cast<void*> on unallocated memory. Using this memory in printchunk as char* causes a segfault, so have printchunk take void* instead. Reproduced with: make CPPFLAGS=-DARENA_DEBUG Backport of Core PR12048 and PR16161 bitcoin/bitcoin#12048 bitcoin/bitcoin#16161 Reviewer note: the last commit in PR16161 is TRAVIS related and was therefore modified to our code base. Test Plan: make check test_runner.py ./bitcoin-qt -> settings -> Encrypt Wallet -> set password and/or change password if already encrypted ./bench_bitcoin old: BenchLockedPool, 5, 530, 5.15069, 0.00192059, 0.00199956, 0.00193796 new: BenchLockedPool, 5, 1300, 4.44411, 0.00067821, 0.000698535, 0.000678916 ../configure CPPFLAGS=-DARENA_DEBUG make check test_runner.py ABC_BUILD_NAME=build-asan build-configurations.sh Reviewers: deadalnix, Fabien, jasonbcox, O1 Bitcoin ABC, #bitcoin_abc Reviewed By: deadalnix, Fabien, jasonbcox, O1 Bitcoin ABC, #bitcoin_abc Differential Revision: https://reviews.bitcoinabc.org/D3898
Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG
…stead O(n) 5fbf7c4 fix nits: variable naming, typos (Martin Ankerl) 1e0ee90 Use best-fit strategy in Arena, now O(log(n)) instead O(n) (Martin Ankerl) Pull request description: This replaces the first-fit algorithm used in the Arena with a best-fit. According to "Dynamic Storage Allocation: A Survey and Critical Review", Wilson et. al. 1995, http://www.scs.stanford.edu/14wi-cs140/sched/readings/wilson.pdf, both startegies work well in practice. The advantage of using best-fit is that we can switch the O(n) allocation to O(log(n)). Additionally, some previously O(log(n)) operations are now O(1) operations by using hash maps. The end effect is that the benchmark runs about 2.5 times faster on my machine: # Benchmark, evals, iterations, total, min, max, median old: BenchLockedPool, 5, 530, 5.25749, 0.00196938, 0.00199755, 0.00198172 new: BenchLockedPool, 5, 1300, 5.11313, 0.000781493, 0.000793314, 0.00078606 I've run all unit tests and benchmarks, and increased the number of iterations so that BenchLockedPool takes about 5 seconds again. Tree-SHA512: 6551e384671f93f10c60df530a29a1954bd265cc305411f665a8756525e5afe2873a8032c797d00b6e8c07e16d9827465d0b662875433147381474a44119ccce
Locked memory manager Add a pool for locked memory chunks, replacing `LockedPageManager`. Cherry-picked from the following upstream PRs: - bitcoin/bitcoin#8321 - bitcoin/bitcoin#8753 - bitcoin/bitcoin#9063 - bitcoin/bitcoin#9070 - bitcoin/bitcoin#11385 - bitcoin/bitcoin#12048 - Excludes change to benchmark. - bitcoin/bitcoin#15117 - bitcoin/bitcoin#16161 - Excludes Travis CI changes. - Includes change from bitcoin/bitcoin#13163 - bitcoin/bitcoin#15600 - bitcoin/bitcoin#18443 - Assorted small changes from: - bitcoin/bitcoin#9233 - bitcoin/bitcoin#10483 - bitcoin/bitcoin#10645 - bitcoin/bitcoin#10969 - bitcoin/bitcoin#11351 - bitcoin/bitcoin#19111 - Excludes change to `src/rpc/server.cpp` - bitcoin/bitcoin#9804 - Only the commit for `src/key.cpp` - bitcoin/bitcoin#9598
…stead O(n) 5fbf7c4 fix nits: variable naming, typos (Martin Ankerl) 1e0ee90 Use best-fit strategy in Arena, now O(log(n)) instead O(n) (Martin Ankerl) Pull request description: This replaces the first-fit algorithm used in the Arena with a best-fit. According to "Dynamic Storage Allocation: A Survey and Critical Review", Wilson et. al. 1995, http://www.scs.stanford.edu/14wi-cs140/sched/readings/wilson.pdf, both startegies work well in practice. The advantage of using best-fit is that we can switch the O(n) allocation to O(log(n)). Additionally, some previously O(log(n)) operations are now O(1) operations by using hash maps. The end effect is that the benchmark runs about 2.5 times faster on my machine: # Benchmark, evals, iterations, total, min, max, median old: BenchLockedPool, 5, 530, 5.25749, 0.00196938, 0.00199755, 0.00198172 new: BenchLockedPool, 5, 1300, 5.11313, 0.000781493, 0.000793314, 0.00078606 I've run all unit tests and benchmarks, and increased the number of iterations so that BenchLockedPool takes about 5 seconds again. Tree-SHA512: 6551e384671f93f10c60df530a29a1954bd265cc305411f665a8756525e5afe2873a8032c797d00b6e8c07e16d9827465d0b662875433147381474a44119ccce
…ol.cpp 30fb598 Fix segfault in allocator_tests/arena_tests (Jeffrey Czyz) 15c84f5 Define ARENA_DEBUG in Travis test runs (Jeffrey Czyz) ad71548 Fix compilation errors in support/lockedpool.cpp (Jeffrey Czyz) Pull request description: Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG ACKs for top commit: laanwj: ACK 30fb598 fanquake: ACK 30fb598 - thanks for following up jkczyz. Tree-SHA512: 4eec368a4e9c67e4e2a27bc05608a807c2892d50c60d06ed21490cd274c0369f9671bc05b3006acc2a193316caf4896454c9c299603bfed29bd488f1987ec446
…ol.cpp 30fb598 Fix segfault in allocator_tests/arena_tests (Jeffrey Czyz) 15c84f5 Define ARENA_DEBUG in Travis test runs (Jeffrey Czyz) ad71548 Fix compilation errors in support/lockedpool.cpp (Jeffrey Czyz) Pull request description: Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG ACKs for top commit: laanwj: ACK 30fb598 fanquake: ACK 30fb598 - thanks for following up jkczyz. Tree-SHA512: 4eec368a4e9c67e4e2a27bc05608a807c2892d50c60d06ed21490cd274c0369f9671bc05b3006acc2a193316caf4896454c9c299603bfed29bd488f1987ec446
…ol.cpp 30fb598 Fix segfault in allocator_tests/arena_tests (Jeffrey Czyz) 15c84f5 Define ARENA_DEBUG in Travis test runs (Jeffrey Czyz) ad71548 Fix compilation errors in support/lockedpool.cpp (Jeffrey Czyz) Pull request description: Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG ACKs for top commit: laanwj: ACK 30fb598 fanquake: ACK 30fb598 - thanks for following up jkczyz. Tree-SHA512: 4eec368a4e9c67e4e2a27bc05608a807c2892d50c60d06ed21490cd274c0369f9671bc05b3006acc2a193316caf4896454c9c299603bfed29bd488f1987ec446
Changes in #12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG
Changes in bitcoin#12048 cause a compilation error in Arena::walk() when ARENA_DEBUG is defined. Specifically, Arena's chunks_free map was changed to have a different value type. Additionally, missing includes cause other compilation errors when ARENA_DEBUG is defined. Reproduced with: make CPPFLAGS=-DARENA_DEBUG
This replaces the first-fit algorithm used in the Arena with a best-fit. According to "Dynamic Storage Allocation: A Survey and Critical Review", Wilson et. al. 1995, http://www.scs.stanford.edu/14wi-cs140/sched/readings/wilson.pdf, both startegies work well in practice.
The advantage of using best-fit is that we can switch the O(n) allocation to O(log(n)). Additionally, some previously O(log(n)) operations are now O(1) operations by using hash maps. The end effect is that the benchmark runs about 2.5 times faster on my machine:
I've run all unit tests and benchmarks, and increased the number of iterations so that BenchLockedPool takes about 5 seconds again.