-
Notifications
You must be signed in to change notification settings - Fork 800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New historic state cache #6475
New historic state cache #6475
Conversation
This is working quite well for the sequential query case. The tricky part is managing when to build caches. Previously we were re-building them for every state that required replay, which was obviously sub-optimal. Now we ensure that any state in the historic state cache has all caches built, which makes replay faster at the expense of slowing down the loading of snapshot/diff states. On the test machine the loads for snapshot/diff states are now around 6-7s (ouch), but each subsequent state is more like 0.5-1s (on average 700ms). I thought about moving the cache build from when the snapshot/diff state is loaded to the next slot, which would keep the total amortised time the same, but reduce the worst-case time for the diff state by several seconds (maybe from 6-7s down to 3s). It seems cache building only takes 1.8-2.5s in the worst case, so there's still some time going missing for the snapshot state. |
Block replay is quick now with built caches, although it's a bit slower at epoch boundaries:
In this commit (52a9066) I made it so that we replay 1 block in preference to applying a diff for states that would normally be constructed by diffing. This may be the wrong choice, as applying a diff should be ~150ms, and converting HDiffBuffer -> state should be only ~400ms or so, for 550ms total. Although this doesn't mesh with the benchmark results from prior to this commit, where diff states were taking around 4-6s! Maybe cloning the several hundred MB of hdiff buffer is taking more time than we think (could be good to measure this). Another advantage of favouring replay where possible (i.e. for sequential access) is that it minimises memory fragmentation. |
@@ -1992,6 +1992,8 @@ pub fn scrape_for_metrics<T: BeaconChainTypes>(beacon_chain: &BeaconChain<T>) { | |||
.canonical_head | |||
.fork_choice_read_lock() | |||
.scrape_for_metrics(); | |||
|
|||
beacon_chain.store.register_metrics(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a regression from a looong time ago.
I think this is ready for review. Some further optimisations are described here, but I think current performance is now very good. |
metrics::inc_counter(&metrics::STORE_BEACON_HISTORIC_STATE_CACHE_MISS); | ||
|
||
return self.load_cold_state_by_slot_using_replay(cached_state, slot); | ||
} | ||
|
||
metrics::inc_counter(&metrics::STORE_BEACON_HISTORIC_STATE_CACHE_MISS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two misses are different. Technically we got a hit as the cache is allowing us to skip some work. Maybe we can have a metric STORE_BEACON_HISTORIC_STATE_CACHE_RESULT
with a label:
hit_exact
hit_replay
miss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to think about this some more. Will merge as-is for now
Co-authored-by: Lion - dapplion <[email protected]>
Co-authored-by: Lion - dapplion <[email protected]>
ab9c275
into
sigp:tree-states-archive
* Start extracting freezer changes for tree-states * Remove unused config args * Add comments * Remove unwraps * Subjective more clear implementation * Clean up hdiff * Update xdelta3 * Tree states archive metrics (#6040) * Add store cache size metrics * Add compress timer metrics * Add diff apply compute timer metrics * Add diff buffer cache hit metrics * Add hdiff buffer load times * Add blocks replayed metric * Move metrics to store * Future proof some metrics --------- Co-authored-by: Michael Sproul <[email protected]> * Port and clean up forwards iterator changes * Add and polish hierarchy-config flag * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Cleaner errors * Fix beacon_chain test compilation * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Patch a few more freezer block roots * Fix genesis block root bug * Fix test failing due to pending updates * Beacon chain tests passing * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix doc lint * Implement DB schema upgrade for hierarchical state diffs (#6193) * DB upgrade * Add flag * Delete RestorePointHash * Update docs * Update docs * Implement hierarchical state diffs config migration (#6245) * Implement hierarchical state diffs config migration * Review PR * Remove TODO * Set CURRENT_SCHEMA_VERSION correctly * Fix genesis state loading * Re-delete some PartialBeaconState stuff --------- Co-authored-by: Michael Sproul <[email protected]> * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix test compilation * Update schema downgrade test * Fix tests * Fix null anchor migration * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix tree states upgrade migration (#6328) * Towards crash safety * Fix compilation * Move cold summaries and state roots to new columns * Rename StateRoots chunked field * Update prune states * Clean hdiff CLI flag and metrics * Fix "staged reconstruction" * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix alloy issues * Fix staged reconstruction logic * Prevent weird slot drift * Remove "allow" flag * Update CLI help * Remove FIXME about downgrade * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Remove some unnecessary error variants * Fix new test * Tree states archive - review comments and metrics (#6386) * Review PR comments and metrics * Comments * Add anchor metrics * drop prev comment * Update metadata.rs * Apply suggestions from code review --------- Co-authored-by: Michael Sproul <[email protected]> * Update beacon_node/store/src/hot_cold_store.rs Co-authored-by: Lion - dapplion <[email protected]> * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Clarify comment and remove anchor_slot garbage * Simplify database anchor (#6397) * Simplify database anchor * Update beacon_node/store/src/reconstruct.rs * Add migration for anchor * Fix and simplify light_client store tests * Fix incompatible config test * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * More metrics * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * New historic state cache (#6475) * New historic state cache * Add more metrics * State cache hit rate metrics * Fix store metrics * More logs and metrics * Fix logger * Ensure cached states have built caches :O * Replay blocks in preference to diffing * Two separate caches * Distribute cache build time to next slot * Re-plumb historic-state-cache flag * Clean up metrics * Update book * Update beacon_node/store/src/hdiff.rs Co-authored-by: Lion - dapplion <[email protected]> * Update beacon_node/store/src/historic_state_cache.rs Co-authored-by: Lion - dapplion <[email protected]> --------- Co-authored-by: Lion - dapplion <[email protected]> * Update database docs * Update diagram * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Update lockbud to work with bindgen/etc * Correct pkg name for Debian * Remove vestigial epochs_per_state_diff * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Markdown lint * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Address Jimmy's review comments * Simplify ReplayFrom case * Fix and document genesis_state_root * Typo Co-authored-by: Jimmy Chen <[email protected]> * Merge branch 'unstable' into tree-states-archive * Compute diff of validators list manually (#6556) * Split hdiff computation * Dedicated logic for historical roots and summaries * Benchmark against real states * Mutated source? * Version the hdiff * Add lighthouse DB config for hierarchy exponents * Tidy up hierarchy exponents flag * Apply suggestions from code review Co-authored-by: Michael Sproul <[email protected]> * Address PR review * Remove hardcoded paths in benchmarks * Delete unused function in benches * lint --------- Co-authored-by: Michael Sproul <[email protected]> * Test hdiff binary format stability (#6585) * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Add deprecation warning for SPRP * Update xdelta to get rid of duplicate deps * Document test
* Start extracting freezer changes for tree-states * Remove unused config args * Add comments * Remove unwraps * Subjective more clear implementation * Clean up hdiff * Update xdelta3 * Tree states archive metrics (sigp#6040) * Add store cache size metrics * Add compress timer metrics * Add diff apply compute timer metrics * Add diff buffer cache hit metrics * Add hdiff buffer load times * Add blocks replayed metric * Move metrics to store * Future proof some metrics --------- Co-authored-by: Michael Sproul <[email protected]> * Port and clean up forwards iterator changes * Add and polish hierarchy-config flag * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Cleaner errors * Fix beacon_chain test compilation * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Patch a few more freezer block roots * Fix genesis block root bug * Fix test failing due to pending updates * Beacon chain tests passing * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix doc lint * Implement DB schema upgrade for hierarchical state diffs (sigp#6193) * DB upgrade * Add flag * Delete RestorePointHash * Update docs * Update docs * Implement hierarchical state diffs config migration (sigp#6245) * Implement hierarchical state diffs config migration * Review PR * Remove TODO * Set CURRENT_SCHEMA_VERSION correctly * Fix genesis state loading * Re-delete some PartialBeaconState stuff --------- Co-authored-by: Michael Sproul <[email protected]> * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix test compilation * Update schema downgrade test * Fix tests * Fix null anchor migration * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix tree states upgrade migration (sigp#6328) * Towards crash safety * Fix compilation * Move cold summaries and state roots to new columns * Rename StateRoots chunked field * Update prune states * Clean hdiff CLI flag and metrics * Fix "staged reconstruction" * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix alloy issues * Fix staged reconstruction logic * Prevent weird slot drift * Remove "allow" flag * Update CLI help * Remove FIXME about downgrade * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Remove some unnecessary error variants * Fix new test * Tree states archive - review comments and metrics (sigp#6386) * Review PR comments and metrics * Comments * Add anchor metrics * drop prev comment * Update metadata.rs * Apply suggestions from code review --------- Co-authored-by: Michael Sproul <[email protected]> * Update beacon_node/store/src/hot_cold_store.rs Co-authored-by: Lion - dapplion <[email protected]> * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Clarify comment and remove anchor_slot garbage * Simplify database anchor (sigp#6397) * Simplify database anchor * Update beacon_node/store/src/reconstruct.rs * Add migration for anchor * Fix and simplify light_client store tests * Fix incompatible config test * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * More metrics * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * New historic state cache (sigp#6475) * New historic state cache * Add more metrics * State cache hit rate metrics * Fix store metrics * More logs and metrics * Fix logger * Ensure cached states have built caches :O * Replay blocks in preference to diffing * Two separate caches * Distribute cache build time to next slot * Re-plumb historic-state-cache flag * Clean up metrics * Update book * Update beacon_node/store/src/hdiff.rs Co-authored-by: Lion - dapplion <[email protected]> * Update beacon_node/store/src/historic_state_cache.rs Co-authored-by: Lion - dapplion <[email protected]> --------- Co-authored-by: Lion - dapplion <[email protected]> * Update database docs * Update diagram * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Update lockbud to work with bindgen/etc * Correct pkg name for Debian * Remove vestigial epochs_per_state_diff * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Markdown lint * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Address Jimmy's review comments * Simplify ReplayFrom case * Fix and document genesis_state_root * Typo Co-authored-by: Jimmy Chen <[email protected]> * Merge branch 'unstable' into tree-states-archive * Compute diff of validators list manually (sigp#6556) * Split hdiff computation * Dedicated logic for historical roots and summaries * Benchmark against real states * Mutated source? * Version the hdiff * Add lighthouse DB config for hierarchy exponents * Tidy up hierarchy exponents flag * Apply suggestions from code review Co-authored-by: Michael Sproul <[email protected]> * Address PR review * Remove hardcoded paths in benchmarks * Delete unused function in benches * lint --------- Co-authored-by: Michael Sproul <[email protected]> * Test hdiff binary format stability (sigp#6585) * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Add deprecation warning for SPRP * Update xdelta to get rid of duplicate deps * Document test
Proposed Changes
Attempt to unify the new HDiffBuffer cache with the old "historic state cache" by creating a cache that holds either or both kinds of states, and lazily converts between the formats as required.
Additional Info
Performance testing on mainnet with
--hierarchy-exponents 5,7,11
shows much improved performance for sequential queries. Most queries after the first are served in 400-700ms.