[Merged by Bors] - Optimise `per_epoch_processing` low-hanging-fruit #3254

paulhauner · 2022-06-09T08:22:14Z

Issue Addressed

NA

Proposed Changes

Uses a Vec in SingleEpochParticipationCache rather than HashMap to speed up processing times at the cost of memory usage.
Cache the result of integer_sqrt rather than recomputing for each validator.
Cache state.previous_epoch rather than recomputing it for each validator.

Benchmarks

Benchmarks on a recent mainnet state using #3252 to get timing.

Without this PR

lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Using mainnet spec
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Advancing 32 slots
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Doing 10 runs
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz"
SSZ decoding /tmp/state-0x3cdc.ssz: 43ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 0: 245.718794ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 1: 245.364782ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 2: 255.866179ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 3: 243.838909ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 4: 250.431425ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 5: 248.68765ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 6: 262.051113ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 7: 264.293967ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 8: 293.202007ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 9: 264.552017ms

With this PR:

lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 0: 73.898678ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 1: 75.536978ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 2: 75.176104ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 3: 76.460828ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 4: 75.904195ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 5: 75.53077ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 6: 74.745572ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 7: 75.823489ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 8: 74.892055ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 9: 76.333569ms

Additional Info

NA

michaelsproul · 2022-06-09T11:42:18Z

Nice! 🎉

Regarding optimising the other base reward calls, we could consider caching the whole base_reward for each validator on the ParticipationCache. This is the approach I took on the tree-states branch when I noticed base reward calculations taking significant time, and for tree-states had the added advantage of reducing the number of expensive tree iterations. The relevant struct is called ValidatorInfo on tree-states, and also contains a bunch of other fields which probably aren't worth caching in the non-tree case:

lighthouse/consensus/state_processing/src/per_epoch_processing/altair/participation_cache.rs

Lines 133 to 142 in a3e396c

    
           #[derive(Debug, PartialEq)] 
        
           pub struct ValidatorInfo { 
        
               pub effective_balance: u64, 
        
               pub base_reward: u64, 
        
               pub is_eligible: bool, 
        
               pub is_slashed: bool, 
        
               pub is_active_current_epoch: bool, 
        
               pub is_active_previous_epoch: bool, 
        
               pub previous_epoch_participation: ParticipationFlags, 
        
           }

lighthouse/consensus/state_processing/src/per_epoch_processing/altair/participation_cache.rs

Lines 155 to 159 in a3e396c

    
           /// Single `HashMap` for validator info relevant to `process_epoch`. 
        
           #[derive(Debug, PartialEq)] 
        
           struct ValidatorInfoCache { 
        
               info: HashMap<usize, ValidatorInfo>, 
        
           }

That's a larger and more invasive change however, and seeing as this change combined with the Vec map already yields enormous speed-ups, maybe we take the simple win and merge this almost as-is 😍

commit 3870406 Author: Paul Hauner <[email protected]> Date: Thu Jun 9 18:13:46 2022 +1000 Swap hashmap to vec

commit 3c2a742 Author: Paul Hauner <[email protected]> Date: Thu Jun 9 18:54:41 2022 +1000 Avoid recomputing previous_epoch

paulhauner · 2022-06-10T02:31:33Z

I've just pushed the changes from #3255 and #3256 here.

michaelsproul

This is awesome!

I've thoroughly checked every line of this PR and I'm confident in its correctness.

I re-ran the benchmark from #3229 and found a whopping 30% speed-up on top of the ~10% speed-up from that PR. That means our next release should be 37% faster on such blocks. It's worth noting that the benchmark is dominated by tree hashing, so the speed-up isn't quite as huge as the 3x speed-up for epoch processing without tree hashing. https://docs.google.com/spreadsheets/d/1kfPRVTI2NL5UbDl7WAHLA9gWpaK22D9C22dplzjJ7dw/edit#gid=0

I'm super excited to ship this and get it running on mainnet! 🎉

paulhauner · 2022-06-10T04:29:15Z

bors r+

## Issue Addressed NA ## Proposed Changes - Uses a `Vec` in `SingleEpochParticipationCache` rather than `HashMap` to speed up processing times at the cost of memory usage. - Cache the result of `integer_sqrt` rather than recomputing for each validator. - Cache `state.previous_epoch` rather than recomputing it for each validator. ### Benchmarks Benchmarks on a recent mainnet state using #3252 to get timing. #### Without this PR ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:21:02Z INFO lcli::skip_slots] Using mainnet spec [2022-06-09T08:21:02Z INFO lcli::skip_slots] Advancing 32 slots [2022-06-09T08:21:02Z INFO lcli::skip_slots] Doing 10 runs [2022-06-09T08:21:02Z INFO lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz" SSZ decoding /tmp/state-0x3cdc.ssz: 43ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 0: 245.718794ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 1: 245.364782ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 2: 255.866179ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 3: 243.838909ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 4: 250.431425ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 5: 248.68765ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 6: 262.051113ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 7: 264.293967ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 8: 293.202007ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 9: 264.552017ms ``` #### With this PR: ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 0: 73.898678ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 1: 75.536978ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 2: 75.176104ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 3: 76.460828ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 4: 75.904195ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 5: 75.53077ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 6: 74.745572ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 7: 75.823489ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 8: 74.892055ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 9: 76.333569ms ``` ## Additional Info NA

bors · 2022-06-10T06:58:40Z

Pull request successfully merged into unstable.

Build succeeded:

Avoid recomputing sqrt

58081ee

paulhauner added ready-for-review The code is ready for review consensus An issue/PR that touches consensus code, such as state_processing or block verification. work-in-progress PR is a work-in-progress and removed ready-for-review The code is ready for review labels Jun 9, 2022

paulhauner changed the title ~~Avoid repeated, duplicate calls to integer_sqrt~~ Avoid duplicate calls to integer_sqrt Jun 9, 2022

paulhauner mentioned this pull request Jun 9, 2022

Avoid recomputing state.previous_epoch #3256

Closed

paulhauner added 2 commits June 10, 2022 12:27

Swap hashmap to vec

0d80fc2

commit 3870406 Author: Paul Hauner <[email protected]> Date: Thu Jun 9 18:13:46 2022 +1000 Swap hashmap to vec

Avoid recomputing previous_epoch

eeca53e

commit 3c2a742 Author: Paul Hauner <[email protected]> Date: Thu Jun 9 18:54:41 2022 +1000 Avoid recomputing previous_epoch

paulhauner changed the title ~~Avoid duplicate calls to integer_sqrt~~ Optimise per_epoch_processing low-hanging-fruit Jun 10, 2022

paulhauner added 4 commits June 10, 2022 12:39

Fix todos

08d8c15

Switch instantiate to new.

772d40b

Change function sig of base_reward_per_increment.

5dee549

Create BaseRewardPerIncrement

92e8874

paulhauner added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Jun 10, 2022

michaelsproul approved these changes Jun 10, 2022

View reviewed changes

michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Jun 10, 2022

bors bot changed the title ~~Optimise per_epoch_processing low-hanging-fruit~~ [Merged by Bors] - Optimise per_epoch_processing low-hanging-fruit Jun 10, 2022

bors bot closed this Jun 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Merged by Bors] - Optimise `per_epoch_processing` low-hanging-fruit #3254

[Merged by Bors] - Optimise `per_epoch_processing` low-hanging-fruit #3254

paulhauner commented Jun 9, 2022 •

edited

Loading

michaelsproul commented Jun 9, 2022

paulhauner commented Jun 10, 2022

michaelsproul left a comment

paulhauner commented Jun 10, 2022

bors bot commented Jun 10, 2022

[Merged by Bors] - Optimise per_epoch_processing low-hanging-fruit #3254

[Merged by Bors] - Optimise per_epoch_processing low-hanging-fruit #3254

Conversation

paulhauner commented Jun 9, 2022 • edited Loading

Issue Addressed

Proposed Changes

Benchmarks

Without this PR

With this PR:

Additional Info

michaelsproul commented Jun 9, 2022

paulhauner commented Jun 10, 2022

michaelsproul left a comment

Choose a reason for hiding this comment

paulhauner commented Jun 10, 2022

bors bot commented Jun 10, 2022

[Merged by Bors] - Optimise `per_epoch_processing` low-hanging-fruit #3254

[Merged by Bors] - Optimise `per_epoch_processing` low-hanging-fruit #3254

paulhauner commented Jun 9, 2022 •

edited

Loading