Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Optimise per_epoch_processing low-hanging-fruit #3254

Closed
wants to merge 7 commits into from

Conversation

paulhauner
Copy link
Member

@paulhauner paulhauner commented Jun 9, 2022

Issue Addressed

NA

Proposed Changes

  • Uses a Vec in SingleEpochParticipationCache rather than HashMap to speed up processing times at the cost of memory usage.
  • Cache the result of integer_sqrt rather than recomputing for each validator.
  • Cache state.previous_epoch rather than recomputing it for each validator.

Benchmarks

Benchmarks on a recent mainnet state using #3252 to get timing.

Without this PR

lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Using mainnet spec
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Advancing 32 slots
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Doing 10 runs
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz"
SSZ decoding /tmp/state-0x3cdc.ssz: 43ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 0: 245.718794ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 1: 245.364782ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 2: 255.866179ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 3: 243.838909ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 4: 250.431425ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 5: 248.68765ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 6: 262.051113ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 7: 264.293967ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 8: 293.202007ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 9: 264.552017ms

With this PR:

lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 0: 73.898678ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 1: 75.536978ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 2: 75.176104ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 3: 76.460828ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 4: 75.904195ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 5: 75.53077ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 6: 74.745572ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 7: 75.823489ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 8: 74.892055ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 9: 76.333569ms

Additional Info

NA

@paulhauner paulhauner added ready-for-review The code is ready for review consensus An issue/PR that touches consensus code, such as state_processing or block verification. work-in-progress PR is a work-in-progress and removed ready-for-review The code is ready for review labels Jun 9, 2022
@paulhauner paulhauner changed the title Avoid repeated, duplicate calls to integer_sqrt Avoid duplicate calls to integer_sqrt Jun 9, 2022
@michaelsproul
Copy link
Member

Nice! 🎉

Regarding optimising the other base reward calls, we could consider caching the whole base_reward for each validator on the ParticipationCache. This is the approach I took on the tree-states branch when I noticed base reward calculations taking significant time, and for tree-states had the added advantage of reducing the number of expensive tree iterations. The relevant struct is called ValidatorInfo on tree-states, and also contains a bunch of other fields which probably aren't worth caching in the non-tree case:

#[derive(Debug, PartialEq)]
pub struct ValidatorInfo {
pub effective_balance: u64,
pub base_reward: u64,
pub is_eligible: bool,
pub is_slashed: bool,
pub is_active_current_epoch: bool,
pub is_active_previous_epoch: bool,
pub previous_epoch_participation: ParticipationFlags,
}

/// Single `HashMap` for validator info relevant to `process_epoch`.
#[derive(Debug, PartialEq)]
struct ValidatorInfoCache {
info: HashMap<usize, ValidatorInfo>,
}

That's a larger and more invasive change however, and seeing as this change combined with the Vec map already yields enormous speed-ups, maybe we take the simple win and merge this almost as-is 😍

commit 3870406
Author: Paul Hauner <[email protected]>
Date:   Thu Jun 9 18:13:46 2022 +1000

    Swap hashmap to vec
commit 3c2a742
Author: Paul Hauner <[email protected]>
Date:   Thu Jun 9 18:54:41 2022 +1000

    Avoid recomputing previous_epoch
@paulhauner
Copy link
Member Author

I've just pushed the changes from #3255 and #3256 here.

@paulhauner paulhauner changed the title Avoid duplicate calls to integer_sqrt Optimise per_epoch_processing low-hanging-fruit Jun 10, 2022
@paulhauner paulhauner added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Jun 10, 2022
Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!

I've thoroughly checked every line of this PR and I'm confident in its correctness.

I re-ran the benchmark from #3229 and found a whopping 30% speed-up on top of the ~10% speed-up from that PR. That means our next release should be 37% faster on such blocks. It's worth noting that the benchmark is dominated by tree hashing, so the speed-up isn't quite as huge as the 3x speed-up for epoch processing without tree hashing. https://docs.google.com/spreadsheets/d/1kfPRVTI2NL5UbDl7WAHLA9gWpaK22D9C22dplzjJ7dw/edit#gid=0

I'm super excited to ship this and get it running on mainnet! 🎉

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Jun 10, 2022
@paulhauner
Copy link
Member Author

bors r+

bors bot pushed a commit that referenced this pull request Jun 10, 2022
## Issue Addressed

NA

## Proposed Changes

- Uses a `Vec` in `SingleEpochParticipationCache` rather than `HashMap` to speed up processing times at the cost of memory usage.
- Cache the result of `integer_sqrt` rather than recomputing for each validator.
- Cache `state.previous_epoch` rather than recomputing it for each validator.

### Benchmarks

Benchmarks on a recent mainnet state using #3252 to get timing.

#### Without this PR

```
lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Using mainnet spec
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Advancing 32 slots
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Doing 10 runs
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz"
SSZ decoding /tmp/state-0x3cdc.ssz: 43ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 0: 245.718794ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 1: 245.364782ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 2: 255.866179ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 3: 243.838909ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 4: 250.431425ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 5: 248.68765ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 6: 262.051113ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 7: 264.293967ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 8: 293.202007ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 9: 264.552017ms
```

#### With this PR:

```
lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 0: 73.898678ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 1: 75.536978ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 2: 75.176104ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 3: 76.460828ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 4: 75.904195ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 5: 75.53077ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 6: 74.745572ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 7: 75.823489ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 8: 74.892055ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 9: 76.333569ms
```

## Additional Info

NA
@bors bors bot changed the title Optimise per_epoch_processing low-hanging-fruit [Merged by Bors] - Optimise per_epoch_processing low-hanging-fruit Jun 10, 2022
@bors bors bot closed this Jun 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus An issue/PR that touches consensus code, such as state_processing or block verification. ready-for-merge This PR is ready to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants