Remove redundant EEs #21

paulhauner · 2022-03-31T07:32:16Z

Issue Addressed

This is implementing (de-implementing?) my proposal over in sigp#3118.

I figured basing this off @realbigsean's branch would avoid a horrible merge conflict. I've opened this PR to provide a view on the work I'm doing. I'm not sure if we should merge this into sigp#3062 and then merge that to unstable, or merge sigp#3062 and then put this on top of it. Open to feedback :)

This is still a WIP, it doesn't compile yet.

## Issue Addressed N/A ## Proposed Changes - Update the JSON-RPC id field for both our request and response objects to be a `serde_json::Value` rather than a `u32`. This field could be a string or a number according to the JSON-RPC 2.0 spec. We only ever set it to a number, but if, for example, we get a response that wraps this number in quotes, we would fail to deserialize it. I think because we're not doing any validation around this id otherwise, we should be less strict with it in this regard. ## Additional Info Co-authored-by: realbigsean <[email protected]>

## Issue Addressed MEV boost compatibility ## Proposed Changes See sigp#2987 ## Additional Info This is blocked on the stabilization of a couple specs, [here](ethereum/beacon-APIs#194) and [here](flashbots/mev-boost#20). Additional TODO's and outstanding questions - [ ] MEV boost JWT Auth - [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate - [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic. Co-authored-by: realbigsean <[email protected]> Co-authored-by: realbigsean <[email protected]>

realbigsean

The simplification is really nice.. seems like it's not too bad with the builder stuff since the last refactor!

@winksaville

## Proposed Changes Add a `lighthouse db` command with three initial subcommands: - `lighthouse db version`: print the database schema version. - `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version. - `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`. This PR lays the groundwork for other changes, namely: - Mark's fast-deposit sync (sigp#2915), for which I think we should implement a database downgrade (from v9 to v8). - My `tree-states` work, which already implements a downgrade (v10 to v8). - Standalone purge commands like `lighthouse db purge-dht` per sigp#2824. ## Additional Info I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods. Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (sigp#2685).

## Issue Addressed Resolves sigp#3128 ## Proposed Changes Strip trailing newlines from jwt secret files.

## Proposed Changes Mitigate the fork choice attacks described in [_Three Attacks on Proof-of-Stake Ethereum_](https://arxiv.org/abs/2110.10086) by enabling proposer boost @ 70% on mainnet. Proposer boost has been running with stability on Prater for a few months now, and is safe to roll out gradually on mainnet. I'll argue that the financial impact of rolling out gradually is also minimal. Consider how a proposer-boosted validator handles two types of re-orgs: ## Ex ante re-org (from the paper) In the mitigated attack, a malicious proposer releases their block at slot `n + 1` late so that it re-orgs the block at the slot _after_ them (at slot `n + 2`). Non-boosting validators will follow this re-org and vote for block `n + 1` in slot `n + 2`. Boosted validators will vote for `n + 2`. If the boosting validators are outnumbered, there'll be a re-org to the malicious block from `n + 1` and validators applying the boost will have their slot `n + 2` attestations miss head (and target on an epoch boundary). Note that all the attesters from slot `n + 1` are doomed to lose their head vote rewards, but this is the same regardless of boosting. Therefore, Lighthouse nodes stand to miss slightly more head votes than other nodes if they are in the minority while applying the proposer boost. Once the proposer boost nodes gain a majority, this trend reverses. ## Ex post re-org (using the boost) The other type of re-org is an ex post re-org using the strategy described here: sigp#2860. With this strategy, boosted nodes will follow the attempted re-org and again lose a head vote if the re-org is unsuccessful. Once boosting is widely adopted, the re-orgs will succeed and the non-boosting validators will lose out. I don't think there are (m)any validators applying this strategy, because it is irrational to attempt it before boosting is widely adopted. Therefore I think we can safely ignore this possibility. ## Risk Assessment From observing re-orgs on mainnet I don't think ex ante re-orgs are very common. I've observed around 1 per day for the last month on my node (see: https://gist.github.com/michaelsproul/3b2142fa8fe0ff767c16553f96959e8c), compared to 2.5 ex post re-orgs per day. Given one extra slot per day where attesting will cause a missed head vote, each individual validator has a 1/32 chance of being assigned to that slot. So we have an increase of 1/32 missed head votes per validator per day in expectation. Given that we currently see ~7 head vote misses per validator per day due to late/missing blocks (and re-orgs), this represents only a (1/32)/7 = 0.45% increase in missed head votes in expectation. I believe this is so small that we shouldn't worry about it. Particularly as getting proposer boost deployed is good for network health and may enable us to drive down the number of late blocks over time (which will decrease head vote misses). ## TL;DR Enable proposer boost now and release ASAP, as financial downside is a 0.45% increase in missed head votes until widespread adoption.

## Proposed Changes Increase the default `--slots-per-restore-point` to 8192 for a 4x reduction in freezer DB disk usage. Existing nodes that use the previous default of 2048 will be left unchanged. Newly synced nodes (with or without checkpoint sync) will use the new 8192 default. Long-term we could do away with the freezer DB entirely for validator-only nodes, but this change is much simpler and grants us some extra space in the short term. We can also roll it out gradually across our nodes by purging databases one by one, while keeping the Ansible config the same. ## Additional Info We ignore a change from 2048 to 8192 if the user hasn't set the 8192 explicitly. We fire a debug log in the case where we do ignore: ``` DEBG Ignoring slots-per-restore-point config in favour of on-disk value, on_disk: 2048, config: 8192 ```

## Issue Addressed N/A ## Proposed Changes Fix the upper bound for blocks by root responses to be equal to the max merge block size instead of altair. Further make the rpc response limits fork aware.

## Proposed Changes I did some gardening 🌳 in our dependency tree: - Remove duplicate versions of `warp` (git vs patch) - Remove duplicate versions of lots of small deps: `cpufeatures`, `ethabi`, `ethereum-types`, `bitvec`, `nix`, `libsecp256k1`. - Update MDBX (should resolve sigp#3028). I tested and Lighthouse compiles on Windows 11 now. - Restore `psutil` back to upstream - Make some progress updating everything to rand 0.8. There are a few crates stuck on 0.7. Hopefully this puts us on a better footing for future `cargo audit` issues, and improves compile times slightly. ## Additional Info Some crates are held back by issues with `zeroize`. libp2p-noise depends on [`chacha20poly1305`](https://crates.io/crates/chacha20poly1305) which depends on zeroize < v1.5, and we can only have one version of zeroize because it's post 1.0 (see rust-lang/cargo#6584). The latest version of `zeroize` is v1.5.4, which is used by the new versions of many other crates (e.g. `num-bigint-dig`). Once a new version of chacha20poly1305 is released we can update libp2p-noise and upgrade everything to the latest `zeroize` version. I've also opened a PR to `blst` related to zeroize: supranational/blst#111

## Issue Addressed This resolves errors related to the glibc version of the downloaded mdbook binaries. Currently the mdbook job is failing on `unstable`: https://github.com/sigp/lighthouse/runs/5785245715?check_suite_focus=true

## Proposed Changes Cut release v2.2.0 including proposer boost. ## Additional Info I also updated the clippy lints for the imminent release of Rust 1.60, although LH v2.2.0 will continue to compile using Rust 1.58 (our MSRV).

## Issue Addressed NA ## Proposed Changes - Adds more checks to prevent importing blocks atop parent with invalid execution payloads. - Adds a test for these conditions. ## Additional Info NA

## Issue Addressed NA ## Proposed Changes Ensures that a `VALID` response from a `forkchoiceUpdate` call will update that block in `ProtoArray`. I also had to modify the mock execution engine so it wouldn't return valid when all payloads were supposed to be some other static value. ## Additional Info NA

Changed SPRP to the correct default value of 8192.

## Issue Addressed - sigp#3058 Co-authored-by: Paul Hauner <[email protected]>

## Issue Addressed N/A ## Proposed Changes sigp#3133 changed the rpc type limits to be fork aware i.e. if our current fork based on wall clock slot is Altair, then we apply only altair rpc type limits. This is a bug because phase0 blocks can still be sent over rpc and phase 0 block minimum size is smaller than altair block minimum size. So a phase0 block with `size < SIGNED_BEACON_BLOCK_ALTAIR_MIN` will return an `InvalidData` error as it doesn't pass the rpc types bound check. This error can be seen when we try syncing pre-altair blocks with size smaller than `SIGNED_BEACON_BLOCK_ALTAIR_MIN`. This PR fixes the issue by also accounting for forks earlier than current_fork in the rpc limits calculation in the `rpc_block_limits_by_fork` function. I decided to hardcode the limits in the function because that seemed simpler than calculating previous forks based on current fork and doing a min across forks. Adding a new fork variant is simple and can the limits can be easily checked in a review. Adds unit tests and modifies the syncing simulator to check the syncing from across fork boundaries. The syncing simulator's block 1 would always be of phase 0 minimum size (404 bytes) which is smaller than altair min block size (since block 1 contains no attestations).

## Issue Addressed NA ## Proposed Changes Fixes an issue introduced in sigp#3088 which was causing unnecessary `crit` logs on networks without Bellatrix enabled. ## Additional Info NA

## Issue Addressed Addresses sync stalls on v2.2.0 (i.e. sigp#3147). ## Additional Info I've avoided doing a full `cargo update` because I noticed there's a new patch version of libp2p and thought it could do with some more testing. Co-authored-by: Paul Hauner <[email protected]>

## Issue Addressed In very rare occasions we've seen most if not all our peers in a chain with which we don't agree. Purging these peers can take a very long time: number of retries of the chain. Meanwhile sync is caught in a loop trying the chain again and again. This makes it so that we fast track purging peers via registering the failed chain to prevent retrying for some time (30 seconds). Longer times could be dangerous since a chain can fail if a batch fails to download for example. In this case, I think it's still acceptable to fast track purging peers since they are nor providing the required info anyway Co-authored-by: Divma <[email protected]>

## Issue Addressed NA ## Proposed Changes Disallow the production of attestations and retrieval of unaggregated attestations when they reference an optimistic head. Add tests to this end. I also moved `BeaconChain::produce_unaggregated_attestation_for_block` to the `BeaconChainHarness`. It was only being used during tests, so it's nice to stop pretending it's production code. I also needed something that could produce attestations to optimistic blocks in order to simulate scenarios where the justified checkpoint is determined invalid (if no one would attest to an optimistic block, we could never justify it and then flip it to invalid). ## Additional Info - ~~Blocked on sigp#3126~~

## Issue Addressed We still ping peers that are considered in a disconnecting state ## Proposed Changes Do not ping peers once we decide they are disconnecting Upgrade logs about ignored rpc messages ## Additional Info --

## Issue Addressed N/A ## Proposed Changes Previously, we were using `Sleep::is_elapsed()` to check if the shutdown timeout had triggered without polling the sleep. This PR polls the sleep timer.

realbigsean and others added 2 commits March 29, 2022 22:59

realbigsean reviewed Mar 31, 2022

View reviewed changes

michaelsproul and others added 19 commits April 1, 2022 00:58

Strip newline from jwt secrets (sigp#3132)

9ec072f

## Issue Addressed Resolves sigp#3128 ## Proposed Changes Strip trailing newlines from jwt secret files.

Fix merge rpc length limits (sigp#3133)

ab434bc

## Issue Addressed N/A ## Proposed Changes Fix the upper bound for blocks by root responses to be equal to the max merge block size instead of altair. Further make the rpc response limits fork aware.

v2.2.0 (sigp#3139)

bac7c3f

## Proposed Changes Cut release v2.2.0 including proposer boost. ## Additional Info I also updated the clippy lints for the imminent release of Rust 1.60, although LH v2.2.0 will continue to compile using Rust 1.58 (our MSRV).

Add tests for importing blocks on invalid parents (sigp#3123)

42cdaf5

## Issue Addressed NA ## Proposed Changes - Adds more checks to prevent importing blocks atop parent with invalid execution payloads. - Adds a test for these conditions. ## Additional Info NA

Fix SPRP default value in cli (sigp#3145)

5ff4013

Changed SPRP to the correct default value of 8192.

Transition Block Proposer Preparation (sigp#3088)

22002a4

## Issue Addressed - sigp#3058 Co-authored-by: Paul Hauner <[email protected]>

Don't log crits for missing EE before Bellatrix (sigp#3150)

c8edeaf

## Issue Addressed NA ## Proposed Changes Fixes an issue introduced in sigp#3088 which was causing unnecessary `crit` logs on networks without Bellatrix enabled. ## Additional Info NA

Poll shutdown timeout in rpc handler (sigp#3153)

db0beb5

## Issue Addressed N/A ## Proposed Changes Previously, we were using `Sleep::is_elapsed()` to check if the shutdown timeout had triggered without polling the sleep. This PR polls the sleep timer.

paulhauner force-pushed the no-redundant-ees branch from 1a1a8c0 to 091a394 Compare April 14, 2022 01:01

paulhauner added 5 commits April 14, 2022 11:18

Remove redundancy from execution_layer

13d64a2

Remove engines.rs

6419153

Fix exec engine integration compilation

9340689

Update CLI flags

d4e77bf

Fix warnings after rebase

f990f32

paulhauner force-pushed the no-redundant-ees branch from 091a394 to f990f32 Compare April 14, 2022 01:25

Start fixing CLI tests

6ecfa33

realbigsean deleted the branch realbigsean:kiln-mev-boost November 21, 2023 16:15

realbigsean closed this Nov 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove redundant EEs #21

Remove redundant EEs #21

paulhauner commented Mar 31, 2022 •

edited

Loading

realbigsean left a comment

Remove redundant EEs #21

Remove redundant EEs #21

Conversation

paulhauner commented Mar 31, 2022 • edited Loading

Issue Addressed

realbigsean left a comment

Choose a reason for hiding this comment

paulhauner commented Mar 31, 2022 •

edited

Loading