Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove redundant EEs #21

Closed

Conversation

paulhauner
Copy link

@paulhauner paulhauner commented Mar 31, 2022

Issue Addressed

This is implementing (de-implementing?) my proposal over in sigp#3118.

I figured basing this off @realbigsean's branch would avoid a horrible merge conflict. I've opened this PR to provide a view on the work I'm doing. I'm not sure if we should merge this into sigp#3062 and then merge that to unstable, or merge sigp#3062 and then put this on top of it. Open to feedback :)

This is still a WIP, it doesn't compile yet.

realbigsean and others added 2 commits March 29, 2022 22:59
## Issue Addressed

N/A

## Proposed Changes

- Update the JSON-RPC id field for both our request and response objects to be a `serde_json::Value` rather than a `u32`. This field could be a string or a number according to the JSON-RPC 2.0 spec. We only ever set it to a number, but if, for example, we get a response that wraps this number in quotes, we would fail to deserialize it. I think because we're not doing any validation around this id otherwise, we should be less strict with it in this regard. 

## Additional Info



Co-authored-by: realbigsean <[email protected]>
## Issue Addressed

MEV boost compatibility

## Proposed Changes

See sigp#2987

## Additional Info

This is blocked on the stabilization of a couple specs, [here](ethereum/beacon-APIs#194) and [here](flashbots/mev-boost#20).

Additional TODO's and outstanding questions

- [ ] MEV boost JWT Auth
- [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate
- [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic. 

Co-authored-by: realbigsean <[email protected]>
Co-authored-by: realbigsean <[email protected]>
Copy link
Owner

@realbigsean realbigsean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplification is really nice.. seems like it's not too bad with the builder stuff since the last refactor!

michaelsproul and others added 19 commits April 1, 2022 00:58
## Proposed Changes

Add a `lighthouse db` command with three initial subcommands:

- `lighthouse db version`: print the database schema version.
- `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version.
- `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`.

This PR lays the groundwork for other changes, namely:

- Mark's fast-deposit sync (sigp#2915), for which I think we should implement a database downgrade (from v9 to v8).
- My `tree-states` work, which already implements a downgrade (v10 to v8).
- Standalone purge commands like `lighthouse db purge-dht` per sigp#2824.

## Additional Info

I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods.

Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (sigp#2685).
## Issue Addressed

Resolves sigp#3128 

## Proposed Changes

Strip trailing newlines from jwt secret files.
## Proposed Changes

Mitigate the fork choice attacks described in [_Three Attacks on Proof-of-Stake Ethereum_](https://arxiv.org/abs/2110.10086) by enabling proposer boost @ 70% on mainnet.

Proposer boost has been running with stability on Prater for a few months now, and is safe to roll out gradually on mainnet. I'll argue that the financial impact of rolling out gradually is also minimal.

Consider how a proposer-boosted validator handles two types of re-orgs:

## Ex ante re-org (from the paper)

In the mitigated attack, a malicious proposer releases their block at slot `n + 1` late so that it re-orgs the block at the slot _after_  them (at slot `n + 2`). Non-boosting validators will follow this re-org and vote for block `n + 1` in slot `n + 2`. Boosted validators will vote for `n + 2`. If the boosting validators are outnumbered, there'll be a re-org to the malicious block from `n + 1` and validators applying the boost will have their slot `n + 2` attestations miss head (and target on an epoch boundary). Note that all the attesters from slot `n + 1` are doomed to lose their head vote rewards, but this is the same regardless of boosting.

Therefore, Lighthouse nodes stand to miss slightly more head votes than other nodes if they are in the minority while applying the proposer boost. Once the proposer boost nodes gain a majority, this trend reverses.

## Ex post re-org (using the boost)

The other type of re-org is an ex post re-org using the strategy described here: sigp#2860. With this strategy, boosted nodes will follow the attempted re-org and again lose a head vote if the re-org is unsuccessful. Once boosting is widely adopted, the re-orgs will succeed and the non-boosting validators will lose out.

I don't think there are (m)any validators applying this strategy, because it is irrational to attempt it before boosting is widely adopted. Therefore I think we can safely ignore this possibility.

## Risk Assessment

From observing re-orgs on mainnet I don't think ex ante re-orgs are very common. I've observed around 1 per day for the last month on my node (see: https://gist.github.com/michaelsproul/3b2142fa8fe0ff767c16553f96959e8c), compared to 2.5 ex post re-orgs per day.

Given one extra slot per day where attesting will cause a missed head vote, each individual validator has a 1/32 chance of being assigned to that slot. So we have an increase of 1/32 missed head votes per validator per day in expectation. Given that we currently see ~7 head vote misses per validator per day due to late/missing blocks (and re-orgs), this represents only a (1/32)/7 = 0.45% increase in missed head votes in expectation. I believe this is so small that we shouldn't worry about it. Particularly as getting proposer boost deployed is good for network health and may enable us to drive down the number of late blocks over time (which will decrease head vote misses).

## TL;DR

Enable proposer boost now and release ASAP, as financial downside is a 0.45% increase in missed head votes until widespread adoption.
## Proposed Changes

Increase the default `--slots-per-restore-point` to 8192 for a 4x reduction in freezer DB disk usage.

Existing nodes that use the previous default of 2048 will be left unchanged. Newly synced nodes (with or without checkpoint sync) will use the new 8192 default. 

Long-term we could do away with the freezer DB entirely for validator-only nodes, but this change is much simpler and grants us some extra space in the short term. We can also roll it out gradually across our nodes by purging databases one by one, while keeping the Ansible config the same.

## Additional Info

We ignore a change from 2048 to 8192 if the user hasn't set the 8192 explicitly. We fire a debug log in the case where we do ignore:

```
DEBG Ignoring slots-per-restore-point config in favour of on-disk value, on_disk: 2048, config: 8192
```
## Issue Addressed

N/A

## Proposed Changes

Fix the upper bound for blocks by root responses to be equal to the max merge block size instead of altair.
Further make the rpc response limits fork aware.
## Proposed Changes

I did some gardening 🌳 in our dependency tree:

- Remove duplicate versions of `warp` (git vs patch)
- Remove duplicate versions of lots of small deps: `cpufeatures`, `ethabi`, `ethereum-types`, `bitvec`, `nix`, `libsecp256k1`.
- Update MDBX (should resolve sigp#3028). I tested and Lighthouse compiles on Windows 11 now.
- Restore `psutil` back to upstream
- Make some progress updating everything to rand 0.8. There are a few crates stuck on 0.7.

Hopefully this puts us on a better footing for future `cargo audit` issues, and improves compile times slightly.

## Additional Info

Some crates are held back by issues with `zeroize`. libp2p-noise depends on [`chacha20poly1305`](https://crates.io/crates/chacha20poly1305) which depends on zeroize < v1.5, and we can only have one version of zeroize because it's post 1.0 (see rust-lang/cargo#6584). The latest version of `zeroize` is v1.5.4, which is used by the new versions of many other crates (e.g. `num-bigint-dig`). Once a new version of chacha20poly1305 is released we can update libp2p-noise and upgrade everything to the latest `zeroize` version.

I've also opened a PR to `blst` related to zeroize: supranational/blst#111
## Issue Addressed

This resolves errors related to the glibc version of the downloaded mdbook binaries.

Currently the mdbook job is failing on `unstable`: https://github.com/sigp/lighthouse/runs/5785245715?check_suite_focus=true
## Proposed Changes

Cut release v2.2.0 including proposer boost.

## Additional Info

I also updated the clippy lints for the imminent release of Rust 1.60, although LH v2.2.0 will continue to compile using Rust 1.58 (our MSRV).
## Issue Addressed

NA

## Proposed Changes

- Adds more checks to prevent importing blocks atop parent with invalid execution payloads.
- Adds a test for these conditions.

## Additional Info

NA
## Issue Addressed

NA

## Proposed Changes

Ensures that a `VALID` response from a `forkchoiceUpdate` call will update that block in `ProtoArray`.

I also had to modify the mock execution engine so it wouldn't return valid when all payloads were supposed to be some other static value.

## Additional Info

NA
Changed SPRP to the correct default value of 8192.
## Issue Addressed

- sigp#3058 

Co-authored-by: Paul Hauner <[email protected]>
## Issue Addressed

N/A

## Proposed Changes

sigp#3133 changed the rpc type limits to be fork aware i.e. if our current fork based on wall clock slot is Altair, then we apply only altair rpc type limits. This is a bug because phase0 blocks can still be sent over rpc and phase 0 block minimum size is smaller than altair block minimum size. So a phase0 block with `size < SIGNED_BEACON_BLOCK_ALTAIR_MIN` will return an `InvalidData` error as it doesn't pass the rpc types bound check.

This error can be seen when we try syncing pre-altair blocks with size smaller than `SIGNED_BEACON_BLOCK_ALTAIR_MIN`.

This PR fixes the issue by also accounting for forks earlier than current_fork in the rpc limits calculation in the  `rpc_block_limits_by_fork` function. I decided to hardcode the limits in the function because that seemed simpler than calculating previous forks based on current fork and doing a min across forks. Adding a new fork variant is simple and can the limits can be easily checked in a review. 

Adds unit tests and modifies the syncing simulator to check the syncing from across fork boundaries. 
The syncing simulator's block 1 would always be of phase 0 minimum size (404 bytes) which is smaller than altair min block size (since block 1 contains no attestations).
## Issue Addressed

NA

## Proposed Changes

Fixes an issue introduced in sigp#3088 which was causing unnecessary `crit` logs on networks without Bellatrix enabled.

## Additional Info

NA
## Issue Addressed

Addresses sync stalls on v2.2.0 (i.e. sigp#3147).

## Additional Info

I've avoided doing a full `cargo update` because I noticed there's a new patch version of libp2p and thought it could do with some more testing.



Co-authored-by: Paul Hauner <[email protected]>
## Issue Addressed

In very rare occasions we've seen most if not all our peers in a chain with which we don't agree. Purging these peers can take a very long time: number of retries of the chain. Meanwhile sync is caught in a loop trying the chain again and again. This makes it so that we fast track purging peers via registering the failed chain to prevent retrying for some time (30 seconds). Longer times could be dangerous since a chain can fail if a batch fails to download for example. In this case, I think it's still acceptable to fast track purging peers since they are nor providing the required info anyway 

Co-authored-by: Divma <[email protected]>
## Issue Addressed

NA

## Proposed Changes

Disallow the production of attestations and retrieval of unaggregated attestations when they reference an optimistic head. Add tests to this end.

I also moved `BeaconChain::produce_unaggregated_attestation_for_block` to the `BeaconChainHarness`. It was only being used during tests, so it's nice to stop pretending it's production code. I also needed something that could produce attestations to optimistic blocks in order to simulate scenarios where the justified checkpoint is determined invalid (if no one would attest to an optimistic block, we could never justify it and then flip it to invalid).

## Additional Info

- ~~Blocked on sigp#3126~~
## Issue Addressed
We still ping peers that are considered in a disconnecting state

## Proposed Changes

Do not ping peers once we decide they are disconnecting
Upgrade logs about ignored rpc messages

## Additional Info
--
## Issue Addressed

N/A

## Proposed Changes

Previously, we were using `Sleep::is_elapsed()` to check if the shutdown timeout had triggered without polling the sleep. This PR polls the sleep timer.
@realbigsean realbigsean deleted the branch realbigsean:kiln-mev-boost November 21, 2023 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants