[Merged by Bors] - Maintain attestations that reference unknown blocks #2319
Conversation
Hi Lighthouse team, we are looking to merge a Prysm bug fix. Background: currently, Prysm validators always wait 1/3 of a slot before sending out attestations. We are a bit hesitant to merge the "timely" fix because @potuz pointed out that Lighthouse doesn't hold attestations for unknown blocks. Merging the fix may impact large stakers, who often propose and attest in the same slot. I'm wondering if you have any thoughts on whether we can merge our timely fix, or a timeline for #2319? Thanks so much!
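For context, the timing in question looks roughly like this (a minimal sketch; the helper name is hypothetical and `SECONDS_PER_SLOT` is the mainnet value, not read from a preset):

```rust
use std::time::Duration;

// Mainnet slot length; other presets can differ.
const SECONDS_PER_SLOT: u64 = 12;

/// Offset into a slot at which a validator broadcasts its attestation
/// when it has not yet seen a block for that slot.
fn attestation_deadline() -> Duration {
    Duration::from_secs(SECONDS_PER_SLOT) / 3 // 4s into a 12s slot
}
```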
Heyo! This looks really good; I can't fault it structurally. I had a couple of suggestions, which I've either added to my branch or made as single-line GitHub suggestions.
Feel free to merge or cherry-pick from my branch, as you see fit. There are some cases where I've made a comment on this PR and then implemented my suggestion on my branch; in such cases I've also provided a link to the specific commit.
For convenience, here's a full diff of my suggested changes: https://github.com/paulhauner/lighthouse/compare/e762f175b753fd61469a532ff139844265a76998..dca09d012155f6224291b0a56f009aa6a61ef0a2
I hope that providing actual commits instead of just PR comments is helpful. I've found that it's easier to communicate these things in Rust rather than English, plus it helps me filter out some dumb suggestions before they waste your time 😅
beacon_node/network/src/beacon_processor/work_reprocessing_queue.rs
const MAXIMUM_QUEUED_BLOCKS: usize = 16;

/// How many attestations we keep before new ones get dropped.
const MAXIMUM_QUEUED_ATTESTATIONS: usize = 1_024;
@dapplion astutely pointed out that this value may be too low if a large percentage of validators attest early. Assuming 250k validators, that's 7.8k per slot. We could set the max capacity to 8192 to give us a bit of headroom. What do you think @paulhauner, @divagant-martian?
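The arithmetic behind that estimate, as a quick sketch (the helper is hypothetical, purely to illustrate the numbers):

```rust
const SLOTS_PER_EPOCH: u64 = 32;

/// Expected attesters per slot if duties are spread evenly over an epoch.
fn attesters_per_slot(active_validators: u64) -> u64 {
    active_validators / SLOTS_PER_EPOCH
}

fn main() {
    let per_slot = attesters_per_slot(250_000); // 7_812
    // 8_192 = 2^13 is the next power of two, leaving ~5% headroom.
    assert!(per_slot < 8_192);
}
```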
I agree. I also expected this to be a bit small, but was planning to just run it and see what happens whilst it's in `unstable`.
That being said, I think 8,192 is a better starting point.
Why not compute these numbers on start-up depending on the preset constants?
> Why not compute these numbers on start-up depending on the preset constants?
I'm not sure I understand. Are you saying to compute it based on the current validator count?
I'm hesitant to over-think this; I'm keen to get it onto our staging infrastructure, monitor it in Grafana and then start tweaking. The worst case is that the queue fills up and we fall back to the same behaviour as before this PR.
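For what it's worth, the preset-driven version could look something like this (a sketch only, with assumed names; Lighthouse's actual preset plumbing differs):

```rust
/// Derive the attestation queue bound from chain parameters instead of
/// hard-coding it. `slots_per_epoch` would come from the preset and
/// `active_validators` from the head state at start-up.
fn max_queued_attestations(slots_per_epoch: u64, active_validators: u64) -> usize {
    // Ceiling division, then round up to the next power of two for headroom.
    let per_slot = (active_validators + slots_per_epoch - 1) / slots_per_epoch;
    per_slot.next_power_of_two() as usize
}
```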
Co-authored-by: Paul Hauner <[email protected]>
Hey @divagant-martian, just checking if this is still "waiting on author" or if it's "ready for review" 🙏
I've marked this as ready-for-review as per offline discussions with @divagant-martian. Feel free to remove the label if I misread Diva 🙂
Hey! I'm back. It's ready for review. I included all your proposed changes via the merge. It was missing a decision about the queue length, but since that's solved you can give it a check again :)
I think this needs to merge
Merge went out fine. What kind of conflicts do you think we are not noticing? Also, it seems there is now a cargo audit issue that I'm checking.
Ah cool. I thought there might be some
This is great, happy to merge once `cargo-audit` is appeased. I think @michaelsproul is working on that.
bors r+
## Issue Addressed

#635

## Proposed Changes

- Keep attestations that reference a block we have not seen for 30 seconds before they are reprocessed.
- If we do import the block before that time elapses, they are reprocessed at that moment.
- The first time an attestation fails, do nothing with respect to gossipsub propagation or peer downscoring. If it fails again after being reprocessed, downscore the peer with a `LowToleranceError` and ignore the message.

A minimal sketch of that flow follows below.
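This sketch uses assumed types and field names; the real implementation lives in `beacon_node/network/src/beacon_processor/work_reprocessing_queue.rs` and is async/channel-based rather than a bare struct:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const QUEUED_ATTESTATION_DELAY: Duration = Duration::from_secs(30);

type BlockRoot = [u8; 32];

struct QueuedAttestation {
    block_root: BlockRoot,
    queued_at: Instant,
}

#[derive(Default)]
struct ReprocessQueue {
    // Attestations keyed by the unknown block root they reference.
    awaiting_block: HashMap<BlockRoot, Vec<QueuedAttestation>>,
}

impl ReprocessQueue {
    /// First failure: queue the attestation instead of penalising the peer.
    fn queue(&mut self, att: QueuedAttestation) {
        self.awaiting_block.entry(att.block_root).or_default().push(att);
    }

    /// On block import: hand back everything waiting on that root for
    /// immediate reprocessing.
    fn on_block_imported(&mut self, root: &BlockRoot) -> Vec<QueuedAttestation> {
        self.awaiting_block.remove(root).unwrap_or_default()
    }

    /// Periodically: expire attestations older than 30s for one final
    /// attempt; a failure at this point downscores the peer with a
    /// `LowToleranceError` and ignores the message.
    fn expire(&mut self, now: Instant) -> Vec<QueuedAttestation> {
        let mut expired = Vec::new();
        for queued in self.awaiting_block.values_mut() {
            let mut i = 0;
            while i < queued.len() {
                if now.duration_since(queued[i].queued_at) >= QUEUED_ATTESTATION_DELAY {
                    expired.push(queued.remove(i));
                } else {
                    i += 1;
                }
            }
        }
        self.awaiting_block.retain(|_, v| !v.is_empty());
        expired
    }
}
```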
Pull request successfully merged into unstable. Build succeeded.