
validator-registration: simplify & optimize duty execution #2030

Open
wants to merge 12 commits into stage from beacon-client-clean-up-registrations-cache

Conversation

iurii-ssv
Contributor

@iurii-ssv iurii-ssv commented Feb 7, 2025

This PR clarifies the validator-registration flow and also aims to simplify & improve the way validator registrations are sent to the Beacon node. Namely, we want to:

  • submit each validator registration once per epoch (like Lighthouse or Prysm do)
  • submit newly produced registrations in the next slot (they already waited 10 epochs, so let's not make them wait another one)
  • to reduce BN load, avoid submitting everything at the same time (also keep in mind that multiple operators in a cluster can be connected to the same Beacon node, which might create a bottleneck scenario if not accounted for); see the sketch below
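
For illustration, one way to spread submissions deterministically across an epoch by pubkey (a sketch under assumed names; the actual selection logic from this PR appears in the review excerpt further down):

```go
package sketch

import (
	"crypto/sha256"
	"encoding/binary"
)

// shouldSubmitAt reports whether a validator's registration is due at the
// given slot by hashing its pubkey, so submissions spread evenly across
// the epoch's slots. Function name and signature are assumptions.
func shouldSubmitAt(pubkey [48]byte, slot, slotsPerEpoch uint64) bool {
	h := sha256.Sum256(pubkey[:8])
	descriptor := binary.LittleEndian.Uint64(h[:8])
	return descriptor%slotsPerEpoch == slot%slotsPerEpoch
}
```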

@iurii-ssv iurii-ssv requested a review from nkryuchkov February 7, 2025 13:11

codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 26.15385% with 48 lines in your changes missing coverage. Please review.

Project coverage is 47.8%. Comparing base (df0b510) to head (1c1639f).

Files with missing lines                   Patch %   Lines
beacon/goclient/validator.go               25.9%     38 Missing and 2 partials ⚠️
doppelganger/mock.go                       0.0%      6 Missing ⚠️
operator/duties/validatorregistration.go   50.0%     2 Missing ⚠️


Contributor

@y0sher y0sher left a comment


lgtm. Although, as you mentioned, I don't think this is a real issue: since the cache is keyed by pubkey, we overwrite each entry in place and the cache doesn't grow. There are edge cases where a validator is removed and its entry might stay in the cache forever unused.

Contributor

@oleg-ssvlabs oleg-ssvlabs left a comment


Good find. lgtm

@moshe-blox
Contributor

@iurii-ssv it may in theory grow perpetually, however it's bounded by the number of registered active validators, so it should never be a problem

whether we can drop the registrations or not depends on how often we should be submitting them

I recall that other validator clients submit every epoch, and if that's the case then we should probably stick to it

however, I did notice that we seem to submit all registrations every slot? if so, that does seem like a bit of an overkill 😄

we should probably be submitting for every validator only once per epoch

@iurii-ssv
Contributor Author

iurii-ssv commented Mar 20, 2025

I don't really know how validator registrations are supposed to work in full, but cleaning up the cache helps with "submitting too often / submitting too much data" (since we only submit when the cache is not empty)

submitting every epoch is another way to reduce the amount of requests/data sent, I guess, but

  • maybe cleaning up the cache (+ submitting every slot) is good enough?
  • maybe we want to submit frequently so that the submission delay is as low as possible (but maybe that doesn't really affect anything)

@moshe-blox ^

@moshe-blox
Contributor

moshe-blox commented Mar 20, 2025

I don't really know how validator registrations are supposed to work in full

I think that's once per epoch; you can verify it by reading other validator clients such as Lighthouse

if we remove from the cache, we'll only submit once per 10 epochs, which isn't ideal if the above is true

what we could maybe do is a hot & cold cache system, where we move submitted registrations to the cold cache, which is submitted only once per epoch (if slot%32 == 0), whereas the hot cache is submitted every slot to keep the delay small (see the sketch below)

maybe pendingRegistrations and activeRegistrations
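
A minimal sketch of that hot & cold split, using the hypothetical names suggested above (an illustration of the idea, not the code that landed in the PR):

```go
package sketch

import (
	"sync"

	"github.com/attestantio/go-eth2-client/api"
	"github.com/attestantio/go-eth2-client/spec/phase0"
)

// registrationCache sketches the hot/cold split: pendingRegistrations ("hot")
// entries are submitted every slot until first pickup, then moved to
// activeRegistrations ("cold"), which is re-submitted once per epoch.
type registrationCache struct {
	mu                   sync.Mutex
	pendingRegistrations map[phase0.BLSPubKey]*api.VersionedSignedValidatorRegistration
	activeRegistrations  map[phase0.BLSPubKey]*api.VersionedSignedValidatorRegistration
}

// selectForSlot returns the registrations due at the given slot, draining the
// hot cache into the cold one as it goes.
func (c *registrationCache) selectForSlot(slot phase0.Slot, slotsPerEpoch uint64) []*api.VersionedSignedValidatorRegistration {
	c.mu.Lock()
	defer c.mu.Unlock()

	var out []*api.VersionedSignedValidatorRegistration
	// Cold cache first: re-submitted only at the first slot of each epoch.
	if uint64(slot)%slotsPerEpoch == 0 {
		for _, r := range c.activeRegistrations {
			out = append(out, r)
		}
	}
	// Hot cache: always submitted, then moved to the cold cache.
	for pk, r := range c.pendingRegistrations {
		out = append(out, r)
		c.activeRegistrations[pk] = r
		delete(c.pendingRegistrations, pk)
	}
	return out
}
```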

@iurii-ssv iurii-ssv force-pushed the beacon-client-clean-up-registrations-cache branch from 2bae57f to 504d1ba on March 20, 2025 19:50
@iurii-ssv iurii-ssv changed the title from "beacon-client: clean up registrations cache" to "validator-registration: simplify & optimize duty execution" on Mar 20, 2025
@iurii-ssv iurii-ssv marked this pull request as draft March 20, 2025 19:51
@iurii-ssv iurii-ssv force-pushed the beacon-client-clean-up-registrations-cache branch from c1284f3 to 273e455 on March 21, 2025 11:09
@iurii-ssv iurii-ssv marked this pull request as ready for review March 21, 2025 11:15
@iurii-ssv iurii-ssv requested a review from olegshmuelov March 21, 2025 11:24
@iurii-ssv iurii-ssv force-pushed the beacon-client-clean-up-registrations-cache branch from a0ce8b0 to 8bbbfb0 on March 21, 2025 15:09
```go
// registrations is a set of validator-registrations (their latest versions) to be sent to
// Beacon node to ensure various entities in Ethereum network, such as Relays, are aware of
// participating validators
registrations map[phase0.BLSPubKey]*validatorRegistration
```
Contributor


Would it make sense to eventually expire old validator registrations from the registrations map (e.g., if a validator is exited/slashed or removed)?

Contributor Author

It would make sense to do that, but I don't think it would justify the added complexity (since "validator is exited/slashed or removed" events are rare, and node restarts solve it eventually)

Contributor

  1. validator X is initially registered with operators 1,2,3,4
  2. fee recipient Y is submitted via operator 1,2,3,4
  3. validator X switches to a new cluster: 2,3,4,5
  4. fee recipient is updated to Z
  5. operator 5 now submits registration with recipient Z
  6. but operator 1 (still holding old state) may still submit with Y
  7. if validator X gets a proposal duty in the meantime → block may be built with outdated fee recipient Y

Contributor Author

@iurii-ssv iurii-ssv Mar 25, 2025

Hmm, that's an interesting scenario you are describing.

I'm not sure how realistic/bad it actually is (maybe @moshe-blox or @y0sher could chime in on it), but from what I understand:

  • both Y and Z recipients "belong" to the "same user" (the owner of validator X), meaning he should be able to get those funds even though they might be sent to the old address due to all these circumstances
  • eventually (after a restart) operator 1 will stop sending out this old info, and so it will resolve itself; the user will only need to "have access" to his old address for maybe ~a week to pull those funds out of address Y

thus, if it's unlikely to ever happen, it doesn't seem too bad? wdyt

Also, that seems to be a problem for the stage version as well, right?

Contributor

  • both Y and Z recipients "belong" to the "same user" (the owner of Validator X) - meaning he should be able to get those funds even though they might be sent to old address due to all these circumstances

The assumption that "Y and Z belong to the same user" is risky.
In a permissionless system like Ethereum, the protocol can't rely on that being true.

Contributor

Also, that seems to be a problem for stage version as well, right ?

If registrations aren't removed from the cache after validator removals, then yes, this issue likely exists in both stage and production.

Contributor Author

I pretty much agree with all your input, just not sure where exactly it fits on our priority list - so I'll create an issue to document the problem for now (so we don't lose it) #2105, but it doesn't have to be a part of this PR

Comment on lines 87 to 110
```go
// Select registrations to submit.
gc.registrationMu.Lock()
allRegistrations := maps.Values(gc.registrations)
gc.registrationMu.Unlock()

registrations := make([]*api.VersionedSignedValidatorRegistration, 0)
for _, r := range allRegistrations {
	validatorPk, err := r.PubKey()
	if err != nil {
		gc.log.Error("Failed to get validator pubkey", zap.Error(err), fields.Slot(currentSlot))
		continue
	}

	// Distribute the registrations evenly across the epoch based on the pubkeys.
	slotInEpoch := uint64(currentSlot) % gc.network.SlotsPerEpoch()
	validatorHash := sha256.Sum256(validatorPk[:8])
	validatorDescriptor := binary.LittleEndian.Uint64(validatorHash[:])
	shouldSubmit := validatorDescriptor%gc.network.SlotsPerEpoch() == slotInEpoch

	if r.new || shouldSubmit {
		r.new = false
		registrations = append(registrations, r.VersionedSignedValidatorRegistration)
	}
}
```
Contributor

With ~1000 registrations max, the tradeoff between copying maps.Values() and filtering inside the lock seems minor. Was this mainly to reduce lock time and avoid blocking SubmitValidatorRegistration()?

setting r.new = false after unlocking could lead to a race — if the registration gets replaced before that line runs, it might cause unnecessary resubmissions.

Contributor Author

@iurii-ssv iurii-ssv Mar 25, 2025

setting r.new = false after unlocking could lead to a race — if the registration gets replaced before that line runs, it might cause unnecessary resubmissions

Right, I'm not sure if in the stage branch all racy behavior is avoided - but here in this PR we certainly have races like that - still, I think these are non-harmful (sending a registration a couple of extra times isn't that bad)

while on the upside it simplifies mutex usage quite a bit

With ~1000 registrations max, the tradeoff between copying maps.Values() and filtering inside the lock seems minor. Was this mainly to reduce lock time and avoid blocking SubmitValidatorRegistration()?

I think there's just no need to hold this mutex locked while filtering,

unless you want to use this mutex to make operations involving r.new atomic ... which IMO isn't worth the added complexity (it's also not super obvious that locking this mutex achieves that - a comment would help somewhat, but long comments aren't ideal either)

Contributor

I understand the race here is low-impact — re-submitting a registration isn’t a big deal.
That said, using maps.Values() and then mutating .new outside the lock introduces a data race on the struct itself. The lock protects the map, but not the underlying pointer, which may have already been replaced concurrently (e.g., via SubmitValidatorRegistration).
It might be worth either moving the mutation under the lock or rethinking how submission state is tracked — just to ensure consistency and avoid surprises if the logic evolves.

Contributor Author

@iurii-ssv iurii-ssv Mar 26, 2025

Oh wait, you are right, I meant for new to be atomic.Bool

I thought I'd defined it like that already, but looks like I forgot - changed it now 80f8d94 so the data race should no longer be an issue

Edit: on second thought, I think there isn't a data race in what you described above, because all SubmitValidatorRegistration does is a one-time initialization that needs to be "propagated" to the goroutine running the registrationSubmitter func (and gc.registrationMu actually ensures that "propagation"/synchronization happens correctly for us); after that, the goroutine running registrationSubmitter reads/writes that data address sequentially (nobody else reads/modifies it concurrently from that point forward - it is the only goroutine that accesses this address)

so I'm reverting 80f8d94 for now as unnecessary, @olegshmuelov let me know if I'm missing something

Contributor

Thanks for the detailed explanation! Appreciate you thinking it through.
Just worth noting - this is still a data race under the Go memory model, and relying on intuition like “only one goroutine accesses it after init” isn’t safe in concurrent code. Even benign races are better avoided, especially in infra-level systems.
That said, fine to leave it as-is if we agree the impact is negligible.
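
For reference, a minimal sketch of the atomic.Bool variant discussed in this thread (commit 80f8d94, later reverted); names mirror the excerpt above, but this is illustrative rather than the merged code:

```go
package sketch

import (
	"sync/atomic"

	"github.com/attestantio/go-eth2-client/api"
)

// validatorRegistration wraps a signed registration with an atomic "new" flag,
// letting the submitter goroutine read-and-clear it without holding a mutex,
// even if another goroutine inspects the entry concurrently.
type validatorRegistration struct {
	*api.VersionedSignedValidatorRegistration
	new atomic.Bool
}

// dueForSubmission reports whether the registration should be submitted now.
// Swap returns the previous value, so a fresh registration is picked up
// exactly once even under concurrent access.
func dueForSubmission(r *validatorRegistration, shouldSubmit bool) bool {
	return r.new.Swap(false) || shouldSubmit
}
```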

```go
	}

	// Distribute the registrations evenly across the epoch based on the pubkeys.
	slotInEpoch := uint64(currentSlot) % gc.network.SlotsPerEpoch()
```
Contributor

Minor: slotInEpoch := uint64(currentSlot) % gc.network.SlotsPerEpoch() could be moved outside the loop since it doesn't change per validator

Contributor Author

@iurii-ssv iurii-ssv Mar 25, 2025

True, but I thought trivial optimizations like that are done automatically by the compiler; from what I've read, the Go compiler is kinda weak in that sense though (compared to C++ or Java compilers, which are more aggressive) ...

but regardless, I would prefer code readability over some processing overhead (unless it's in some super-hot execution path)

cc @oleg-ssvlabs @moshe-blox @y0sher let me know if you think otherwise (just so we get on the same page about this)

@iurii-ssv iurii-ssv force-pushed the beacon-client-clean-up-registrations-cache branch from 7e385aa to 7722a70 on March 25, 2025 16:24
@olegshmuelov
Contributor

olegshmuelov commented Mar 25, 2025

This PR improves and clarifies how validator registrations are submitted to the Beacon Node. The design separates the process into:

  1. Duty Scheduling (ValidatorRegistrationHandler)
    • Every slot, the handler loads participating shares for epoch + frequencyEpochs (currently +10).
    • It executes validator-registration duties only if:
      uint64(share.ValidatorIndex) % registrationSlots == uint64(slot) % registrationSlots
  2. Duty Execution (ProcessPreConsensus) → Submission
    • Once pre-consensus quorum is reached, the duty sends a VersionedSignedValidatorRegistration to GoClient.SubmitValidatorRegistration().
  3. Registration Submission (registrationSubmitter)
    • Runs every slot and checks all registrations.
    • Submits:
      • fresh (.new) registrations immediately
      • all registrations once per epoch, distributed deterministically by pubkey hash
    • Batches submissions in groups of 500 to avoid BN overload (see the sketch below).
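
A condensed sketch of the per-slot submission loop in point 3; the batch size of 500 comes from the summary above, while submitForSlot and submitBatch are hypothetical names:

```go
package sketch

import "github.com/attestantio/go-eth2-client/api"

const batchSize = 500 // per the summary above: groups of 500 to avoid BN overload

// submitForSlot sends the registrations selected for this slot in
// fixed-size batches.
func submitForSlot(
	selected []*api.VersionedSignedValidatorRegistration,
	submitBatch func([]*api.VersionedSignedValidatorRegistration) error,
) error {
	for start := 0; start < len(selected); start += batchSize {
		end := start + batchSize
		if end > len(selected) {
			end = len(selected)
		}
		if err := submitBatch(selected[start:end]); err != nil {
			return err
		}
	}
	return nil
}
```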

Known Issues:
Delayed submission for newly added validators or fee recipient updates.
There's a timing edge case where a new validator registration or fee recipient update might not take effect in time for a proposal. Here's how it can happen:

  1. A validator is added to the SSV network, or its fee recipient is updated.
    This creates a new validator registration that needs to be submitted to the Beacon Node.
  2. The duty scheduler attempts to assign a BNRoleValidatorRegistration duty.
    • But duties are only scheduled if:
      validatorIndex % registrationSlots == slot % registrationSlots
    • If the validator's index doesn't match the current slot (especially near the end of the frequencyEpochs window), the duty might be skipped for now.
  3. In the meantime, a block proposal duty is triggered for that validator.
  4. The proposal is built and submitted with the old or fallback fee recipient (e.g., the owner address), since the Beacon Node hasn't received the updated validator registration yet.

another spec-side issue:
ssvlabs/ssv-spec#504
The ValidatorRegistration duty does not carry the fee recipient directly; instead, it dynamically reads it from shared state, which the FeeRecipientUpdate event handler can modify during the pre-consensus process.

This introduces a potential data race, where the fee recipient may change between signing-root calculation and actual signature generation.

If this happens, reconstructed signing roots won't match, causing signature reconstruction to fail and the duty to break; see the mitigation sketch below.
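
For illustration, a sketch of the straightforward mitigation: snapshot the fee recipient once and use the same value for both the signing root and the final message (types and names here are hypothetical, not ssv-spec code):

```go
package sketch

import "sync"

// feeRecipientStore models the shared state that the FeeRecipientUpdate
// event handler mutates concurrently.
type feeRecipientStore struct {
	mu        sync.RWMutex
	recipient [20]byte
}

func (s *feeRecipientStore) Get() [20]byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.recipient
}

// buildAndSign snapshots the fee recipient exactly once, so the value used
// for the signing root cannot diverge from the one placed in the message.
func buildAndSign(
	store *feeRecipientStore,
	signingRoot func(recipient [20]byte) [32]byte,
	sign func(root [32]byte) []byte,
) (recipient [20]byte, sig []byte) {
	recipient = store.Get() // read the shared state once
	root := signingRoot(recipient)
	return recipient, sign(root)
}
```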

@olegshmuelov
Contributor

Known Issues: Delayed submission for newly added validators or fee recipient updates. There's a timing edge case where a new validator registration or fee recipient update might not take effect in time for a proposal. (See the four-step scenario in the previous comment.)

If we aim to further improve and harden the validator registration flow, we should consider addressing at least one of the known issues above.
