-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup validator uptime computation #1270
Conversation
consensus/istanbul/uptime/uptime.go
Outdated
logger := um.logger.New("func", "Backend.updateValidatorScores", "epoch", epoch) | ||
logger.Trace("Updating validator scores") | ||
|
||
// The denominator is the (last block - first block + 1) of the val score tally window |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be clearer if we explicitly say this is the maximum possible uptime score
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree that denominator
doesn't really inform anything.
It's not the max uptime score (since score it's a ratio), but the total blocks that we are monitoring
So, what about:
// The totalMonitoredBlocks are the total number of block on which we monitor uptime for the epoch
totalMonitoredBlocks := um.MonitoringWindowSize(epoch)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definite improvement, but I think there's further we can go while we're at it. Added a bunch of comments, you can decide which of them make sense.
consensus/istanbul/uptime/uptime.go
Outdated
// than couting up to the second to last one. | ||
lastBlockToMonitor := epochLastBlock - 1 | ||
|
||
return firstBlockToMonitor, lastBlockToMonitor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ask the question you mentioned to me previously, what happens if lastBlockToMontor < firstBlockToMonitor
? (i.e. if lookbackWindowSize > epochSize - 2
). Do we have safeguards to prevent that from happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll review this on when finishing #1136
{"tally on first epoch", args{1, 10, 2}, 1 + 2, 9}, | ||
{"tally on second epoch", args{2, 10, 2}, 11 + 2, 19}, | ||
{"lookback window too big", args{1, 10, 10}, 11, 9}, | ||
// TODO: Add test cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO intended to be completed before finalizing the PR? If not, I would remove it.
}{ | ||
{"tally on first epoch", args{1, 10, 2}, 1 + 2, 9}, | ||
{"tally on second epoch", args{2, 10, 2}, 11 + 2, 19}, | ||
{"lookback window too big", args{1, 10, 10}, 11, 9}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm skeptical about including a case with an invalid combination of lookback window size and epoch size. Is this a case that we are supposed to support or that should lead to a fatal error somewhere? If it's the latter, we should test elsewhere does it does lead to a fatal error, and not test it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll review this on when finishing #1136
4bc78f9
to
1923025
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much improved. Left some minor comments here and there. Also, TestUptimeStorage
is failing due to stray references to ScoreTally
. That made me wonder: since the uptime store uses the db, would renaming a field cause problems when it comes to backwards compatibility?
Also, github says there are now merge conflicts (my guess is it's since 1.9.12 is merged to master now) 😬
Fixed the stray reference. About the field renaming, it doesn't cause problems. since RLP (the encoding) it's unaware of the names, only the structure of the type.. here's a test I've just did to verify this func TestUptimeStorageChange(t *testing.T) {
db := NewMemoryDatabase()
epoch := uint64(0)
// Create a test uptime to move around the database and make sure it's really new
if entry := ReadAccumulatedEpochUptime(db, epoch); entry != nil {
t.Fatalf("Non existent uptime returned: %v", entry)
}
expectedStoredUptime := &uptime.Uptime{
Entries: []uptime.UptimeEntry{
{UpBlocks: 0, LastSignedBlock: 1},
{UpBlocks: 2, LastSignedBlock: 2},
{UpBlocks: 8, LastSignedBlock: 8},
},
LatestBlock: 5,
}
type OldUptimeEntry struct {
// Numbers of blocks validator is considered UP within monitored window
UpBlocks uint64
LastSignedBlock uint64
}
type OldUptime struct {
LatestBlock uint64
Entries []OldUptimeEntry
}
// Write and verify the uptime in the database
uptime := &OldUptime{
Entries: []OldUptimeEntry{
{UpBlocks: 0, LastSignedBlock: 1},
{UpBlocks: 2, LastSignedBlock: 2},
{UpBlocks: 8, LastSignedBlock: 8},
},
LatestBlock: 5,
}
data, err := rlp.EncodeToBytes(uptime)
if err != nil {
t.Fatalf("Failed to RLP encode updated uptime %v", err)
}
if err := db.Put(uptimeKey(epoch), data); err != nil {
t.Fatalf("Failed to store updated uptime: %v", err)
}
if entry := ReadAccumulatedEpochUptime(db, epoch); entry == nil {
t.Fatalf("Stored uptime not found")
} else if !reflect.DeepEqual(entry, expectedStoredUptime) {
t.Fatalf("Retrieved uptime mismatch: have %v, want %v", entry, uptime)
}
// Delete the uptime and verify the execution
DeleteAccumulatedEpochUptime(db, epoch)
if entry := ReadAccumulatedEpochUptime(db, epoch); entry != nil {
t.Fatalf("Deleted uptime returned: %v", entry)
}
} |
Instead of one method for first and last block on window, have a method to get the window [first, last] pair. - Add godocs to a few utils.go methods - Add tests to a few utils.go methods
- Centralize all uptime related logic into istanbul/uptime pkg. - Use interface for storage to ease testing later
We define a `Window` type and then uptime block are the sum of all blocks where block is within monitoredWindow, and lastSignedBlock for the validator is within currentLookback window. Additionally, updateUptime() function now receives the block is to monitor and the corresponding singatures, instead of receiving a block and the parent block signatures. So math is simplified. MonitoringWindow() now returns the blocks whose signatures we want to monitor and not the blocks whose parentBlock signatures we want to monitor.
c41c105
to
42f0da3
Compare
Rebased to master to fix conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, very nice!
Description
Several people had trouble following current uptime scoring logic. It looks like a pool for off by one errors. This PR attempts to do some cleanup to make it more maintainable.
Window
which simplifies reasoning about UP blocksWe can now summarize uptime computation as:
monitoringWindow(epoch):
Range of blocks from an epoch where UP status will be monitor. Represents also the maximum number of blocks within an epoch a validator can be considered UPcurrentLookbackWindow(blockNumber)
Current range of blocks we consider to label a validator a UP. A valiadator is considered UP when it has sign at least a block in the currentLookbackWindowvalidator.UpBlocks
total number of blocks within an epoch a validator has been labeled as UPThen the score is:
Finally, for each block we process, the
LastSignedBlock
for each validator is updated, and thenvalidator.UPBlocks
is increased if:monitoringWindow.contains(blockNumber) && currentLookbackWindow.contains(lastSignedBlock)
Other changes
Tested
Related issues
Backwards compatibility
Yes