
kv: Start nodes in a new status to prevent lease transfers #96980

Closed

Conversation

andrewbaptist
Contributor

Previously, when a store started it immediately became a target for lease and replica transfers. This could cause problems if the store was still recovering from being down, because it was behind on Raft updates. This patch delays publishing a membership status of ACTIVE until the store has worked through its Raft backlog and cleaned up its LSM.

Epic: none
Release note (ops change): A node now transitions through an additional STARTING state on startup.
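
A rough sketch of the kind of membership gate this describes; the enum values and helper below are illustrative placeholders, not the actual CockroachDB implementation.

package main

import "fmt"

// MembershipStatus is an illustrative stand-in for the liveness membership
// enum this PR extends; the real type and values may differ.
type MembershipStatus int

const (
	// STARTING: the node has rejoined but is still working through its
	// Raft backlog and LSM cleanup; it should not receive lease transfers.
	STARTING MembershipStatus = iota
	// ACTIVE: the node has caught up and is a valid lease-transfer target.
	ACTIVE
)

// isValidLeaseTarget sketches how an allocator could gate transfers on the
// proposed status; the function name is a placeholder, not the real API.
func isValidLeaseTarget(s MembershipStatus) bool {
	return s == ACTIVE
}

func main() {
	fmt.Println(isValidLeaseTarget(STARTING)) // false: still catching up
	fmt.Println(isValidLeaseTarget(ACTIVE))   // true
}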

@blathers-crl

blathers-crl bot commented Feb 10, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

Contributor

@irfansharif irfansharif left a comment


Using a new enum for liveness membership feels a tad heavy-handed and stateful for what we want to achieve. It's persisted state that we'd be using to effectively inform remote nodes to not transfer leases to a freshly started node. But on remote stores we already observe this freshly restarted node's liveness epoch getting bumped, which happens as part of the liveness heartbeat loop:

incrementEpoch := true
heartbeatInterval := nl.livenessThreshold - nl.renewalDuration
ticker := time.NewTicker(heartbeatInterval)
defer ticker.Stop()
for {
	select {
	case <-nl.heartbeatToken:
	case <-nl.stopper.ShouldQuiesce():
		return
	}
	// Give the context a timeout approximately as long as the time we
	// have left before our liveness entry expires.
	if err := contextutil.RunWithTimeout(ctx, "node liveness heartbeat", nl.renewalDuration,
		func(ctx context.Context) error {
			// Retry heartbeat in the event the conditional put fails.
			for r := retry.StartWithCtx(ctx, retryOpts); r.Next(); {
				oldLiveness, ok := nl.Self()
				if !ok {
					nodeID := nl.gossip.NodeID.Get()
					liveness, err := nl.getLivenessFromKV(ctx, nodeID)
					if err != nil {
						log.Infof(ctx, "unable to get liveness record from KV: %s", err)
						if grpcutil.IsConnectionRejected(err) {
							return err
						}
						continue
					}
					oldLiveness = liveness
				}
				if err := nl.heartbeatInternal(ctx, oldLiveness, incrementEpoch); err != nil {

And as of aece96b, we're also gossiping each node's IO overload score periodically. It seems to me then that we have all the information we need on remote stores. Am I missing something? Also note that by only using node-level state to prevent lease transfers, we're making multi-store setups behave a tad awkwardly; a single IO-overloaded store would prevent lease transfers to replicas on other stores that could very well have taken them.
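
A rough sketch of the per-store gating this suggests; the type, field names, and threshold below are placeholders for illustration, not the real allocator API.

package main

import "fmt"

// storeHealth stands in for the per-store information a remote node already
// has available (e.g. the gossiped IO overload score); not the real descriptor.
type storeHealth struct {
	storeID         int
	ioOverloadScore float64
}

// okToTransferLease gates on each store individually, so a single overloaded
// store on a node does not block transfers to its healthy sibling stores.
func okToTransferLease(s storeHealth, threshold float64) bool {
	return s.ioOverloadScore < threshold
}

func main() {
	overloaded := storeHealth{storeID: 1, ioOverloadScore: 0.9}
	healthy := storeHealth{storeID: 2, ioOverloadScore: 0.1}
	fmt.Println(okToTransferLease(overloaded, 0.5)) // false: skip this store
	fmt.Println(okToTransferLease(healthy, 0.5))    // true: fine to transfer
}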

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@andrewbaptist
Contributor Author

It's not quite right that we have all the information we need, but I am going to put this PR on hold pending the results of the allocator changes in #97142. The underlying problem is that there is a window (10 sec to 5 min) before we detect IO overload and can start using that signal effectively. During this window it is STILL bad to transfer leases to this store. So when I bring this back, I think the better approach is to wait only for Raft catchup rather than for the IO overload signal. I also need to wait for #97044 to complete, which will make it possible to accurately know the state of Raft.

Finally, I agree that this is a little heavy-handed since it is a node-level signal rather than a store-level one. Raft catchup is also store-level, so I can switch to using that once it is in place.
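
A rough sketch of a store-level Raft-catchup gate along these lines; the field name and the zero-backlog condition are assumptions for illustration, not the planned implementation.

package main

import "fmt"

// storeRaftProgress is a hypothetical summary of a store's Raft state;
// the field is illustrative only.
type storeRaftProgress struct {
	// replicasBehind counts replicas on this store whose applied state
	// still trails the Raft entries they are known to be missing.
	replicasBehind int
}

// readyForLeases only advertises the store as a lease-transfer target once
// it has worked through its Raft backlog.
func readyForLeases(p storeRaftProgress) bool {
	return p.replicasBehind == 0
}

func main() {
	fmt.Println(readyForLeases(storeRaftProgress{replicasBehind: 12})) // false: still catching up
	fmt.Println(readyForLeases(storeRaftProgress{replicasBehind: 0}))  // true: caught up
}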
