Add ReconnectModifyIndex to handle reconnect lifecycle #14948
@lgfa29 it looks like there are some correctness issues here around the state store; let's chat internally about carrying this PR.
// ReconnectModifyIndex is used to determine if the server has processed the node reconnect.
ReconnectModifyIndex uint64
We should make sure this gets onto the api.Allocation struct as well.
// Set the reconnect modify index so that the scheduler can track that the reconnect has not been processed.
alloc.ReconnectModifyIndex = ar.Alloc().AllocModifyIndex
This value comes from the server:
// AllocModifyIndex is not updated when the client updates allocations. This
// lets the client pull only the allocs updated by the server.
But that made me remember there are two code paths in the state store for updating allocations: one for upserting allocs from the server and one for updating allocs from the client. In any case, neither of them handles the ReconnectModifyIndex field, because for existing allocations (which is what we care about here) we copy the existing Allocation and then merge the needed fields over before inserting.
So we're not actually updating this field in Nomad's state.
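To illustrate the concern, here is a minimal sketch of the copy-then-merge pattern described above. It is not Nomad's actual state store code: the `Allocation` type is pared down, and `StateStore`, `UpdateFromClient`, and the `mergeReconnectIndex` flag are hypothetical names invented for the example. It only shows that a field absent from the merge list is silently dropped.

```go
package main

import "fmt"

// Allocation is a pared-down, hypothetical stand-in for structs.Allocation.
type Allocation struct {
	ID                   string
	ClientStatus         string
	AllocModifyIndex     uint64
	ReconnectModifyIndex uint64
}

// Copy returns a copy of the allocation. A plain value copy is enough here
// because every field in this toy type is a value type.
func (a *Allocation) Copy() *Allocation {
	c := *a
	return &c
}

// StateStore is a toy in-memory store keyed by allocation ID (hypothetical).
type StateStore struct {
	allocs map[string]*Allocation
}

// UpdateFromClient mimics the copy-then-merge pattern: it copies the existing
// object and merges over only the fields it knows about. The flag exists only
// to contrast the two behaviours in this example.
func (s *StateStore) UpdateFromClient(update *Allocation, mergeReconnectIndex bool) {
	existing, ok := s.allocs[update.ID]
	if !ok {
		return
	}
	merged := existing.Copy()
	merged.ClientStatus = update.ClientStatus
	if mergeReconnectIndex {
		merged.ReconnectModifyIndex = update.ReconnectModifyIndex
	}
	s.allocs[update.ID] = merged
}

// newStore seeds the store with one allocation in the unknown state.
func newStore() *StateStore {
	return &StateStore{allocs: map[string]*Allocation{
		"a1": {ID: "a1", ClientStatus: "unknown", AllocModifyIndex: 42},
	}}
}

func main() {
	update := &Allocation{ID: "a1", ClientStatus: "running", ReconnectModifyIndex: 42}

	s := newStore()
	s.UpdateFromClient(update, false)
	fmt.Println("without merge:", s.allocs["a1"].ReconnectModifyIndex)

	s = newStore()
	s.UpdateFromClient(update, true)
	fmt.Println("with merge:", s.allocs["a1"].ReconnectModifyIndex)
}
```

Under these assumptions, the client's value only reaches state when the update path explicitly merges it.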
So in client.go line 2033 it gets added to the stripped alloc during allocSync. That then gets sent to Node Update, which then updates state and triggers an eval. When the eval fires, the index is set. We then have to unset it when applying the plan. Have you tried it out? I had logging in here during development showing it all flowed through.
which then updates state
That's the bit where I don't see how it's happening. Any update of an existing object takes a copy first (ref state_store.go#L3474) and then modifies the copy before inserting it. So if we haven't pulled in information that the client is authoritative on, the state isn't getting updated for the transaction.
I haven't had a chance to test it out thoroughly (still trying to get 1.4.2 out! 😁 ) but I suspect the reason it's "working" right now is because of the state store corruption on line 1223. That'll appear correct under some circumstances but won't have gone thru raft correctly.
@@ -1220,6 +1220,7 @@ func (n *Node) UpdateAlloc(args *structs.AllocUpdateRequest, reply *structs.Gene
if evalTriggerBy != structs.EvalTriggerJobDeregister &&
alloc.ClientStatus == structs.AllocClientStatusUnknown {
evalTriggerBy = structs.EvalTriggerReconnect
alloc.ReconnectModifyIndex = allocToUpdate.ReconnectModifyIndex
This assignment corrupts the state store because alloc hasn't been copied after being queried from the state store. I'm fairly certain this line isn't needed at all, as the allocToUpdate is what's getting added to the batch of updates and not alloc.
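A small sketch of why writing to an object straight out of the state store is a problem. The types and helpers here (`StateStore`, `AllocByID`, `mutateInPlace`, `mutateCopy`) are hypothetical and only model the aliasing: a query that returns the stored pointer lets a caller change state without going through a write transaction (and, in Nomad's case, without going through raft).

```go
package main

import "fmt"

// Allocation is a pared-down, hypothetical stand-in for structs.Allocation.
type Allocation struct {
	ID                   string
	ReconnectModifyIndex uint64
}

// Copy returns a copy that can be modified safely.
func (a *Allocation) Copy() *Allocation {
	c := *a
	return &c
}

// StateStore is a toy in-memory store (hypothetical).
type StateStore struct {
	allocs map[string]*Allocation
}

// AllocByID returns the stored pointer itself, not a copy, so callers must
// treat the result as read-only.
func (s *StateStore) AllocByID(id string) *Allocation {
	return s.allocs[id]
}

// mutateInPlace writes through the shared pointer. The store changes even
// though nothing was ever inserted through a write path.
func mutateInPlace(s *StateStore, id string, idx uint64) {
	alloc := s.AllocByID(id)
	alloc.ReconnectModifyIndex = idx
}

// mutateCopy copies first, so the stored object is left untouched and the
// caller gets back an object it can submit through the normal update path.
func mutateCopy(s *StateStore, id string, idx uint64) *Allocation {
	alloc := s.AllocByID(id).Copy()
	alloc.ReconnectModifyIndex = idx
	return alloc
}

func main() {
	s := &StateStore{allocs: map[string]*Allocation{"a1": {ID: "a1"}}}

	pending := mutateCopy(s, "a1", 7)
	fmt.Println("after copy:", s.AllocByID("a1").ReconnectModifyIndex, pending.ReconnectModifyIndex)

	mutateInPlace(s, "a1", 7)
	fmt.Println("after in-place write:", s.AllocByID("a1").ReconnectModifyIndex)
}
```

The in-place write can make later reads look correct on the node where it ran, which fits the "appears to work" behaviour mentioned earlier in the thread.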
Per our discussion, moving this out of 1.4.2 so that we don't risk rushing it out.
Passing in some feedback from a customer. I think it might be related to this underlying issue since it is max_client_disconnect related, but I am not sure.
Does this seem related or should I make a new issue?
I think I understand the problem now 😅 I have an alternative approach in #15068 that I think makes the disconnect/reconnect flows more similar and so easier to understand, but it's still an early work. I will keep investigating the problem to see which solution would be better.
@mikenomitch I think this may be related to this problem. From our docs on
I think #14925 may prevent the job version from changing, which means you could end up with reused indexes. But it may be better to open a separate issue just in case. If it's the same problem we can close both issues.
Closing this in favour of #15068. Thanks for all the work and guidance on this issue @DerekStrickland!
@lgfa29 I'm glad you found a good solution!
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
Closes #14925
This PR fixes a bug where if an allocation with max_client_disconnect configured is on a node that disconnects, and then the node reconnects, future jobspec changes for that job get ignored until the max_client_disconnect interval expires. Previous to this change, Allocation.Reconnected naively just checked the last reconnect event time and the expiry.
This PR:
- Adds a ReconnectModifyIndex field to the Allocation struct.
- Sets ReconnectModifyIndex when a reconnect is processed by the client.
- Updates Client.allocSync to send the ReconnectModifyIndex when syncing client managed attributes.
- Updates Node.UpdateAlloc to persist the incoming ReconnectModifyIndex when generating reconnect evals.
- Renames Allocation.Reconnected to Allocation.IsReconnecting.
- Updates Allocation.IsReconnecting to compare the ReconnectModifyIndex to the AllocModifyIndex to determine if an allocation is reconnecting.
- Updates GenericScheduler.computeJobAllocs to reset the ReconnectModifyIndex to 0 when processing reconnectUpdates and appends them to Plan.NodeAllocation so that the updates get persisted.
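The lifecycle in the description can be sketched roughly as follows. This is an illustration, not the PR's code: the description only says the two indexes are compared, so the exact predicate in `IsReconnecting` below (non-zero and equal to AllocModifyIndex) is an assumption, and `markReconnected` and `clearReconnect` are hypothetical helper names standing in for the client-side and scheduler-side steps.

```go
package main

import "fmt"

// Allocation is a pared-down, hypothetical stand-in for structs.Allocation.
type Allocation struct {
	AllocModifyIndex     uint64
	ReconnectModifyIndex uint64
}

// IsReconnecting reports whether a reconnect is still waiting to be processed.
// ASSUMPTION: the comparison used here is a guess at the intent; the PR text
// does not spell out the exact check.
func (a *Allocation) IsReconnecting() bool {
	return a.ReconnectModifyIndex != 0 &&
		a.ReconnectModifyIndex == a.AllocModifyIndex
}

// markReconnected models the client-side step: record the server's
// AllocModifyIndex at the time the reconnect was observed.
func markReconnected(a *Allocation) {
	a.ReconnectModifyIndex = a.AllocModifyIndex
}

// clearReconnect models the scheduler-side step: reset the index to 0 on a
// copy that would then be added to the plan so the reset is persisted.
func clearReconnect(a *Allocation) *Allocation {
	c := *a
	c.ReconnectModifyIndex = 0
	return &c
}

func main() {
	alloc := &Allocation{AllocModifyIndex: 7}
	fmt.Println("before reconnect:", alloc.IsReconnecting())

	markReconnected(alloc)
	fmt.Println("after client sync:", alloc.IsReconnecting())

	planned := clearReconnect(alloc)
	fmt.Println("after plan apply:", planned.IsReconnecting())
}
```

The point of tracking an index rather than a timestamp is that the scheduler can clear the marker once the reconnect is handled, so later jobspec changes are not mistaken for a reconnect still in progress.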