Add ReconnectModifyIndex to handle reconnect lifecycle #14948

Closed · wants to merge 2 commits

Conversation

@DerekStrickland (Contributor) commented Oct 18, 2022

Closes #14925

This PR fixes a bug where, if an allocation with max_client_disconnect configured is on a node that disconnects and later reconnects, subsequent jobspec changes for that job are ignored until the max_client_disconnect interval expires. Prior to this change, Allocation.Reconnected naively checked only the time of the last reconnect event against the expiry.

This PR:

  • Adds a ReconnectModifyIndex field to the Allocation struct.
  • Updates the alloc runner to update the alloc ReconnectModifyIndex when a reconnect is processed by the client
  • Modifies Client.allocSync to send the ReconnectModifyIndex when syncing client managed attributes
  • Modifies Node.UpdateAlloc to persist the incoming ReconnectModifyIndex when generating reconnect evals
  • Renames Allocation.Reconnected to Allocation.IsReconnecting
  • Refactors Allocation.IsReconnecting to compare the ReconnectModifyIndex to the AllocModifyIndex to determine if an allocation is reconnecting (see the sketch after this list)
  • Updates all related code to match the new name and test the new logic
  • Updates GenericScheduler.computeJobAllocs to reset the ReconnectModifyIndex to 0 when processing reconnectUpdates and appends them to Plan.NodeAllocation so that the updates get persisted
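For illustration, here is a minimal sketch of the new comparison, using a stand-in type with only the relevant fields. The exact predicate (non-zero and equal to AllocModifyIndex) is an assumption based on the description above, not the PR's literal code.

package main

import "fmt"

// Stand-in for structs.Allocation with only the fields involved in
// the reconnect check; the real struct has many more fields.
type Allocation struct {
	// AllocModifyIndex is bumped by the server on each alloc update.
	AllocModifyIndex uint64
	// ReconnectModifyIndex is stamped by the client when it processes
	// a reconnect and reset to 0 by the scheduler once handled.
	ReconnectModifyIndex uint64
}

// IsReconnecting reports whether a reconnect is still pending: the
// client-stamped index still matches the alloc's AllocModifyIndex.
func (a *Allocation) IsReconnecting() bool {
	return a.ReconnectModifyIndex != 0 &&
		a.ReconnectModifyIndex == a.AllocModifyIndex
}

func main() {
	a := &Allocation{AllocModifyIndex: 42, ReconnectModifyIndex: 42}
	fmt.Println(a.IsReconnecting()) // true until the scheduler resets the index
}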

@DerekStrickland self-assigned this Oct 18, 2022
@tgross self-requested a review Oct 18, 2022
@DerekStrickland changed the title "Update alloc ModifyTime when reconnect is processed" to "Add ReconnectModifyIndex to handle reconnect lifecycle" Oct 21, 2022
@DerekStrickland marked this pull request as ready for review Oct 21, 2022
@DerekStrickland added the theme/edge, type/bug, backport/1.3.x (backport to 1.3.x release line), and backport/1.4.x (backport to 1.4.x release line) labels Oct 21, 2022
@DerekStrickland added this to the 1.4.2 milestone Oct 21, 2022
@tgross (Member) left a comment

@lgfa29 it looks like there are some correctness issues here around the state store; let's chat internally about carrying this PR.

Comment on lines +9733 to +9734
// ReconnectModifyIndex is used to determine if the server has processed the node reconnect.
ReconnectModifyIndex uint64
@tgross (Member)

We should make sure this gets onto the api.Allocation struct as well.
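For reference, a hedged sketch of what that could look like; the api package's real Allocation type has many more fields, and the placement shown here is illustrative.

package api

// Allocation mirrors the server-side structs.Allocation for API
// consumers. Only the fields relevant to this PR are shown.
type Allocation struct {
	ID               string
	AllocModifyIndex uint64
	// ReconnectModifyIndex would mirror the new server-side field so
	// API consumers can observe reconnect handling (sketch only, not
	// the actual change).
	ReconnectModifyIndex uint64
}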

Comment on lines +1351 to +1352
// Set the reconnect modify index so that the scheduler can track that the reconnect has not been processed.
alloc.ReconnectModifyIndex = ar.Alloc().AllocModifyIndex
@tgross (Member)

This value comes from the server:

// AllocModifyIndex is not updated when the client updates allocations. This
// lets the client pull only the allocs updated by the server.

But that made me remember that there are two code paths in the state store for updating allocations: one for upserting allocs from the server and one for updating allocs from the client. In any case, neither of them handles the ReconnectModifyIndex field, because for existing allocations (which is what we care about here) we copy the existing Allocation and then merge the needed fields over before inserting.

So we're not actually updating this field in Nomad's state.
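A minimal sketch of the copy-then-merge pattern being described, and where the new field would have to be merged; the helper name and field set are illustrative, not the actual state store code.

package main

import "fmt"

// Minimal stand-in for structs.Allocation (illustrative fields only).
type Allocation struct {
	ClientStatus         string
	ReconnectModifyIndex uint64
}

// mergeClientUpdate sketches the client-update path: copy the stored
// allocation, merge only the client-authoritative fields onto the
// copy, then insert the copy. Without an explicit merge line for
// ReconnectModifyIndex, the field silently keeps the stored value,
// which is the gap being pointed out here.
func mergeClientUpdate(existing, update *Allocation) *Allocation {
	merged := *existing // never mutate the object the state store holds
	merged.ClientStatus = update.ClientStatus
	merged.ReconnectModifyIndex = update.ReconnectModifyIndex // the missing merge
	return &merged
}

func main() {
	stored := &Allocation{ClientStatus: "unknown"}
	update := &Allocation{ClientStatus: "running", ReconnectModifyIndex: 42}
	fmt.Println(mergeClientUpdate(stored, update).ReconnectModifyIndex) // 42
}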

@DerekStrickland (Contributor, Author)

So in client.go at line 2033 it gets added to the stripped alloc during allocSync. That then gets sent to Node.UpdateAlloc, which updates state and triggers an eval. When the eval fires, the index is set; we then have to unset it when applying the plan. Have you tried it out? I had logging in here during development showing it all flowed through.
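For context, a hedged sketch of that stripped update; the helper name and field set are illustrative, and the real client code copies more client-managed fields (task states, deployment status, and so on).

package client

// Allocation is a minimal stand-in for structs.Allocation.
type Allocation struct {
	ID                   string
	NodeID               string
	ClientStatus         string
	ReconnectModifyIndex uint64
}

// stripAllocForSync sketches the stripped update allocSync sends:
// only client-managed fields, now carrying ReconnectModifyIndex so
// Node.UpdateAlloc can persist it and generate the reconnect eval.
func stripAllocForSync(alloc *Allocation) *Allocation {
	return &Allocation{
		ID:                   alloc.ID,
		NodeID:               alloc.NodeID,
		ClientStatus:         alloc.ClientStatus,
		ReconnectModifyIndex: alloc.ReconnectModifyIndex,
	}
}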

@tgross (Member)

> which then updates state

That's the bit where I don't see how it's happening. Any update of an existing object takes a copy first (ref state_store.go#L3474) and then modifies the copy before inserting it. So if we haven't pulled in information that the client is authoritative on, the state isn't getting updated for the transaction.

I haven't had a chance to test it out thoroughly (still trying to get 1.4.2 out! 😁), but I suspect the reason it's "working" right now is the state store corruption on line 1223. That will appear correct under some circumstances but won't have gone through raft correctly.

@@ -1220,6 +1220,7 @@ func (n *Node) UpdateAlloc(args *structs.AllocUpdateRequest, reply *structs.Gene
 if evalTriggerBy != structs.EvalTriggerJobDeregister &&
 	alloc.ClientStatus == structs.AllocClientStatusUnknown {
 	evalTriggerBy = structs.EvalTriggerReconnect
+	alloc.ReconnectModifyIndex = allocToUpdate.ReconnectModifyIndex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment corrupts the state store because alloc hasn't been copied after being queried from the state store. I'm fairly certain this line isn't needed at all, since allocToUpdate is what gets added to the batch of updates, not alloc.
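To make the hazard concrete, a minimal sketch of the unsafe write and the usual copy-first remedy, using a stand-in type; the real structs.Allocation.Copy is a deep copy.

package state

// Allocation is a minimal stand-in; structs.Allocation has many more
// fields and a deep Copy method.
type Allocation struct {
	ReconnectModifyIndex uint64
}

// Copy returns a shallow copy (the real Copy is deeper).
func (a *Allocation) Copy() *Allocation {
	c := *a
	return &c
}

// applyUpdate shows the safe pattern. Writing through fromState would
// mutate the object the state store still holds, outside a properly
// raft-applied transaction; copying first avoids that.
func applyUpdate(fromState, update *Allocation) *Allocation {
	// Unsafe: fromState.ReconnectModifyIndex = update.ReconnectModifyIndex
	safe := fromState.Copy()
	safe.ReconnectModifyIndex = update.ReconnectModifyIndex
	return safe
}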

@tgross modified the milestones: 1.4.2, 1.4.x Oct 24, 2022
@tgross (Member) commented Oct 24, 2022

Per our discussion, moving this out of 1.4.2 so that we don't risk rushing it out.

@mikenomitch (Contributor) commented Oct 27, 2022

Passing in some feedback from a customer. I think it might be related to this underlying issue since it is max_client_disconnect related, but I am not sure.

I found a scenario where I see duplicated ALLOC_INDEXes in one job.
Below are the steps to reproduce (Nomad 1.3.5):

We have a job with count=2 and max_client_disconnect set, running as below:
NOMAD_ALLOC_INDEX=0 - NodeA
NOMAD_ALLOC_INDEX=1 - NodeB

We stop the Nomad agent on NodeA; after that we temporarily have 3 allocations:
NOMAD_ALLOC_INDEX=0 - NodeA (Unknown state)
NOMAD_ALLOC_INDEX=1 - NodeB
NOMAD_ALLOC_INDEX=0 - NodeC

When we start the agent on NodeA, we again have 2 allocations; the first was recovered:
NOMAD_ALLOC_INDEX=0 - NodeA
NOMAD_ALLOC_INDEX=1 - NodeB
NodeC - allocation terminated here - as expected

I change count from 2 to 3, and the 3rd allocation appears with the same ALLOC_INDEX as the first one!
NOMAD_ALLOC_INDEX=0 - NodeA
NOMAD_ALLOC_INDEX=1 - NodeB
NOMAD_ALLOC_INDEX=0 - NodeD

Apart from that, in the state with duplicated ALLOC_INDEXes, the exec feature in the Nomad UI stopped working properly (for the affected job only).

Does this seem related or should I make a new issue?

@lgfa29 (Contributor) commented Oct 27, 2022

I think I understand the problem now 😅

I have an alternative approach in #15068 that I think makes the disconnect/reconnect flows more similar, and therefore easier to understand, but it's still early work. I will keep investigating the problem to see which solution would be better.

@lgfa29 (Contributor) commented Oct 27, 2022

@mikenomitch I think this may be related to this problem. From our docs on NOMAD_ALLOC_INDEX:

> The index is unique within a given version of a job

I think #14925 may prevent the job version from changing, which means you could end up with reused indexes.

But it may be better to open a separate issue just in case. If it's the same problem we can close both issues.

@lgfa29 (Contributor) commented Nov 2, 2022

Closing this in favour of #15068.

Thanks for all the work and guidance on this issue, @DerekStrickland!

@lgfa29 closed this Nov 2, 2022
@DerekStrickland (Contributor, Author)

@lgfa29 I'm glad you found a good solution!

github-actions bot commented Mar 3, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2023
Labels
backport/1.3.x (backport to 1.3.x release line), backport/1.4.x (backport to 1.4.x release line), theme/edge, type/bug
Development

Successfully merging this pull request may close these issues.

Job updates not applied to reconnected allocations
4 participants