Nomad lifecycle prestart/non-sidecar task not restarting (before main task) when rebooting node #9840
Comments
Hi @johnzhanghua! I think there's some overlap here with what you're seeing in #9841 but...
I don't think garbage collection has anything to do with it in this case, especially given the status you're seeing. The only time the alloc dir would be garbage collected is if the entire allocation has been marked as failed (in this case that would typically be because the client has been offline long enough that the server has marked its allocations as lost). In that case, when the Nomad client comes back online, it will garbage collect the entire allocation, and it should be rescheduled. Individual tasks won't be restarted. So what you're seeing here is similar to #9841, but we're hitting a corner case where the prestart task isn't being re-run because we haven't re-entered the hook that triggers it when the entire host restarts. I'd be curious to see if we can replicate this with just a client restart and not a full host restart.
Hi @johnzhanghua I wanted to follow up on this one. Something that makes this a little complicated to figure out with respect to a client host restarting (and not just the agent) is that I would expect in almost all cases for all the allocations on the host to be marked "lost" if the client host has been restarted. Are you managing to get the host restarted and Nomad back up before the servers have marked the node as lost? Or are you running the server and client on the same node? In that case, the behavior for a host restart honestly isn't that well defined (because it's a poorly supported topology) but I'd expect to lose all tasks.
@tgross Yes, we are running the server and client on the same node. In the test it is a single node, but it also happens with multiple nodes when all the nodes are restarted. I will try restarting the nodes one after another and see how it goes.
Nomad version
Nomad v0.12.0 (8f7fbc8)
Operating system and Environment details
CentOS 7.5 VM environment on VirtualBox 6.1
Issue
For a `main` task that depends on the output of the `prestart` task (for example, an output file in the shared `alloc` dir, as in this case): when the Nomad host is restarted, the output file in the `alloc` dir has been garbage collected (a guess, not 100% sure), while the `pretask` is not executed, which leads to the main task always failing.
From the alloc status of the prestart task, it only shows the task received log:
From my understanding of the `main`/`prestart` task dependency (see the example of waiting for a database to be ready at https://www.nomadproject.io/docs/job-specification/lifecycle), it looks like a `prestart` task can't be a one-time-only job. Restarting the `main` task should always restart the `prestart` task.
Also, the reboot test below shows that there is no reliable way to pass the output from one task to another dependent task by using the shared alloc dir.
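For illustration only, here is a minimal sketch of the kind of job described above. It is not the original job file from this report. The job, group, and task names, the `exec` driver, the `dc1` datacenter, and the `prestart.out` file name are all assumptions. The non-sidecar `prestart` task writes a file into the shared alloc dir, and the `main` task fails if that file is missing.

```hcl
# Hypothetical job sketch; names, driver, and file path are assumptions.
job "prestart-demo" {
  datacenters = ["dc1"]
  type        = "service"

  group "app" {
    # Runs once before the main task and is expected to exit (not a sidecar).
    task "prepare" {
      driver = "exec"

      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      config {
        command = "/bin/sh"
        # Write the output file into the shared alloc dir.
        args = ["-c", "echo ready > ${NOMAD_ALLOC_DIR}/prestart.out"]
      }
    }

    # Main task: exits non-zero if the prestart output is missing.
    task "main" {
      driver = "exec"

      config {
        command = "/bin/sh"
        args    = ["-c", "cat ${NOMAD_ALLOC_DIR}/prestart.out && sleep 3600"]
      }
    }
  }
}
```

With a job shaped like this, the failure described above would show up as the `main` task repeatedly failing on the `cat` because `prestart.out` is gone and `prepare` has not been re-run.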
Reproduction steps
`client { gc_max_allocs = 1 }`, to reproduce the issue quickly
`pending` state
Job file (if appropriate)