
Nomad lifecycle prestart/non-sidecar task not restarting (before main task) when node reboots #9840

Open
johnzhanghua opened this issue Jan 18, 2021 · 4 comments


johnzhanghua commented Jan 18, 2021

Nomad version

Nomad v0.12.0 (8f7fbc8)

Operating system and Environment details

CentOS 7.5 VM on VirtualBox 6.1

Issue

Consider a main task that depends on the output of a prestart task, in this case a file written to the shared alloc dir. When the Nomad host is rebooted, the output file in the alloc dir is gone (garbage collected, I guess, though I'm not 100% sure) and the prestart task is not re-executed, so the main task always fails.

After the reboot, the alloc status of the prestart task only shows a "Task received" event:

2021-01-18T12:06:51Z  Received    Task received by client

Given how the main/prestart dependency is documented, e.g. the wait-for-database example at https://www.nomadproject.io/docs/job-specification/lifecycle, it looks like a prestart task cannot be a one-time-only job: restarting the main task should always restart the prestart task.

The reboot test below also shows that there is no reliable way to pass output from one task to a dependent task via the shared alloc dir.
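
For context, every task in the group shares the allocation directory: the Docker driver mounts it at /alloc inside the container, and it is also exposed as the NOMAD_ALLOC_DIR environment variable. The handoff being tested reduces to roughly this pair of commands (a minimal sketch, not verbatim from the job file below):

# In the prestart task: write the handoff file into the shared alloc dir
echo test > "${NOMAD_ALLOC_DIR}/test_file"   # same path as /alloc/test_file

# In the main task: succeed only if the handoff survived
test -s "${NOMAD_ALLOC_DIR}/test_file"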

Reproduction steps

  • Update nomad.hcl with the config client { gc_max_allocs = 1 } so the issue reproduces quickly (a minimal client stanza is sketched just below)
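
    For reference, a minimal client stanza carrying that setting might look like this (a sketch; the data_dir path is an illustrative assumption, not from this report):

    data_dir = "/opt/nomad/data"

    client {
      enabled       = true
      gc_max_allocs = 1  # garbage-collect terminal allocations aggressively, to hit the issue quickly
    }
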
  • Run the following job file with nomad job run <job_file>
  • Show the job status
nomad job status test
ID            = test
Name          = test
Submit Date   = 2021-01-18T11:47:41Z
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
test        0       0         1        0       0         0

Latest Deployment
ID          = ff4a14b1
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
test        1        1       0        0          2021-01-18T11:57:41Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
df20b14a  1d358dc0  test        0        run      running  15s ago  7s ago

nomad alloc status df20b14a
ID                  = df20b14a-9242-1925-bb4f-ba955df54a82
Eval ID             = 8bb6ff37
Name                = test.test[0]
Node ID             = 1d358dc0
Node Name           = bne_dev4
Job ID              = test
Job Version         = 0
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 23s ago
Modified            = 5s ago
Deployment ID       = ff4a14b1
Deployment Health   = healthy

Task "test-pre" (prestart) is "dead"
Task Resources
CPU        Memory       Disk     Addresses
0/100 MHz  0 B/300 MiB  300 MiB  

Task Events:
Started At     = 2021-01-18T11:47:42Z
Finished At    = 2021-01-18T11:47:42Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2021-01-18T11:47:42Z  Terminated  Exit Code: 0
2021-01-18T11:47:42Z  Started     Task started by client
2021-01-18T11:47:41Z  Task Setup  Building Task Directory
2021-01-18T11:47:41Z  Received    Task received by client

Task "test" is "running"
Task Resources
CPU        Memory           Disk     Addresses
0/100 MHz  156 KiB/300 MiB  300 MiB  

Task Events:
Started At     = 2021-01-18T11:47:49Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2021-01-18T11:47:49Z  Started     Task started by client
2021-01-18T11:47:42Z  Driver      Downloading image
2021-01-18T11:47:42Z  Task Setup  Building Task Directory
2021-01-18T11:47:41Z  Received    Task received by client
  • Reboot the node (repeatedly if needed) until the job is in the pending state
  • Show the job status
nomad job status test
ID            = test
Name          = test
Submit Date   = 2021-01-18T11:47:41Z
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
test        0       1         0        0       1         0

Latest Deployment
ID          = ff4a14b1
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
test        1        1       1        0          2021-01-18T11:57:59Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
df20b14a  1d358dc0  test        0        run      pending  20m12s ago  8s ago
nomad alloc status df20b14a
ID                  = df20b14a-9242-1925-bb4f-ba955df54a82
Eval ID             = 8bb6ff37
Name                = test.test[0]
Node ID             = 1d358dc0
Node Name           = bne_dev4
Job ID              = test
Job Version         = 0
Client Status       = pending
Client Description  = No tasks have started
Desired Status      = run
Desired Description = <none>
Created             = 20m19s ago
Modified            = 2s ago
Deployment ID       = ff4a14b1
Deployment Health   = healthy

Task "test-pre" (prestart) is "dead"
Task Resources
CPU      Memory   Disk     Addresses
100 MHz  300 MiB  300 MiB  

Task Events:
Started At     = 2021-01-18T11:47:42Z
Finished At    = 2021-01-18T11:47:42Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2021-01-18T12:06:51Z  Received    Task received by client
2021-01-18T11:47:42Z  Terminated  Exit Code: 0
2021-01-18T11:47:42Z  Started     Task started by client
2021-01-18T11:47:41Z  Task Setup  Building Task Directory
2021-01-18T11:47:41Z  Received    Task received by client

Task "test" is "pending"
Task Resources
CPU        Memory           Disk     Addresses
0/100 MHz  164 KiB/300 MiB  300 MiB  

Task Events:
Started At     = 2021-01-18T12:07:41Z
Finished At    = N/A
Total Restarts = 4
Last Restart   = 2021-01-18T12:07:46Z

Recent Events:
Time                  Type        Description
2021-01-18T12:07:58Z  Driver      Downloading image
2021-01-18T12:07:46Z  Restarting  Task restarting in 12.425820555s
2021-01-18T12:07:46Z  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
2021-01-18T12:07:41Z  Started     Task started by client
2021-01-18T12:07:33Z  Driver      Downloading image
2021-01-18T12:07:21Z  Restarting  Task restarting in 11.619813438s
2021-01-18T12:07:21Z  Terminated  Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
2021-01-18T12:07:16Z  Started     Task started by client
2021-01-18T12:07:08Z  Driver      Downloading image
2021-01-18T12:06:57Z  Restarting  Task restarting in 11.438944209s

Job file (if appropriate)

job "test" {
  datacenters = ["dc1"]
  type = "service"

  group "test" {
    restart {
      interval = "6m"
      attempts = 10
      delay    = "10s"
      mode     = "delay"
    }

    # prestart task: writes the handoff file to the shared alloc dir
    task "test-pre" {
      driver = "docker"
      lifecycle {
        hook = "prestart"
        sidecar = false
      }

      config {
        image = "alpine:3.8"
        command = "sh"

        args = ["-c", "echo test > /alloc/test_file"]
      }
    }

    task "test" {
      driver = "docker"

      config {
        image = "alpine:3.8"
        command = "sh"

        # fail fast if the prestart output is missing or empty; otherwise idle forever
        args = ["-c", "if [ ! -s /alloc/test_file ]; then sleep 5; exit 1; else while sleep 3600; do :; done; fi"]
      }
    }
  }
}

tgross commented Jan 19, 2021

Hi @johnzhanghua! I think there's some overlap here with what you're seeing in #9841 but...

When the Nomad host is rebooted, the output file in the alloc dir is gone (garbage collected, I guess, though I'm not 100% sure) and the prestart task is not re-executed, so the main task always fails.

I don't think garbage collection has anything to do with it in this case, especially given the status you're seeing. The only time the alloc dir would be garbage collected is if the entire allocation has been marked as failed (in this case that would typically be because the client has been offline long enough that the server has marked its allocations as lost). In that case, when the Nomad client comes back online, it will garbage collect the entire allocation, and it should be rescheduled. Individual tasks won't be restarted.

So what you're seeing here is similar to #9841 but we're hitting a corner case where the prestart task isn't being re-run because we haven't re-entered the hook that triggers it when the entire host restarts. I'd be curious to see if we can replicate this with just a client restart and not a full host restart.
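
(For anyone trying that: assuming the Nomad agent runs as a systemd unit named "nomad", which is an assumption and not something stated in this report, a client-only restart would be roughly:

sudo systemctl restart nomad   # restarts the agent only; host and alloc dirs untouched
nomad alloc status df20b14a    # then check whether the prestart task re-ran

and the prestart task's events could be compared against the host-reboot case above.)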

johnzhanghua commented

Hi @tgross, the reason I thought of garbage collection is that the output file in the alloc dir, generated by the prestart task, is gone.

It is similar to #9841 in that the prestart task is not re-run, so the output file in the alloc dir is not re-generated.

jazzyfresh self-assigned this Feb 2, 2021

tgross commented Feb 2, 2021

Hi @johnzhanghua, I wanted to follow up on this one. Something that makes this a little complicated to figure out with respect to a client host restart (and not just an agent restart) is that in almost all cases I would expect all the allocations on the host to be marked "lost" if the client host has been restarted.

Are you managing to get the host restarted and Nomad back up before the servers have marked the node as lost? Or are you running the server and client on the same node? In that case, the behavior for a host restart honestly isn't that well defined (because it's a poorly supported topology) but I'd expect to lose all tasks.

johnzhanghua commented

@tgross Yes, we are running the server and client on the same node. In this test it is a single node, but the same problem occurs in a multi-node cluster when all the nodes are restarted.

I will try restarting the nodes one after another and see how it goes; a sketch of what I have in mind is below.
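
A rough sketch using node drains (the <node-id> placeholders would come from nomad node status; I haven't run these exact commands yet):

nomad node drain -enable -yes <node-id>    # migrate allocations off the node first
sudo systemctl reboot                      # reboot the drained node
nomad node drain -disable -yes <node-id>   # make the node schedulable again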
