
UI does not support ability to Start/Restart failed Allocation and tasks #9881

Closed
scyd-cb opened this issue Jan 25, 2021 · 3 comments

@scyd-cb

scyd-cb commented Jan 25, 2021

Nomad version

Output from nomad version
Nomad v1.0.1

Operating system and Environment details

CentOS 8

Issue

While an allocation and its tasks are in the running state, there is a way to stop/restart the allocation or restart a task from the UI. However, when an allocation is in the failed state (max restart attempts have been reached), there is no way to start/restart the failed allocation or its tasks from the UI. The only workarounds are to mark the Nomad client as ineligible and toggle it back to eligible to restart the allocation, or to stop/start the whole job.

Reproduction steps

Run a job with a failing task and let the allocation fail after the maximum number of restart attempts has been reached. The allocation will be in the failed state, with no start/restart options in the UI.

Our intention is to use Nomad to manage our services with the raw_exec driver, as a replacement for systemd.

Job file (if appropriate)

Here is an example of the job file we are using, with some generic names:
```hcl
job "myJob" {
  datacenters = ["dc1"]
  type        = "system"

  group "myGroup" {
    constraint {
      attribute = "${meta.nodeId}"
      value     = "node1"
    }

    restart {
      attempts = 3
      delay    = "15s"
      interval = "24h"
      mode     = "fail"
    }

    task "service1" {
      user   = "user1"
      driver = "raw_exec"

      config {
        command = "/bin/service1"
      }
    }

    task "service2" {
      user   = "user1"
      driver = "raw_exec"

      config {
        command = "/bin/service2"
      }
    }
  }
}
```

Screenshots of a running and a failed allocation (nomad_success, nomad_failed).

Screenshots of a running and a failed task (nomad_task_success, nomad_task_failed).

@DingoEatingFuzz
Contributor

DingoEatingFuzz commented Jan 27, 2021

I am definitely seeing what you're seeing, but I think this is by design. If you attempt this same workflow from the CLI, you'll get the error message `Unexpected response code: 500 (Task not running)`.

I suspect the original design is that if a task isn't running, it shouldn't be resurrected like this. Once a task is terminal, it is always terminal, and the scheduler is free to use those resources elsewhere. Side-stepping this axiom has implications for rescheduling behavior, preemption behavior, and scheduling in general.

To avoid this, you have two options:

  1. Set reschedule rules which will tell Nomad to attempt running an alloc on a different node after restart attempts are exhausted.
  2. Stop/start the whole job from the UI which will go through the full scheduler process of finding capacity in your cluster and creating new allocations.
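For option 1, a group-level `reschedule` block might look like the sketch below. The values are illustrative, not a recommendation, and note that rescheduling does not apply to `system` jobs (which run on every eligible node), so the job above would also need to become a `service` job for this to take effect:

```hcl
group "myGroup" {
  # Illustrative values only; tune attempts/interval/delay for your environment.
  reschedule {
    attempts       = 5
    interval       = "1h"
    delay          = "30s"
    delay_function = "exponential"
    max_delay      = "10m"
    unlimited      = false
  }
}
```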

This isn't exactly my area of expertise, so I want to verify that this is indeed the intended design before closing.

@DingoEatingFuzz
Contributor

Alright, after chatting with @cgbaker, this is indeed not a supported use case. I hope these two alternative options help you out. If it still feels like something is missing, please feel free to describe the workflow you're looking for here.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2022