Skip to content

Commit

Permalink
Merge pull request kubernetes#26466 from adtac/suspend-1.21
Browse files Browse the repository at this point in the history
job.md: add section on suspended jobs
  • Loading branch information
k8s-ci-robot authored Mar 24, 2021
2 parents c6f0215 + cd61594 commit 6adc893
Show file tree
Hide file tree
Showing 2 changed files with 107 additions and 1 deletion.
104 changes: 103 additions & 1 deletion content/en/docs/concepts/workloads/controllers/job.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,8 @@ weight: 50
A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate.
As pods successfully complete, the Job tracks the successful completions. When a specified number
of successful completions is reached, the task (ie, Job) is complete. Deleting a Job will clean up
the Pods it created.
the Pods it created. Suspending a Job will delete its active Pods until the Job
is resumed again.

A simple case is to create one Job object in order to reliably run one Pod to completion.
The Job object will start a new Pod if the first Pod fails or is deleted (for example
Expand Down Expand Up @@ -404,6 +405,107 @@ Here, `W` is the number of work items.

## Advanced usage

### Suspending a Job

{{< feature-state for_k8s_version="v1.21" state="alpha" >}}

{{< note >}}
Suspending Jobs is available in Kubernetes versions 1.21 and above. You must
enable the `SuspendJob` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
on the [API server](docs/reference/command-line-tools-reference/kube-apiserver/)
and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/)
in order to use this feature.
{{< /note >}}

When a Job is created, the Job controller will immediately begin creating Pods
to satisfy the Job's requirements and will continue to do so until the Job is
complete. However, you may want to temporarily suspend a Job's execution and
resume it later. To suspend a Job, you can update the `.spec.suspend` field of
the Job to true; later, when you want to resume it again, update it to false.
Creating a Job with `.spec.suspend` set to true will create it in the suspended
state.

When a Job is resumed from suspension, its `.status.startTime` field will be
reset to the current time. This means that the `.spec.activeDeadlineSeconds`
timer will be stopped and reset when a Job is suspended and resumed.

Remember that suspending a Job will delete all active Pods. When the Job is
suspended, your [Pods will be terminated](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
with a SIGTERM signal. The Pod's graceful termination period will be honored and
your Pod must handle this signal in this period. This may involve saving
progress for later or undoing changes. Pods terminated this way will not count
towards the Job's `completions` count.

An example Job definition in the suspended state can be like so:

```shell
kubectl get job myjob -o yaml
```

```yaml
apiVersion: batch/v1
kind: Job
metadata:
name: myjob
spec:
suspend: true
parallelism: 1
completions: 5
template:
spec:
...
```

The Job's status can be used to determine if a Job is suspended or has been
suspended in the past:

```shell
kubectl get jobs/myjob -o yaml
```

```json
apiVersion: batch/v1
kind: Job
# .metadata and .spec omitted
status:
conditions:
- lastProbeTime: "2021-02-05T13:14:33Z"
lastTransitionTime: "2021-02-05T13:14:33Z"
status: "True"
type: Suspended
startTime: "2021-02-05T13:13:48Z"
```

The Job condition of type "Suspended" with status "True" means the Job is
suspended; the `lastTransitionTime` field can be used to determine how long the
Job has been suspended for. If the status of that condition is "False", then the
Job was previously suspended and is now running. If such a condition does not
exist in the Job's status, the Job has never been stopped.

Events are also created when the Job is suspended and resumed:

```shell
kubectl describe jobs/myjob
```

```
Name: myjob
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 12m job-controller Created pod: myjob-hlrpl
Normal SuccessfulDelete 11m job-controller Deleted pod: myjob-hlrpl
Normal Suspended 11m job-controller Job suspended
Normal SuccessfulCreate 3s job-controller Created pod: myjob-jvb44
Normal Resumed 3s job-controller Job resumed
```
The last four events, particularly the "Suspended" and "Resumed" events, are
directly a result of toggling the `.spec.suspend` field. In the time between
these two events, we see that no Pods were created, but Pod creation restarted
as soon as the Job was resumed.
### Specifying your own Pod selector
Normally, when you create a Job object, you do not specify `.spec.selector`.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,7 @@ different Kubernetes components.
| `StorageVersionAPI` | `false` | Alpha | 1.20 | |
| `StorageVersionHash` | `false` | Alpha | 1.14 | 1.14 |
| `StorageVersionHash` | `true` | Beta | 1.15 | |
| `SuspendJob` | `false` | Alpha | 1.21 | |
| `Sysctls` | `true` | Beta | 1.11 | |
| `TTLAfterFinished` | `false` | Alpha | 1.12 | |
| `TopologyManager` | `false` | Alpha | 1.16 | 1.17 |
Expand Down Expand Up @@ -775,6 +776,9 @@ Each feature gate is designed for enabling/disabling a specific feature:
options can be specified to ensure that the specified number of process IDs
will be reserved for the system as a whole and for Kubernetes system daemons
respectively.
- `SuspendJob`: Enable support to suspend and resume Jobs. See
[the Jobs docs](/docs/concepts/workloads/controllers/job/) for
more details.
- `Sysctls`: Enable support for namespaced kernel parameters (sysctls) that can be
set for each pod. See
[sysctls](/docs/tasks/administer-cluster/sysctl-cluster/) for more details.
Expand Down

0 comments on commit 6adc893

Please sign in to comment.