From a86732a455126c6f74a000425995319cf0c62a24 Mon Sep 17 00:00:00 2001
From: Michail Kargakis
Date: Wed, 28 Sep 2016 23:23:04 +0200
Subject: [PATCH 1/4] Add docs for perma-failed Deployments

---
 docs/user-guide/deployments.md | 156 +++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)

diff --git a/docs/user-guide/deployments.md b/docs/user-guide/deployments.md
index d160cce1e7fd5..b0bd6bee5cc38 100644
--- a/docs/user-guide/deployments.md
+++ b/docs/user-guide/deployments.md
@@ -454,6 +454,151 @@ nginx-deployment-3066724191 0 0 1h
 
 Note: You cannot rollback a paused Deployment until you resume it.
 
+## Deployment status
+
+A Deployment enters various states during its lifecycle. It can be [progressing](#progressing-deployment) while rolling out a new ReplicaSet,
+it can be [complete](#complete-deployment), or it can [fail to progress](#failed-deployment).
+
+### Progressing Deployment
+
+A Deployment is progressing when one of the following tasks is performed:
+
+* the creation of the new ReplicaSet.
+* scaling up the new ReplicaSet.
+* scaling down old ReplicaSets.
+
+You can monitor the progress for a Deployment by using `kubectl rollout status`.
+
+### Complete Deployment
+
+A Deployment is complete when it has the following characteristics:
+
+* The Deployment has minimum availability. Minimum availability means that the Deployment's number of available replicas
+equals or exceeds the number required by the Deployment strategy.
+* All of the replicas associated with the Deployment have been updated to the latest version you've specified, meaning any
+updates you've requested have been completed.
+
+TODO: Link to doc that references generation/observedGeneration
+
+### Failed Deployment
+
+Your Deployment may get stuck trying to deploy its newest ReplicaSet without ever completing, due to:
+
+* insufficient quota
+* readiness probe failures
+* image pull error
+* insufficient permissions
+* limit ranges
+* application runtime misconfiguration
+
+For any Pod creation or deletion failure, you will be notified with a Deployment status condition of `ReplicaFailure`
+type. You can also specify a deadline parameter in the spec ([spec.progressDeadlineSeconds](#progress-deadline-seconds))
+that denotes the number of seconds to wait for your Deployment to report any progress.
+
+To make the controller report lack of progress for a Deployment after 10 minutes:
+
+```shell
+$ kubectl patch deployment/nginx-deployment -p '{"spec":{"progressDeadlineSeconds":600}}'
+"nginx-deployment" patched
+```
+
+Once the deadline has been exceeded, the Deployment controller adds a DeploymentCondition with the following attributes to
+the Deployment's status.conditions:
+
+* Type=Progressing
+* Status=False
+* Reason=ProgressDeadlineExceeded
+
+See the [Kubernetes API conventions](https://github.com/kubernetes/kubernetes/blob/{{page.githubbranch}}/docs/devel/api-conventions.md#typical-status-properties) for more information on status conditions.
+
+Note that in version 1.5, Kubernetes will take no action on a stalled Deployment other than to report a status condition with
+`Reason=ProgressDeadlineExceeded`.
+
+You may experience transient errors with your Deployments, either due to a low timeout that you have set or due to any other kind
+of error that can be treated as transient. For example, let's suppose you have insufficient quota. If you describe the Deployment
+you will notice the following section:
+
+```
+$ kubectl describe deployment nginx-deployment
+<...>
+Conditions:
+  Type            Status  Reason
+  ----            ------  ------
+  Available       True    MinimumReplicasAvailable
+  Progressing     True    ReplicaSetUpdated
+  ReplicaFailure  True    FailedCreate
+<...>
+```
+
+This is how the status of the Deployment looks like if you would run `kubectl get deployment nginx-deployment -o yaml`
+(spec of the object is omitted for brevity):
+
+```
+status:
+  availableReplicas: 2
+  conditions:
+  - lastTransitionTime: 2016-10-04T12:25:39Z
+    lastUpdateTime: 2016-10-04T12:25:39Z
+    message: Replica set "nginx-deployment-4262182780" is progressing.
+    reason: ReplicaSetUpdated
+    status: "True"
+    type: Progressing
+  - lastTransitionTime: 2016-10-04T12:25:42Z
+    lastUpdateTime: 2016-10-04T12:25:42Z
+    message: Deployment has minimum availability.
+    reason: MinimumReplicasAvailable
+    status: "True"
+    type: Available
+  - lastTransitionTime: 2016-10-04T12:25:39Z
+    lastUpdateTime: 2016-10-04T12:25:39Z
+    message: 'Error creating: pods "nginx-deployment-4262182780-" is forbidden: exceeded quota:
+      object-counts, requested: pods=1, used: pods=3, limited: pods=2'
+    reason: FailedCreate
+    status: "True"
+    type: ReplicaFailure
+  observedGeneration: 3
+  replicas: 2
+  unavailableReplicas: 2
+```
+
+Eventually, once the Deployment progress deadline is exceeded, the status and reason of the Progressing condition
+will be switched:
+
+```
+Conditions:
+  Type            Status  Reason
+  ----            ------  ------
+  Available       True    MinimumReplicasAvailable
+  Progressing     False   ProgressDeadlineExceeded
+  ReplicaFailure  True    FailedCreate
+```
+
+You can address an issue of insufficient quota by scaling down your Deployment, by scaling down other controllers you may be running,
+or by increasing quota in your namespace. If you satisfy the quota conditions and the Deployment controller then completes the Deployment
+rollout, you'll see the Deployment's status update with a successful condition (`Status=True` and `Reason=NewReplicaSetAvailable`).
+
+```
+Conditions:
+  Type            Status  Reason
+  ----            ------  ------
+  Available       True    MinimumReplicasAvailable
+  Progressing     True    NewReplicaSetAvailable
+```
+
+`Type=Available` with `Status=True` means that your Deployment has minimum availability. Minimum availability is dictated
+by the parameters specified in the deployment strategy. `Type=Progressing` with `Status=True` means that your Deployment
+is either in the middle of a rollout and it is progressing or that it has successfully completed its progress and the minimum
+required new replicas are available (see the Reason of the condition for the particulars - in our case
+`Reason=NewReplicaSetAvailable` means that the Deployment is complete).
+
+### Operating on a failed deployment
+
+All actions that apply to a complete Deployment also apply to a failed Deployment. You can scale it up/down, roll back
+to a previous revision, or even pause it if you need to apply multiple tweaks in the Deployment pod template. Note
+that progress for a Deployment is not estimated while the Deployment is paused so you can safely pause a Deployment
+in the middle of a rollout and resume it whenever you want without it failing accidentally because of an exceeded
+deadline.
+
 ## Use Cases
 
 ### Canary Deployment
@@ -556,6 +701,17 @@ the rolling update starts, such that the total number of old and new Pods do not
 the new Replica Set can be scaled up further, ensuring that the total number of Pods running
 at any time during the update is at most 130% of desired Pods.
 
+### Progress Deadline Seconds
+
+`.spec.progressDeadlineSeconds` is an optional field that specifies the number of seconds you want
+to wait for your Deployment to progress before the system reports back that the Deployment has
+[failed progressing](#failed-deployment) - surfaced as a condition with `Type=Progressing`, `Status=False`,
+and `Reason=ProgressDeadlineExceeded` in the status of the resource. The deployment controller will keep
+retrying the Deployment. In the future, once automatic rollback is implemented, the deployment
+controller will roll back a Deployment as soon as it observes such a condition.
+
+If specified, this field needs to be greater than `.spec.minReadySeconds`.
+
 ### Min Ready Seconds
 
 `.spec.minReadySeconds` is an optional field that specifies the

From 7427bc7a6abf9fd1b6abb404b2ef6a6a4c323f85 Mon Sep 17 00:00:00 2001
From: devin-donnelly
Date: Tue, 6 Dec 2016 16:37:10 -0800
Subject: [PATCH 2/4] Update deployments.md

---
 docs/user-guide/deployments.md | 34 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/docs/user-guide/deployments.md b/docs/user-guide/deployments.md
index b0bd6bee5cc38..95af55ba7e9ba 100644
--- a/docs/user-guide/deployments.md
+++ b/docs/user-guide/deployments.md
@@ -461,41 +461,39 @@ it can be [complete](#complete-deployment), or it can [fail to progress](#failed
 
 ### Progressing Deployment
 
-A Deployment is progressing when one of the following tasks is performed:
+Kubernetes marks a Deployment as _progressing_ when one of the following tasks is performed:
 
-* the creation of the new ReplicaSet.
-* scaling up the new ReplicaSet.
-* scaling down old ReplicaSets.
+* The Deployment is in the process of creating a new ReplicaSet.
+* The Deployment is scaling up an existing ReplicaSet.
+* The Deployment is scaling down an existing ReplicaSet.
 
 You can monitor the progress for a Deployment by using `kubectl rollout status`.
 
 ### Complete Deployment
 
-A Deployment is complete when it has the following characteristics:
+Kubernetes marks a Deployment as _complete_ when it has the following characteristics:
 
 * The Deployment has minimum availability. Minimum availability means that the Deployment's number of available replicas
 equals or exceeds the number required by the Deployment strategy.
 * All of the replicas associated with the Deployment have been updated to the latest version you've specified, meaning any
 updates you've requested have been completed.
 
-TODO: Link to doc that references generation/observedGeneration
-
 ### Failed Deployment
 
-Your Deployment may get stuck trying to deploy its newest ReplicaSet without ever completing, due to:
+Your Deployment may get stuck trying to deploy its newest ReplicaSet without ever completing. This can occur due to some of the following factors:
 
-* insufficient quota
-* readiness probe failures
-* image pull error
-* insufficient permissions
-* limit ranges
-* application runtime misconfiguration
+* Insufficient quota
+* Readiness probe failures
+* Image pull errors
+* Insufficient permissions
+* Limit ranges
+* Application runtime misconfiguration
 
 For any Pod creation or deletion failure, you will be notified with a Deployment status condition of `ReplicaFailure`
 type. You can also specify a deadline parameter in the spec ([spec.progressDeadlineSeconds](#progress-deadline-seconds))
 that denotes the number of seconds to wait for your Deployment to report any progress.
 
-To make the controller report lack of progress for a Deployment after 10 minutes:
+The following `kubectl` command sets the spec with `progressDeadlineSeconds` to make the controller report lack of progress for a Deployment after 10 minutes:
 
 ```shell
 $ kubectl patch deployment/nginx-deployment -p '{"spec":{"progressDeadlineSeconds":600}}'
@@ -530,8 +528,7 @@ Conditions:
 <...>
 ```
 
-This is how the status of the Deployment looks like if you would run `kubectl get deployment nginx-deployment -o yaml`
-(spec of the object is omitted for brevity):
+If you run `kubectl get deployment nginx-deployment -o yaml`, the Deployment status might look like this:
 
 ```
 status:
@@ -561,8 +558,7 @@ status:
   unavailableReplicas: 2
 ```
 
-Eventually, once the Deployment progress deadline is exceeded, the status and reason of the Progressing condition
-will be switched:
+Eventually, once the Deployment progress deadline is exceeded, Kubernetes updates the status and the reason for the Progressing condition:
 
 ```
 Conditions:

From 5ec4e650dbe39cddaed6ae8d127692447c4c34f9 Mon Sep 17 00:00:00 2001
From: Michail Kargakis
Date: Thu, 8 Dec 2016 12:02:06 +0100
Subject: [PATCH 3/4] More updates

---
 docs/user-guide/deployments.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/docs/user-guide/deployments.md b/docs/user-guide/deployments.md
index 95af55ba7e9ba..8239077c69e21 100644
--- a/docs/user-guide/deployments.md
+++ b/docs/user-guide/deployments.md
@@ -478,6 +478,17 @@ equals or exceeds the number required by the Deployment strategy.
 * All of the replicas associated with the Deployment have been updated to the latest version you've specified, meaning any
 updates you've requested have been completed.
 
+You can check if a Deployment has completed by using `kubectl rollout status`. Zero exit code will be returned
+in case it has completed successfully.
+
+```
+$ kubectl rollout status deploy/nginx
+Waiting for rollout to finish: 2 of 3 updated replicas are available...
+deployment "nginx" successfully rolled out
+$ echo $?
+0
+```
+
 ### Failed Deployment
 
 Your Deployment may get stuck trying to deploy its newest ReplicaSet without ever completing. This can occur due to some of the following factors:
@@ -587,6 +598,17 @@ is either in the middle of a rollout and it is progressing or that it has succes
 required new replicas are available (see the Reason of the condition for the particulars - in our case
 `Reason=NewReplicaSetAvailable` means that the Deployment is complete).
 
+You can check if a Deployment has failed progressing by using `kubectl rollout status`. Non-zero exit code will be returned
+in case it has exceeded its deadline.
+
+```
+$ kubectl rollout status deploy/nginx
+Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
+error: deployment "nginx" exceeded its progress deadline
+$ echo $?
+1
+```
+
 ### Operating on a failed deployment
 
 All actions that apply to a complete Deployment also apply to a failed Deployment. You can scale it up/down, roll back

From 5186d558c2d9562916999b642b5a706c47870ab9 Mon Sep 17 00:00:00 2001
From: devin-donnelly
Date: Thu, 8 Dec 2016 09:48:57 -0800
Subject: [PATCH 4/4] Update deployments.md

---
 docs/user-guide/deployments.md | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/docs/user-guide/deployments.md b/docs/user-guide/deployments.md
index 8239077c69e21..84ea561bf4fb8 100644
--- a/docs/user-guide/deployments.md
+++ b/docs/user-guide/deployments.md
@@ -478,8 +478,7 @@ equals or exceeds the number required by the Deployment strategy.
 * All of the replicas associated with the Deployment have been updated to the latest version you've specified, meaning any
 updates you've requested have been completed.
 
-You can check if a Deployment has completed by using `kubectl rollout status`. Zero exit code will be returned
-in case it has completed successfully.
+You can check if a Deployment has completed by using `kubectl rollout status`. If the rollout completed successfully, `kubectl rollout status` returns a zero exit code.
 
 ```
 $ kubectl rollout status deploy/nginx
@@ -500,9 +499,7 @@ Your Deployment may get stuck trying to deploy its newest ReplicaSet without eve
 * Limit ranges
 * Application runtime misconfiguration
 
-For any Pod creation or deletion failure, you will be notified with a Deployment status condition of `ReplicaFailure`
-type. You can also specify a deadline parameter in the spec ([spec.progressDeadlineSeconds](#progress-deadline-seconds))
-that denotes the number of seconds to wait for your Deployment to report any progress.
+One way you can detect this condition is to specify a deadline parameter in your Deployment spec ([`spec.progressDeadlineSeconds`](#progress-deadline-seconds)). `spec.progressDeadlineSeconds` denotes the number of seconds the Deployment controller waits before indicating (via the Deployment status) that the Deployment progress has stalled.
 
 The following `kubectl` command sets the spec with `progressDeadlineSeconds` to make the controller report lack of progress for a Deployment after 10 minutes:
 
@@ -510,9 +507,8 @@ The following `kubectl` command sets the spec with `progressDeadlineSeconds` to
 $ kubectl patch deployment/nginx-deployment -p '{"spec":{"progressDeadlineSeconds":600}}'
 "nginx-deployment" patched
 ```
-
 Once the deadline has been exceeded, the Deployment controller adds a DeploymentCondition with the following attributes to
-the Deployment's status.conditions:
+the Deployment's `status.conditions`:
 
 * Type=Progressing
 * Status=False
@@ -523,6 +519,8 @@ See the [Kubernetes API conventions](https://github.com/kubernetes/kubernetes/bl
 Note that in version 1.5, Kubernetes will take no action on a stalled Deployment other than to report a status condition with
 `Reason=ProgressDeadlineExceeded`.
 
+**Note:** If you pause a Deployment, Kubernetes does not check progress against your specified deadline. You can safely pause a Deployment in the middle of a rollout and resume it without triggering the condition for exceeding the deadline.
+
 You may experience transient errors with your Deployments, either due to a low timeout that you have set or due to any other kind
 of error that can be treated as transient. For example, let's suppose you have insufficient quota. If you describe the Deployment
 you will notice the following section:
@@ -598,8 +596,7 @@ is either in the middle of a rollout and it is progressing or that it has succes
 required new replicas are available (see the Reason of the condition for the particulars - in our case
 `Reason=NewReplicaSetAvailable` means that the Deployment is complete).
 
-You can check if a Deployment has failed progressing by using `kubectl rollout status`. Non-zero exit code will be returned
-in case it has exceeded its deadline.
+You can check if a Deployment has failed to progress by using `kubectl rollout status`. `kubectl rollout status` returns a non-zero exit code if the Deployment has exceeded the progression deadline.
 
 ```
 $ kubectl rollout status deploy/nginx
@@ -612,10 +609,7 @@ $ echo $?
 ### Operating on a failed deployment
 
 All actions that apply to a complete Deployment also apply to a failed Deployment. You can scale it up/down, roll back
-to a previous revision, or even pause it if you need to apply multiple tweaks in the Deployment pod template. Note
-that progress for a Deployment is not estimated while the Deployment is paused so you can safely pause a Deployment
-in the middle of a rollout and resume it whenever you want without it failing accidentally because of an exceeded
-deadline.
+to a previous revision, or even pause it if you need to apply multiple tweaks in the Deployment pod template.
 
 ## Use Cases
 
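
A script can map the `Status` and `Reason` pairs of the `Progressing` condition described in these docs to the three Deployment states: progressing, complete, and failed. The following is a minimal sketch that relies on several assumptions. `progressing_state` is a hypothetical helper and is not a kubectl feature. Only the reasons named in these docs are handled explicitly, and any other combination is reported as `unknown`. The commented-out `kubectl` invocation assumes your kubectl version supports JSONPath filter expressions.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl): classify a Deployment from the
# Status and Reason of its Progressing condition, following the documented
# conditions. Combinations not listed below are reported as "unknown".
progressing_state() {
  status="$1"
  reason="$2"
  if [ "$status" = "False" ] && [ "$reason" = "ProgressDeadlineExceeded" ]; then
    echo "failed"        # spec.progressDeadlineSeconds was exceeded
  elif [ "$status" = "True" ] && [ "$reason" = "NewReplicaSetAvailable" ]; then
    echo "complete"      # rollout finished and the new ReplicaSet is available
  elif [ "$status" = "True" ]; then
    echo "progressing"   # e.g. Reason=ReplicaSetUpdated during a rollout
  else
    echo "unknown"
  fi
}

# Example against a live cluster (assumes JSONPath filter support in kubectl):
#   status=$(kubectl get deployment nginx-deployment \
#     -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}')
#   reason=$(kubectl get deployment nginx-deployment \
#     -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}')
#   progressing_state "$status" "$reason"
```

For a simple pass/fail check in CI, the exit code of `kubectl rollout status` is usually enough. Reading the condition directly is mainly useful when you need to tell a stalled rollout apart from one that is still in progress.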