Skip to content

Commit

Permalink
Update PDB KEP with latest status
Browse files Browse the repository at this point in the history
  • Loading branch information
mortent committed Mar 31, 2021
1 parent 841863e commit cbd6ea1
Showing 1 changed file with 33 additions and 34 deletions.
67 changes: 33 additions & 34 deletions keps/sig-apps/85-Graduate-PDB-to-Stable/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,9 @@ The feature was introduced in Kubernetes 1.4 and promoted to beta in 1.5.
It has been in beta for a long time. This document lays out the plan to promote
it to stable.

Update: PDBs was promoted to GA in 1.21. Eviction is still only in policy/v1beta1
due to some challenges described below. Ideally Eviction should also be GA in 1.22.

## Motivation

PDB API has been stable and is an important feature that allows users to improve
Expand All @@ -81,6 +84,20 @@ long term.

### Implementation Details/Notes/Constraints

#### Allow promotion of Eviction to policy/v1 without breaking pods/eviction support for policy/v1beta1

Eviction is part of policy/v1beta1, but is in a different spot than PDBs since
it is a subresource of the v1 Pod API. Luckily, the endpoint only supports
Create and returns v1 Status, so it should be possible to let the current
endpoint accept both policy/v1 and policy/v1beta1 Evictions.

The current apiserver decoding stack does allow support for multiple versions
on a single endpoint, and discovery does not support advertising more than one
accepted version. If we can add support for accepting multiple versions for a
subresource endpoint, we can add support for policy/v1 and then phase out
policy/v1beta1 over a long period of time. There is a discussion about this
[here](https://github.com/kubernetes/kubernetes/pull/99290#discussion_r582003770).

#### Mutable PDBs

A mutable PDB object allows its `MinAvailable`, `MaxUnavailable`, and `Selector`
Expand Down Expand Up @@ -126,43 +143,23 @@ The current behavior of the disruption controller for the different types of
input and the different types of pods that might be encountered are documented in:
https://docs.google.com/spreadsheets/d/12HUundBS-slA6axfQYZPRCeIu_Au_wsGD0Vu_oKAnM8/edit?usp=sharing

##### Proposed solution

To fix this, we will make two adjustments:

Loosen the rules somewhat, so instead of just returning an error and block all
evictions when the controller encounters invalid pods, it will ignore those pods
when computing `AllowedDisruptions`. And similarly, the Eviction API will ignore
the PDB when evicting those pods. This means that in a situation like was
mentioned in the issue, where a PDB selector happen to match both pods belonging
to a Deployment and pods belonging to a CronJob, the PDB will protect the
Deployment pods as expected, but ignore the pods from the CronJob.

Combined with loosening the rules in the controller, we also want to improve
feedback to the user in these situations. So while the pods from the CronJob
described above will not be covered by the PDB, the controller will generate
warning events for these situations that will provide accurate descriptions of
what makes the pod ineligible for the PDB. Events should only be generated the
first time the controller encounters the workload to avoid generating new events
on every reconcile. In this case it would mean an event explaining that the pod
has a controller that does not implement scale, and therefore can only be used
if the PDB is using `minAvilable` as a number. We also want to introduce
conditions on the PDB status object that can signal error situations to users
and tools. In particular, whenever the failsafe functionality of the disruption
controller forces `AllowedDisruptions`to be 0, there should be a condition that
allow tools to distinguish this situation from `AllowedDisruptions` being 0
simply because there are not enough ready pods.

Combined, these should make the behavior of the disruption controller more
intuitive, but also provide signals to users whenever the configuration of the
pdb and/or the targeted workloads are invalid.
This has been addressed by improving the users' visibility into any issues
encountered by the disruption controller, primarily through the addition of
conditions to the PDB status: https://github.com/kubernetes/kubernetes/pull/98127.
We also improve the error message provided to users in the case where a controller
resource can not be found with the scale client: https://github.com/kubernetes/kubernetes/pull/98346

We considered changing the current behavior of setting `DisruptionsAllowed` to zero,
but decided against it as it creates issues for the Eviction API and it makes it
harder to understand the behavior of the disruption controller.

#### Address scalability issues with the disruption controller

The disruption controller has some performance issues as reported in
https://github.com/kubernetes/kubernetes/issues/92826.
https://github.com/kubernetes/kubernetes/pull/92827 was merged to remove the 30s
resync period which should improve performance.
resync period which should improve performance. The frequency at which the
disruption controller creates events has also been reduced.

#### Fix handling of empty selector in disruption controller

Expand All @@ -172,6 +169,8 @@ directly in the v1beta1 version of PDBs, but we should fix this as part of
promoting PDBs to v1 using the approach described in
https://github.com/kubernetes/kubernetes/issues/95083#issuecomment-703723763

Fixed as part of https://github.com/kubernetes/kubernetes/pull/99290

### API changes

* Add conditions to the status object for PodDisruptionBudget.
Expand Down Expand Up @@ -321,9 +320,9 @@ The following e2e tests will be included in the conformance tests:

Graduation to GA:
- [x] Implement Mutable PDBs
- [ ] Address performance issues
- [ ] Pass conformance tests
- [ ] Update documents to reflect the changes
- [x] Address performance issues
- [x] Pass conformance tests
- [x] Update documents to reflect the changes

## Production Readiness Review Questionnaire

Expand Down

0 comments on commit cbd6ea1

Please sign in to comment.