Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: issue#723 -- Do-not-evict only for running pods #778

Closed
wants to merge 7 commits into from

Conversation

mallow111
Copy link

Fixes #723

Description
Check if the pod has do-not-evict annotation also in ready status, then we treat this pod as non-evictable. Otherwise, when pods are not in ready status, even if it has do-not-evict annotation, we can still evict the pods.

How was this change tested?
I tested this in my local cluster.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mallow111 mallow111 requested a review from a team as a code owner November 11, 2023 00:54
@mallow111 mallow111 requested a review from tzneal November 11, 2023 00:54
@@ -69,6 +69,15 @@ func IsOwnedByNode(pod *v1.Pod) bool {
})
}

func IsPodReady(pod *v1.Pod) bool {
for _, condition := range pod.Status.Conditions {
if condition.Type != v1.PodReady {
Copy link
Member

@jonathan-innis jonathan-innis Nov 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may have intended to check that the condition.Type is the v1.PodReady type and then you want to check if this is true rather than just checking if there is any condition that isn't the PodReadiness condition

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@jonathan-innis
Copy link
Member

/assign jmdeal

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mallow111
Once this PR has been reviewed and has the lgtm label, please assign mwielgus for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 28, 2023
@jmdeal
Copy link
Member

jmdeal commented Dec 5, 2023

There are a few concerns that were laid out in the original issue that I don't think are addressed here. The main concern with the original feature request is that it isn't clear which container states are non-recoverable. We don't want to ignore the do-not-evict annotation if the pod could eventually become ready. Otherwise, the pod may become ready between the time Karpenter chooses to disrupt the node and it is removed. By waiting for the pod to become ready before respecting the do-not-evict annotation this could happen frequently during normal operation, such as pulling the container image. Do you have any thoughts on how to address these concerns?

Copy link

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 20, 2023
@mallow111
Copy link
Author

@jmdeal @jonathan-innis what do you guys recommend the fix here, the initial issue has been discussed back and forth, but there is no solid solution there. What I am proposing is to make sure do-not-evict annotation only applies on running pods, for pods not ready, it contains various reasons, if we want to set a waiting period, how long it should wait, also is that worthy the effort to wait for those un ready pods?

@jmdeal
Copy link
Member

jmdeal commented Dec 20, 2023

Our stance was that Karpenter should only ignore the do-not-evict annotation in cases where it is certain that the pod will not become ready. Currently Karpenter does this when a pod is in a terminal state, i.e. Succeeded / Failed, or if it is terminating (ref).

We don't want to only consider the annotation for running pods because it is possible that a pending pod becomes ready between the time of the disruption decision and the time that the pod is removed. IMO what this PR would have to do is determine a set of container states that are truly unrecoverable. Then, if a pod is pending but one of it's containers is in an unrecoverable state we know we could disrupt it. I haven't looked into this deeply myself but from @njtran's comment on the original issue it doesn't look like there's a standard set of enum values for this. Without this standard set of 'unrecoverable values', anything Karpenter does here would be pretty opinionated and potentially dangerous.

@mallow111
Copy link
Author

If there is no standard set of enum values, is there a way to move forward the change. Shall we put the issue as hold until there is such a standard coming out, otherwise I cannot see how shall we proceed the fix.

@jmdeal
Copy link
Member

jmdeal commented Dec 20, 2023

Ya, I don't think we can move forward with this unless this core concern is addressed. Like I said, I haven't looked too deeply into it myself so it may be possible with the information available today but it's not immediately obvious.

@mallow111
Copy link
Author

@jonathan-innis do you agree with the above comments, I am going to close this PR if you agree on the above.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 21, 2023
@jonathan-innis
Copy link
Member

jonathan-innis commented Dec 27, 2023

@mallow111 I'm mainly interested about the use-case around you wanting this feature. I'm assuming that it's quite similar to #723? I wonder if #752 works as a better, more comprehensive solution here as a first pass. You can optionally define a duration value on your karpenter.sh/do-not-disrupt annotation like karpenter.sh/do-not-disrupt: 24h which says that we should start ignoring the do-not-disrupt annotation after 24h from the initial creationTimestamp for the pod.

Realistically, I think we need a design doc to map out the problem and the use-cases here that we can solve the problems that we are hitting here. Putting something like that together combined with a few solution options should help us drive toward a solution that doesn't leave nodes around that contain pods that will never go ready.

If it's alright with you, we can close this PR in favor of a design proposal around handling stuck nodes that hang due to misconfiguration of pods that schedule to the nodes.

@mallow111 mallow111 closed this Jan 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Do not evict" only for running pods
4 participants