Account for terminating pods when doing preemption #510
Comments
/assign
Hey @trasc, @alculquicondor suggested I look into this issue as part of kubernetes/enhancements#3940. I think it’s good that you are looking into it, because I’m a little confused about what Kueue would want for accounting for terminating pods.
There are 2 parts to this problem:
As I see this:
#599 will add a new "Evicted" condition that is set upon preemption. We can set it without clearing the admission (both the condition and the struct), and only reset the admission once the pods have actually terminated.
In my opinion this is the job of
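A rough sketch of how that could look, purely as an assumption on my part: the Workload type, the "Evicted" condition name, and the helper functions below are simplified stand-ins, not Kueue's actual API.

```go
// Sketch only: keep the admission (and therefore the quota accounting) in
// place when preempting, mark the workload with an Evicted condition, and
// clear the admission later once the job reports no running or terminating
// pods. All names here are illustrative.
package preemption

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const workloadEvicted = "Evicted" // hypothetical condition type

// Workload is a simplified stand-in for Kueue's Workload object.
type Workload struct {
	Admission  *Admission         // quota reservation; nil means not admitted
	Conditions []metav1.Condition // status conditions
}

// Admission is a placeholder for the admitted flavors and quantities.
type Admission struct{}

// markEvicted records the preemption without releasing the quota.
func markEvicted(wl *Workload, reason string) {
	meta.SetStatusCondition(&wl.Conditions, metav1.Condition{
		Type:    workloadEvicted,
		Status:  metav1.ConditionTrue,
		Reason:  reason,
		Message: "preempted; admission kept until pods terminate",
	})
}

// maybeReleaseAdmission clears the admission only once no pods remain.
func maybeReleaseAdmission(wl *Workload, running, terminating int32) {
	if running == 0 && terminating == 0 {
		wl.Admission = nil
	}
}
```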
Right, we can always implement the logic in Kueue by watching pods. But ideally Kueue doesn't need to know about pods, for separation of concerns. For the purpose of making progress, we can start with 1. and wait to see whether we can leverage kubernetes/enhancements#3940
@kannon92 in the implementation (kueue/pkg/controller/jobs/job/job_controller.go, lines 125 to 127 at 3113e4b) we are relying on
Yeah, so the issue is that if PodFailurePolicy is on, the Job marks a terminating pod as failed only once it is fully terminated. If PodFailurePolicy is off, the Job counts a terminating pod as failed immediately, even though the pod still holds resources. I think we will go with what @alculquicondor has suggested and probably add a status field for terminating pods, so we can capture this intermediate state without changing the existing behavior. Does this mean that if you want this feature for other workloads (MPI, etc.), they should also include terminating pods in their status?
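If the Job API does grow a terminating count, a job integration could include it when computing how many pods still hold quota. The field name used below, job.Status.Terminating, is the one the KEP proposes and is an assumption here, not something Kueue relies on today.

```go
// Sketch only: count pods that still occupy resources, assuming JobStatus
// exposes a Terminating counter as proposed in the KEP referenced above.
package jobquota

import (
	batchv1 "k8s.io/api/batch/v1"
)

// podsHoldingQuota returns running pods plus pods that were deleted but have
// not finished terminating, i.e. pods whose resources are not yet freed.
func podsHoldingQuota(job *batchv1.Job) int32 {
	count := job.Status.Active
	if job.Status.Terminating != nil {
		count += *job.Status.Terminating
	}
	return count
}
```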
Let's leave job conversations to k/k :) @trasc, let's start by keeping the
Synced with @trasc offline to understand his suggestion better. So the idea is to add the This works better and is backwards-compatible.
What would you like to be added:
Account for the quota in use by pods that are still terminating when doing preemptions.
Why is this needed:
We issue preemptions by setting Workload.spec.admission=nil and immediately consider these resources freed. But in reality, pods take time to terminate. We will need to keep the old admission somewhere for the calculations, and bubble up information about running pods from the Job. Maybe this will require improvements to the job controller.
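One possible shape for that accounting, as a sketch only: release the cached usage for a preempted workload only after the job reports zero running and zero terminating pods. The cache and key types below are simplified stand-ins, not Kueue's actual internals.

```go
// Sketch only: simplified cache that keeps charging a preempted workload's
// usage against the ClusterQueue until its pods are actually gone.
package cache

type resourceUsage map[string]int64 // resource name -> quantity still in use

type clusterQueueCache struct {
	usage map[string]resourceUsage // workload key -> quota it still holds
}

// onPreempt is called when the admission is cleared from the Workload.
// It intentionally leaves the cached usage untouched, because the pods are
// still running or terminating on the nodes.
func (c *clusterQueueCache) onPreempt(wlKey string) {}

// onPodsGone is called once the job reports zero running and zero
// terminating pods; only then is the quota returned to the ClusterQueue.
func (c *clusterQueueCache) onPodsGone(wlKey string) {
	delete(c.usage, wlKey)
}
```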
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.