Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent starvation of large jobs #534

Closed
3 tasks done
Tracked by #636
ahg-g opened this issue Jan 30, 2023 · 11 comments · Fixed by #710
Closed
3 tasks done
Tracked by #636

Prevent starvation of large jobs #534

ahg-g opened this issue Jan 30, 2023 · 11 comments · Fixed by #710
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@ahg-g
Copy link
Contributor

ahg-g commented Jan 30, 2023

What would you like to be added:

Workloads with the same priority don't preempt each other within a ClusterQueue, one consequence is that large size jobs may be starved if users submit a stream of smaller jobs from the same priority class.

To address this case, we could add a preemption strategy that allows older jobs to preempt newer ones from the same priority class.

Why is this needed:

To prevent starvation of large jobs via preemption.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@ahg-g ahg-g added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 30, 2023
@alculquicondor
Copy link
Contributor

The field TriggerAfterWorkloadWaitingSeconds in [1] was a proposal along these lines. Any extra signal we can incorporate in a new design?

[1] https://github.com/kubernetes-sigs/kueue/tree/main/keps/83-workload-preemption#extra-knobs-in-clusterqueue-preemption-policy

@ahg-g
Copy link
Contributor Author

ahg-g commented Jan 30, 2023

Yes, so smaller jobs that were submitted after the big job, but got a chance to start before the big one because they happen to fit, should be the primary candidates for preemption.

Imagine this example (all jobs from the same priority class):

  1. quota is 100
  2. job1 and job2 each consumes 30; available quota is 40
  3. job3 submitted and requests 50, it doesn't fit and will wait
  4. job4 submitted and requests 30, it fits and will start; available quota is 10
  5. job1 finishes, available quota is 40, so job3 still doesn't fit, but we can preempt job4 to fit it; job4 is a candidate because it was submitted after job3.

@alculquicondor
Copy link
Contributor

alculquicondor commented Jan 30, 2023

Gotcha, the creation timestamps for each Jobs is a very clear signal.

@KunWuLuan
Copy link
Member

Maybe we can add a property in clusterQueue to allow workload preempting workloads with same priority after some time?

@trasc
Copy link
Contributor

trasc commented Apr 19, 2023

/assign

@trasc
Copy link
Contributor

trasc commented Apr 20, 2023

A middle ground between this and TriggerAfterWorkloadWaitingSeconds could be something like:

	// PreemptIfNewerThen if set, jobs with the same priority having a scheduling timestamp
	// bigger than the current workload + PreemptIfNewerThen seconds will be
	// considered for preemption in order to avoid starvation of large jobs.
	// This is only applicable within the same cluster queue when WithinClusterQueue
	// is set to LowerPriority.
	//
	// +optional
	PreemptIfNewerThen *int64

@alculquicondor
Copy link
Contributor

This should be a Clusterqueue API field.
I like the flexibility of the proposal above. However, the wording needs to include that this affects the same priority. And maybe that this is specific to ClusterQueue preemption, and not cohort reclaim? does it have to be?

Also, do we expect more tuning options? If so, it could be something like:

preemption:
  withinClusterQueue: LowerThanOrEqualPriority
  samePriority:
     ifNewerThanSeconds: 100

Although I'm not convinced about the wording

@alculquicondor
Copy link
Contributor

Let's step back.
@KunWuLuan, do you have a use case for configuring the time threshold?
We could start simple and just add the policy for LowetThanOrEqualPriority

@trasc
Copy link
Contributor

trasc commented Apr 21, 2023

I have opened #710, but I'd hold the merge and use workload.GetSchedulingTimestamp after #689 is merged.

@KunWuLuan
Copy link
Member

Let's step back. @KunWuLuan, do you have a use case for configuring the time threshold? We could start simple and just add the policy for LowetThanOrEqualPriority

Hi, Aldo. I think ifNewerThanSeconds is enougth. Thanks. 👍

@alculquicondor
Copy link
Contributor

My question was whether we could start without ifNewerThanSeconds.
Can you clarify why do you think this is important?
Could there be other alternatives that apply to any kind of preemption? Like TriggerAfterWorkloadWaitingSeconds in [1]
Or something like IfAdmittedLessThanSeconds.

[1] https://github.com/kubernetes-sigs/kueue/tree/main/keps/83-workload-preemption#extra-knobs-in-clusterqueue-preemption-policy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants