
Add GPU Numbers Predicates #1692

Merged
merged 1 commit into from
Aug 15, 2022

Conversation

peiniliu
Contributor

Support specifying GPU numbers for pod resource requests (issue #1440)

Currently, Volcano only supports specifying GPU shared memory; requesting a specific number of GPUs is not supported. This PR adds support for defining GPU numbers in pod resource requests. See the design doc https://github.com/peiniliu/volcano/blob/dev/docs/user-guide/how_to_use_gpu_number.md for more details.

@volcano-sh-bot
Contributor

Welcome @peiniliu!

It looks like this is your first PR to volcano-sh/volcano 🎉.

Thank you, and welcome to Volcano. 😃

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 23, 2021
@jasonliu747
Member

Hi @peiniliu , thanks for your contribution. Please fix the DCO check.

@Thor-wl Thor-wl requested review from Thor-wl, jasonliu747 and william-wang and removed request for hudson741 and hzxuzhonghu August 24, 2021 02:49
@william-wang
Member

@jasonliu747 I really hope this feature can be included in v1.4. I don't have a GPU environment; do you have one for verification?

@jasonliu747
Member

@william-wang let me double check and get back to you!

@volcano-sh-bot volcano-sh-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 27, 2021
@peiniliu
Contributor Author

fix the DCO check.

Hi Jason,

Thanks! I've fixed the DCO.

Please let me know if anything else is needed.

Best,

Peini

Member

@shinytang6 shinytang6 left a comment

Please rebase onto master and push again; this PR contains too many previous commits.

@william-wang
Member

@peiniliu Rebase and submit the pr again.

@volcano-sh-bot volcano-sh-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 30, 2021
@peiniliu
Contributor Author

@jasonliu747 @william-wang
Hi,
the pr has been updated!
Best,
Peini

@peiniliu peiniliu force-pushed the dev branch 3 times, most recently from edf8719 to 6db3361 Compare August 31, 2021 10:05
@zamog

zamog commented Sep 9, 2021

Will this PR fix #1686?

@Thor-wl
Contributor

Thor-wl commented Oct 15, 2021

Will this PR fix #1686?

@peiniliu

@peiniliu
Contributor Author

This PR lets users define GPU number predicates via container-level resource requests using 'volcano.sh/gpu-numbers'. For the issue you mentioned, take a look at the queue and the preempt or reclaim actions.
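As a concrete illustration of the comment above, a pod could request whole GPU cards at the container level. This is a minimal sketch, not a tested manifest; the resource key spelling follows this comment, and the pod name and container image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-number-demo        # placeholder name
spec:
  schedulerName: volcano
  containers:
    - name: cuda
      image: nvidia/cuda:11.0-base   # placeholder image
      resources:
        limits:
          volcano.sh/gpu-numbers: 2  # request two whole GPU cards
```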

@Thor-wl Thor-wl requested a review from hwdef November 27, 2021 03:53

The main architecture is similar to the previous one, but the gpu-index result for each pod will be a list of GPU card indices.

![gpu_number](../images/gpu-number.png)
Member

Kebe-API-Server => Kube-API-Server

@peiniliu
Contributor Author

I have addressed those review comments.

@@ -46,6 +46,8 @@ Same as above, after installed, update the scheduler configuration in `volcano-s

Please refer to [volcano device plugin](https://github.com/volcano-sh/devices/blob/master/README.md#quick-start)

* By default the volcano device plugin supports shared GPUs, so users do not need to configure it; the default setting is the same as setting --gpu-strategy=number. For more information, see [volcano device plugin configuration](https://github.com/volcano-sh/devices/blob/dev/doc/config.md)

Member

the link is 404

Member

@shinytang6 shinytang6 left a comment

Generally LGTM.
/cc @Thor-wl @william-wang Please take another look

@volcano-sh-bot volcano-sh-bot requested a review from Thor-wl July 27, 2022 08:22
@volcano-sh-bot
Contributor

@shinytang6: GitHub didn't allow me to request PR reviews from the following users: another, look, Please, take.

Note that only volcano-sh members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Generally LGTM.
/cc @Thor-wl @william-wang Please take another look

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@Thor-wl Thor-wl left a comment

BTW, is the comment about initContainer resolved?

@peiniliu
Contributor Author

BTW, is the comment about initContainer resolved?

As explained, the device plugin currently does not handle multiple containers per pod, and an initContainer is designed for startup scripts that may not request GPUs. The important part is the device plugin: once it supports this, GPU usage for initContainers can be added.

@Thor-wl
Contributor

Thor-wl commented Jul 28, 2022

BTW, is the comment about initContainer resolved?

As explained, the device plugin currently does not handle multiple containers per pod, and an initContainer is designed for startup scripts that may not request GPUs. The important part is the device plugin: once it supports this, GPU usage for initContainers can be added.

IC, that's OK for me.

@Thor-wl
Contributor

Thor-wl commented Jul 28, 2022

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 28, 2022
@volcano-sh-bot volcano-sh-bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 29, 2022
@peiniliu
Contributor Author

I removed some unrelated info in the doc after the presentation.

@Thor-wl
Contributor

Thor-wl commented Jul 30, 2022

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2022
@volcano-sh-bot volcano-sh-bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 12, 2022
@volcano-sh-bot volcano-sh-bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 15, 2022
Member

@shinytang6 shinytang6 left a comment

/lgtm

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shinytang6, Thor-wl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot merged commit aa84a7d into volcano-sh:master Aug 15, 2022

id := predicateGPU(pod, nodeInfo)
if id < 0 {
ids := predicateGPUbyMemory(pod, nodeInfo)


A shared-GPU pod only needs one GPU id, but predicateGPUbyMemory returns all GPU ids that are suitable for the pod.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.