Add GPU CI/CD #253
Signed-off-by: Sarah Yurick <[email protected]>
Need to fix trailing whitespace in `gpuci.yml` and should make the spacing in `auto-label.yml` consistent with `gpuci.yml`. Also, maybe I should rename it to `auto-label-gpuci.yml`.
.github/workflows/gpuci.yml

```yaml
- name: Run PyTests with GPU mark
  run: |
    python -m pytest -m gpu
```
We run our GPU CI tests only on PyTests with the `pytest.mark.gpu` decorator. We already have quite a few of them in our repository.
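For illustration, a test opts into the GPU job simply by carrying the marker. The sketch below is a minimal, hypothetical test module (the file name and test bodies are not from this repository); it assumes only the `gpu` marker name used by `python -m pytest -m gpu`. Custom markers are normally registered, for example via a `markers` entry in the pytest configuration or `config.addinivalue_line("markers", ...)` in `conftest.py`, so that pytest does not emit `PytestUnknownMarkWarning`; how this project registers the marker is not shown here.

```python
# Hypothetical test module, e.g. tests/test_gpu_marker_example.py.
import pytest


@pytest.mark.gpu
def test_runs_in_gpu_ci():
    # A real GPU test would exercise GPU code here; this placeholder body
    # keeps the sketch runnable on a CPU-only machine.
    assert sum([1, 2, 3]) == 6


def test_runs_in_regular_ci():
    # No `gpu` marker, so `python -m pytest -m gpu` deselects this test.
    assert "gpuci".upper() == "GPUCI"
```

With this module, `python -m pytest -m gpu` selects only `test_runs_in_gpu_ci`, and `python -m pytest -m "not gpu"` selects only `test_runs_in_regular_ci`.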
This is great and helps us catch issues much earlier! Thank you for pushing through this!
Minor suggestions
@ko3n1g is disconnecting our runner for a bit. It should be back up tomorrow; then I can rerun the checks and merge.
This is important since it impacts other projects that use this infra.
* add yaml files to gh workflows
* edit spacing
* no cache dir
* cmake
* fasttext wheel
* python3 dev
* get update
* c installs
* setuptools pip upgrade
* use stable rapids
* remove wheel see what happens
* edit readme and remove autolabel for now
* add container logic
* add dockerfile and oliver's other suggestions
* fix run format
* forked repo url
* docker run with all gpus
* remove running container
* Update .github/workflows/gpuci.yml (Co-authored-by: oliver könig <[email protected]>)
* re add test
* debug attempt
* remove it
* add library path
* remove nvcc check
* more debugging
* specify curator dir
* more debugging
* try pytorch container
* use rapids container
* fix RUN instructions
* add comments and review suggestions
* update runners
* move args

---------

Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
This PR enables gpuCI for our PyTests marked with `@pytest.mark.gpu`.

To trigger it, a user with write access has to add the `gpuci` label to the PR; only users with write access are able to add labels to PRs. If more commits are pushed to the PR, the `gpuci` label has to be removed and re-added to run the GPU tests again.

This is for security reasons. For example, a random user could open a PR, a user with write access adds the `gpuci` label, the GPU tests run safely, and then that user pushes malicious code. As long as no one with write access re-adds the `gpuci` label, the new code never runs on the GPU runner.
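The gating rule described above can be modeled in a few lines. This is a simplified Python sketch, not the workflow itself: it assumes the GPU job fires only on a pull-request `labeled` event whose label is `gpuci` (roughly what a workflow listening for `pull_request` events of type `labeled` with a label-name check would do; the actual condition in `gpuci.yml` is not shown here). `PullRequestEvent`, `triggers_gpu_run`, and `gpu_tested_commits` are hypothetical names, and the commit SHAs are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

TRIGGER_LABEL = "gpuci"  # label name taken from the PR description


@dataclass(frozen=True)
class PullRequestEvent:
    """One PR activity; action names mirror GitHub's pull_request types (assumption)."""

    action: str                  # e.g. "opened", "synchronize", "labeled", "unlabeled"
    head_sha: str                # commit at the PR head when the event fired
    label: Optional[str] = None  # only set for "labeled" / "unlabeled"


def triggers_gpu_run(event: PullRequestEvent, trigger_label: str = TRIGGER_LABEL) -> bool:
    """GPU tests start only when the trigger label is *added*.

    A new push ("synchronize") never starts a run on its own, so code pushed
    after the label was applied is not executed on the GPU runner.
    """
    return event.action == "labeled" and event.label == trigger_label


def gpu_tested_commits(events: Iterable[PullRequestEvent]) -> List[str]:
    """Return the head commits that would have been run on the GPU runner."""
    return [e.head_sha for e in events if triggers_gpu_run(e)]


history = [
    PullRequestEvent("opened", "aaa111"),
    PullRequestEvent("labeled", "aaa111", "gpuci"),    # maintainer approves aaa111
    PullRequestEvent("synchronize", "bbb222"),         # contributor pushes new code
    PullRequestEvent("unlabeled", "bbb222", "gpuci"),  # maintainer removes the label...
    PullRequestEvent("labeled", "bbb222", "gpuci"),    # ...and re-adds it after review
]

print(gpu_tested_commits(history[:3]))  # ['aaa111']  (bbb222 never ran)
print(gpu_tested_commits(history))      # ['aaa111', 'bbb222']
```

Because a push never matches, commit `bbb222` only reaches the GPU runner once someone with write access removes and re-applies the label; the removal is needed because a label that is already applied cannot be added again.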