Update test-linux-gpu.yml #7019

Closed
wants to merge 3 commits into from

@vfdev-5
Collaborator

vfdev-5 commented Dec 8, 2022

Fixes the CUDA driver issue with the test-linux-gpu job.

@vfdev-5
Collaborator Author

vfdev-5 commented Dec 8, 2022

@osalpekar, do you know why the following error can happen? https://github.com/pytorch/vision/actions/runs/3648497295/attempts/1
image

@pmeier
Collaborator

pmeier commented Dec 8, 2022

It seems that on linux.4xlarge.nvidia.gpu we are getting out-of-memory (OOM) errors, while on linux.g5.4xlarge.nvidia.gpu the driver is not set up properly.

@osalpekar
Member

> @osalpekar do you know why the following error can happen? https://github.com/pytorch/vision/actions/runs/3648497295/attempts/1

cc @seemethere or @atalman. It looks like cleaning the workspace in the generic job fails due to permission issues with certain directories. @lequytra also saw this while migrating CI jobs in torchrec: https://github.com/pytorch/torchrec/actions/runs/3651587724/jobs/6168971846.

@vfdev-5
Collaborator Author

vfdev-5 commented Jan 20, 2023

Outdated PR

4 participants