Wait for GPU Operator Subscription, InstallPlan & Deployment #1108
This has been tested both on a cluster without the Nvidia operator and on a cluster where the operator was already installed. Output example when running the commands on a cluster without the Operator:
oc wait installplan -n nvidia-gpu-operator --all --for condition=Installed --timeout=3m
oc rollout status -n nvidia-gpu-operator deployment gpu-operator --watch --timeout=3m
I don't think the rollout is necessary. Also, could you add the check for the Nvidia operator as well?
I've updated the script to also wait for the gpu-operator-certified subscription and for the nfd-controller-manager deployment.
The rollout status is still needed, see the updated execution in my previous comment.
I also added a wait for the Operator's resources to exist, and the wait for pods as you suggested.
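For illustration, waiting for a resource to exist can be done with a small polling helper, since `oc wait` and `oc rollout status` generally fail right away when the target object has not been created yet. This is only a sketch and not necessarily what the script in this PR does: the `wait_until` name, its arguments, and the timeout values are assumptions.

```shell
# Hypothetical helper (not from the PR): retry a command until it succeeds
# or until the timeout expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command...>
wait_until() {
  local timeout=$1
  local interval=$2
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  # Re-run the command, discarding its output, until it exits with 0.
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

A call such as `wait_until 180 5 oc get deployment gpu-operator -n nvidia-gpu-operator` would then return once the deployment object exists, after which `oc rollout status` can safely be used on it.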
Signed-off-by: manosnoam <[email protected]>
a22c982 to e9992f5
To avoid a situation where gpu_deploy.sh waits for nvidia-gpu-operator pods
before the Nvidia GPU Operator installation has created them,
the script must first wait for the Operator Subscription, InstallPlan, and Deployment to complete.
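The ordering described above could be sketched roughly as follows. The two `oc wait` and `oc rollout status` commands and the `gpu-operator-certified` subscription name are taken from this PR; the `wait_for_gpu_operator` function, the `NS` variable, and the retry count and interval are illustrative assumptions, and the actual gpu_deploy.sh may differ. The sketch only guards against a missing Subscription; a missing InstallPlan would need a similar retry.

```shell
# Sketch only: NS, the retry limits, and the function name are assumptions.
NS=nvidia-gpu-operator

wait_for_gpu_operator() {
  # 1. Subscription: retry until the object exists (36 tries x 5s, about 3 minutes).
  local tries=0
  until oc get subscriptions.operators.coreos.com gpu-operator-certified -n "$NS" >/dev/null 2>&1; do
    tries=$((tries + 1))
    [ "$tries" -ge 36 ] && { echo "Subscription not found in $NS" >&2; return 1; }
    sleep 5
  done

  # 2. InstallPlan: block until every InstallPlan in the namespace is Installed.
  oc wait installplan -n "$NS" --all --for condition=Installed --timeout=3m || return 1

  # 3. Deployment: block until the gpu-operator rollout has finished.
  oc rollout status -n "$NS" deployment gpu-operator --watch --timeout=3m || return 1
}
```

Calling `wait_for_gpu_operator` before the pod wait stops at the first step that fails, so the pod check only runs once the Operator has actually been deployed.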