Merge pull request #230 from dystewart/gpu-doc
Update RHOAI docs to reflect changes coming Jan 14
jtriley authored Jan 16, 2025
2 parents 960ad55 + e7e5824 commit 6ced34b
Showing 4 changed files with 39 additions and 18 deletions.
(One of the four changed files cannot be displayed in the diff view.)
@@ -64,7 +64,7 @@ On the Create workbench page, complete the following information.

- Notebook image (Image selection)

-- Deployment size (Container size and Number of GPUs)
+- Deployment size (Container size, Type and Number of GPUs)

- Environment variables

@@ -82,13 +82,15 @@ On the Create workbench page, complete the following information.
resources, including CPUs and memory. Each container size comes with pre-configured
CPU and memory resources.

-Optionally, you can specify the desired **Number of GPUs** depending on the
+Optionally, you can specify the desired **Accelerator** and **Number of Accelerators** (GPUs), depending on the
nature of your data analysis and machine learning code requirements. However,
this number should not exceed the GPU quota specified by the value of the
"**OpenShift Request on GPU Quota**" attribute that has been approved for
this "**NERC-OCP (OpenShift)**" resource allocation on NERC's ColdFront, as
[described here](../../get-started/allocation/allocation-details.md#pi-and-manager-allocation-view-of-openshift-resource-allocation).

+The different options for accelerator are "NVIDIA A100 GPU", "NVIDIA V100 GPU", and "NONE".

If you need to increase this quota value, you can request a change as
[explained here](../../get-started/allocation/allocation-change-request.md#request-change-resource-allocation-attributes-for-openshift-project).
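
Under the hood, the accelerator selection corresponds to a standard Kubernetes GPU resource request on the workbench pod. As a rough sketch only (the container name is a placeholder and the exact manifest RHOAI generates may differ; the `nvidia.com/gpu` resource name is the one used later in this commit):

    spec:
      containers:
        - name: workbench   # hypothetical container name
          resources:
            limits:
              nvidia.com/gpu: 1   # one accelerator, matching the UI selection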

@@ -97,7 +99,8 @@ Once you have entered the information for your workbench, click **Create**.
![Fill Workbench Information](images/tensor-flow-workbench.png)

For our example project, let's name it "Tensorflow Workbench". We'll select the
-**TensorFlow** image, choose a **Deployment size** of **Small**, **Number of GPUs**
+**TensorFlow** image, choose a **Deployment size** of **Small**,
+**Accelerator** of **NVIDIA A100 GPU**, **Number of Accelerators**
as **1** and allocate a **Cluster storage** space of **1GB**.

!!! info "More About Cluster Storage"
docs/openshift/applications/scaling-and-performance-guide.md (31 additions, 15 deletions)
@@ -136,7 +136,8 @@ Gi, Mi, Ki).
## How to specify pod to use GPU?

So from a **Developer** perspective, the only thing you have to worry about is
-asking for GPU resources when defining your pods, with something like:
+asking for GPU resources when defining your pods, with something like the
+following example requesting an NVIDIA A100 GPU:

spec:
  containers:
@@ -150,14 +151,26 @@
        limits:
          memory: "128Mi"
          cpu: "500m"
+  tolerations:
+    - key: nvidia.com/gpu.product
+      operator: Equal
+      value: NVIDIA-A100-SXM4-40GB
+      effect: NoSchedule
+  nodeSelector:
+    nvidia.com/gpu.product: NVIDIA-A100-SXM4-40GB

-In the sample Pod Spec above, you can allocate GPUs to pods by specifying the GPU
+In the sample Pod Spec above, you can allocate GPUs to containers by specifying
+the GPU
resource `nvidia.com/gpu` and indicating the desired number of GPUs. This number
should not exceed the GPU quota specified by the value of the
"**OpenShift Request on GPU Quota**" attribute that has been approved for your
"**NERC-OCP (OpenShift)**" resource allocation on NERC's ColdFront as
[described here](../../get-started/allocation/allocation-details.md#pi-and-manager-allocation-view-of-openshift-resource-allocation).

+!!! note "Pod Spec: tolerations & nodeSelector"
+
+    When requesting GPU resources directly from pods and deployments, you must
+    include the `spec.tolerations` and `spec.nodeSelector` shown above for your
+    desired GPU type.

If you need to increase this quota value, you can request a change as
[explained here](../../get-started/allocation/allocation-change-request.md#request-change-resource-allocation-attributes-for-openshift-project).
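
Putting the quota, tolerations, and nodeSelector pieces together, a complete manifest requesting one A100 might look like the following minimal sketch. The pod name, container name, command, and image are placeholders, not part of the NERC docs; also note that for extended resources such as `nvidia.com/gpu`, Kubernetes requires the request to equal the limit:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-pod                # placeholder name
    spec:
      containers:
        - name: app                # placeholder container name
          image: nvcr.io/nvidia/cuda:12.2.0-base-ubi8   # placeholder CUDA image
          command: ["nvidia-smi"]  # prints the allocated GPU, then exits
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
              nvidia.com/gpu: 1
            limits:
              memory: "128Mi"
              cpu: "500m"
              nvidia.com/gpu: 1    # extended resources: request must equal limit
      tolerations:
        - key: nvidia.com/gpu.product
          operator: Equal
          value: NVIDIA-A100-SXM4-40GB
          effect: NoSchedule
      nodeSelector:
        nvidia.com/gpu.product: NVIDIA-A100-SXM4-40GB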

@@ -203,22 +216,25 @@ the name of the GPU device:
We can specify information about the GPU product type, family, count, and so on,
as shown in the Pod Spec above. Also, these node labels can be used in the Pod Spec
to schedule workloads based on criteria such as the GPU device name, as shown under
-_nodeSelector_ as shown below:
+_nodeSelector_ in this case (NVIDIA V100 GPU):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod2
spec:
-  restartPolicy: Never
  containers:
-    - name: cuda-container
-      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
-      command: ["sleep"]
-      args: ["infinity"]
-      resources:
-        limits:
-          nvidia.com/gpu: 1
+    - name: app
+      image: ...
+      resources:
+        requests:
+          memory: "64Mi"
+          cpu: "250m"
+          nvidia.com/gpu: 1
+        limits:
+          memory: "128Mi"
+          cpu: "500m"
+  tolerations:
+    - key: nvidia.com/gpu.product
+      operator: Equal
+      value: Tesla-V100-PCIE-32GB
+      effect: NoSchedule
  nodeSelector:
    nvidia.com/gpu.product: Tesla-V100-PCIE-32GB
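
The same `nvidia.com/gpu.product` label can also be used with node affinity, which is useful when a workload can run on more than one GPU type. A hedged sketch, not taken from the NERC docs:

    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: nvidia.com/gpu.product
                    operator: In
                    values:          # either GPU type is acceptable
                      - Tesla-V100-PCIE-32GB
                      - NVIDIA-A100-SXM4-40GB

Since the GPU nodes described above carry `NoSchedule` taints, a pod using this affinity block would also need a matching toleration for each GPU type it allows.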

nerc-theme/main.html (2 additions, 0 deletions)
@@ -1,3 +1,4 @@
+<!--
{% extends "base.html" %} {% block announce %}
<div class="parent">
<div class="maintain">
@@ -23,6 +24,7 @@
></iframe>
</div>
</div>
+-->
{% endblock %} {% block htmltitle %}
<title>New England Research Cloud(NERC)</title>
{% endblock %} {% block footer %}
