Add GPU subsection to OCP docs
dystewart committed Jan 20, 2025
1 parent e31e0ca commit 60ebdd6
Showing 2 changed files with 56 additions and 0 deletions.
52 changes: 52 additions & 0 deletions docs/openshift/gpus/intro-to-gpus-in-nerc-ocp.md
@@ -0,0 +1,52 @@
# Introduction to GPUs in NERC OpenShift

NERC OpenShift Container Platform (OCP) clusters use the [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html)
and the [Node Feature Discovery Operator](https://docs.openshift.com/container-platform/4.15/hardware_enablement/psap-node-feature-discovery-operator.html)
to deploy and manage GPU worker nodes.

## NERC GPU Worker Node Architectures

The NERC OpenShift environment currently supports two NVIDIA GPU products:

1. NVIDIA-A100-SXM4-40GB (A100)
2. Tesla-V100-PCIE-32GB (V100)

Each A100 worker node contains 4 GPUs, each with 40 GB of memory. Each V100
worker node contains 1 GPU with 32 GB of memory.
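
Because the two node types differ in GPU count and memory, a workload may need to land on a specific model. The sketch below shows one way this could be expressed, assuming the GPU nodes carry the `nvidia.com/gpu.product` label that the GPU Operator's feature discovery normally sets, and assuming NERC allows user-defined node selectors. Neither assumption is confirmed by this page, and the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-a100-request        # hypothetical name
spec:
  restartPolicy: Never
  # Assumption: GPU nodes are labeled with nvidia.com/gpu.product,
  # using the product names listed above as label values.
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-40GB
  containers:
    - name: sample-a100-request
      image: <your-image-url>
      resources:
        limits:
          nvidia.com/gpu: 1        # request a single GPU on the selected node
```

Swapping the label value for `Tesla-V100-PCIE-32GB` would target a V100 node instead, under the same assumptions.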

## Accessing GPU Resources

Access to GPU nodes is managed through OCP project allocations in NERC
ColdFront. By default, user projects in NERC OCP clusters do not have access to
GPUs; a NERC admin must grant access through the user's ColdFront allocation.
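
This page does not describe how a ColdFront allocation appears inside the cluster. As a rough illustration only, assuming the allocation is enforced with a standard Kubernetes `ResourceQuota`, a project granted one GPU might contain an object like the following. The object name and namespace are placeholders, not NERC's actual values.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-gpu-quota          # hypothetical name
  namespace: <your-project>        # placeholder for your OCP project
spec:
  hard:
    # Quotas on extended resources such as GPUs only accept
    # the "requests." prefix.
    requests.nvidia.com/gpu: "1"
```

If quotas are used this way, running `oc describe quota` in your project should show whether any GPU quota has been granted.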

## Deploying Workloads to GPUs

There are two ways to deploy workloads on GPU nodes:

1. Deploy directly in your OCP namespace:

In your project namespace, you can deploy a GPU workload by explicitly
requesting a GPU in your manifest, for example:
```
apiVersion: v1
kind: Pod
metadata:
  name: sample-gpu-request
spec:
  restartPolicy: Never
  containers:
    - name: sample-gpu-request
      image: <your-image-url>
      ...
      ...
      resources:
        limits:
          nvidia.com/gpu: 1
```

2. Deploy through Red Hat OpenShift AI (RHOAI):

See [Populate the data science project with a Workbench](https://github.com/nerc-project/nerc-docs/blob/main/docs/openshift-ai/data-science-project/using-projects-the-rhoai.md#populate-the-data-science-project-with-a-workbench)
for details on selecting GPU options.
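
For the first option (deploying directly in your namespace), a quick check that a GPU is actually attached can be useful before running a real workload. The following is a minimal sketch, assuming your image and the cluster's NVIDIA container runtime expose the `nvidia-smi` utility inside the container. The Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test             # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: gpu-smoke-test
      image: <your-image-url>
      # Assumption: nvidia-smi is available in the container at runtime.
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```

Once the Pod completes, `oc logs gpu-smoke-test` should list the GPU that was allocated if the assumptions above hold.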
4 changes: 4 additions & 0 deletions docs/openshift/index.md
@@ -37,6 +37,10 @@ the list below.

- [Storage Overview](storage/storage-overview.md)

## GPUs

- [Intro to GPUs in NERC OCP Clusters](gpus/intro-to-gpus-in-nerc-ocp.md)

## Deleting Applications

- [Deleting your applications](applications/deleting-applications.md)