In light of the growing number of accelerator applications (AI, base
mapping, ...), falling back onto `slurm_extra` becomes a bit tedious.
Hence, this adds resource support for `gres`.
Addresses issue #52 (and, to a minor extent, #18 and #104). Supersedes PR
#172.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Updated documentation section on "GPU Jobs" to clarify how to request
    GPU resources with new syntax examples.
- **Bug Fixes**
  - Improved error handling and reporting for job submission processes.
  - Clarified error messages in test cases for better understanding.
- **Dependency Updates**
  - Updated `snakemake-executor-plugin-slurm-jobstep` dependency version
    from `^0.2.0` to `^0.3.0`.
- **Tests**
  - Streamlined test cases by removing less relevant tests and enhancing
    clarity of error messages.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Johannes Köster <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
To submit "ordinary" MPI jobs, submitting with `tasks` (the MPI ranks) is sufficient. Alternatively, on some clusters, it might be convenient to just configure `nodes`. Consider using a combination of `tasks` and `cpus_per_task` for hybrid applications (those that use ranks (multiprocessing) and threads). A detailed topology layout can be achieved with the `slurm_extra` parameter (see below) and further flags like `--distribution`.
### GPU Jobs

SLURM allows GPU requests to be specified with the `--gres` or `--gpus` flags, and Snakemake takes a similar approach. Resources can be requested as follows:

- The resource `gpu` can be used, e.g. by just requesting the number of GPUs like `gpu=2`. It can be used on its own or combined with the `gpu_model` resource, e.g. `gpu_model=a100`. The combination will result in a flag to `sbatch` like `--gpus=a100:2`. The Snakemake `gpu` resource must be a number.
- Alternatively, the resource `gres` can be used. Its syntax is `<string>:<number>` or `<string>:<model>:<number>`, e.g. `gres=gpu:1` or `gres=gpu:a100:2` (the latter assuming a GPU model `a100`).

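The two accepted `gres` shapes described above can be checked with a short regular expression. The following is only a minimal sketch: the pattern and the function name `is_valid_gres` are hypothetical and chosen for illustration, and the plugin's own validation may differ.

```python
import re

# Hypothetical pattern for illustration only; the plugin's own check may differ.
# It accepts <string>:<number> or <string>:<model>:<number>.
GRES_PATTERN = re.compile(r"[A-Za-z0-9_]+(:[A-Za-z0-9_]+)?:\d+")


def is_valid_gres(value: str) -> bool:
    """Return True if value has one of the two gres shapes described above."""
    return GRES_PATTERN.fullmatch(value) is not None


for candidate in ("gpu:1", "gpu:a100:2", "gpu", "gpu:a100"):
    print(candidate, is_valid_gres(candidate))
```

Strings without a trailing count, such as `gpu` or `gpu:a100`, are rejected by this sketch because both shapes end in `<number>`.
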
.. note:: Internally, Snakemake also knows the resource `gpu_manufacturer`. However, SLURM does not distinguish between model and manufacturer. In practice, the preferred way to request an accelerator will depend on your specific cluster setup.
    Also, for consistency within Snakemake, the resource is called `gpu`, not `gpus`.

Additionally, `cpus_per_gpu` can be set. Otherwise, Snakemake's `threads` setting will be used to set `cpus_per_gpu`. If `cpus_per_gpu` is less than or equal to zero, no CPU is requested from SLURM (and cluster defaults, if any, will apply).

A sample workflow profile might look like:

```YAML
set-resources:
    gres_request_rule:
        gres: "gpu:1"

    multi_gpu_rule:
        gpu: 2
        gpu_model: "a100"
        cpus_per_gpu: 4

    no_cpu_gpu_rule:
        gpu: 1
        cpus_per_gpu: 0 # Values <= 0 indicate that NO CPU request string
                        # will be issued.
```

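To illustrate how such resources could translate into `sbatch` flags, here is a minimal sketch in Python. It is not the plugin's implementation: the function `gpu_sbatch_flags` is hypothetical. The sketch also makes three assumptions: `gres` takes precedence over `gpu`, `gpu_model` without `gpu` is treated as an error, and `--cpus-per-gpu` is only emitted together with `--gpus`.

```python
from typing import List, Optional


def gpu_sbatch_flags(
    gpu: Optional[int] = None,
    gpu_model: Optional[str] = None,
    gres: Optional[str] = None,
    cpus_per_gpu: Optional[int] = None,
    threads: int = 1,
) -> List[str]:
    """Hypothetical helper: map Snakemake GPU resources to sbatch flags."""
    if gres is not None:
        # A generic resource string such as "gpu:1" is passed through as-is.
        return [f"--gres={gres}"]
    if gpu is None:
        if gpu_model is not None:
            # Assumption of this sketch: a model alone is not a valid request.
            raise ValueError("gpu_model requires the gpu resource to be set")
        return []
    # Combine model and count if a model was given, e.g. "a100:2".
    request = f"{gpu_model}:{gpu}" if gpu_model else str(gpu)
    flags = [f"--gpus={request}"]
    # Fall back to the rule's threads if cpus_per_gpu is not set.
    cpus = threads if cpus_per_gpu is None else cpus_per_gpu
    if cpus > 0:
        # Values <= 0 mean that no CPU request string is issued.
        flags.append(f"--cpus-per-gpu={cpus}")
    return flags


# The three rules from the sample profile:
print(gpu_sbatch_flags(gres="gpu:1"))
print(gpu_sbatch_flags(gpu=2, gpu_model="a100", cpus_per_gpu=4))
print(gpu_sbatch_flags(gpu=1, cpus_per_gpu=0))
```

For `multi_gpu_rule`, the sketch yields `--gpus=a100:2` together with `--cpus-per-gpu=4`. For `no_cpu_gpu_rule`, only `--gpus=1` is produced, leaving the CPU count to the cluster defaults.
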
### Running Jobs locally
Not all Snakemake workflows are adapted for heterogeneous environments, particularly clusters. Users might want to avoid the submission of _all_ rules as cluster jobs. Non-cluster jobs should usually include _short_ jobs, e.g. internet downloads or plotting rules.