Satori Cluster

The Satori user documentation is available at https://mit-satori.github.io/ and the text below focuses on running ClimateMachine.jl on Satori. If you run into trouble or have questions, contact either @vchuravy or @christophernhill.

Start by cloning ClimateMachine into your `/nobackup` user directory:

```bash
export HOME2=/nobackup/users/`whoami`
cd ${HOME2}
git clone https://github.com/CliMA/ClimateMachine.jl ClimateMachine
```
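
If you want to verify the checkout before continuing, the project file should sit at the top level of the clone; the environment variables further down assume this layout:

```bash
ls ${HOME2}/ClimateMachine/Project.toml   # should exist; JULIA_PROJECT points here later on
```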

First we need to request resources from SLURM. Our configuration launches one MPI rank per available GPU. Satori has 4 GPUs per node, so we ask for 4 tasks per node; to scale up you can increase the number of nodes (an example follows the directives below). We ask for `--mem=0` so that we do not hit the soft limit on memory for large problem sizes.

```bash
#!/bin/bash
# Begin SLURM Directives
#SBATCH --job-name=ClimateMachine
#SBATCH --time=30:00
#SBATCH --mem=0
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres="gpu:4" # GPUs per node
#SBATCH --cpus-per-task=4
```
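
For example, scaling the same layout to 4 nodes (16 MPI ranks, one per GPU) only requires changing the node count; the per-node settings stay as above. The node count here is only an illustration:

```bash
#SBATCH --nodes=4             # 4 nodes x 4 GPUs/node = 16 MPI ranks
#SBATCH --ntasks-per-node=4   # unchanged: one rank per GPU
#SBATCH --gres="gpu:4"        # unchanged: GPUs per node
```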

Next we load the necessary modules. We purge the previously loaded set of modules to prevent loading conflicts. Julia 1.3 has an implicit dependency on GCC 8.3 (we load gcc/8.3.0 so that `libquadmath` is available), and CUDA 10.1 is the latest CUDA version currently supported on Satori.

```bash
# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1
module load spack
module load gcc/8.3.0 # to get libquadmath
module load julia/1.3.0
module load cuda/10.1.243
module load openmpi/3.1.4-pmi-cuda
```
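
Optionally, you can double-check which toolchain the job will actually see after loading the modules:

```bash
module list              # should include gcc/8.3.0, julia/1.3.0, cuda/10.1.243 and openmpi/3.1.4-pmi-cuda
which julia mpirun nvcc  # all three should resolve to the module-provided installations
```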

Set `JULIA_PROJECT` to the directory into which you installed ClimateMachine. We use a separate depot (think of it as a global cache); if you plan to do many runs in parallel with different package configurations you might want to separate this out further, but be aware that separate caches lead to higher startup overhead. We turn off the pre-compiled binaries for MPI and CUDA so that the system installations are used. Lastly, we allow Julia to use multiple threads, matching the number of CPUs per SLURM task.

```bash
export JULIA_PROJECT=${HOME2}/ClimateMachine
export JULIA_DEPOT_PATH=${HOME2}/julia_depot
export JULIA_MPI_BINARY=system
export JULIA_CUDA_USE_BINARYBUILDER=false
export JULIA_NUM_THREADS=${SLURM_CPUS_PER_TASK:=1}

julia -e 'using Pkg; pkg"instantiate"; pkg"build MPI"'
julia -e 'using Pkg; pkg"precompile"'
```

- `julia -e 'using Pkg; pkg"instantiate"; pkg"build MPI"'` instantiates the project and makes sure that MPI.jl picks up any changes to the environment.
- `julia -e 'using Pkg; pkg"precompile"'` precompiles the project so that the precompilation cache is not contended by the MPI ranks. It needs to be on its own line, otherwise we might miss some packages.

You can comment these two lines out if you are doing many runs with the same configuration and MPI variant.
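
As an optional sanity check before submitting a full job, you can confirm that the environment variables above are being picked up. The snippet below uses only Julia's standard library, so it is independent of ClimateMachine itself:

```bash
# Prints the active project file (should be .../ClimateMachine/Project.toml)
# and the number of threads derived from SLURM_CPUS_PER_TASK.
julia -e 'println("project: ", Base.active_project()); println("threads: ", Threads.nthreads())'
```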

Set `EXPERIMENT` to the experiment you want to run. You can also add additional command-line flags there.

```bash
# Reset CUDA_VISIBLE_DEVICES so that each rank sees all GPUs on the node;
# this is needed to take advantage of faster local CUDA-aware communication.
cat > launch.sh << EoF_s
#! /bin/sh
export CUDA_VISIBLE_DEVICES=0,1,2,3
exec \$*
EoF_s
chmod +x launch.sh

EXPERIMENT="${HOME2}/ClimateMachine/experiments/AtmosLES/dycoms.jl --output-dir=${HOME2}/clima-${SLURM_JOB_ID}"
srun --mpi=pmi2 ./launch.sh julia ${EXPERIMENT}
```
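
Assuming the snippets above are combined into a single batch script (the file name below is just a placeholder), the job is submitted and monitored with the usual SLURM commands; stdout/stderr go to SLURM's default `slurm-<jobid>.out` and the experiment output goes to the `--output-dir` chosen above:

```bash
sbatch satori_climatemachine.sh   # submit the assembled job script
squeue -u $(whoami)               # check whether the job is pending or running
```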