GPU Jobs on Picotte
Summary
CUDA toolkit to use -- one of:
- 11.0
- 11.1
- 11.2
- 11.4
Options for sbatch (a full example job script follows this list):
- Partition: --partition=gpu
- ~~GRES (Generic RESources), where 'N' is the number of GPU devices per node to use (<= 4): --gres=gpu:N~~
- Number of GPUs per node: --gpus-per-node=4
  - this is equivalent to --gres=gpu:4, which is still a valid way of specifying GPUs
- CAUTION: sometimes "--cpus-per-gpu" behaves as expected, and sometimes "--ntasks" needs to be specified instead; we have not pinned down when one works and when it does not, so please check with your own application.
- CPU cores: --cpus-per-gpu=12 (i.e. allocates 12 CPU cores per allocated GPU device)
- Manually specify the total number of slots/tasks, taking into account the number of nodes as well: use 12 per GPU, e.g. --nodes=2 --gpus-per-node=4 --ntasks=96
- Memory can be specified per GPU: --mem-per-gpu=42G
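Putting these options together, a GPU job script might look like the following sketch; the time limit, CUDA module name, and executable name are placeholders rather than Picotte-specific values:

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --cpus-per-gpu=12
#SBATCH --mem-per-gpu=42G
#SBATCH --time=08:00:00

# CUDA-enabled modulefiles live in a separate tree (see "Module Path" below)
module use /ifs/opt_cuda/modulefiles
module load cuda11.4              # placeholder; run "module avail" for the exact module name

./my_gpu_program                  # placeholder executable
```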
Missing TMPDIR /local/scratch/jobid
This happens when specifying "--ntasks", but not when specifying "--cpus-per-gpu".
- If your program emits error messages like "/local/scratch/nnnnnnn: no such file or directory", manually create that directory:
mkdir /local/scratch/$SLURM_JOBID
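A convenient place to do this is near the top of the job script itself. A minimal sketch; exporting TMPDIR explicitly is an assumption here, not a documented requirement:

```bash
# Work around the missing per-job local scratch directory (the "--ntasks" case above)
mkdir -p /local/scratch/$SLURM_JOBID

# Assumption: also point TMPDIR at the per-job scratch directory explicitly
export TMPDIR=/local/scratch/$SLURM_JOBID
```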
Module Path
CUDA-enabled software is in a separate modulefile directory tree. In your job scripts, or in interactive sessions, do:
module use /ifs/opt_cuda/modulefiles
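For example, to see and load one of the CUDA toolkit versions listed in the Summary (the module name below is an assumption; check the "module avail" output for the exact name):

```bash
module use /ifs/opt_cuda/modulefiles
module avail cuda                 # list CUDA toolkit modules in the CUDA tree
module load cuda11.2              # placeholder name for the CUDA 11.2 toolkit module
```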
NVIDIA GPU Cloud (NGC) Containers
See: NVIDIA GPU Cloud Containers
Do:
module use /ifs/opt_cuda/ngc-container-environment-modules
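The usual module commands then work against that tree; the container module name below is only a placeholder:

```bash
module use /ifs/opt_cuda/ngc-container-environment-modules
module avail                      # list available NGC container environment modules
module load tensorflow            # placeholder; pick an actual name from "module avail"
```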
Hybrid MPI with CUDA
Modulefiles (in /ifs/opt_cuda/modulefiles):
- picotte-openmpi/cuda11.0/4.1.0 (Uses GCC)
- picotte-openmpi/cuda11.2/4.1.0 (Uses Intel ICC)
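A hybrid MPI+CUDA job would load one of these modules and launch with mpirun; a minimal sketch assuming the GCC-based build and a placeholder executable name:

```bash
module use /ifs/opt_cuda/modulefiles
module load picotte-openmpi/cuda11.0/4.1.0

# With, e.g., --nodes=2 --gpus-per-node=4 --ntasks=96 in the sbatch header (see Summary above)
mpirun ./my_mpi_cuda_program      # placeholder executable
```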