GPU Jobs on Picotte
Summary
CUDA toolkit to use -- one of:
- 11.0
- 11.1
- 11.2
- 11.4
Options for sbatch (a full example job script follows this list):
- Partition: --partition=gpu
- ~~GRES (Generic RESources), where 'N' is the number of GPU devices per node to use (<= 4): --gres=gpu:N~~
- Number of GPUs per node: --gpus-per-node=4
  - this is equivalent to --gres=gpu:4, which is still a valid way of specifying GPUs
- CAUTION: sometimes "--cpus-per-gpu" behaves as expected, and sometimes "--ntasks" needs to be specified instead; we have not pinned down when one works and when it does not, so please check with your own application.
- CPU cores: --cpus-per-gpu=12 (i.e. allocates 12 CPU cores per allocated GPU device)
- Manually specify the total number of slots/tasks, taking into account the number of nodes as well: use 12 per GPU, e.g. --nodes=2 --gpus-per-node=4 --ntasks=96
- Memory can be specified per GPU: --mem-per-gpu=42G
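Putting these options together, a GPU job script might look like the following sketch; the time limit, CUDA module name, and executable name are placeholders rather than Picotte-specific values:

```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --cpus-per-gpu=12
#SBATCH --mem-per-gpu=42G
#SBATCH --time=08:00:00

# CUDA-enabled modulefiles live in a separate tree (see "Module Path" below)
module use /ifs/opt_cuda/modulefiles
module load cuda11.4              # placeholder; run "module avail" for the exact module name

./my_gpu_program                  # placeholder executable
```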
Missing TMPDIR /local/scratch/jobid
This happens when specifying "--ntasks", but not when specifying "--cpus-per-gpu".
- If your program emits error messages like "/local/scratch/nnnnnnn: no such file or directory", manually create that directory:
mkdir /local/scratch/$SLURM_JOBID
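A convenient place to do this is near the top of the job script itself. A minimal sketch; exporting TMPDIR explicitly is an assumption here, not a documented requirement:

```bash
# Work around the missing per-job local scratch directory (the "--ntasks" case above)
mkdir -p /local/scratch/$SLURM_JOBID

# Assumption: also point TMPDIR at the per-job scratch directory explicitly
export TMPDIR=/local/scratch/$SLURM_JOBID
```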
Module Path
CUDA-enabled software is in a separate modulefile directory tree. In your job scripts, or in interactive sessions, do:
module use /ifs/opt_cuda/modulefiles
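For example, to see and load one of the CUDA toolkit versions listed in the Summary (the module name below is an assumption; check the "module avail" output for the exact name):

```bash
module use /ifs/opt_cuda/modulefiles
module avail cuda                 # list CUDA toolkit modules in the CUDA tree
module load cuda11.2              # placeholder name for the CUDA 11.2 toolkit module
```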
NVIDIA GPU Cloud (NGC) Containers
See: NVIDIA GPU Cloud Containers
Do:
module use /ifs/opt_cuda/ngc-container-environment-modules
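The usual module commands then work against that tree; the container module name below is only a placeholder:

```bash
module use /ifs/opt_cuda/ngc-container-environment-modules
module avail                      # list available NGC container environment modules
module load tensorflow            # placeholder; pick an actual name from "module avail"
```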
Hybrid MPI with CUDA
Modulefiles (in /ifs/opt_cuda/modulefiles):
- picotte-openmpi/cuda11.0/4.1.0 (Uses GCC)
- picotte-openmpi/cuda11.2/4.1.0 (Uses Intel ICC)
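A hybrid MPI+CUDA job would load one of these modules and launch with mpirun; a minimal sketch assuming the GCC-based build and a placeholder executable name:

```bash
module use /ifs/opt_cuda/modulefiles
module load picotte-openmpi/cuda11.0/4.1.0

# With, e.g., --nodes=2 --gpus-per-node=4 --ntasks=96 in the sbatch header (see Summary above)
mpirun ./my_mpi_cuda_program      # placeholder executable
```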