NVIDIA CUDA on Proteus
Hardware
- Eight nodes (gpu01 -- gpu08), each with:
- dual NVIDIA Tesla K20Xm GPUs - Kepler microarchitecture, GK110 die[1][2] -- 2688 CUDA cores and 6 GB (6144 MB) RAM per GPU
Installed Version
NVIDIA CUDA 9.0 is installed on all GPU nodes.
Other NVIDIA Libraries
- NCCL - located in /usr/local/cuda/nccl
Compile Options
Architecture
Compile for the native compute capability 3.5 target architecture (sm_35).[3]
-gencode arch=compute_35,code=sm_35
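For instance, a direct nvcc invocation using this flag might look like the following (the source and output file names are illustrative):

```shell
nvcc -gencode arch=compute_35,code=sm_35 -O2 -o myapp myapp.cu
```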
If the software expects a CUDA_ARCH environment variable, use:
CUDA_ARCH=35
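As a minimal sketch, assuming the software's build scripts read this variable (the `make` step is illustrative and depends on the particular package):

```shell
# Export the compute capability the build scripts look for (sm_35 on the K20Xm)
export CUDA_ARCH=35
echo "$CUDA_ARCH"
# make    # illustrative: the software's own build step would follow here
```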
C Compiler
CUDA will not work with the Intel compiler. Please use GCC: the gcc/4.8.1 modulefile, which is loaded by default, will work.
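If nvcc picks up the wrong host compiler, it can be pointed at GCC explicitly with nvcc's `-ccbin` option (file names are illustrative):

```shell
module load gcc/4.8.1
nvcc -ccbin gcc -gencode arch=compute_35,code=sm_35 -o myapp myapp.cu
```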
Hybrid MPI with CUDA
See https://www.ccv.brown.edu/doc/mixing-mpi-and-cuda.html for an example.
NOTES
- This may or may not improve the performance of your code: benchmark your own code to find out.
- Compiling a CUDA-enabled code with mpicc alone is not enough to generate an executable that uses both CUDA and MPI. The code has to be written specifically to integrate CUDA with MPI.
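One common pattern is to bind each MPI rank to a GPU and compile the CUDA and MPI parts separately, linking them at the end. A minimal sketch, assuming one rank per GPU on a node (file and function names here are illustrative, not from the linked example):

```c
/* main.c -- compile with mpicc, link against the nvcc-compiled object:
 *   nvcc -gencode arch=compute_35,code=sm_35 -c kernel.cu
 *   mpicc main.c kernel.o -lcudart -o myapp
 */
#include <mpi.h>
#include <cuda_runtime.h>

void launch_kernel(int rank);   /* illustrative: defined in kernel.cu */

int main(int argc, char **argv)
{
    int rank, ndev;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);  /* bind this rank to one of the node's two K20Xm GPUs */
    launch_kernel(rank);
    MPI_Finalize();
    return 0;
}
```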
Running CUDA Jobs
See: GPU Jobs on Proteus
PyCUDA
PyCUDA[4][5] is a Python interface to CUDA.
To Use
PyCUDA is installed on the GPU nodes as part of the python37 conda environment.
- First, load the python/anaconda3 module:
module load python/anaconda3
- Then, activate the python37 environment:
conda activate python37
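A quick check that the environment works, using PyCUDA's gpuarray interface (this must be run on a GPU node; the array contents are arbitrary):

```python
import numpy as np
import pycuda.autoinit           # initializes the CUDA driver and creates a context
import pycuda.gpuarray as gpuarray

a = np.arange(8, dtype=np.float32)
a_gpu = gpuarray.to_gpu(a)       # copy the host array to the GPU
result = (2 * a_gpu).get()       # double it on the GPU, then copy back
print(result)
```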
References
[1] NVIDIA Tesla K-Series Overview (PDF)
[2] TechPowerUp Hardware Database: NVIDIA K20Xm
[3] NVIDIA CUDA Toolkit 6.0 Documentation - Kepler Tuning Guide
[5] PyCUDA webpage