Message Passing Interface
Overview
The Message Passing Interface is a standardized and portable message-passing system for parallel computation using multiple physical computers (nodes).[1][2]
There are multiple implementations of the standard, some of which are available on Picotte.
Available Implementations
Picotte
These packages are compiler-specific:
- Open MPI
  - GCC
    - picotte-openmpi/gcc/4.0.5
    - picotte-openmpi/gcc/4.1.0
    - picotte-openmpi/gcc/4.1.4
  - Intel ICC
    - picotte-openmpi/intel/2020/4.0.5
    - picotte-openmpi/intel/2020/4.1.0
    - picotte-openmpi/intel/2020/4.1.2
    - picotte-openmpi/intel/2020/4.1.4
  - CUDA-enabled -- first, do "module use /ifs/opt_cuda/modulefiles" (see the example below)
    - picotte-openmpi/cuda11.0/4.0.5 (uses GCC)
    - picotte-openmpi/cuda11.0/4.1.0 (uses GCC)
    - picotte-openmpi/cuda11.2/4.1.0 (uses Intel ICC)
    - picotte-openmpi/cuda11.2/4.1.4 (uses Intel ICC)
    - picotte-openmpi/cuda11.4/4.1.4 (uses GCC)
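To make the CUDA-enabled builds visible to the module command and list them, do, for example:
[juser@picotte001 ~]$ module use /ifs/opt_cuda/modulefiles
[juser@picotte001 ~]$ module avail picotte-openmpi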
For the CUDA-enabled implementations, you can see the compiler version by doing:
[juser@gpu001 ~]$ module load picotte-openmpi/cuda11.4/4.1.4
[juser@gpu001 ~]$ mpicc --version
icc (ICC) 19.1.3.304 20200925
Copyright (C) 1985-2020 Intel Corporation. All rights reserved.
The CUDA-enabled implementations also require loading the corresponding CUDA modulefile.
Picotte Hardware Notes
Details of all hardware are at: Picotte Hardware and Software.
All Picotte compute nodes use one of these two CPUs:
- Intel Xeon Platinum 8268 (def, bm partitions)
- Intel Xeon Platinum 8260 (gpu partition)
You should not need to manually specify NUMA layouts, since Open MPI uses hwloc to determine them.
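If you want to check how Open MPI actually places and binds ranks, its mpirun accepts a --report-bindings option, which prints one binding report per rank. A minimal sketch (myprogram is a placeholder):
mpirun --report-bindings ./myprogram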
You can view the "socket, core, thread" configuration using the sinfo_detail alias from the slurm_util modulefile:
[juser@picotte001 ~]$ module load slurm_util
[juser@picotte001 ~]$ sinfo_detail -p def
NODELIST NODES PART STATE CPUS S:C:T GRES MEMORY FREE_MEM TMP_DISK CPU_LOAD REASON
node001 1 def* mixed 48 4:12:1 (null) 192000 140440 864000 4.08 none
node002 1 def* mixed 48 4:12:1 (null) 192000 171040 864000 3.00 none
node003 1 def* mixed 48 4:12:1 (null) 192000 171621 864000 3.03 none
node004 1 def* mixed 48 4:12:1 (null) 192000 147536 864000 3.00 none
node005 1 def* mixed 48 4:12:1 (null) 192000 162570 864000 3.00 none
node006 1 def* mixed 48 4:12:1 (null) 192000 169135 864000 2.99 none
...
node072 1 def* mixed 48 4:12:1 (null) 192000 157423 864000 3.00 none
node073 1 def* mixed 48 4:12:1 (null) 192000 157114 864000 3.00 none
node074 1 def* mixed 48 4:12:1 (null) 192000 152783 864000 3.00 none
[juser@picotte001 ~]$ sinfo_detail -p gpu
NODELIST NODES PART STATE CPUS S:C:T GRES MEMORY FREE_MEM TMP_DISK CPU_LOAD REASON
gpu001 1 gpu mixed 48 2:24:1 gpu:v100 192000 18191 1637000 4.69 none
gpu002 1 gpu idle 48 2:24:1 gpu:v100 192000 94592 1637000 0.00 none
...
gpu011 1 gpu idle 48 2:24:1 gpu:v100 192000 39289 1637000 0.01 none
gpu012 1 gpu idle 48 2:24:1 gpu:v100 192000 142535 1637000 0.15 none
[juser@picotte001 ~]$ sinfo_detail -p bm
NODELIST NODES PART STATE CPUS S:C:T GRES MEMORY FREE_MEM TMP_DISK CPU_LOAD REASON
bigmem001 1 bm idle 48 2:24:1 (null) 1546000 1368526 1724000 0.00 none
bigmem002 1 bm idle 48 2:24:1 (null) 1546000 1541778 1724000 0.00 none
The “S:C:T” column shows “Socket”, “Core”, and “Thread” counts. Here, “Thread” means Intel’s Hyper-Threading,[3] where a single physical core is presented by the hardware as two virtual cores. This feature may increase performance in consumer applications (Office, web browsing, etc.), but it generally decreases performance in compute-intensive applications. In an HPC context, Hyper-Threading is typically turned off; it is disabled on Picotte, so T=1.
Open MPI
Note that Open MPI is not OpenMP[4]. OpenMP is an API for multi-platform shared-memory parallel programming in C/C++ and Fortran, i.e. single-host multithreaded programming on our compute nodes. Open MPI is an implementation of the MPI standard, which provides multi-host parallel execution. The two can be combined: Open MPI distributes ranks across hosts, while OpenMP provides single-host shared-memory parallelism within each rank (see Hybrid MPI-OpenMP Jobs below).
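For example, an MPI program is compiled with the mpicc wrapper, while OpenMP is enabled by a compiler flag; a sketch, with hello_mpi.c and hybrid.c as hypothetical source files:
[juser@picotte001 ~]$ module load picotte-openmpi/gcc/4.1.4
[juser@picotte001 ~]$ mpicc -o hello_mpi hello_mpi.c
[juser@picotte001 ~]$ mpicc -fopenmp -o hybrid hybrid.c
The first command builds a pure MPI program; the second adds -fopenmp so the same build can also use OpenMP threads within each rank.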
Common Environment Variables
Open MPI may be controlled by environment variables named OMPI_*. Please note that some of these should not be changed, because they define necessary compile-time flags and library locations.
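One class of variables you may legitimately want to set are runtime MCA parameters, which Open MPI reads from variables of the form OMPI_MCA_<parameter>. For example (the specific parameter here is only an illustration):
export OMPI_MCA_btl_base_verbose=100
This increases the verbosity of the byte transfer layer, which can help when debugging communication problems.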
For convenience, these environment variables are set -- actual values will vary by version loaded:
MPICC=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpicc
MPI_CPPFLAGS=-I/ifs/opt/openmpi/intel/2020/4.1.4/include
MPICXX=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpic++
MPIF77=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpif77
MPIF90=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpif90
MPIFC=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpifort
MPI_HOME=/ifs/opt/openmpi/intel/2020/4.1.4
MPI_INCDIR=/ifs/opt/openmpi/intel/2020/4.1.4/include
MPI_LIBDIR=/ifs/opt/openmpi/intel/2020/4.1.4/lib
MPI_RUN=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpirun -x LD_LIBRARY_PATH -x BASH_ENV
MPIRUN=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpirun -x LD_LIBRARY_PATH -x BASH_ENV
OMPI_CFLAGS=-fopenmp
OMPI_LDFLAGS=-L/ifs/opt/openmpi/intel/2020/4.1.4/lib -Wl,-rpath -Wl,/ifs/opt/openmpi/intel/2020/4.1.4/lib
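These variables can be used directly in build commands or makefiles. A sketch, with mycode.c as a placeholder source file:
[juser@picotte001 ~]$ module load picotte-openmpi/intel/2020/4.1.4
[juser@picotte001 ~]$ ${MPICC} -o mycode mycode.c
If you compile with something other than the wrapper, ${MPI_CPPFLAGS} and -L${MPI_LIBDIR} supply the header and library search paths.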
Running
Programs are launched with mpirun. Using the full path to the mpirun command is recommended; it is given by the MPI_RUN environment variable:
${MPI_RUN} myprogram --opt optval
The MPI_RUN environment variable also includes common command-line options to export environment variables (-x LD_LIBRARY_PATH -x BASH_ENV), so you do not need to set them manually.
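Additional -x options may be appended to export other environment variables to all ranks, e.g. (OMP_NUM_THREADS is only an illustration):
${MPI_RUN} -x OMP_NUM_THREADS myprogram --opt optval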
Example: job requests 2 nodes, 48 MPI ranks per node:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
See the sbatch documentation[5] (or the man page on picotte001) for more detailed information.
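Putting the pieces together, a minimal job script might look like the following sketch; the partition, time limit, module version, and program name are placeholders, and your group may require additional options such as an account:
#!/bin/bash
#SBATCH --partition=def
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=01:00:00

module load picotte-openmpi/intel/2020/4.1.4

${MPI_RUN} myprogram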
Performance Differences
Performance differences tend to be very application-specific. However, experience suggests that on Intel CPUs, Intel MPI tends to perform better.[6]
Hybrid MPI-OpenMP Jobs
Please see Hybrid MPI-OpenMP Jobs.
MPI for Python
MPI for Python (a.k.a. mpi4py) is a Python module that provides Python bindings for MPI.[7]
Please see: MPI for Python
References
[1] The Message Passing Interface (MPI) official website
[2] Message Passing Interface Wikipedia article
[3] Intel® Hyper-Threading Technology
[5] Slurm 21.08.8 Documentation - sbatch
[6] StackOverflow: Will mvapich be substantially better than openmpi? And How?