Message Passing Interface

Overview

The Message Passing Interface is a standardized and portable message-passing system for parallel computation using multiple physical computers (nodes).[1][2]

There are multiple implementations of the standard, some of which are available on Picotte.

Available Implementations

Picotte

These packages are compiler-specific:

  • Open MPI
    • GCC
      • picotte-openmpi/gcc/4.0.5
      • picotte-openmpi/gcc/4.1.0
      • picotte-openmpi/gcc/4.1.4
    • Intel ICC
      • picotte-openmpi/intel/2020/4.0.5
      • picotte-openmpi/intel/2020/4.1.0
      • picotte-openmpi/intel/2020/4.1.2
      • picotte-openmpi/intel/2020/4.1.4
    • CUDA-enabled -- first, do "module use /ifs/opt_cuda/modulefiles"
      • picotte-openmpi/cuda11.0/4.0.5 (GCC)
      • picotte-openmpi/cuda11.0/4.1.0 (GCC)
      • picotte-openmpi/cuda11.2/4.1.0 (Intel ICC)
      • picotte-openmpi/cuda11.2/4.1.4 (Intel ICC)
      • picotte-openmpi/cuda11.4/4.1.4 (GCC)

For the CUDA-enabled implementations, you can see the compiler version by doing:

[juser@gpu001 ~]$ module load picotte-openmpi/cuda11.2/4.1.4
[juser@gpu001 ~]$ mpicc --version
icc (ICC) 19.1.3.304 20200925
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.

The CUDA-enabled implementations also require loading the corresponding CUDA modulefile, which is made visible by the "module use /ifs/opt_cuda/modulefiles" step above.
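
For example, to make the CUDA-enabled builds visible and load one of the modules listed above (output omitted; which CUDA modulefile to pair with it depends on your application):

[juser@gpu001 ~]$ module use /ifs/opt_cuda/modulefiles
[juser@gpu001 ~]$ module avail picotte-openmpi/cuda
[juser@gpu001 ~]$ module load picotte-openmpi/cuda11.4/4.1.4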

Picotte hardware notes

Details of all hardware are at: Picotte Hardware and Software.

All Picotte compute nodes use one of two CPU configurations; the resulting socket/core/thread layouts are shown in the sinfo_detail output below.

You should not need to specify NUMA layouts manually, since Open MPI uses hwloc to determine them.
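
If you want to verify how ranks are bound to cores, Open MPI's mpirun accepts a --report-bindings option; a minimal check inside a job might look like this (the program name is a placeholder):

# prints one binding report line per rank to stderr
${MPI_RUN} --report-bindings ./myprogram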

You can view the "socket, core, thread" configuration using the sinfo_detail alias from the slurm_util modulefile:

[juser@picotte001 ~]$ module load slurm_util
[juser@picotte001 ~]$ sinfo_detail -p def
NODELIST      NODES PART       STATE CPUS    S:C:T     GRES   MEMORY FREE_MEM TMP_DISK CPU_LOAD REASON
node001           1 def*       mixed   48   4:12:1   (null)   192000   140440   864000     4.08 none
node002           1 def*       mixed   48   4:12:1   (null)   192000   171040   864000     3.00 none
node003           1 def*       mixed   48   4:12:1   (null)   192000   171621   864000     3.03 none
node004           1 def*       mixed   48   4:12:1   (null)   192000   147536   864000     3.00 none
node005           1 def*       mixed   48   4:12:1   (null)   192000   162570   864000     3.00 none
node006           1 def*       mixed   48   4:12:1   (null)   192000   169135   864000     2.99 none
...
node072           1 def*       mixed   48   4:12:1   (null)   192000   157423   864000     3.00 none
node073           1 def*       mixed   48   4:12:1   (null)   192000   157114   864000     3.00 none
node074           1 def*       mixed   48   4:12:1   (null)   192000   152783   864000     3.00 none
[juser@picotte001 ~]$ sinfo_detail -p gpu
NODELIST      NODES PART       STATE CPUS    S:C:T     GRES   MEMORY FREE_MEM TMP_DISK CPU_LOAD REASON
gpu001            1 gpu        mixed   48   2:24:1 gpu:v100   192000    18191  1637000     4.69 none
gpu002            1 gpu         idle   48   2:24:1 gpu:v100   192000    94592  1637000     0.00 none
...
gpu011            1 gpu         idle   48   2:24:1 gpu:v100   192000    39289  1637000     0.01 none
gpu012            1 gpu         idle   48   2:24:1 gpu:v100   192000   142535  1637000     0.15 none
[juser@picotte001 ~]$ sinfo_detail -p bm
NODELIST      NODES PART       STATE CPUS    S:C:T     GRES   MEMORY FREE_MEM TMP_DISK CPU_LOAD REASON
bigmem001         1 bm          idle   48   2:24:1   (null)  1546000  1368526  1724000     0.00 none
bigmem002         1 bm          idle   48   2:24:1   (null)  1546000  1541778  1724000     0.00 none

The column “S:C:T” shows “Socket”, “Core”, and “Thread”. Here, “Thread” refers to Intel’s Hyper-Threading,[3] where a single physical core is presented by the hardware as two virtual cores. This feature may increase performance in consumer applications (office software, web browsing, etc.), but it typically decreases performance in compute-intensive applications. In an HPC context, Hyper-Threading is generally turned off; on Picotte it is disabled, so T=1.

Open MPI

Note that Open MPI is not OpenMP.[4] OpenMP is an API for multi-platform shared-memory parallel programming in C/C++ and Fortran, i.e. single-host multithreaded programming on our compute nodes. Open MPI is an implementation of the MPI standard, which provides multi-host parallel execution. The two can be combined in a single application; see Hybrid MPI-OpenMP Jobs below.

Common Environment Variables

Open MPI may be controlled by environment variables named OMPI_*. Note that some of these should not be changed, because they define necessary compile-time flags and library locations.

For convenience, the modulefile sets the following environment variables (actual values will vary with the version loaded):

MPICC=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpicc
MPI_CPPFLAGS=-I/ifs/opt/openmpi/intel/2020/4.1.4/include
MPICXX=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpic++
MPIF77=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpif77
MPIF90=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpif90
MPIFC=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpifort
MPI_HOME=/ifs/opt/openmpi/intel/2020/4.1.4
MPI_INCDIR=/ifs/opt/openmpi/intel/2020/4.1.4/include
MPI_LIBDIR=/ifs/opt/openmpi/intel/2020/4.1.4/lib
MPI_RUN=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpirun -x LD_LIBRARY_PATH -x BASH_ENV
MPIRUN=/ifs/opt/openmpi/intel/2020/4.1.4/bin/mpirun -x LD_LIBRARY_PATH -x BASH_ENV
OMPI_CFLAGS=-fopenmp
OMPI_LDFLAGS=-L/ifs/opt/openmpi/intel/2020/4.1.4/lib -Wl,-rpath -Wl,/ifs/opt/openmpi/intel/2020/4.1.4/lib
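
As a minimal sketch of how these variables are typically used (the source file hello.c and the resulting program are hypothetical):

# compile with the wrapper compiler selected by the loaded module
${MPICC} -O2 -o hello hello.c

# run within a job; MPI_RUN already includes the mpirun path and the -x options
${MPI_RUN} ./hello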

Running

Invocation of mpirun is the same as with other MPI implementations. Using the full path to the mpirun command is recommended; it is given by the MPI_RUN environment variable:

     ${MPI_RUN} myprogram --opt optval

The MPI_RUN environment variable also includes command-line options that export environment variables to all ranks (-x LD_LIBRARY_PATH -x BASH_ENV), so you do not need to set them manually.
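
If your program needs other environment variables propagated to every rank, mpirun's -x option can be given additional times; for example (the variable and program names are placeholders):

${MPI_RUN} -x OMP_NUM_THREADS ./myprogram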

Example: a job requesting 2 nodes with 48 MPI ranks per node:

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
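
A minimal job script built around these directives might look like the following sketch; the partition, time limit, module version, and program name are placeholders to adapt, and it assumes the Open MPI build integrates with Slurm so no explicit -n is needed:

#!/bin/bash
#SBATCH --partition=def
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=01:00:00

# load the same MPI module that was used to build the program
module load picotte-openmpi/gcc/4.1.4

# mpirun picks up the Slurm allocation, so no host list or rank count is given here
${MPI_RUN} ./myprogram --opt optval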

See sbatch documentation[5] (or man page on picotte001) for more detailed information.

Performance Differences

Performance differences tend to be very application-specific. However, experience suggests that Intel MPI tends to perform better on Intel CPUs.[6]

Hybrid MPI-OpenMP Jobs

Please see Hybrid MPI-OpenMP Jobs.

MPI for Python

MPI for Python (a.k.a. mpi4py) is a Python module that provides Python bindings for MPI.[7]

Please see: MPI for Python
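
As a quick illustration, an mpi4py script is launched under mpirun like any other MPI program; this sketch assumes a Python environment with mpi4py installed and a hypothetical script my_script.py:

${MPI_RUN} python my_script.py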

References

[1] The Message Passing Interface (MPI) official website

[2] Message Passing Interface Wikipedia article

[3] Intel® Hyper-Threading Technology

[4] OpenMP official website

[5] Slurm 21.08.8 Documentation - sbatch

[6] StackOverflow: Will mvapich be substantially better than openmpi? And How?

[7] MPI for Python Bitbucket project page

[8] OpenMPI FAQ - Tuning: Selecting Components