Picotte Early Access
General
Access
- NEW 2020-11-20 Access to Picotte now requires Drexel VPN: https://drexel.edu/it/connect/vpn/
- Login: picottelogin.urcf.drexel.edu (a.k.a. picotte001.urcf.drexel.edu)
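For example, to connect by SSH once on the VPN (the user ID myDrexelID is a placeholder for your own Drexel user ID):
ssh myDrexelID@picottelogin.urcf.drexel.edu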
Storage
- Home directories: 64 GB quota (subject to change)
- Group directories: /ifs/groups/somethingGrp
- Fast parallel scratch (replaces Lustre): /beegfs/scratch
- No need to manually set striping
- Local scratch on jobs: /local/scratch
- Per-job local scratch (automatically deleted at end of job): given by environment variable TMP_DIR
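A minimal job-script sketch using the per-job local scratch; the partition, account placeholder, and paths are taken from this page, while the program and file names are purely illustrative:
#!/bin/bash
#SBATCH --account=somethingPrj
#SBATCH --partition=def
#SBATCH --time=01:00:00
# stage input into the per-job local scratch (deleted automatically at end of job)
cp /beegfs/scratch/myInput.dat ${TMP_DIR}/
cd ${TMP_DIR}
./myProgram myInput.dat > myOutput.dat
# copy results back to persistent storage before the job ends
cp myOutput.dat /beegfs/scratch/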
Hardware
- CPU: Intel Xeon Platinum 8268 @ 2.90 GHz (Cascade Lake); with GCC, use "-march=cascadelake" (see the example after this list)
- GPUs: Nvidia V100-SXM2 (NvLink)
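For example, a compile line using that tuning flag (the source and output file names are illustrative):
gcc -O3 -march=cascadelake -o myprog myprog.c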
Job Scheduler
- Job scheduler is Slurm. Documentation and examples:
- The sge2slurm script will partially automate translating a Grid Engine job script to a Slurm job script; it should be available by default (full path is /usr/local/bin/sge2slurm)
- Partitions:
- def - default partition; total 74 nodes; 48 hour limit
- bm - big memory nodes; total 2 nodes, both with 1.5 TB RAM
- gpu - GPU nodes; total 12 nodes, 24 hour limit; GPU: 4x Nvidia Tesla V100-SXM2 per node
- Request GPU resources needed (up to 4): #SBATCH --gres=gpu:N (see the example script below)
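A minimal batch-script sketch requesting GPUs, using the somethingPrj account placeholder from the interactive example further down; the job time, GPU count, and program name are illustrative:
#!/bin/bash
#SBATCH --account=somethingPrj
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
#SBATCH --time=12:00:00
module load cuda11.0/toolkit
./my_gpu_program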
Software
- Modulefile system is now (mostly) Lua-based Lmod
- This shouldn't make much practical difference: old Tcl-based modulefiles still work. Lua-based module command has more (helpful) features.
- FAQ https://lmod.readthedocs.io/en/latest/040_FAQ.html
- Necessary module:
- slurm/picotte/20.02.5 (should be loaded by default)
- Modules for various tools:
- Compilers:
- GCC - gcc/9.2.0 (should be loaded by default)
- Intel Compilers (available as of Fri Nov 6 19:15); may require unloading the gcc/9.2.0 module. Preliminary tests (using LAMMPS) show a 2-3x speedup over GCC 8.3
- CUDA Toolkit & SDK 10.1, 10.2, 11.0, 11.1 installed using Bright-provided modules
- Do "module avail cuda" to list them (see the example after this list)
- Recommended version is 11.0
- CuDNN (Deep Neural Networks)
- Do "module avail cudnn"
- R
- No modulefile needed
- Default is Microsoft R Open 4.0.2 https://mran.microsoft.com/open
- Multithreaded; may be limited to 32 threads (you will have to test)
- Python
- python37
- python/intel/2020.4.912 (Intel Python 3.7.7)
- OpenMPI w/ Mellanox Infiniband acceleration
- openmpi/mlnx/gcc/64/4.0.3rc4
- Jupyter Hub
- Maple
- maple/2018
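For example, a typical Lmod sequence for finding and loading the CUDA modules mentioned above (the cuDNN module name is the one listed under "Tensorflow on GPU" below and may differ for other versions):
module avail cuda
module load cuda11.0/toolkit cudnn8.0-cuda11.0
module list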
Installing Your Own Software for Your Group
- Use Spack - https://github.com/spack/spack; Spack is also installed on Picotte (see the sketch after this list)
- Perform a manual installation, and create an appropriate modulefile
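A minimal sketch of setting up a personal Spack tree in a group directory and installing a package; the clone location and the zlib package are illustrative, and the Picotte-provided Spack installation may already cover this:
cd /ifs/groups/somethingGrp
git clone https://github.com/spack/spack.git
source spack/share/spack/setup-env.sh
spack install zlib
spack load zlib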
GPU/CUDA applications
Compiling CUDA Applications
- Request an interactive session in the gpu partition, and request at least one GPU device:
srun --gres=gpu:1 --time=8:00:00 --account=somethingPrj -p gpu --pty /bin/bash
- Recommended CUDA version: 11.0
- All CUDA-enabled applications should load at least: cuda11.0/toolkit (see the compile sketch after this list)
- http://developer.nvidia.com/content/introduction-cuda-aware-mpi
- https://developer.nvidia.com/blog/benchmarking-cuda-aware-mpi/
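A minimal compile sketch inside such an interactive session; the source file name is illustrative, and sm_70 targets the V100's compute capability 7.0:
module load cuda11.0/toolkit
nvcc -arch=sm_70 -o my_cuda_app my_cuda_app.cu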
MPI (non-GPU)
- Modules to load:
- gcc/9.2.0
- openmpi/mlnx/gcc/64/4.0.3rc4
- cm-pmix3
- To run in a job, use either "mpirun" as usual, or "srun" (NB: there are currently issues with srun, so mpirun is preferred). The srun form is:
srun --mpi=pmix_v3 ${PROGRAM} ${PROGRAM_ARGS}
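A minimal batch-script sketch using mpirun; the account, node and task counts, and program variables are illustrative:
#!/bin/bash
#SBATCH --account=somethingPrj
#SBATCH --partition=def
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=04:00:00
module load gcc/9.2.0 openmpi/mlnx/gcc/64/4.0.3rc4 cm-pmix3
# Open MPI picks up the Slurm allocation, so no -np is needed
mpirun ${PROGRAM} ${PROGRAM_ARGS}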
MPI (CUDA GPU)
- Modules to load:
- XXX will change - openmpi/4.0.5 (from /opt/picotte/modulefiles)
- Will also load: gcc/9.2.0, cuda11.0/toolkit, hwloc/1.11.11, cm-pmix3/3.1.4
- Uses:
- CUDA 11.0
- PMIx 3.1.4
- Mellanox UCX
- Mellanox HCOLL
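A hedged launch sketch for a CUDA-aware MPI job using these modules; the account, node/task/GPU counts, and program variables are illustrative:
#!/bin/bash
#SBATCH --account=somethingPrj
#SBATCH --partition=gpu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --time=08:00:00
# loads gcc/9.2.0, cuda11.0/toolkit, hwloc/1.11.11, and cm-pmix3/3.1.4 as dependencies
module load openmpi/4.0.5
mpirun ${PROGRAM} ${PROGRAM_ARGS}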
Tensorflow on GPU
- Load the following modules:
- cuda-dcgm
- cuda11.0/toolkit
- cudnn8.0-cuda11.0
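After loading those modules (together with a Python installation that provides TensorFlow, which is an assumption here), a quick check that TensorFlow sees the GPUs:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"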