Picotte Early Access
General
Access
- NEW 2020-11-20 Access to Picotte now requires Drexel VPN: https://drexel.edu/it/connect/vpn/
- Login: picottelogin.urcf.drexel.edu (a.k.a. picotte001.urcf.drexel.edu)
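For example, to connect by SSH once on the VPN (the user ID myDrexelID is a placeholder for your own Drexel user ID):
ssh myDrexelID@picottelogin.urcf.drexel.edu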
Storage
- Home directories: 64 GB quota (subject to change)
- Group directories: /ifs/groups/somethingGrp
- Fast parallel scratch (replaces Lustre): /beegfs/scratch
- No need to manually set striping
- Local scratch on jobs: /local/scratch
- Per-job local scratch (automatically deleted at end of job): given by environment variable TMP_DIR
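A minimal job-script sketch using the per-job local scratch; the partition, account placeholder, and paths are taken from this page, while the program and file names are purely illustrative:
#!/bin/bash
#SBATCH --account=somethingPrj
#SBATCH --partition=def
#SBATCH --time=01:00:00
# stage input into the per-job local scratch (deleted automatically at end of job)
cp /beegfs/scratch/myInput.dat ${TMP_DIR}/
cd ${TMP_DIR}
./myProgram myInput.dat > myOutput.dat
# copy results back to persistent storage before the job ends
cp myOutput.dat /beegfs/scratch/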
Hardware
- CPU: Intel Xeon Platinum 8268 @ 2.90 GHz (Cascade Lake); with GCC, use "-march=cascadelake" (see the example after this list)
- GPUs: Nvidia V100-SXM2 (NvLink)
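For example, a compile line using that tuning flag (the source and output file names are illustrative):
gcc -O3 -march=cascadelake -o myprog myprog.c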
Job Scheduler
- Job scheduler is Slurm. Documentation and examples:
- The sge2slurm script will partially automate translating a Grid Engine job script to a Slurm job script; it should be available by default (full path is /usr/local/bin/sge2slurm)
- Partitions:
- def - default partition; total 74 nodes; 48 hour limit
- bm - big memory nodes; total 2 nodes, both with 1.5 TB RAM
- gpu - GPU nodes; total 12 nodes, 24 hour limit; GPU: 4x Nvidia Tesla V100-SXM2 per node
- Request GPU resources needed (up to 4): #SBATCH --gres=gpu:N (see the example script below)
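A minimal batch-script sketch requesting GPUs, using the somethingPrj account placeholder from the interactive example further down; the job time, GPU count, and program name are illustrative:
#!/bin/bash
#SBATCH --account=somethingPrj
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
#SBATCH --time=12:00:00
module load cuda11.0/toolkit
./my_gpu_program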
Software
- Modulefile system is now (mostly) Lua-based Lmod
- This shouldn't make much practical difference: old Tcl-based modulefiles still work. Lua-based module command has more (helpful) features.
- FAQ https://lmod.readthedocs.io/en/latest/040_FAQ.html
- Necessary module:
- slurm/picotte/20.02.5 (should be loaded by default)
- Modules for various tools:
- Compilers:
- GCC - gcc/9.2.0 (should be loaded by default)
- Intel Compilers (available as of Fri Nov 6 19:15); may require unloading the gcc/9.2.0 module. Preliminary tests (using LAMMPS) show a 2-3x speedup over GCC 8.3
- CUDA Toolkit & SDK 10.1, 10.2, 11.0, 11.1 installed using Bright-provided modules
- Do "module avail cuda" to list them (see the example after this list)
- Recommended version is 11.0
- CuDNN (Deep Neural Networks)
- Do "module avail cudnn"
- R
- No modulefile needed
- Default is Microsoft R Open 4.0.2 https://mran.microsoft.com/open
- Multithreaded; may be limited to 32 threads (you will have to test)
- Python
- python37
- python/intel/2020.4.912 (Intel Python 3.7.7)
- OpenMPI w/ Mellanox Infiniband acceleration
- openmpi/mlnx/gcc/64/4.0.3rc4
- Jupyter Hub
- Maple
- maple/2018
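For example, a typical Lmod sequence for finding and loading the CUDA modules mentioned above (the cuDNN module name is the one listed under "Tensorflow on GPU" below and may differ for other versions):
module avail cuda
module load cuda11.0/toolkit cudnn8.0-cuda11.0
module list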
Installing Your Own Software for Your Group
- Use Spack - https://github.com/spack/spack; Spack is also installed on Picotte (see the sketch after this list)
- Perform a manual installation, and create an appropriate modulefile
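A minimal sketch of setting up a personal Spack tree in a group directory and installing a package; the clone location and the zlib package are illustrative, and the Picotte-provided Spack installation may already cover this:
cd /ifs/groups/somethingGrp
git clone https://github.com/spack/spack.git
source spack/share/spack/setup-env.sh
spack install zlib
spack load zlib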
GPU/CUDA applications
Compiling CUDA Applications
- Request an interactive session in the gpu partition, and request at least one GPU device:
srun --gres=gpu:1 --time=8:00:00 --account=somethingPrj -p gpu --pty /bin/bash
- Recommended CUDA version: 11.0
- All CUDA-enabled applications should load at least: cuda11.0/toolkit (see the compile sketch after this list)
- http://developer.nvidia.com/content/introduction-cuda-aware-mpi
- https://developer.nvidia.com/blog/benchmarking-cuda-aware-mpi/
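A minimal compile sketch inside such an interactive session; the source file name is illustrative, and sm_70 targets the V100's compute capability 7.0:
module load cuda11.0/toolkit
nvcc -arch=sm_70 -o my_cuda_app my_cuda_app.cu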
MPI (non-GPU)
- Modules to load:
- gcc/9.2.0
- openmpi/mlnx/gcc/64/4.0.3rc4
- cm-pmix3
- To run in a job, use either "mpirun" as usual, or "srun" (NB: there are currently issues with srun, so mpirun is preferred). The srun form is:
srun --mpi=pmix_v3 ${PROGRAM} ${PROGRAM_ARGS}
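A minimal batch-script sketch using mpirun; the account, node and task counts, and program variables are illustrative:
#!/bin/bash
#SBATCH --account=somethingPrj
#SBATCH --partition=def
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=04:00:00
module load gcc/9.2.0 openmpi/mlnx/gcc/64/4.0.3rc4 cm-pmix3
# Open MPI picks up the Slurm allocation, so no -np is needed
mpirun ${PROGRAM} ${PROGRAM_ARGS}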
MPI (CUDA GPU)
- Modules to load:
- XXX will change - openmpi/4.0.5 (from /opt/picotte/modulefiles)
- Will also load: gcc/9.2.0, cuda11.0/toolkit, hwloc/1.11.11, cm-pmix3/3.1.4
- Uses:
- CUDA 11.0
- PMIx 3.1.4
- Mellanox UCX
- Mellanox HCOLL
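A hedged launch sketch for a CUDA-aware MPI job using these modules; the account, node/task/GPU counts, and program variables are illustrative:
#!/bin/bash
#SBATCH --account=somethingPrj
#SBATCH --partition=gpu
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --time=08:00:00
# loads gcc/9.2.0, cuda11.0/toolkit, hwloc/1.11.11, and cm-pmix3/3.1.4 as dependencies
module load openmpi/4.0.5
mpirun ${PROGRAM} ${PROGRAM_ARGS}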
Tensorflow on GPU
- Load the following modules:
- cuda-dcgm
- cuda11.0/toolkit
- cudnn8.0-cuda11.0
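After loading those modules (together with a Python installation that provides TensorFlow, which is an assumption here), a quick check that TensorFlow sees the GPUs:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"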