
Picotte Early Access

General

Access

Storage

  • Home directories: 64 GB quota (subject to change)
  • Group directories: /ifs/groups/somethingGrp
  • Fast parallel scratch (replaces Lustre): /beegfs/scratch
    • No need to manually set striping
  • Local scratch on compute nodes: /local/scratch
  • Per-job local scratch (automatically deleted at the end of the job): path given by the environment variable TMP_DIR (see the example script below)
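
  • Example: a minimal batch-script sketch using the per-job local scratch; the account name, input/output file names, and program are placeholders:

#!/bin/bash
#SBATCH --partition=def
#SBATCH --account=somethingPrj
#SBATCH --time=01:00:00

# Stage input into per-job local scratch; TMP_DIR is set by the scheduler
# and its contents are deleted automatically when the job ends.
cp "$SLURM_SUBMIT_DIR"/input.dat "$TMP_DIR"/
cd "$TMP_DIR"

./my_app input.dat > output.dat

# Copy results back before the job finishes, since TMP_DIR is removed.
cp output.dat "$SLURM_SUBMIT_DIR"/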

Hardware

  • CPU: Intel Xeon Platinum 8268 @ 2.90 GHz (Cascade Lake); with GCC, use "-march=cascadelake"
  • GPUs: Nvidia V100-SXM2 (NVLink)

Job Scheduler

  • The job scheduler is Slurm; see the Slurm documentation for reference and examples
  • The sge2slurm script partially automates translating a Grid Engine job script to a Slurm job script; it should be available by default (full path: /usr/local/bin/sge2slurm)
  • Partitions (an example batch script follows this list):
    • def - default partition; total 74 nodes; 48 hour limit
    • bm - big memory nodes; total 2 nodes, both with 1.5 TB RAM
    • gpu - GPU nodes; total 12 nodes, 24 hour limit; GPU: 4x Nvidia Tesla V100-SXM2 per node
      • Request GPU resources needed (up to 4): #SBATCH --gres=gpu:N
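
  • Example: a minimal batch script for the def partition; the account name, resource amounts, and program are placeholders (for GPU jobs, use "-p gpu" and "--gres=gpu:N" as above):

#!/bin/bash
#SBATCH --partition=def
#SBATCH --account=somethingPrj
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=12:00:00

./my_program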

Software

  • The modulefile system is now (mostly) the Lua-based Lmod; common commands are shown at the end of this section
  • Necessary module:
    • slurm/picotte/20.02.5 (should be loaded by default)
  • Modules for various tools:
    • Compilers:
      • GCC - gcc/9.2.0 (should be loaded by default)
      • Intel Compilers (available as of Fri Nov 6 19:15); may require unloading the gcc/9.2.0 module. Preliminary tests (using LAMMPS) show 2-3x speedup over GCC 8.3
    • CUDA Toolkit & SDK 10.1, 10.2, 11.0, 11.1 installed using Bright-provided modules
      • Do "module avail cuda"
      • Recommended version is 11.0
    • CuDNN (Deep Neural Networks)
      • Do "module avail cudnn"
    • R
      • No modulefile needed
      • Default is Microsoft R Open 4.0.2 https://mran.microsoft.com/open
      • Multithreaded; may be limited to 32 threads (you will have to test)
    • Python
      • python37
      • python/intel/2020.4.912 (Intel Python 3.7.7)
    • OpenMPI w/ Mellanox Infiniband acceleration
      • openmpi/mlnx/gcc/64/4.0.3rc4
  • JupyterHub
  • Maple
    • maple/2018
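
  • Example: common Lmod commands for finding and loading the modules above:

module avail                 # list modules visible with the current toolchain
module spider cuda           # search all modules, including ones hidden behind prerequisites
module load gcc/9.2.0 openmpi/mlnx/gcc/64/4.0.3rc4
module list                  # show currently loaded modules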

Installing Your Own Software for Your Group

GPU/CUDA Applications

Compiling CUDA Applications

  • Request an interactive session in the gpu partition, and request at least one GPU device:

srun --gres=gpu:1 --time=8:00:00 --account=somethingPrj -p gpu --pty /bin/bash

  • Recommended CUDA version: 11.0
  • All CUDA-enabled applications should load at least:

cuda11.0/toolkit
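
  • Example: a minimal compile sketch once the toolkit module is loaded; the source and binary names are placeholders, and sm_70 matches the compute capability of the V100 GPUs:

module load cuda11.0/toolkit
nvcc -arch=sm_70 -O2 -o my_cuda_app my_cuda_app.cu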

MPI (non-GPU)

  • Modules to load:
    • gcc/9.2.0
    • openmpi/mlnx/gcc/64/4.0.3rc4
    • cm-pmix3
  • To run in a job, use "mpirun" as usual, or "srun" as shown below (NB: there are known issues with srun, so mpirun is preferred); a full batch-script example follows the srun line:
srun --mpi=pmix_v3 ${PROGRAM} ${PROGRAM_ARGS}
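
  • Example: a batch-script sketch for a non-GPU MPI job using the modules above; the account, node/task counts (assuming 48 cores per node), and program name are placeholders:

#!/bin/bash
#SBATCH --partition=def
#SBATCH --account=somethingPrj
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=24:00:00

module load gcc/9.2.0 openmpi/mlnx/gcc/64/4.0.3rc4 cm-pmix3

# mpirun is preferred over srun (see note above)
mpirun ./my_mpi_program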

MPI (CUDA GPU)

  • Modules to load:
    • openmpi/4.0.5 (from /opt/picotte/modulefiles; module name will change); a launch sketch follows this list
      • Will also load: gcc/9.2.0, cuda11.0/toolkit, hwloc/1.11.11, cm-pmix3/3.1.4
  • Uses:
    • CUDA 11.0
    • PMIx 3.1.4
    • Mellanox UCX
    • Mellanox HCOLL
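
  • Example: a launch sketch for a GPU MPI job using this stack; the account, resource counts, and binary name are placeholders, and the "module use" line may not be needed if /opt/picotte/modulefiles is already on your module path:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --account=somethingPrj
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00

module use /opt/picotte/modulefiles   # only if not already on the module path
module load openmpi/4.0.5             # also loads gcc, CUDA toolkit, hwloc, cm-pmix3

mpirun ./my_gpu_mpi_program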

Tensorflow on GPU

  • Load the following modules (a quick GPU-visibility check follows this list):
    • cuda-dcgm
    • cuda11.0/toolkit
    • cudnn8.0-cuda11.0
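
  • Example: a quick check that TensorFlow can see the GPUs, assuming TensorFlow is already installed in your Python environment (it is not provided by these modules alone):

module load cuda-dcgm cuda11.0/toolkit cudnn8.0-cuda11.0
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"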

See Also