
Picotte Benchmarks

Files located in /beegfs/scratch/benchmarks on Picotte.

High Performance Linpack (HPL)

  • Benchmarks performed by Dell as part of installation

t.b.a.

Deep Learning

Deep learning benchmark code is located on Picotte at /beegfs/scratch/benchmarks/deepLearn/code.

Tests were performed on Picotte 48-core GPU nodes using two V100 GPUs per node, and on XSEDE - PSC Bridges 28-core GPU nodes using two P100 GPUs per node. The entire node was requested in each case.

NOTE: Picotte benchmarks were taken before general user access; XSEDE benchmarks were taken during general user access.

Dependencies

python==3.6.5

scikit-learn==0.19.2

tensorflow-gpu==1.9.0

keras==2.2.2

Picotte Specific:

module load cuda-dcgm
module load cuda10.1/toolkit

nvidia-smi -L
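`nvidia-smi -L` prints one line per visible GPU, which is a quick way to confirm the two V100s were actually allocated. A minimal sketch of checking the output, using illustrative sample text (not actual Picotte output):

```python
# Count GPUs from `nvidia-smi -L` style output.
# The sample text below is illustrative only, not captured from Picotte.
sample = """GPU 0: Tesla V100-PCIE-32GB (UUID: GPU-xxxx)
GPU 1: Tesla V100-PCIE-32GB (UUID: GPU-yyyy)"""

gpus = [line for line in sample.splitlines() if line.startswith("GPU")]
print(f"{len(gpus)} GPU(s) visible")  # 2 GPU(s) visible
```

In practice the text would come from running `nvidia-smi -L` (e.g. via `subprocess.check_output`) on the allocated node.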

Trials

               Trial 1             Trial 2             Trial 3             Trial 4             Trial 5             Average
Picotte        6.03 h (21707 sec)  5.90 h (21230 sec)  6.01 h (21623 sec)  5.93 h (21349 sec)  5.94 h (21378 sec)  5.96 h (21457.5 sec)
XSEDE - (PSC)  9.02 h (32473 sec)  9.06 h (32613 sec)  9.01 h (32423 sec)  9.04 h (32550 sec)  9.03 h (32500 sec)  9.03 h (32511.8 sec)
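The "Average" column is the mean of the five per-trial wall times. A quick check using the XSEDE - (PSC) row's per-trial seconds:

```python
# Reproduce the "Average" cell of the deep learning trial table
# from the per-trial wall times in seconds (XSEDE - (PSC) row).
xsede_secs = [32473, 32613, 32423, 32550, 32500]

avg = sum(xsede_secs) / len(xsede_secs)
print(f"{avg / 3600:.2f} h ({avg} sec)")  # 9.03 h (32511.8 sec)
```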

Large Memory

Large memory benchmarking code is located on Picotte at /beegfs/scratch/benchmarks/LM. The Python code for the large memory tests is XGBLM.py; the required subdirectories are InputLM and OutputLM.

Tests were performed on Picotte 48-core BM nodes with a maximum of 1.5 TB of total available memory (1546799 MB), and on XSEDE - PSC Bridges 64-core LM nodes with a maximum of 3 TB of total available memory (3096000 MB). One node with 10 slots and a maximum of 512 GB of memory was requested on both Picotte and XSEDE.

NOTE: Picotte benchmarks were taken before general user access; XSEDE benchmarks were taken during general user access.

Dependencies

python==3.8.5

pandas==1.1.1

scikit-learn==0.23.2

matplotlib==3.3.1

xgboost==1.2.0 (minimum of 1.0.2 required to run)

Trials

211GB Dataset

               Trial 1   Trial 2   Trial 3   Trial 4   Trial 5   Average
Picotte        01:20:40  01:20:33  01:20:37  01:20:34  01:20:40  01:20:36.8
XSEDE - (PSC)  02:10:26  02:05:49  02:15:51  02:04:50  02:07:50  02:08:57.2

371GB Dataset

               Trial 1   Trial 2   Trial 3   Trial 4   Trial 5   Average
Picotte        02:12:59  02:13:02  02:13:00  02:12:58  02:13:09  02:13:01.6
XSEDE - (PSC)  03:35:33  03:48:02  03:32:28  04:09:14  04:09:55  03:51:02.4
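The fractional-second averages in these tables come from averaging the HH:MM:SS trial times. A sketch reproducing the 211GB Picotte average:

```python
# Average the HH:MM:SS wall times from the 211GB Picotte row
# and format the mean back as HH:MM:SS.s, as in the tables.
def to_seconds(hms: str) -> int:
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

trials = ["01:20:40", "01:20:33", "01:20:37", "01:20:34", "01:20:40"]
avg = sum(to_seconds(t) for t in trials) / len(trials)

h, rem = divmod(avg, 3600)
m, s = divmod(rem, 60)
print(f"{int(h):02d}:{int(m):02d}:{s:04.1f}")  # 01:20:36.8
```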

BeeGFS Storage

BeeGFS is a high-performance parallel file system.[1]

Network Benchmark

[root@picottemgmt]# taskset -c 0-12 ~/BeeGFS/iozone/iozone3_434/src/current/iozone -i0 -r2m -s128g -x -t12
        Iozone: Performance Test of File I/O
                Version $Revision: 3.434 $
                Compiled for 64 bit mode.
                Build: linux

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                     Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                     Vangel Bojaxhi, Ben England, Vikentsi Lapa,
                     Alexey Skidanov.

        Run began: Wed Aug 19 11:30:46 2020

        Record Size 2048 kB
        File size set to 134217728 kB
        Stonewall disabled
        Command line used: /root/BeeGFS/iozone/iozone3_434/src/current/iozone -i0 -r2m -s128g -x -t12
        Output is in kBytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 kBytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 12 processes
        Each process writes a 134217728 kByte file in 2048 kByte records

        Children see throughput for 12 initial writers  = 12066132.12 kB/sec
        Parent sees throughput for 12 initial writers   = 11999579.71 kB/sec
        Min throughput per process                      =  999969.31 kB/sec
        Max throughput per process                      = 1012962.00 kB/sec
        Avg throughput per process                      = 1005511.01 kB/sec
        Min xfer                                        = 134217728.00 kB

        Children see throughput for 12 rewriters        = 12050098.31 kB/sec
        Parent sees throughput for 12 rewriters         = 11979829.23 kB/sec
        Min throughput per process                      =  998330.44 kB/sec
        Max throughput per process                      = 1018436.94 kB/sec
        Avg throughput per process                      = 1004174.86 kB/sec
        Min xfer                                        = 134217728.00 kB



iozone test complete.
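As a sanity check on the IOzone report above, the "Children see throughput" figure for the initial writers is the average per-process throughput multiplied by the 12 processes:

```python
# Relationship between IOzone's per-process and aggregate numbers
# for the 12 initial writers in the run above.
avg_per_process = 1005511.01  # kB/sec, from the report
processes = 12

aggregate = avg_per_process * processes
print(f"{aggregate:.2f} kB/sec")  # 12066132.12 kB/sec
```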
  • Remember to do this after completion: echo 0 > /proc/fs/beegfs/*/netbench_mode

Write Benchmark

  • beegfs-ctl --storagebench --write --alltargets --blocksize=512K --size=128G --threads=15 --wait
Server benchmark status:
Finished:    15

Write benchmark results:
Min throughput:            2319500  KiB/s   nodeID: beegfs003-numa0-2 [ID: 5], targetID: 5
Max throughput:            3024892  KiB/s   nodeID: beegfs001-numa1-3 [ID: 9], targetID: 9
Avg throughput:            2731805  KiB/s
Aggregate throughput:     40977080  KiB/s

Read Benchmark

  • beegfs-ctl --storagebench --read --alltargets --blocksize=512K --size=128G --threads=15 --wait
Server benchmark status:
Finished:    15

Read benchmark results:
Min throughput:            2662166  KiB/s   nodeID: beegfs003-numa0-3 [ID: 6], targetID: 6
Max throughput:            3732209  KiB/s   nodeID: beegfs001-numa1-3 [ID: 9], targetID: 9
Avg throughput:            2951499  KiB/s
Aggregate throughput:     44272491  KiB/s
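In both storagebench reports above, the average throughput is the aggregate divided by the 15 storage targets (apparently truncated to whole KiB/s in the tool's output; that rounding behavior is an assumption here):

```python
# Check avg = aggregate / targets for the beegfs-ctl storagebench
# results above, assuming truncation to whole KiB/s.
targets = 15
write_aggregate = 40977080  # KiB/s
read_aggregate = 44272491   # KiB/s

print(write_aggregate // targets)  # 2731805
print(read_aggregate // targets)   # 2951499
```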

After the read benchmark is done, clean up the benchmark files:

beegfs-ctl --storagebench --cleanup  --alltargets

References

[1] BeeGFS website

[2] IOzone Filesystem Benchmark website