Picotte Benchmarks
Benchmark files are located in /beegfs/scratch/benchmarks on Picotte.
High Performance Linpack (HPL)
- Benchmarks were performed by Dell as part of the installation.
- Results: t.b.a.
Deep Learning
The deep learning benchmark code is located on Picotte at /beegfs/scratch/benchmarks/deepLearn/code.
Tests were performed on Picotte 48-core GPU nodes using two V100 GPUs per node, and on XSEDE PSC Bridges 28-core GPU nodes using two P100 GPUs per node. In both cases the entire node was requested.
NOTE: Picotte benchmarks were taken before general user access; XSEDE benchmarks were taken during general user access.
Dependencies
python==3.6.5
scikit-learn==0.19.2
tensorflow-gpu==1.9.0
keras==2.2.2
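For reference, a minimal sketch of recreating this environment with pip inside a Python 3.6 virtual environment; the environment name dlbench is hypothetical, and an equivalent conda environment works just as well:
# create and activate an isolated Python 3.6 environment ("dlbench" is a hypothetical name)
python3.6 -m venv ~/dlbench
source ~/dlbench/bin/activate
# install the pinned benchmark dependencies
pip install scikit-learn==0.19.2 tensorflow-gpu==1.9.0 keras==2.2.2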
Picotte specific (load the CUDA modules, then list the available GPUs):
module load cuda-dcgm
module load cuda10.1/toolkit
nvidia-smi -L
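For reference, a minimal Slurm batch sketch for one trial on a Picotte GPU node; the partition name, walltime, and script name run_benchmark.py are assumptions, not the actual benchmark configuration:
#!/bin/bash
#SBATCH --partition=gpu        # assumed GPU partition name
#SBATCH --nodes=1
#SBATCH --exclusive            # request the entire node, as in the tests above
#SBATCH --gres=gpu:2           # two V100 GPUs per node
#SBATCH --time=08:00:00        # assumed walltime

module load cuda-dcgm
module load cuda10.1/toolkit
nvidia-smi -L                  # confirm both GPUs are visible to the job

cd /beegfs/scratch/benchmarks/deepLearn/code
python run_benchmark.py        # hypothetical name; run the actual benchmark script here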
Trials
              | Trial 1            | Trial 2            | Trial 3            | Trial 4            | Trial 5            | Average
Picotte       | 6.03 h (21707 sec) | 5.90 h (21230 sec) | 6.01 h (21623 sec) | 5.93 h (21349 sec) | 5.94 h (21378 sec) | 5.96 h (21457.5 sec)
XSEDE - (PSC) | 9.02 h (32473 sec) | 9.06 h (32613 sec) | 9.01 h (32423 sec) | 9.04 h (32550 sec) | 9.03 h (32500 sec) | 9.03 h (32511.8 sec)
Large Memory
The large memory benchmarking code is located on Picotte at /beegfs/scratch/benchmarks/LM. The Python code for the large memory tests is XGBLM.py; the required subdirectories are InputLM and OutputLM.
Tests were performed on Picotte 48-core big-memory (BM) nodes with a maximum of 1.5 TB of total available memory (1546799 MB), and on XSEDE PSC Bridges 64-core LM nodes with a maximum of 3 TB of total available memory (3096000 MB). On both Picotte and XSEDE, one node with 10 slots and a maximum of 512 GB of memory was requested.
NOTE: Picotte benchmarks were taken before general user access; XSEDE benchmarks were taken during general user access.
Dependencies
python==3.8.5
pandas==1.1.1
scikit-learn==0.23.2
matplotlib==3.3.1
xgboost==1.2.0 (minimum of 1.0.2 required to run)
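A minimal Slurm batch sketch matching the request described above (one node, 10 slots, 512 GB); the partition name, walltime, and environment activation path are assumptions:
#!/bin/bash
#SBATCH --partition=bm         # assumed big-memory partition name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10     # 10 slots
#SBATCH --mem=512G             # memory limit used for the benchmark
#SBATCH --time=06:00:00        # assumed walltime

# activate an environment providing the pinned dependencies above (path is hypothetical)
source ~/lmbench/bin/activate

cd /beegfs/scratch/benchmarks/LM
python XGBLM.py                # expects the InputLM and OutputLM subdirectories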
Trials
211 GB Dataset (times in hh:mm:ss)
              | Trial 1  | Trial 2  | Trial 3  | Trial 4  | Trial 5  | Average
Picotte       | 01:20:40 | 01:20:33 | 01:20:37 | 01:20:34 | 01:20:40 | 01:20:36.8
XSEDE - (PSC) | 02:10:26 | 02:05:49 | 02:15:51 | 02:04:50 | 02:07:50 | 02:08:57.2
371 GB Dataset (times in hh:mm:ss)
              | Trial 1  | Trial 2  | Trial 3  | Trial 4  | Trial 5  | Average
Picotte       | 02:12:59 | 02:13:02 | 02:13:00 | 02:12:58 | 02:13:09 | 02:13:01.6
XSEDE - (PSC) | 03:35:33 | 03:48:02 | 03:32:28 | 04:09:14 | 04:09:55 | 03:51:02.4
BeeGFS Storage
BeeGFS is a high-performance parallel file system.[1]
Network Benchmark
- Follow the instructions at https://community.mellanox.com/s/article/howto-configure-and-test-beegfs-with-rdma (netbench mode is enabled before the test and disabled afterwards; see the note at the end of this section).
- The test uses iozone[2].
- Results:
[root@picottemgmt]# taskset -c 0-12 ~/BeeGFS/iozone/iozone3_434/src/current/iozone -i0 -r2m -s128g -x -t12
Iozone: Performance Test of File I/O
Version $Revision: 3.434 $
Compiled for 64 bit mode.
Build: linux
Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
Vangel Bojaxhi, Ben England, Vikentsi Lapa,
Alexey Skidanov.
Run began: Wed Aug 19 11:30:46 2020
Record Size 2048 kB
File size set to 134217728 kB
Stonewall disabled
Command line used: /root/BeeGFS/iozone/iozone3_434/src/current/iozone -i0 -r2m -s128g -x -t12
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 kBytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 12 processes
Each process writes a 134217728 kByte file in 2048 kByte records
Children see throughput for 12 initial writers = 12066132.12 kB/sec
Parent sees throughput for 12 initial writers = 11999579.71 kB/sec
Min throughput per process = 999969.31 kB/sec
Max throughput per process = 1012962.00 kB/sec
Avg throughput per process = 1005511.01 kB/sec
Min xfer = 134217728.00 kB
Children see throughput for 12 rewriters = 12050098.31 kB/sec
Parent sees throughput for 12 rewriters = 11979829.23 kB/sec
Min throughput per process = 998330.44 kB/sec
Max throughput per process = 1018436.94 kB/sec
Avg throughput per process = 1004174.86 kB/sec
Min xfer = 134217728.00 kB
iozone test complete.
- Remember to disable netbench mode after completion:
echo 0 > /proc/fs/beegfs/*/netbench_mode
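Note on netbench mode (referenced above): per the Mellanox guide, netbench mode is enabled on the client before the iozone run so that write data is sent over the network but not committed to storage. A minimal sketch, assuming a single mounted BeeGFS instance:
# enable netbench mode before running iozone
echo 1 > /proc/fs/beegfs/*/netbench_mode
# verify the setting (1 = enabled, 0 = disabled)
cat /proc/fs/beegfs/*/netbench_mode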
Write Benchmark
- beegfs-ctl --storagebench --write --alltargets --blocksize=512K --size=128G --threads=15 --wait
Server benchmark status:
Finished: 15
Write benchmark results:
Min throughput: 2319500 KiB/s nodeID: beegfs003-numa0-2 [ID: 5], targetID: 5
Max throughput: 3024892 KiB/s nodeID: beegfs001-numa1-3 [ID: 9], targetID: 9
Avg throughput: 2731805 KiB/s
Aggregate throughput: 40977080 KiB/s
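If the benchmark is started without --wait, progress can be polled with the storagebench status query; a minimal sketch (output format may differ between BeeGFS versions):
# report whether the storage benchmark is still running on any target
beegfs-ctl --storagebench --alltargets --status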
Read Benchmark
- beegfs-ctl --storagebench --read --alltargets --blocksize=512K --size=128G --threads=15 --wait
Server benchmark status:
Finished: 15
Read benchmark results:
Min throughput: 2662166 KiB/s nodeID: beegfs003-numa0-3 [ID: 6], targetID: 6
Max throughput: 3732209 KiB/s nodeID: beegfs001-numa1-3 [ID: 9], targetID: 9
Avg throughput: 2951499 KiB/s
Aggregate throughput: 44272491 KiB/s
After the read benchmark is done, clean up the benchmark files:
beegfs-ctl --storagebench --cleanup --alltargets
References
[1] BeeGFS website: https://www.beegfs.io/
[2] IOzone website: https://www.iozone.org/