Technology Testing at CSCS including BeeGFS Preliminary Results. Hussein N. Harake CSCS-ETHZ

Agenda
- About CSCS
- About the Systems Integration (SI) Unit
- Technology Overview: DDN IME, DDN WOS, OpenStack
- BeeGFS Case Study: What is BeeGFS?, Test System Layout, Tuning, Monitoring, Benchmark tools, Results
- Next Steps
- Q&A

CSCS (Swiss National Supercomputing Centre)
- Founded in 1991
- Enables world-class research with a scientific user lab
- Available to domestic and international researchers through a transparent, peer-reviewed allocation process
- Open to academia and also available to users from industry and the business sector
- Operated by ETH Zurich and located in Lugano

24 years of supercomputers at CSCS
- 1991: NEC SX3, 5.5 GF (Adula)
- 1996: NEC SX4, 10 GF (Gottardo)
- 1999: NEC SX5, 64 GF (Prometeo)
- 2002: IBM SP4, 1.3 TF (Venus)
- 2005: Cray XT3, 5.8 TF (Palu)
- 2006: IBM P5, 4.5 TF (Blanc)
- 2009-12: Cray XE6, 402 TF (Monte Rosa)
- 2012-13: Cray XC30, 7.7 PF (Piz Daint)
- 2014: XC30, 1.25 PF (Piz Daint extension)

Data Centre
- 2000 sq.m machine room
- 20 MW of power and cooling capacity
- Lake water cooling - 700 liters/s

Overview of the Systems Integration (SI) Unit
Unit missions:
- Managing projects
- Relations with vendors
- Evaluating technologies
- Software deployments

Technology Overview - DDN IME (image courtesy of DDN)

Technology Overview - DDN WOS (1) (image courtesy of DDN)

Technology Overview - DDN WOS (2)

Technology Overview - DDN WOS (3)

Technology Overview - OpenStack (image source: https://www.openstack.org/software/)

BeeGFS Case Study

What is BeeGFS?
- Parallel filesystem, HPC oriented
- Formerly called FhGFS
- Alternative to Lustre and GPFS
- Developed by Fraunhofer, open-source
- Support delivered by ThinkParq
(Image courtesy of BeeGFS)

Basic Features of BeeGFS
- Supports failover for data and metadata using tools like Pacemaker and Heartbeat
- Replication failover mechanism
- Supports multiple data and metadata servers and targets
- Supports quota
- Robinhood can be used to scan the entire filesystem
- BeeGFS On Demand filesystem (BeeOND)
- Easy to deploy and manage
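The slides do not show commands for these features. As a hedged aside, most of them can be inspected with the beegfs-ctl utility that ships with BeeGFS (option names may differ between versions, and the user name below is a placeholder):

beegfs-ctl --listnodes --nodetype=meta --details      # list the metadata servers
beegfs-ctl --listtargets --nodetype=storage --state   # list storage targets and their state
beegfs-ctl --getquota --uid someuser                  # query quota usage for a user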

BeeOND
- Creates a filesystem on demand
- Uses the hard drives / SSDs on every compute node
- The filesystem gets created by submitting a job to the scheduler; we are working on confirming SLURM support
- Memory could be used instead of SSDs
- We used 20 SSDs on 20 nodes for our tests
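As a minimal sketch of how a BeeOND instance is typically started and stopped with the beeond helper script (the node list, local storage path and mount point are placeholders, and exact options depend on the installed version):

# Start BeeOND on the nodes listed in nodefile: -d is the local SSD path used
# for storage and metadata, -c is the client mount point on every node
beeond start -n nodefile -d /data/beeond -c /mnt/beeond

# ... run jobs against /mnt/beeond ...

# Tear the instance down; -L and -d request log and data cleanup (verify with the version's help)
beeond stop -n nodefile -L -d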

Benefits of BeeOND
- Benefits from otherwise unused space on the compute nodes
- No impact on the parallel filesystem
- Real utilization of the high-speed network
- Filesystem scales with the compute nodes
- Open point: what is the overhead on the compute nodes?

Test System Layout
- DDN 7700: one couplet (two controllers), one enclosure with 60 drives; 6 SSDs in one RAID volume plus 6 * 9-drive RAID 5 volumes; 4 * FDR links
- Two x86 servers: dual socket SB, 128 GB memory, 2 * FDR links
- Fabric: 1 * FDR links

Tuning the servers

echo 5 > /proc/sys/vm/dirty_background_ratio
echo 20 > /proc/sys/vm/dirty_ratio
echo 50 > /proc/sys/vm/vfs_cache_pressure
echo 262144 > /proc/sys/vm/min_free_kbytes
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag

for dev in dm-0 dm-1 dm-2 dm-3 dm-4 dm-5 dm-6
do
    echo deadline > /sys/block/$dev/queue/scheduler
    echo 4096 > /sys/block/$dev/queue/nr_requests
    echo 32768 > /sys/block/$dev/queue/read_ahead_kb
    echo 32767 > /sys/block/$dev/queue/max_sectors_kb
done

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 1 > /proc/sys/vm/zone_reclaim_mode

Documentation for the tuned parameters:
https://www.kernel.org/doc/documentation/sysctl/vm.txt
https://access.redhat.com/solutions/46111
http://www.slideshare.net/rampalliraj/linux-kernel-io-schedulers?from_action=save
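These echo commands take effect immediately but do not survive a reboot. As a side note not covered in the slides, the vm.* values above could be persisted with a sysctl drop-in such as the sketch below (the file name is illustrative); the block-device, transparent-hugepage and CPU-governor settings are not sysctls and would need, for example, a udev rule, a tuned profile, or a boot-time script.

# /etc/sysctl.d/90-beegfs-storage.conf (hypothetical file name)
vm.dirty_background_ratio = 5
vm.dirty_ratio = 20
vm.vfs_cache_pressure = 50
vm.min_free_kbytes = 262144
vm.zone_reclaim_mode = 1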

Monitoring client activities (1)

Monitoring server activities (2)

Benchmark tools
- mdtest - measuring metadata performance: https://sourceforge.net/projects/mdtest/
- IOzone - read and write throughput: http://www.iozone.org
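The exact command lines used are not given in the slides; as a hedged illustration, invocations of this shape are typical for the two tools (process counts, sizes and paths below are placeholders, not the parameters used in these tests):

# IOzone throughput mode: 64 processes across the clients listed in clients.txt,
# 4 GB file per process, 1 MB records, write/rewrite (-i 0) and read/reread (-i 1)
iozone -i 0 -i 1 -+m clients.txt -t 64 -s 4g -r 1m

# mdtest via MPI: 20 ranks, 10000 items per rank, 3 iterations, unique working dir per rank
mpirun -np 20 mdtest -d /mnt/beeond/mdtest -n 10000 -i 3 -u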

IOzone results on /beegfs

Test running: Children see throughput for 64 initial writers = 5032700.90 kb/sec
    Min throughput per process = 63754.09 kb/sec
    Max throughput per process = 103798.58 kb/sec
    Avg throughput per process = 78635.95 kb/sec
    Min xfer = 12880896.00 kb

Test running: Children see throughput for 64 rewriters = 4996297.63 kb/sec
    Min throughput per process = 68781.82 kb/sec
    Max throughput per process = 90666.23 kb/sec
    Avg throughput per process = 78067.15 kb/sec
    Min xfer = 16473088.00 kb

Test running: Children see throughput for 64 readers = 4225632.91 kb/sec
    Min throughput per process = 40047.24 kb/sec
    Max throughput per process = 77678.61 kb/sec
    Avg throughput per process = 66025.51 kb/sec
    Min xfer = 10813440.00 kb

Test running: Children see throughput for 64 re-readers = 4253662.00 kb/sec
    Min throughput per process = 56998.73 kb/sec
    Max throughput per process = 76042.87 kb/sec
    Avg throughput per process = 66463.47 kb/sec
    Min xfer = 15729664.00 kb

Mdtest results on BeeOND: charts of directory creation and directory stat rates (directories per second) versus the number of MDSs (1, 2, 4, 8, 16, 20).

Mdtest results on BeeOND: charts of file creation, file stat and file removal rates (files per second) versus the number of MDSs.

Next steps
- Scaling on a bigger cluster
- Verifying the failover procedures
- Verifying the BeeOND overhead on compute nodes
- Using NVMe instead of SSDs
- Using tmpfs
- Creating BeeOND through SLURM jobs
- Using Robinhood to scan millions of files
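Creating BeeOND through SLURM jobs is listed above as future work, and the slides do not show how it would be done. One possible approach, sketched under the assumption that the beeond script is installed on the compute nodes (node count, paths and application name are placeholders):

#!/bin/bash
#SBATCH --nodes=20
#SBATCH --time=01:00:00

# Build a nodefile from the job's allocation
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile

# Start a BeeOND instance on the allocated nodes (local SSD path and mount point are placeholders)
beeond start -n nodefile -d /data/beeond -c /mnt/beeond

# Run the application against the on-demand filesystem
srun ./my_io_heavy_app /mnt/beeond

# Tear the instance down at the end of the job (check the installed version's help for cleanup flags)
beeond stop -n nodefile -L -d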

Q&A hussein@cscs.ch