Automated Verification of I/O Performance. F. Delalondre, M. Baerstchi. EPFL/Blue Brain Project - confidential

1 Automated Verification of I/O Performance F. Delalondre, M. Baerstchi

2 Requirements: Support scientists' creativity. Minimize development time. Maximize application performance.

3 Performance Analysis: System Performance; Application Performance; Application on System Performance (Real time)

7 Performance Analysis: System Performance; Application Performance; Application on System Performance (Real time). Goal: regression testing & fast input to Developer/System Engineer.

9 Scientific Use Cases: Interactive Supercomputing; traditional application use case using the GPFS file system.

10 Interactive Supercomputing: Machine utilization does not matter; time to scientific delivery matters.

12 Interactive Supercomputing: Machine utilization does not matter; time to scientific delivery matters. Monitoring.

13 Interactive Supercomputing: Machine utilization does not matter; time to scientific delivery matters. Steering, Monitoring.

14 Interactive Supercomputing Data Path [diagram]: 4096 Blue Gene/Q Compute Nodes -> (IB, 64x2x2 GB/s = 256 GB/s) -> 64 Blue Gene/Q I/O Nodes -> (IB, 64x40 Gb/s = 320 GB/s) -> Infiniband Switch -> (40x56 Gb/s = 280 GB/s, test point 3) -> 40 iDataPlex Nodes

15 Regular Use Case Data Path [diagram]: 4096 Blue Gene/Q Compute Nodes -> (64x2x2 GB/s = 256 GB/s) -> Infiniband Switch; Switch -> (40x56 Gb/s = 280 GB/s, test point 8) -> 40 iDataPlex Nodes; Switch -> (10x2x56 Gb/s = 135 GB/s, test point 7) -> GSS Servers -> (10x12x6 Gb/s = 72 GB/s) -> GSS Disk Drives (test point 9): 177 SAS disks/server, 50 MB/s per disk => 88 GB/s
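
(Worked check on the disk figure: 10 GSS servers x 177 disks/server x 50 MB/s per disk ≈ 88.5 GB/s, hence the ~88 GB/s aggregate.)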

16 Regression Testing & Performance Benchmark. System I/O Regression Testing: regression of the system after maintenance? Is the system delivering maximum performance?

17 Regression Testing & Performance Benchmark. System I/O Regression Testing: regression of the system after maintenance? Is the system delivering maximum performance? Input to Developers & System Engineers: system performance (bandwidth, latency, ...); scaling numbers: I/O fabric saturation point; best I/O configuration (block size, ...).

18 Testing Framework: For each path, test performance, scaling and I/O parameters. All tests must be fully scripted (no manual intervention). Tests include IOR, nsdperf, qperf, gpfsperf, ib_read_*, ib_write_*. Tests are executed using the Jenkins Continuous Integration framework.
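
Such a scripted test can be a small driver that Jenkins invokes after every maintenance window: run IOR, parse the aggregate bandwidth, and fail the build when throughput drops below a reference value. A minimal sketch, assuming a hypothetical launcher command, test-file path and 90%-of-baseline threshold (none of these are the project's actual settings):

```python
import re
import subprocess
import sys

# Hypothetical launcher, IOR flags, test-file path and baseline; the
# real values would live in the Jenkins job configuration.
IOR_CMD = ["srun", "-N", "64", "ior", "-a", "MPIIO", "-w", "-r",
           "-t", "4m", "-b", "1g", "-o", "/gpfs/scratch/ior_testfile"]
BASELINE_GIB_S = 41.0   # e.g. the previously measured saturation point
TOLERANCE = 0.90        # fail the build below 90% of baseline

def run_ior():
    """Run IOR and return (write, read) bandwidth in MiB/s, parsed from
    the summary lines IOR prints, e.g. 'Max Write: 41943.04 MiB/sec'."""
    out = subprocess.run(IOR_CMD, capture_output=True, text=True,
                         check=True).stdout
    write = float(re.search(r"Max Write:\s+([\d.]+)\s+MiB/sec", out).group(1))
    read = float(re.search(r"Max Read:\s+([\d.]+)\s+MiB/sec", out).group(1))
    return write, read

def main():
    write, read = run_ior()
    for name, mib_s in (("write", write), ("read", read)):
        gib_s = mib_s / 1024.0
        print(f"{name}: {gib_s:.1f} GiB/s")
        if gib_s < BASELINE_GIB_S * TOLERANCE:
            # A non-zero exit code marks the Jenkins build as failed,
            # giving fast feedback to developers and system engineers.
            sys.exit(f"regression: {name} bandwidth below "
                     f"{TOLERANCE:.0%} of baseline")

if __name__ == "__main__":
    main()
```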

19 IOR to I/O Node Memory [diagram]: 4096 Blue Gene/Q Compute Nodes -> (IB, 64x2x2 GB/s = 256 GB/s, test point 1) -> 64 Blue Gene/Q I/O Nodes -> (IB, 64x40 Gb/s = 320 GB/s) -> Infiniband Switch -> (40x56 Gb/s = 280 GB/s) -> 40 iDataPlex Nodes

20 IOR to I/O Node Memory [chart: IBM Blue Gene/Q I/O scaling, CNK to I/O node memory; memory bandwidth (MB/s) vs. number of nodes; series: POSIX write, POSIX read, MPI-IO write, MPI-IO read]

21 IOR to I/O Node Memory [same chart, in GB/s]. Write performance shows scaling loss beyond 2 racks (~74% of peak [1]), with almost linear scaling every 5-10 runs (~94% of peak). Read operations are twice as slow but scale linearly (~56% of peak). To be tested at larger scale. Why is it important? Interactive Supercomputing (ISC). [1] D. Chen, N.A. Eisley, P. Heidelberger, R.M. Senger, Y. Sugawara, S. Kumar, V. Salapura, D.L. Satterfield, B. Steinmacher-Burow, J.J. Parker, "The IBM Blue Gene/Q Interconnection Network and Message Unit", Proceedings of SC11: International Conference for High Performance Computing, Networking, Storage and Analysis, 2011.
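
(Taking the 256 GB/s compute-to-I/O-node link bandwidth from slide 19 as peak, those fractions correspond to roughly 190 GB/s of sustained write bandwidth at ~74%, about 240 GB/s at ~94%, and about 143 GB/s of read bandwidth at ~56%.)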

22 IB Test - I/O Nodes to Viz Nodes [diagram]: 4096 Blue Gene/Q Compute Nodes -> (64x2x2 GB/s = 256 GB/s) -> 64 Blue Gene/Q I/O Nodes -> (IB, 64x40 Gb/s = 320 GB/s, test point 2) -> Infiniband Switch -> (IB, 40x56 Gb/s = 280 GB/s, test point 3) -> 40 iDataPlex Nodes

23 IB Test - I/O Nodes to Viz Nodes. Test setup: pair each I/O node with a cluster node; increase the number of node pairs up to 40. Observed: performance per node and outliers; detection of misconfigured/faulty cards.
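
The per-pair results lend themselves to a simple automated check. A minimal sketch, assuming the bandwidth numbers per (I/O node, viz node) pair come from qperf/ib_read_* runs and using an assumed median-based threshold:

```python
import statistics

def find_outliers(bandwidth_by_pair, tolerance=0.85):
    """Flag (I/O node, viz node) pairs whose measured bandwidth falls
    well below the median of all pairs, which usually points at a
    misconfigured or faulty adapter rather than a fabric-wide problem."""
    median = statistics.median(bandwidth_by_pair.values())
    return {pair: bw for pair, bw in bandwidth_by_pair.items()
            if bw < tolerance * median}

# Example: Gb/s per (I/O node, viz node) pair, e.g. from ib_read_bw runs.
results = {("ion01", "viz01"): 52.1, ("ion02", "viz02"): 51.8,
           ("ion03", "viz03"): 31.4, ("ion04", "viz04"): 52.3}
for (ion, viz), bw in find_outliers(results).items():
    print(f"outlier: {ion} -> {viz} at {bw:.1f} Gb/s")
```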

25 IOR to Disk [diagram]: 64 Blue Gene/Q I/O Nodes -> (64x40 Gb/s = 320 GB/s) -> Infiniband Switch; Switch -> (40x56 Gb/s = 280 GB/s) -> 40 iDataPlex Nodes; Switch -> (10x2x56 Gb/s = 135 GB/s) -> GSS Servers -> (10x12x6 Gb/s = 72 GB/s) -> GSS Disk Drives: 177 SAS disks/server, 50 MB/s per disk => 88 GB/s

26 IOR to Disk. Test setup: read/write, MPI-IO/POSIX, various transfer sizes and access patterns. Observed: saturation at 41 GB/s in the optimal configuration; crashing of the research system when performing IOPS tests (4k) with a large GPFS block size.
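
The sweep over APIs, transfer sizes and access patterns is straightforward to generate. A minimal sketch using real IOR flags but illustrative sweep values (the deck does not state which sizes were used):

```python
import itertools

# Illustrative sweep dimensions; the deck reports testing read/write,
# MPI-IO/POSIX, and various transfer sizes & access patterns.
apis = ["POSIX", "MPIIO"]
transfer_sizes = ["4k", "64k", "1m", "4m", "16m"]
file_modes = ["", "-F"]  # shared file (default) vs. file-per-process

def ior_commands(block_size="1g", testfile="/gpfs/scratch/ior_testfile"):
    """Yield one IOR command line per (API, transfer size, file mode)."""
    for api, tsize, mode in itertools.product(apis, transfer_sizes, file_modes):
        cmd = ["ior", "-a", api, "-w", "-r", "-t", tsize,
               "-b", block_size, "-o", testfile]
        if mode:
            cmd.append(mode)
        yield cmd

if __name__ == "__main__":
    for cmd in ior_commands():
        print(" ".join(cmd))
```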

27 I/O Nodes to GSS Servers [diagram]: 64 Blue Gene/Q I/O Nodes -> (64x40 Gb/s = 320 GB/s) -> Infiniband Switch; Switch -> (40x56 Gb/s = 280 GB/s, test point 8) -> 40 iDataPlex Nodes; Switch -> (10x2x56 Gb/s = 135 GB/s, test point 7) -> 10 GSS Servers -> (10x12x6 Gb/s = 72 GB/s) -> GSS Disk Drives (test point 9): 177 SAS disks/server, 50 MB/s per disk => 88 GB/s

28 I/O Nodes to GSS Servers [same diagram, annotated with tests]: test point 8: nsdperf, qperf/ib; test point 7: nsdperf/qperf/ib; at the GSS servers (test point 10): what service can we run/install on the GSS servers?; at the GSS disk drives (test point 9): what is the best test?

29 Performance Analysis: System Performance; Application Performance; Application on System Performance (Real time)

30 Can we go one step further? Reduce the HPC development cycle through fast troubleshooting. Monitor the HPC/simulation platform in real time and provide input to the BBP Portal.

31 Building an HPC Development Tool [diagram]: building/simulation console; hardware monitoring across the whole infrastructure (BG/Q, Cluster EPFL, Cluster Lugano) -> Ok/not ok HW; software/hardware mapping (BG/Q); software monitoring -> Ok/not ok SW.

32 Building an HPC Development Tool [diagram, as slide 31]: software monitoring results are now checked against a DB -> Ok/not ok SW.

33 Building an HPC Development Tool [diagram]: Git patch set (with responsible developer) -> console -> software monitoring -> perf numbers -> performance DB & graph -> Ok/not ok SW, displayed through a graphical interface.
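
A sketch of the bookkeeping such a tool implies: record each run's performance numbers keyed by Git commit and responsible developer, and return ok/not ok by comparing against recent history. The SQLite schema and the 10% threshold are assumptions for illustration:

```python
import sqlite3
import statistics

# Hypothetical schema: one row per benchmark run of a Git patch set.
conn = sqlite3.connect("perf.db")
conn.execute("""CREATE TABLE IF NOT EXISTS runs
                (commit_hash TEXT, responsible TEXT,
                 benchmark TEXT, gbs REAL)""")

def record(commit_hash, responsible, benchmark, gbs):
    conn.execute("INSERT INTO runs VALUES (?, ?, ?, ?)",
                 (commit_hash, responsible, benchmark, gbs))
    conn.commit()

def is_ok(commit_hash, benchmark, tolerance=0.90, history=10):
    """Ok/not ok: compare this commit's number against the median of
    the last `history` runs of the same benchmark."""
    rows = conn.execute(
        "SELECT gbs FROM runs WHERE benchmark=? AND commit_hash!=? "
        "ORDER BY rowid DESC LIMIT ?",
        (benchmark, commit_hash, history)).fetchall()
    current = conn.execute(
        "SELECT gbs FROM runs WHERE benchmark=? AND commit_hash=?",
        (benchmark, commit_hash)).fetchone()
    if not rows or current is None:
        return True  # no history yet: nothing to regress against
    return current[0] >= tolerance * statistics.median(r[0] for r in rows)

record("a1b2c3d", "f.delalondre", "ior_mpiio_write", 40.8)
print("ok" if is_ok("a1b2c3d", "ior_mpiio_write") else "not ok")
```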

34 Building an HPC Development Tool [diagram, as slide 33, adding profiling]: Git patch set (with responsible developer) -> console -> profiling -> software monitoring -> perf numbers -> performance DB & graph -> Ok/not ok SW.

35 Building an HPC Development Tool [diagram]: profiling with VTune on the Intel cluster (x86) and HPM, Extrae, Scalasca on BG/Q; software/hardware mapping (BG/Q); recordings feed software monitoring -> Ok/not ok SW; across the whole infrastructure (BG/Q, Cluster EPFL, Cluster Lugano).

36 Building an HPC Development Tool [diagram, as slide 35, with the same tools driving debugging instead of profiling].

37 Building an HPC Development Tool [diagram]: debugger interface; responsible developer and patch set; console; debugging; software monitoring; software/hardware mapping; whole infrastructure (BG/Q, BG/Q Cluster Lugano, Cluster EPFL) -> Ok/not ok SW.

38 Thank you
