Automated Verification of I/O Performance. F. Delalondre, M. Baerstchi. EPFL/Blue Brain Project - confidential


Transcription:

Automated Verification of I/O Performance. F. Delalondre, M. Baerstchi

Requirements: support scientists' creativity; minimize development time; maximize application performance.


Performance Analysis: System Performance; Application Performance; Application-on-System Performance (real time). Goal: regression testing & fast input to developers/system engineers.


Scientific Use Cases: Interactive Supercomputing; Traditional Application Use Case using the GPFS file system.


Interactive Supercomputing: machine utilization does not matter; time to scientific delivery matters. Steering & monitoring.

Interactive Supercomputing Data Path (diagram): 4096 Blue Gene/Q compute nodes -> 64 Blue Gene/Q I/O nodes (64x2x2 GB/s = 256 GB/s); I/O nodes -> InfiniBand switch (64x40 Gb/s = 320 GB/s); switch -> 40-node IdataPlex (40x56 Gb/s = 280 GB/s). The numbered markers 1-4 identify the links exercised by the tests that follow.

Regular Use Case Data Path (diagram): 4096 Blue Gene/Q compute nodes (64x2x2 GB/s = 256 GB/s into the I/O layer); InfiniBand switch -> 40-node IdataPlex (40x56 Gb/s = 280 GB/s); switch -> 10 GSS servers (10x2x56 Gb/s = 135 GB/s); GSS servers -> disk enclosures (10x12x6 Gb/s = 72 GB/s); GSS disk drives: 177 SAS disks per server at 50 MB/s per disk => 88 GB/s aggregate. The numbered markers 5-9 identify the links exercised by the tests that follow.
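The aggregate figures in the two data-path diagrams follow from simple per-link arithmetic. The sketch below is not part of the original slides: Gb/s link rates are converted to GB/s by dividing by 8, and the 0.6 GB/s assumed for a 6 Gb/s SAS lane is one reading of how the quoted 72 GB/s arises, so small deviations from the slide numbers (e.g. 140 vs 135 GB/s) reflect protocol overheads not modelled here.

```python
# Minimal sketch (not from the slides): recompute the aggregate bandwidth figures
# quoted in the data-path diagrams from the per-link labels.

def aggregate(links, per_link_gbytes):
    """Aggregate bandwidth in GB/s for `links` links of `per_link_gbytes` GB/s each."""
    return links * per_link_gbytes

paths = {
    "compute -> I/O nodes (64 x 2 x 2 GB/s)":   aggregate(64 * 2, 2.0),      # 256 GB/s
    "I/O nodes -> switch (64 x 40 Gb/s)":       aggregate(64, 40 / 8),       # 320 GB/s
    "switch -> IdataPlex (40 x 56 Gb/s)":       aggregate(40, 56 / 8),       # 280 GB/s
    "switch -> GSS servers (10 x 2 x 56 Gb/s)": aggregate(10 * 2, 56 / 8),   # ~140 GB/s (slide: 135)
    "GSS servers -> disks (10 x 12 x 6 Gb/s)":  aggregate(10 * 12, 6 / 10),  # 72 GB/s, assuming 0.6 GB/s per SAS lane
    "disk drives (10 x 177 x 50 MB/s)":         aggregate(10 * 177, 0.05),   # ~88 GB/s
}

for name, gbs in paths.items():
    print(f"{name:45s} {gbs:6.1f} GB/s")
```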


Regression Testing & Performance Benchmark: system I/O regression testing. Regression of the system after maintenance? Is the system delivering maximum performance? Input to developers & system engineers: system performance (bandwidth, latency, ...); scaling numbers: I/O fabric saturation point; best I/O configuration (block size, ...).
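One of the scaling numbers named above is the I/O fabric saturation point. Below is a minimal sketch, not part of the BBP framework, of how such a point could be extracted from scripted scaling runs, assuming the measurements arrive as (node count, bandwidth) pairs; the example numbers are made up.

```python
# Minimal sketch (assumed helper): estimate the I/O fabric saturation point as the
# smallest node count beyond which adding nodes yields less than `min_gain`
# relative bandwidth improvement.

def saturation_point(samples, min_gain=0.10):
    """samples: list of (nodes, bandwidth_GBs); returns (nodes, bandwidth) at saturation."""
    samples = sorted(samples)
    for (n0, bw0), (n1, bw1) in zip(samples, samples[1:]):
        if bw0 > 0 and (bw1 - bw0) / bw0 < min_gain:
            return n0, bw0  # bandwidth has effectively stopped scaling here
    return samples[-1]      # no saturation observed in the measured range

# Example with made-up numbers shaped like the IOR scaling runs:
runs = [(512, 40.0), (1024, 78.0), (2048, 150.0), (3072, 160.0), (4096, 162.0)]
print(saturation_point(runs))   # -> (2048, 150.0)
```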

Testing Framework: for each path, test performance, scaling & I/O parameters. All tests must be fully scripted (no manual intervention). Tests include IOR, NSDperf, qperf, gpfsperf, ib_read_*, ib_write_*. Tests are executed using the Jenkins Continuous Integration framework.
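As an illustration of a fully scripted test, here is a hedged sketch of an IOR step as Jenkins might invoke it. The launcher command, file path and bandwidth floor are placeholders; only standard IOR options and its "Max Write/Read" summary lines are relied upon.

```python
#!/usr/bin/env python3
# Minimal sketch of a scripted IOR run as a Jenkins job step (assumed wrapper,
# not the actual BBP scripts).

import re
import subprocess
import sys

LAUNCHER = ["mpirun", "-np", "64"]          # placeholder: the site launcher (e.g. runjob on BG/Q)
IOR = ["ior", "-a", "MPIIO", "-w", "-r",    # MPI-IO write + read
       "-t", "4m", "-b", "1g",              # transfer size / block size per task
       "-o", "/gpfs/scratch/ior.testfile"]  # hypothetical GPFS path

def run_ior():
    out = subprocess.run(LAUNCHER + IOR, capture_output=True, text=True, check=True).stdout
    # IOR prints summary lines such as "Max Write: 12345.67 MiB/sec (...)".
    return {op: float(m.group(1))
            for op in ("Write", "Read")
            if (m := re.search(rf"Max {op}:\s+([\d.]+)\s+MiB/sec", out))}

if __name__ == "__main__":
    bw = run_ior()
    print(bw)
    # Fail the Jenkins build if write bandwidth regresses below a site-defined floor.
    sys.exit(0 if bw.get("Write", 0.0) > 30000.0 else 1)
```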

IOR to I/O Node Memory (diagram): test step 1, IOR from the 4096 Blue Gene/Q compute nodes into the memory of the 64 Blue Gene/Q I/O nodes over the 64x2x2 GB/s = 256 GB/s links.

IOR to I/O Node Memory [chart: IBM Blue Gene/Q I/O scaling, CNK to I/O node memory; memory bandwidth (MB/s, up to 180,000) versus number of nodes (up to 4096); series: POSIX write, POSIX read, MPI-IO write, MPI-IO read].

IOR to I/O Node Memory [same chart, annotated with the 217 GB/s peak]. Write performance shows a scaling loss beyond 2 racks (~74% of peak [1]), with almost linear scaling every 5-10 runs (~94% of peak). Read operations are twice as slow but scale linearly (~56% of peak). To be tested at larger scale. Why is it important? Interactive Supercomputing (ISC). [1] D. Chen, N.A. Eisley, P. Heidelberger, R.M. Senger, Y. Sugawara, S. Kumar, V. Salapura, D.L. Satterfield, B. Steinmacher-Burow, J.J. Parker, "The IBM Blue Gene/Q Interconnection Network and Message Unit", SC '11: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2011.

IB Test - I/O Nodes to Viz Nodes (diagram): test steps 2-3, the InfiniBand links from the 64 Blue Gene/Q I/O nodes to the switch (64x40 Gb/s = 320 GB/s) and from the switch to the 40-node IdataPlex viz cluster (40x56 Gb/s = 280 GB/s).

IB Test - I/O Nodes to Viz Nodes. Test setup: pair each I/O node with a cluster node and increase the number of pairs up to 40. Observed: per-node performance and outliers; detection of misconfigured/faulty cards.
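A minimal sketch of the outlier detection described above, assuming the pairwise runs produce one bandwidth figure per node; the node names, numbers and the 15% tolerance are illustrative, not taken from the slides.

```python
# Minimal sketch (assumed post-processing, not the BBP tooling): flag suspect
# InfiniBand cards from the pairwise I/O-node <-> viz-node bandwidth runs.
# A node is reported when it falls more than `tolerance` below the median.

from statistics import median

def find_outliers(per_node_bw, tolerance=0.15):
    """per_node_bw: dict node_name -> measured bandwidth (GB/s)."""
    med = median(per_node_bw.values())
    return {node: bw for node, bw in per_node_bw.items()
            if bw < (1.0 - tolerance) * med}

# Example with made-up numbers: one card negotiating a lower rate stands out.
bw = {f"viz{i:02d}": 6.8 for i in range(1, 41)}
bw["viz17"] = 3.2                      # hypothetical misconfigured/faulty HCA
print(find_outliers(bw))               # -> {'viz17': 3.2}
```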


IOR to Disk (diagram): test steps 5-6, from the 64 Blue Gene/Q I/O nodes (64x40 Gb/s = 320 GB/s) through the InfiniBand switch to the 10 GSS servers (10x2x56 Gb/s = 135 GB/s), their disk enclosures (10x12x6 Gb/s = 72 GB/s) and the GSS disk drives (177 SAS disks per server at 50 MB/s per disk => 88 GB/s).

IOR to Disk. Test setup: read/write, MPI-IO/POSIX, various transfer sizes & access patterns. Observed: saturation at 41 GB/s in the optimal configuration; the research system crashed when running IOPS-style (4k) tests with a large GPFS block size.
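A hedged sketch of how the transfer-size and access-pattern sweep could be enumerated as IOR command lines; the file path, block size and the exact set of sizes are assumptions, and only standard IOR flags (-a, -w/-r, -t, -b, -o, -z) are used.

```python
# Minimal sketch (assumed driver, not the BBP scripts): enumerate the IOR runs of
# the disk test as (API, mode, transfer size, access pattern) combinations and emit
# one command line per run.

from itertools import product

APIS       = ["POSIX", "MPIIO"]
MODES      = {"write": "-w", "read": "-r"}
XFER_SIZES = ["4k", "64k", "1m", "4m", "16m"]        # includes the 4k IOPS-style case
PATTERNS   = {"sequential": [], "random": ["-z"]}    # -z: random (not sequential) offsets

def ior_commands(testfile="/gpfs/scratch/ior.testfile", block="1g"):
    for api, (mode, mflag), xfer, (pat, pflags) in product(
            APIS, MODES.items(), XFER_SIZES, PATTERNS.items()):
        cmd = ["ior", "-a", api, mflag, "-t", xfer, "-b", block, "-o", testfile, *pflags]
        yield (api, mode, xfer, pat), cmd

for key, cmd in ior_commands():
    print(key, " ".join(cmd))
```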


I/O Nodes to GSS Servers (diagram): test steps 7 and 8 cover the InfiniBand links from the 64 Blue Gene/Q I/O nodes (64x40 Gb/s = 320 GB/s) and the 40-node IdataPlex (40x56 Gb/s = 280 GB/s) through the switch to the 10 GSS servers (10x2x56 Gb/s = 135 GB/s), tested with NSDperf, qperf and ib_* tools; step 9 covers the GSS servers -> GSS disk drives link (10x12x6 Gb/s = 72 GB/s; 177 SAS disks per server at 50 MB/s per disk => 88 GB/s), where the best test is still an open question. Also open: which services can we run/install on the GSS servers?

Performance Analysis: System Performance; Application Performance; Application-on-System Performance (real time).

Can we go one step further? Reduce the HPC development cycle through fast troubleshooting; monitor the HPC/simulation platform in real time & provide input to the BBP Portal.


Building an HPC Development Tool (diagram): building/simulation. Hardware monitoring of the whole infrastructure (BG/Q, Cluster EPFL, Cluster Lugano) reports OK/not OK HW to a console; software monitoring, combined with software/hardware mapping and checks against a DB, reports OK/not OK SW.


Building an HPC Development Tool (diagram): profiling. A Git graphical interface shows the responsible developer and patch set; software monitoring produces perf numbers and an OK/not OK SW verdict on the console, backed by a performance DB & graph.
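A minimal sketch of the OK/not OK SW decision against the performance DB, using SQLite as a stand-in store; the schema, tolerance and benchmark names are hypothetical, not the BBP portal's actual implementation.

```python
# Minimal sketch (assumed logic): compare the perf numbers reported by software
# monitoring for a Git patch set against the baseline stored in the performance DB,
# and emit the OK / not-OK verdict shown on the console.

import sqlite3  # stand-in for the actual performance DB

def verdict(db_path, benchmark, patch_set, measured, tolerance=0.05):
    """Return 'ok' if `measured` is within `tolerance` of the stored baseline."""
    with sqlite3.connect(db_path) as db:
        db.execute("CREATE TABLE IF NOT EXISTS baseline (benchmark TEXT PRIMARY KEY, value REAL)")
        row = db.execute("SELECT value FROM baseline WHERE benchmark = ?", (benchmark,)).fetchone()
        if row is None:                       # first run: record the baseline
            db.execute("INSERT INTO baseline VALUES (?, ?)", (benchmark, measured))
            return f"{patch_set}: {benchmark} baseline recorded ({measured:.1f})"
        ok = measured >= (1.0 - tolerance) * row[0]
        return f"{patch_set}: {benchmark} {'ok' if ok else 'NOT ok'} ({measured:.1f} vs {row[0]:.1f})"

print(verdict("perf.db", "ior_write_GBs", "refs/changes/42", 41.0))
```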

Building an HPC Development Tool (diagram): profiling tools. VTune on the Intel cluster (x86); HPM, Extrae and Scalasca on BG/Q. Profiles are recorded per software/hardware mapping and by software monitoring across the whole infrastructure (BG/Q, Cluster EPFL, Cluster Lugano), feeding the OK/not OK SW verdict.


Building an HPC Development Tool (diagram): debugging. A debugger interface shows the responsible developer and patch set on the console; software monitoring and software/hardware mapping cover the whole infrastructure (BG/Q, Cluster Lugano, Cluster EPFL), feeding the OK/not OK SW verdict.

Thank you