Analyzing I/O Performance on a NEXTGenIO Class System

holger.brunst@tu-dresden.de
ZIH, Technische Universität Dresden
LUG17, Indiana University, June 2nd, 2017

NEXTGenIO Fact Sheet
- Project: Research & Innovation Action
- Duration: 36 months
- Budget: EUR 8.1 million
- Partners: EPCC, INTEL, FUJITSU, BSC, TUD, ALLINEA, ECMWF, ARCTUR

Approx. 50% Committed to Hardware Development
- Intel DIMMs are a key feature:
  - Non-volatile RAM
  - Much larger capacity than DRAM
  - Slower than DRAM, by a certain factor
  - Significantly faster than SSDs
- 12 DIMM slots per socket, combining DDR4 and Intel DIMMs
- How can file systems like Lustre benefit?

Three Usage Models
- The memory usage model: extension of the main memory; data is volatile, like normal main memory
- The storage usage model: classic persistent block device, like a very fast SSD
- The application direct usage model: maps persistent storage into the address space; direct CPU load/store instructions (see the sketch below)
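The application direct model is the one that changes programming the most. Below is a minimal sketch of it using PMDK's libpmem (the current name of the NVM Library referenced later in these slides); the pool path and size are illustrative assumptions, not part of the NEXTGenIO system.

/* app_direct.c - minimal sketch of the application direct usage model
 * with PMDK's libpmem; path and size are illustrative assumptions.
 * Build: gcc app_direct.c -o app_direct -lpmem
 */
#include <libpmem.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t mapped_len;
    int is_pmem;

    /* map a file on a DAX-mounted NVRAM file system into our address space */
    char *addr = pmem_map_file("/mnt/pmem/pool", 4096,
                               PMEM_FILE_CREATE, 0666,
                               &mapped_len, &is_pmem);
    if (addr == NULL) { perror("pmem_map_file"); return 1; }

    /* ordinary CPU store instructions write straight to persistent memory */
    strcpy(addr, "hello, persistent world");

    /* make the stores durable: cache flush + fence on real pmem,
       msync() fallback when the mapping is not actually pmem */
    if (is_pmem)
        pmem_persist(addr, mapped_len);
    else
        pmem_msync(addr, mapped_len);

    pmem_unmap(addr, mapped_len);
    return 0;
}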

New Members in Memory Hierarchy
- New memory technology changes the memory hierarchy we have
- Impact on applications, e.g. simulations?
- I/O performance is one of the critical components for scaling applications
- Lustre serves as the primary file system

[Figure: memory & storage latency gaps. HPC systems today: CPU register (1x), cache (10x), DRAM (10x), then a roughly 100,000x jump to spinning disk and tape backup. HPC systems of the future: register, cache, DRAM DIMM, NVRAM DIMM, SSD, spinning disk, MAID disk backup, tape, closing the gap in smaller 10x-1,000x steps. Quo vadis: Lustre + NVRAM + SSD?]

Using Distributed Storage
- Lustre global file system: no changes to apps; support for NVRAM?
- Required functionality:
  - Create and tear down file systems for jobs
  - Works across nodes
  - Preload and post-move file systems
  - Support multiple file systems across the system, e.g. Lustre
- I/O performance is the sum of many layers

[Figure: compute nodes, each with local memory, connected via the network to a global file system]

Using an Object Store
- Needs changes in apps
- Needs the same functionality as a global file system
- Removes the need for POSIX functionality
- I/O performance: a different type of abstraction; mapping to objects; a different kind of instrumentation

[Figure: compute nodes with local memory connected via the network to an object store; an indirect Lustre scenario with a file system behind it]

Towards Workflows
- Resident data sets: sharing preloaded data across a range of jobs; data-analytics workflows; how to control access/authorisation/security/etc.?
- Workflows: producer-consumer model; remove the file system from intermediate stages
- I/O performance: data merging/integration?

[Figure: a workflow graph of jobs (Job 1 through Job 4) with the file system only at the start and end]

New Tool Key Objectives
Analysis tools need to:
- Reveal performance interdependencies in the I/O and memory hierarchy
- Support workflow visualization
- Exploit NVRAM to store data themselves
- (Workload modelling)

Vampir & Score-P

Tapping I/O Layers
I/O layers:
- Lustre file system: client side, server side
- Kernel
- POSIX
- MPI-I/O
- HDF5
- NetCDF
- PNetCDF
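For illustration, one common way a user-space tool taps the POSIX layer is an LD_PRELOAD shim wrapping libc calls. This is a minimal sketch of the technique, not how Score-P's I/O recording is actually implemented:

/* posix_tap.c - minimal sketch of POSIX-layer interposition via LD_PRELOAD.
 * Build: gcc -shared -fPIC posix_tap.c -o posix_tap.so -ldl
 * Run:   LD_PRELOAD=./posix_tap.so ./application
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>

ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*real_read)(int, void *, size_t);
    if (real_read == NULL)  /* locate libc's read() once */
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");

    ssize_t ret = real_read(fd, buf, count);

    /* record the event; a real tool would timestamp and buffer this */
    fprintf(stderr, "read(fd=%d, count=%zu) = %zd\n", fd, count, ret);
    return ret;
}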

Lustre I/O Performance Data
- Event data AND aggregated numbers
- Open/create/close operations (metadata)
- Data transfer operations
- Client side: procfs counters via PAPI components, Score-P (user space)
- Server side: database (custom made, system space); mapping to nodes/applications
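The client-side counters come from procfs files that any tool can read. A minimal sketch follows; the exact path depends on the Lustre version and mount, and the one below is a made-up example:

/* llite_stats.c - sketch: read a Lustre client's aggregated I/O counters
 * from procfs. The path is hypothetical; list /proc/fs/lustre/llite/
 * on the client to find the real one. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/fs/lustre/llite/lustre-ffff8800/stats", "r");
    if (f == NULL) { perror("fopen"); return 1; }

    char line[256];
    while (fgets(line, sizeof line, f))  /* lines like: read_bytes <n> samples ... */
        fputs(line, stdout);

    fclose(f);
    return 0;
}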

What the NVM Library Tells Us
- Allocation and free events
- Information: memory size (requested, usable); high-water-mark metric; size and number of elements in memory
- NVRAM health status: not measurable at high frequencies
- Individual NVRAM loads/stores remain out of scope (e.g. memory-mapped files)
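The allocation and free events above correspond to calls like the following, shown here with PMDK's libpmemobj (the NVM Library, NVML, was later renamed PMDK); pool path and sizes are illustrative assumptions:

/* nvm_alloc.c - sketch of the allocation/free events a tool can observe
 * via the NVM Library (today: PMDK libpmemobj); names are illustrative.
 * Build: gcc nvm_alloc.c -o nvm_alloc -lpmemobj
 */
#include <libpmemobj.h>
#include <stdio.h>

int main(void)
{
    /* create a persistent memory pool (PMEMOBJ_MIN_POOL is 8 MiB) */
    PMEMobjpool *pop = pmemobj_create("/mnt/pmem/objpool", "demo",
                                      PMEMOBJ_MIN_POOL, 0666);
    if (pop == NULL) { perror("pmemobj_create"); return 1; }

    /* allocation event: requested size and usable size can differ */
    PMEMoid oid;
    if (pmemobj_alloc(pop, &oid, 4096, 0, NULL, NULL) != 0) {
        perror("pmemobj_alloc");
        pmemobj_close(pop);
        return 1;
    }
    printf("allocated %zu usable bytes\n", pmemobj_alloc_usable_size(oid));

    /* free event */
    pmemobj_free(&oid);
    pmemobj_close(pop);
    return 0;
}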

I/O Operations over Time
[Vampir timeline: individual I/O operations and their I/O runtime contribution]

I/O Data Rate over Time
[Vampir chart: I/O data rate of a single thread]

I/O Summaries with Totals
Other metrics: IOPS, I/O time, I/O size, I/O bandwidth

I/O Summaries per File
Aggregated data for a specific resource

I/O Operations per File
Focus on a specific resource, or show all resources

Taken from My Daily Work...
- Bringing Lustre system I/O down with a single (serial) application
- Higher I/O demand than the IOR benchmark. Why?
- Starting point: too much I/O (bandwidth) on the Lustre FS!

Coarse-Grained Time Series Reveal Some Clue, but...
[Chart: single-thread Lustre I/O bandwidth over time]

Details Make a Difference
A single NetCDF get_vara_float triggers... 15(!) POSIX read operations

Approaching the Real Cause
Even worse: NetCDF reads 136 kB to provide just 2 kB. A single NetCDF get_vara_float triggers... 15(!) POSIX read operations.
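For context, the triggering call pattern looks roughly like the following; the file name, variable name, and extents are hypothetical, chosen so that the request returns about 2 kB of payload:

/* netcdf_read.c - sketch of the kind of partial read discussed above;
 * names and extents are made up for illustration.
 * Build: gcc netcdf_read.c -o netcdf_read -lnetcdf
 */
#include <netcdf.h>
#include <stdio.h>

int main(void)
{
    int ncid, varid;
    float data[512];
    size_t start[2] = {0, 0};
    size_t count[2] = {1, 512};  /* 512 floats = 2 KiB of payload */

    if (nc_open("field.nc", NC_NOWRITE, &ncid) != NC_NOERR) return 1;
    if (nc_inq_varid(ncid, "temperature", &varid) != NC_NOERR) return 1;

    /* one library call; underneath, it fanned out into many small
       POSIX reads that fetched far more bytes than requested */
    if (nc_get_vara_float(ncid, varid, start, count, data) != NC_NOERR) return 1;

    printf("first value: %f\n", data[0]);
    nc_close(ncid);
    return 0;
}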

Impact of Lustre Prefetching

Before and After

Summary
- NEXTGenIO is developing a full hardware and software solution
- The solution includes Lustre
- Performance focus:
  - Consider the complete I/O stack
  - Incorporate new I/O paradigms
  - Study the implications of NVRAM
  - Reduce I/O costs
- New usage models for HPC and HPDA

Supplementary Slides

NVRAM allocation over time

MAP Timeline

Vampir Timeline

Future Stats

Visualization of memory access statistics
- Currently, Vampir has many different views presenting counter metrics
- Future: absolute number of memory accesses performed by an application, for different types of memory

Future Stats