A Generic Methodology of Analyzing Performance Bottlenecks of HPC Storage Systems. Zhiqi Tao, Sr. System Engineer Lugano, March


A Generic Methodology of Analyzing Performance Bottlenecks of HPC Storage Systems
Zhiqi Tao, Sr. System Engineer
Lugano, March 15 2013

Outline
Introduction
  o Anatomy of a storage system
  o Performance
Methodology
  o The top-down approach or the bottom-up approach
  o Pipeline approach
Case Study
Build up benchmarking profiles
Conclusion

Introduction - Anatomy of a storage system
A storage system consists of a good mix of hardware and software.
Many aspects must be taken into account when architecting an enterprise storage system: capacity, performance, reliability, scalability, manageability and, often the most important factor, COST.

Introduction - Questions
Two questions are often associated with high performance storage systems:
  o How to improve the performance of my storage system?
  o What is the bottleneck of a storage system?
I'm here to share a generic methodology for analyzing performance bottlenecks and examining efficiency.
  o By no means the best practice
  o Simply sharing my personal experience
  o Hope it is useful for anyone interested in the same topic
  o Appreciate any comments and suggestions

Introduction - Performance
Review these questions:
  o How to improve the performance of my storage system?
  o What is the bottleneck of a storage system?
I often turn the question around: how efficient is your storage system?
  o We need to set a realistic expectation.
  o "Too good to be true" often brings some consequences.
  o Did I mention that a well designed Lustre storage system can achieve 90% of the underlying hardware bandwidth? Catch up with me after the talk.

Performance bottleneck
Performance is the most noticeable measurement.
  o IO modules are commonly the slowest component of a computing system, compared with CPU, RAM, etc.
  o CPU cycles are wasted when there are lots of iowaits.
  o High performance is an important factor in being cost-effective.
  o Achieving high efficiency is even harder. That's what I'm here for. Contact Intel High Performance Data Division at hpdd-info@intel.com
It generally requires years of experience to architect a well balanced high performance system.
  o It is generally easier to use proven open technologies or something you have experience with.
  o There is a scientific methodology we can follow.

Technical White Paper: Architecting a High Performance Storage System
By Zhiqi Tao, Andreas Dilger, Eric Barton, Bryon Neitzel
Designing a large scale, high performance storage system presents significant challenges. This paper describes a step-by-step approach to designing a storage system and presents a design methodology based on an iterative approach that applies at both the component level and the overall system level. The paper includes a detailed case study in which a Lustre storage system is designed using the approach and methodology presented.
http://www.whamcloud.com/resources/architecting-a-high-performance-storage-system/

Performance bottleneck
It is not uncommon to see a seemingly well designed storage system that does not deliver the expected performance.
  o There might be factors we did not consider in the design.
  o Some hardware might not be as good as it is claimed to be.
  o It might be bad luck: we happened to receive a faulty batch.
  o It might be a limitation in the software, for example metadata performance before Lustre 2.3.
  o Some tunables might be required.
Troubleshooting performance bottlenecks is what I do.
  o Contact Intel High Performance Data Division at hpdd-info@intel.com
  o My methodology follows

Methodology - Top-Down vs. Bottom-Up
Top-Down
Bottom-Up

Methodology - Top-Down
Top-Down
  o Trace down the bottleneck
  o Like peeling onions or finding a rabbit in the forest
  o Often requires special tools
  o In-depth knowledge of the entire stack
  o Hard to generalize the results
  o Time-consuming
  o React to an issue that has occurred
  o Finger pointing

Methodology - Bottom-Up
Bottom-Up
  o Equally difficult and time-consuming; requires special tools and knowledge

Methodology - My motivation
Proactive instead of reactive
  o Take steady steps instead of rushing to the end and debugging from there.
  o There are few things we can do after the system is built, or at least they would take more time.
I like generalizable results and encourage collaboration.
  o An easy-to-follow methodology
  o Use generally available tools
  o Narrow down the bottleneck to a small scope without requiring in-depth knowledge of the whole stack
  o Then engage with a subject-matter expert (SME)
First, let's look at the storage system through a different analogy.

Methodology - Pipeline
The components of a storage system are aligned like a pipeline.
For each IO operation, data flows through the pipeline.
Obviously, the faster and more reliable the flow, the better the pipeline is.
The narrowest point in the pipeline determines the throughput of the pipeline.
Storage Layers in the Pipeline: Disks and Enclosures, Storage Controller, SAN, HBA/NIC, NAS Servers, NIC, Cluster Network, Clients
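The "narrowest point" rule can be sketched in a few lines of Python. This is only an illustration: the stage names follow the layers on the slide, but every rate below is a made-up number, and `find_bottleneck` and `efficiency` are hypothetical helper names, not part of any tool mentioned in this talk.

```python
def find_bottleneck(stages):
    """Return (name, rate) of the slowest stage.

    The pipeline as a whole cannot move data faster than this stage.
    """
    return min(stages, key=lambda stage: stage[1])


def efficiency(measured_rate, stages):
    """Measured end-to-end rate as a fraction of the bottleneck rate."""
    _, limit = find_bottleneck(stages)
    return measured_rate / limit


# Hypothetical peak rates in GB/s, listed in pipeline order.
stages = [
    ("Disks and Enclosures", 6.0),
    ("Storage Controller", 5.0),
    ("SAN and HBA/NIC", 4.8),
    ("NAS Servers", 5.5),
    ("Cluster Network", 4.0),
    ("Clients", 6.0),
]

name, limit = find_bottleneck(stages)
print(f"Bottleneck: {name} at {limit} GB/s")
print(f"Efficiency at 3.6 GB/s measured: {efficiency(3.6, stages):.0%}")
```

Replacing the made-up rates with numbers measured layer by layer is what the rest of the methodology is about.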

Methodology - Pipeline
It is important to understand the specification of each component.
It is important to understand the overhead each layer introduces.
Storage Layers in the Pipeline: Disks and Enclosures, Storage Controller, SAN, HBA/NIC, NAS Servers, NIC, Cluster Network, Clients
Let us try the methodology on something I'm familiar with.

Case Study - a Lustre Storage System
[Figure: system diagram, labeled 10GbE]

Case Study - Backend Storage
[Figure: Object Storage. Controller 1: two 60 Disk Enclosures, Sub-Controller Modules A and B, Lustre OSS 1. Cache Mirroring Link. Controller 2: two 60 Disk Enclosures, Sub-Controller Modules A and B, Lustre OSS 2.]

Case Study - Backend Storage
  o SAS connection to the storage controller
  o 60x 3TB 7200rpm disks
  o Every 10 disks form one RAID6 group
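The capacity that follows from this layout is simple arithmetic, sketched below in Python. It assumes that each 10-disk RAID6 group is 8 data disks plus 2 parity disks, that no disks are held back as hot spares, and that the 60 disks are one enclosure; none of this is stated on the slide beyond "RAID6" and the disk counts. `raid6_usable_tb` is a hypothetical helper name.

```python
def raid6_usable_tb(total_disks, disks_per_group, disk_tb, parity_per_group=2):
    """Return (number of RAID groups, usable capacity in TB).

    RAID6 spends two disks' worth of capacity per group on parity.
    """
    groups = total_disks // disks_per_group
    data_disks_per_group = disks_per_group - parity_per_group
    return groups, groups * data_disks_per_group * disk_tb


groups, usable_tb = raid6_usable_tb(total_disks=60, disks_per_group=10, disk_tb=3)
raw_tb = 60 * 3
print(f"{groups} RAID6 groups, {usable_tb} TB usable of {raw_tb} TB raw")
```

Under these assumptions the 60 disks give 6 RAID6 groups and 144 TB usable out of 180 TB raw.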

Case Study - Backend Storage
Lustre can work with any block device, but Lustre has no knowledge of or control over the backend storage.
  o Hidden caches must be protected.
  o Lustre has a built-in mechanism to protect caches visible to Lustre.
  o Cache mirroring and battery-backed cache on storage controllers
  o Turn off HBA cache
Turning off cache (disabling cache and cache mirroring) sometimes gives us better performance.

Case Study - Backend Storage
Tool sets to analyze backend storage
  o Vdbench: the Swiss army knife
    http://sourceforge.net/projects/vdbench/
  o sgpdd-survey (shipped with the lustre-iokit rpm)
    thrlo=1 thrhi=256 crglo=1 crghi=256 size=twice the system memory
    Section 24.2, Lustre Operations Manual
    http://wiki.whamcloud.com/display/pub/documentation
  o dd would not be a good choice. We want to study how the storage responds to multiple IO threads.
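As a rough sketch, the sgpdd-survey settings above can be assembled into a command line like this. Several things here are assumptions rather than facts from the slide: that `size` is given in MB (check the header of the script you have installed), that the script is started as `./sgpdd-survey` with its parameters passed as environment variables, and the device names `/dev/sg0` and `/dev/sg1`, which are placeholders. `read_mem_total_kb` and `build_sgpdd_command` are hypothetical helper names.

```python
def read_mem_total_kb(path="/proc/meminfo"):
    """Read MemTotal (in kB) from a Linux /proc/meminfo file."""
    with open(path) as meminfo:
        for line in meminfo:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    raise ValueError("MemTotal not found in " + path)


def build_sgpdd_command(mem_total_kb, scsidevs):
    """Build an sgpdd-survey command line with size set to twice the memory.

    Assumption: size is expressed in MB.
    """
    size_mb = 2 * mem_total_kb // 1024
    settings = {"size": size_mb, "crglo": 1, "crghi": 256, "thrlo": 1, "thrhi": 256}
    parts = [f"{key}={value}" for key, value in settings.items()]
    parts.append('scsidevs="{}"'.format(" ".join(scsidevs)))
    parts.append("./sgpdd-survey")
    return " ".join(parts)


# Example for a server with 64 GiB of RAM and two placeholder sg devices.
# Caution: the write phase of sgpdd-survey overwrites the devices under test.
print(build_sgpdd_command(64 * 1024 * 1024, ["/dev/sg0", "/dev/sg1"]))
```

On a real server, `read_mem_total_kb()` would supply the first argument instead of the hard-coded 64 GiB.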

Case Study - Storage Servers
  o Effectively, two systems would give us the same performance.
  o How many OSTs would be the best fit for an OSS server?

Case Study - Storage Servers
Tool sets
  o obdfilter-survey: simulates Lustre workloads
  o thrlo: low count of threads
  o thrhi: high count of threads
  o nobjlo: low count of objects to read/write
  o nobjhi: high count of objects to read/write
  o size: total IO size in MB
  o targets: names of obdfilter instances
  o Section 24.3, Lustre Operations Manual
    http://wiki.whamcloud.com/display/pub/documentation

Case Study - Network Consideration
An un-optimized network architecture can potentially be a limiting factor.
[Figure: 2:1 oversubscribed InfiniBand fabric. 36-port IB switches, 24 ports and 12 ports.]
[Figure: Non-oversubscribed InfiniBand fabric. 36-port IB switches, 18 ports and 18 ports.]
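The two fabrics differ only in how each 36-port switch splits its ports. A small Python sketch of the ratio, assuming the 24 (or 18) ports face the nodes and the 12 (or 18) ports are inter-switch links, which matches the 2:1 label but is otherwise a reading of the figure. The per-port link rate is a made-up number and `oversubscription` is a hypothetical helper name.

```python
from fractions import Fraction


def oversubscription(node_ports, uplink_ports):
    """Ratio of node-facing ports to inter-switch ports on one switch."""
    return Fraction(node_ports, uplink_ports)


link_gbps = 40  # hypothetical per-port rate, for illustration only

for label, nodes, uplinks in [("oversubscribed", 24, 12), ("non-oversubscribed", 18, 18)]:
    ratio = oversubscription(nodes, uplinks)
    # Worst case: every node on the switch sends across the uplinks at once.
    worst_case_per_node = link_gbps / ratio
    print(f"{label}: {ratio}:1, worst-case {float(worst_case_per_node):.0f} Gb/s per node")
```

With a 2:1 ratio, traffic that crosses switches can get at most half of the link rate per node when all nodes are busy at once.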

Case Study - Network Consideration
Tool sets
  o ib_write_bw, ib_read_bw: shipped in the perftest rpm
  o LNET Selftest: measures network throughput and RPC operations in a Lustre environment
      o 1:1, 1:Multiple, Multiple:Multiple
      o IO sizes (size=)
      o The number of requests that are active at one time (concurrency=)
      o Bulk data transfer (brw read/write)
      o Small request message (ping)
      o With and without data checksum (check=)
      o Chapter 23, Lustre Operations Manual
        http://wiki.whamcloud.com/display/pub/documentation

Case Study - Client application
[Figure: system diagram, labeled 10GbE]
It is the performance measured on clients that matters.

Case Study - Client application
Tool sets
  o IOZone
      o Read, write, re-read, re-write, read backwards, read strided, fread, fwrite, random read, pread, mmap, aio_read, aio_write
      o http://www.iozone.org/
  o IOR
      o Supports the POSIX, MPI_IO, HDF5 or NCMPI API for IO.
      o http://sourceforge.net/projects/ior-sio/
  o Client applications

Conclusion
Proactive: take steady steps to build performance profiles before performance issues occur.
  o Understand what each component is capable of and its overhead
Use generally available tool sets
Look out for system utilization, saturation and possibly errors
  o iostat, top, mpstat, sar, etc.
  o Intel Manager for Lustre
      o All-in-one dashboard: CPU, RAM, file system IO (both metadata and read/write workloads), etc.
      o Aggregated logs
      o Syntax-highlighted alerts
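Tools such as iostat and mpstat report iowait from the kernel's CPU counters. As a minimal Python sketch of that calculation, assuming a Linux system whose `/proc/stat` "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal in that order: `parse_cpu_line` and `iowait_percent` are hypothetical helper names, and the two sample lines are made-up values, not measurements.

```python
def parse_cpu_line(line):
    """Return (iowait ticks, total ticks) from an aggregate 'cpu' line.

    Only the first eight counters are summed: guest time is already
    included in user time, so adding it again would double-count.
    """
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = [int(value) for value in fields[1:9]]
    return counters[4], sum(counters)


def iowait_percent(before_line, after_line):
    """Share of CPU time spent in iowait between two samples, in percent."""
    iowait_before, total_before = parse_cpu_line(before_line)
    iowait_after, total_after = parse_cpu_line(after_line)
    elapsed = total_after - total_before
    if elapsed <= 0:
        raise ValueError("samples must be taken at different times")
    return 100.0 * (iowait_after - iowait_before) / elapsed


# Made-up samples; on a real system read the first line of /proc/stat twice,
# a few seconds apart, while the benchmark is running.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  200 0 100 1500 200 0 0 0 0 0"
print(f"iowait: {iowait_percent(before, after):.1f}%")
```

A persistently high figure during a benchmark run is a hint that the CPUs are waiting on a slower layer further down the pipeline.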

Thank You
zhiqi.tao@intel.com