Scalability Testing of DNE2 in Lustre 2.7 and Metadata Performance using Virtual Machines Tom Crowe, Nathan Lavender, Stephen Simms

Scalability Testing of DNE2 in Lustre 2.7 and Metadata Performance using Virtual Machines Tom Crowe, Nathan Lavender, Stephen Simms Research Technologies High Performance File Systems hpfs-admin@iu.edu Indiana University

Indiana University Metadata Current Status
- Multiple compute clusters
- Over 150 disciplines served
- Mixed workloads, various I/O patterns
Current Metadata Challenge
- Single MDS/MDT comprising 24 SAS drives (RAID-10), over 1B inodes
- Lustre 2.1.6, with plans to move to 2.5.x soon
More metadata performance, please
- SSD + DNE2 = goodness?
Very heavy metadata workflows can harm other users
- Can we use multiple virtual MDS nodes to isolate unique users?

Distributed Namespace Environment (DNE)
DNE Phase 1 (Lustre 2.4)
- Enables deployment of multiple MDTs on one or more MDS nodes
- Create directories on a specific remote MDT
DNE Phase 2 (preview in Lustre 2.6/2.7, to be released in 2.8)
- Enables deployment of striped directories on multiple MDS nodes
- Improved versatility

Distributed NamespacE (DNE) - Remote Directory
[diagram: a parent directory on MDT0 containing File.0, File.1, and Subdir.0, with remote directory Subdir.1 placed on MDT1]

Distributed NamespacE (DNE) - Striped Directory
[diagram: directory Dir.0 striped across MDT0, MDT1, and MDT2, with files File.0, File.1, File.2 and the subdirectory Subdir.1 distributed across the three MDTs]
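
For reference, these two layouts correspond to two forms of the lfs mkdir command run on a client. A minimal sketch, assuming the file system is mounted at /mnt/lustre (the mount point and directory names are placeholders):

    # DNE Phase 1: create a remote directory whose metadata lives on MDT1
    lfs mkdir -i 1 /mnt/lustre/parent/subdir.1

    # DNE Phase 2: create a directory striped across three MDTs
    lfs mkdir -c 3 /mnt/lustre/dir.0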

Building Blocks
(6) servers, identical specs: HP ProLiant DL380p Generation 8 (Gen8)
- Dual-socket Intel Xeon: 2x E5-2667v2 "Ivy Bridge-EP" @ 3.30GHz, 8 cores each
- 128GB memory: (16) 8GB DIMMs @ 1866MHz
- HP Smart Array P830 controller with 4GB battery-backed cache
- (6) Intel SSD DC S3500 drives (800GB each)
- (1) SAS drive (146GB, 15,000 RPM)
- Mellanox ConnectX-3

Building Blocks (cont) - (6) HP DL380p Gen8 servers with HP Smart Array P830 controllers and Intel SSD DC S3500 drives

Building Blocks (cont) x 20

Logical Setup

Logical Setup
Block Devices
- 50GB LUNs were provisioned from each drive, preserving a 1:1 drive-to-LUN layout
  » 50GB LUNs allowed mkfs to complete in a reasonable time
File System Options
- 8GB journal
- lazy_itable_init=0
  » Lazy inode table initialization is enabled by default, which results in file system activity directly following mkfs/mount, so it was disabled
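
A minimal sketch of how an MDT might be formatted with these options; the device path, file system name, index, and MGS NID below are placeholders, and the ldiskfs options (8GB journal, lazy_itable_init=0) are passed through --mkfsoptions:

    mkfs.lustre --mdt --fsname=testfs --index=1 \
        --mgsnode=192.168.1.1@o2ib \
        --mkfsoptions="-J size=8192 -E lazy_itable_init=0" \
        /dev/mapper/mdt1_lun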

Methodology
Software
- mdsrate: Lustre-aware metadata benchmark in the Lustre test suite
- Operation: mknod (create with no OST object allocation)
Wide parameter sweep
- 20 clients, 32 mounts each, for 640 mounts simulating 640 clients (sketched below)
- Varied number of directories from 1 to 128 by powers of 2
- 4 threads per directory, each on a separate mount point
- Directory stripe count increased to match MDT count
Hardware Configurations Tested
- Single MDS, multiple MDTs
- Multiple MDSs, single MDT per MDS
- Multiple MDSs, multiple MDTs per MDS
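
A rough sketch of how the 640 simulated clients could be stood up, assuming the file system is named testfs and using a placeholder MGS NID; the exact mount options and mdsrate flags used in the tests are not given in the slides:

    # on each of the 20 client nodes: 32 independent mounts of the same file system
    for i in $(seq 0 31); do
        mkdir -p /mnt/testfs_$i
        mount -t lustre 192.168.1.1@o2ib:/testfs /mnt/testfs_$i
    done
    # mdsrate is then launched over MPI with 4 processes per target directory,
    # each process bound to a different mount point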

Results - mknod scaling with increasing MDS count (256 threads)
[charts: single-MDS configurations (1 MDS x 1/2/4 MDT) compared with six-MDS configurations (6 MDS x 1/2/4 MDT); at peak, CPU = 80% and SSD utilization < 50%]

Summary
- DNE2 works
- Metadata performance improves by increasing MDTs
- Metadata performance improves by increasing MDSs
- Adding MDSs (physical cores) overshadows adding MDTs
- Performance in a single directory increases
- Diminishing returns beyond 4 MDTs per MDS
- Peak performance costs 80% CPU
- Peak performance is only driving 50% of the disk subsystem
Could we increase performance by using VMs to drive the hardware harder?

Virtual? Really? For HPC?
Well, talented people have looked at it:
- Shuichi Ihara, "Virtualizing Lustre", LUG 2011
- Robert Read, "Lustre on Amazon Web Services", LUG 2013
- Marc Stearman, "Per User Lustre File Systems", LUG 2015
Virtualization imposes less and less overhead and is flexible: resize the guest (bigger/smaller), duplicate the guest, snapshot the guest, migrate the guest, all before your morning coffee break.

A Possible Use Case
Stearman articulated IU's situation pretty well at LUG 2015:
- Some people are bad actors where metadata are concerned and don't know it
- Some people have to run code that is metadata-intensive
- Some people want to have their own separate file system
Why not use ZFS as Stearman described and create project-based (if not user-based) file systems that put caps on size and performance? If needs are extreme or would burden other members of the research community, give them their own space.

Performance? Try SR-IOV
We were warned that performance would be terrible.
SR-IOV: Single Root I/O Virtualization
- Allows a device to appear in the virtual world as multiple separate physical PCIe devices
- A virtual guest thinks it has its own IB card
LNet Self Test over IB: 3 physical -> 1 virtual vs. 3 physical -> 1 physical
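
The comparison above can be reproduced with LNet self-test; a minimal sketch, with placeholder NIDs standing in for the three physical senders and the single (virtual or physical) node under test:

    modprobe lnet_selftest                      # load on every participating node
    export LST_SESSION=$$
    lst new_session sriov_check
    lst add_group senders 10.0.0.[1-3]@o2ib     # three physical nodes
    lst add_group target 10.0.0.10@o2ib         # virtual (or physical) node under test
    lst add_batch bulk
    lst add_test --batch bulk --from senders --to target brw write size=1M
    lst run bulk
    lst stat senders target                     # report throughput and RPC rates
    lst end_session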

Virtual Configuration
- 8 MDS guests, 8 vCPUs per guest, oversubscribing physical cores 2:1
- Same clients / servers / OS / Lustre as before
- Same tests as before, with fewer clients
- All guests run through a single ConnectX-3 via SR-IOV
- LVM pass-through allows flexibility (see the sketch below)
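
A rough sketch of the kind of LVM pass-through this refers to, assuming a volume group named vg_mds and a libvirt guest named vmds0; the names and sizes are illustrative, not the configuration actually used:

    # carve a logical volume to back one virtual MDS's MDT
    lvcreate -L 50G -n mdt0 vg_mds

    # attach it to the KVM guest as a raw virtio block device
    cat > mdt0-disk.xml <<'EOF'
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/vg_mds/mdt0'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    EOF
    virsh attach-device vmds0 mdt0-disk.xml --config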

Virtual MDS (KVM) - multiple unique MDS on the same physical hardware; 1, 2, 4, and 8 MDS as individual file systems
[chart: mknod/sec vs. mdsrate process/directory count, from np=4 ndirs=1 up to np=512 ndirs=128, with series for 1, 2, 4, and 8 MDS]

Virtual vs Physical MDS - DNE2, multiple MDS, single MDT per MDS
[chart: mknod/sec vs. mdsrate process/directory count, from np=4 ndirs=1 up to np=256 ndirs=64, with series for 2, 4, and 8 virtual MDS (8 clients) and 2 physical MDS (20 clients)]

Conclusions
- Greater aggregate performance can be achieved using VMs: 20% greater than the best single-MDS numbers
- Due to an increase in service threads? Increasing them on bare metal showed no significant improvement
- Possibly a good fit for creating separate file systems for users
- DNE2 performance is worse on VMs
- No magic bullet here

Future Work
MDS work - always more data to be taken and sifted through
- Adding file creation to the mix: mdsrate create in lieu of mknod
- Application testing: Trinity (bio code), for example
VM work
- Testing and creation of a pilot at IU

Acknowledgments
- IU's High Performance File System Team
- IU Scientific Application and Performance Tuning Team
- Matrix Integration
- Intel
- HP
- IU's Wrangler grant (NSF 13-528) partners TACC and ANL
This material is based in part upon work supported by the National Science Foundation under Grant No. NSF 13-528. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Thank You! Questions?