CMS experience with the deployment of Lustre

CMS experience with the deployment of Lustre
Lavinia Darlea, on behalf of the CMS DAQ Group
MIT/DAQ CMS
April 12, 2016
1 / 22

Role and requirements: CMS DAQ2 System
The Storage Manager and Transfer System (SMTS) sits in the DAQ chain:
- input: the output of the Data AcQuisition chain
- Lustre FS: ensures safe temporary storage
- output: transfer to Tier0
2 / 22

Role and requirements: data flow into Lustre, overview
[Diagram: Filter Units (FU) write via NFS to Builder Units (BU) running minimergers, which write via 40GbE / InfiniBand (56Gb) into the Lustre FS]
3 / 22

Role and requirements: data flow into Lustre, minimergers
[Diagram: each BU holds Lustre mounts for normal data and for DQM histograms / HLT rates, connected to the Lustre FS via 40GbE / InfiniBand (56Gb)]
4 / 22

Role and requirements: data flow into and out of Lustre, macromergers and transfer
[Diagram: a macromerger (mv, fasthadd, etc.) writes within the Lustre FS (open and closed areas); a transfer process (xrdcp) moves data out to EOS; DQM, BU, Ecal and other areas are served via NFS; networks: 40GbE and InfiniBand (56Gb)]
5 / 22

Role and requirements: Storage and Transfer System requirements

Requirement            In             Out             Total
Space                  -              -               250 TB
Mergers bandwidth      3 GB/s         0.3 GB/s        3.3 GB/s
Transfers bandwidth    -              3 GB/s          3 GB/s
Total bandwidth        3 GB/s         3.3 GB/s        6.3 GB/s
Nb of files*           2840 files/LS  2780 files/min  2840 files/min

*In: create; Out: destroy. Computation: (20 streams x 1 data file)/LS + (20 streams x 2 JSNs x 70 BUs)/LS + (1 lock file x 20 streams)/LS.
6 / 22
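The starred file-count computation can be checked directly; a minimal sketch, using only the stream, JSN and BU counts quoted on the slide:

```python
# Per-lumisection (LS) file-creation rate from the requirements table:
# (20 streams x 1 data file) + (20 streams x 2 JSNs x 70 BUs) + (1 lock x 20 streams)
streams, jsns_per_stream, bus = 20, 2, 70

data_files = streams * 1                      # merged data files per LS
jsn_files = streams * jsns_per_stream * bus   # JSON metadata files per LS
lock_files = streams * 1                      # lock files per LS

total = data_files + jsn_files + lock_files
print(total)  # 2840, matching the "In" column (files/LS)
```

The JSON metadata files from the 70 BUs dominate the count; the data and lock files contribute only 40 of the 2840.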

Lustre File System Implementation: architecture
- current Intel Enterprise Edition for Lustre version: 2.2.0.2
- servers: 6 Dell R720
- 2 MDS nodes in active/passive failover mode
- 4 OSS nodes, each controlling 6 OSTs, in pairs of active/passive failover mode
[Rack view: MDT (bottom), 1 OST controller and 1 disk-shelf expansion enclosure]
7 / 22

Lustre File System Implementation: metadata configuration
- 16 drives of 1 TB in 1 volume group, plus 8 hot spares
- only 10% of the disk capacity is used, in order to increase performance
- partitions: 10 GB for the MGT (a special partition which serves as entry point for client connections), 1 TB for the MDT
- redundancy: RAID6
[Photo: MDT, NetApp E2724, front and rear view]
8 / 22
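The "10% of the capacity" statement can be sanity-checked with a back-of-the-envelope sketch; the assumption that all 16 drives form a single RAID6 group (so two drives' worth of parity) is ours, not stated on the slide:

```python
# Metadata volume sizing: 16 x 1 TB drives in one RAID6 volume group,
# partitions of 10 GB (MGT) and 1 TB (MDT).
drives, drive_tb = 16, 1.0
usable_tb = (drives - 2) * drive_tb   # RAID6: capacity of n-2 drives -> 14 TB
used_tb = 1.0 + 0.010                 # MDT + MGT partitions

print(f"{used_tb / usable_tb:.0%}")   # ~7%, consistent with "only ~10% used"
```

Short-stroking the metadata volumes like this keeps the MDT I/O on the fastest part of the disks, which is the performance motivation given on the slide.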

Lustre File System Implementation: object storage configuration
- 2 OST controllers: NetApp E5560
- each controller manages one DE6600 disk expansion enclosure
- each disk shelf enclosure contains 60 disks of 2 TB each
- total raw disk space: 240 disks x 2 TB = 480 TB
- physical installation: 2 racks, 1 controller and its expansion enclosure per rack
[Photos: OST front view, disk shelves]
9 / 22

Lustre File System Implementation: OST volume configuration
- each controller/expansion shelf is organized in 6 RAID6 volume groups (8+2 disks)
- the volume groups are physically allocated vertically, to ensure resilience to single-shelf damage
- total usable space: 349 TB
[Diagram: volume configuration]
10 / 22
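The quoted usable space follows from the RAID layout; a sketch of the arithmetic, assuming 6 groups in each of the four 60-disk enclosures and that the slide's 349 "TB" is a binary (TiB) figure:

```python
# Usable capacity: 4 enclosures (2 controllers + 2 expansion shelves),
# 6 RAID6 groups each, 8 data + 2 parity disks per group, 2 TB per disk.
groups = 4 * 6
data_disks_per_group = 8
disk_bytes = 2e12                       # 2 TB, decimal bytes

usable_bytes = groups * data_disks_per_group * disk_bytes
print(usable_bytes / 1e12)              # 384.0 decimal TB of data capacity
print(round(usable_bytes / 2**40, 1))   # ~349.2 TiB, matching the slide
```

So the 480 TB raw pool loses 96 TB to RAID6 parity, and the remaining 384 decimal TB is the ~349 TiB reported.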

Lustre File System Implementation: high availability
- the volume distribution provides full shelf-failure redundancy
- all volumes are RAID6
- all devices (controllers, shelves, servers) are dual powered (normal and UPS)
- all servers are configured in active/passive failover mode via corosync/pacemaker: MDS in neighbouring racks, OSS within the same rack
- LFS nominal availability: 40GbE and InfiniBand (56Gb) data networks*
* not in failover mode
11 / 22

Lustre File System Control and Monitoring: SANtricity
- GUI monitoring of the bandwidth usage per controller
- detailed text reports of the bandwidth usage per volume
- provides useful information and alerts on hardware status
12 / 22

Lustre File System Control and Monitoring: IML (Intel Manager for Lustre)
(+) mostly used for control and basic FS operations (failover, startup, shutdown)
(+) the dashboard provides useful information for debugging an overloaded system
(-) painful installation procedure
(-) not fully reliable: fake BMC monitoring warnings, false status reports upon major FS failures
13 / 22

Validation: a few numbers
The plotted values are per controller; the 2 controllers were perfectly balanced.
- commissioning acceptance: proven steady 10 GB/s rate in r/w mode
- merger emulation: proven steady 7.5 GB/s rate
14 / 22

Validation: LFS bandwidth benchmarking
Emulation tests using the production computing cluster:
- tests performed using different fractions of the available computing farm
- obvious non-linear behaviour with the number of BUs
- the transfer system (read operations) was not exercised during the tests
- saturation is expected around 8.5 GB/s
15 / 22
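The non-linear scaling with the number of BUs can be illustrated with a simple saturating-throughput curve; this is our own illustrative model (the half-saturation point N_HALF is a made-up parameter), not the fit used in the talk, and only the 8.5 GB/s asymptote comes from the slide:

```python
# Illustrative saturation model: aggregate write bandwidth vs. number of
# Builder Units (BUs), with the asymptote set to the quoted ~8.5 GB/s.
B_SAT = 8.5    # GB/s, expected saturation from the benchmark slide
N_HALF = 10    # hypothetical BU count at half saturation (assumed value)

def throughput(n_bu: int) -> float:
    """Michaelis-Menten-style curve: linear for few BUs, flat near B_SAT."""
    return B_SAT * n_bu / (n_bu + N_HALF)

for n in (10, 20, 40, 70):
    print(f"{n:3d} BUs -> {throughput(n):.2f} GB/s")
```

Any concave curve of this shape reproduces the observed behaviour: adding farm fractions helps a lot at first, then yields diminishing returns as the file system saturates.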

Production usage: a few numbers
[Plots: DAQ tests, March 2016; Heavy Ion runs, December 2015]
16 / 22

Operational Observations: important lessons
Sensitive points:
- Lustre is extremely sensitive to network glitches and needs a very stable, reliable network; adverse effects range from individual clients being evicted to the entire FS shutting down
- Lustre is very greedy with resources on the clients:
  - unless explicitly limited, it will take up to 75% of the total RAM for its caching
  - unless explicitly limited, it will use a huge amount of slab memory to cache its objects (ldlm locks)
- Lustre is very greedy with resources on the servers:
  - MDS ideal setup: the MDT should fit entirely in RAM
  - OSS: by default they cache everything, and should be prevented from doing so
17 / 22
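A minimal sketch of the client-side sizing implied by the 75% figure. The 64 GB node size and the 25% target are assumptions for illustration; the tunable named in the comments (llite.*.max_cached_mb, set via lctl set_param) is the standard Lustre knob for capping the client page cache, per the Lustre manual:

```python
# Client cache sizing: by default an uncapped Lustre client may use up to
# ~75% of RAM for its page cache (the slide's observation). A site would
# typically cap this with: lctl set_param llite.*.max_cached_mb=<value>
def cache_caps(total_ram_gb: float, cache_fraction: float = 0.25):
    """Return (default_cache_mb, capped_cache_mb) for a client node."""
    default_mb = int(total_ram_gb * 1024 * 0.75)            # uncapped default
    capped_mb = int(total_ram_gb * 1024 * cache_fraction)   # assumed policy
    return default_mb, capped_mb

default_mb, capped_mb = cache_caps(64)   # hypothetical 64 GB BU node
print(f"default cache ceiling: {default_mb} MB")
print(f"proposed max_cached_mb: {capped_mb} MB")
```

The ldlm lock slab growth mentioned above has its own tunable family (ldlm.namespaces.*.lru_size), which bounds how many locks a client caches; the exact values used at CMS are not given in the talk.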

Operational Observations: weekend operation
[Plot: weekend operation]
18 / 22

Conclusion: SMTS team interaction with Lustre
- IML can be misleading, but provides very intuitive ways of controlling the FS
- a sub-optimal application architecture can artificially increase the load on the FS; continuous tuning is being performed at both the application and FS level
- a few FS issues have been identified, but they have mostly been fixed
- clients recover quickly and painlessly after FS unavailability
- Lustre and NetApp's E-Series seem to play nicely together, and they deliver the required bandwidth performance
- the Intel Lustre support team is reliable, knowledgeable and patient, but located mostly on a different continent
19 / 22

Conclusion: SMTS behaviour
[Plots: mergers monitoring sample, mergers delays sample]
- mostly stable behaviour in 1 year of production running
- general latencies within the requirements
- a few notable glitches, which have been followed up and mostly solved
20 / 22

Questions?
21 / 22

Backup and stories
Event display of one of the first particle splashes seen in CMS during Run2...
... only a few minutes before one of the OSS servers crashed...
... and the failover mechanism failed...
22 / 22