Mission-Critical Lustre at Santos. Adam Fox, Lustre User Group 2016


About Santos
- One of the leading oil and gas producers in APAC, founded in 1954
- South Australia, Northern Territory, Oil Search, Cooper Basin
- Largest employer in South Australia
- Unix team supports Geoscience operations
- Seismic data processing: trying to find new gas and oil pockets
- Challenging low oil price environment

Why Lustre? Initial Requirements for Lustre
- Geoscience storage nearing capacity and hardware end-of-life
- High overhead to maintain: SAN storage allocated in 16TB LUNs, shared via NFS
- Over 9000 automount entries in LDAP to map the storage structure to a user-friendly filesystem layout
- Concerns about NFS service availability and SAN performance under high load
- Wanted a storage solution that could scale for performance and capacity, using commodity components
- Geoscientists connect to HPC via TurboVNC for full 3D interactivity (Red Hat Innovation Award winner, 2011)

Why Lustre? Consulting Process
- Thorough tender process for choosing a vendor and partner to implement Lustre
- Initial consulting contract: an "As-Is, To-Be" review of the environment
- The partnership of Datacom / Dell / Intel was successful
- Pilot: 700TB Lustre filesystem; Production: grow to 1.3PB; DR: 530TB Lustre filesystem for replicated production data
- Intel Enterprise Edition for Lustre* (IEEL) 2.2.0.0 with Intel Manager for Lustre (IML)
- Mellanox InfiniBand switches and network cards
- Clustered Samba servers as a CIFS gateway; utilise the existing NFS servers

Implementation: Pilot
- MDS server: 12x 800GB SSD, RAID 10
- 2x OSS server pairs, 24x OSTs, each 10x 4TB SAS HDD in RAID 6
- 700TB total usable space
- FDR 56Gb/s InfiniBand, fat tree, non-blocking
- Goals: get some scratch space available for Geoscientists quickly, and get a feel for Lustre in the Santos environment
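As a rough sanity check on the quoted capacity, assuming each 10-disk RAID 6 OST exposes 8 data disks: 24 OSTs × 8 × 4TB = 768TB raw, which lines up with roughly 700TB usable once formatting overhead and reserved space are taken out.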

Implementation: Pilot (continued)
- Implementation of the pilot Lustre storage went really smoothly
- User acceptance testing: performance, availability
- Plan was to re-use the existing NFS servers as Lustre clients: InfiniBand cards installed, Lustre client software installed
- Lustre was slower on the existing NFS servers (older CPU generation), so purchased 2x Dell PowerEdge R630s for NFS and used CTDB for clustering NFS
- NFS stale filehandles: worked with Intel Support to resolve, upgraded to IEEL 2.4

Implementation: Samba & NFS with CTDB
- Dell / Intel reference architecture; previous bad experiences with Linux HA solutions (Heartbeat, Pacemaker)
- CTDB recovery lock lives on the Lustre storage at /lustre/.ctdb/recovery_lock
- CTDB handles failover of IPs between nodes; clients reach the gateway through round-robin DNS (see the check sketched below):
  samba.santos.com. IN A 10.1.1.10
  samba.santos.com. IN A 10.1.1.11
- Diagram: two Samba servers, samba1.santos.com (10.1.1.10) and samba2.santos.com (10.1.1.11), on 10Gb Ethernet with a 192.168.1.1 / 192.168.1.2 crossover link between them, InfiniBand to the Lustre storage, and Samba clients on 1Gb Ethernet
- So far so good, no clustering failures
- Samba 4 vs Samba 3: no CTDB support in Samba 4 on RHEL 6
- CTDB also used for NFS serving
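Because the gateway name is published as two round-robin A records, it is worth checking from a client that both CTDB public addresses are still in DNS. A minimal sketch using the example hostname and addresses from the slide (not a Santos tool):

```python
#!/usr/bin/env python3
"""Client-side check that the round-robin DNS name for the CIFS/NFS gateway
still publishes both CTDB public addresses.  The hostname and addresses are
the examples shown on the slide, not live values."""

import socket

SERVICE_NAME = "samba.santos.com"          # round-robin A records
EXPECTED = {"10.1.1.10", "10.1.1.11"}      # CTDB public addresses

def resolve_all(name):
    """Return every IPv4 address the resolver hands back for `name`."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}

if __name__ == "__main__":
    published = resolve_all(SERVICE_NAME)
    missing = EXPECTED - published
    if missing:
        print(f"WARNING: {SERVICE_NAME} is missing A records for {sorted(missing)}")
    else:
        print(f"OK: {SERVICE_NAME} resolves to {sorted(published)}")
```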

Implementation: Puppet
- Puppet for configuration management; IML still does the Lustre server configuration
- Mainly used for Lustre client configuration: LNET module options, enable InfiniBand, mount options (see the sketch below)
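The Puppet code itself is not shown on the slide; as a rough illustration of the client-side pieces such a module would manage (the LNET network name, interface, MGS NIDs and mount point below are placeholders, not Santos values), a minimal Python sketch:

```python
#!/usr/bin/env python3
"""Rough illustration of the Lustre-client settings the Puppet module manages:
LNET module options over InfiniBand and the client mount entry.  The interface
name (ib0), LNET network (o2ib0), MGS NIDs and mount point are placeholders."""

from pathlib import Path

LNET_CONF = "options lnet networks=o2ib0(ib0)\n"   # LNET module options: IB
FSTAB_LINE = ("10.0.0.1@o2ib0:10.0.0.2@o2ib0:/lustre  /lustre  lustre  "
              "flock,_netdev  0 0\n")               # client mount options

def ensure_line(path: Path, line: str) -> None:
    """Append `line` to `path` if it is not already present (idempotent,
    which is the property the equivalent Puppet resources provide)."""
    text = path.read_text() if path.exists() else ""
    if line not in text:
        path.write_text(text + line)

if __name__ == "__main__":
    ensure_line(Path("/etc/modprobe.d/lustre.conf"), LNET_CONF)
    ensure_line(Path("/etc/fstab"), FSTAB_LINE)
```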

Implementation: Collectd Monitoring and Graphing
- Stacked graph of per-OST read/write traffic
- Stacked graph of per-Lustre-client OSC read/write
- Handy for troubleshooting which Lustre client is causing the most I/O

Implementation: Collectd Monitoring and Graphing (continued)
- Disk usage per OST/MDT device (caught the MDT filling up and going into reserved space)
- Lustre client metadata operations, using counters from /proc/fs/lustre/llite/stats (reader sketched below)
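The metadata-operation graphs come from the per-mount llite counters; a minimal sketch of reading them follows. The exact counter names vary by Lustre version, so the list at the bottom is just an example:

```python
#!/usr/bin/env python3
"""Minimal sketch of a collectd-style reader for the per-mount Lustre client
counters under /proc/fs/lustre/llite/*/stats.  It only assumes the common
"name value samples ..." line layout; everything after the first two fields
is ignored so the sketch tolerates version differences."""

import glob

def read_llite_stats(pattern="/proc/fs/lustre/llite/*/stats"):
    """Return {mount_instance: {counter_name: samples}} for each llite mount."""
    results = {}
    for path in glob.glob(pattern):
        instance = path.split("/")[-2]        # e.g. fsname-<instance id>
        counters = {}
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) < 2 or fields[0] == "snapshot_time":
                    continue
                counters[fields[0]] = int(fields[1])
        results[instance] = counters
    return results

if __name__ == "__main__":
    for instance, counters in read_llite_stats().items():
        meta_ops = {k: v for k, v in counters.items()
                    if k in ("open", "close", "getattr", "setattr",
                             "unlink", "mkdir")}
        print(instance, meta_ops)
```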

Implementation: Production
- Installed an additional 2x OSS pairs: 24 OSTs → 48 OSTs, 700TB → 1.3PB
- IML made for easy expansion
- Configured Lustre servers on the Ethernet network
- 40GB/s read, 24GB/s write

Implementation: Disaster Recovery
- Separate Lustre installation
- 1x OSS server pair, 12x OSTs, each 10x 6TB SAS HDD in RAID 6
- 520TB total usable space
- Only production data, replicated from the prod site
- DR Samba and NFS servers

Replication
- Initial plan was to use rsnapshot / rsync for replication between sites, hard-linking files that do not change
- Lots of unstructured data: 30 million files, 380TB of data; most data does not change, but there are many files
- Rsnapshot was too slow to replicate, hitting the limit of metadata updates on the MDS; rsync is not point-in-time consistent
- fpart sorts files and packs them into partitions (https://github.com/martymac/fpart); fpsync is a wrapper script for fpart and rsync
- Settled on 6 concurrent rsync processes: use fpart to walk the filesystem tree and feed partitions into a set number of rsync processes
- A Perl script drives the fpart and rsync processes and makes it rsnapshot-like, with snapshot retention (see the sketch below)
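The Perl driver itself is not shown; as a rough illustration of the fpart-plus-parallel-rsync flow (the source and destination paths, partition size and worker count are placeholders, and the snapshot-retention side is omitted), a Python sketch:

```python
#!/usr/bin/env python3
"""Sketch of the fpart + parallel rsync replication idea from the slide.
SRC, DEST, FILES_PER_PARTITION and WORKERS are placeholders; in practice the
fpsync wrapper handles many more details than this."""

import glob
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor

SRC = "/lustre/prod/"                 # trailing slash: copy contents
DEST = "drserver:/lustre/dr/"         # DR-site target (placeholder)
FILES_PER_PARTITION = 100_000         # -f: cap files per fpart partition
WORKERS = 6                           # matches the 6 concurrent rsyncs

def run_replication():
    with tempfile.TemporaryDirectory() as tmp:
        template = f"{tmp}/part"
        # 1. Walk the tree once with fpart, writing relative-path file lists
        #    template.0, template.1, ... (crawl "." from inside SRC so the
        #    lists stay relative to the rsync source).
        subprocess.run(["fpart", "-f", str(FILES_PER_PARTITION),
                        "-o", template, "."], check=True, cwd=SRC)
        partitions = sorted(glob.glob(template + ".*"))

        # 2. Feed each partition to its own rsync, at most WORKERS at a time.
        def sync(partition):
            return subprocess.run(
                ["rsync", "-a", "--files-from=" + partition,
                 SRC, DEST]).returncode

        with ThreadPoolExecutor(max_workers=WORKERS) as pool:
            codes = list(pool.map(sync, partitions))

    failed = sum(1 for c in codes if c != 0)
    print(f"{len(codes)} partitions synced, {failed} rsync failures")

if __name__ == "__main__":
    run_replication()
```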

Data Centre Switch
- All going to plan, and then... swap the data centres around!
- Server room cooling failure; the secondary site was more reliable
- Lustre was first to move: moving before the data migration was risk mitigation
- Used COPE Sensitive Freight to move whole racks
- All went really smoothly, though it added delay to the final cutover

Production Cutover
- Finally got to do the final migration and cutover to Lustre
- Cutover process was fairly straightforward: stop all NFS clients, perform a final rsync, update automounts in LDAP, restart autofs, wait for the Monday morning rush!
- No major issues
- 32-bit applications running over NFS did not like 64-bit inodes; resolved by moving to the direct Lustre client, which handles 32-bit system calls

Production Cutover: Outstanding Issues
- Application performance with large amounts of small I/O or metadata operations, e.g. 6.6 million stat system calls (4002 files stat'ed 1661 times each)
- Using NFS in front of Lustre seems to mask the effect, though that is not ideal

Future Improvements
- Upgrade to IEEL 3: parallel metadata updates
- Set LNET peer credits: cat /proc/sys/lnet/peers shows large negative numbers (see the check sketched below); need a Lustre outage to set credits and peer_credits on the Lustre clients
- Implement Robinhood to manage unstructured data; it could feed into the replication process so we don't need to walk the whole filesystem each time
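A small sketch of the "large negative numbers" check on LNET peer credits. The column layout of /proc/sys/lnet/peers varies slightly between Lustre versions, so rather than naming columns it just flags any peer whose credit counters have dipped well below zero:

```python
#!/usr/bin/env python3
"""Scan /proc/sys/lnet/peers and report peers whose credit counters have gone
deeply negative, the symptom that prompts raising credits and peer_credits.
The threshold and parsing are deliberately loose; verify against the output
of your own Lustre version."""

PEERS_FILE = "/proc/sys/lnet/peers"
THRESHOLD = -1           # ignore the routine -1 values, report anything lower

def check_peers(path=PEERS_FILE, threshold=THRESHOLD):
    with open(path) as f:
        next(f, None)                       # skip the header line
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue
            nid = fields[0]
            lows = [int(v) for v in fields[1:]
                    if v.startswith("-") and v[1:].isdigit()
                    and int(v) < threshold]
            if lows:
                print(f"{nid}: credits dipped to {min(lows)}")

if __name__ == "__main__":
    check_peers()
```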