LUG 2012 - From Lustre 2.1 to Lustre HSM - Lustre @ IFERC (Rokkasho, Japan)


LUG 2012 - From Lustre 2.1 to Lustre HSM
Lustre @ IFERC (Rokkasho, Japan)
Diego.Moreno@bull.net

Outline
- About Bull
- HELIOS @ IFERC (Rokkasho, Japan)
- Lustre-HSM:
  - Basis of Lustre-HSM
  - HSM patches and the new layout lock
  - Testing Lustre-HSM
  - Lustre-HSM roadmap
- Lustre backup:
  - Robinhood-backup architecture @ IFERC
  - From robinhood-backup to Lustre-HSM
- Conclusion

About Bull
In HPC:
- 1st European manufacturer
- 3 petaflop systems in the last 18 months
[Chart: number of Bull systems in the TOP500, rising from Nov 2008 to Nov 2011]

Bull & Lustre
In the Lustre community:
- EOFS member
- First Lustre 2.0 adopter
- First Lustre 2.1 installation on a petaflop machine
- Other Lustre contributions:
  - NUMIOA architectures
  - Multi-attachment InfiniBand configuration
  - Multipath tuning patch
  - Red Hat 6 kernel adaptations (2.6.32)

Helios @ IFERC (Rokkasho, Japan)
- International Fusion Energy Research Centre
- More than 1.5 petaflops
- Memory: 280 TB
- 245 bullx B chassis: 2205 B510 blades, 4410 compute nodes, 8820 Intel Sandy Bridge 2.7 GHz sockets
- Would have been #6 in the TOP500 of November 2011

Lustre in Helios
L1, the scratch filesystem at HELIOS:
- High I/O throughput: 110 GB/s
- Moderate storage capacity: 5.7 PB
- 4 MDS (Bull R423-E2) and 48 OSS (Bull R423-E2)
- 12 x DDN SFA 10K (300 x 2 TB disks each)

Lustre in Helios
L2, the second Lustre level, a Lustre-DMF filesystem:
- Moderate I/O throughput: 20 GB/s
- High Lustre capacity: 8.6 PB filesystem
- Connected to SGI's DMF edge servers: 40 PB of extra storage on tape drives
- 2 MDS (Bull R423-E2) and 12 OSS (Bull R423-E2)
- 4 x DDN SFA 10K

Archive: Lustre L2 + DMF (HSM)
[Diagram: the Helios cluster (a robinhood node and Lustre clients) reaches the L2 OSTs and MDTs over 20 GB/s InfiniBand, backed by 3 x SFA-10K (8 PB); Lustre clients that are also CXFS clients serve as DMF edge servers in front of 40 PB of DMF tape storage on another SFA-10K.]

Lustre HSM use case
1) File my_file lives on Lustre: a metadata object for /lustre/my_file plus 3 TB of data objects.
2) Data archived on the backend: the backend HSM storage holds the 3 TB of my_file data; once released, /lustre/my_file keeps its metadata object but occupies 0 B on Lustre.
3) my_file restored on read: the data objects are brought back from the backend to Lustre.
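
A minimal command sequence for this cycle, assuming the HSM interface as it later shipped in released Lustre (the exact syntax was still in flux at the time of this talk) and a filesystem mounted at /lustre:

    # Copy the file's data to the backend (archive ID 1); the file stays on Lustre
    lfs hsm_archive --archive 1 /lustre/my_file

    # Release the Lustre data objects: metadata remains, disk usage drops to 0 B
    lfs hsm_release /lustre/my_file

    # Any read triggers an implicit restore; it can also be requested explicitly
    lfs hsm_restore /lustre/my_file

    # Inspect the HSM state flags (exists, archived, released, dirty, ...)
    lfs hsm_state /lustre/my_file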

Lustre HSM
Developed by CEA.
Features:
- Migrate data to external storage (the HSM)
- Free disk space when needed
- Bring back data on a cache miss
- Policy management (migration, purge, soft rm, ...)
- Import from an existing backend
- Disaster recovery (restore the Lustre filesystem from the backend)
New components:
- Coordinator
- Archiving tool (a backend-specific user-space daemon)
- Policy engine (a user-space daemon)

Lustre HSM - Architecture
- The coordinator gathers archiving requests and dispatches them to agents.
- An agent is a client that runs a copytool, which transfers data between Lustre and the HSM.
- The policy engine (robinhood-hsm) manages pre-migration and purge policies.
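
How these components are wired up in practice, sketched against the interface of later released Lustre (the filesystem name "lustre" and the mount paths are illustrative assumptions):

    # On the MDS: enable the HSM coordinator on the MDT
    lctl set_param mdt.lustre-MDT0000.hsm_control=enabled

    # On an agent node: run the POSIX copytool for archive ID 1,
    # with the backend mounted at /mnt/archive and Lustre at /mnt/lustre
    lhsmtool_posix --daemon --hsm-root /mnt/archive --archive 1 /mnt/lustre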

Lustre HSM - Patches
Lustre HSM is brought in by several patches developed by CEA:
- Adaptation and bugfix patches (over Lustre 2.1.1), landed or under inspection:
  - LU-810: fix helpers for extracting information from HSM changelog records
  - LU-787: ftruncate blocks when a grouplock is held
  - LU-1072: locking bug in the grouplock glimpse callback
- Feature patches:
  - Add HSM requests
  - Add HSM flags
  - HSM POSIX copytool
  - HSM coordinator
  - Add release feature
  - LU-827: implement a per-file data version
  - LU-941: manage the dirty flag for HSM-archived files
  - LU-169: new layout lock

Lustre HSM - New layout lock (LU-169)
- Adds a layout lock, a reference counter on the LSM, and a layout generation number.
- The patch is currently being reworked and split into 4 patches:
  - Layout generation
  - Basic infrastructure for the layout lock
  - LSM refcount
  - Core layout lock
- The layout lock opens the door to:
  - HSM support: releasing and recovering a released file
  - OST rebalancing: moving objects between OSTs
  - OST emptying
  - Restriping (dynamic layouts): let file layouts change as a file grows or access patterns change
  - Dynamic layouts for a subset of a file: restore part of a file to speed up access to critical data
  - Async mirroring: create multiple copies of a file within the same filesystem namespace

Lustre HSM - Testing
Currently being tested at:
- CINES / PRACE WP9
- SGI
- CEA
- Bull / HPC R&D labs
- Bull / IFERC

Lustre HSM - Bull tests
At Bull's R&D HPC labs, phase 1:
- Functional tests:
  - Several OSTs: useful for testing the new layout lock patch
  - Backend over a local disk (ext4)
  - With robinhood-hsm (the policy engine)
- Helping CEA developers with debugging:
  - Some bugs in the restore functionality and the layout lock
  - Minor bugs in archiving
  - Minor bugs in the archiving tool
  - Some workarounds in place, but the system is fully operational now

Lustre HSM - Bull tests
At Helios (Japan), phase 2:
- Functional and robustness tests under high I/O load:
  - 4 OSS, 60 OSTs
  - High I/O load generated by clients
  - 2 copy agents, backend over a storage array
  - 1 robinhood node, MySQL database over a storage array
- Debugging:
  - Changelog bugs
  - Statahead issues with Lustre 2.1
  - Copytool load balancing
  - No major bugs found, but the changelog feature needs intensive testing
- Robinhood-hsm tests:
  - Load tests: 3M files
  - Robinhood error recovery
  - Robinhood policies (see the sketch after this list)
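
A sketch of what such policies look like, in the spirit of a Robinhood 2.x configuration file; keyword spellings vary between Robinhood versions, and the thresholds and ages here are invented for illustration:

    # Pre-migration policy: archive files not modified for 12 hours
    migration_policies {
        policy default {
            condition { last_mod > 12h }
        }
    }

    # Purge trigger: when an OST fills past 85%, release archived
    # files until usage drops back below 80%
    purge_trigger {
        trigger_on         = OST_usage;
        high_threshold_pct = 85%;
        low_threshold_pct  = 80%;
    }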

Lustre HSM - Bull tests
At Bull's R&D HPC labs, phase 3:
- Functional, robustness, transition, and HA tests:
  - 16 standard clients
  - Multiple copy agents configured over a storage array
  - Robinhood HA configuration
- Validation tests of the Lustre HSM Jira tickets
- Changelog tests
- Transition tests:
  1) We have a backed-up Lustre filesystem.
  2) We want to enable Lustre HSM on this already-running Lustre filesystem.
  3) We do not want to recopy the data that is already backed up.

Lustre HSM - Status & Roadmap
- A Lustre HSM-compatible client may be supported in Lustre 2.3.
- A full Lustre HSM client (agent and robinhood support) is more likely in Lustre 2.4.
- The Lustre HSM server is targeted to be supported in Lustre 2.4.

Lustre HSM - Status & Roadmap detail
- Lustre HSM-compatible client, may be supported in Lustre 2.3:
  - LU-827: implement a per-file data version
  - LU-941: manage the dirty flag for HSM-archived files
  - LU-169: new layout lock
- Full Lustre HSM client (agent and robinhood support), more likely in Lustre 2.4:
  - Add HSM flags
  - Add HSM requests
  - HSM POSIX copytool
- Lustre HSM server, targeted for Lustre 2.4:
  - LU-1333: add release feature
  - HSM coordinator
Patch status ranges from landed in master, through ongoing work whose landing is not confirmed, to patches still to be submitted.

Temporary HSM alternative: Lustre backup
- Need to back up Lustre files regularly
- Standard Lustre 2.1.1
- No need to release files in the mid-term (high storage capacity on the Lustre filesystem)
- The solution: robinhood-backup, developed by CEA

Lustre backup: Robinhood-backup
- Thanks to the Lustre changelog, modifications are detected automatically: no need to scan the Lustre filesystem regularly (see the sketch after this list).
- The policy engine (robinhood-backup) copies files automatically: migration policies are defined.
- Soft remove of Lustre files: files removed from Lustre are not removed from the backend (removal is delayed instead).
- Soft transition to Lustre HSM: files are already migrated to the HSM device.
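
The changelog plumbing robinhood relies on can be exercised by hand; a minimal sketch, assuming a filesystem named "lustre" and the stock lctl/lfs tools (the consumer id cl1 is whatever changelog_register prints):

    # On the MDS: register a changelog consumer on the MDT; prints an id such as cl1
    lctl --device lustre-MDT0000 changelog_register

    # On a client: read pending changelog records (CREAT, UNLNK, RENME, ...)
    lfs changelog lustre-MDT0000

    # Acknowledge processed records so the MDT can reclaim them
    lfs changelog_clear lustre-MDT0000 cl1 0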

Lustre backup - Robinhood-backup architecture
Robinhood-backup:
- Sees both the Lustre and the DMF contents
- Is registered as a changelog reader
- Manages a MySQL database with the state of every file
- Uses wrappers on the copy nodes (remote cp and rm commands)

Lustre backup - Copy agents architecture
Copy agents:
- Have access to both SGI's DMF (via CXFS) and Lustre
- Two or more copy agents (CXFS clients)
- Robinhood wrapper tool: cp and rm commands sent by robinhood

Lustre backup - Architecture
[Diagram: the robinhood node (a Lustre client holding the MySQL database) sends cp/rm requests to copy nodes that are both Lustre and CXFS clients; the cluster reaches the OSTs and MDTs over 20 GB/s InfiniBand, backed by 3 x SFA-10K (8 PB), while the DMF edge servers front 40 PB of DMF tape storage on another SFA-10K.]

Upgrading to Lustre HSM
What will we have in the future?
- The need to release some files: growing from 8 PB to more than 40 PB
- The whole Lustre filesystem already backed up on DMF
- Upgrade time limited, because the system is in production
Goal: a soft upgrade from a backed-up Lustre filesystem to a Lustre HSM filesystem.
The transition is based on updating lustre_mdt_attrs for every archived Lustre file (a kind of new LINK feature):
- Update the flags (OK), archive_number (OK) and archived_sum (to be developed)
- A user command to do this (to be developed). Example:
  lhsmtool_posix link <lustre_file> <backend_migrated_file>
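
For comparison only: released Lustre later exposed a similar capability through lfs hsm_set, which marks a file as already archived without copying any data. This is a hedged sketch of that later interface, not the command proposed above:

    # Flag a file as present and archived in archive 1, reusing the
    # pre-existing backend copy instead of re-archiving the data
    lfs hsm_set --exists --archived --archive-id 1 /lustre/my_file

    # Verify the resulting flags
    lfs hsm_state /lustre/my_file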

Conclusion
- Lustre HSM is really on the way: landed code plus planned landings
- Lustre HSM tests are already running at several sites
- Exascale is coming: a hierarchical I/O solution with Lustre on top
- Community development model: EOFS and OpenSFS deeply involved
Want to see Lustre-HSM in action? See a proof of concept during one of the LUG breaks.
