
Lustre/HSM Binding
Aurélien Degrémont, aurelien.degremont@cea.fr
LUG 2011, 12-14 April 2011

Agenda
- Project history
- Presentation
- Architecture
- Components
- Performance
- Code integration

Project History
- 2007, CFS times: never-ending architecture discussions
- 2008-2009, Sun era: design and learning Lustre internals
- 2010, Oracle times: coding, hard landing
- 2011, nowadays: debugging, testing, improving

Presentation (1/2)
Seamless HSM integration: take the best of each world.
- Lustre: a high-performance disk cache in front of the HSM
  - Parallel cluster filesystem
  - High I/O performance, POSIX access
- HSM: long-term data storage
  - Manages large numbers of disks and tapes
  - Huge storage capacity

Presentation (2/2)
Features:
- Migrate data to the HSM
- Free disk space when needed
- Bring back data on cache-miss
- Policy management (migration, purge, soft rm, ...)
- Import from an existing backend
- Disaster recovery (restore the Lustre filesystem from the backend)
New components:
- Coordinator
- Archiving tool (backend-specific user-space daemon)
- Policy Engine (user-space daemon)

Architecture (1/2)
[Diagram: MDS (with Coordinator) and OSSes on the Lustre side; agents running archiving/copy tools speak HSM protocols to the HSM side]
New components: Coordinator, Agent and copytool.
- The coordinator gathers archiving requests and dispatches them to agents.
- An agent is a Lustre client which runs a copytool, which transfers data between Lustre and the HSM.

Architecture (2/2)
[Diagram: PolicyEngine alongside the MDS (Coordinator) and OSSes]
The PolicyEngine manages archive and release policies. It is a user-space tool which communicates with the MDT and the coordinator. It:
- Watches filesystem changes.
- Triggers actions like archive, release and removal in the backend.

Component: Copytool
The copytool is the interface between Lustre and the HSM: it reads and writes data between them. It is HSM-specific and runs on a standard Lustre client (called an agent). Two of them are already available:
- HPSS copytool (HPSS 7.3+): a CEA development which will be freely available to all HPSS sites.
- POSIX copytool: can be used with any system supporting a POSIX interface, like SAM/QFS.
More supported HSMs to come:
- DMF
- Enstore
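To make the agent side concrete, this is roughly how the POSIX copytool is launched on an agent node. The command name and option names are assumptions taken from the interface as it later shipped with Lustre 2.5+, not necessarily the prototype shown in this talk:

    # Run the POSIX copytool as a daemon on an agent (a regular Lustre client).
    # --hsm_root is the backend directory tree; --archive selects archive #1.
    # (Names per the Lustre 2.5+ tool; the 2011 prototype may differ.)
    lhsmtool_posix --daemon --archive=1 --hsm_root=/var/hsm /mnt/lustre \
        >/var/log/copytool.log 2>&1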

Component: PolicyEngine
Robinhood:
- PolicyEngine is the specification; Robinhood is an implementation.
- Originally a user-space daemon for monitoring and purging large filesystems.
- CEA open-source development: http://robinhood.sf.net
Policies:
- File class definitions, associated to policies
- Based on file attributes (path, size, owner, age, xattrs, ...)
- Rules can be combined with boolean operators
- LRU-based migration/purge policies
- Entries can be white-listed
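As an illustration of day-to-day operation (the binary and option names are assumptions based on later Robinhood 2.x releases; the HSM-flavored build may be installed under a different name such as rbh-lhsm):

    # Feed the Robinhood database from the Lustre MDT changelogs (daemon mode),
    # then apply the configured migration and purge policies.
    robinhood --readlog --detach
    robinhood --migrate   # archive entries matching the migration policies
    robinhood --purge     # release entries until low watermarks are reached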

Component: Coordinator
An MDS thread which "coordinates" HSM-related actions:
- Centralizes HSM-related requests.
- Ignores duplicate requests.
- Controls migration flow.
- Dispatches requests to copytools.
- Requests are saved and replayed if the MDT crashes.
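As a sketch of how an administrator interacts with the coordinator, the tunables below are the ones that later shipped with the feature (Lustre 2.5+ parameter names, given here as an assumption about the final interface):

    # Start/stop the coordinator thread on the MDT
    lctl set_param mdt.lustre-MDT0000.hsm_control=enabled
    # Control migration flow: cap concurrent HSM requests sent to copytools
    lctl set_param mdt.lustre-MDT0000.hsm.max_requests=10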

File states
View file states: lfs hsm_state <FILE>

    $ lfs hsm_state /mnt/lustre/foo
    /mnt/lustre/foo states: (0x00000009) exists archived

New file states:
- Archived: a copy exists in the HSM.
- Dirty: the file in Lustre was modified since it was archived.
- Released: no more data in Lustre; same size, but no blocks.

    # stat /mnt/lustre/foo
      File: `/mnt/lustre/foo'
      Size: 409600    Blocks: 0    IO Block: 2097152
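Tying the states together, a full lifecycle can be driven by hand with the lfs subcommands that later shipped with the feature (Lustre 2.5+); the exact output shown is illustrative:

    $ lfs hsm_archive /mnt/lustre/foo   # copy to the HSM; state: exists archived
    $ lfs hsm_release /mnt/lustre/foo   # free the disk space; state: released
    $ lfs hsm_state /mnt/lustre/foo
    /mnt/lustre/foo states: (0x0000000d) released exists archived
    $ cat /mnt/lustre/foo > /dev/null   # cache-miss triggers an implicit restore
    $ lfs hsm_restore /mnt/lustre/foo   # or bring the data back explicitly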

Early performance
Standard use:
- Based on Lustre MDT changelogs: low impact on the MDT.
- Introduces the layout lock: very low RPC overhead.
Simple testbed:
- Platform: 2 Dell R710 (4x Xeon X5650 @ 2.6 GHz, 40 GB), 3 VMs (12 CPU, 8 GB)
- Filesystem: 1 server (1 MGS/MDS, 2 OSS), a few clients
- HSM backend: local /tmp filesystem

Early performance: Release
Synchronous data purge.
[Chart: "File release rate", files/sec (y-axis, 0-140) vs. # files (x-axis, 0-2000)]

Early performance: Archiving
Copy files from a Lustre client to a local ext3 filesystem: more than 1 million archives per hour.
[Chart: "File archiving rate", files/sec (y-axis, 0-450) vs. # files (x-axis, 0-2000)]

Learn Lustre
The Lustre/HSM project uses many different parts of the Lustre code; to develop it, we needed to learn each of them. HSM patches and the subsystems they touch:
- Infrastructure: RPC
- Layout lock and grouplock: LDLM, CLIO
- Release: LDLM, CLIO, OST object management, MDT layering
- Coordinator: llog

Code Integration
Oracle 2.1 branch (R.I.P.):
- The first few patches landed in the Oracle 2.1 branch.
- All landing effort was stopped due to Oracle's Lustre policy.
Lustre 2.1 (Whamcloud/community release):
- The target is to release a stable version quickly.
- Do not risk delaying it because of the HSM patches.
So when?
- Code landing will start as soon as the 2.2 branch is available.

Thank you. Questions?

Robinhood: example of migration policy

File classes:

    Filesets {
        FileClass small_files {
            definition { tree == "/mnt/lustre/project" and size < 1MB }
            migration_hints = "cos=12";
        }
    }

Policy definitions:

    Migration_Policies {
        ignore { size == 0 or xattr.user.no_copy == 1 }
        ignore { tree == "/mnt/lustre/logs" and name == "*.log" }

        policy migr_small {
            target_fileclass = small_files;
            condition { last_mod > 6h or last_copyout > 1d }
        }

        policy default {
            condition { last_mod > 12h }
            migration_hints = "cos=42";
        }
    }

Robinhood: example of purge policy

Triggers:

    Purge_trigger {
        trigger_on         = ost_usage;
        high_watermark_pct = 80%;
        low_watermark_pct  = 70%;
    }

Policy definitions:

    Purge_Policies {
        ignore { size < 1KB }
        ignore { xattr.user.no_release == 1 or owner == root }

        policy purge_quickly {
            target_fileclass = classx;
            condition { last_access > 1min }
        }

        policy default {
            condition { last_access > 1h }
        }
    }