RobinHood Project Status


FROM RESEARCH TO INDUSTRY
RobinHood Project Status
Robinhood User Group 2015
Thomas Leibovici <thomas.leibovici@cea.fr>
September 21st, 2015

Project history...
1999: simple purge tool for HPC filesystems
- 1 scan thread, 1 purge thread, entry list in memory
2005: massively multi-threaded (robinhood v1)
- Multi-threaded scan algorithm
- Purge queue with multiple worker threads
- Basic policies (whitelist/blacklist of a user, group, directory...)
- Aware of Lustre OSTs
- List in memory
Feb. 2009: Robinhood made open source (v1.0.6)
- CeCILL-C license (LGPL compatible)
2009: development of Robinhood v2
- Based on a SQL database: provides memory cache management, persistence, robustness, a convenient query language,...
- Lustre v2 ready (FIDs, changelogs...)
- Supports complex expressions on entry attributes, e.g. (path == /lustre/foo/bar* or owner == foo) and size > 1GB (see the configuration sketch below)
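To make the last point concrete, here is a minimal sketch of how such an expression can appear in a Robinhood 2.x-style configuration file, as a fileclass definition. The block layout follows the 2.x configuration syntax as commonly documented; the class name and exact values are illustrative, not taken from the slides.

    # Illustrative fileclass built from the example expression above
    # (approximate Robinhood 2.x configuration syntax)
    Filesets {
        FileClass big_foo_files {
            definition {
                (path == "/lustre/foo/bar*" or owner == "foo")
                and size > 1GB
            }
        }
    }

Such a fileclass can then be referenced as a target in purge or migration policies.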

...until today
2009-2015: a growing community
- Continuous improvements and new features, support of the latest Lustre versions and features, responsive support, vendor integration and contributions,...
[Charts: Robinhood downloads on SourceForge per year (2009-2014) and downloads per minor release (v2.0/v2.1 through v2.5)]
+ other channels (vendor distributions...)
September 2015: first Robinhood User Group - Thanks for attending!

Robinhood Modes and Supported Filesystems

Robinhood flavors (v2)
Scratch filesystem management (tmpfs)
- Monitoring/accounting, purge (=unlink), rmdir (=rmdir or rm -rf)
- Package: robinhood-tmpfs
- Supported: all POSIX filesystems, all Lustre versions from 1.8 to 2.7
Backup
- Monitoring/accounting, archiving, undelete
- Package: robinhood-backup
- Supported: all Lustre versions from 2.0 to 2.7
Lustre/HSM (lhsm)
- Monitoring/accounting, migration (=archive), purge (=release), HSM rm
- Package: robinhood-lhsm
- Supported: all Lustre versions from 2.5 to 2.7

[spoiler] Robinhood v3
A single instance for all purposes
- Available for POSIX filesystems and all Lustre versions from 1.8 to 2.7 and later
- Plugins for specific feature support: backup plugin available for Lustre >= 2.0, lhsm plugin available for Lustre >= 2.5
[teaser] Other plugins... or your own!
- A single robinhood instance can handle multiple plugins, e.g. manage Lustre/HSM policies + delete old files + ...
(Cf. next talk about robinhood v3)

Validation & Release Process

Gerrit
Code review: GerritHub - https://review.gerrithub.io
Project: cea-hpc/robinhood
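For contributors, submitting a patch for review follows the standard Gerrit push workflow; the sketch below uses GerritHub's usual SSH endpoint and the project name from this slide, but the remote name and target branch are illustrative assumptions.

    # Minimal sketch: push a change for review on GerritHub
    # (standard Gerrit workflow; remote name and branch are assumptions)
    git remote add gerrit ssh://<username>@review.gerrithub.io:29418/cea-hpc/robinhood
    git push gerrit HEAD:refs/for/master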

Test Suite & Test Platform
Test suite
- Bash scripts + related config files
- 171 tests for Lustre filesystems, 89 tests for POSIX filesystems
- Each test includes multiple cases and operations (up to 1 min per test)
Test platform
- Jenkins on a private cluster at CEA
- Tests all Lustre major versions from 1.8 to 2.7+ (master)
- Tests on POSIX filesystems with various OSes
- Tests all modes (tmpfs, backup, lhsm)
Test duration
- One run for 1 mode on a Lustre config: 30 min to 1 h
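A minimal sketch of how such a mode × Lustre-version matrix could be driven, for illustration only: the runner script name, its options and the version list are hypothetical, not part of the actual test suite.

    #!/bin/bash
    # Hypothetical driver for a robinhood test matrix:
    # iterate over policy modes and Lustre target versions.
    # run_tests.sh and its options are illustrative names only.
    set -e
    for mode in tmpfs backup lhsm; do
        for lustre in 1.8 2.1 2.4 2.5 2.7; do
            # skip unsupported combinations (backup needs Lustre >= 2.0, lhsm >= 2.5)
            case "$mode:$lustre" in
                backup:1.8|lhsm:1.8|lhsm:2.1|lhsm:2.4) continue ;;
            esac
            echo "=== mode=$mode, Lustre $lustre ==="
            ./run_tests.sh --mode "$mode" --lustre "$lustre"
        done
    done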

Pre-production Platform
Validating a release candidate
- The release candidate is installed on a mid-range pre-production system at CEA
1 tmpfs instance to manage a scratch Lustre filesystem
- ClusterStor 9000: 339 TB, 420M inodes
- Lustre 2.5 servers
- Both Lustre 2.5 and 2.7 clients (300 clients)
1 lhsm instance to manage a Lustre/HPSS data migration
- SFA12K-E (4 OSS) + 1 MDS: 684 TB, 125M inodes
- Lustre 2.5 servers
- Both Lustre 2.5 and 2.7 clients (300 clients)
- 3 HSM copy agents (Lustre 2.5 and 2.7)
Last but not least: real bad users!!! High metadata load.

Last Step Before Release
Last step before GA: production!
- Installed on the TERA100 global Lustre file system
- 11 PB filesystem, 200 GB/s throughput, 4500 clients, Lustre 2.5
- Manages Lustre to HPSS data migration
- Archives 100 TB per day
If everything is OK, the version is officially released.

Versioning policy
Versioning scheme: X.Y.Z
Z: minor versions
- Minor features, bug fixes
- Upgrade is straightforward: no DB schema change (unless minor & compatible), config file compatibility
Y: major versions
- Major features, DB changes
- DB schema may change (may require a rescan)
- Config file compatibility as much as possible
X: in-depth changes
- Architectural changes
- Compatibility: best effort (command line may change); conversion helpers can be provided

Git branches
master: maintenance branch
- master = last minor release + minor fixes
- master = next minor release
- Currently, master is v2.5.x (v2.5.5+)
- You can safely use 'master', it is stable
Branches: old versions and next major release
- b_2.4: last 2.4.x version
- b_3.0: development branch of v3.0
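As a concrete example, getting the stable maintenance branch or the v3.0 development branch could look like the following; the GitHub clone URL is inferred from the cea-hpc/robinhood project name and is an assumption, not taken from the slides.

    # Clone the repository and pick a branch
    # (clone URL inferred from the project name; verify before use)
    git clone https://github.com/cea-hpc/robinhood.git
    cd robinhood
    git checkout master    # stable: current v2.5.x maintenance branch
    git checkout b_3.0     # development branch of v3.0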

Release status

Recent Robinhood versions
Previous release: 2.5.4 (Dec. 2014)
- Update stripe info on layout change (Lustre 2.4+)
- Eviction-resilient scanning
- Improved DB request batching
- Configurable DB engine
Last release: 2.5.5 (Jun. 2015)
- Lustre 2.7 support
- Unleashed DB performance when accounting is off
- Policies: set default parameters for performance

Next versions
Next 2.5.x versions
- Already scheduled for 2.5.6: improved management of archive_id for Lustre/HSM
- Minor patches will still be integrated into branch 2.5 until v3 is widely adopted
- Possible further v2.5.x releases
Development version: v3.0
- In development since mid-2014
- Main contributors: CEA and Cray
- Current status: stabilization, documentation,...
- More details in the next presentations...

Summary
We do our best to deliver robust, efficient and full-featured software.
The community keeps growing (users and contributors).
Your feedback is important.
Thanks for attending the Robinhood User Group!

Thanks for your attention! Questions?
Commissariat à l'énergie atomique et aux énergies alternatives
CEA / DAM Ile-de-France | Bruyères-le-Châtel | 91297 Arpajon Cedex
T. +33 (0)1 69 26 40 00
DAM Île-de-France
Etablissement public à caractère industriel et commercial | RCS Paris B 775 685 019