Project Quota for Lustre


1 Project Quota for Lustre
Li Xi, Shuichi Ihara
DataDirect Networks Japan

2 What is Project Quota?
Project: an aggregation of otherwise unrelated inodes that might be scattered across different directories.
Project quota: a new quota type that supplements the existing user/group quota types (see the sketch below).
File systems supporting project quotas:
- XFS
- GPFS (as fileset quota)
- New: ext4
- Coming soon: Lustre!
[Diagram: the same usage is accounted simultaneously under user quota, group quota, and project quota]
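To make "a new quota type" concrete: on Linux, project quota is the third quota type next to USRQUOTA and GRPQUOTA, queried through the same quotactl(2) interface. A minimal sketch (not from the slides); the device path and project ID are placeholders, and PRJQUOTA is defined defensively in case the libc headers predate it:

```c
/* Sketch: read project-quota usage and limits with quotactl(2). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/quota.h>

#ifndef PRJQUOTA
#define PRJQUOTA 2  /* third quota type, after USRQUOTA (0) and GRPQUOTA (1) */
#endif

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/sda1"; /* placeholder device */
    int projid = argc > 2 ? atoi(argv[2]) : 1000;       /* placeholder project ID */
    struct dqblk dq;

    if (quotactl(QCMD(Q_GETQUOTA, PRJQUOTA), dev, projid, (caddr_t)&dq) < 0) {
        perror("quotactl");
        return 1;
    }
    printf("project %d: %llu bytes used (hard limit %llu blocks), %llu inodes used\n",
           projid,
           (unsigned long long)dq.dqb_curspace,
           (unsigned long long)dq.dqb_bhardlimit,
           (unsigned long long)dq.dqb_curinodes);
    return 0;
}
```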

3 Project Quota of ext4
All project quota patches for ext4 have been merged into the mainline kernel (since linux-4.5):
- 8b4953e ext4: reserve code points for the project quota feature
- 9b7365f ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
- 689c958 ext4: add project quota support
- 040cb37 ext4: adds project ID support

From 8b4953e:

    --- a/fs/ext4/ext4.h
    +++ b/fs/ext4/ext4.h
    @@ -374,6 +374,7 @@ struct flex_groups {
     #define EXT4_EA_INODE_FL    0x00200000 /* Inode used for large EA */
     #define EXT4_EOFBLOCKS_FL   0x00400000 /* Blocks allocated beyond EOF */
     #define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
    +#define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
     #define EXT4_RESERVED_FL    0x80000000 /* reserved for ext4 lib */
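The FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR interface added by 9b7365f is how userspace assigns a project ID and the inherit flag; it is the same ioctl pair XFS uses. A minimal sketch, assuming a linux-4.5+ kernel; the path and project ID are placeholders:

```c
/* Sketch: tag a directory with a project ID and set the inherit flag
 * via FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FS*XATTR, FS_XFLAG_PROJINHERIT */

int main(void)
{
    const char *path = "/mnt/ext4/project_dir";   /* placeholder path */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct fsxattr fsx;
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) { perror("FSGETXATTR"); return 1; }

    fsx.fsx_projid = 1000;                   /* placeholder project ID */
    fsx.fsx_xflags |= FS_XFLAG_PROJINHERIT;  /* new sub-files inherit the project ID */
    if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0) { perror("FSSETXATTR"); return 1; }

    close(fd);
    return 0;
}
```

With a recent e2fsprogs the same effect is available from the shell via chattr (chattr -p <projid> +P <dir>).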

4 Performance: File creation on ext4
Test setup: kernel 3.16.0-rc5; server Dell R620 (2 x E5-2667 3.3 GHz, 256 GB memory); storage 10 x 15K SAS disks (RAID10); test tool mdtest-1.9.3, creating 800K files in total.
[Chart: file creations per second for 1/2/4/8/16 threads, comparing project-ID storage as an xattr vs. an internal inode field, each with quota off, user/group quota, and user/group/project quota enabled]

5 Performance results: File removal on ext4
Test setup: kernel 3.16.0-rc5; server Dell R620 (2 x E5-2667 3.3 GHz, 256 GB memory); storage 10 x 15K SAS disks (RAID10); test tool mdtest-1.9.3, removing the 800K files created.
[Chart: file removals per second for 1/2/4/8/16 threads, same xattr vs. internal-field configurations as above]

6 Why Project Quota for Lustre?
- Per-user or per-group quotas are no longer sufficient for the use scenarios storage administrators encounter (e.g. project-based storage volume allocation).
- User/group configurations usually don't change frequently, whereas the projects within a massive file system change a lot.
- Quotas on small parts of a file system help administrators make capacity plans for the entire storage volume.
- Space accounting and limit enforcement based on OSTs/sub-directories/file-sets/projects enables various use cases.

7 Semantics of Project Quota
Project ID
- Inodes that belong to the same project carry an identical identifier, just like a user/group ID.
Inherit flag
- An inode flag that defines project-related behavior.
- For a directory with the inherit flag set:
  - All newly created sub-files inherit the project ID from the parent.
  - Renaming an inode with a different project ID into the directory is not allowed (EXDEV is returned; see the sketch below).
  - Hard-linking an inode with a different project ID into the directory is not allowed (EXDEV is returned).
- The total blocks/inodes available under such a directory are the minimum of the quota limits and the file system capacity.
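The EXDEV semantics deliberately mirror a cross-device rename, so a tool like mv(1) transparently falls back to copy+unlink. A minimal sketch of what an application sees when renaming across a project boundary; the paths and project IDs are placeholders:

```c
/* Sketch: rename() across a project-quota boundary fails with EXDEV,
 * exactly as if the two paths were on different file systems. */
#include <errno.h>
#include <stdio.h>

int main(void)
{
    const char *src = "/mnt/lustre/projA/file";  /* project ID 1000, say */
    const char *dst = "/mnt/lustre/projB/file";  /* inherit-flagged dir, project ID 2000 */

    if (rename(src, dst) == 0) {
        printf("renamed within the same project\n");
    } else if (errno == EXDEV) {
        /* Different project IDs: behave as mv(1) does across devices. */
        printf("cross-project rename rejected; fall back to copy+unlink\n");
    } else {
        perror("rename");
    }
    return 0;
}
```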

8 Architecture of Lustre Quota
Quota master
- A centralized server that holds the cluster-wide limits.
- Guarantees that global quota limits are not exceeded and tracks quota usage on the slaves.
- Stores the quota limits for each UID/GID/project ID.
- Accounts for how much quota space has been granted to slaves.
- A single quota master currently runs on MDT0.
Quota slaves
- All OSTs and MDTs are quota slaves.
- Manage local quota usage and hard limits.
- Acquire/release quota space from the master.

9 [Diagram: quota RPC flow. Clients send bulk write requests/replies through the LOV to the OSS; the OSS (QSD, quota slave) performs a local quota check and exchanges DQACQ requests/replies with the quota master (QMT) on the MDS]

10 Implementation Goals
- Integrate into the current quota framework
- Enforce both block and inode quotas
- Support hard and soft limits
- Support user/group/project accounting
- Full support for pools:
  - Dynamic change of pool definitions
  - Separate quotas for users/groups in each pool
- No significant performance impact

11 Design: Pool Definition in LLOG
[Diagram: pool-update flow]
1. The administrator modifies a pool on the MGS.
2. The MGS finds/creates the pool ID from the pool name.
3. The MGS writes the LLOG and distributes it to the MDS, OSS, and clients.
4. Each node updates its pool state: MDT(OSD)/MDT0(QMT)/LOD on the MDS, OFD(OSD) on the OSS, and LOV on the clients.

12 Design: Flow of a Write Request
[Flow chart across Client, OSS, and MDS]
1. Client: send an RPC to write to an object.
2. OSS: look up the quota entry through the pool ID and check whether the local quota is sufficient.
3. If not, the OSS acquires more quota from the MDS, which looks up the quota entry through the pool ID and grants space quota to the slave.
4. OSS: write the data to the object and send the reply to the client.
5. Client: write RPC completed.
A toy model of steps 2-3 is sketched below.
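This sketch models only the slave-side decision: consume the locally granted quota first and fall back to a DQACQ-style request to the master when it runs short. Every name here is invented for illustration; the real logic lives in Lustre's QSD/QMT code.

```c
/* Toy model of the quota slave's check-then-acquire decision. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct quota_entry {
    uint64_t granted;                        /* space granted to this slave */
    uint64_t used;                           /* space consumed locally */
};

/* Stub standing in for the DQACQ RPC to the master: grant the request
 * unless a fake cluster-wide hard limit would be exceeded. */
static uint64_t master_limit = 1ULL << 30;   /* 1 GiB, for illustration */
static uint64_t master_granted = 0;

static int64_t dqacq_request(uint64_t want)
{
    if (master_granted + want > master_limit)
        return -1;                           /* over limit -> EDQUOT */
    master_granted += want;
    return (int64_t)want;
}

static bool quota_check_and_acquire(struct quota_entry *qe, uint64_t write_bytes)
{
    if (qe->used + write_bytes <= qe->granted) {
        qe->used += write_bytes;             /* local grant is sufficient */
        return true;
    }
    /* Step 3: acquire the shortfall from the master before writing. */
    int64_t got = dqacq_request(qe->used + write_bytes - qe->granted);
    if (got < 0)
        return false;
    qe->granted += (uint64_t)got;
    qe->used += write_bytes;
    return true;
}

int main(void)
{
    struct quota_entry qe = { .granted = 0, .used = 0 };
    printf("write 512 MiB: %s\n",
           quota_check_and_acquire(&qe, 512u << 20) ? "ok" : "EDQUOT");
    printf("write 768 MiB: %s\n",
           quota_check_and_acquire(&qe, 768u << 20) ? "ok" : "EDQUOT");
    return 0;
}
```

The point of the granted/used split is that most writes are served from the local grant without any RPC to the master, which is why quota enforcement adds so little overhead to the write path.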

13 Use case #1: Usage accounting/restriction of sub-directories
- Pool/project IDs are inherited from the parent directory by default.
- lfs quota becomes a quicker replacement for the du command on big directories:
# lfs setstripe --pool dir1 /lustre/dir1
# lfs setstripe --pool dir2 /lustre/dir2
# lfs quota -p dir1
# lfs quota -p dir2

14 Use case #2: Usage accounting/restriction of file sets according to attributes
# lfs find /lustre | xargs lfs setstripe --pool fileset1
# find /lustre | xargs lfs setstripe --pool fileset2
# lfs quota -p fileset1 /lustre
# lfs quota -p fileset2 /lustre
- Files in the same file set can be scattered across different locations.
- The space/inode usage of file sets can be monitored in real time.
- Requirement: we need to be able to change the pool name of a file without changing its layout!

15 Use case #3: Total usage limitations for collaboration projects
[Diagram: a 144TB Lustre file system (MDT with 10M inodes; OST01-OST06, 24TB each) split into pool "Project-1" and pool "Project-2". The Project 1 directory has a 10TB limit and the Project 2 directory a 20TB limit on total file size; inode limits on the MDT cap the number of files per project]

16 Use case #4: Usage tracking of different OST types
[Diagram: users/groups mapped to three pools: a pool for archive on cheap OSTs, a pool for running jobs on fast OSTs, and a pool for normal usage on normal OSTs]

17 Development Status
- Lustre 2.9 will be the initial branch for project quota.
- All critical patches that change the disk format or protocol will be pushed to Lustre 2.9:
  - Backported the mainline ext4 project quota code to ldiskfs
  - Backported the mainline e2fsprogs patches against lustre-master
  - Pushed Lustre quota cleanup patches that add the new quota type
  - OST pool enhancement patches to support pool IDs
  - All patches are tracked in LU-4017 and are under active review
- Other Lustre quota patches are ready in the DDN Lustre branch and will be pushed to master after Lustre 2.9.
- More review, tests, and benchmarks are needed.

18 Conclusion
- We proposed an implementation of project quota in Lustre.
- Lustre project quota shares the same semantics as XFS and ext4.
- The combination of project quota with OST pools enables more use cases than project quota alone.
- Benchmark results show that project quota causes very limited performance impact.

19 Thank you!