Using file systems at HC3

Using file systems at HC3
Roland Laifer
Steinbuch Centre for Computing (SCC)
KIT - University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association
www.kit.edu

Basic Lustre concepts

[Diagram: clients contact the Metadata Server for directory operations, file open/close, metadata and concurrency, and the Object Storage Servers for file I/O and file locking; the Metadata Server and the Object Storage Servers handle recovery, file status and file creation among themselves.]

Lustre components:
- Clients (C) offer a standard file system API
- Metadata Servers (MDS) hold metadata, e.g. directory data
- Object Storage Servers (OSS) hold file contents and store them on Object Storage Targets (OSTs)
- All components communicate efficiently over interconnects, e.g. with RDMA
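These components are visible from every client: lfs df lists the MDTs and OSTs of a mounted Lustre file system together with their current usage ($WORK is used here only as an example of a Lustre mount point):

    lfs df -h $WORK    # prints one line per MDT and per OST with size, used and available space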

Lustre file systems at HC3

[Diagram: the IC1 cluster (213 clients, InfiniBand DDR) and the HC3 cluster (342 clients, InfiniBand QDR) reach the MDS and OSS pairs of the three file systems over InfiniBand; each OSS serves several OSTs.]

File system        $HOME              $PFSWORK           $WORK
Capacity (TiB)     76                 301                203
Storage hardware   transtec provigo   transtec provigo   DDN S2A9900
# of OSTs          12                 48                 28
# of OST disks     192                768                290

File system properties

Property             $TMP                               $HOME                  $WORK          $PFSWORK
Visibility           local node                         HC3, IC1               HC3            HC3, IC1
Lifetime             batch job                          permanent              > 7 days       > 7 days
Capacity             129 / 673 / 825 GB                 76 TB                  203 TB         301 TB
                     (thin/med. / fat / login)
Quotas               no                                 not enforced           not enforced   not enforced
Backup               no                                 yes                    no             no
Read perf. / node    70 / 250 MB/s (thin/medium, fat)   600 MB/s (IC1 nodes)   1800 MB/s      600 MB/s (IC1 nodes)
Write perf. / node   80 / 390 MB/s (thin/medium, fat)   700 MB/s (IC1 nodes)   1800 MB/s      800 MB/s (IC1 nodes)
Total read perf.     n*70 / n*250 MB/s                  1700 MB/s              4800 MB/s      6200 MB/s
Total write perf.    n*80 / n*390 MB/s                  1500 MB/s              4800 MB/s      5500 MB/s

Which file system to use?

Follow these recommendations:
- Whenever possible use $TMP for scratch data
- Use $WORK for scratch data and job restart files
- Use $PFSWORK to share scratch data between clusters
- Use $HOME for long living data and when backup is required
- Archive huge and unused data sets
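A minimal sketch of how a batch job might follow these recommendations (the directory layout and file names are assumptions, not HC3 specifics):

    cd $TMP                                  # node-local scratch: fastest place for temporary files
    cp $WORK/myjob/restart.dat .             # restart file kept on the parallel file system
    ./my_simulation                          # writes temporary and result files into $TMP
    cp result.dat restart.dat $WORK/myjob/   # copy back only what must outlive the job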

How does striping work?

[Diagram: each file is split into stripes of the stripe size (default 1 MB) that are distributed over OSTs via the InfiniBand interconnect; a file on $HOME uses stripe count 1 (default), a file on $PFSWORK stripe count 2 (default), a file on $WORK stripe count 4 (default).]
- Striping provides parallel data paths from clients to storage
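The default striping that a directory applies to new files can be inspected directly; lfs getstripe with -d prints only the directory default instead of recursing:

    lfs getstripe -d $WORK    # shows the default stripe count and stripe size used for new files
    lfs getstripe -d $HOME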

Using striping parameters

Important hints for using striping:
- Striping parameters are inherited from the parent directory or the file system default
- To change the parameters of an existing file, copy it and move the copy back to the old name
- Each new file is automatically created on a new set of OSTs
  - No need to adapt striping parameters if many files are used in a similar way!

Adapt striping parameters to increase performance:
- OST performance is usually limited by the storage subsystem
  - At HC3 it is pretty similar for all 3 file systems: ~200 MB/s
- Increasing the stripe count might improve performance with few large files
- Adapt the stripe size to match the application I/O pattern
  - Only makes sense in rare cases

Show and adapt striping parameters:
- Show parameters: lfs getstripe <file|directory>
- Change the stripe count: lfs setstripe -c <count> <file|directory>
- Change the stripe size: lfs setstripe -s <size> <file|directory>
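As a sketch (directory and file names are made up), striping is set on a directory or an empty file before data is written, since the layout of a file that already contains data cannot be changed in place:

    lfs setstripe -c 8 $WORK/bigdir           # new files below bigdir are striped over 8 OSTs
    lfs setstripe -c 4 -s 4M $WORK/out.dat    # creates an empty out.dat with 4 stripes of 4 MB
    lfs getstripe $WORK/out.dat               # verify stripe count, stripe size and the OSTs used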

Write performance with default stripe count

[Chart: aggregate write throughput in MB/s (up to 7000) over 1 to 8 nodes for $HOME, $PFSWORK and $WORK, each compared to its maximum.]

Read performance with default stripe count

[Chart: aggregate read throughput in MB/s (up to 7000) over 1 to 8 nodes for $HOME, $PFSWORK and $WORK, each compared to its maximum.]

Metadata performance

[Chart: create, stat and delete operations per second (up to 45000) over 1 to 6 nodes, compared to the maximum.]

Best practices (1)

- Change nothing if you use few (< 100) small (< 10 MB) files

Increasing throughput performance:
- Use a moderate stripe count (4 or 8) if only one task is doing I/O
  - Improves the single file bandwidth per client
- To exploit the complete file system bandwidth use several clients
  - Different files from different clients are automatically distributed
  - For one shared file use chunks with boundaries at the stripe size
  - If many tasks use few huge files, set the stripe count to -1
- Use stripe count 1 if lots of files are used in the same way
- Collect large chunks of data and write them sequentially at once
- Avoid competitive file access
  - e.g. writing to the same chunks or appending from different clients
  - e.g. competitive read/write access
- Use $TMP whenever possible
  - If the data is used by one client and is small enough for the local hard drives
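A small sketch of the shared-file case (paths and the TASK_ID variable are hypothetical): stripe the directory over all OSTs, then let each task write whole stripe-sized chunks at stripe-aligned offsets, illustrated here with dd:

    lfs setstripe -c -1 $WORK/shared                 # new files in shared/ are striped over all OSTs
    dd if=/dev/zero of=$WORK/shared/data bs=1M \
       count=64 seek=$((TASK_ID*64)) conv=notrunc    # each task writes its own 64 MB stripe-aligned region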

Best practices (2)

Increasing metadata performance:
- Avoid creating many small files
  - For parallel file systems metadata performance is limited
- Avoid searching in huge directory trees
- Avoid competitive directory access
  - e.g. by creating the files of each task in separate subdirectories
- If lots of files are created, use stripe count 1
- Use $TMP whenever possible
  - If the data is used by one client and is small enough for the local hard drives
  - This also helps to reduce compilation times
- Change the default colorization setting of the ls command
  - alias ls='ls --color=tty'
  - Otherwise the ls command needs to contact the OSSs in addition to the MDS
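A short sketch of the per-task subdirectory pattern (job size and paths are made up), together with a plain ls that bypasses the colorizing alias so that only the MDS is contacted:

    for task in $(seq 0 63); do mkdir -p $WORK/myjob/task_$task; done   # one directory per task
    \ls $WORK/myjob    # the backslash skips the alias, so no OSS contact is needed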

Understanding application I/O behaviour

Application properties with impact on I/O:
- Number and size of files
- Access pattern, i.e. random or sequential
- Buffer size of the file operations
- Type and number of operations (read, write, create, delete, stat)
- Number of clients executing the operations

External factors with impact on application I/O:
- Overall usage of the file system components
  - Storage, OSS, MDS, networks, clients
- Administrators can tell the current status

For help to analyse and improve I/O performance contact roland.laifer@kit.edu
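One easy way to obtain the operation counts and buffer sizes listed above is an strace summary of the I/O related system calls (the program name is a placeholder):

    strace -f -c -e trace=open,read,write,close,stat,unlink ./my_app
    # -c prints a summary table with call counts and time per system call, -f follows child processes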