STORAGE @ TGCC OVERVIEW
CEA, 10 April 2012

CONTEXT
Data-centric architecture
- Centralized storage, accessible from every TGCC compute machine
- Makes cross-platform data sharing possible
- Shared storage: optimal use of resources
High-performance systems
- Manage more than 10 PB of data
- Connected to the computers via Lustre routers (100 GB/s)
Hierarchical system
- HSM: Hierarchical Storage Manager
- LUSTRE filesystem for high-performance access
- Automated migration to tapes (HPSS)

SOFTWARE COMPONENTS

LUSTRE
Parallel filesystem
- Developed by the Intel Data Division (formerly Whamcloud), with support from international labs and organizations
- Open-source product; about half of the Top500 machines use it
- CEA takes part in the development
Components
- MGS (Management Server)
- MDS (Metadata Server)
- OSS (Object Storage Server)
- Routers
- Clients
GL-TGCC
- Two filesystems: work and store
- InfiniBand QDR interconnect
- 1x metadata cell: 2x MDS, 1x DDN SFA10K
- 10x I/O cells: 4x OSS, 1x DDN SFA10K
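From a client node, the standard Lustre lfs tool gives a quick view of this layout. A minimal sketch, assuming a regular Lustre client and the mount points described on the later slides:

    # Per-MDT and per-OST space usage for the work filesystem
    lfs df -h /ccc/work

    # List the OSTs serving the store filesystem
    lfs osts /ccc/store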

HPSS
High-performance HSM
- Developed by IBM
- Third-party product, sold as a service
Components
- Core server
- Disk movers (disks)
- Tape movers (tapes)
ST-TGCC
- 1x core server
- 2x disk mover cells: 4x servers, 1x DDN SFA10K
- 2x tape mover cells: 3x servers, 4x LTO5 tape drives

STORAGE INSIDE TGCC

ARCHITECTURE

FILESYSTEMS
work
- LUSTRE filesystem
- Dedicated to short-term data and to data sharing (core of the data-centric architecture)
- Quotas: 1 TB and 500k inodes per user
- Not tied to the HSM
store
- LUSTRE filesystem
- Dedicated to long-term data
- Recommended file size: 1 GB-100 GB, up to 1 TB
- Quotas: 100k inodes per user; inode quota only, no quota on volume
- Automated migration/staging of data to/from the HSM

LUSTRE

LUSTRE'S ARCHITECTURE: INTRODUCTION
Lustre: a parallel file system
- 1 Metadata Server (MDS) per filesystem
- Many Object Storage Servers (OSS)
- Can deal with thousands of clients
- Guarantees data and metadata coherency via the LDLM (Lustre Distributed Lock Manager)
- Runs in kernel space to be closer to the hardware
- Open source (GPL)

THE LUSTRE FILESYSTEMS SERVED BY GL-TGCC
work
- Core of the data-centric architecture
- Quotas: 1 TB and 500k inodes per user
- "Standalone" file system
- Mount point: /ccc/work ($CCCWORKDIR)
- Designed for throughput and performance
store
- Should be used to store final results
- Connected to an HSM (see later slides) for bigger capacity
- Recommended file size: 1 GB-100 GB
- Quotas: 100k inodes per user, no quota on volume
- Automated migration and staging with the HSM (see later slides)
- Mount point: /ccc/store ($CCCSTOREDIR)
- Designed for data capacity
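The standard lfs quota command is a quick way to check your usage against these limits. A minimal sketch, assuming a regular Lustre client:

    # Block and inode usage for the current user on work and store
    lfs quota -u $USER /ccc/work
    lfs quota -u $USER /ccc/store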

TGCC: STORAGE ARCHITECTURE

GL-TGCC ARCHITECTURE (DATA-CENTRIC)

GL-TGCC'S METADATA CELL
Pack of 2 MDS
- Lustre failover across the 2 MDS
- 1 MDS for work
- 1 MDS for store + the MGS
- Crossed active/passive failover
Storage backend
- Metadata stored in RAID-6 (8+2) with double parity

GL-TGCC LUSTRE I/O CELL
Pack of 4 OSS
- Lustre failover across the 4 OSS
Backend storage
- 28 VDs of 16 TB, extensible to 24 TB
- Total: 448 TB (at 16 TB) up to 672 TB (at 24 TB)
- 7 OSTs per OSS
Cell I/O performance
- 10 GB/s max (Lustre throughput)

LUSTRE'S ARCHITECTURE: NETWORK TOPOLOGY
GL-TGCC's InfiniBand network

HPSS

WHAT IS A HIERARCHICAL STORAGE MANAGER?
HPSS is a Hierarchical Storage Manager (HSM)
- Data "sediment" from disks to tapes via an age-based policy
- New data stay on disks
- Old data move to tape
- This is not a backup or an archive: there is no disk/tape replica
(Diagram: /ccc/store data sinking from the disk tier to the tape tier over time.)

HPSS: A PARALLEL HSM
Main components:
- Core server + DB2 database
- Disk movers + disk arrays
- Tape movers + tape drives
Several separate servers make it possible to scale bandwidth via parallel streams.
(Diagram: one core server coordinating four disk movers and two tape movers.)

HSM BINDING

DATA MIGRATION
The store filesystem (the base of the HSM) is permanently watched by a policy engine (Robinhood)
- Files eligible for migration are automatically copied into HPSS
- The filesystem is thus saved in the HSM: recovery is possible after a crash, a major hardware failure, or a reformat of the FS
Older files are released
- Still visible in store with their original size
- Their contents are removed from store and kept in HPSS
- This is fully transparent to the end user
- The space freed in store is available for new files
Released files are staged back at first access
- Transparent to the end user
- The first I/O call blocks until the stage operation is completed
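From the user side, a released file keeps its logical size while the blocks it holds in Lustre drop to almost nothing. A minimal sketch to observe this (the file name is made up for illustration):

    # Logical size is unchanged after the release
    ls -lh $CCCSTOREDIR/run42/results.tar
    # Blocks actually held in Lustre are close to zero once the contents live only in HPSS
    du -h $CCCSTOREDIR/run42/results.tar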

A FILE'S LIFE
(State diagram of a file in store:)
- Creation: the file starts as "new" (online, no HPSS copy yet).
- Once copied into HPSS it becomes "archived/synchro" (online, with an HPSS copy).
- When its disk space is freed it becomes "released" (offline: contents only in HPSS).
- A stage operation brings a released file back online from the HPSS copy.
- A modification makes the file "modified/dirty", to be archived again.

USER INTERFACE
User's view:
- The user accesses data via a standardized path: /ccc/store/contxxx/grp/usr ($STOREDIR)
- No direct access to HPSS; it is "hidden" behind store
- Regular commands apply to store
- Accessing a released file stages it back to LUSTRE; the I/O is blocked until the transfer is completed
The ccc_hsm command:
- ccc_hsm status: query the status of a file (online, released, ...)
- ccc_hsm get: prefetch files
- ccc_hsm ls: like "ls", but also shows the HSM status (online, offline)
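A typical session with these commands might look as follows. A minimal sketch based on the commands above; the file names are made up for illustration:

    # Check whether a file is online or released
    ccc_hsm status $CCCSTOREDIR/run42/results.tar

    # Prefetch files before a job so the first read does not block on a tape mount
    ccc_hsm get $CCCSTOREDIR/run42/*.tar

    # List a directory together with the HSM status of each entry
    ccc_hsm ls $CCCSTOREDIR/run42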

FILES' POPULATION

AS A CONCLUSION

COMPUTE RESULTS: BIG IS BEAUTIFUL
From the HSM's point of view, small files suck
- As much as possible, users MUST avoid storing small files in the HSM
- Smaller files mean more files: huge flat directories are something nobody wants to deal with
- Waste of space in the HSM's DB containers
- Files get spread over multiple tapes, each of which requires a tape mount, wasting a lot of time
- They pollute the caches
- They kill the advantages of "I/O pipelines" by producing "bubbles"
IF YOU CAN, MAKE FILES AS BIG AS TAPES ARE (~10 GB-100 GB)
Big files are nice to you
- Accessing them results in a single tape mount
- Very good pipelining efficiency
- Allows efficient streams
- Makes it possible to engage parallel mechanisms

TAR IS YOUR FRIEND
TAR is dangerous only in cigarettes
- Using TAR is an easy (well OK, relatively easy ;-) ) way of packing files
- TAR has checksumming features to help ensure data safety and protect you from silent corruption of data
- Tools exist to access tarballs from software: tar files follow a well-known standard (see libarchive for example)
- TAR preserves metadata: permissions, owners/groups
- TAR preserves symlinks
- I have an open mind: you can use cpio if you prefer ;-)
Thinking about a framework to perform I/O in simulation codes is never a bad idea.
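For example, packing a run directory into a single, tape-friendly archive on store could look like this. A minimal sketch; directory and file names are made up for illustration:

    # Pack a whole run directory into one large file on store
    tar cf $CCCSTOREDIR/run42/results.tar run42_output/

    # List the archive's contents without extracting it
    tar tvf $CCCSTOREDIR/run42/results.tar

    # Later, extract a single file back into the work filesystem
    cd $CCCWORKDIR && tar xf $CCCSTOREDIR/run42/results.tar run42_output/summary.dat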

KEEP IN MIND WHAT THE RESOURCES ARE MADE FOR
STOREDIR = CAPACITY
WORKDIR = SHARING & PERFORMANCE
SCRATCH = LOCALITY & PERFORMANCE

ENJOY THE STORAGE