LCE: Lustre at CEA. Stéphane Thiell, CEA/DAM (stephane.thiell@cea.fr)

Lustre at CEA: Outline
- Lustre at CEA updates (2009)
  - Open Computing Center (CCRT) updates
  - CARRIOCAS (Lustre over WAN) project
- 2009-2010 R&D projects around Lustre
  - Lustre 2.0 early evaluation
  - Hardware: high-end storage system prototypes
  - Open source projects around Lustre
- 2010: Lustre and the TERA-100 project
  - Data-centric architecture
  - High-performance, multi-petabyte Lustre filesystems
  - Lustre on TERA-100

Open Computing Center (CCRT) updates
- 2 production Linux compute clusters:
  - Platine: 50 Tflops (IB DDR)
  - Titane: 150 Tflops (IB DDR/QDR mixed)
- Lustre workload: plenty of small files
- 2 Lustre filesystems per cluster, /scratch and /work, 300 TB max per filesystem
  - Up to 100 million files seen on /scratch
- Accounting and monitoring managed by Robinhood (a reporting sketch follows below)
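
As an illustration of the accounting side, the commands below are a hedged sketch of Robinhood-style reporting, assuming a Robinhood 2.x installation whose database already tracks /scratch; option names may differ in the version deployed at CEA, and the user name is a placeholder.

    rbh-report --fs-info          # overall file count and volume per type/status
    rbh-report --top-users        # heaviest users, useful on a 100-million-file /scratch
    rbh-report --user-info jdoe   # per-user accounting ("jdoe" is hypothetical)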

CARRIOCAS project
- Lustre over WAN
- 4 sites near Paris, France: CEA/DIF Ter@tec, CEA/Saclay, EDF Clamart, Orsay University
- 40 Gbit/s (single channel) links between sites, 10 Gbit/s NICs
- One OST pool per site to control file localization (see the sketch below)
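
A minimal sketch of the per-site OST pool setup, using the standard lctl/lfs pool commands; the filesystem name "carriocas", the pool name and the OST indexes are illustrative assumptions, not the project's actual layout.

    # On the MGS: declare a pool for the CEA/DIF site and add its local OSTs
    lctl pool_new carriocas.ceadif
    lctl pool_add carriocas.ceadif carriocas-OST[0-15]
    # On a client: files created under this directory stripe on CEA/DIF OSTs only
    lfs setstripe --pool ceadif /carriocas/cea_dif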

CARRIOCAS: Lustre configuration
[Diagram: WAN Lustre topology across the four sites — MDS and OST[0-15] at CEA/DIF, OST[16-17] at Orsay/LAL, OST[18-21] at a third site, with 30 Gb/s links (EDF, CEA/DIF) and 10 Gb/s links (CEA/Saclay, Orsay/LAL).]

CARRIOCAS project: some results
- Checkpoint/Restart
  - 8 Lustre clients at EDF Clamart, servers at CEA/DIF (30 Gb/s max)
  - LNET and TCP/IP tuning needed for WAN (see the sketch below)
  - Results: 2880 MB/s write (22.4 Gb/s, 75% efficiency); 3120 MB/s read (24.9 Gb/s, 83% efficiency)
- Remote movie visualization
  - Visualization wall at EDF Clamart, servers at CEA/DIF
  - Hundreds of GB per movie
  - Result: 23.9 million pixels at 40+ images/sec (HD TV: 2 million pixels at 25 images/sec)
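
The tuning itself is not detailed in the slides; the lines below are a hedged sketch of the kind of socket-LND and TCP adjustments a high-latency 30 Gb/s WAN link typically needs (interface names and values are illustrative, not CEA's settings).

    # /etc/modprobe.d/lustre.conf on clients and servers
    options lnet networks=tcp0(eth2)
    options ksocklnd credits=256 peer_credits=16
    # Larger TCP buffers so the WAN bandwidth-delay product can be filled
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"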

Early Lustre 2 evaluation at CEA/DIF
- Lustre 2.0 on the TERA+ cluster
  - CEA/DAM HPC R&D cluster: 8 service nodes, 160 nodes (1280 compute cores)
  - DDN S2A 9550 SATA Lustre storage
  - Bug reporting made easy
- Lustre 2.0 on the TERA-100 demonstrator
  - 432-node blade cluster (Nehalem-EP)
  - 2 x LSI XBB2 Lustre storage
  - Some TERA+ Bull R422 nodes
- Lustre 2.0 on the Global Lustre demonstrator
  - DDN S2A 9550 Lustre storage
  - 10 x Sun Fire X4270 Lustre servers
  - Mounted on the TERA-100 demonstrator through 3 LNET routers

High-end storage technology prototyping
- DDN SFA10K
  - Early DDN SFA10K test couplet (spring 2009)
  - Feature validation on the TERA+ cluster with Lustre 2.0
- LSI Pikes Peak
  - Early LSI Pikes Peak SAS2 6 Gb/s prototype controller
  - Multiple SAS 6 Gb/s enclosure tests
[Photos: DDN SFA10K TERA+ test couplet; LSI Pikes Peak SAS 6 Gb/s controller (2009 prototype); LSI Pikes Peak SAS2 storage system (Camden enclosures)]

Open source projects around Lustre at CEA
- Lustre/HSM binding: see Aurélien's presentation tomorrow morning
- Shine: Lustre administration tool (Bull & CEA collaboration); a usage sketch follows this list
  - Latest version is 0.906, which adds router support and parallel fsck
  - http://lustre-shine.sourceforge.net
- Robinhood: monitors and purges large filesystems
  - Now supports Lustre 2.0 changelogs
  - http://robinhood.sourceforge.net
- NFS-Ganesha: NFS server running in user space
  - Dedicated backend modules called FSALs (File System Abstraction Layers), e.g. POSIX, HPSS
  - An FSAL on top of Lustre 2.0 is available since v0.99.52
  - http://nfs-ganesha.sourceforge.net
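
As a usage sketch for Shine, a typical lifecycle looks like the commands below; the model file path and the filesystem name "tera100fs" are illustrative assumptions, and options may differ slightly across Shine releases.

    shine install -m /etc/shine/models/tera100fs.lmf   # register the filesystem model
    shine format -f tera100fs                          # format MDT and OSTs in parallel
    shine start -f tera100fs                           # start the servers (router support since 0.906)
    shine mount -f tera100fs                           # mount the clients
    shine status -f tera100fs                          # check component states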

Lustre on TERA-100

TERA-100 data-centric architecture overview
[Diagram: T100 compute nodes (Lustre clients) reach the T100 private Lustre servers (ST100) at 300 GB/s and the global (shared) Lustre servers (GL100) at 200 GB/s; post-processing and rendering clusters access the global Lustre at 100 GB/s; Lustre/HSM moves data to the storage servers, disks and tapes at 20 GB/s.]

TERA-100 Private Lustre storage
- Goal: provide enough bandwidth for checkpoint/restart and temporary files
- Requirements:
  - 300 GB/s global bandwidth on Lustre
  - Part of the TERA-100 machine (shares the cluster interconnect)
  - High disk density
  - Delivery must start in Q2 2010 (it has!)

TERA-100 Private Lustre storage architecture
- Metadata cell: 4 MDS (Bull MESCA 4S3U servers), 1 DDN SFA10K couplet (300,000 IOPS)
- 16 I/O cells of 4 OSS each (Bull MESCA 4S3U servers), with 2 DDN SFA10K couplets (10 GB/s each); see the bandwidth check below
[Photo: DDN SFA10K SAS 3 Gb backend cables]
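
Assuming the 2 SFA10K couplets are per I/O cell (which the 300 GB/s target suggests), the aggregate bandwidth works out to 16 cells × 2 couplets × 10 GB/s = 320 GB/s, comfortably above the 300 GB/s requirement.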

TERA Computing Center Global Lustre storage
- Data-centric architecture
  - Zero-copy data access for post-processing clusters
  - Create a very large HPSS cache filesystem
- Requirements
  - 200 GB/s bandwidth with TERA-100 (on Lustre)
  - 100 GB/s bandwidth with other clusters
  - Total disk space > 15 PB
  - High density
  - Delivery by mid-2010

2 choices for the Global Lustre storage system (GL100)
- DDN proposal
  - Same as the Private Lustre storage (SFA10K)
  - 60-disk-slot DDN enclosure (SA6620)
- LSI proposal
  - LSI Pikes Peak (with Wembley SAS2 enclosures)
  - LSI Wembley SAS2 enclosure (60 disk slots)
[Diagram: 3-node I/O cell with Pikes Peak, IB QDR and FC-8 links]

Global Lustre storage network architecture
- InfiniBand QDR storage network
  - Voltaire 4700 QDR InfiniBand switch
- Lustre routers (see the LNET routing sketch below)
  - 42 LNET routers on TERA-100 (4 x IB QDR each)
  - Network separation and global filesystem QoS
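
A hedged sketch of what the LNET routing setup could look like; the network names (o2ib0 for the TERA-100 fabric, o2ib1 for the global storage network) and the router NIDs are illustrative assumptions, not the actual CEA configuration.

    # On TERA-100 clients: reach the global storage network through the 42 routers
    options lnet networks=o2ib0(ib0) routes="o2ib1 10.1.0.[1-42]@o2ib0"
    # On each LNET router: attach to both fabrics and enable forwarding
    options lnet networks=o2ib0(ib0),o2ib1(ib1) forwarding=enabled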

Lustre on TERA-100
- Lustre 2 on TERA-100 and the Global Lustre
  - Lustre/HSM binding readiness
  - MDT changelogs (faster Robinhood!); a changelog sketch follows below
  - Improved recovery
  - Improved SMP scaling (useful for Bull MESCA nodes)
  - ext4-ldiskfs (larger OSTs)
  - Includes the interesting Lustre 1.8 features (OST pools, adaptive timeouts, Version Based Recovery)
- Lustre administration
  - Centralized with Shine
  - High availability managed with Shine (supports smooth OST failover) and Bull tools (based on Pacemaker)
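
To illustrate why MDT changelogs speed Robinhood up (it no longer needs full filesystem scans), here is a hedged sketch of consuming them by hand; the filesystem name "tera100fs" and the returned reader id "cl1" are assumptions.

    # On the MDS: register a changelog consumer (prints an id such as cl1)
    lctl --device tera100fs-MDT0000 changelog_register
    # On a client: read the pending records (create, unlink, rename, ...)
    lfs changelog tera100fs-MDT0000
    # Acknowledge processed records so the MDT can purge them
    lfs changelog_clear tera100fs-MDT0000 cl1 0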

Questions?