Lustre at Scale: The LLNL Way


Slide 1: Lustre at Scale: The LLNL Way
D. Marc Stearman, Lustre Administration Lead, Livermore Computing, LLNL
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA LLNL-PRES

Slide 2: Topics
- Project Structure
- LLNL Computing Platforms
- Network Design and Topology
- Software Release Methodology
- Operations and Management
- Next Steps
- Hyperion
- Concerns

Slide 3: Project Structure: Who are all these people, and what are they doing here?
- Project Lead: Mark Gary
- Administration/Operations Team: Marc Stearman
  - Responsible for daily system administration, cluster integration, upgrades, hardware repair, and user application support
  - 4 full-time employees plus the hardware repair team
- Software Development: Jim Garlick
  - Responsible for bug fixing, builds, QA, and tool development
  - 4 full-time employees

Slide 4: HPC Compute/Storage Pairing Philosophies
Islands of Storage
- Many HPC clusters, each with dedicated attached storage
- Data access internal to the cluster
- High performance: uses the local high-speed interconnect
- Many copies of data spread across multiple clusters
Peninsulas of Computation
- Many HPC clusters using shared network storage
- User convenience: fewer copies of data
- Redundant: if one file system is down, others are still available
- Extra network latency
[Diagram: clusters with login and gateway nodes on their own high-speed interconnects, linked by an interactive network and a shared storage network]

Slide 5: Peninsulas Explained
- All storage is visible via the storage network
- Compute clusters have some Lustre file systems more local than others
[Diagram: two clusters (login and gateway nodes on a high-speed interconnect) connected through edge routers to the Lustre network core routers; Lustre file systems /p/lscratcha and /p/lscratchb]

Slide 6: Livermore Computing: Open Computing Facility
- Compute platforms: ALC (9 TFLOPS), Atlas (44 TFLOPS), Thunder (23 TFLOPS), Zeus (11 TFLOPS), Prism (1 TFLOP), BGL-Dev (6 TFLOPS), Yana (3 TFLOPS), IGS (test)
- Lustre file systems lscratcha and lscratchb (1.2 PB at 12 GB/s; 744 TB at 8 GB/s), plus 95 TB
- Lustre Federated Ethernet Core Network
- Total: ~100 TFLOPS, ~2 PB

Slide 7: Livermore Computing: Open Computing Facility
[Diagram without bandwidth labels: ALC (9 TFLOPS), Atlas (44 TFLOPS), Thunder (23 TFLOPS), Zeus (11 TFLOPS), Prism (1 TFLOP), BGL-Dev (6 TFLOPS), Yana (3 TFLOPS), and IGS (test) on the Lustre Federated Ethernet Core Network with lscratcha, lscratchb (1.2 PB, 744 TB) and 95 TB; total ~100 TFLOPS, ~2 PB]

Slide 8: LNET View of the World: Open Computing Facility
[Diagram: the open facility platforms labeled by LNET network name (tcp0, elan0/elan3, elan4, o2ib0, o2ib1), with most links on tcp0; storage: 95 TB, 1.2 PB, 744 TB]

Slide 9: Livermore Computing: Secure Computing Facility
- Compute platforms: BGL (596 TFLOPS), Minos (33 TFLOPS), Rhea (22 TFLOPS), Lilac (9 TFLOPS), Hopi (3 TFLOPS), Gauss (2 TFLOPS)
- Lustre file systems lscratch1 and lscratch3 (1 PB at 24 GB/s; 2.5 PB at 26 GB/s)
- Lustre Federated Ethernet Core Network
- Total: 665 TFLOPS, 3.5 PB
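The rounded totals on slides 6 and 9 can be cross-checked against the per-platform labels. This small Python check assumes decimal units (1 PB = 1000 TB) and that the storage labels on each slide (1.2 PB, 744 TB, and 95 TB on the open side; 1 PB and 2.5 PB on the secure side) make up the full inventory; IGS (test) carries no TFLOPS figure and is left out.

```python
def facility_totals(tflops: dict[str, float], storage_tb: list[float]) -> tuple[float, float]:
    """Sum per-platform peak TFLOPS and per-file-system capacity (TB -> PB, decimal)."""
    return sum(tflops.values()), sum(storage_tb) / 1000


# Figures as labeled on the slides.
OPEN = (
    {"ALC": 9, "Atlas": 44, "Thunder": 23, "Zeus": 11, "Prism": 1, "BGL-Dev": 6, "Yana": 3},
    [1200, 744, 95],
)
SECURE = (
    {"BGL": 596, "Minos": 33, "Rhea": 22, "Lilac": 9, "Hopi": 3, "Gauss": 2},
    [1000, 2500],
)

open_tflops, open_pb = facility_totals(*OPEN)        # slide 6 says ~100 TFLOPS, ~2 PB
secure_tflops, secure_pb = facility_totals(*SECURE)  # slide 9 says 665 TFLOPS, 3.5 PB

if __name__ == "__main__":
    print(f"Open:   {open_tflops} TFLOPS, {open_pb:.2f} PB")
    print(f"Secure: {secure_tflops} TFLOPS, {secure_pb:.2f} PB")
```

The sums come out to 97 TFLOPS and about 2.04 PB for the open facility, consistent with the slide's "~100 TFLOPS, ~2 PB", and exactly 665 TFLOPS and 3.5 PB for the secure facility.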

Slide 10: Livermore Computing: Secure Computing Facility
[Diagram: the slide 9 platforms and file systems (1 PB, 2.5 PB) annotated with link bandwidths to the Lustre Federated Ethernet Core Network of 75 GB/s, 45 GB/s, 30 GB/s, 15 GB/s, and 2.5 GB/s; total 665 TFLOPS, 3.5 PB]

Slide 11: LNET View of the World: Secure Computing Facility
[Diagram: platforms labeled by LNET network name (tcp0, o2ib0, o2ib1, elan0) with link bandwidths of 75, 45, 30, 15, and 2.5 GB/s to the Lustre Federated Ethernet Core Network; storage: 1 PB, 2.5 PB; total 665 TFLOPS, 3.5 PB]

Slide 12: Livermore Computing: Next Steps (LNET)
[Diagram: the Lustre Federated Ethernet Core Network (TCP0) plus new LNET networks TCP1 and TCP2, edge routers, a cluster's login and gateway nodes on its high-speed interconnect, and a Lustre file system with expansion servers]

Slide 13: Livermore Computing: Next Steps (LNET)
Benefits
- Reduced network traffic through the core switches
- Keeps traffic local to switch backplanes and reduces latency
- Saves money!
Issues
- Need dynamic NIDs
  - Routes can be added dynamically via lctl on clusters with routers, but TCP-only clients on tcp0 must unload the Lustre modules to add tcp1 and tcp2 NIDs
  - With multiple Lustre file systems mounted, ALL file systems must be unmounted, so all jobs are killed, limiting production work
- Makes LNET configuration more complex (but we already have that)
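The operational cost listed under Issues can be made concrete with a small model. This is a hypothetical Python sketch, not an LLNL tool: `Client` and `plan_new_net` are illustrative names, and the only rules encoded are the two on the slide (routed clusters can gain routes at runtime via lctl; TCP-only clients need a module reload, and therefore an unmount of every Lustre file system, to gain tcp1/tcp2 NIDs).

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """A Lustre client node (hypothetical model for illustration)."""
    name: str
    routed: bool                                   # True: reaches servers through LNET routers
    nets: set[str]                                 # LNET networks the node already has NIDs on
    mounts: set[str] = field(default_factory=set)  # currently mounted Lustre file systems


def plan_new_net(client: Client, new_net: str) -> dict:
    """Return the action needed for `client` to reach servers on `new_net`,
    plus the mounted file systems that must be unmounted first."""
    if new_net in client.nets:
        return {"action": "none", "unmount": set()}
    if client.routed:
        # Per the slide: routes can be added at runtime via lctl,
        # so existing mounts (and the jobs using them) stay up.
        return {"action": "add_route", "unmount": set()}
    # Per the slide: a TCP-only client must unload the Lustre modules to add
    # a tcp1/tcp2 NID, which means unmounting every Lustre file system.
    return {"action": "reload_modules", "unmount": set(client.mounts)}


if __name__ == "__main__":
    login = Client("login1", routed=False, nets={"tcp0"},
                   mounts={"lscratcha", "lscratchb"})
    print(plan_new_net(login, "tcp1"))
```

For a TCP-only node with two file systems mounted, the plan is a module reload that takes both mounts down, which is why the slide notes that all jobs are killed.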

Slide 14: SLIC: Storage Lustre Interface Cluster
[Diagram: SLIC nodes sit between the Lustre network (Lustre file systems) and the storage network (archive)]

Slide 15: Software Release: Testing Methodology
- Sanity testing in the build farm
- Small-scale testing in the testbed
- Mid-scale testing on BGL-Dev and a new 64-node cluster
- Large-scale testing on ALC (~400 clients)
- Giant-scale testing on Atlas (1,100 clients) during DST (Dedicated System Time)
- Wide variety of tests: IOR, iozone, fsx, MIB, mdtest, simul, PIOS, various reproducers
- Developers are using Lustre for their /home file system
- We carry 150+ patches on top of the base releases to make Lustre work on our production systems; this requires a great deal of testing resources
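The escalating scales above act as a gate: a release should only consume scarce large-scale time, such as Atlas DST, after it has passed the cheaper stages. A minimal Python sketch of that ordering, assuming a hypothetical `run_stage(release, stage)` callback that returns pass or fail; the stage names come from the slide, the code structure does not.

```python
from typing import Callable, Optional

# Qualification ladder in the order given on the slide, smallest scale first.
STAGES = [
    "build farm (sanity)",
    "testbed (small scale)",
    "BGL-Dev / 64-node cluster (mid scale)",
    "ALC (~400 clients)",
    "Atlas (1,100 clients, during DST)",
]


def qualify(release: str,
            run_stage: Callable[[str, str], bool]) -> tuple[list[str], Optional[str]]:
    """Run each stage in order and stop at the first failure.

    Returns (stages passed, first failing stage or None if all passed).
    """
    passed: list[str] = []
    for stage in STAGES:
        if not run_stage(release, stage):
            return passed, stage  # do not spend larger-scale time on a failing build
        passed.append(stage)
    return passed, None


if __name__ == "__main__":
    # Hypothetical runner: everything passes except the ALC stage.
    fake_runner = lambda release, stage: not stage.startswith("ALC")
    print(qualify("candidate-1", fake_runner))
```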

Slide 16: Software Release: Rollout Methodology
- Midway through migration from 1.4 to 1.6
- Servers before clients
  - All servers now running chaos ( patches)
  - All x86_64 clients and BGL moving to chaos
  - All i686 and ia64 clients will remain at chaos ( patches) until they retire (about 6-9 months from now)
- The 1.4 clients can mount the 1.6 file systems because those file systems were created under 1.4 and migrated to 1.6
- If we field a new file system, we have three options for our legacy clients:
  - Create it at 1.4 and migrate it to 1.6
  - Develop a tool that reads the 1.6 on-disk configuration and writes out a 1.4-style configuration
  - Not mount it on legacy clients
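The mount rule above reduces to a simple decision: whether legacy 1.4 clients can use a file system depends on the release it was created under, not the release its servers now run. A minimal Python sketch under that reading of the slide; `FileSystem` and `legacy_client_plan` are hypothetical names, and the option strings paraphrase the three options listed.

```python
from dataclasses import dataclass


@dataclass
class FileSystem:
    name: str
    created_under: str  # Lustre release the file system is (or would be) formatted with


def legacy_client_plan(fs: FileSystem) -> list[str]:
    """Options for 1.4 clients, following the reasoning on the slide."""
    if fs.created_under == "1.4":
        # Created under 1.4 and migrated to 1.6: 1.4 clients can mount it.
        return ["mount as-is"]
    # Native 1.6 file system: the slide lists three ways forward.
    return [
        "create it at 1.4 and migrate it to 1.6",
        "write a tool that reads the 1.6 on-disk config and emits a 1.4-style config",
        "do not mount it on legacy clients",
    ]


if __name__ == "__main__":
    for fs in (FileSystem("migrated-fs", "1.4"), FileSystem("new-fs", "1.6")):
        print(fs.name, legacy_client_plan(fs))
```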

Slide 17: Operations and Management: Sameness
- Storage Scalable Units (SSUs)
  - Concept borrowed from compute clusters
  - Building blocks used to build and expand file systems
  - Makes deployment smooth and quick
- Hardware repair team
  - Operations personnel handle all hardware repair actions with minimal sysadmin intervention
- Operator training
  - Training courses developed locally
  - Knowledge base on local wiki
  - Goal: reduce off-hours pages
- Numerous scripts for testing and problem determination
- LMT v2

Slide 18: LMT v2
Start with xwatch-lustre functionality, then add:
- New views (OSS, file system, router group, ...)
- Plotting capability (historical trends, heartbeat, ...)
- Customization features
- Full-system health at a glance
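"Full-system health at a glance" implies collapsing many per-component states into a few summary values per view. A minimal Python sketch of one way to do that (worst state wins), assuming hypothetical report tuples and a three-level `Health` scale; this is an illustration, not LMT's actual data model.

```python
from collections import defaultdict
from enum import IntEnum


class Health(IntEnum):
    OK = 0
    DEGRADED = 1
    DOWN = 2


def rollup(reports):
    """Collapse per-component reports into the worst state per (view, group).

    reports: list of (view, group, component, Health) tuples, e.g.
    ("oss", "oss12", "ost0040", Health.OK). All names are hypothetical.
    """
    summary = defaultdict(dict)
    for view, group, _component, health in reports:
        summary[view][group] = max(summary[view].get(group, Health.OK), health)
    return {view: dict(groups) for view, groups in summary.items()}


def system_health(reports):
    """Single at-a-glance value: the worst state reported anywhere."""
    return max((health for *_, health in reports), default=Health.OK)


if __name__ == "__main__":
    reports = [
        ("oss", "oss1", "ost0", Health.OK),
        ("oss", "oss1", "ost1", Health.DEGRADED),
        ("router", "rg1", "rtr1", Health.OK),
    ]
    print(rollup(reports), system_health(reports).name)
```

Using an ordered enum keeps the "worst wins" rule to a single `max`, so new views (file system, router group) need no extra logic.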

Slide 19: Livermore Computing: Next Steps
- File system requirements
  - Dawn (0.5 PFLOPS): 96 GB/s, ~4 PB
  - Sequoia (10 PFLOPS): 512 GB/s, ~50 PB
- Router cluster
  - Sequoia may have an InfiniBand network
  - Need to bridge InfiniBand with the legacy 10GigE network
- Evaluating ZFS (both user space and in kernel)
- Failover
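With Storage Scalable Units as the building block, the Dawn and Sequoia targets translate into unit counts once a per-SSU bandwidth and capacity are fixed. The Python sketch below pairs the slide's requirements with made-up per-SSU figures (8 GB/s and 0.5 PB per SSU; these are assumptions, not numbers from the talk), and also computes how long one full write of each file system takes at the required aggregate bandwidth, in decimal units.

```python
import math

# Requirements from the slide (GB/s, PB).
REQUIREMENTS = {
    "Dawn": (96, 4),
    "Sequoia": (512, 50),
}

# Hypothetical per-SSU figures, for illustration only.
SSU_GBS = 8
SSU_PB = 0.5


def ssus_needed(bandwidth_gbs: float, capacity_pb: float,
                ssu_gbs: float, ssu_pb: float) -> int:
    """Smallest whole number of SSUs that meets both the bandwidth and capacity targets."""
    return max(math.ceil(bandwidth_gbs / ssu_gbs), math.ceil(capacity_pb / ssu_pb))


def hours_to_fill(bandwidth_gbs: float, capacity_pb: float) -> float:
    """Time to write the whole file system once at full aggregate bandwidth (1 PB = 1e6 GB)."""
    return capacity_pb * 1e6 / bandwidth_gbs / 3600


if __name__ == "__main__":
    for name, (gbs, pb) in REQUIREMENTS.items():
        print(f"{name}: {ssus_needed(gbs, pb, SSU_GBS, SSU_PB)} SSUs, "
              f"{hours_to_fill(gbs, pb):.1f} h to fill")
```

With these illustrative SSU figures Dawn needs 12 SSUs (bandwidth-bound) while Sequoia needs 100 (capacity-bound); a full write of Dawn's ~4 PB takes about 11.6 hours at 96 GB/s, and of Sequoia's ~50 PB about 27.1 hours at 512 GB/s.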

Slide 20: Hyperion
- Will replace ALC as the large-scale Lustre test cluster
- A partnership with the vendor community, including Sun/CFS
- Will test and evaluate emerging technologies: OFED, virtualization, Lustre, QDR InfiniBand, 40/100 GigE, etc.
- 1,152 nodes, 9K cores, 120 TFLOPS
- 1.6 PB, >36 GB/s
- Two Lustre networks: InfiniBand and 10 Gigabit Ethernet

Slide 21: Concerns
- Space management
  - How do you manage a 50 PB file system? Quotas? Purges?
  - Conventional tools do not scale: ls, tar, cp, rsync, etc.
  - Moving data to archive efficiently
- ZFS
  - Performance: user space vs. kernel space
  - Migration from ldiskfs to ZFS
- Scaling concerns
  - Metadata performance
  - Adaptive timeouts
  - Multi/many-core parallelization
  - Seamless failover
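One common way around serial tools such as ls on a huge namespace is to list many directories concurrently. A minimal Python sketch of a purge-candidate scan, assuming a modification-time age policy (sites often use access time instead) and only listing files, never deleting them; `purge_candidates` and `_scan_dir` are hypothetical names, not an LLNL tool. Threads help here because the work is dominated by metadata system calls rather than Python computation.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def _scan_dir(path, cutoff):
    """List one directory; return (subdirectories, files older than cutoff)."""
    subdirs, old = [], []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                        old.append(entry.path)
    except (PermissionError, FileNotFoundError):
        pass  # directory vanished or is unreadable; skip it
    return subdirs, old


def purge_candidates(root, max_age_days, workers=16):
    """Breadth-first walk that lists each level's directories in parallel."""
    cutoff = time.time() - max_age_days * 86400
    candidates, frontier = [], [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            results = list(pool.map(lambda d: _scan_dir(d, cutoff), frontier))
            frontier = []
            for subdirs, old in results:
                frontier.extend(subdirs)
                candidates.extend(old)
    return candidates


if __name__ == "__main__":
    # Tiny self-contained demo on a temporary tree.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "a", "b"))
        stale = os.path.join(top, "a", "b", "stale.dat")
        open(stale, "w").close()
        then = time.time() - 100 * 86400
        os.utime(stale, (then, then))
        print(purge_candidates(top, max_age_days=30))
```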

Slide 22: Questions

UK LUG 10 th July Lustre at Exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc. UK LUG 10 th July 2012 Lustre at Exascale Eric Barton CTO Whamcloud, Inc. eeb@whamcloud.com Agenda Exascale I/O requirements Exascale I/O model 3 Lustre at Exascale - UK LUG 10th July 2012 Exascale I/O

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

Dell TM Terascala HPC Storage Solution

Dell TM Terascala HPC Storage Solution Dell TM Terascala HPC Storage Solution A Dell Technical White Paper Li Ou, Scott Collier Dell Massively Scale-Out Systems Team Rick Friedman Terascala THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY,

More information

WLS Neue Optionen braucht das Land

WLS Neue Optionen braucht das Land WLS Neue Optionen braucht das Land Sören Halter Principal Sales Consultant 2016-11-16 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information

More information

Olaf Weber Senior Software Engineer SGI Storage Software. Amir Shehata Lustre Network Engineer Intel High Performance Data Division

Olaf Weber Senior Software Engineer SGI Storage Software. Amir Shehata Lustre Network Engineer Intel High Performance Data Division Olaf Weber Senior Software Engineer SGI Storage Software Amir Shehata Lustre Network Engineer Intel High Performance Data Division Intel and the Intel logo are trademarks or registered trademarks of Intel

More information

A ClusterStor update. Torben Kling Petersen, PhD. Principal Architect, HPC

A ClusterStor update. Torben Kling Petersen, PhD. Principal Architect, HPC A ClusterStor update Torben Kling Petersen, PhD Principal Architect, HPC Sonexion (ClusterStor) STILL the fastest file system on the planet!!!! Total system throughput in excess on 1.1 TB/s!! 2 Software

More information

SUN CUSTOMER READY HPC CLUSTER: REFERENCE CONFIGURATIONS WITH SUN FIRE X4100, X4200, AND X4600 SERVERS Jeff Lu, Systems Group Sun BluePrints OnLine

SUN CUSTOMER READY HPC CLUSTER: REFERENCE CONFIGURATIONS WITH SUN FIRE X4100, X4200, AND X4600 SERVERS Jeff Lu, Systems Group Sun BluePrints OnLine SUN CUSTOMER READY HPC CLUSTER: REFERENCE CONFIGURATIONS WITH SUN FIRE X4100, X4200, AND X4600 SERVERS Jeff Lu, Systems Group Sun BluePrints OnLine April 2007 Part No 820-1270-11 Revision 1.1, 4/18/07

More information

Toward Understanding Life-Long Performance of a Sonexion File System

Toward Understanding Life-Long Performance of a Sonexion File System Toward Understanding Life-Long Performance of a Sonexion File System CUG 2015 Mark Swan, Doug Petesch, Cray Inc. dpetesch@cray.com Safe Harbor Statement This presentation may contain forward-looking statements

More information

Comet Virtualization Code & Design Sprint

Comet Virtualization Code & Design Sprint Comet Virtualization Code & Design Sprint SDSC September 23-24 Rick Wagner San Diego Supercomputer Center Meeting Goals Build personal connections between the IU and SDSC members of the Comet team working

More information

HPC at UZH: status and plans

HPC at UZH: status and plans HPC at UZH: status and plans Dec. 4, 2013 This presentation s purpose Meet the sysadmin team. Update on what s coming soon in Schroedinger s HW. Review old and new usage policies. Discussion (later on).

More information

Lustre usages and experiences

Lustre usages and experiences Lustre usages and experiences at German Climate Computing Centre in Hamburg Carsten Beyer High Performance Computing Center Exclusively for the German Climate Research Limited Company, non-profit Staff:

More information

The State and Needs of IO Performance Tools

The State and Needs of IO Performance Tools The State and Needs of IO Performance Tools Scalable Tools Workshop Lake Tahoe, CA August 6 12, 2017 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National

More information

Demonstration Milestone Completion for the LFSCK 2 Subproject 3.2 on the Lustre* File System FSCK Project of the SFS-DEV-001 contract.

Demonstration Milestone Completion for the LFSCK 2 Subproject 3.2 on the Lustre* File System FSCK Project of the SFS-DEV-001 contract. Demonstration Milestone Completion for the LFSCK 2 Subproject 3.2 on the Lustre* File System FSCK Project of the SFS-DEV-1 contract. Revision History Date Revision Author 26/2/14 Original R. Henwood 13/3/14

More information

BeeGFS. Parallel Cluster File System. Container Workshop ISC July Marco Merkel VP ww Sales, Consulting

BeeGFS.   Parallel Cluster File System. Container Workshop ISC July Marco Merkel VP ww Sales, Consulting BeeGFS The Parallel Cluster File System Container Workshop ISC 28.7.18 www.beegfs.io July 2018 Marco Merkel VP ww Sales, Consulting HPC & Cognitive Workloads Demand Today Flash Storage HDD Storage Shingled

More information

Community Release Update

Community Release Update Community Release Update LAD 2017 Peter Jones HPDD, Intel OpenSFS Lustre Working Group OpenSFS Lustre Working Group Lead by Peter Jones (Intel) and Dustin Leverman (ORNL) Single forum for all Lustre development

More information

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC

More information

NetApp High-Performance Storage Solution for Lustre

NetApp High-Performance Storage Solution for Lustre Technical Report NetApp High-Performance Storage Solution for Lustre Solution Design Narjit Chadha, NetApp October 2014 TR-4345-DESIGN Abstract The NetApp High-Performance Storage Solution (HPSS) for Lustre,

More information

Everyday* Lustre. A short survey on Lustre tools. Sven Trautmann Engineer, Lustre Giraffe Team, Sun Microsystems

Everyday* Lustre. A short survey on Lustre tools. Sven Trautmann Engineer, Lustre Giraffe Team, Sun Microsystems Everyday* Lustre A short survey on Lustre tools Sven Trautmann Engineer, Lustre Giraffe Team, Sun Microsystems Outline Motivation Setup Tools Management Tools Monitoring Conclusion Who am I, What am I

More information

Refining and redefining HPC storage

Refining and redefining HPC storage Refining and redefining HPC storage High-Performance Computing Demands a New Approach to HPC Storage Stick with the storage status quo and your story has only one ending more and more dollars funneling

More information

Disruptive Storage Workshop Hands-On Lustre

Disruptive Storage Workshop Hands-On Lustre Disruptive Storage Workshop Hands-On Lustre Mark Miller http://www.pinedalab.org/disruptive-storage-workshop/ Schedule - Setting up our VMs (5 minutes 15 minutes to percolate) - Installing Lustre (30 minutes

More information

A Comparative Study of High Performance Computing on the Cloud. Lots of authors, including Xin Yuan Presentation by: Carlos Sanchez

A Comparative Study of High Performance Computing on the Cloud. Lots of authors, including Xin Yuan Presentation by: Carlos Sanchez A Comparative Study of High Performance Computing on the Cloud Lots of authors, including Xin Yuan Presentation by: Carlos Sanchez What is The Cloud? The cloud is just a bunch of computers connected over

More information

Analytics of Wide-Area Lustre Throughput Using LNet Routers

Analytics of Wide-Area Lustre Throughput Using LNet Routers Analytics of Wide-Area Throughput Using LNet Routers Nagi Rao, Neena Imam, Jesse Hanley, Sarp Oral Oak Ridge National Laboratory User Group Conference LUG 2018 April 24-26, 2018 Argonne National Laboratory

More information

Improved Versioning, Building, and Distribution of Lustre

Improved Versioning, Building, and Distribution of Lustre Improved Versioning, Building, and Distribution of Lustre LUG 2016 Christopher J. Morrone, Giuseppe Di Natale April 5, 2016 This work was performed under the auspices of the U.S. Department of Energy by

More information