GPFS Experiences from the Argonne Leadership Computing Facility (ALCF)
William (Bill) E. Allcock, ALCF Director of Operations


Argonne National Laboratory
Argonne National Laboratory is located on 1,500 acres, 25 miles southwest of downtown Chicago.

A Quick Summary of Argonne
3,450 full-time employees, of whom 1,250 are scientists and engineers, plus 290 post-docs.
Currently 8 major initiatives. Five of them are applied areas, solving significant national issues:
- Biological and Environmental Systems: climate change and its impacts; computational models of biological processes and their economic, social, and health impact.
- Alternative Energy and Efficiency: solar, alternative vehicle fuels, advanced engines.
- Nuclear Energy: advanced fuel-cycle technologies that maximize the efficient use of nuclear fuel, reduce proliferation concerns, and make waste disposal safer and more economical.
- National Security: threat decision science, sensors and materials, infrastructure assurance, emergency response, cyber security, and the nonproliferation and forensics of radiological and nuclear materials and equipment.
- Energy Storage: high-performance batteries for all-electric vehicles, storage for the national electric grid, and manufacturing processes for these materials-intensive devices.
Three of them aim to develop enabling next-generation tools:
- Hard X-ray Sciences: leading-edge high-energy X-ray techniques to study materials and chemical processes under real conditions in real time.
- Materials and Molecular Design and Discovery: making revolutionary advancements in the science of materials discovery and synthesis; predicting, understanding, and controlling where and how to place individual atoms and molecules to achieve desired material properties.
- Leadership Computing: high-performance computing is becoming increasingly important as more scientists and engineers use modeling and simulation to study complex chemical processes, exotic new materials, advanced energy networks, natural ecosystems, and sophisticated energy technologies.

Production Systems: ALCF-2
- Mira: BG/Q system; 49,152 nodes / 786,432 cores; 786 TB of memory; peak flop rate 10 PF; Linpack flop rate 8.1 PF.
- Vesta: BG/Q system; 2,048 nodes / 32,768 cores; 32 TB of memory; peak flop rate 419 TF.
- Cetus: BG/Q system; 1,024 nodes / 16,384 cores; 16 TB of memory; peak flop rate 209 TF.
- Tukey: NVIDIA system; 100 nodes / 1,600 x86 cores; 200 M2070 GPUs; 6 TB of x86 memory / 1.1 TB of GPU memory; peak flop rate 220 TF.
- Storage: scratch is 28.8 PB raw capacity at 240 GB/s bandwidth (GPFS); home is 1.8 PB raw capacity.
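The Mira numbers are internally consistent. Here is a quick sanity check, assuming the standard published BG/Q per-node figures (16 cores at 1.6 GHz, 8 double-precision flops per cycle per core), which are not stated on the slide itself:

```python
# Back-of-the-envelope check of Mira's quoted 10 PF peak and per-node memory.
nodes = 49_152
cores_per_node = 16        # BG/Q compute cores per node (assumed, standard figure)
clock_hz = 1.6e9           # 1.6 GHz clock (assumed, standard figure)
flops_per_cycle = 8        # 4-wide double-precision FMA per core per cycle

peak = nodes * cores_per_node * clock_hz * flops_per_cycle
print(f"peak: {peak / 1e15:.2f} PF")                  # ~10.07 PF, matching the slide
print(f"memory/node: {786e12 / nodes / 1e9:.0f} GB")  # ~16 GB per node
```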

Production Systems: ALCF-1
- Intrepid: BG/P system; 40,960 nodes / 163,840 cores; 80 TB of memory; peak flop rate 557 TF.
- Challenger: BG/P system; 1,024 nodes / 4,096 cores; 2 TB of memory; peak flop rate 13.9 TF.
- Surveyor: BG/P system; 1,024 nodes / 4,096 cores; 2 TB of memory; peak flop rate 13.9 TF.
- Eureka: NVIDIA S4 cluster; 100 nodes / 800 2.0 GHz Xeon cores; 3.2 TB of memory; 200 NVIDIA FX5600 GPUs; peak flop rate 100 TF.
- Storage: scratch is 6+ PB capacity at 80 GB/s bandwidth (GPFS and PVFS); tape is 15.9 PB of archival storage in a 15,906-volume tape archive (HPSS).

Storage at ALCF
- Our primary file system is GPFS. We also run a 500 TB PVFS file system on Intrepid, but we are GPFS-only on Mira.
- We have different storage policies than most of the other DOE labs: we do not purge at all.
  - On Intrepid, it was the honor system and we nagged when things got full. It worked, but not well.
  - On Mira, we have quotas in effect, which forces the users to manage their disk space.
  - Weird side effect: we have more available disk capacity than tape capacity.
- We basically had one file system for each turn of the crank, and everything mounted that one file system.
  - Very convenient for the users; very little time spent on data management and movement.
  - However, it is a single point of failure. We ran this way for 5 years without problems, then had a hard power failure and found disk corruption. For a variety of reasons, it took nearly 3 weeks to fsck our 4.5 PB file system and get it back online.
  - Having a second file system (PVFS) saved our bacon, because we had a few users who could run from there.
  - We had just configured Mira the same way when the corruption hit. You can imagine what happened next.
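As an aside, the "nag when things got full" approach needs very little tooling. The sketch below is hypothetical, not ALCF's actual scripts; the mount point, soft threshold, and reporting format are all made up for illustration:

```python
"""Hypothetical sketch of an honor-system 'nag' job for a shared scratch file system."""
import shutil
from pathlib import Path

SCRATCH = Path("/intrepid-fs0/users")  # hypothetical mount point
SOFT_LIMIT = 0.80                      # start nagging when the file system is 80% full

def usage_fraction(path: Path) -> float:
    """Fraction of the file system capacity currently in use."""
    total, used, _free = shutil.disk_usage(path)
    return used / total

def project_size(project_dir: Path) -> int:
    """Rough per-project footprint: sum of file sizes under the directory."""
    return sum(f.stat().st_size for f in project_dir.rglob("*") if f.is_file())

if __name__ == "__main__":
    if usage_fraction(SCRATCH) >= SOFT_LIMIT:
        # Rank projects by footprint so the reminders go to the biggest users first.
        sizes = {p: project_size(p) for p in SCRATCH.iterdir() if p.is_dir()}
        for proj, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:10]:
            print(f"nag: {proj.name} is using {size / 1e12:.1f} TB")
```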

Our storage upgrade path
[Diagram: upgrade path for file systems A through H. Today they are served by 16 DDN arrays (~240 GB/s) plus 6 DDN arrays (~90 GB/s), split as A, C, D, G, H on the 16 DDN and B, E, F on the 6 DDN. In 2Q-3Q 2014, 30 GSS building blocks (~400 GB/s) come online alongside the DDN storage. By year end, all of A through H are served from the 30 GSS at ~400 GB/s.]
IBM GPFS Storage Server (GSS):
- Uses GPFS Native RAID (software RAID).
- No controllers; x3650 servers with attached JBODs.
- Per GPFS development, this is the fastest GPFS file system in the world.
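For readers unfamiliar with GPFS Native RAID: its central idea is declustered software RAID, where each stripe's strips are spread across all disks in a JBOD rather than confined to one small dedicated group. The toy sketch below uses a hypothetical 8+2 layout over a 58-disk enclosure (none of these numbers come from the slide, and this is not the GNR algorithm) just to show why a rebuild then reads from every surviving disk instead of hammering a few:

```python
"""Toy illustration of declustered RAID placement and rebuild load spreading."""
import random
from collections import Counter

DISKS = 58          # disks in one hypothetical JBOD
STRIPE_WIDTH = 10   # 8 data + 2 parity strips per stripe
STRIPES = 100_000

random.seed(0)
# Each stripe places its strips on a random subset of all disks (declustering).
layout = [random.sample(range(DISKS), STRIPE_WIDTH) for _ in range(STRIPES)]

failed = 0
# Rebuilding the strips lost on the failed disk reads only from the surviving
# members of the affected stripes, which are scattered over every other disk.
rebuild_reads = Counter(
    d for stripe in layout if failed in stripe for d in stripe if d != failed
)
affected = sum(1 for stripe in layout if failed in stripe)
print(f"{affected} of {STRIPES} stripes touched the failed disk")
print(f"rebuild reads spread across {len(rebuild_reads)} surviving disks")
```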

Burst-buffer-like solution
- For those of you in the DOE HPC space, you will know what I mean; for those of you who are not, think L1 cache for storage.
- Before anyone yells "off with his head": I said burst buffer like, but it is not a burst buffer as in the most common proposal (those don't exist yet). It does, however, share some characteristics:
  - Fast, but not big enough to permanently keep the data there (a cache).
  - Improves performance for the most common scenarios (defensive I/O, possibly in-situ analysis).
  - Forces us to deal with how to manage moving data in and out without user intervention.
- It keeps the aspects of our storage system that we like, but avoids the ones we don't:
  - To the user, it still looks like a single namespace visible to everything.
  - They will see better performance than today.
  - We have three file systems, so if one is down, some of our users can still run.
- Makes use of GPFS Active File Management (AFM), but not in the way the GPFS developers planned:
  - They designed it primarily for WAN file systems, so the performance characteristics are different, particularly the speed of migration into and out of the cache.
  - We are collaborating with the GPFS development team; specific features are being added to support the burst buffer scenario. I am not going to talk about them because they are NDA, but IBM can if they would like.
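To make the "L1 cache for storage" analogy concrete, here is a minimal conceptual sketch. It is not GPFS AFM and not ALCF code; the paths are hypothetical, and it only models the data flow described above: applications write to a small fast tier, and a background mover drains the data to the home file system without user intervention.

```python
"""Conceptual sketch of a burst-buffer-like cache in front of a home file system."""
import queue
import shutil
import threading
from pathlib import Path

CACHE = Path("/fast-cache")   # hypothetical small, fast tier (the "burst buffer")
HOME = Path("/gpfs/home-fs")  # hypothetical large, slower home file system

flush_queue: "queue.Queue[Path]" = queue.Queue()

def write_checkpoint(relpath: str, data: bytes) -> None:
    """Applications write to the fast tier; the flush to home is queued automatically."""
    dst = CACHE / relpath
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_bytes(data)
    flush_queue.put(dst)  # the user never has to request the migration

def mover() -> None:
    """Background thread that drains the cache to the home file system."""
    while True:
        cached = flush_queue.get()
        target = HOME / cached.relative_to(CACHE)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cached, target)  # data is now safe on the big file system
        # The cached copy could be evicted later, once the fast tier needs space.
        flush_queue.task_done()

threading.Thread(target=mover, daemon=True).start()
```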

We would like to take this one more step
- We already use HPSS (tape storage).
- GPFS HPSS Interconnect (GHI): use HPSS as a hierarchical, policy-driven storage back end for GPFS.
- The policy:
  - Make a tape copy of a file on disk shortly after creation (wait a while in case the user deletes it).
  - If the file goes some period of time (weeks or months) without being accessed, delete the copy on disk.
  - The file still shows up in the GPFS metadata and will automatically be migrated back to disk if accessed. We have requested a change to this functionality: we would like to be able to configure it to throw an error instead, to avoid accidental major migrations.
- This gives us a backup of scratch, something most HPC centers (including us) do not do.
- Ideally, users never have to touch their data. Policy automatically takes care of migrating to tape and removing data from disk that has not been used; the entry stays in the GPFS metadata, and the file can be explicitly recalled by command or is automatically recalled from tape if it is accessed. This includes long-term archival after a project ends.
- We use GPFS filesets. A new feature allows filesets to have their own inode pool, so we don't have to worry about the inode pool getting too large (at least we hope; the proof will be in the pudding).
- When a project ends, policy will move all the data off disk but leave the fileset in place. If they ever come back and ask for a file, it is a simple copy.
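The policy described above is easy to state precisely. The sketch below is not the GPFS policy language and not GHI itself; it is a hypothetical restatement of the decision logic with made-up thresholds (a one-day grace period before the tape copy, eviction of the disk copy after roughly three months idle, recall on access):

```python
"""Hypothetical restatement of the GHI-style tiering policy described above."""
import time
from dataclasses import dataclass
from typing import Optional

GRACE_BEFORE_TAPE_COPY = 24 * 3600   # wait a day in case the user deletes the file
EVICT_AFTER_IDLE = 90 * 24 * 3600    # drop the disk copy after ~3 months without access

@dataclass
class FileState:
    path: str
    created: float
    last_access: float
    on_disk: bool = True
    on_tape: bool = False

def apply_policy(f: FileState, now: Optional[float] = None) -> str:
    """Return the action the policy engine would take for this file right now."""
    now = now if now is not None else time.time()
    if f.on_disk and not f.on_tape and now - f.created > GRACE_BEFORE_TAPE_COPY:
        return "copy-to-tape"
    if f.on_disk and f.on_tape and now - f.last_access > EVICT_AFTER_IDLE:
        return "evict-disk-copy"   # the GPFS metadata entry stays behind
    return "no-op"

def open_file(f: FileState) -> None:
    """Access triggers a recall if only the tape copy exists (or, as the slide
    requests, this step could instead be configured to raise an error)."""
    if not f.on_disk and f.on_tape:
        f.on_disk = True           # recall from tape back to disk
    f.last_access = time.time()
```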