GPFS on a Cray XT
Shane Canon, Data Systems Group Leader
Lawrence Berkeley National Laboratory
CUG 2009, Atlanta, GA, May 4, 2009

Outline
- NERSC Global File System
- GPFS Overview
- Comparison of Lustre and GPFS
- Mounting GPFS on a Cray XT
- DVS
- Future Plans

NERSC Global File System
- The NERSC Global File System (NGF) provides a common global file system for the NERSC systems
- In production since 2005
- Currently mounted on all major systems: IBM SP, Cray XT4, SGI Altix, and commodity clusters
- Currently provides Project space, targeted for files that need to be shared across a project and/or used on multiple systems

NGF and GPFS
- NERSC signed a contract with IBM in July 2008 for GPFS
- Contract extends through 2014
- Covers all major NERSC systems through NERSC6, including non-Leadership systems such as Bassi and Jacquard
- Option for NERSC7

NGF Topology
[Diagram: NGF nodes, NGF disk, and the Ethernet network connect to Bassi (pnsd), Planck/PDSF (pnsd), and Jacquard over the NGF SAN, and to the Franklin login and DVS nodes over the NGF-Franklin SAN; Franklin compute nodes reach Franklin disk via Lustre.]

GPFS Overview
- Shared disk model
- Distributed lock manager
- Supports SAN and Network Shared Disk modes (which can be mixed)
- Primarily TCP/IP, but supports RDMA and Federation for low overhead and high bandwidth
- Feature rich and very stable
- Largest deployment: LLNL Purple, 120 GB/s, ~1,500 clients

Comparisons: Design and Capability

Design                  GPFS            Lustre
Storage Model           Shared Disk     Object
Locking                 Distributed     Central (OST)
Transport               TCP (w/ RDMA)   LNET (Routable Multi-Network)

Scaling (Demonstrated)  GPFS            Lustre
Clients                 1,500           25,000
Bandwidth               120 GB/s        200 GB/s

GPFS Architecture
[Diagram: clients reach NSD servers over TCP or InfiniBand RDMA, while SAN-attached clients access the disks directly.]

Lustre Architecture
[Diagram: a TCP client and the MDS/OSS servers on Net1, with an LNET router bridging to an RDMA client on Net2.]

Comparisons: Features (GPFS vs. Lustre)
- Add Storage
- Remove Storage
- Rebalance
- Pools (Lustre: 1.8, May)
- Fileset
- Quotas
- Distributed Metadata (Lustre: 3.0, 2010/11)
- Snapshots
- Failover (Lustre: with user/third-party assistance)

GPFS on Franklin Interactive Nodes
- Franklin has 10 login nodes and 6 PBS launch nodes
- Currently uses the native GPFS client and TCP-based mounts on the login nodes
- Hardware is in place to switch to SAN-based mounts on the login nodes in the near future

GPFS on Cray XT
- Mostly just worked
- Install in the shared root environment
- Some modifications needed to point to the correct source tree
- Slight modifications to the mmremote and mmsdrfsdef utility scripts (to use the ip command to determine the SS (SeaStar) IP address)
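
The script change itself amounts to a few lines in GPFS's utility scripts; purely as an illustration of the lookup those lines perform, the minimal Python sketch below asks the ip command for an interface's IPv4 address. The interface name ss0 and the helper function are assumptions for this example, not the actual patch.

    import re
    import subprocess

    def interface_ipv4(ifname="ss0"):  # "ss0" is an assumed SeaStar interface name
        """Return the first IPv4 address the `ip` command reports for ifname."""
        out = subprocess.run(
            ["ip", "-4", "-o", "addr", "show", "dev", ifname],
            capture_output=True, text=True, check=True,
        ).stdout
        match = re.search(r"inet\s+(\d+\.\d+\.\d+\.\d+)", out)
        if match is None:
            raise RuntimeError(f"no IPv4 address found on {ifname}")
        return match.group(1)

    if __name__ == "__main__":
        print(interface_ipv4())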

Franklin Compute Nodes
- NERSC will use Cray's DVS to mount NGF file systems on Franklin compute nodes
- DVS ships I/O requests to server nodes that have the actual target file system mounted
- DVS has been tested with GPFS at NERSC at scale on Franklin during dedicated test shots
- Targeted for production in the June timeframe
- Franklin has 20 DVS servers connected via SAN

IO Forwarders
- I/O forwarding (function shipping) moves I/O requests to a proxy server running the file system client
- Advantages
  - Less overhead on clients
  - Reduced scale from the file system's viewpoint
  - Potential for data redistribution (realigning and aggregating I/O requests)
- Disadvantages
  - Additional latency (for the stack)
  - Additional software component (complexity)
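
To make the function-shipping idea concrete, here is a purely conceptual Python sketch: compute-side code builds small request records instead of touching the file system, and a proxy that actually mounts the file system replays them. The request format and function names are invented for illustration and are not DVS's real protocol.

    import os
    from dataclasses import dataclass

    @dataclass
    class WriteRequest:
        path: str
        offset: int
        data: bytes

    def ship(requests, do_io):
        # Client side: in a real forwarder these records would cross the
        # network (e.g., over Portals); here we call the proxy-side function
        # directly to keep the sketch self-contained.
        for req in requests:
            do_io(req)

    def do_io(req):
        # Proxy side: the only place the parallel file system client runs.
        fd = os.open(req.path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.pwrite(fd, req.data, req.offset)
        finally:
            os.close(fd)

    if __name__ == "__main__":
        reqs = [WriteRequest("/tmp/forwarded.dat", i * 4, b"demo") for i in range(4)]
        ship(reqs, do_io)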

Overview of DVS
- Portals based (low overhead)
- Kernel modules (both client and server)
- Support for striping across multiple servers (future release; tested at NERSC)
- Tunables to adjust behavior
  - Stripe width (number of servers)
  - Block size
  - Set via both mount options and environment variables
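
As a rough illustration of per-job tuning, the sketch below sets striping-related environment variables before launching an application with aprun. The variable names DVS_MAXNODES and DVS_BLOCKSIZE are assumptions based on this slide's description of the tunables, and ./my_io_app is a placeholder; check the DVS documentation for the names and mount options your release actually supports.

    import os
    import subprocess

    # Assumed variable names; consult the DVS release notes for your system.
    env = dict(os.environ,
               DVS_MAXNODES="4",            # assumed: stripe across 4 DVS servers
               DVS_BLOCKSIZE=str(1 << 20))  # assumed: 1 MiB stripe block size

    # Launch a placeholder application under ALPS with the tuned environment.
    subprocess.run(["aprun", "-n", "256", "./my_io_app"], env=env, check=True)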

NGF Topology (again)
[Diagram repeated from the earlier NGF Topology slide: NGF nodes, NGF disk, and the Ethernet network connect to Bassi (pnsd), Planck/PDSF (pnsd), and Jacquard over the NGF SAN, and to the Franklin login and DVS nodes over the NGF-Franklin SAN; Franklin compute nodes reach Franklin disk via Lustre.]

Future Plans
- No current plans to replace Franklin scratch with NGF scratch or GPFS; however, we plan to evaluate this once the planned upgrades are complete
- Explore global scratch; this could start with a smaller Linux cluster to prove feasibility
- Evaluate tighter integration with HPSS (GHI)

Long Term Questions for GPFS
- Scaling to new levels (O(10k) clients)
- Quality of service in a multi-cluster environment (where the aggregate bandwidth of the systems exceeds that of the disk subsystem)
- Support for other systems, networks, and scales
  - pNFS could play a role
- Other options
  - Generalized I/O forwarding system (DVS)
  - Routing layer with an abstraction layer to support new networks (LNET)

Acknowledgements
NERSC: Matt Andrews, Will Baird, Greg Butler, Rei Lee, Nick Cardo
Cray: Terry Malberg, Dean Roe, Kitrick Sheets, Brian Welty

Questions?

Further Information
Shane Canon
Data Systems Group Leader
Scanon@lbl.gov