Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users


Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users. Phil Andrews, Bryan Banister, Patricia Kovatch, Chris Jordan (San Diego Supercomputer Center, University of California, San Diego; andrews@sdsc.edu, bryan@sdsc.edu, pkovatch@sdsc.edu, ctjordan@sdsc.edu) and Roger Haskin (IBM Almaden Research Center; roger@almaden.ibm.com)

Abstract: At SC'04 we investigated scaling file storage to the widest possible extent using IBM's GPFS file system, with authentication extensions developed by the San Diego Supercomputer Center and IBM, in close collaboration with IBM, NSF, and the TeraGrid. The file system extended across the United States, including Pittsburgh, Illinois, and San Diego, California, with the TeraGrid 40 Gb/s backbone providing the wide area network connectivity. On a 30 Gb/s connection we sustained 24 Gb/s and peaked at 27 Gb/s. Locally, we demonstrated 15 GB/s along with new world-record sort times. Implementation of the production Global File System facility is underway: over 500 TB will be served from SDSC within two weeks, with usage to follow and a plan to reach 1 PB by the end of the year.

Background: The National Science Foundation TeraGrid is a prototype for cyberinfrastructure. High-performance network: 40 Gb/s backbone, 30 Gb/s to each site. National reach: SDSC, NCSA, CIT, ANL, PSC, TACC, Purdue/Indiana, ORNL. Over 40 teraflops of compute power and approximately 2 PB of rotating storage.

TeraGrid Network Geographically

What Users Want in Grid Data: Unlimited data capacity (we can almost do this). Transparent, high-speed access anywhere on the Grid (we can do this). Automatic archiving and retrieval (maybe). No latency (we can't do this: we measured a 60 ms round trip SDSC-NCSA and 80 ms SDSC-Baltimore).
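As a rough illustration of why that round-trip latency matters (my arithmetic, not from the original slides), the bandwidth-delay product for a 30 Gb/s path with a 60 ms round trip gives the amount of data that must be in flight to keep the link full:

    BDP = 30\,\mathrm{Gb/s} \times 0.060\,\mathrm{s} = 1.8\,\mathrm{Gb} \approx 225\,\mathrm{MB}

So on the order of 200 MB of outstanding I/O (large windows or many parallel streams) is needed before latency stops limiting throughput on such a link.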

Proposing a Centralized Grid Data Approach: SDSC is the designated data lead for the TeraGrid, with over 1 PB of disk storage; a large investment in database engines (a 72-processor Sun F15K and two 32-processor IBM 690s); 32 STK 9940B tape drives plus 28 other drives in 5 STK silos, for 6 PB of capacity; a Storage Area Network with over 1400 2 Gb/s ports; and close collaboration with vendors: IBM, Sun, Brocade, etc.

SDSC Machine Room Data Architecture. Philosophy: enable the SDSC configuration to serve the grid as a data center. (Architecture diagram: a LAN of multiple GbE TCP/IP links and a 2 Gb/s SCSI SAN, bridged to the 30 Gb/s WAN via SCSI/IP or FC/IP, connect Blue Horizon, a 4 TF Linux cluster with 50 TB of local disk, Power4 nodes, the Sun F15K and Power4 database engines with optimized support for DB2/Oracle, data mining and visualization engines, 100 TB of FC GPFS disk, and a 400 TB FC disk cache, for roughly 1 PB of disk in total, plus an HPSS archive of silos and tape with 32 tape drives and 6 PB of capacity at 1 GB/s disk-to-tape; roughly 200 MB/s per disk controller and 30 MB/s per tape drive.)

Storage Area Network at SDSC

What do we use the TeraGrid network for? We expect data movement to be the major driver. The conventional approach is to submit a job to one or more sites and GridFTP the data there and back. SDSC/IBM and the TeraGrid are also exploring a Global File System approach, first demonstrated with a single 10 Gb/s link and local disk at SC'03 (Phoenix), exported at high speed to SDSC and at lower speed to NCSA.

Access to GPFS File Systems over the Wide Area Network (SC'03). Goal: sharing GPFS file systems over the WAN. The WAN adds 10-60 ms of latency, but under load, storage latency is much higher than this anyway. Typical supercomputing I/O patterns are latency tolerant (large sequential reads and writes). This enables on-demand access to scientific data sets with no copying of files here and there. (Diagram: SDSC, NCSA, and SCinet SC2003 compute and visualization nodes mount the /SDSC, /NCSA, and /Sc2003 GPFS file systems either over their local SANs through NSD servers or over the WAN.) Roger Haskin, IBM
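A minimal back-of-the-envelope sketch (not from the original slides) of why large sequential I/O tolerates WAN latency: with enough requests in flight, throughput is limited by link or disk bandwidth rather than round-trip time. The request size, RTT, and link rate below are illustrative assumptions.

    # Back-of-the-envelope model: effective throughput of pipelined sequential I/O
    # over a high-latency link. All parameter values are illustrative assumptions.

    def effective_throughput_gbps(link_gbps, rtt_s, request_mb, requests_in_flight):
        """Throughput is capped by the link rate and by how much data the client
        keeps in flight per round trip (a simple pipelining model)."""
        in_flight_gbits = requests_in_flight * request_mb * 8 / 1000.0  # MB -> Gb
        latency_limited = in_flight_gbits / rtt_s  # Gb/s if we stall every RTT
        return min(link_gbps, latency_limited)

    # 30 Gb/s TeraGrid-class link, 60 ms RTT (as measured SDSC-NCSA in the talk)
    for n in (1, 8, 64):
        rate = effective_throughput_gbps(link_gbps=30.0, rtt_s=0.060,
                                         request_mb=4.0, requests_in_flight=n)
        print(f"{n:3d} x 4 MB requests in flight -> ~{rate:.1f} Gb/s")

A single small request per round trip yields well under 1 Gb/s, while a few dozen large outstanding requests saturate the link, which is exactly the pattern of large sequential supercomputing I/O.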

On-Demand File Access over the Wide Area with GPFS (Bandwidth Challenge 2003). (Diagram: the TeraGrid network links the L.A. hub to the SCinet booth hub. At SDSC, 128 dual 1.3 GHz Madison processor nodes access a 77 TB General Parallel File System (GPFS) through 16 Network Shared Disk servers over Myrinet and the SAN; the SDSC SC booth's 40 dual 1.5 GHz Madison processor nodes mount GPFS over the WAN via Gigabit Ethernet, serving Southern California Earthquake Center data to the NPACI Scalable Visualization Toolkit with no duplication of data.)

First demonstration of a high-performance Global File System over standard TCP/IP: we visualized application output data across a 10 Gb/s link to SDSC from the show floor at Phoenix SC'03, seeing approximately 9 Gb/s out of the 10 Gb/s link. This was very encouraging and paved the way for the full prototype at SC'04; if successful, we planned to propose a production GFS facility for the TeraGrid.

Global TG GPFS over 10 Gb/s WAN (SC'03 Bandwidth Challenge Winner)

SC'04 SDSC StorCloud Challenge entry: 160 TB of IBM disk in the StorCloud booth, connected to 40 servers in the SDSC booth via 120 FC links. We saw 15 GB/s from GPFS on the show floor. Exporting the GPFS file system across the TeraGrid to NCSA and SDSC using the SCinet 30 Gb/s link to the TeraGrid backbone, we saw 3 GB/s from the show floor to NCSA and SDSC. These are all direct file system accesses.
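As a quick consistency check (my arithmetic, not from the slides), the 3 GB/s wide-area figure matches the sustained rate quoted in the abstract for the 30 Gb/s link:

    3\,\mathrm{GB/s} \times 8 = 24\,\mathrm{Gb/s} \approx 80\% \ \text{of the 30 Gb/s provisioned path}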

StorCloud 2004 was a major initiative at SC2004 to highlight the use of storage area networking in high-performance computing, with roughly 1 PB of storage from major vendors for use by SC04 exhibitors and a StorCloud Challenge competition to award the entrants that best demonstrate the use of storage (similar to the Bandwidth Challenge). The SDSC-IBM StorCloud Challenge entry was a workflow demo highlighting multiple computation sites on a grid sharing storage at a storage site, using IBM computing and storage hardware, the 30 Gb/s communications backbone of the TeraGrid, a 40-node GPFS server cluster, and IBM DS4300 storage installed in the StorCloud booth.

The SC'04 demo was purposely designed to prototype a possible future production setup. The 30 Gb/s connection to the TeraGrid from the Pittsburgh show floor was identical to the production configuration, with GPFS on the IBM StorCloud disks on the show floor mounted at both SDSC and NCSA. Local sorts demonstrated file system capability. The production application (Enzo) ran at SDSC and wrote to the Pittsburgh disks (~1 TB/hour), and the output, real production data, was visualized at SDSC and NCSA (~3 GB/s).

SC'04 Demo, IBM-SDSC-NCSA. (Diagram: the TeraGrid network links SDSC to the show floor via Los Angeles. At SDSC, DataStar, 176 8-way Power4+ p655 AIX nodes and 11 32-way Power4+ p690 AIX nodes (possibly 7 nodes with 10 GE adapters) on a Federation SP switch and Gigabit Ethernet, mounts /gpfs-sc04. The SC'04 SDSC booth holds 4 racks of nodes and 1 rack of networking: 40 dual 1.3 GHz Itanium2 TeraGrid Linux nodes acting as GPFS NSD servers, each with 1 GE and 3 FC connections, exporting /gpfs-sc04. The StorCloud booth holds 15 racks of disks and 1.5 racks of SAN switches: 181 TB of raw FastT600 disk with 4 controllers per rack, 4 FC and 2 Ethernet ports per controller (240 FC and 120 Ethernet ports in total), behind 3 Brocade 24000 SAN switches of 128 ports each, 360 ports used, connected to SCinet Gigabit Ethernet.)

GPFS local performance tests: ~15 GB/s achieved at SC'04, winning the SC'04 StorCloud Bandwidth Challenge and setting a new terabyte sort world record of less than 540 seconds!
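As a rough indication of the sustained I/O this implies (my estimate, assuming the sort reads and writes the full terabyte at least once, before any intermediate passes):

    \frac{2 \times 1\,\mathrm{TB}}{540\,\mathrm{s}} \approx 3.7\,\mathrm{GB/s}

averaged over the run, comfortably within the ~15 GB/s the local file system demonstrated.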

Global File Systems over the WAN are the basis for some new grids (DEISA) and offer user transparency (TeraGrid roaming): on-demand access to scientific data sets, sharing of scientific data sets and results, real-time access to scientific results from geographically distributed instruments and sensors, and no copying of files here and there. What about UID/GID mapping and authentication? Initially we use world-readable data sets and common UIDs for some users, with initial GSI authentication in use (joint SDSC-IBM work). On-demand data is instantly accessible and searchable, with no need for local storage space, though it does need network bandwidth.

SC'04 Demo, IBM-SDSC-NCSA: 1. Nodes are scheduled using GUR. 2. The ENZO computation runs on DataStar, with output written to the StorCloud GPFS served by nodes in SDSC's SC'04 booth. 3. Visualization is performed at NCSA using the StorCloud GPFS and displayed on the showroom floor. (Diagram: the TeraGrid network connects SDSC via L.A. and NCSA via Chicago to SCinet. DataStar at SDSC, 176 8-way Power4+ p655 AIX nodes plus 7 32-way Power4+ p690 AIX nodes with 10 GE adapters on a Federation SP switch, mounts /gpfs-sc04 over 10 Gigabit Ethernet. The SC'04 SDSC booth's 40 dual 1.3 GHz Itanium2 TeraGrid Linux nodes serve as GPFS NSD servers in front of the StorCloud booth's 160 TB of FastT600 disk (15 racks, 2 controllers/rack) behind 3 Brocade 24000 SAN switches. At NCSA, 40 dual 1.5 GHz Itanium2 TeraGrid visualization nodes also mount /gpfs-sc04.)

GSI authentication is essential. Conventional authentication is via UID, but several resource providers combine on a grid, and only very tightly coupled grids have a unified UID space; on general grids, users will have multiple UIDs across the Grid. We need to recognize GSI certificates to handle data ownership, etc., across the Grid, so IBM and SDSC worked on GSI-to-UID mapping.
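The slides do not show the mapping mechanism itself; as a minimal illustrative sketch, a Globus-style grid-mapfile associates a GSI certificate subject (DN) with a local account, which a site can then resolve to its local UID. The file path and example DN below are assumptions, not details from the talk.

    # Minimal sketch: resolve a GSI certificate subject (DN) to a local UID via a
    # Globus-style grid-mapfile. The path and DN below are illustrative assumptions.
    import pwd
    import shlex

    def load_gridmap(path="/etc/grid-security/grid-mapfile"):
        """Parse lines of the form: "/C=US/O=.../CN=Jane User" janeuser"""
        mapping = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                parts = shlex.split(line)   # handles the quoted DN
                if len(parts) >= 2:
                    mapping[parts[0]] = parts[1]
        return mapping

    def dn_to_uid(dn, gridmap):
        """Map a certificate DN to the local numeric UID that owns the user's files."""
        username = gridmap[dn]
        return pwd.getpwnam(username).pw_uid

    # Example usage with a hypothetical DN:
    # uid = dn_to_uid("/C=US/O=NPACI/OU=SDSC/CN=Jane User", load_gridmap())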

Applications demo at SC'04/StorCloud: a large (1024-processor) Enzo application running at SDSC in San Diego wrote ~50 TB directly to the SC'04 StorCloud disks, mounted as a Global File System across the country. The data was analyzed and visualized at NCSA in Illinois and at SDSC. A true Grid application, running across the USA, saw ~3 GB/s.
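Putting the earlier numbers together (my arithmetic, not from the slides): at the quoted ~1 TB/hour write rate, ~50 TB of Enzo output corresponds to roughly

    \frac{50\,\mathrm{TB}}{1\,\mathrm{TB/hour}} \approx 50\ \text{hours of sustained wide-area writes at about } 0.28\,\mathrm{GB/s}

leaving most of the ~3 GB/s wide-area bandwidth for the analysis and visualization reads.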

The WAN-GFS proof of concept continued at SC'04.

SDSC is installing 0.5 PB of IBM disk now and plans to start hosting large datasets for the scientific community. The first will be NVO, ~50 TB of night sky information: a read-only dataset available for computation across the TeraGrid. We will extend rapidly with other datasets and are planning on 1 PB before the end of the year.

SDSC Hosts the Global File System for the TeraGrid. (Diagram: a Juniper T640 router provides 30 Gb/s to Los Angeles and on to the TeraGrid network sites NCSA, PSC, and ANL. Behind a Force10-12000, the 3-teraflop TeraGrid Linux cluster, the 6-teraflop BG/L with 128 I/O nodes, and the 11-teraflop DataStar connect to IA-64 GPFS servers fronting 0.5 PB of FastT100 storage.) The parallel file system is exported from San Diego and mounted at all TeraGrid sites.

Lessons learned: for real impact, close collaboration with vendors is essential during both development and implementation. There is a natural migration of capability from experimental middleware to entrenched infrastructure; middleware providers should welcome this and respond with new innovations.

Predictions: many sites will have lots of cheap disk, but relatively few will invest in significant Hierarchical Storage Management systems. Datasets will be locally cached from central GFS/HSM sites across the Grid, and replication will become part of Grid infrastructure. Data integrity and ultimate backup will be the province of a few copyright-library sites. Data is the birthright and justification of the Grid.