Beyond Petascale
Roger Haskin, Manager, Parallel File Systems, IBM Almaden Research Center


GPFS Research and Development
- The GPFS product originated at IBM Almaden Research Laboratory
- Research continues to be involved in prototyping and developing new GPFS features and related technology

GPFS Parallel File System
- GPFS: IBM's parallel cluster file system, based on the shared-disk (SAN) model
  - Cluster: fabric-interconnected nodes (IP, SAN, ...)
  - Shared disk: all data and metadata on fabric-attached block storage
  - Parallel: data and metadata flow from all of the nodes to all of the disks in parallel, under the control of a distributed lock manager (see the sketch below)
- Runs on pSeries, IA32, and IA64; AIX and Linux
- Installed on many Top 500 computers, including ASCI White, ASCI Blue, Blue Horizon, and others
- Applications include HPC, scalable file and web servers, digital libraries, video streaming, OLAP, financial data management, and engineering design
- Present customers are approaching file systems of a petabyte
- New applications will drive further demand for petascale storage
- Technical challenges must be overcome

[Diagram: GPFS file system nodes connected by a switching fabric (system or storage area network) to shared disks (SAN-attached or network block devices)]
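To make the shared-disk, parallel data path concrete, here is a minimal sketch (assuming an MPI environment and a hypothetical GPFS mount at /gpfs/fs1) in which every node writes a disjoint byte range of one shared file. Because the ranges do not overlap, GPFS's distributed byte-range locking lets the writes proceed in parallel with no application-level coordination.

```c
/* Minimal sketch: N MPI ranks write disjoint 1 MiB stripes of one shared
 * file on a GPFS mount.  The mount point /gpfs/fs1 is a placeholder path.
 * GPFS grants each node a byte-range token for its stripe, so the writes
 * flow from all nodes to all disks in parallel. */
#include <mpi.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define STRIPE (1024 * 1024)          /* 1 MiB written by each rank */

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Every rank opens the same file; O_CREAT is harmless if it exists. */
    int fd = open("/gpfs/fs1/shared.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); MPI_Abort(MPI_COMM_WORLD, 1); }

    char *buf = malloc(STRIPE);
    memset(buf, rank & 0xff, STRIPE);

    /* Disjoint byte ranges: rank i owns [i*STRIPE, (i+1)*STRIPE). */
    if (pwrite(fd, buf, STRIPE, (off_t)rank * STRIPE) != STRIPE)
        perror("pwrite");

    close(fd);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Since each node's byte range is disjoint, its byte-range token can be granted once and held, so lock traffic stays on the control path and does not grow with the amount of data written.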

Future trends
- Clusters growing larger and cheaper
  - More nodes (Linux), larger SMP nodes (AIX)
  - More storage, lower-cost storage, larger files and file systems (multiple petabytes)
  - More sites with multiple clusters
- Storage fabrics growing larger (more ports)
- Variety of storage fabric technologies
  - Fibre Channel, cluster switch, 1000bT, 10000bT, InfiniBand, ...
- Technology blurs the distinction among SAN, LAN, and WAN
  - Why can't I use whatever fabric I have for storage?
  - Why do I need separate fabrics for storage and communication?
  - Why can't I share files across my entire SAN like I can with Ethernet?
  - Why can't I access storage over long distances?

Data access beyond the cluster
- Problem: in a large data center, multiple clusters and other nodes need to share data over the SAN
- Solution: eliminate the notion of a fixed cluster
  - Control nodes for administration, managing locking, recovery, ...
  - File access from client nodes
    - Client nodes authenticate to control nodes to mount a file system
    - Client nodes are trusted to enforce access control
    - Clients still directly access disk data and metadata (see the sketch below)
- Issues:
  - Scalability (10,000 nodes)
  - Can no longer base fault tolerance on other clustering software
  - Administration (parallel?)

[Diagram: client nodes in Cluster 1, Cluster 2, and a visualization system reach the control nodes over IP and the shared storage over the SAN]
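The mount-then-direct-access flow can be summarized in a short sketch. Every function name below is hypothetical (the stubs only print the step they stand for) and none of them belongs to any GPFS API; the point is only the ordering of control-path and data-path steps.

```c
/* Illustrative sketch of the "data access beyond the cluster" flow.
 * All names are hypothetical stubs, not GPFS interfaces. */
#include <stdio.h>

/* 1. Client authenticates to a control node to mount the file system. */
static int authenticate_and_mount(const char *fs_name)
{
    printf("authenticate to a control node and mount %s\n", fs_name);
    return 1;                        /* pretend mount handle */
}

/* 2. Client obtains lock/metadata state from the control nodes. */
static void acquire_tokens(int mount_id, const char *path)
{
    printf("mount %d: acquire byte-range and metadata tokens for %s\n",
           mount_id, path);
}

/* 3. Client reads data and metadata blocks directly from the SAN;
 *    the client itself is trusted to enforce access control. */
static void read_blocks_from_san(int mount_id, const char *path)
{
    printf("mount %d: read blocks for %s directly from the shared disks\n",
           mount_id, path);
}

int main(void)
{
    int m = authenticate_and_mount("/bigfs");
    acquire_tokens(m, "/bigfs/data/file1");
    read_blocks_from_san(m, "/bigfs/data/file1");
    return 0;
}
```

Only the first two steps involve the control nodes; the data path in the last step goes straight to the shared disks, which is what lets the design scale beyond a fixed cluster.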

File I/O over Fibre Channel SAN
SDSC TeraGrid cluster:
- 128 IA64 compute nodes
- 48 Sun StorEdge RAID LUNs
- 14 TB file system
- 4 Brocade Fibre Channel switches
- Flow control would help further scaling

[Chart: "Throughput - GPFS at SDSC"; write and read throughput in KB/sec (0 to ~1,600,000) versus number of nodes (0 to 120)]
(A measurement sketch in the spirit of this benchmark follows below.)
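A curve like this comes from timing a fixed amount of I/O on every node and reporting the aggregate rate. The sketch below is only illustrative and is not the gpfsperf source; the mount point /gpfs/fs1 and the 256 MiB per-rank transfer size are placeholder assumptions.

```c
/* Sketch of an aggregate-throughput measurement across MPI ranks.
 * Placeholder assumptions: /gpfs/fs1 mount point, 256 MiB per rank. */
#include <mpi.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PER_RANK (256L * 1024 * 1024)   /* bytes written by each rank */
#define CHUNK    (4 * 1024 * 1024)      /* request size */

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    char path[64];
    snprintf(path, sizeof path, "/gpfs/fs1/perf.%d", rank);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); MPI_Abort(MPI_COMM_WORLD, 1); }

    char *buf = malloc(CHUNK);
    memset(buf, 0xab, CHUNK);

    MPI_Barrier(MPI_COMM_WORLD);         /* start all ranks together */
    double t0 = MPI_Wtime();
    for (long off = 0; off < PER_RANK; off += CHUNK)
        if (write(fd, buf, CHUNK) != CHUNK) perror("write");
    fsync(fd);
    close(fd);
    double elapsed = MPI_Wtime() - t0, slowest;

    /* Aggregate rate = total bytes / elapsed time of the slowest rank. */
    MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d nodes: %.0f KB/sec aggregate write\n",
               nranks, (double)PER_RANK * nranks / slowest / 1024.0);

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Running the same job at increasing node counts produces the throughput-versus-nodes curve shown on the slide.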

GPFS and TeraGrid
- TeraGrid: NCSA, SDSC, ANL, Caltech, PSC, ...
  - Shared computing grid
  - 40+ Gb/s backbone
- Goal: sharing data over the backbone
  - GPFS data center solution, scaled over the WAN
  - IP for storage access adds 10-60 ms, but under load, storage latency is much higher than this anyway!
- Additional issues:
  - Decentralized administration (UIDs)
  - Globus security
  - Single name space
- Joint work in progress:
  - Pluggable access control
  - Name space
  - Technology demo at SC03

[Diagram: three sites connected by SCinet: SDSC (compute nodes, NSD servers, SAN, /SDSC GPFS file system), NCSA (compute nodes, NSD servers, SAN, /NCSA GPFS file system), and the SC03 show floor (compute nodes, NSD servers, SAN, /SC2003 GPFS file system, visualization); each file system is mounted locally over the SAN and remotely over the WAN]

File I/O over WAN
SDSC SC03 demo cluster:
- 40 IA64 compute nodes
- 60 Sun StorEdge RAID controllers
- 75 TB file system
- 4 Brocade Fibre Channel switches
- 16 IA64 NSD servers, 1 FC and 1 GbE each
- 10 Gb SCinet WAN link

[Chart: "gpfsperf read and write, 10 GB file on N nodes, /sdsc"; write and read throughput in KB/sec (0 to ~1,200,000) versus number of nodes (0 to 36)]

- Inconsistent write results because the link was being shared
- GPFS I/O parallelism successfully hides WAN latency (see the sketch below)
- TCP flow control appears to adequately prevent throughput fall-off
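Why parallel I/O hides the WAN latency follows from the bandwidth-delay product: at 10 Gb/s and a 60 ms round trip, about 75 MB must be in flight to keep the link full. A back-of-the-envelope sketch (the 1 MB request size is an illustrative assumption, not a measured value):

```c
/* Bandwidth-delay product for the SC03 WAN demo.
 * Assumed inputs: 10 Gb/s link, 60 ms round trip, 1 MB requests. */
#include <stdio.h>

int main(void)
{
    double link_bps = 10e9;            /* 10 Gb/s SCinet WAN link */
    double rtt_s    = 0.060;           /* 60 ms round-trip time  */
    double req_b    = 1e6;             /* 1 MB per outstanding request */

    double in_flight_bytes = link_bps * rtt_s / 8.0;   /* ~75 MB */
    double outstanding     = in_flight_bytes / req_b;  /* ~75 requests */

    printf("bytes in flight to fill the link: %.0f MB\n",
           in_flight_bytes / 1e6);
    printf("outstanding 1 MB requests needed: %.0f\n", outstanding);
    printf("spread over 36 client nodes: ~%.1f requests per node\n",
           outstanding / 36.0);
    return 0;
}
```

Spread over a few dozen client nodes, that is only a couple of outstanding requests per node, a level of pipelining that prefetch and write-behind can readily supply, which is consistent with the throughput holding up over the WAN.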

Issues for future storage
- Tertiary storage
  - Tape $/MB/sec >> disk $/MB/sec
  - Petabyte file systems present problems for backup, archive, and HSM
  - Disk cost and longevity are not yet quite sufficient to replace tape
- Low-cost storage
  - ATA drives are much cheaper than server (SCSI, FC) drives, BUT
  - ATA drives are NOT the same as server drives:
    - MTBF specified at low duty cycle
    - Vibration sensitivity
    - ATA hard error rate 10x that of server drives (see the sketch below)
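The hard-error gap matters at petabyte scale. The sketch below works through the probability of hitting at least one unrecoverable read error when reading an entire drive, using assumed typical specifications of the period (one error per 10^14 bits for ATA, one per 10^15 bits for server-class drives, 250 GB capacity); these numbers are illustrative assumptions, not figures from the slide.

```c
/* Rough illustration of why a 10x hard-error-rate gap matters at scale.
 * Assumed (typical-of-the-era, not from the slide) specs:
 *   ATA:    1 unrecoverable error per 1e14 bits read
 *   server: 1 unrecoverable error per 1e15 bits read
 *   drive capacity: 250 GB                              */
#include <math.h>
#include <stdio.h>

static double p_error_full_read(double bits_per_error, double capacity_bytes)
{
    double bits = capacity_bytes * 8.0;
    /* P(at least one error) = 1 - (1 - 1/rate)^bits, via exp/log1p */
    return 1.0 - exp(bits * log1p(-1.0 / bits_per_error));
}

int main(void)
{
    double cap = 250e9;                          /* 250 GB drive */
    printf("ATA    (1e14 bits/error): %.1f%% chance of a hard error per full read\n",
           100.0 * p_error_full_read(1e14, cap));
    printf("server (1e15 bits/error): %.1f%% chance of a hard error per full read\n",
           100.0 * p_error_full_read(1e15, cap));
    return 0;
}
```

Roughly a 2% chance per full read of an ATA drive versus 0.2% for a server drive; multiplied across the thousands of drives in a petabyte file system, or across a multi-drive RAID rebuild, unrecoverable errors stop being rare events, which is why the 10x gap is more than a spec-sheet detail.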