Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments


LCI HPC Revolution 2005, 26 April 2005

Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments
Matthew Woitaszek
matthew.woitaszek@colorado.edu

Collaborators
Organizations:
- National Center for Atmospheric Research (NCAR)
- University of Colorado, Boulder (CU)
Researchers:
- Jason Cope
- Michael Oberg
- Henry Tufo

Outline
- Motivation
- Parallel filesystem products and experiences: PVFS2, Lustre, GPFS, TerraGrid FS
- Results: single-node performance, parallel bandwidth and metadata performance
- Future work

Related Work (LLNL, NCSA, SDSC)
- Directly involved with Cluster File Systems and Lustre
- Goal: create a filesystem not restricted to specific hardware
- Exploring the breadth of available parallel filesystems
- Integration with mass storage and TeraGrid systems
- Examined GPFS, GFS, Panasas, SamFS, SGI CXFS, an ADIC solution, Lustre, and IBRIX
- LCI 2004: examined PVFS, Lustre, and GPFS on IA-64
- Focused on a homogeneous architecture with fibre-channel-equipped storage servers

Motivation: NCAR Storage Systems (architecture diagram)
- Supercomputers and clusters with local working storage
- Archival storage: tape silo system, archive management and disk cache controller
- Visualization systems with local working storage
- Grid gateway: GridFTP server, DataMover server
- Shared storage cluster with shared filesystem

Motivation: Current CU Boulder Systems (architecture diagram)
- NFS servers: home directories, shared software, working space
- Compute clusters: Xeon cluster (64), PPC970 cluster (28)

Motivation: Future CU Boulder Systems (architecture diagram)
- CU storage cluster
- NCAR experimental platforms
- CU compute clusters: Xeon cluster (64), PPC970 cluster (28), Opteron cluster (128)

Parallel Filesystem Features
Typical desired features:
- Availability (no downtime)
- Reliability (no data loss)
- Performance (no waiting)
- Scalability (no limits)
- Affordability (no cost)
We're starting small, but we want our filesystem to grow with us:
- Two storage servers at the present time
- Support expansion and external connectivity

Research Objectives
Find a high-performance parallel filesystem:
- Minimum of specialized hardware
- Commodity servers with directly attached disk
- Filesystem access over Ethernet
Support a heterogeneous cluster client environment:
- Servers: Xeon, Xeon EM64T
- Clients: PPC970, Xeon, and eventually Opteron
Examine features and requirements:
- Functionality
- Performance
- Administrative overhead

Filesystems: Overview and Experience
We examined cluster-based filesystems:
- PVFS2
- Lustre
- GPFS
- TerraFS
We did not examine SAN solutions:
- SAN solutions require expensive hardware
- NCAR uses SANs as collective storage among hosts but not between supercomputers
- Separate NCAR evaluation team
At SC2004, neither GPFS nor Lustre supported Xeon and PPC970 heterogeneous client environments.

Experience: PVFS2
Installation and configuration:
- Compile kernel module on clients only; no restrictions
- Two storage servers and one metadata server
- Very easy to install and configure
- Worked on our original systems with no kernel changes
- Stable and reliable parallel filesystem in our environment

Experience: Lustre
Experience in phases:
- Phase 1: trying to build our own kernel patches
- Phase 2: using the pre-built Lustre kernels
- Phase 3: using a custom Lustre kernel
Final configuration:
- Required changing the Xeon cluster to SLES 9
- Custom Lustre PPC-enabled kernels using SLES 9
- Two object storage targets and one metadata server
- Final phase worked on all systems in our environment
- Very reliable on Xeon; less reliable, with performance variances, on PPC970

Experience: GPFS
Installation and configuration:
- Compile kernel module on all machines; very restricted
- Quick and pleasant out-of-box experience
- Exceptionally well documented and robust
Final configuration:
- Required removing LVM on the storage servers
- Required changing the Xeon cluster to SLES 9
- Two NSD storage servers
- Worked on all of the clusters in our environment

TerraGrid (TerraFS)
Components (layer diagram: Linux VFS and iSCSI initiator on the client; TerraFS daemon, Linux filesystem, Linux md, and iSCSI target on the server):
- iSCSI initiator on clients
- Cache-coherent iSCSI target daemon (the TerraFS daemon)
- Linux md (multi-device) software SCSI
- Kernel-level ext2-derivative filesystem
Word sense disambiguation:
- Official product name is TerraGrid
- Frequently abbreviated as TerraFS

Experience: TerraFS
Installation and configuration:
- Initial install performed by TerraScale engineers, replicated on additional nodes
- Documentation and software differ slightly (md RAID support)
Final configuration:
- Required custom TerraScale-built kernel
- Two storage targets, no metadata server
- Worked only on the Xeon cluster
- Generates lots of error messages in failure conditions
- No current support for PPC970

Table of Administrator Pain and Agony

                               GPFS 2.3                Lustre 1.4.0            PVFS2                TerraFS
Intel x86-64 metadata server   Not Used                Restricted SLES .141    No Change            Not Used
Intel Xeon storage server      Restricted SLES .111    Restricted SLES .141    No Change            No Change
PPC970 client                  Restricted SLES .111    Restricted SLES .141    All (Module Only)    N/A
Intel Xeon client              Restricted SLES .111    Restricted SLES .141    All (Module Only)    Custom 2.4.26 Patch

(Our Xeon cluster is only 2.5 years old.)

- GPFS required a commercial OS and a specific kernel version
- Lustre required a commercial OS and a specific kernel patch
- TerraFS required a custom kernel
- Systems already running SuSE required less effort
- Original goal was to fit the filesystem into the environment

Performance: Experimental Setup
Storage servers:
- Dual Xeon 3.06 GHz, 2.5 GB RAM
- SCSI-320 disk array, 4 x 400 GB LVM partitions
Metadata server (optional):
- Dual Xeon EM64T 3.4 GHz, 8 GB RAM
Clients and network:
- Xeon cluster (7 or 14 nodes), PPC970 cluster (14 or 27 nodes)
- Core switch, dual 1 Gbps trunked/bonded links
Other impromptu independent variables:
- Impact of Linux channel bonding on servers (PVFS2, Lustre)
- Impact of Linux Logical Volume Management (Lustre)
One disclaimer: GPFS was not run with LVM.

Performance Results I: Single-Node Bandwidth
CU workload characteristics:
- Clusters utilized as a compute farm
- 75% of jobs are serial (33% of compute time)
Used iozone to measure single-node performance.
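
The single-node numbers were collected with iozone; the study's actual run parameters are not reproduced here. As a rough illustration of what such a sequential-bandwidth measurement does, the C sketch below times a streaming write and re-read of one large file. The file path, file size, and transfer size are arbitrary choices for illustration, not the iozone settings used in the study.

```c
/* Minimal sequential-bandwidth sketch (illustration only; the study used iozone).
 * Writes and re-reads one large file and reports MB/s. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define FILE_SIZE_MB 1024          /* assumed test size */
#define BLOCK_SIZE   (1024 * 1024) /* 1 MiB transfer size */

static double now(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv) {
    const char *path = (argc > 1) ? argv[1] : "/mnt/parallel/bw_test.dat"; /* assumed mount */
    char *buf = malloc(BLOCK_SIZE);
    memset(buf, 'x', BLOCK_SIZE);

    /* Timed streaming write */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    double t0 = now();
    for (int i = 0; i < FILE_SIZE_MB; i++)
        if (write(fd, buf, BLOCK_SIZE) < 0) { perror("write"); break; }
    fsync(fd);                      /* include flush time in the measurement */
    close(fd);
    printf("write: %.1f MB/s\n", FILE_SIZE_MB / (now() - t0));

    /* Timed streaming re-read (client cache effects are not controlled here) */
    fd = open(path, O_RDONLY);
    t0 = now();
    while (read(fd, buf, BLOCK_SIZE) > 0)
        ;
    close(fd);
    printf("read:  %.1f MB/s\n", FILE_SIZE_MB / (now() - t0));

    free(buf);
    return 0;
}
```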

Single Node Read Performance (chart: bandwidth for NFS, PVFS2, Lustre, TerraFS, GPFS, and local disk)

Single Node Write Performance (chart: bandwidth for NFS, PVFS2, Lustre, TerraFS, GPFS, and local disk)

Performance Results II: Aggregate Bandwidth
NCAR caggreio benchmark:
- Used by NCAR for previous procurements
- Writes 30 x 128 MB files, a separate file per process
- Does not measure concurrent-writer performance
Measures average aggregate bandwidth (see the sketch below):
- Each process runs independently and is timed
- Average time is used to produce bandwidth
Examined channel bonding variants:
- Lustre: no improvement
- PVFS2: substantial improvement
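
caggreio is an NCAR benchmark and its source is not reproduced here. The MPI sketch below only mirrors the methodology described on the slide: each rank writes its own 128 MB file, each rank is timed independently, and the average per-rank time is converted to an aggregate bandwidth. The file path and the 1 MiB buffer size are illustrative assumptions.

```c
/* Sketch of the aggregate-bandwidth methodology described above (not the
 * NCAR caggreio source): one file per MPI rank, independent timing,
 * average time converted to an aggregate bandwidth. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FILE_MB 128
#define CHUNK   (1 << 20)   /* 1 MiB writes */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char path[256];
    snprintf(path, sizeof(path), "/mnt/parallel/caggre_%04d.dat", rank); /* assumed path */
    char *buf = malloc(CHUNK);
    memset(buf, 'x', CHUNK);

    MPI_Barrier(MPI_COMM_WORLD);            /* start everyone together */
    double t0 = MPI_Wtime();
    FILE *fp = fopen(path, "wb");
    for (int i = 0; i < FILE_MB; i++)
        fwrite(buf, 1, CHUNK, fp);
    fclose(fp);
    double elapsed = MPI_Wtime() - t0;      /* each rank is timed independently */

    /* Average the per-rank times, then report aggregate bandwidth. */
    double sum;
    MPI_Reduce(&elapsed, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        double avg = sum / nprocs;
        printf("aggregate write: %.1f MB/s (%d ranks, avg %.2f s each)\n",
               nprocs * FILE_MB / avg, nprocs, avg);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```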

Xeon Cluster Aggregate Read Rate (chart)

Xeon Cluster Aggregate Write Rate (chart)

PPC970 Cluster Aggregate Read Rate (chart)

PPC970 Cluster Aggregate Write Rate (chart)

Performance Results III: Metadata Testing
NCAR metarates benchmark (a sketch of the methodology follows below):
- Used by NCAR for previous procurements
- Writes 10,000 files per task
- Places files in a single directory or in unique directories
- Measures average file creation rate
No GPFS results on the PPC970 cluster:
- GPFS was functional and was tested
- Unable to select balanced nodes for testing
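
metarates is an NCAR benchmark and its source is not reproduced here. The MPI sketch below only mirrors the described methodology: each task creates many empty files, either in one shared directory or in its own directory, and the average creation rate is reported. Directory paths and the command-line switch are illustrative assumptions.

```c
/* Sketch of the metadata-rate methodology described above (not the NCAR
 * metarates source): each MPI task creates NFILES empty files in a shared
 * or per-task directory and the aggregate creation rate is reported. */
#include <mpi.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define NFILES 10000

int main(int argc, char **argv) {
    int rank, nprocs, unique = (argc > 1);   /* any argument => unique directories */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char dir[256], path[320];
    if (unique) {
        snprintf(dir, sizeof(dir), "/mnt/parallel/meta_%04d", rank);  /* assumed path */
        mkdir(dir, 0755);
    } else {
        snprintf(dir, sizeof(dir), "/mnt/parallel/meta_shared");      /* assumed path */
        if (rank == 0) mkdir(dir, 0755);
    }
    MPI_Barrier(MPI_COMM_WORLD);

    double t0 = MPI_Wtime();
    for (int i = 0; i < NFILES; i++) {
        snprintf(path, sizeof(path), "%s/f_%04d_%05d", dir, rank, i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);   /* creation is what we time */
        close(fd);
    }
    double elapsed = MPI_Wtime() - t0;

    double sum;
    MPI_Reduce(&elapsed, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%s: %.0f creates/s aggregate\n",
               unique ? "unique dirs" : "same dir",
               (double)NFILES * nprocs / (sum / nprocs));

    MPI_Finalize();
    return 0;
}
```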

Metadata Creation Rate, Same Directory (chart: NFS, PVFS2, Lustre, TerraFS, GPFS)

Metadata Creation Rate, Unique Directories (chart: NFS, PVFS2, Lustre, TerraFS, GPFS)

GPFS Metarates (chart: file creations per second with a unique directory for each task)

Linux Logical Volume Management (LVM): there's always something
- GPFS was the last system we tested
- GPFS cannot run on top of LVM devices
- We used LVM with every other filesystem
- Lustre and GPFS demonstrated close bandwidth results
Question: did Linux Logical Volume Management affect Lustre's performance?
Conclusion:
- LVM has no statistically significant impact on Lustre reads (the 95% confidence intervals overlap)
- Xeon cluster writes are faster without LVM on the servers
- PPC970 cluster writes are inconclusive
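
As an illustration of the overlap check mentioned above, the sketch below computes an approximate 95% confidence interval for two sets of bandwidth samples and tests whether the intervals overlap. The sample numbers are hypothetical and the 1.96 factor is a normal approximation; the paper's exact statistical procedure and data are not reproduced here.

```c
/* Illustration of a 95% confidence-interval overlap check (hypothetical
 * data; not the study's measurements or exact statistical method). */
#include <math.h>
#include <stdio.h>

static void ci95(const double *x, int n, double *lo, double *hi) {
    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= n;
    for (int i = 0; i < n; i++) var += (x[i] - mean) * (x[i] - mean);
    var /= (n - 1);
    double half = 1.96 * sqrt(var / n);   /* normal approximation, an assumption */
    *lo = mean - half;
    *hi = mean + half;
}

int main(void) {
    /* Hypothetical read-bandwidth samples (MB/s) with and without LVM. */
    double with_lvm[]    = {178, 182, 175, 181, 179, 184};
    double without_lvm[] = {180, 185, 177, 183, 181, 186};
    double lo1, hi1, lo2, hi2;

    ci95(with_lvm, 6, &lo1, &hi1);
    ci95(without_lvm, 6, &lo2, &hi2);
    printf("with LVM:    [%.1f, %.1f] MB/s\n", lo1, hi1);
    printf("without LVM: [%.1f, %.1f] MB/s\n", lo2, hi2);
    printf("intervals %s\n", (lo1 <= hi2 && lo2 <= hi1) ? "overlap" : "do not overlap");
    return 0;
}
```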

Future Work
Production filesystem installation:
- Dedicate 2/3 of the server space to GPFS
- Reserve 1/3 of the server space for Lustre
- Subject the filesystem to our user community
MPI-IO concurrent write performance testing (a sketch of such a test appears below)
Wide-area network filesystem access
Examine higher performance and heterogeneous interconnects:
- InfiniBand, 10 Gbps Ethernet, Gigabit Ethernet
- Single-network solution not possible
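
The future-work item on MPI-IO concurrent writes could be exercised by a test like the minimal sketch below, in which all ranks collectively write disjoint regions of one shared file. The file path and per-rank block size are assumptions; this is not a benchmark from the study.

```c
/* Minimal sketch of an MPI-IO concurrent-write test of the kind mentioned
 * under future work: each rank writes its own disjoint block of a shared
 * file with a collective call. Path and block size are assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK (4 << 20)   /* 4 MiB per rank */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *buf = malloc(BLOCK);
    memset(buf, rank & 0xff, BLOCK);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/mnt/parallel/shared_write.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    double t0 = MPI_Wtime();
    /* Each rank writes its own disjoint region of the shared file. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK;
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    double elapsed = MPI_Wtime() - t0;

    double maxt;
    MPI_Reduce(&elapsed, &maxt, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("concurrent write: %.1f MB/s aggregate\n",
               (double)nprocs * BLOCK / (1 << 20) / maxt);

    free(buf);
    MPI_Finalize();
    return 0;
}
```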

Desired Features in a Production Filesystem
Remain responsive even in failure conditions:
- Filesystem failure should not interrupt standard UNIX commands used by administrators
- ls -la /mnt or df should not hang the console
- Zombies should respond to kill -s 9
Support clean normal and abnormal termination:
- Support both service start and shutdown commands
- Provide an "emergency stop" feature
- Never hang the Linux reboot command
- Cut losses and let the administrators fix things
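
The responsiveness requirement above can be approximated operationally with a watchdog-style probe. The sketch below, which is an illustration rather than anything from the paper, checks a mount point in a child process and gives up after a timeout, so a hung filesystem cannot hang the monitoring step itself; the mount path and timeout are assumptions.

```c
/* Illustration of the responsiveness requirement above (not from the paper):
 * probe a mount point in a child process and give up after a timeout. */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/statvfs.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TIMEOUT_SEC 10   /* assumed acceptable response time */

int main(int argc, char **argv) {
    const char *mnt = (argc > 1) ? argv[1] : "/mnt/parallel";   /* assumed mount */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the statvfs() call is what may block on a hung filesystem. */
        struct statvfs sv;
        _exit(statvfs(mnt, &sv) == 0 ? 0 : 1);
    }

    /* Parent: poll for up to TIMEOUT_SEC, then declare the mount unresponsive. */
    int status;
    for (int waited = 0; waited < TIMEOUT_SEC; waited++) {
        if (waitpid(pid, &status, WNOHANG) == pid) {
            printf("%s: %s\n", mnt,
                   WIFEXITED(status) && WEXITSTATUS(status) == 0 ? "responsive" : "error");
            return 0;
        }
        sleep(1);
    }
    kill(pid, SIGKILL);
    /* A probe stuck in uninterruptible I/O may not die promptly; that is
     * exactly the "zombies should respond to kill" complaint above. */
    waitpid(pid, &status, WNOHANG);
    printf("%s: unresponsive after %d s\n", mnt, TIMEOUT_SEC);
    return 1;
}
```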

Conclusions
- Heterogeneous client support is a recent feature; expect full out-of-box capabilities in the next calendar year with GPFS, PVFS2, and Lustre
- Specific kernel dependencies and custom kernel patch implementations are a substantial inconvenience
- Parallel filesystem selection depends on individual site requirements and capabilities:
  - Increased cost (operating system support contracts)
  - Decreased research flexibility
  - Delay when applying security patches
- Looking forward to Steve Woods' Lustre presentation

Acknowledgements
- Cluster File Systems (Lustre): Jeffrey Denworth, Phil Schwan, and Jacob Berkman
- IBM (GPFS): Ray Paden, Gautam Shah, Barry Bolding, and Rajiv Bendale
- NCAR: Bill Anderson, Pam Gillman, George Fuentes, and Rich Loft
- Terrascale Technologies (TerraFS): Tim Wilcox and Dave Jensen
- University of Colorado, Boulder: Theron Voran

Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments

Questions?

Matthew Woitaszek
matthew.woitaszek@colorado.edu