Parallel File Systems
John White, Lawrence Berkeley National Lab
Topics
- Defining a File System
- Our Specific Case for File Systems
- Parallel File Systems
- A Survey of Current Parallel File Systems
- Implementation
What Is a File System?
Simply, a method for ensuring:
- A Unified Access Method to Data
- Organization (in a technical sense)
- Data Integrity
- Efficient Use of Hardware
The HPC Application (Our Application)
- Large Node Count
- High-IO Code (small file operations)
- High-Throughput Code (large files, fast)
- You Can Never Provide Too Much Capacity
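The two code profiles above stress a file system very differently. A minimal sketch (illustrative sizes and hypothetical helper names, not anything from the talk) of the two POSIX access patterns:

```python
# High-IO code issues many small operations; high-throughput code issues
# few large ones. Same bytes on disk, very different metadata/RPC load.

def write_small_ops(path: str, n_records: int = 1000, record: bytes = b"x" * 64):
    """Many small writes: one buffered write call per record."""
    with open(path, "wb") as f:
        for _ in range(n_records):
            f.write(record)

def write_large_op(path: str, n_records: int = 1000, record: bytes = b"x" * 64):
    """One large write: the same data in a single call."""
    with open(path, "wb") as f:
        f.write(record * n_records)
```

On a parallel file system the second pattern is generally the one that approaches the advertised aggregate throughput; the first is dominated by per-operation overhead.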
What's the Problem With Tradition?
- NFS/CIFS/AFP/NAS is slow:
  - Single point of contact for both data and metadata
  - Protocol overhead
  - File-based locking
- We want parallelism from the application all the way to disk
- We need a single namespace
- We need truly massive aggregate throughput (stop thinking MB/s)
- Bottlenecks are inherent to the architecture
- Most importantly:
Researchers Just Don't Care
- They want their data available everywhere
- They hate transferring data (this bears repeating)
- Their code wants the data several cycles ago
- If they have to learn new IO APIs, they commonly won't use them, period
- An increasing number aren't aware their code is inefficient
Performance in Aggregate: A Specific Case
- File system capable of 5 GB/s
- Researcher running an analysis of past stock ticker data
  - 10 independent processes per node, 10+ nodes, sometimes 1000+ processes
- Was running into performance issues
- In reality, the code was hitting 90% of the file system's peak performance
  - 100s of processes choking each other
- Efficiency is key
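The arithmetic behind this case is worth spelling out: aggregate bandwidth is fixed, so adding processes divides it rather than multiplying it. A sketch with the 5 GB/s figure from the example (the even-share assumption is a best case; real contention is worse):

```python
# Best-case even split of a fixed aggregate bandwidth across N processes.
AGGREGATE_GBPS = 5.0  # file system peak from the example above

def per_process_share(gbps_total: float, nprocs: int) -> float:
    """Ideal even share of aggregate bandwidth per process, in GB/s."""
    return gbps_total / nprocs

for nprocs in (10, 100, 1000):
    share_mbps = per_process_share(AGGREGATE_GBPS, nprocs) * 1024
    print(f"{nprocs:>5} processes -> ~{share_mbps:.1f} MB/s each (best case)")
```

At 1000 processes each one sees only a few MB/s even when the file system is running at peak, which is exactly how "hitting 90% of peak" and "performance issues" coexist.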
Parallel File Systems
A file system that provides access to massive amounts of data at large client counts:
- Simultaneous client access at sub-file levels
- Striping at sub-file levels
- Massive scalability
- A method to aggregate large numbers of disks
Popular Parallel File Systems: Lustre
- Purchased by Intel
- Support offerings from Intel, Whamcloud, and numerous vendors
- Object based
- Growing feature list:
  - Information Lifecycle Management
  - Wide-area mounting support
  - Data replication and metadata clustering planned
- Open source
- Large and growing install base, vibrant community
- Open compatibility
Popular Parallel File Systems: GPFS
- IBM; born around 1993 as the Tiger Shark multimedia file system
- Support direct from the vendor
- AIX, Linux, some Windows
- Ethernet and InfiniBand support
- Wide-area support
- ILM
- Distributed metadata and locking
- Mature storage pool support
- Replication
Licensing Landscape
- GPFS (a story of a huge feature set at a huge cost)
  - Binary IBM licensing
  - Per core or site-wide
- Lustre
  - Open
  - Paid licensing available, tied to support offerings
Striping Files
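The idea of striping can be made concrete with a little arithmetic: a file is cut into fixed-size stripes laid out round-robin across storage targets (Lustre OSTs, for example), so any byte offset maps to a specific target. A minimal sketch, with illustrative stripe size and count rather than any system's defaults:

```python
# Sub-file striping: stripe i of a file lives on target (i mod stripe_count).
# Clients can then read/write different stripes of one file in parallel.

def locate(offset: int, stripe_size: int = 1 << 20, stripe_count: int = 4):
    """Map a byte offset to (target index, byte offset within that target)."""
    stripe_index = offset // stripe_size           # which stripe overall
    target = stripe_index % stripe_count           # round-robin target choice
    # Bytes this target already holds from earlier full rounds:
    prior_full_rounds = stripe_index // stripe_count
    target_offset = prior_full_rounds * stripe_size + offset % stripe_size
    return target, target_offset

print(locate(0))                    # -> (0, 0): byte 0 is on target 0
print(locate(1 << 20))              # -> (1, 0): stripe 1 starts on target 1
print(locate(5 * (1 << 20) + 10))   # -> (1, 1048586): stripe 5, second round
```

Because consecutive stripes land on different targets, a large sequential read fans out across all of them at once, which is where the aggregate throughput comes from.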
SAN
- All nodes have access to the storage fabric and all LUNs
Direct Connect
- A separate storage cluster hosts the file system and exports it via a common fabric
Berkeley Research Computing: Savio Scratch File System
- Current:
  - Lustre 2.5
  - 210 TB of DDN 9900
  - ~10 GB/s ideal throughput
  - Accessible on all nodes
- Future:
  - Lustre 2.5 or GPFS 4.1
  - ~1 PB+ capacity
  - ~20 GB/s throughput
  - Vendor yet to be determined
Berkeley Research Computing: Access Methods
- Available on every node:
  - POSIX
  - MPI-IO
- Data transfer:
  - Globus Online
    - Ideal for large transfers
    - Restartable
    - Tuned for large networks and long distances
    - Easy-to-use graphical interface online
  - SCP/SFTP
    - Well known
    - Suitable for quick-and-dirty transfers
Current Technological Landscape
- Tiered storage (storage pools): for multiple storage needs within a single namespace
  - SSD/FC for jobs and metadata (Tier 0)
  - SATA for capacity (Tier 1)
  - Tape for long-term/archival (Tier 2)
- ILM: basically, perform actions on data per a rule set
  - Migration to tape
  - Fast Tier 0 storage use case
  - Purge policies
  - Replication
  - Dangers of metadata operations
  - Long-term storage
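An ILM rule set can be pictured as a predicate run over file metadata. A hedged sketch of one such rule, "flag files not modified in N days as purge candidates" (a userspace walk for illustration only; real deployments let the file system's policy engine do this scan, since walking metadata at scale is exactly the danger noted above):

```python
# Illustrative purge-policy rule: select files whose mtime is older than
# a cutoff. Function and parameter names here are ours, not any product's.
import os
import time

def purge_candidates(root: str, max_age_days: float):
    """Yield paths under root not modified within max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                yield path
```

The same rule shape covers the other ILM actions listed above: swap "yield for purge" for "migrate to the tape tier" or "replicate to another pool" and the policy machinery is otherwise identical.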
Further Information
- Berkeley Research Computing: http://research-it.berkeley.edu/brc
- HPCS at LBNL: http://scs.lbl.gov/
- Email: jwhite@lbl.gov