Lustre: A Platform for Intelligent Scale-Out Storage

Lustre: A Platform for Intelligent Scale-Out Storage
Rumi Zahir, rumi.zahir@intel.com
May 2003

Agenda
- Problem Statement
- Trends & Current Data Center Storage Architectures
- The Lustre File System Project
- Intelligent Storage
- Discussion

Problem Statement
Goal: a scalable, shared, coherent, persistent file store built from commodity components that appears to its users as a single big system.
- Scale I/O bandwidth/latency, data availability and capacity
- Reduce administration cost by supporting on-line data migration, load balancing and incremental hot add/remove of sub-components
- Proactive intelligent storage to accelerate data retrieval and indexing functions

Enterprise Trend: Storage Moving Out of the Box
Why:
- Fast processors need lots of disks to keep them busy, and lots of disks won't fit in the box
- More spindles → lower latency, higher bandwidth
- Storage access protocols are already message based (e.g. SCSI, NFS)
- Disk latency is high enough that increased distance does not hurt
- Specialized software on storage boxes can be optimized, and makes them easier to manage
- Storage needs to be shared

Current Storage Architectures
- Network-Attached Storage (NAS): file servers with specialized, easy-to-manage software; application servers reach NAS storage over Ethernet (NFS, CIFS)
- Storage Area Networks (SAN): pooled block storage, but no concurrent sharing; application servers reach SAN disks over Fibre Channel (FCP)

Storage Scalability Limiters
Network-Attached Storage (NAS):
- File-sharing protocols (e.g. NFS, CIFS) lack support for file striping across servers
- Write-through caching (poor write performance)
- Synchronous metadata updates serialize directory ops; e.g. updating the access time on every read could result in a file server op (NFS relaxes this)
- Name space (server:disk:partition:file) encodes location
- NAS file servers are performance and management bottlenecks
Storage Area Networks (SAN):
- Block-based storage abstraction shares disk space, but there is no concurrent file sharing between clients
- Distributed databases use distributed lock managers to share access to block-level storage devices

The Lustre Project
Goal: develop a scalable object-based file system with cluster-wide POSIX semantics for Linux
- Scalability targets: 10,000 clients, 1,000 OSTs, 10 MDSs
A collaborative 3-year Linux open-source project:
- Cluster File Systems, Inc.: Peter Braam is Lustre Architect & Technical Project Lead; strong Linux team with ext3 file system experience
- HP Network Storage Systems Operation: project management, testing & productization
- Intel: contributes instrumentation & performance analysis, storage targets
- National Labs (Livermore, Los Alamos, Sandia): 3-year R&D funding, large clusters

Cluster File Systems (Gradual Evolution)
Symmetrical block-based:
- Block-oriented with a distributed lock manager for coherence; peer-to-peer coherence protocols
- Focus: Fibre Channel SANs, typically a single OS (except Veritas)
- Examples: Sistina (GFS), IBM (GPFS), Veritas (CFS)
Asymmetrical block-based:
- Block-oriented with out-of-data-path metadata servers
- Focus: SAN, multi-OS (Windows, Linux, …), management
- Examples: PolyServe (Matrix Server), Veritas (SANPoint), IBM (StorageTank), EMC (HighRoad)
Asymmetrical object-based (emerging):
- Block allocation & security functions migrate to the disk/storage controller
- Examples: CMU/NASD, Cluster File Systems (Lustre), IBM (StorageTank), Panasas

Scalable Shared Storage
[Architecture diagram] Clients (application servers) exchange IPC (small messages) with metadata servers, which handle coherence management; bulk data transfers go directly to the object storage targets, which handle storage management.

Lustre Scalability Enablers
Object storage:
- Disk block allocation abstracted from clients & metadata servers → fewer items to keep coherent
File I/O protocol, with a choice of:
- Clients cache file data with write-behind (good for small files)
- Direct I/O without caching (good for large files)
- Vectored zero-copy bulk data transfers; preposted receive buffers; the protocol is RDMA/DDP enabled
- Supports TCP/IP, Quadrics, Myrinet
Striping a single file across multiple servers:
- Client-side logical object volumes enable concurrent I/O between a client and multiple servers on a single file (sketched below)
- Different files can have different striping patterns
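To make the striping idea concrete, here is a minimal sketch of the RAID-0-style mapping a client-side logical object volume performs: a file byte offset is translated to one of the stripe objects (each on a different OST) plus an offset within that object. The function name and layout parameters are illustrative, not Lustre's actual API.

```python
def stripe_location(offset: int, stripe_size: int, stripe_count: int):
    """Map a file offset to (stripe object index, offset within that object)."""
    stripe_number = offset // stripe_size        # which stripe unit overall
    stripe_index = stripe_number % stripe_count  # which OST object (round robin)
    # Offset inside the object: full rounds already placed on this object,
    # plus the position within the current stripe unit.
    obj_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return stripe_index, obj_offset

# Example: 1 MiB stripes across 4 OSTs; byte (5 MiB + 10) lands on object 1.
print(stripe_location(5 * 2**20 + 10, 2**20, 4))   # -> (1, 1048586)
```

Because adjacent stripe units land on different OSTs, a large sequential read or write naturally fans out into concurrent transfers to multiple servers.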

Lustre Scalability Enablers (2)
Metadata protocol:
- Metadata can be cached on clients: allow caching when there is no contention
- Write-behind caching with a recoverable journal, using an InterMezzo-style server replay log
- Revert to a client/server model in case of heavy sharing
- Intent-based VFS lookups reduce the number of RPCs (see the sketch below)
Tightly coupled distributed lock manager:
- Modeled after the VAX cluster DLM, with multiple lock namespaces
- Metadata locks: {P}R, {P}W, EX on file ids {inode/gen#}
- Extent locks: byte-range locking on objects, distributed over the object storage targets
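The following toy sketch shows why shipping an intent with the lookup cuts round trips: a naive client issues separate lookup, attribute-check and create RPCs, while an intent-based client sends one lookup that carries the create intent, letting the metadata server complete the whole operation under its own locks. The RPC names and stub class are illustrative, not Lustre's wire protocol.

```python
class Client:
    """Toy RPC stub that only counts round trips to the metadata server."""
    def __init__(self):
        self.round_trips = 0

    def rpc(self, op, path, intent=None):
        self.round_trips += 1            # every call is one network round trip
        return (op, path, intent)

def create_naive(client, path):
    client.rpc("lookup", path)           # does the name already exist?
    client.rpc("getattr", path)          # fetch parent attributes for checks
    return client.rpc("create", path)    # finally create: 3 round trips total

def create_with_intent(client, path):
    # One RPC: a lookup carrying the create intent; the server executes the
    # create under its own locks and returns the result directly.
    return client.rpc("lookup", path, intent="create")

naive, intent = Client(), Client()
create_naive(naive, "/mnt/lustre/f")
create_with_intent(intent, "/mnt/lustre/f")
print(naive.round_trips, intent.round_trips)   # -> 3 1
```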

Intelligent Storage
Unlike disks or block-level storage arrays, Lustre Object Storage Targets (OSTs) have knowledge of logically contiguous file chunks.
Use OST intelligence to improve response times:
- Prefetch based on file content or file type
- Optimize disk data layout & caching policies
Add new proactive functionality:
- Snapshot/versioning through copy-on-write (sketched below) helps solve the backup/restore problem
- Proactive indexing and/or pattern-matching engine: move computation to data instead of data to computation; opportunistically use unused storage device bandwidth/cycles to proactively build indices
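A minimal copy-on-write sketch of the snapshot/versioning idea: a snapshot freezes the current block map without copying any data, and a later write installs a fresh block rather than overwriting, so old versions remain readable. This is purely illustrative; an OST would apply the technique at the object/block level, not with Python dicts.

```python
class CowObject:
    def __init__(self):
        self.blocks = {}      # block number -> bytes (current version)
        self.snapshots = []   # frozen block maps; blocks are shared, not copied

    def snapshot(self):
        self.snapshots.append(self.blocks)  # O(1): freeze the current map
        self.blocks = dict(self.blocks)     # future writes go to a private map

    def write(self, blkno, data):
        self.blocks[blkno] = data           # new data; snapshots keep the old

obj = CowObject()
obj.write(0, b"version 1")
obj.snapshot()
obj.write(0, b"version 2")                  # copy-on-write: old block survives
print(obj.snapshots[0][0], obj.blocks[0])   # -> b'version 1' b'version 2'
```

Because a snapshot is just a frozen reference map, backups can read a stable image while the live object keeps changing.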

Lustre Software Stack
[Software stack diagram] The user's POSIX/VFS interface sits atop the Lustre client, which talks over the network to the Lustre MDS, the Query Processing Service, and the object storage targets. Inside an OST: an object-based disk server (OBD server), a lock server, an indexer and a query engine, layered over interchangeable object-based disk (OBD) back ends (Ext2 OBD, or an OBD filter over a file system such as XFS, JFS or Ext3). The OST pre-computes & stores content-based indices; the lock server indicates change stability.

Content-Based Indexing
Query Processing Service:
- Receives user queries, parallelizes them across OSTs and aggregates the results (see the sketch below)
- Communicates with the MDS for queries with pathnames & striped objects
OST indexer / query engine:
- User-defined indexing & query functions
- Opportunistic indexing & change tracking
- Supports simple query aggregation
- Supports Live Queries (notification on match)
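A hedged sketch of the scatter/gather flow the Query Processing Service performs: the same predicate runs against every OST's local index in parallel, and the per-OST hit lists are merged into one result. The data layout and function names are illustrative; the per-OST query function would be user defined.

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(osts, predicate):
    """Scatter `predicate` to each OST's local index, then aggregate hits."""
    def query_one(ost):
        # Each OST evaluates the predicate against the objects it has indexed.
        return [obj_id for obj_id, features in ost.items() if predicate(features)]
    with ThreadPoolExecutor() as pool:
        per_ost_hits = pool.map(query_one, osts)   # queries run concurrently
    return sorted(hit for hits in per_ost_hits for hit in hits)

# Example: three OSTs with toy content indices ({object id: keyword set}).
osts = [
    {101: {"cat"}, 102: {"dog"}},
    {201: {"cat", "dog"}},
    {301: {"fish"}},
]
print(run_query(osts, lambda kw: "cat" in kw))     # -> [101, 201]
```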

Aggregated Query
[Query flow diagram] An aggregated query fans out into per-OST queries handled by each OST's query engine; each OST's indexer applies user-supplied index functions and maintains a live-query list for live queries. The OST interface maintains a list of changed, unindexed objects.

Proactive Indexing & Query Summary
- Query interface orthogonal to the file system
- Proactive, opportunistic indexing integrated into the OST
- User-definable indexing / query functions
- Distributed query processing
- Push & pull model for queries

Intelligent Storage Research Projects @ Intel
- Self-tuning: optimized data placement & caching policies (with CMU PDL)
- Content-based indexing: image matching (Intel CMU Lablet); integration with Lustre (Intel Santa Clara)
- Using Lustre as an object storage research platform
For more information:
- Intel R&D: http://www.intel.com/labs/storage
- Lustre: http://www.lustre.org
- Work @ CMU: http://www.ece.cmu.edu/~mmesnier