The Parallel NFS Bugaboo. Andy Adamson Center For Information Technology Integration University of Michigan

Similar documents
GridNFS: Scaling to Petabyte Grid File Systems. Andy Adamson Center For Information Technology Integration University of Michigan

February 15, 2012 FAST 2012 San Jose NFSv4.1 pnfs product community

pnfs and Linux: Working Towards a Heterogeneous Future

pnfs Update A standard for parallel file systems HPC Advisory Council Lugano, March 2011

dcache NFSv4.1 Tigran Mkrtchyan Zeuthen, dcache NFSv4.1 Tigran Mkrtchyan 4/13/12 Page 1

pnfs, POSIX, and MPI-IO: A Tale of Three Semantics

System input-output, performance aspects March 2009 Guy Chesnot

pnfs, parallel storage for grid and enterprise computing Joshua Konkle, NetApp, Inc.

Lustre A Platform for Intelligent Scale-Out Storage

CITI - ARC Joint Project Status Report

pnfs: Blending Performance and Manageability

NFS-Ganesha w/gpfs FSAL And Other Use Cases

NFS Version 4 Open Source Project. William A.(Andy) Adamson Center for Information Technology Integration University of Michigan

CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

What is a file system

Internet Engineering Task Force (IETF) Request for Comments: 8434 Updates: 5661 August 2018 Category: Standards Track ISSN:

Reasons to Migrate from NFS v3 to v4/ v4.1. Manisha Saini QE Engineer, Red Hat IRC #msaini

Parallel NFS Dr. Oliver Tennert, Head of Technology. sambaxp 2008 Göttingen,

Cloud Computing CS

NFSv4.1 Using pnfs PRESENTATION TITLE GOES HERE. Presented by: Alex McDonald CTO Office, NetApp

Parallel File Systems for HPC

System that permanently stores data Usually layered on top of a lower-level physical storage medium Divided into logical units called files

Advanced Operating Systems

A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing

The ASCI/DOD Scalable I/O History and Strategy Run Time Systems and Scalable I/O Team Gary Grider CCN-8 Los Alamos National Laboratory LAUR

Research on Implement Snapshot of pnfs Distributed File System

CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

pnfs and GPFS Dean Hildebrand Storage Systems IBM Almaden Research Lab 2012 IBM Corporation

Coordinating Parallel HSM in Object-based Cluster Filesystems

NFSv4 extensions for performance and interoperability

Where to next? NFSv4 WG IETF99. Trond Myklebust Tom Haynes

NFSv4 extensions for performance and interoperability

pnfs BOF SC

NFS Version 4.1. Spencer Shepler, Storspeed Mike Eisler, NetApp Dave Noveck, NetApp

Crossing the Chasm: Sneaking a parallel file system into Hadoop

NFSv4.1 Plan for a Smooth Migration

NFSv4 and rpcsec_gss for linux

Distributed File Systems

Crossing the Chasm: Sneaking a parallel file system into Hadoop

NFSv4 Open Source Project Update

NFS version 4 LISA `05. Mike Eisler Network Appliance, Inc.

Parallel NFS (pnfs) Garth Gibson CTO & co-founder, Panasas Assoc. Professor, Carnegie Mellon University.

Developments in GFS2. Andy Price Software Engineer, GFS2 OSSEU 2018

Operating Systems. Week 13 Recitation: Exam 3 Preview Review of Exam 3, Spring Paul Krzyzanowski. Rutgers University.

CS 416: Operating Systems Design April 22, 2015

The Last Bottleneck: How Parallel I/O can attenuate Amdahl's Law

pnfs, parallel storage for grid, virtualization and database computing Joshua Konkle, Chair NFS SIG

COS 318: Operating Systems. File Systems. Topics. Evolved Data Center Storage Hierarchy. Traditional Data Center Storage Hierarchy

Internet Engineering Task Force (IETF) Request for Comments: 8435 Category: Standards Track. August 2018

Operating Systems Design Exam 3 Review: Spring 2011

Status of the Linux NFS client

The New World of NFS. Steve Dickson Consulting Software Engineer, Red Hat Tuesday, April 15

Storage and File System

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

The Last Bottleneck: How Parallel I/O can improve application performance

GlusterFS Distributed Replicated Parallel File System

NFS Design Goals. Network File System - NFS

Dell Fluid File System Version 6.0 Support Matrix

pnfs and Linux: Working Towards a Heterogeneous Future

Distributed Systems. Hajussüsteemid MTAT Distributed File Systems. (slides: adopted from Meelis Roos DS12 course) 1/25

Lecture 19. NFS: Big Picture. File Lookup. File Positioning. Stateful Approach. Version 4. NFS March 4, 2005

Today: Distributed File Systems. File System Basics

Today: Distributed File Systems

Glamour: An NFSv4-based File System Federation

Evaluating the Impact of RDMA on Storage I/O over InfiniBand

NFS: The Next Generation. Steve Dickson Kernel Engineer, Red Hat Wednesday, May 4, 2011

416 Distributed Systems. Distributed File Systems 2 Jan 20, 2016

pnfs support for ONTAP Unstriped file systems (WIP) Pranoop Erasani Connectathon Feb 22, 2010

Chapter 11 DISTRIBUTED FILE SYSTEMS

Parallel File Systems Compared

DISTRIBUTED FILE SYSTEMS & NFS

Surveillance Dell EMC Isilon Storage with Video Management Systems

Internet Engineering Task Force (IETF) Request for Comments: 8154 May 2017 Category: Standards Track ISSN:

An Overview of The Global File System

Distributed File Systems. CS 537 Lecture 15. Distributed File Systems. Transfer Model. Naming transparency 3/27/09

Linux Support of NFS v4.1 and v4.2. Steve Dickson Mar Thu 23, 2017

Massively Scalable File Storage. Philippe Nicolas, KerStor

ARCHER/RDF Overview. How do they fit together? Andy Turner, EPCC

Open Source Storage. Ric Wheeler Architect & Senior Manager April 30, 2012

INTEGRATING HPFS IN A CLOUD COMPUTING ENVIRONMENT

Cloud Filesystem. Jeff Darcy for BBLISA, October 2011

Massive Data Processing on the Acxiom Cluster Testbed

an Object-Based File System for Large-Scale Federated IT Infrastructures

Parallel File Systems. John White Lawrence Berkeley National Lab

NFSv4 as the Building Block for Fault Tolerant Applications

The State of NFSv4. Spencer Shepler Sun Microsystems 6 August 2001

Panzura White Paper Panzura Distributed File Locking

NFS with Linux: Current and Future Efforts. Chuck Lever, Network Appliance, Inc Steve Dickson, Red Hat Red Hat Summit 2006

The State of Samba (June 2011) Jeremy Allison Samba Team/Google Open Source Programs Office

Network File System (NFS)

Network File System (NFS)

The RAMDISK Storage Accelerator

GPFS on a Cray XT. Shane Canon Data Systems Group Leader Lawrence Berkeley National Laboratory CUG 2009 Atlanta, GA May 4, 2009

Distributed File Storage in Multi-Tenant Clouds using CephFS

A GPFS Primer October 2005

Storage and File Hierarchy

<Insert Picture Here> End-to-end Data Integrity for NFS

Structuring PLFS for Extensibility

COS 318: Operating Systems

The Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler

Transcription:

The Parallel NFS Bugaboo Andy Adamson Center For Information Technology Integration University of Michigan

Bugaboo? Bugaboo a source of concern. the old bugaboo of inflation still bothers them [n] an imaginary monster used to frighten children NFS: the bugaboo of state will come back to haunt you

The Parallel NFS Bugaboo Statelessness allowed NFS Versions 2 and 3 servers to export shared storage in parallel NFSv4 servers don't have it so easy. They have their own state to manage -- like OPEN -- The protocol does not support distributing it among multiple servers, making it difficult to export shared storage in parallel The aggregate bandwidth demands of clustered clients surpass the bandwidth available with multiple parallel NFS service Bandwidth will be limited as long as access through the NFS protocol requires access to a single server

Outline pnfs Won't describe pnfs, see Brent Welch's talk First implementation: LAYOUTGET operation Security issues Parallel NFS version 4 Servers on Linux Which state New file system interfaces? Other methods

pnfs Prototype Work done by Dean Hildebrant, CITI NFSv4 on Linux 2.6.6 Export PVFS2 0.5.1 file system PVFS2 suited for prototype: Algorithmic file layout layout doesn't change if number or location of disks remain constant LAYOUTGET can be implemented without recall operations Simple file system with no meta data locking Used by tri-labs

Prototype Goals Verify crucial portion of the pnfs protocol Proposed LAYOUTGET parameters sufficient Validate opaque layout design Look at worst case: LAYOUTGET with each READ or WRITE Can direct access storage protocol can address the NFS 32KB limit?

PVFS2 0.5.1 Overview Elements: Clients (up to 10,000s) I/O storage nodes (up to 100s) One meta data server User space with OS-specific kernel module Algorithmic file layout Currently supports round robin striping Simple modular design No locking protocol, no caching

PVFS2 Architecture

pnfs Architecture

pnfs Prototype Architecture

pnfs Process Flow 1 Open request 2 LAYOUTGET request 3 Read/Write operations 4 Close operation I/O Storage Nodes 3 Read/Write pnfs Client pnfs Server 1 2 4 OPEN LAYOUTGET CLOSE

pnfs LAYOUTGET Operation Retrieves file layout information Typically a set of <device id, data id> pairs, one for each storage node Layout: Opaque to the pnfs client Layout: Understood by the driver module Valid until file close

LAYOUTGET Request Time (ms) Average LAYOUTGET request Average 32KB read Average 32KB write Average 1MB read Average 1MB write 1.26 2.5 2.8 28.9 29.1

Experiment Block Size

Write - 32K Block Size

Write 2MB Block Size

Read 32K Block Size

Read 2MB Block Size

pnfs Prototype: Whats next? Cache layouts Implement CB_LAYOUTRETURN Implement LAYOUTRELEASE Continue performance testing

pnfs What about Security? Goal: keep all the strong NFSv4.0 security properties ACL and authentication checks remain Still able to get a GSS context between pnfs user and pnfsd pnfsd on the meta data service still does access checks at OPEN

pnfs Architecture

pnfs: Storage Channel Security No issues with AUTH_NONE, AUTH_UNIX RPCSEC GSS: three issues RPCSEC GSS header checksum on all packets Mutual authentication: if GSS context is revoked, need to stop storage access for the user Data integrity and data privacy enforcement Each layout type will have a different security story pnfs working group just beginning to detail the possible solutions

OSD Channel Security OSD capabilities: Meta data service and storage nodes share keys (setup out of band) Keys sign a per object capability Capability signs each OSD command to storage similar to RPCSEC GSS header checksum Associate capability set to a GSS context If GSS context disappears, capabilities are revoked Data integrity, privacy: rely on underlying transport, perhaps IPSEC

Block Storage Channel Security Relies on SAN-based security: trusts that clients will only access the blocks they have been directed to use. Fencing techniques: Heavy weight per client operations not per user. Not expected to be a part of normal pnfs execution path Need to rely on underlying transport for all security features (IPSEC)

pnfs Security Many issues related to security pnfs working group is moving toward in depth security discussions

Parallel NFS version 4 Servers on Linux Problem: How to share NFSv4 state between NFSv4 servers Not part of the pnfs protocol Problem: Exporting NFSv4 on cluster file systems Problem: Supporting multiple meta data servers on parallel file systems

Cluster File Systems Symmetric Out-Of-Band Every node is a fully capable client, data server and meta data server Examples: IBM GPFS, Redhat GFS, Polyserve Matrix Server

Parallel File Systems Asymmetric Out-Of-Band Clients access storage directly, Separate meta data server(s) Object Based: Lustre, Panasas ActiveScale Block Based: EMC s High Road, IBM SAN FS File Based: PVFS2

NFSv4 Server State Sharing Methods Via new server to server protocol Could conflict with existing meta data sharing architecture Redirect all OPENS for a file to a single NFSv4 server Need to track which server is handling OPEN Large number of clients opening a file is problematic

NFSv4 Server State Sharing Methods Via underlying file system Need additional file system interfaces and functionality Uses existing file system meta data sharing architecture Coordinates with local access

Potential NFSv4 Server State to Share ClientID Open owner, Open stateid (Share/deny locks) Lock Owner, Lock stateid (Byte-range locks) Delegation stateid and call back info

Simplifying Assumption Assumption: An NFS client mounts only one server per parallel or cluster file system at a time Allows for clientids, open owners, and lock owners to be kept in memory on the single server

File System Queries for Share Locks NFSD: Upon each OPEN ask the file system if any other NFSD has a conflicting share lock File system will need to add book keeping for deny access (for Windows clients) NFSD: OPEN stateid can be created and used as usual.

File System Queries for Byte-Range Locks NFSD: Upon each LOCK/UNLOCK ask the file system to manage the locks. There is an effort underway led by Sridhar Samudrala (sri@us.ibm.com) to expand the existing file_operations lock call to include enable NFS locking over clustered file systems. Useful for both LOCKD and NFSv4 server LOCK stateid can be created and used as usual

File System Queries for Delegation Hand out a delegation count readers/writers check if in recall state Recall a delegation register a recall callback receive a recall request Support Linux file_lock FL_LEASE has this functionality

Discussion The combination of pnfs and parallel NFSv4 server state sharing can solve the 'parallel NFS bugaboo' pnfs effort is well underway Parallel NFSv4 server state sharing is left as a per OS solution CITI will prototype file system extensions to enable parallel NFSv4 server state sharing on Linux

Any Questions? http://www.citi.umich.edu/projects