Network File Systems


Network File Systems
CS 240: Computing Systems and Concurrency, Lecture 4
Marco Canini
Credits: Michael Freedman and Kyle Jamieson developed much of the original material.

Abstraction, abstraction, abstraction!
- Local file systems: disks are terrible abstractions (low-level blocks, etc.); directories, files, and links are much better
- Distributed file systems: make a remote file system look local
- Today: NFS (Network File System), developed by Sun in the 1980s, still used today!

Goals: make operations appear local, consistent, and fast

NFS Architecture
Mount a remote FS (host:path) as local directories.
[Figure: directory trees for Server 1, the Client, and Server 2. Parts of the servers' trees (e.g. Server 1's export/people, holding users big, jon, bob, and Server 2's nfs/users, holding jim, ann, jane, joe) are remote-mounted into the client's tree under usr/students and usr/staff.]

Virtual File System enables transparency

Interfaces matter

VFS / Local FS
fd = open("path", flags)
read(fd, buf, n)
write(fd, buf, n)
close(fd)
Server maintains state that maps fd to inode, offset

Stateless NFS: Strawman 1
fd = open("path", flags)
read("path", buf, n)
write("path", buf, n)
close(fd)

Stateless NFS: Strawman 2
fd = open("path", flags)
read("path", offset, buf, n)
write("path", offset, buf, n)
close(fd)

Embed pathnames in syscalls?
If a file is opened and then renamed from dir1/f to dir2/f, should read refer to the current dir1/f or to dir2/f? In UNIX, it's dir2/f: the descriptor tracks the file, not its name. How do we preserve this in NFS?

Stateless NFS (for real)
fh = lookup("path", flags)
read(fh, offset, buf, n)
write(fh, offset, buf, n)
getattr(fh)
Implemented as Remote Procedure Calls (RPCs)

NFS File Handles (fh)
Opaque identifier provided to the client by the server
Includes all info needed to identify the file/object on the server:
- volume ID
- inode #
- generation #
It's a trick: store server state at the client!
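A minimal sketch of how the three fields above could be packed into an opaque handle. The fixed 12-byte layout here is an assumption for illustration; real NFS servers choose their own handle encoding, and the client never interprets the bytes.

```python
import struct

def make_fh(volume_id: int, inode: int, generation: int) -> bytes:
    # Server-side: pack (volume ID, inode #, generation #) into opaque bytes.
    # The client just stores and echoes this value back in later requests.
    return struct.pack("!III", volume_id, inode, generation)

def parse_fh(fh: bytes) -> tuple:
    # Server-side: recover the fields from a handle received in a request.
    return struct.unpack("!III", fh)

fh = make_fh(volume_id=7, inode=1234, generation=2)
assert parse_fh(fh) == (7, 1234, 2)
```

Because the handle carries everything needed to locate the object, the server can process each request without remembering which files this client has open.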

NFS File Handles (and versioning)
With generation #s, client 2 continues to interact with the correct file, even while client 1 has changed f
This versioning appears in many contexts, e.g., MVCC (multiversion concurrency control) in DBs

NFS example
fd = open("/foo", ...);
Client: send LOOKUP(rootdir FH, "foo")
Server: receive LOOKUP request; look for "foo" in root dir; return foo's FH + attributes
Client: receive LOOKUP reply; allocate file desc in open file table; store foo's FH in table; store current file position (0); return file descriptor to application

NFS example
read(fd, buffer, MAX);
Client: index into open file table with fd; get NFS file handle (FH); use current file position as offset; send READ(FH, offset=0, count=MAX)
Server: receive READ request; use FH to get volume/inode number; read inode from disk (or cache); compute block location (using offset); read data from disk (or cache); return data to client
Client: receive READ reply; update file position (+bytes read); set current file position = MAX; return data/error code to app

NFS example
read(fd, buffer, MAX);
Same as before, except offset=MAX; set current file position = 2*MAX
read(fd, buffer, MAX);
Same as before, except offset=2*MAX; set current file position = 3*MAX
close(fd);
Just need to clean up local structures: free descriptor fd in the open file table (no need to talk to the server)
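The client-side bookkeeping in the example above can be sketched as follows. All names here are illustrative: server_read stands in for the READ RPC, and the fake server serves a three-block file so the offset arithmetic is visible.

```python
MAX = 4096  # bytes per read in the example

def server_read(fh, offset, count):
    # Stand-in for the stateless server: every request carries fh, offset,
    # and count, so no per-client state is needed to answer it.
    data = b"a" * MAX + b"b" * MAX + b"c" * MAX  # fake 3-block file
    return data[offset:offset + count]

class NFSClient:
    def __init__(self):
        self.open_files = {}  # fd -> [file handle, current file position]
        self.next_fd = 3

    def open(self, fh):
        fd = self.next_fd
        self.next_fd += 1
        self.open_files[fd] = [fh, 0]  # position starts at 0
        return fd

    def read(self, fd, count):
        fh, pos = self.open_files[fd]
        data = server_read(fh, pos, count)         # READ(fh, offset, count)
        self.open_files[fd][1] = pos + len(data)   # advance local position
        return data

    def close(self, fd):
        del self.open_files[fd]  # purely local; no RPC to the server

client = NFSClient()
fd = client.open(fh="foo-fh")
assert client.read(fd, MAX) == b"a" * MAX  # offset 0
assert client.read(fd, MAX) == b"b" * MAX  # offset MAX
client.close(fd)
```

Note that the "server state" (the current offset) lives entirely at the client, which is exactly the trick the file-handle slide describes.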

Handling server failures
What to do when the server is not responding? Retry!
- Set a timer; a reply before the timer fires cancels the retry; otherwise, resend
Is it safe to retry operations?
- NFS operations are idempotent: the effect of multiple invocations is the same as a single one
- LOOKUP, READ, WRITE: the message contains all that is necessary to re-execute
What is not idempotent?
- E.g., if we had an INCREMENT operation
- Real example: MKDIR is not idempotent
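The retry logic above can be sketched as a simple loop. The fake_server here is illustrative: it drops the first two requests to simulate lost replies, and call_with_retry is only safe because the request being resent is idempotent.

```python
import time

attempts = {"count": 0}

def fake_server(request):
    # Simulated lossy server: the first two requests get no reply.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError          # timer fired; no reply arrived
    return "reply to " + request    # eventually the server answers

def call_with_retry(request, send, max_tries=5, delay=0.0):
    # Resending the identical request is safe only because NFS operations
    # (LOOKUP, READ, WRITE) are idempotent: twice has the same effect as once.
    for _ in range(max_tries):
        try:
            return send(request)
        except TimeoutError:
            time.sleep(delay)       # back off, then retry the same request
    raise RuntimeError("server not responding")

reply = call_with_retry("READ(fh, offset=0, count=4096)", fake_server)
assert attempts["count"] == 3
```

With a non-idempotent operation like MKDIR, the same loop would be wrong: a retry after a lost reply would fail with "already exists" even though the original call succeeded.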

Are remote == local?

TANSTAAFL (There ain't no such thing as a free lunch)
With a local FS, a read sees the data from the most recent write, even if performed by a different process
- Read/write coherence, linearizability: all operations appear to have executed atomically, in an order consistent with the global real-time ordering of operations (Herlihy & Wing, 1991)
Achieve the same with NFS? Perform all reads & writes synchronously to the server
- Huge cost: high latency, low scalability
And what if the server doesn't return? Options: hang indefinitely, return ERROR

Caching GOOD: lower latency, better scalability
Consistency HARDER: no longer one single copy of data to which all operations are serialized

Caching options
- Centralized control: record status of clients (which files are open for reading/writing, what is cached, ...)
- Read-ahead: pre-fetch blocks before they are needed
- Write-through: all writes sent to the server
- Write-behind: writes locally buffered, sent as a batch

Cache consistency problem
Consistency challenges:
- When a client writes, how do others caching the data get updated? (Callbacks, ...)
- Two clients concurrently write? (Locking, overwrite, ...)
Example: C1 cache: F[v1]; C2 cache: F[v2]; C3 cache: empty; server S disk: F[v1] at first, F[v2] eventually

Should the server maintain per-client state?
Stateful
- Pros: smaller requests; simpler request processing; better cache coherence, file locking, etc.
- Cons: per-client state limits scalability; fault tolerance on state is required for correctness
Stateless
- Pros: easy server crash recovery; no open/close needed; better scalability
- Cons: each request must be fully self-describing; consistency is harder, e.g., no simple file locking

It's all about the state, 'bout the state, ...
Hard state: don't lose data
- Durability: state not lost
  - Write to disk, or cold remote backup
  - Exact replica or recoverable (DB: checkpoint + op log)
- Availability (liveness): maintain online replicas
Soft state: performance optimization
- Then: lose at will
- Now: yes for correctness (safety), but how does recovery impact availability (liveness)?

NFS
Stateless protocol
- Recovery is easy: a crashed server looks like a slow server
- Messages over UDP (unencrypted)
Read from server, caching in the NFS client
- NFSv2 was write-through (i.e., synchronous)
- NFSv3 added write-behind: delay writes until close or fsync from the application

Exploring the consistency tradeoffs
Write-to-read semantics too expensive: give up caching, require server-side state, or ...
Close-to-open "session" semantics
- Ensure an ordering, but only between application close and open, not between all writes and reads
- If B opens after A closes, B will see A's writes
- But if two clients open at the same time? No guarantees
- And what gets written? Last writer wins

NFS Cache Consistency
Recall the challenge: potential concurrent writers
Cache validation:
- Get the file's last modification time from the server: getattr(fh)
- Both when first opening the file, then poll every 3-60 seconds
- If the server's last modification time has changed, flush dirty blocks and invalidate the cache
When reading a block:
- Validate: (current time - last validation time) < threshold
- If valid, serve from cache; otherwise, refresh from the server
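A sketch of that validation rule, with a logical clock (explicit now values) so the timing is deterministic. FakeServer, the 3-second threshold, and getattr as a method are all illustrative stand-ins for the real GETATTR RPC.

```python
FRESHNESS_THRESHOLD = 3.0  # seconds; real NFS clients use roughly 3-60s

class FakeServer:
    def __init__(self):
        self.mtime = 1             # last modification time of the file
        self.data = {0: b"old"}    # block # -> contents

    def getattr(self, fh):         # stand-in for the GETATTR RPC
        return self.mtime

    def read(self, fh, n):         # stand-in for the READ RPC
        return self.data[n]

class CachedFile:
    def __init__(self, fh, server, now):
        self.fh, self.server = fh, server
        self.blocks = {}                         # block # -> cached data
        self.cached_mtime = server.getattr(fh)   # mtime when cache filled
        self.last_validation = now

    def read_block(self, n, now):
        if now - self.last_validation >= FRESHNESS_THRESHOLD:
            mtime = self.server.getattr(self.fh)  # re-validate with server
            self.last_validation = now
            if mtime != self.cached_mtime:
                self.blocks.clear()               # invalidate stale cache
                self.cached_mtime = mtime
        if n not in self.blocks:
            self.blocks[n] = self.server.read(self.fh, n)  # refresh block
        return self.blocks[n]

server = FakeServer()
f = CachedFile("fh", server, now=0.0)
assert f.read_block(0, now=1.0) == b"old"   # fetched and cached
server.data[0] = b"new"; server.mtime = 2   # another client writes
assert f.read_block(0, now=2.0) == b"old"   # within threshold: stale read!
assert f.read_block(0, now=10.0) == b"new"  # re-validated: cache refreshed
```

The middle assertion is the point: until the threshold expires, the client happily serves stale data, which is exactly the consistency gap the next slide's problems come from.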

Some problems
- Mixed reads across versions: A reads blocks 1-10 from the file, B replaces blocks 1-20, A then keeps reading blocks 11-20
- Assumes synchronized clocks. Not really correct; we'll learn about the notion of logical clocks later
- Writes are specified by offset: concurrent writes can change the offset

Server-side write buffering
Original file contents (three blocks):
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
write(fd, a_buffer, size); // fill first block with a's
write(fd, b_buffer, size); // fill second block with b's
write(fd, c_buffer, size); // fill third block with c's
Expected result:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
But suppose the server buffers the 2nd write, reports OK, but then crashes:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
Server must commit each write to stable (persistent) storage before informing the client of success
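The commit-before-ack rule can be shown with a toy server. The disk dict (survives crashes) and buffer dict (volatile) are illustrative, not a real storage API.

```python
class Server:
    def __init__(self):
        self.disk = {}    # block # -> data; persistent, survives crashes
        self.buffer = {}  # volatile write buffer; lost on crash

    def write(self, block, data, commit_before_ack=True):
        if commit_before_ack:
            self.disk[block] = data    # persist (fsync-style) before acking
        else:
            self.buffer[block] = data  # fast but unsafe: only buffered
        return "OK"                    # ack returned to the client

    def crash(self):
        self.buffer.clear()            # all volatile state disappears

unsafe = Server()
unsafe.write(1, b"b" * 8, commit_before_ack=False)  # server says OK...
unsafe.crash()
assert 1 not in unsafe.disk            # ...but the acked write is gone

safe = Server()
safe.write(1, b"b" * 8)                # committed before the ack
safe.crash()
assert safe.disk[1] == b"b" * 8        # acked write survived the crash
```

The cost of this rule is latency on every write, which is why real servers pair it with battery-backed RAM or a fast log rather than buffering acked writes in plain memory.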

When statefulness helps: callbacks; locks + leases

NFS Cache Consistency
Recall the challenge: potential concurrent writers
- Timestamp invalidation: NFS
- Callback invalidation: AFS, Sprite, Spritely NFS
  - Server tracks all clients that have opened the file
  - On a write, the server sends notifications to clients if the file changes; clients invalidate their caches
- Leases: Gray & Cheriton '89, NFSv4

Locks
A client can request a lock over a file / byte range
- Advisory: well-behaved clients comply
- Mandatory: server-enforced
Client performs writes, then unlocks
Problem: what if the client crashes?
- Solution: keep-alive timer; recover the lock on timeout
Problem: what if the client is alive but the network route failed?
- Client thinks it has the lock, server gives the lock to another: "split brain"

Leases
Client obtains a lease on a file for read or write
- "A lease is a ticket permitting an activity; the lease is valid until some expiration time."
Read lease allows the client to cache clean data
- Guarantee: no other client is modifying the file
Write lease allows safe delayed writes
- Client can locally modify, then batch writes to the server
- Guarantee: no other client has the file cached

Using leases
Client requests a lease
- May be implicit, distinct from file locking
- Issued lease has a file version number for cache coherence
Server determines if the lease can be granted
- Read leases may be granted concurrently
- Write leases are granted exclusively
If a conflict exists, the server may send eviction notices
- An evicted write lease must write back
- Evicted read leases must flush/disable caching
- Client acknowledges when completed
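The grant rules above (concurrent reads, exclusive writes) can be sketched as a small lease table. Eviction notices and version numbers are omitted to keep the sketch short; on conflict it simply refuses, where a real server might instead evict.

```python
class LeaseManager:
    def __init__(self):
        self.leases = {}  # file -> {"mode": "read"/"write", "holders": set}

    def request(self, client, file, mode):
        entry = self.leases.get(file)
        if entry is None:
            self.leases[file] = {"mode": mode, "holders": {client}}
            return True                       # no lease held: grant
        if mode == "read" and entry["mode"] == "read":
            entry["holders"].add(client)      # reads granted concurrently
            return True
        return False                          # conflict: writes are exclusive

    def release(self, client, file):
        entry = self.leases[file]
        entry["holders"].discard(client)
        if not entry["holders"]:              # last holder gone: drop lease
            del self.leases[file]

lm = LeaseManager()
assert lm.request("C1", "f", "read")         # granted
assert lm.request("C2", "f", "read")         # concurrent read: granted
assert not lm.request("C3", "f", "write")    # write conflicts with readers
lm.release("C1", "f"); lm.release("C2", "f")
assert lm.request("C3", "f", "write")        # exclusive write now granted
assert not lm.request("C1", "f", "read")     # read conflicts with the writer
```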

Bounded lease term simplifies recovery
Before the lease expires, the client must renew it
Client fails while holding a lease?
- Server waits until the lease expires, then unilaterally reclaims it
- If the client fails during eviction, the server waits, then reclaims
Server fails while leases are outstanding? On recovery:
- Wait lease period + clock skew before issuing new leases
- Absorb renewal requests and/or writes for evicted leases
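The two recovery rules above reduce to simple time arithmetic. LEASE_PERIOD and MAX_CLOCK_SKEW are example values, and the functions use explicit timestamps rather than a real clock so the behavior is deterministic.

```python
LEASE_PERIOD = 30.0    # seconds a lease is valid after (re)grant
MAX_CLOCK_SKEW = 5.0   # assumed bound on client/server clock difference

def lease_expired(granted_at, now):
    # Client failed while holding a lease? The server just waits this
    # out, then unilaterally reclaims the lease.
    return now - granted_at >= LEASE_PERIOD

def server_safe_to_issue(recovered_at, now):
    # After a crash the server doesn't know which leases are outstanding,
    # so it issues no new leases until every possible old lease has
    # expired, even under worst-case clock skew.
    return now - recovered_at >= LEASE_PERIOD + MAX_CLOCK_SKEW

assert not lease_expired(granted_at=0.0, now=29.0)   # still held
assert lease_expired(granted_at=0.0, now=30.0)       # server may reclaim
assert not server_safe_to_issue(recovered_at=100.0, now=130.0)
assert server_safe_to_issue(recovered_at=100.0, now=135.0)
```

The key property is that no explicit recovery protocol with clients is needed: bounded terms plus waiting make stale leases impossible.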

Requirements dictate design
Case Study: AFS

Andrew File System (CMU, 1980s-)
Scalability was a key design goal
- Many servers, 10,000s of users
Observations about the workload:
- Reads are much more common than writes
- Concurrent writes are rare / writes between users are disjoint
Interfaces in terms of files, not blocks
- Whole-file serving: entire files and directories
- Whole-file caching: clients cache files to local disk
  - The cache is large and permanent, so it persists across reboots

AFS: Consistency
Consistency: close-to-open consistency
- No mixed writes, since whole-file caching means whole-file overwrites
Update visibility: callbacks to invalidate caches
What about crashes or partitions?
- Client invalidates its cache iff it is recovering from a failure, or a regular liveness check to the server (heartbeat) fails
- Server assumes the cache is invalidated if callbacks fail and the heartbeat period is exceeded

Next lecture topic: Time Synchronization and Logical Clocks