Distributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen

Distributed Systems
Lec 9: Distributed File Systems NFS, AFS
Slide acks: Dave Andersen (http://www.cs.cmu.edu/~dga/15-440/f10/lectures/08-distfs1.pdf)

VFS and FUSE Primer
- Some have asked for background on Linux FS structure and FUSE in support of homework 3
- We'll talk about them now on the whiteboard

Today
- Finish up distributed mutual exclusion from last lecture
- Distributed file systems (start)
  - Sun's Network File System (NFS)
  - CMU's Andrew File System (AFS)

Distributed Mutual Exclusion (Reminder)
- Ensure that only one thread can interact with a shared resource (shared memory, file) at a time
- Algorithms:
  - Centralized algorithm (A1)
  - Distributed algorithms:
    - A2: Token ring
    - A3: Lamport's priority queues
    - A4: Ricart and Agrawala (today)
    - A5: Voting (today)
- Quiz: Explain algorithms A1-A3

Lamport's Algorithm (Reminder)
- Each process i keeps a priority queue Qi, to which it adds any outstanding lock request it knows of, in order of logical timestamp
- To enter the critical section at time T, process i sends REQUEST to everyone and waits for REPLYs from all processes and for all earlier requests in Qi to be RELEASEd
- To exit the critical section, it sends RELEASE to everyone
- Process i delays its REPLY to process j's REQUEST until j has answered any earlier REQUESTs that i has outstanding to j
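
To make the bookkeeping concrete, here is a minimal Python sketch of this algorithm (simplified: REPLYs are sent immediately rather than delayed as on the slide). The transport primitives send(dst, msg) and broadcast(msg) and the class name are assumptions for illustration, not part of the original slides.

    import heapq

    class LamportMutex:
        """Sketch of Lamport's priority-queue mutex for one process.
        send(dst, msg) and broadcast(msg) are assumed transport stubs."""

        def __init__(self, pid, peers, send, broadcast):
            self.pid, self.peers = pid, peers
            self.send, self.broadcast = send, broadcast
            self.clock = 0          # Lamport logical clock
            self.queue = []         # priority queue Qi of (timestamp, pid) requests
            self.replies = set()    # peers that replied to our outstanding request
            self.mine = None        # our own outstanding request, if any

        def request(self):
            self.clock += 1
            self.mine = (self.clock, self.pid)
            heapq.heappush(self.queue, self.mine)
            self.replies = set()
            self.broadcast(("REQUEST", self.mine))

        def may_enter(self):
            # Enter once our request heads Qi and every peer has replied.
            return (self.mine is not None and self.queue[0] == self.mine
                    and self.replies == set(self.peers))

        def release(self):
            heapq.heappop(self.queue)       # our request is at the head of Qi
            self.mine = None
            self.broadcast(("RELEASE", self.pid))

        def on_message(self, src, kind, body):
            if kind == "REQUEST":
                self.clock = max(self.clock, body[0]) + 1
                heapq.heappush(self.queue, body)
                self.send(src, ("REPLY", self.pid))   # simplified: reply right away
            elif kind == "REPLY":
                self.replies.add(src)
            elif kind == "RELEASE":
                self.queue = [r for r in self.queue if r[1] != src]
                heapq.heapify(self.queue)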

Solution 4: Ricart and Agrawala
- An improved version of Lamport's shared priority queue
- Combines the function of REPLY and RELEASE messages:
  - Delay your REPLY to any request later than your own
  - Send all delayed replies after you exit your critical section

Solution 4: Ricart and Agrawala
- To enter the critical section at process i:
  - Stamp your request with the current time T
  - Broadcast REQUEST(T) to all processes
  - Wait for all replies
- To exit the critical section:
  - Broadcast REPLY to all processes in Qi
  - Empty Qi
- On receipt of REQUEST(T'):
  - If waiting for (or in) the critical section for an earlier request T, add T' to Qi
  - Otherwise REPLY immediately
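
Here is a minimal Python sketch of these rules, under the same assumed send/broadcast transport as in the Lamport sketch above; all names are illustrative, not from the slides.

    class RicartAgrawala:
        """Sketch of Ricart & Agrawala; send/broadcast are assumed transport stubs."""

        def __init__(self, pid, peers, send, broadcast):
            self.pid, self.peers = pid, peers
            self.send, self.broadcast = send, broadcast
            self.clock = 0
            self.mine = None        # our (timestamp, pid) while requesting or in the CS
            self.replies = set()
            self.deferred = []      # Qi: requests whose REPLY we are delaying

        def request(self):
            self.clock += 1
            self.mine = (self.clock, self.pid)
            self.replies = set()
            self.broadcast(("REQUEST", self.mine))

        def may_enter(self):
            return self.mine is not None and self.replies == set(self.peers)

        def release(self):
            self.mine = None
            for src in self.deferred:          # the delayed REPLYs double as RELEASEs
                self.send(src, ("REPLY", self.pid))
            self.deferred = []

        def on_message(self, src, kind, body):
            if kind == "REQUEST":
                self.clock = max(self.clock, body[0]) + 1
                if self.mine is not None and self.mine < body:
                    self.deferred.append(src)   # our earlier request wins: delay REPLY
                else:
                    self.send(src, ("REPLY", self.pid))
            elif kind == "REPLY":
                self.replies.add(src)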

Ricart and Agrawala: Safety
- Safety and fairness claim: If T1 < T2, then process P2, which requested the lock at T2, will enter its critical section only after process P1, which requested the lock at T1, exits
- Proof sketch (by contradiction):
  - Assume P2 enters its critical section before P1 exits it
  - Consider how P2 collects its REPLY from P1: request T1 must have already been time-stamped when request T2 was received by P1, otherwise P1's Lamport clock would have been advanced past time T2
  - But then P1 must have delayed its REPLY to T2 until after request T1 exited the critical section
  - Therefore T2 does not conflict with T1

Solution 4: Ricart and Agrawala
- Advantages:
  - Fair
  - Short synchronization delay
  - Simpler (therefore better) than Lamport's algorithm
- Disadvantages:
  - Still very unreliable
  - 2(N-1) messages for each entry/exit

Solution 5: Majority Rules
- Instead of collecting REPLYs, collect VOTEs
  - Each process VOTEs for which process can hold the mutex
  - Each process can only VOTE once at any given time
- You hold the mutex if you have a majority of the VOTEs
  - Only one process can possibly have a majority at any given time!

Solution 5: Majority Rules
- To enter the critical section at process i:
  - Broadcast REQUEST(T), collect VOTEs
  - Can enter the critical section once you collect a majority of VOTEs (N/2 + 1)
- To leave:
  - Broadcast RELEASE to all processes that VOTEd for you
- On receipt of REQUEST(T') from process j:
  - If you have not VOTEd, VOTE for T'
  - Otherwise, add T' to Qi
- On receipt of RELEASE:
  - If Qi is not empty, VOTE for pop(Qi)
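
Here is a minimal Python sketch of the voting rules. The transport stubs and the assumption that broadcast() also delivers to the sender (so a process can vote for itself) are illustrative choices, not from the slides.

    import heapq

    class MajorityVoting:
        """Sketch of the voting mutex (no vote rescinding yet)."""

        def __init__(self, pid, peers, send, broadcast):
            self.pid = pid
            self.everyone = set(peers) | {pid}
            self.send, self.broadcast = send, broadcast  # assumed transport stubs
            self.clock = 0
            self.mine = None          # our outstanding (timestamp, pid) request
            self.votes = set()        # who currently votes for us
            self.voted_for = None     # the request our single vote backs
            self.waiting = []         # Qi: requests queued for a future vote

        def request(self):
            self.clock += 1
            self.mine = (self.clock, self.pid)
            self.votes = set()
            self.broadcast(("REQUEST", self.mine))   # delivered to ourselves too

        def has_mutex(self):
            return len(self.votes) >= len(self.everyone) // 2 + 1   # strict majority

        def release(self):
            self.mine = None
            self.broadcast(("RELEASE", self.pid))

        def on_message(self, src, kind, body):
            if kind == "REQUEST":
                if self.voted_for is None:
                    self.voted_for = body
                    self.send(body[1], ("VOTE", self.pid))
                else:
                    heapq.heappush(self.waiting, body)    # already voted: queue it
            elif kind == "VOTE":
                self.votes.add(src)
            elif kind == "RELEASE":
                self.voted_for = None
                if self.waiting:                          # re-vote for earliest waiter
                    self.voted_for = heapq.heappop(self.waiting)
                    self.send(self.voted_for[1], ("VOTE", self.pid))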

Solution 5: Majority Rules
- Advantages:
  - Can progress with as many as N/2 - 1 failed processes
- Disadvantages:
  - Not fair
  - Deadlock! No guarantee that anyone receives a majority of votes

Solution 5': Dealing with Deadlock
- Allow processes to ask for their vote back
  - If you have already VOTEd for T and get a REQUEST for an earlier request T' < T, send RESCIND-VOTE to T's requester
  - If you receive a RESCIND-VOTE request and are not in the critical section, RELEASE-VOTE and re-request
- Guarantees that some process will eventually get a majority of VOTEs: liveness
  - Assuming network messages eventually reach their destination
- But still not fair
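
A sketch of this extension on top of the MajorityVoting sketch above; the in_cs flag and the exact message names are assumptions for illustration.

    import heapq

    class MajorityVotingWithRescind(MajorityVoting):
        """Adds vote rescinding so the earliest request eventually wins."""

        def __init__(self, pid, peers, send, broadcast):
            super().__init__(pid, peers, send, broadcast)
            self.in_cs = False        # assumed flag: set on entry, cleared on release

        def on_message(self, src, kind, body):
            if kind == "REQUEST" and self.voted_for is not None:
                heapq.heappush(self.waiting, body)
                if body < self.voted_for:
                    # An earlier request is waiting: ask for our vote back.
                    self.send(self.voted_for[1], ("RESCIND-VOTE", self.pid))
            elif kind == "RESCIND-VOTE":
                if not self.in_cs and self.mine is not None:
                    # Give the vote back and re-request; the voter's RELEASE handler
                    # will then vote for the earliest request in its queue.
                    self.votes.discard(src)
                    self.send(src, ("RELEASE", self.pid))
                    self.send(src, ("REQUEST", self.mine))
            else:
                super().on_message(src, kind, body)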

Algorithm Comparison

Algorithm         | Messages per entry/exit               | Synchronization delay (in RTTs)              | Liveness
Central server    | 3                                     | 1 RTT                                        | Bad: coordinator crash prevents progress
Token ring        | N                                     | <= sum(RTTs)/2                               | Horrible: any process failure prevents progress
Lamport           | 3*(N-1)                               | max(RTT) across processes                    | Horrible: any process failure prevents progress
Ricart & Agrawala | 2*(N-1)                               | max(RTT) across processes                    | Horrible: any process failure prevents progress
Voting            | >= 2*(N-1) (might have vote recalls)  | max(RTT) between the fastest N/2+1 processes | Great: can tolerate up to N/2-1 failures

(Synchronization delay: you request the lock and no one else has it; how long until you get it?)

So, Who Wins?
- Well, none of the algorithms we've looked at thus far
- But the closest one to industrial standards is...

So, Who Wins?
- Well, none of the algorithms we've looked at thus far
- But the closest one to industrial standards is...
  - The centralized model (e.g., Google's Chubby, Yahoo's ZooKeeper)

So, Who Wins?
- The closest to the industrial standards is...
  - The centralized model (e.g., Google's Chubby, Yahoo's ZooKeeper)
- But replicate it for fault tolerance across a few machines
  - Replicas coordinate closely via mechanisms similar to the ones we've shown for the distributed algorithms (e.g., voting); we'll talk later about generalized voting algorithms
- For manageable load, app writers must avoid using the centralized lock service as much as possible!

Take-Aways
- Lamport's and Ricart & Agrawala's algorithms demonstrate the utility of logical clocks
- Lamport's algorithm demonstrates how distributed processes can maintain consistent replicas of a data structure (the priority queue)!
  - We'll talk about replica consistency in the future
- If you build your distributed system wrong, then you get worse properties from distribution than if you didn't distribute at all

Today
- Finish up distributed mutual exclusion from last lecture
- Distributed file systems (start)
  - Sun's Network File System (NFS)
  - CMU's Andrew File System (AFS)

NFS and AFS Overview
- Networked file systems
- Their goals:
  - Have a consistent namespace for files across computers
  - Let authorized users access their files from any computer
- These FSes are different in properties and mechanisms, and that's what we'll discuss
- [Figure: one server; 10,000s of users from 10,000s of machines]

Distributed-FS Challenges
- Remember our initial list of distributed-systems challenges from the first lecture?
  - Interfaces
  - Scalability
  - Fault tolerance
  - Concurrency
  - Security
- Oh no... we've got 'em all
  - Can you give examples?
- How can we even start building such a system???

How to Start?
- Often very useful to have a prioritized list of goals
  - Performance, scale, consistency: what's most important?
- Workload-oriented design
  - Measure characteristics of target workloads to inform the design
  - E.g., AFS and NFS are user-oriented, hence they optimize for how users use files (vs. big programs):
    - Most files are privately owned
    - Not too much concurrent access
    - Sequential access is common; reads more common than writes
  - Other distributed FSes (e.g., Google FS) are geared towards big-program/big-data workloads (next time)

The FS Interface
- File ops: Open, Read, Write, Close
- Directory ops: Create file, Mkdir, Rename file, Rename directory, Delete file, Delete directory
- [Figure: client issues these operations against files and directories stored on the server]

Naïve DFS Design
- Use RPC to forward every FS operation to the server
  - Server orders all accesses, performs them, and sends back the result
- Good: Same behavior as if both programs were running on the same local filesystem!
- Bad: Performance will stink. Latency of access to the remote server is often much higher than to local memory.
- Really bad: Server would get hammered!
- Lesson 1: Needing to hit the server for every detail impairs performance and scalability.
- Question 1: How can we avoid going to the server for everything? For which operations can we avoid it? What do we lose in the process?

Solution: Caching
- Lots of systems problems are solved in one of two ways:
  1) Caching data
  2) Adding a level of indirection
- "All problems in computer science can be solved by adding a level of indirection; but this will usually cause other problems" -- David Wheeler
- [Figure: client handles open/read/write/mkdir/... from a local cache (frequent); the cache synchronizes with the server's files and directories (rare)]
- Questions:
  - What do we cache?
  - If we cache, don't we risk making things inconsistent?
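
As a concrete (and much-simplified) picture of client-side caching, here is a Python sketch of a block cache sitting in front of the server RPC. The rpc(op, ...) stub, the 4 KB block size, and the class name are illustrative assumptions, and the sketch ignores reads/writes that span block boundaries.

    BLOCK = 4096   # assumed block size

    class CachingClient:
        """Sketch: serve reads from a local block cache; go to the server on a miss."""

        def __init__(self, rpc):
            self.rpc = rpc             # assumed stub: rpc(op, *args) -> result
            self.cache = {}            # (file_handle, block_no) -> bytes

        def read(self, fh, offset, nbytes):
            blk, off = divmod(offset, BLOCK)
            if (fh, blk) not in self.cache:                     # miss: one round trip
                self.cache[(fh, blk)] = self.rpc("read", fh, blk * BLOCK, BLOCK)
            return self.cache[(fh, blk)][off:off + nbytes]      # hit: purely local

        def write(self, fh, offset, data):
            # Update the cached copy; *when* this reaches the server (write-through,
            # flush-on-close, ...) is exactly the consistency question below.
            blk, off = divmod(offset, BLOCK)
            block = bytearray(self.cache.get((fh, blk))
                              or self.rpc("read", fh, blk * BLOCK, BLOCK))
            block[off:off + len(data)] = data
            self.cache[(fh, blk)] = bytes(block)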

Sun NFS
- Cache file blocks and directory metadata in RAM at both clients and servers
- Advantage: No network traffic if open/read/write/close can be done locally
- But: failures and cache consistency are big concerns with this approach
  - NFS trades some consistency for increased performance...

Caching Problem 1: Failures
- Server crashes
  - Any data that's in memory but not on disk is lost
  - What if the client does seek(); /* SERVER CRASH */; read()?
    - If the server maintains the file position in RAM, the read will return bogus data
- Lost messages
  - What if we lose the acknowledgement for delete("foo"), and in the meantime another client created "foo" anew?
    - The first client might retry the delete and delete the new file
- Client crashes
  - Might lose data updates in the client cache

NFS's Solutions: Stateless Design
- Flush-on-close: When a file is closed, all modified blocks are sent to the server. close() does not return until the bytes are safely stored.
- Stateless protocol: requests specify the exact state. read() -> read([position]); no seek on the server.
- Operations are idempotent
  - How can we ensure this? Unique IDs on files/directories. It's not delete("foo"), it's delete(1337f00f), where that ID won't be reused.
  - (See the level of indirection we've added with this ID?)
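
To see what "stateless and idempotent" buys us, compare the two styles in the small Python sketch below; server.call and the handle/ID arguments are illustrative stand-ins, not real NFS procedure names.

    # Stateful style (problematic): the server remembers a per-client file offset.
    #     seek(fd, 4096); read(fd, 100)
    # If the server crashes and restarts between the two calls, the offset is gone
    # and the read returns bogus data.

    # Stateless style (NFS-like): every request carries the exact state it needs.
    def stateless_read(server, file_id, offset, count):
        # Safe to retry: the same arguments always name the same bytes.
        return server.call("READ", file_id, offset, count)

    def stateless_delete(server, file_id):
        # file_id is a unique, never-reused ID (the slide's delete(1337f00f)),
        # so a retried or delayed delete can never hit a file that was later
        # re-created under the same name.
        return server.call("DELETE", file_id)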

Caching Problem 2: Consistency
- If we allow clients to cache parts of files, directory metadata, etc., what happens if another client modifies them?
- [Figure: clients C1 and C2 each cache blocks of file F and then access them repeatedly from their caches]
- 2 readers: no problem! But 1 reader and 1 writer: inconsistency problem!

NFS's Solution: Weak Consistency
- NFS flushes updates on close()
- How does the other client find out? NFS's answer: it checks periodically.
- This means the system can be inconsistent for a few seconds: two clients doing a read() at the same time on the same file could see different results if one had old data cached and the other didn't.

Design Choice
- Clients can choose a stronger consistency model: close-to-open consistency
- How? Always ask the server for updates before open()
- Trades a bit of scalability/performance for better consistency (getting a theme here?)
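
A minimal Python sketch of how a client could implement close-to-open consistency, assuming a hypothetical rpc(op, ...) stub and a "getattr" call that returns the server's last-modification time; none of these names come from the real NFS protocol.

    class CloseToOpenCache:
        """Sketch: revalidate on open(), flush on close()."""

        def __init__(self, rpc):
            self.rpc = rpc            # assumed stub: rpc(op, *args) -> result
            self.blocks = {}          # (file_handle, block_no) -> cached bytes
            self.mtime = {}           # file_handle -> server mtime when we cached it

        def open(self, fh):
            attrs = self.rpc("getattr", fh)          # always ask the server on open()
            if self.mtime.get(fh) != attrs["mtime"]:
                # Another client closed a newer version: drop our stale blocks.
                self.blocks = {k: v for k, v in self.blocks.items() if k[0] != fh}
                self.mtime[fh] = attrs["mtime"]
            return fh

        def close(self, fh):
            # Flush-on-close: push modified blocks before returning, so the next
            # open() anywhere observes them (dirty-block tracking omitted here).
            pass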

What about Multiple Writes?
- NFS provides no guarantees at all!
- You might get one client's writes, the other client's writes, or a mix of both!

NFS Summary
- NFS provides transparent, remote file access
- Simple, portable, really popular
  - (it's gotten a little more complex over time)
- Weak consistency semantics
- Requires hefty server resources to scale (flush-on-close, server queried for lots of operations)

Let's Look at AFS Now
- NFS addresses some of the challenges, but:
  - Doesn't handle scale well (one server only)
  - Is very sensitive to network latency
- How does AFS improve this?
  - More aggressive caching (AFS caches on disk in addition to RAM)
  - Prefetching (on open, AFS gets the entire file from the server, making subsequent ops local and fast)

How to Cope with That Caching?
- Close-to-open consistency only
  - Why does this make sense? (Hint: user-centric workloads)
- Cache invalidation callbacks
  - Clients register with the server that they have a copy of the file
  - Server tells them "Invalidate!" if the file changes
  - This trades server-side state (read: scalability) for improved consistency
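
A server-side Python sketch of callback-based invalidation; the method names, the notify() stub, and the whole-file fetch/store model are illustrative simplifications of AFS, not its actual interface.

    class CallbackServer:
        """Sketch: remember who cached each file; break callbacks on update."""

        def __init__(self, notify):
            self.notify = notify       # assumed stub: notify(client, msg), one-way
            self.files = {}            # file_id -> bytes
            self.callbacks = {}        # file_id -> set of clients holding a copy

        def fetch(self, client, file_id):
            # Whole-file fetch on the client's open(): record a callback promise.
            self.callbacks.setdefault(file_id, set()).add(client)
            return self.files.get(file_id, b"")

        def store(self, client, file_id, data):
            # The client flushed the file on close(): break everyone else's callback.
            self.files[file_id] = data
            for other in self.callbacks.get(file_id, set()) - {client}:
                self.notify(other, ("INVALIDATE", file_id))
            self.callbacks[file_id] = {client}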

AFS Summary
- Lower server load than NFS
  - More files cached on clients
  - Cache invalidation callbacks: server not busy if files are read-only (common case)
- But maybe slower: access from local disk is much slower than from another machine's memory over a LAN
- For both NFS and AFS, the central server is:
  - A bottleneck: reads and writes hit it at least once per file use
  - A single point of failure
  - Expensive: to make the server fast, beefy, and reliable, you need to pay $$$

Today's Bits
- Distributed filesystems always involve a tradeoff: consistency, performance, scalability.
- We've learned a lot since NFS and AFS (and can implement faster, etc.), but the general lesson holds. Especially in the wide area.
- We'll see a related tradeoff, also involving consistency, in a while: the CAP tradeoff (Consistency, Availability, Partition-resilience)

More Bits
- Client-side caching is a fundamental technique to improve scalability and performance
  - But it raises important questions of cache consistency
- Periodic refreshes and callbacks are common methods for providing (some forms of) consistency
  - We'll talk about consistency more formally in the future
- AFS picked close-to-open consistency as a good balance of usability (the model seems intuitive to users), performance, etc.
  - Apps with highly concurrent, shared access, like databases, needed a different model

Next Time
- Another distributed file system, oriented toward other types of workloads (big-data/big-application workloads): the Google File System