Distributed File Systems I

To do:
- Basic distributed file systems
- Two classical examples
- A low-bandwidth file system

Distributed File Systems
- Early DFSs date from the late 70s and early 80s
- Support network-wide sharing of files and devices
- A DFS typically presents a traditional file system view:
  - A single file system namespace that all clients see
  - One client can observe the side effects of other clients' file system activity
- In many ways, an ideal DFS provides clients with the illusion of a shared, local FS
  - But with a distributed implementation: blocks and files are read from remote machines across a network, instead of from a local disk

Goals and challenges
- Start with a prioritized set of goals: performance, scale
- Understand the workload to inform the design
- User-oriented file systems (NFS, AFS, ...): how users use files
  - Most files are privately owned
  - Not much concurrent access
  - Sequential access is common
  - More reads than writes
- Big-program/big-data workloads: GFS, HDFS

A basic DFS architecture
- Offers a clear separation of concerns: application programs and a client module on the client; a directory service and a flat file service on the server
- Client module: supports a FS API (say, POSIX) using the DS and FFS
- Flat file service (FFS): operations on files, referred to by a UFID (unique file ID)
- Directory service (DS): maps text names to UFIDs; itself a client of the FFS (directories are stored as files)

Clients, FFS and DS operations
- If a client issues an open and then a read, the client module invokes DS and FFS operations and maintains the necessary state
- FFS:
  - Create() → FileID
  - Read(FileID, i, n) → Data: read up to n bytes from FileID starting at i
  - Write(FileID, i, Data): write Data to FileID starting at i
- DS:
  - Lookup(Dir, Name) → FileID: locate the name, return its UFID
  - AddName(Dir, Name, FileID): add (Name, FileID) to the directory and update the file's attributes
  - GetNames(Dir, Pattern) → NameSeq
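The FFS and DS interfaces above can be sketched as a toy in-memory implementation. All class and method names here are hypothetical, for illustration only; the point is that the directory service holds only the name-to-UFID mapping, while the flat file service never sees names:

```python
import itertools

class FlatFileService:
    """Toy flat file service: files are byte arrays named by opaque UFIDs."""
    def __init__(self):
        self._files = {}
        self._ids = itertools.count(1)

    def create(self):
        ufid = next(self._ids)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, i, n):
        # Read up to n bytes from the file, starting at offset i.
        return bytes(self._files[ufid][i:i + n])

    def write(self, ufid, i, data):
        f = self._files[ufid]
        f[i:i + len(data)] = data

class DirectoryService:
    """Toy directory service: maps text names to UFIDs; a client of the FFS."""
    def __init__(self, ffs):
        self._ffs = ffs
        self._dirs = {}

    def add_name(self, d, name, ufid):
        self._dirs.setdefault(d, {})[name] = ufid

    def lookup(self, d, name):
        return self._dirs[d][name]

# Client module's job: an open-then-read becomes DS (name -> UFID), then FFS (data).
ffs = FlatFileService()
ds = DirectoryService(ffs)
fid = ffs.create()
ffs.write(fid, 0, b"hello world")
ds.add_name("/", "greeting", fid)
print(ffs.read(ds.lookup("/", "greeting"), 0, 5))  # b'hello'
```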

Sun's Network File System (NFS)
- Developed by Sun as an open-protocol system; now a common standard for distributed UNIX file access
- The first DFS built as a product
- NFS runs over LANs (even over WANs, slowly)
- A key goal: simple, fast server crash recovery
- High-level architecture: applications call into a Virtual File System layer on the client; the NFS client module speaks the NFS protocol over RPC to the server, whose own VFS dispatches to its local UNIX FS (or other FS)

NFS protocol
- Key to the protocol: the file handle
  - The unit of file grouping is the mountable file system
  - File handles are opaque to clients
  - Derived from the i-node number plus a generation number and an FS identifier
  - The generation number is needed because i-node numbers are reused in UFS after a file is removed
- Some operations similar to our model (https://tools.ietf.org/html/rfc1094):
  - NFSPROC_LOOKUP: in: dirfh, name; out: fh, attr. Gets a file handle; attributes are just the metadata the FS tracks for each file
  - NFSPROC_GETATTR: in: fh; out: attr
  - NFSPROC_READ: in: fh, offset, count; out: attr, data. File handle, offset, and number of bytes to read
  - NFSPROC_MKDIR: in: dirfh, name, attr; out: newfh, attr
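A sketch of why the generation number matters. The packed layout below (FS identifier, i-node number, generation) is a hypothetical encoding, not the one any real server uses; the handle is opaque to clients, so the server is free to choose:

```python
import struct

# Hypothetical handle layout: FS id (4B), i-node number (8B), generation (4B).
FH = struct.Struct(">IQI")

def make_fh(fsid, inode, gen):
    return FH.pack(fsid, inode, gen)   # opaque bytes as far as the client cares

def server_check(fh, inode_table):
    """Reject handles whose i-node was freed and reused (stale handles)."""
    fsid, inode, gen = FH.unpack(fh)
    if inode_table.get(inode) != gen:
        raise OSError("ESTALE")
    return inode

inode_table = {42: 7}          # i-node 42 is currently at generation 7
fh = make_fh(1, 42, 7)
assert server_check(fh, inode_table) == 42
inode_table[42] = 8            # the file was removed and the i-node reused:
                               # the old handle must now be rejected
```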

NFS protocol
- A client passes a directory file handle and the name of a file to look up, obtaining the file's handle and attributes
- Attributes: metadata kept by the FS, such as creation and last-modification time, size, ownership, ...; they can be set with NFSPROC_SETATTR
- Once the client has the file handle, it can issue reads and writes
- To read (NFSPROC_READ), the client passes the file handle along with the offset and the number of bytes to read

Reading a file
- fd = open("/foo", ...):
  - Client: sends Lookup(root dir FH, "foo")
  - Server: receives the Lookup request, looks for "foo" in the root directory, returns foo's FH + attributes
  - Client: receives the Lookup reply; allocates a file descriptor in the open-file table; stores foo's FH and the current file position (0) in the table; returns the file descriptor to the app
- read(fd, buffer, MAX):
  - Client: indexes into the open-file table with fd; gets the NFS file handle (FH); uses the current file position as the offset; sends Read(FH, offset, count)
  - Server: receives the Read request; uses the FH to get the volume/i-node number; reads the i-node from disk; computes the block location (using the offset); reads the data from disk; returns it to the client
  - Client: receives the Read reply; updates the file position (+ bytes read); returns data/error code to the app
- Note that every request carries all the information needed to complete it

Key to fast crash recovery: statelessness
- Stateful vs. stateless: think of open
  - Stateful: the server opens the file locally and sends an fd back to the client; the client uses that fd on subsequent operations; the file descriptor is a piece of shared state. What happens if the server crashes mid-sequence?

  char buffer[MAX];
  int fd = open("foo", O_RDONLY);
  read(fd, buffer, MAX);
  read(fd, buffer, MAX);
  read(fd, buffer, MAX);
  close(fd);

- A stateless protocol: no state kept on the server side
  - e.g., which clients are caching what, which files are open, where the file pointer is for a file, ...
  - Authentication comes with each request
  - Each client operation carries all the info needed to complete the request
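The stateless design can be sketched as follows: the open-file table (handle plus file position) lives entirely on the client, and every read names the file and the offset explicitly, so a server crash and reboot is invisible to in-progress clients. The function names are hypothetical stand-ins for the NFS RPCs:

```python
# Server side: no per-client state at all; each read names file and offset.
FILES = {b"fh-foo": b"abcdefghij"}

def server_read(fh, offset, count):
    return FILES[fh][offset:offset + count]

# Client side: fd -> (handle, position) is purely local state.
open_table = {}

def client_open(fh):
    fd = len(open_table)
    open_table[fd] = {"fh": fh, "pos": 0}
    return fd

def client_read(fd, count):
    ent = open_table[fd]
    data = server_read(ent["fh"], ent["pos"], count)
    ent["pos"] += len(data)     # the file pointer advances on the client
    return data

fd = client_open(b"fh-foo")
print(client_read(fd, 4), client_read(fd, 4))  # b'abcd' b'efgh'
```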

More on fault tolerance: idempotent ops
- When a client sends a message, it may not get a reply: did the network drop it? Did the server crash (before or after processing)? What to do?!
- The NFS answer: simply retry
  - Set a timer when sending a request; if the reply doesn't arrive in time, retry
- Key: operations must be idempotent
  - Doing them more than once should have the same effect as doing them once (e.g., reading a value)
  - Counterexample: incrementing a variable
  - Hard to do for everything, e.g., mkdir
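A small sketch of timeout-and-retry over a lossy channel (the flaky_rpc function is a hypothetical network that may drop the reply even though the operation ran). Retrying the idempotent read is harmless; retrying the increment is not:

```python
import random

random.seed(1)
STORE = {"x": 10}

def flaky_rpc(op, *args):
    """Hypothetical network: the op executes, but the reply may be lost."""
    result = op(*args)
    if random.random() < 0.5:
        raise TimeoutError("reply lost")
    return result

def read_x():      # idempotent: same result however often it runs
    return STORE["x"]

def incr_x():      # NOT idempotent: every retry changes the outcome
    STORE["x"] += 1
    return STORE["x"]

def retry(op, *args, attempts=10):
    for _ in range(attempts):
        try:
            return flaky_rpc(op, *args)
        except TimeoutError:
            continue   # timer fired: resend the request
    raise TimeoutError

assert retry(read_x) == 10   # retries of a read are harmless
retry(incr_x)                # may have executed more than once...
print(STORE["x"])            # ...so x can overshoot 11
```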

NFS and VFS for transparency
- The virtual file system (VFS) provides a standard interface, using v-nodes as file handles
  - A v-node describes either a local or a remote file
- Basic idea: allow a remote directory to be mounted onto a local directory
  - Gives access to the remote directory and its descendants as if they were part of the local hierarchy
- Pretty similar to a local mount or link on UNIX, except for implementation and performance

NFS mounting
- Mounting is done by a separate mount service
  - Each server has an /etc/exports file with the names of the local file systems available for remote mounting, and an ACL for each
  - A modified version of mount uses an RPC protocol to hard- or soft-mount a remote FS
    - Hard-mounted: a process accessing a file in the FS blocks until the access succeeds (so if the server is restarted, the process continues as normal)
    - Soft-mounted: returns a failure after a few retries
- Automounter (added later): mount on demand
  - On first access, try a number of servers and mount the first one to respond
  - Gives fault tolerance and some degree of load balancing; one can define multiple replicas of read-only data to choose from

Performance: NFS server and client caching
- The NFS server uses its cache as for other file accesses
  - Reads pose no issue
  - Writes can be written through to disk before replying
- Asynchronous writes: added in NFSv3 to handle the performance bottleneck at servers; writes reach disk with an explicit commit operation
- Client caching is used for performance, but:
  - Update visibility: when do updates from one client become visible to others?
  - Stale cache: one client wrote and flushed to the server, so the server has the latest copy, but another client still has an older cached version

NFS caching / sharing
To address these problems:
- Flush-on-close on clients: flush updates when closing the file or when a sync is issued (or every 30 s in newer versions)
- Clients are responsible for validating cache entries; an entry is usable if either:
  - It was validated within a freshness interval t (3-30 s for files, 30-60 s for directories); if so, there is no need to talk to the server (reducing its load), or
  - The last modification time recorded by the client is the same as the server's; clients check this (getattr) with the server before using cached data, and attributes are piggybacked on the results of other operations
- Still not the same consistency as a local FS: delay after write + freshness interval
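The client-side validity check can be sketched like this (all names are hypothetical; T_FILE stands in for the freshness interval t, and getattr_mtime for the getattr round trip):

```python
import time

T_FILE = 3.0   # freshness interval for files (3-30 s in practice)

class CacheEntry:
    def __init__(self, data, mtime_server):
        self.data = data
        self.mtime_server = mtime_server   # server mod time at fetch
        self.validated = time.time()       # when we last validated

def is_valid(entry, getattr_mtime, now=None):
    """Usable if validated recently, or if a getattr shows the server's
    modification time still matches what we recorded."""
    now = time.time() if now is None else now
    if now - entry.validated < T_FILE:
        return True                        # fresh enough: skip the server
    if getattr_mtime() == entry.mtime_server:
        entry.validated = now              # revalidated via getattr
        return True
    return False                           # stale: must refetch

e = CacheEntry(b"data", mtime_server=100)
assert is_valid(e, getattr_mtime=lambda: 100, now=e.validated + 1)
assert not is_valid(e, getattr_mtime=lambda: 101, now=e.validated + 10)
```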

One level up: some DFS issues
Consider these issues and how they map to NFS and to the file and storage systems we'll discuss next:
- What is the basic abstraction?
  - A remote file system? open, close, read, write, ...
  - A remote disk? read block, write block
- Degrees of transparency
  - Access: local or remote access without change
  - Location: is the file's location visible to the user?
  - Mobility: do names change if the file moves?
  - Performance: consistent while the load on the system changes
  - Scaling: can the system be expanded by incremental growth?

DFS issues
- Caching for performance
  - Where are file blocks cached? On the file server? On the client machine? Both?
- Sharing and coherency
  - What are the semantics of sharing?
  - What happens when a cached block/file is modified?
  - How does a node know when its cached blocks are stale?
  - If we cache on the client side, we're presumably caching on multiple client machines whenever a file is shared

DFS issues
- Replication for performance and/or availability
  - Can there be multiple copies of a file in the network?
  - If so, how are updates handled?
  - What if there is a network partition? Can clients work on separate copies? How does reconciliation work?
- Performance
  - What is the performance of remote operations?
  - What is the additional cost of file sharing?
  - How does the system scale with the number of clients?
  - What are the bottlenecks: network, CPU, disks, protocols, data copying?

DFS issues
- Access control
  - In the UNIX FS, the user's access rights are checked against the access mode at open; the user ID is retrieved at login and cannot be tampered with; the UID is used in access-rights checks, once at open
  - In a DFS, access checks have to be done at the server; otherwise the RPC interface is an unprotected access point
    - The UID has to be passed along, and the server is vulnerable to forged IDs
    - If access rights are retained at the server, it is no longer stateless
- Two approaches:
  - Capability-based: check when resolving a name to a UFID and encode the result as a capability, returned to the client
  - Per request: submit the UID with every request (with digital signatures against forged IDs); this is the most common
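A minimal sketch of the capability-based approach, using an HMAC as a stand-in for whatever token encoding a real system would use (the per-UID rights check itself is elided in this toy; all names are hypothetical):

```python
import hashlib
import hmac

SERVER_KEY = b"server-secret"   # known only to the server

def issue_capability(ufid, uid, rights):
    """At name resolution, the server checks uid's rights once (elided
    here) and returns a token binding (ufid, rights) under its key."""
    msg = f"{ufid}:{rights}".encode()
    return msg, hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()

def check_capability(msg, tag):
    # Later requests present the capability; no per-client server state.
    expected = hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

cap, tag = issue_capability(42, uid=1000, rights="r")
assert check_capability(cap, tag)               # genuine capability accepted
assert not check_capability(cap, b"\x00" * 32)  # forged tag rejected
```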


CMU's Andrew File System (AFS)
- From CMU (1980s), built to support student computing
- UNIX API, NFS compatible
- Key design goal: scalability in the number of clients
- System setup: workstation clients (with disks) and dedicated file-server machines (unlike NFS, where machines are symmetric)
- Key strategy: whole-file serving, and whole-file caching on the local disk

AFS design guides (the value of characterization)
Designed from measurement-driven observations:
- Files are small (< 10 KB)
- Reads are much more frequent than writes
- Sequential access is common; random access is rare
- Most files are read/written by a single user
- Files are referenced in bursts (temporal locality)
- A counterexample? Databases

CMU's Andrew File System (AFS)
- Implemented as two software components, running as user-level processes on client machines (Venus) and server machines (Vice)

CMU's AFS: first version (ITC)
- On open, the client sends a Fetch with the entire pathname
- The server traverses the pathname, finds the file, and ships the entire file back
- Reads and writes are then done locally
- The file is flushed back at close, if modified
- The next time the file is accessed, the client sends a TestAuth message to the server to see whether the file has changed; if not, it proceeds with the local copy

From AFS version 1 to 2
- The server was overloaded; again, measure to diagnose
- Path-traversal costs are high
  - To access a file yourfile.txt, the server had to traverse the full pathname (/home/you/yourfile.txt) each time
  - Fix: use file handles
- Clients issue too many TestAuth calls to the server, just to check whether a local file is still valid
  - Fix: use callbacks. Vice issues a callback promise with every copy of a file; when the server updates a file, it notifies every Venus holding a valid callback
  - No longer a stateless server: the callback list is kept on disk, updated via atomic operations

From AFS version 1 to 2
- Load-balancing problems (some servers were used more than others)
  - Fix: define volumes; move volumes between servers when necessary
- Each client was handled by a single process, with context-switching costs and other overheads
  - Fix: use threads instead of processes in the server

Other AFS issues
- What happens if two clients modify a file at the same time? Last writer wins
- What about crash recovery?
  - Client crash: maybe the server was trying to send an invalidation; the client should check with the server about its cache contents before using them
  - Server crash: callbacks are kept in memory, so when the server reboots it has no idea who has what; maybe the server can warn clients ("don't trust your cache") when it comes back
- Other improvements
  - Andrew has a single namespace: your files have the same names everywhere in the world (in NFS you can mount a FS wherever you please)
  - User authentication, flexible user-managed access control

File systems and wide-area networks
- These network file systems are a useful abstraction, but few people use them over wide-area networks
- Problem: they require too much bandwidth; they saturate the bottleneck link and interfere with other traffic
- Other alternatives:
  - Relax consistency semantics: but many apps need strict consistency (email, RCS, ...)
  - Copy files back and forth to work on them: threatens consistency, and doesn't always work (symlinks)
  - Remote login: graphical apps require too much bandwidth, and interactive programs are sensitive to latency and packet loss

Low Bandwidth File System (LBFS)
- Observation: much inter-file commonality
  - Editing/word-processing workloads: localized edits, autosave files, ...
  - Software-development workloads: modified headers, object files concatenated into a library, ...
- LBFS exploits these commonalities to save bandwidth
  - It avoids sending data that can already be found in the server's FS or the client's cache

LBFS: avoiding redundant data transfers
- The server divides the files it stores into chunks and indexes the chunks by hash value
  - Break each file into ~8 KB data chunks
  - Send hashes of the file's chunks
- The client similarly indexes its file cache
- Only the chunks that are actually needed get sent

Dividing files into chunks
- Straw-man approach: aligned 8 KB chunks
  - Inserting one byte at the start of a file changes all of its chunks
- Instead, base chunk boundaries on file contents, allowing variable-length chunks
  - Compute a running hash of every overlapping 48-byte region
  - If the hash mod 8K equals a special value, that position is a chunk boundary
- (Figure: stripes show the regions whose magic hash values create chunk boundaries; the chunks of a file before and after edits, with color marking the edits)

Some details
- Chunking has pathological cases
  - Very small chunks: sending the hashes of the chunks costs about as much as just sending the file
  - Very large chunks: cannot be sent in a single RPC
  - LBFS therefore imposes minimum (2 KB) and maximum (64 KB) chunk sizes
- Other features of LBFS
  - Uses conventional compression (gzip) and caching
  - Uses leases instead of Andrew's callbacks: the server's commitment to inform a client of changes expires after some time
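A sketch of content-defined chunking with the size bounds applied. A simple Rabin-Karp-style rolling hash over 48-byte windows stands in for LBFS's Rabin fingerprints (the hash parameters here are arbitrary illustrative choices); the 2 KB/64 KB bounds and ~8 KB target follow the numbers above. Because boundaries depend only on content, a one-byte insertion near the start of a file changes only the chunks it touches:

```python
import hashlib
import random

WINDOW = 48
B, M = 257, (1 << 31) - 1            # rolling-hash base and modulus (arbitrary)
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024
AVG = 8 * 1024                       # boundary when hash mod 8K hits a magic value
MAGIC = 1

def chunk(data):
    """Split data into content-defined chunks with min/max size bounds."""
    pow_w = pow(B, WINDOW, M)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * B + byte) % M
        if i >= WINDOW:              # slide the 48-byte window
            h = (h - data[i - WINDOW] * pow_w) % M
        size = i + 1 - start
        if size < MIN_CHUNK:
            continue                 # never emit tiny chunks
        if h % AVG == MAGIC or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

random.seed(0)
data = bytes(random.randrange(256) for _ in range(200_000))
edited = data[:100] + b"X" + data[100:]   # insert one byte near the start
h_orig = {hashlib.sha1(c).digest() for c in chunk(data)}
h_edit = {hashlib.sha1(c).digest() for c in chunk(edited)}
print(f"{len(h_orig & h_edit)} of {len(h_edit)} chunks survive the edit")
```

With aligned 8 KB chunks the same insertion would shift every byte and invalidate every chunk; here only the chunk containing the edit (and, rarely, a neighbor or two) changes.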

Reading in LBFS
- The client sends GETHASH; the server replies with the file's chunk list: (hash1, size1), (hash2, size2), (hash3, size3), EOF
- The client issues READs only for the chunks it does not already have (here, it receives data2 and data3)
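The exchange above can be sketched as follows (gethash and read_chunk are hypothetical stand-ins for the GETHASH and READ messages; real LBFS extends the NFSv3 RPCs):

```python
import hashlib

def sha(b):
    return hashlib.sha1(b).digest()

# The server's copy of a file, already divided into chunks.
SERVER_CHUNKS = [b"A" * 10, b"B" * 10, b"C" * 10]

def gethash(path):
    """GETHASH reply: (hash, size) for each chunk of the file."""
    return [(sha(c), len(c)) for c in SERVER_CHUNKS]

def read_chunk(h):
    return next(c for c in SERVER_CHUNKS if sha(c) == h)

# Client side: chunk 1 is already cached, so only chunks 2 and 3 travel.
client_cache = {sha(b"A" * 10): b"A" * 10}
fetched = 0
file_data = b""
for h, size in gethash("/some/file"):
    if h in client_cache:
        file_data += client_cache[h]   # hit: no data crosses the wire
    else:
        data = read_chunk(h)           # one READ per missing chunk
        client_cache[h] = data
        file_data += data
        fetched += 1

assert file_data == b"A" * 10 + b"B" * 10 + b"C" * 10
print(f"fetched {fetched} of 3 chunks")  # fetched 2 of 3 chunks
```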

Bandwidth utilization with LBFS: emacs recompile
- (Figure: upstream and downstream bandwidth, in MBytes, for NFSv3, AFS, Leases+Gzip, "LBFS, new DB", and LBFS)
- To isolate the benefit of exploiting file commonalities, "LBFS, new DB" starts the server with a new database, without chunks from previous compiles

Summary
- Building a DFS involves many issues: the basic abstraction, naming, caching, sharing and coherency, replication, performance, workload
- There is no single right answer: different systems make different tradeoffs
- Performance is always an issue
  - There is always a tradeoff between performance and the semantics of file operations (e.g., for shared files)
  - And changes in the underlying setting shift that tradeoff
- Caching is crucial in any file system, and so is maintaining coherency