Chapter 8: Distributed File Systems


Introduction

A file system provides persistent storage. A distributed file system provides persistent storage with information sharing across a network, with similar (in some cases better) performance and reliability than local file systems.

Storage systems and their properties:

Storage system                        Sharing  Persistence  Distributed cache/replicas  Consistency  Example
Main memory                           no       no           no                          1            RAM
File system                           no       yes          no                          1            UNIX file system
Distributed file system               yes      yes          yes                         yes          Sun NFS
Web                                   yes      yes          yes                         no           Web server
Distributed shared memory             yes      no           yes                         yes          Ivy (Ch. 16)
Remote objects (RMI/ORB)              yes      no           no                          1            CORBA
Persistent object store               yes      yes          no                          1            CORBA Persistent Object Service
Persistent distributed object store   yes      yes          yes                         yes          PerDiS, Khazana

(In the consistency column, "1" denotes strict one-copy consistency.)

Characteristics of file systems
Responsibilities: organization, storage, retrieval, naming, sharing and protection.
Important concepts:
File: includes data and attributes.
Directory: a special file that provides a mapping from text names to internal file identifiers.
Metadata: extra management information, including attributes, directories, etc.

File system modules (layered):
Directory module: relates file names to file IDs.
File module: relates file IDs to particular files.
Access control module: checks permission for the operation requested.
File access module: reads or writes file data or attributes.
Block module: accesses and allocates disk blocks.
Device module: performs disk I/O and buffering.

File attribute record structure:
File length, Creation timestamp, Read timestamp, Write timestamp, Attribute timestamp, Reference count, Owner, File type, Access control list.

Applications access the file system via library procedures.

Unix file system operations

filedes = open(name, mode): opens an existing file with the given name.
filedes = creat(name, mode): creates a new file with the given name. Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes): closes the open file filedes.
count = read(filedes, buffer, n): transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n): transfers n bytes to the file referenced by filedes from buffer. Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence): moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name): removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2): adds a new name (name2) for a file (name1).
status = stat(name, buffer): gets the file attributes for file name into buffer.

Distributed file system requirements

Transparency: access, location, mobility, performance and scaling transparency.
Concurrent file updates: concurrency control.
File replication: better performance and fault tolerance.
Hardware and operating system heterogeneity.
Fault tolerance: to cope with transient communication failures, the design can be based on at-most-once invocation semantics; idempotent operations support at-least-once semantics; a stateless server can restart after a crash without recovery.
Consistency: one-copy update semantics.
Security: authentication, access control, secure channels.
Efficiency: comparable with, or better than, local file systems in performance and reliability.

Case studies

Sun NFS: the first file service designed as a product [1984]; adopted as an Internet standard; supported on almost all platforms, e.g. Windows NT, Unix.
Andrew File System (AFS): campus information-sharing system developed at CMU [1986]; 800 workstations and 40 servers at CMU [1991].

What is important in distributed systems:
What measures were taken for transparency?
What measures were taken for performance?
What measures were taken for fault tolerance?
What measures were taken for concurrent operations?
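The fault-tolerance requirement above can be made concrete with a small sketch. The code below (an illustration, not any real file service API) shows why an operation whose request carries the full position is idempotent: a client that times out can simply resend the request, and a repeated execution returns the same result.

```python
# Hypothetical sketch: a position-explicit read is idempotent, so a client
# may retry it after a lost reply (at-least-once semantics) without harm.
# The store and file id below are invented for illustration.

FILES = {"f1": b"hello world"}   # fake flat file store

def read(file_id, i, n):
    """Stateless read: the caller supplies the start position i, so the
    server keeps no read-write pointer between calls (0-based here for
    simplicity)."""
    data = FILES[file_id]
    if not (0 <= i <= len(data)):
        raise ValueError("BadPosition")
    return data[i:i + n]

# Resending the identical request yields the identical result:
first = read("f1", 0, 5)
retry = read("f1", 0, 5)
assert first == retry == b"hello"
```

A Unix-style read(fd) would not be retry-safe: the implicit read-write pointer advances on every call, so a duplicated request would return the next n bytes instead of the same ones.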

File service architecture

Three components of a file service:
Flat file service: operates on the contents of files, identified by unique file identifiers (UFIDs).
Directory service: provides a mapping from text names to UFIDs.
Client module: supports applications accessing the remote file service transparently, e.g. by issuing iterative requests to the directory service and caching files.

Architecture: applications on the client computer call the client module, which communicates with the directory service and the flat file service on the server computer.

Flat file service interface: comparison with Unix
No open and close; read and write operations specify a starting point.
Fault-tolerance reasons for the differences:
Repeatable operations: except for Create, all operations are idempotent.
Stateless servers: no read-write pointer is kept while operating on files, so a server can restart after a crash without recovery. Faults occur more often in a distributed system than in a local one.

Flat file service operations:
Read(FileId, i, n) -> Data: if 1 <= i <= Length(File), reads a sequence of up to n items from the file starting at item i and returns it in Data; otherwise throws BadPosition.
Write(FileId, i, Data): if 1 <= i <= Length(File)+1, writes the sequence Data to the file starting at item i, extending the file if necessary; otherwise throws BadPosition.
Create() -> FileId: creates a new file of length 0 and delivers a UFID for it.
Delete(FileId): removes the file from the file store.
GetAttributes(FileId) -> Attr: returns the file attributes for the file.
SetAttributes(FileId, Attr): sets the file attributes (only those attributes that are not shaded in the attribute record).

Access control in a DFS
In the Unix file system, the user's access rights are checked against the access mode requested in the open call.
In a stateless DFS, the interface is open to the public and the file server cannot retain the user ID across calls. Two approaches for access control:
1. Authenticate based on a capability.
2. Attach the user ID to each request (Kerberos is used this way in AFS and NFS).

Directory service interface
Main task: translate text names to UFIDs.
Lookup(Dir, Name) -> FileId: locates the text name in the directory and returns the relevant UFID; if Name is not in the directory, throws NotFound.
AddName(Dir, Name, File): if Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record; if Name is already in the directory, throws NameDuplicate.
UnName(Dir, Name): if Name is in the directory, the entry containing Name is removed; if not, throws NotFound.
GetNames(Dir, Pattern) -> NameSeq: returns all the text names in the directory that match the regular expression Pattern.
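The iterative use of Lookup by the client module can be sketched as follows. This is an illustrative model only; the directory contents, UFIDs and helper names are invented.

```python
# Hypothetical sketch: the client module resolves a multi-part path name
# by calling Lookup on the directory service once per path component.

DIRS = {                      # directory UFID -> {name: UFID}
    "root":  {"etc": "d-etc"},
    "d-etc": {"passwd": "f-passwd"},
}

def lookup(dir_ufid, name):
    """Model of the directory service's Lookup(Dir, Name) operation."""
    try:
        return DIRS[dir_ufid][name]
    except KeyError:
        raise FileNotFoundError("NotFound")

def resolve(path, root="root"):
    """Client-module loop: one Lookup request per path component."""
    ufid = root
    for component in path.strip("/").split("/"):
        ufid = lookup(ufid, component)
    return ufid

assert resolve("/etc/passwd") == "f-passwd"
```

A client-side directory cache would store the (directory, name) -> UFID results of recent lookups to avoid repeating these round trips.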

Hierarchic file system
Directory tree: each directory is a special file holding the names of the files and other directories that are accessible from it.
Pathname: a multi-part name referencing a file or directory, e.g. /etc/rc.d/init.d/nfsd.
Exploring the tree: a pathname is translated via multiple Lookup operations; a directory cache at the client speeds this up.

NFS architecture
On the client computer, applications issue UNIX system calls to a virtual file system, which dispatches to the local UNIX file system, other file systems, or the NFS client. The NFS client communicates with the NFS server on the server computer using the NFS protocol over RPC; the server's virtual file system sits above its UNIX file system.

Virtual file system
Keeps track of local and remote file systems.
V-node: for a local file, refers to an i-node; for a remote file, refers to a file handle.

File handle
The file identifier used in NFS; file handles are passed between client and server to refer to a file. A file handle contains:
Filesystem identifier.
i-node number of the file (a number unique within a filesystem).
i-node generation number (a number that changes each time an i-node is reused for a different file).

Design points
Access control and authentication: the user ID is attached to each request. Kerberos can be embedded in NFS: the user is authenticated at mount time (ticket, authenticator and secure channel).
Client integration into the kernel: no recompilation of applications; a single client module serves all user-level processes; the encryption key used to authenticate the user ID passed to the server can be retained in the kernel.

NFS server operations (simplified)
The NFS server interface is defined in RFC 1813.
lookup(dirfh, name) -> fh, attr: returns the file handle and attributes for the file name in the directory dirfh.
create(dirfh, name, attr) -> newfh, attr: creates a new file name in directory dirfh with attributes attr and returns the new file handle and attributes.
remove(dirfh, name) -> status: removes file name from directory dirfh.
getattr(fh) -> attr: returns the file attributes of file fh (similar to the UNIX stat system call).
setattr(fh, attr) -> attr: sets the attributes (mode, user id, group id, size, access time and modify time) of a file; setting the size to 0 truncates the file.
read(fh, offset, count) -> attr, data: returns up to count bytes of data from a file starting at offset; also returns the latest attributes of the file.
write(fh, offset, count, data) -> attr: writes count bytes of data to a file starting at offset; returns the attributes of the file after the write has taken place.
rename(dirfh, name, todirfh, toname) -> status: changes the name of file name in directory dirfh to toname in directory todirfh.
link(newdirfh, newname, dirfh, name) -> status: creates an entry newname in the directory newdirfh which refers to file name in the directory dirfh.
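The role of the i-node generation number in the file handle can be illustrated with a small sketch (illustrative only; the table layout and values are invented, not real NFS data structures). When an i-node is reused for a new file, the generation number is bumped, so handles that clients still hold for the old file stop matching.

```python
# Sketch: why the NFS file handle includes an i-node generation number.
from collections import namedtuple

FileHandle = namedtuple("FileHandle", "fsid inode generation")

# Fake server-side i-node table: inode number -> current generation + data.
inode_table = {7: {"generation": 1, "data": b"old"}}

def to_handle(fsid, ino):
    return FileHandle(fsid, ino, inode_table[ino]["generation"])

def read_by_handle(fh):
    entry = inode_table[fh.inode]
    if entry["generation"] != fh.generation:
        raise ValueError("stale file handle")   # NFS ESTALE-like behaviour
    return entry["data"]

stale = to_handle(0, 7)
# The file is deleted and i-node 7 is reused for a different file:
inode_table[7] = {"generation": 2, "data": b"new"}

try:
    read_by_handle(stale)
    raise AssertionError("stale handle should have been rejected")
except ValueError:
    pass  # the old handle no longer resolves, as intended
```

Without the generation number, the stale handle would silently read the new file's data.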

Mount service
Server: the NFS server exports file systems listed in a well-known file, /etc/exports, which contains the names of local filesystems available for remote mounting.
Client: when a client wants to access a remote file system, it uses the mount protocol (over RPC) to make it available, supplying the location and pathname of the remote directory.
Example (figure): local and remote file systems accessible on an NFS client; subtrees exported by Server 1 (people, containing big, jon, bob) and Server 2 (users, containing jim, ann, jane, joe) are mounted into the client's name space at /usr/students and /usr/staff.

Path name translation
When open, create or stat is used, the pathname must be translated to a file handle.
Multi-part pathname translation: in NFS a whole pathname cannot be translated at a single server, so the client issues a separate lookup request to the server for each component.
Directory cache: the client caches the results of recent translations.

File cache at the server
As in the UNIX buffer cache:
read-ahead: a cache strategy that fetches following blocks in anticipation of sequential access.
delayed-write: write a page to disk only when it is about to be replaced (for performance), with a periodic sync, e.g. every 30 seconds, in case a dirty page remains unwritten for a long time (for fault tolerance).
Cache reading in the NFS server is similar to a local file system.
Cache writing in the NFS server, to enhance reliability of the client-server interaction:
write-through: write to disk first, then reply to the client (supports at-least-once semantics, for fault tolerance).
commit operation: write to disk when the client closes the file (for performance).

File cache at the client
The client caches file blocks and must maintain coherence: it polls the server to validate blocks when using them (for read or write; only a small inquiry message is transferred).
Validity condition: two timestamps are attached to each block in the cache:
Tc: the time when the cache entry was last validated.
Tm: the time when the block was last modified at the server.
A block is valid at time T if (T - Tc < t) or (Tm_client = Tm_server). The freshness interval t is a compromise between consistency and efficiency, e.g. 3-30 s.
Several measures reduce the traffic of getattr calls to the server:
When a new value of Tm is received, it is applied to all cache entries derived from the relevant file.
The current attribute values are piggybacked on the result of every operation on a file, and if the value of Tm has changed the client uses it to update the cache entries relating to the file.
An adaptive algorithm for setting the freshness interval t reduces the traffic considerably for most files.
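The validity condition above can be written directly as code. This is a minimal sketch of the test itself, with invented timestamp values; a real NFS client would obtain Tm_server via a getattr call only when the freshness window has expired.

```python
# Sketch of the NFS client-cache validity test:
# a block is valid if (T - Tc < t) OR (Tm_client == Tm_server).

def is_valid(now, tc, tm_client, tm_server, t):
    """now: current time T; tc: last validation time; tm_*: last-modified
    timestamps at client and server; t: freshness interval."""
    if now - tc < t:
        return True                   # within freshness window: no server contact
    return tm_client == tm_server     # otherwise compare timestamps (getattr)

# Validated 2 s ago with t = 3 s: still fresh, no server traffic needed.
assert is_valid(now=102, tc=100, tm_client=50, tm_server=60, t=3)
# Validated 10 s ago: timestamps must match; the server copy has changed.
assert not is_valid(now=110, tc=100, tm_client=50, tm_server=60, t=3)
```

A small t approaches one-copy consistency at the cost of frequent validation traffic; a large t reduces traffic but lets clients read data up to t seconds stale.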

File cache at the client (continued)
Write cache: flush results to the server when the file is closed, and sync periodically.
Cache semantics: Unix one-copy file consistency is not guaranteed.

NFS summary
Access transparency and location transparency: yes.
Mobility transparency: no; a filesystem must be remounted when it migrates.
Scalability: limited performance for hot-spot files.
File replication: replication with updates is not supported.
Hardware and operating system heterogeneity: yes.
Fault tolerance: stateless server, idempotent operations, write-through.
Consistency: approximate (see the cache validity condition above).
Security: can be enhanced with Kerberos.
Efficiency: client caching.

The Andrew File System
Motivation of AFS:
Information sharing: share information among a large number of users.
Scalability: a large number of users, a large amount of files, and a large number of users accessing hot files.
Unusual design decisions:
Whole-file serving.
Whole-file caching.

Observations of typical Unix file system usage
Files are small: most files are less than 10 KB in size.
Reads are six times more frequent than writes.
Sequential access is common; random access is rare.
Most files are accessed by only one user, and most shared files are modified by one user.
A recently used file is highly likely to be used again.

Typical scenario of using AFS
The client opens a remote file; a copy of the file is stored on the client computer.
The client reads and writes the local copy.
When the client closes the file, if it was updated, the copy is flushed back to the server.

The basis of the AFS design
The majority of files are infrequently updated and accessed by a single user.
A working set of about 100 megabytes can be cached locally.
Database files are not supported by this design.

Implementation
System architecture: a modified BSD Unix kernel intercepts and forwards relevant system calls to Venus, the client process.
Name space (figure): the file name space seen by AFS clients consists of a local space and a shared space (under /cmu, reached via symbolic links); Vice server processes hold the shared files, and a Venus process runs on each workstation.
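The open/read-locally/flush-on-close scenario above can be sketched as follows. This is an illustrative model with invented identifiers, not AFS code; it only shows the whole-file movement between "server" and "cache".

```python
# Hypothetical sketch of AFS whole-file caching: fetch on open,
# operate locally, flush back to the server on close if modified.

server = {"f1": b"v1"}    # Vice: the authoritative file store
cache = {}                # Venus: whole-file copies on the workstation

def afs_open(fid):
    if fid not in cache:
        cache[fid] = {"data": server[fid], "dirty": False}  # whole-file fetch
    return fid

def afs_write(fid, data):
    cache[fid] = {"data": data, "dirty": True}   # purely local operation

def afs_close(fid):
    if cache[fid]["dirty"]:
        server[fid] = cache[fid]["data"]         # flush updated copy
        cache[fid]["dirty"] = False

afs_open("f1")
afs_write("f1", b"v2")
assert server["f1"] == b"v1"    # server unchanged while the file is open
afs_close("f1")
assert server["f1"] == b"v2"    # update propagated at close
```

This design pays off precisely because of the usage observations above: files are small enough to transfer whole, and writes (and hence flushes) are rare compared with reads.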

System call interception in AFS
The Unix kernel intercepts open, close and other file system calls referring to files in the shared name space and passes them to Venus; calls on local files go to the UNIX file system on the local disk. One file partition on the local disk is used as a cache for the shared space.

Questions for the implementation:
How does AFS gain control when an open or close system call referring to a file in the shared file space is issued by a client?
How is the server holding the required file located?
What space is allocated for cached files on the workstation?
How does AFS ensure that cached copies are up to date when files may be updated by several clients?

Implementation (continued)
Venus:
Accesses files by fid, a 96-bit identifier made of a Volume number (32 bits), a File handle (32 bits) and a Uniquifier (32 bits).
Translates pathnames to fids by step-by-step lookup.
File cache: the file partition used as the cache can accommodate several hundred average-size files.
Maintains cache coherence via the callback mechanism.
Vice:
Accepts requests in terms of fids; provides a flat file service.

The main components of the Vice service interface:
Fetch(fid) -> attr, data: returns the attributes (status) and, optionally, the contents of the file identified by fid, and records a callback promise on it.
Store(fid, attr, data): updates the attributes and (optionally) the contents of a specified file.
Create() -> fid: creates a new file and records a callback promise on it.
Remove(fid): deletes the specified file.
SetLock(fid, mode): sets a lock on the specified file or directory; the mode of the lock may be shared or exclusive; locks that are not removed expire after 30 minutes.
ReleaseLock(fid): unlocks the specified file or directory.
RemoveCallback(fid): informs the server that a Venus process has flushed a file from its cache.
BreakCallback(fid): made by a Vice server to a Venus process; cancels the callback promise on the relevant file.

Cache coherence
Callback promise: a token issued by the Vice server that is the custodian of the file; its state is either valid or cancelled.
When a server performs a request to update a file, it notifies all Venus processes caching the file to set their callback tokens to cancelled.
Opening a file: Venus fetches the file when it has no copy or the copy's callback has been cancelled; Vice remembers where each file is cached.
Closing a file: Venus flushes the file if the application updated it; Vice executes the updates on the file in sequential order and informs all caches of the file that their callbacks are cancelled.
Validating a file: Venus validates the file when the client restarts or when no callback has been received for a time T.

Scalability
Because most files are read-mostly, callbacks reduce client-server interaction dramatically in contrast to client polling.
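The server-side bookkeeping for callback promises can be sketched as follows. This is an illustrative model, not Vice code; the client identifiers and structures are invented.

```python
# Sketch of callback-promise bookkeeping at a Vice server: the server
# records which Venus processes cache each file and, on an update,
# breaks the callbacks of every other caching client.

callbacks = {}    # fid -> set of client ids holding a valid callback promise

def record_promise(fid, client):
    """Called on Fetch/Create: the server records a callback promise."""
    callbacks.setdefault(fid, set()).add(client)

def update_file(fid, writer):
    """Called when a Store from `writer` is applied: BreakCallback is
    sent to every other client; the writer's own promise stays valid.
    Returns the set of clients whose tokens become 'cancelled'."""
    broken = callbacks.get(fid, set()) - {writer}
    callbacks[fid] = {writer}
    return broken

record_promise("f1", "venus-A")
record_promise("f1", "venus-B")
assert update_file("f1", "venus-A") == {"venus-B"}
```

The next open on venus-B finds its token cancelled and re-fetches the file; until then, reads on venus-A and all uninvolved clients generate no server traffic at all, which is the source of AFS's scalability.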

Update semantics
An approximation to one-copy file semantics.
AFS-1 (F: file, S: server, C: client):
After a successful open: latest(F, S), i.e. the current value of F at C is the same as the value at S.
After a failed open or close: failure(S), i.e. the operation has not been performed at S, and the failure can be detected by C.
After a successful close: updated(F, S), i.e. C's value has been successfully propagated to S.

AFS-2: weaker open semantics. A client may open an old copy of a file after it has been updated by another client, which can occur if a callback message is lost.
After a successful open: latest(F, S, 0), or (lostCallback(S, T) and inCache(F) and latest(F, S, T)), where:
lostCallback(S, T): a callback message has been lost at some time during the last T seconds.
inCache(F): F was in the cache at C before the operation was attempted.
latest(F, S, T): the copy of F seen by C is no more than T seconds out of date.

Other aspects
Threads: Venus and Vice are multi-threaded.
Read-only replicas.
Bulk transfers: 64 KB chunks are transferred to minimize latency.
Partial-file caching in later modifications.
Location database: a fully replicated map of volumes to servers.
Cells: a cell is an independently administered site running AFS. A machine can belong to only one cell at a time; users also belong to a cell in the sense of having an account in it, but unlike machines can belong to (have an account in) multiple cells. The cell database is held in /usr/vice/etc/celldb (client) and /usr/afs/etc/celldb (server).
Location database in AFS (figure): each cell (e.g. cell a, cell b) holds a map of upper volumes to the servers storing them, which clients consult to locate volumes.
Performance: under the same benchmark, AFS imposed roughly 40% server load versus 100% for NFS, with cache hit ratios of 96%-98% (Transarc Corp.), and wide-area support.

NFS enhancements
Sprite: one-copy update semantics; multiple readers operate on cached copies, but with one writer, the writer and all readers operate on the same server copy.
NQNFS ("Not Quite NFS"): more precise consistency, maintaining cache consistency by leases.
WebNFS: access to NFS servers via the Web.
NFS version 4: aimed at wide-area network applications.

Improvements in storage organization
Redundant Arrays of Inexpensive Disks (RAID): data is segmented into fixed-size chunks stored in stripes across several disks, together with redundant error-correcting codes.
Log-structured file storage (LFS): accumulate a set of small writes in memory, then commit the accumulated writes in large, contiguous, fixed-size segments.

New design approaches
xFS (a serverless file system): separates file server management tasks from processing tasks and distributes file server processing responsibility across machines; uses software RAID and cooperative caching.
Frangipani: separates persistent storage responsibility from other service actions; built on Petal, a virtual disk abstraction spanning many disks located on multiple servers, with a log-structured data store.
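The RAID striping idea above can be shown in a few lines. This is a minimal sketch of the chunk layout only, with invented parameters; real RAID levels add parity or error-correcting redundancy, which is omitted here.

```python
# Sketch of RAID-style striping: fixed-size chunks of the data are laid
# out round-robin across several disks (redundancy omitted for brevity).

def stripe(data, n_disks, chunk_size):
    disks = [[] for _ in range(n_disks)]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for k, chunk in enumerate(chunks):
        disks[k % n_disks].append(chunk)   # chunk k goes to disk k mod n
    return disks

disks = stripe(b"abcdefgh", n_disks=2, chunk_size=2)
assert disks[0] == [b"ab", b"ef"]   # even-numbered chunks
assert disks[1] == [b"cd", b"gh"]   # odd-numbered chunks
```

Because consecutive chunks land on different disks, a large sequential read or write can proceed on all disks in parallel, which is the throughput argument for striping.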

Summary
Key design issues for a DFS:
Effective use of the client cache.
Cache coherence.
Recovery after client or server failure.
High throughput for reading and writing.
Scalability.
NFS: stateless and efficient, but with poor scalability.
AFS: high scalability.
DFS state of the art: supporting applications in wide-area networks and pervasive computing.

Important questions:
What measures were taken for transparency? Virtual file system, integrated client.
What measures were taken for performance? Whole-file transfer, whole-file caching, callbacks.
What measures were taken for fault tolerance? Stateless servers, no open and close.
What measures were taken for concurrent operations? Validation, callbacks.

~End~