Lecture 7: Distributed File Systems

Similar documents
DFS Case Studies, Part 1

Introduction. Distributed file system. File system modules. UNIX file system operations. File attribute record structure

Introduction. Chapter 8: Distributed File Systems

Chapter 12 Distributed File Systems. Copyright 2015 Prof. Amr El-Kadi

Chapter 8: Distributed File Systems. Introduction File Service Architecture Sun Network File System The Andrew File System Recent advances Summary

Distributed File Systems

Distributed File Systems. CS432: Distributed Systems Spring 2017

Chapter 8 Distributed File Systems

Ch. 7 Distributed File Systems

Distributed File Systems. File Systems

Distributed File Systems

Lecture 14: Distributed File Systems. Contents. Basic File Service Architecture. CDK: Chapter 8 TVS: Chapter 11

NFS: Naming indirection, abstraction. Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency

Module 7 File Systems & Replication CS755! 7-1!

Distributed File Systems I

DISTRIBUTED FILE SYSTEMS & NFS

Distributed File Systems. Distributed Computing Systems. Outline. Concepts of Distributed File System. Concurrent Updates. Transparency 2/12/2016

Distributed File Systems. CS 537 Lecture 15. Distributed File Systems. Transfer Model. Naming transparency 3/27/09

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Nov 28, 2017 Lecture 25: Distributed File Systems All slides IG

Introduction to Distributed Computing

Today: Distributed File Systems

Network File Systems

UNIT V Distributed File System

Today: Distributed File Systems!

Chapter 17: Distributed-File Systems. Operating System Concepts 8 th Edition,

Today: Distributed File Systems. Naming and Transparency

Lecture 19. NFS: Big Picture. File Lookup. File Positioning. Stateful Approach. Version 4. NFS March 4, 2005


DISTRIBUTED SYSTEMS [COMP9243] Lecture 9b: Distributed File Systems INTRODUCTION. Transparency: Flexibility: Slide 1. Slide 3.

C 1. Recap. CSE 486/586 Distributed Systems Distributed File Systems. Traditional Distributed File Systems. Local File Systems.

Distributed File Systems

UNIT III. cache/replicas maintenance Main memory No No No 1 RAM File system n No Yes No 1 UNIX file system Distributed file

CSE 486/586: Distributed Systems

Operating Systems Design 16. Networking: Remote File Systems

Background. 20: Distributed File Systems. DFS Structure. Naming and Transparency. Naming Structures. Naming Schemes Three Main Approaches

Distributed File Systems. Distributed Systems IT332

416 Distributed Systems. Distributed File Systems 1: NFS Sep 18, 2018

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

416 Distributed Systems. Distributed File Systems 2 Jan 20, 2016

Filesystems Lecture 11

DFS Case Studies, Part 2. The Andrew File System (from CMU)

Distributed file systems

Advanced Operating Systems

Filesystems Lecture 13

Distributed Systems - III

What is a file system

Today s Objec2ves. AWS/MR Review Final Projects Distributed File Systems. Nov 3, 2017 Sprenkle - CSCI325

Chapter 11: File System Implementation

Chapter 11: File System Implementation

Chapter 11: Implementing File Systems

File-System Structure

Cloud Computing CS

Chapter 7: File-System

AN OVERVIEW OF DISTRIBUTED FILE SYSTEM Aditi Khazanchi, Akshay Kanwar, Lovenish Saluja

OPERATING SYSTEM. Chapter 12: File System Implementation

CS454/654 Midterm Exam Fall 2004

Service and Cloud Computing Lecture 10: DFS2 Prof. George Baciu PQ838

Distributed Systems Distributed Objects and Distributed File Systems

Distributed File Systems. Case Studies: Sprite Coda

Today: Distributed File Systems. File System Basics

Distributed File Systems. Directory Hierarchy. Transfer Model

Operating Systems. Week 13 Recitation: Exam 3 Preview Review of Exam 3, Spring Paul Krzyzanowski. Rutgers University.

File-System Interface

CS 416: Operating Systems Design April 22, 2015

Today: Distributed File Systems

Distributed Systems. Distributed File Systems. Paul Krzyzanowski

Chapter 18 Distributed Systems and Web Services

Chapter 11: Implementing File-Systems

Distributed Systems. Hajussüsteemid MTAT Distributed File Systems. (slides: adopted from Meelis Roos DS12 course) 1/25

Chapter 11: Implementing File Systems

Operating Systems 2010/2011

Distributed File Systems Part II. Distributed File System Implementation

Cloud Computing CS

Extreme computing Infrastructure

Today: Coda, xfs! Brief overview of other file systems. Distributed File System Requirements!

Chapter 11: Implementing File Systems

Chapter 11: Implementing File

Chapter 17: Distributed-File Systems. Operating System Concepts 8 th Edition,

Chapter 11: Implementing File Systems. Operating System Concepts 9 9h Edition

Chapter 11: File-System Interface

Chapter 12 File-System Implementation

File Management. Information Structure 11/5/2013. Why Programmers Need Files

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

Chapter 12: File System Implementation

ò Server can crash or be disconnected ò Client can crash or be disconnected ò How to coordinate multiple clients accessing same file?

Distributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen

Chapter 10: File System Implementation

Introduction to Cloud Computing

NFS. Don Porter CSE 506

Remote Procedure Call (RPC) and Transparency

File Concept Access Methods Directory and Disk Structure File-System Mounting File Sharing Protection

Distributed File Systems. Jonathan Walpole CSE515 Distributed Computing Systems

CLOUD-SCALE FILE SYSTEMS

CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

COS 318: Operating Systems. Journaling, NFS and WAFL

06-Dec-17. Credits:4. Notes by Pritee Parwekar,ANITS 06-Dec-17 1

CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

Distributed File Systems: Design Comparisons

Module 17: Distributed-File Systems

Transcription:

06-06798 Distributed Systems Lecture 7: Distributed File Systems 5 February, 2002 1

Overview Requirements for distributed file systems transparency, performance, fault-tolerance,... Design issues possible options, architectures file sharing, concurrent updates caching Example Sun NFS 5 February, 2002 2

Distributed Service client client client Communication System Time Service Name Service Compute Service client server File Service 5 February, 2002 3

Basic service Distributed file service persistent file storage of data and programs operations on files (create, open, read, ) multiple remote clients, within intranet file sharing typically one-copy update semantics, over RPC Many new developments persistent object stores (storage of objects) Persistent Java, Corba, replication, whole-file caching distributed multimedia (Tiger video file server) 5 February, 2002 4

Characteristics of file systems Operations on files (=data + attributes) create/delete query/modify attributes open/close read/write access control Storage organisation directory structure (hierarchical, pathnames) metadata (file management information) file attributes directory structure info, etc 5 February, 2002 5

File attributes File length Creation timestamp Read timestamp Write timestamp Attribute timestamp Reference count Owner File type Access control list User controlled 5 February, 2002 6

Layered structure of file system Directory module: File module: Access control module: File access module: Block module: Device module: relates file names to file IDs relates file IDs to particular files checks permission for operation requested reads or writes file data or attributes accesses and allocates disk blocks disk I/O and buffering Concentrate on higher levels. 5 February, 2002 7

Distributed file system requirements Transparency (clients unaware of the distributed nature) access transparency (client unaware of distribution of files, same interface for local/remote files) location transparency (uniform file name space from any client workstation) mobility transparency (files can be moved from one server to another without affecting client) performance transparency (client performance not affected by load on service) scaling transparency (expansion possible if numbers of clients increase) 5 February, 2002 8

Distributed file system requirements Concurrent file updates (changes by one client do not affect another) File replication (for load sharing, fault-tolerance) Heterogeneity (interface platform-independent) Fault-tolerance (continues to operate in the face of client and server failures) Consistency (one-copy-update semantics or slight variations) Security (access control) Efficiency (performance comparable to conventional file systems) 5 February, 2002 9

File Service Design Options Stateful server holds information on open files, current position, file locks open before access, close after better performance - shorter message, read-ahead possible server failure - lose state client failure - tables fill up can provide file locks 5 February, 2002 10

File Service Design Options Stateless no state information held by server file operations idempotent, must contain all information needed (longer message) simpler file server design can recover easily from client or server crash locking requires extra lock server to hold state 5 February, 2002 11

File Service Architecture Text names to UFIDs Client computer Server computer Application program Application program Directory service RPC Flat file service Client module API: knows open files, positions... UFIDs opns on contents 5 February, 2002 12

File server architecture Components (for openness): Flat file service operations on file contents unique file identifiers (UFIDs) translates UFIDs to file locations Directory service mapping between text names to UFIDs Client module API for file access, one per client computer holds state: open files, positions knows network location of flat file & directory server 5 February, 2002 13

Flat file service RPC interface Used by client modules, not user programs FileId (UFID) uniquely identifies file invalid if file not present or inappropriate access Read/Write; Create/Delete; Get/SetAttributes No open/close! (unlike UNIX) access immediate with FileId Read/Write identify starting point Improved fault-tolerance operations idempotent except Create, can be repeated (atleast-once RPC semantics) stateless service 5 February, 2002 14

Flat file service operations Read(FileId, i, n) -> Data throws BadPosition Write(FileId, i, Data) throws BadPosition Create() -> FileId Delete(FileId) If 1 i Length(File): Reads a sequence of up to n items from a file starting at item i and returns it in Data. If 1 i Length(File)+1: Writes a sequence of Data to a file, starting at item i, extending the file if necessary. Creates a new file of length 0 and delivers a UFID for it. Removes the file from the file store. GetAttributes(FileId) -> Attr Returns the file attributes for the file. SetAttributes(FileId, Attr) Sets the file attributes (only owner, file type and access control list; rest maintained by flat file service). 5 February, 2002 15

In UNIX file system Access control access rights are checked against the access mode (read, write, execute) in open user identity checked at login time, cannot be tampered with In distributed systems access rights must be checked at server RPC unprotected forging identity possible, a security risk user id typically passed with every request (e.g. Sun NFS) stateless 5 February, 2002 16

Directory structure Hierarchical tree-like, pathnames from root (in UNIX) several names per file (link operation) Naming system implemented by client module, using directory service root has well-known UFID locate file following path from root 5 February, 2002 17

File names Text name (=directory pathname+file name) hostname:local name not mobility transparent uniform name structure (same name space for all clients) remote mount (e.g. Sun NFS) remote directory inserted into local directory relies on clients maintaining consistent naming conventions across all clients all clients must implement same local tree must mount remote directory into the same local directory 5 February, 2002 18

Remote mount Server 1 (root) Client Server 2 (root) (root) export... vmunix usr nfs people Remote mount students x staff Remote mount users big jon bob... jim ann jane joe Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2. 5 February, 2002 19

Directory Directory service conventional file (client of the flat file service) mapping from text names to UFIDs Operations require FileId, machine readable UFID as parameter locate file (LookUp) add/delete file (AddName/UnName) match file names to regular expression (GetNames) 5 February, 2002 20

Directory service operations Lookup(Dir, Name) -> FileId throws NotFound AddName(Dir, Name, File) throws NameDuplicate UnName(Dir, Name) throws NotFound GetNames(Dir, Pattern) -> NameSeq Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception. If Name is not in the directory, adds (Name, File) to the directory and updates the file s attribute record. If Name is already in the directory: throws an exception. If Name is in the directory: the entry containing Name is removed from the directory. If Name is not in the directory: throws an exception. Returns all the text names in the directory that match the regular expression Pattern. 5 February, 2002 21

File sharing Multiple clients share the same file for read/write access. One-copy update semantics every read sees the effect of all previous writes a write is immediately visible to clients who have the file open for reading Problems! caching: maintaining consistency between several copies difficult to achieve serialise access by using file locks (affects performance) trade-off between consistency and performance 5 February, 2002 22

Example: Sun NFS (1985) Structure of flat file & client & directory service NFS protocol RPC based, OS independent (originally UNIX) NFS server stateless (no open/close) no locks or concurrency control no replication with updates Virtual file system, remote mount Access control (user id with each request) security loophole (modify RPC to impersonate user ) Client and server caching 5 February, 2002 23

NFS architecture Client computer Server computer UNIX system calls UNIX kernel Application program Application program Virtual file system UNIX kernel Virtual file system Local Remote UNIX file system Other file system NFS client NFS protocol NFS server UNIX file system RPC (UDP or TCP) 5 February, 2002 24

File identifier (FileId) Simple Solution i-node (number identifying file within file system) file migration requires finding and changing all FileIds UNIX reuses i-node numbers after file deleted (i-node gen. no) Server address Index IP address.socket NFS file handle Virtual file system uses i-node if local, file handle if remote. i-node number File handle File system identifier i-node no. i-node gener. no. 5 February, 2002 25

Caching in NFS Indispensable for performance Caching retains recently used data (file pages, directories, file attributes) in cache updates data in cache for speed block size typically 8kbytes Server caching cache in server memory (UNIX kernel) Client caching cache in client memory, local disk 5 February, 2002 26

Server caching Store data in server memory Read-ahead: anticipate which pages to read Delayed write update in cache; write to disk periodically (UNIX sync to synchronise cache) or when space needed which contents seen by users depends on timing Write through cache and write to disk (reliable, poor performance) Write on close write to disk only when commit received (fast but problems with files open for a long time) 5 February, 2002 27

Client caching Potential consistency problems! different versions, portions of files, since writes delayed clients poll server to check if copy still valid Timestamp method tag with latest time of validity check and modification time copy valid if time since last check less than freshness interval, or modification time on server the same choose freshness interval adaptively, 3-30 sec for files, 30-60 sec for directories for small freshness interval, potential heavy load on network 5 February, 2002 28

Reads Client caching perform validity check whenever cache entry used if not valid, request data from server several optimisations to reduce traffic recent updates not always visible (timing!) Writes when page modified, marked as dirty dirty pages flushed asynchronously, periodically (client s synch) and on close Not truly one-copy update semantics... 5 February, 2002 29

New developments AFS, Andrew file system (CMU 1986) whole-file serving (64kbytes), whole-file caching (on local client disk, 100s of recently used files) NFS protocol (precise one-copy update semantics) Spritely NFS: extend with open/close with state info held at server, add server callbacks to notify about cache entries WebNFS (Internet programs can interact with NFS server directly, bypassing mount) xfs (serverless network, file serving responsibility distributed across LAN) 5 February, 2002 30

File service Summary crucial to the running of a distributed system performance, consistency and easy recovery essential Design issues separate flat file service from directory service and client module stateless for performance and fault-tolerance caching for performance concurrent updates difficult with caching approximation of one-copy update semantics 5 February, 2002 31