DFS Case Studies, Part 2. The Andrew File System (from CMU)


Case Study: Andrew File System
- Designed to support information sharing on a large scale by minimizing client-server communication
- Makes heavy use of caching technologies
- Adopted as the basis for the DCE/DFS file system in the Open Software Foundation's Distributed Computing Environment (DCE)

AFS Characteristics
- Provides transparent access to remote shared files for UNIX programs (using the normal UNIX file primitives)
- Programs access remote files without modification or recompilation
- AFS is compatible with NFS, in that files may be remotely accessed using NFS
- However, AFS differs markedly from NFS in its design and implementation

AFS Design Goals
- Scalability was the most important design goal for the AFS designers
- Designed to perform well with larger numbers of active users (when compared to NFS)
- The key strategy for achieving scalability is the caching of whole (complete) files on client nodes

AFS Design Characteristics
- Whole-file serving: the entire contents of directories and files are transmitted to client computers by AFS servers
- Whole-file caching: once a copy of a file (or a file chunk) has been transferred to a client computer, it is stored in a cache on the local disk; the cache is permanent, surviving reboots of the client computer, and it is used to satisfy clients' open requests in preference to remote copies whenever possible
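The whole-file caching policy above can be sketched in a few lines. This is a hypothetical illustration, not AFS code: all names (`CACHE_DIR`, `fetch_from_server`, `open_cached`) are invented for the sketch. The key point is that fetched files land on the local disk, so the cache outlives process restarts and later opens are satisfied locally.

```python
import os
import tempfile

# Hypothetical sketch of whole-file caching: a fetched file is written
# to the local disk, so the cache survives restarts of the client.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "afs_cache_demo")

def fetch_from_server(name):
    # Stand-in for transferring the entire file from a Vice server.
    return b"contents of " + name.encode()

def open_cached(name):
    """Return the whole file, preferring the local disk cache."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, name)
    if not os.path.exists(path):          # cache miss: fetch the whole file
        with open(path, "wb") as f:
            f.write(fetch_from_server(name))
    with open(path, "rb") as f:           # cache hit: purely local read
        return f.read()

data1 = open_cached("libc.so")   # first open: transferred from the server
data2 = open_cached("libc.so")   # second open: served from the local disk
```

Because the cache is a normal file partition, a rebooted workstation finds its cached copies exactly where it left them.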

AFS Observations
- Shared files that are infrequently updated (such as UNIX commands and libraries) and files accessed solely by a single user account form the overwhelming majority of file accesses
- If the local cache is allocated a sufficiently large amount of storage space, the "working set" of files in regular use on a given workstation is normally retained in the cache until it is needed again
- AFS's design strategy is based on assumptions about the average and maximum file size and the locality of reference to files in the UNIX environment

AFS Assumptions
- Most files are small: less than 10 kbytes (typically)
- Read operations are six times more likely than writes
- Sequential access is common; random access is rare
- Most files are read and written by a single user; even when a file is shared, typically only one of the sharers updates it
- Files are referenced in "bursts": there is a high probability that a recently accessed file will be used again in the near future

AFS Gotcha!
- There is one important type of file that does not fit the design goals of AFS: shared databases
- Databases are typically shared by many users and updated frequently
- The AFS designers explicitly excluded the provision of storage facilities for databases from the AFS design goals
- It is argued that the provision of facilities for distributed databases should be addressed separately

AFS Questions
- How does AFS gain control when an open or close system call referring to a file in the shared file space is issued by a client?
- How is the AFS server that holds the required file actually located?
- What space is allocated for cached files on workstations?
- How does AFS ensure that the cached copies of files are up to date when files may be updated by several clients?

AFS Software Components
- AFS is implemented as two software components: Vice and Venus
- Vice is the server component; it runs as a user-level UNIX process within the server's process space
- Venus is the client component; it runs as a user-level UNIX process on each workstation

Figure: the distribution of processes in AFS. Each workstation runs user programs and the Venus client process above the UNIX kernel; each server runs the Vice server process above the UNIX kernel; workstations and servers communicate over the network.

Dealing with File Accesses
- Files are either local or shared
- Local files are handled in the usual UNIX way (by UNIX)
- Shared files are stored on servers, and copies of them are cached on the local disks of workstations (as required)
- The AFS namespace is a standard UNIX hierarchy, with a specific subtree (called cmu) containing all of the shared files

Figure: the AFS namespace. The local space is the standard UNIX hierarchy under / (root), containing tmp, bin, vmunix, and so on; the shared space lives under the cmu subtree, and symbolic links in the local space (e.g. /bin) point to directories in the shared cmu subtree.

Important Points
- Splitting the file namespace into local and shared files leads to some loss of location transparency, but this is hardly noticeable to users other than system administrators
- Users' directories are always stored in the shared space, enabling users to access their files from any workstation on the network
- Each workstation's kernel is modified to intercept the file access system calls and pass non-local accesses to Venus
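The interception decision above amounts to a prefix test on the path. A minimal sketch, assuming the shared subtree is mounted at /cmu (the function and constant names here are invented for illustration):

```python
# Hypothetical sketch of the modified kernel's dispatch decision:
# operations on paths under the shared subtree (/cmu) are passed to
# Venus; everything else is handled by the local UNIX file system.
SHARED_PREFIX = "/cmu"

def route_open(path):
    """Return which component services an open() on this path."""
    if path == SHARED_PREFIX or path.startswith(SHARED_PREFIX + "/"):
        return "venus"        # non-local access: forward to Venus
    return "unix"             # local file: normal UNIX handling

assert route_open("/cmu/bin/emacs") == "venus"
assert route_open("/tmp/scratch") == "unix"
```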

Figure: system call integration on a workstation. User programs issue UNIX file system calls to the (modified) UNIX kernel; local operations go to the UNIX file system and the local disk, while non-local file operations are passed to the Venus process.

AFS Caching
- One of the file partitions on the local disk of each workstation is used as a cache, holding the cached copies of files from the shared space
- The workstation cache is usually large enough to accommodate several hundred average-sized files
- This renders the workstation largely independent of the Vice servers once a working set of the current user's files and frequently used system files has been cached

AFS Volumes
- Files are grouped into volumes for ease of location and movement
- Each user's personal files are generally located in a separate volume
- Other volumes are allocated for system binaries, documentation, and library code

Open/Read/Write/Close within AFS

open(FileName, mode):
  UNIX kernel: if FileName refers to a file in the shared file space, pass the request to Venus; then open the local copy and return its file descriptor to the application.
  Venus: check the list of files in the local cache. If the file is not present, or there is no valid callback promise for it, send a request for the file to the Vice server that is the custodian of the volume containing the file. Place the copy in the local file system, enter its local name in the local cache list, and return the local name to UNIX.
  Vice: transfer a copy of the file and a callback promise to the workstation; log the callback promise.

read(FileDescriptor, Buffer, length):
  UNIX kernel: perform a normal UNIX read operation on the local copy.

write(FileDescriptor, Buffer, length):
  UNIX kernel: perform a normal UNIX write operation on the local copy.

close(FileDescriptor):
  UNIX kernel: close the local copy and notify Venus that the file has been closed.
  Venus: if the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
  Vice: replace the file contents and send a callback to all other clients holding callback promises on the file.
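The open/close round trip above can be sketched as a small simulation. All class and method names here are hypothetical (this is not the AFS implementation): Venus caches whole files, Vice records a callback promise with each fetch, and a changed copy is shipped back whole on close.

```python
# Hypothetical simulation of the AFS open/close flow.

class Vice:
    """Server: stores files and logs callback promises per fetch."""
    def __init__(self):
        self.files = {"report.txt": "v1"}
        self.promises = {}                 # file name -> set of Venus clients

    def fetch(self, venus, name):
        self.promises.setdefault(name, set()).add(venus)  # log the promise
        return self.files[name]            # transfer the whole file

    def store(self, venus, name, data):
        self.files[name] = data            # replace the file contents
        # (a real server would now break promises held by other clients)

class Venus:
    """Client: whole-file cache; writes back changed files on close."""
    def __init__(self, server):
        self.server = server
        self.cache = {}                    # local name -> contents
        self.dirty = set()

    def open(self, name):
        if name not in self.cache:         # miss (or no valid promise)
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data            # normal local write
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:             # changed: send whole file back
            self.server.store(self, name, self.cache[name])
            self.dirty.discard(name)

server = Vice()
client = Venus(server)
client.open("report.txt")                  # fetch + callback promise logged
client.write("report.txt", "v2")           # purely local operation
client.close("report.txt")                 # whole file shipped to Vice
```

Note that reads and writes between open and close never touch the server; only open misses and dirty closes generate traffic.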

Cache Consistency
- When a Vice server supplies a copy of a file to a Venus process, it also provides a "callback promise": a guarantee that it will notify the Venus process whenever any client modifies the file
- When a server receives a request to update a file, it notifies all of the Venus processes to which it has issued callback promises by sending a callback to each
- A callback is a remote procedure call from a server to a Venus process

More Caching
- Whenever Venus handles an open on behalf of a client, it checks the cache
- If the required file is found in the cache, its token is checked
- If the token's value is cancelled, a fresh copy of the file must be fetched from the Vice server
- If the token's value is valid, the cached copy can be opened and used without reference to Vice
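The open-time validity check can be sketched as follows. The names (`CachedFile`, `venus_open`, `break_callback`, `server_fetch`) are invented for this illustration: each cached file carries a callback-promise token, and a callback from the server cancels it, forcing the next open to refetch.

```python
# Hypothetical sketch of Venus's token check at open time.
VALID, CANCELLED = "valid", "cancelled"

class CachedFile:
    def __init__(self, data):
        self.data = data
        self.token = VALID        # callback promise is initially valid

cache = {}
fetch_count = 0                   # counts round trips to the server

def server_fetch(name):
    global fetch_count
    fetch_count += 1
    return "fresh contents of " + name

def venus_open(name):
    entry = cache.get(name)
    if entry is None or entry.token == CANCELLED:
        entry = CachedFile(server_fetch(name))   # fresh copy + new promise
        cache[name] = entry
    return entry.data                            # valid: no server contact

def break_callback(name):
    # Server-to-Venus RPC issued after another client updated the file.
    if name in cache:
        cache[name].token = CANCELLED

venus_open("notes.txt")      # first open: fetched from Vice
venus_open("notes.txt")      # valid token: served from the cache
break_callback("notes.txt")  # another client updated the file
venus_open("notes.txt")      # cancelled token: fetched again
```

Only two of the three opens reach the server; the middle one is satisfied entirely from the cache, which is the source of AFS's scalability.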

Why Callbacks?
- The callback-based mechanism for maintaining cache consistency offers the most scalable approach
- It has been shown to dramatically reduce the number of client-server interactions

The Vice Service Interface

Fetch(fid) -> attr, data: returns the attributes (status) and, optionally, the contents of the file identified by fid, and records a callback promise on it.
Store(fid, attr, data): updates the attributes and (optionally) the contents of the specified file.
Create() -> fid: creates a new file and records a callback promise on it.
Remove(fid): deletes the specified file.
SetLock(fid, mode): sets a lock on the specified file or directory; the mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid): unlocks the specified file or directory.
RemoveCallback(fid): informs the server that a Venus process has flushed a file from its cache.
BreakCallback(fid): made by a Vice server to a Venus process; cancels the callback promise on the relevant file.

Update Semantics
- The goal of the cache consistency mechanism is to achieve the best practicable approximation to one-copy file semantics without serious performance degradation
- It has been shown that the callback promise mechanism maintains a well-known approximation to one-copy semantics

AFS Updates
- AFS does not provide extra mechanisms for the control of concurrent updates
- When a file is closed, a copy of the file is returned to the server, replacing the current version
- All but the update resulting from the last "close" are silently lost, with no error reported
- Clients must implement concurrency control independently
- Despite this behaviour, AFS's update semantics are sufficiently close to UNIX semantics for the vast majority of existing UNIX programs to operate correctly
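The "last close wins" behaviour can be demonstrated with a toy model (hypothetical names, not AFS code): two clients open the same file, both modify their whole-file copies locally, and the later close silently overwrites the earlier one.

```python
# Hypothetical demonstration of AFS's last-close-wins update semantics.
server_copy = {"shared.txt": "original"}

class Client:
    def __init__(self):
        self.local = None

    def open(self, fname):
        self.local = server_copy[fname]   # whole-file copy into the cache

    def update(self, text):
        self.local = text                 # local write; server is unaware

    def close(self, fname):
        server_copy[fname] = self.local   # whole file replaces server copy

a, b = Client(), Client()
a.open("shared.txt")
b.open("shared.txt")
a.update("edit by a")
b.update("edit by b")
a.close("shared.txt")   # server now holds a's edit
b.close("shared.txt")   # b's close silently discards a's edit
```

Neither client receives an error, which is exactly why concurrently updated shared databases were excluded from AFS's design goals.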

AFS Performance
- When measured, whole-file caching and the callback protocol led to dramatically reduced loads on the servers
- Server loads of 40% were measured with 18 client nodes running a standard NFS benchmark, as opposed to a nearly 100% load using NFS with the same benchmark
- Transarc Corp. installed AFS on over 1000 servers at 150 sites; a survey showed cache hit ratios in the range 96-98% for accesses to a sample of 32,000 file volumes holding 200 GB of data

DFS Enhancements / Future Developments (for full details, refer to the textbook, pages 359ff)
- WebNFS: allows access to NFS servers from the WWW, Java applets, etc.
- Spritely NFS: based on the Sprite OS; adds "open" and "close" to NFS
- NQNFS ("not quite" NFS): adds caching and callbacks to NFS
- NFS version 4: in the advanced stages of development and deployment, and on the Internet standards track

DFS Summary: Key Design Issues
- The effective use of client caching
- The maintenance of consistency when files are updated
- Recovery after client or server failures
- High throughput
- Scalability

DFS Current State
- DFSes are very heavily employed in organizational computing, and their performance has been the subject of much tuning
- NFS is still the dominant DFS technology; however, AFS outperforms NFS in many situations
- Current state-of-the-art DFSes are highly scalable, provide good performance across both LANs and WANs, maintain one-copy semantics, and tolerate and recover from failures