Chapter 12: Distributed File Systems
Copyright 2015 Prof. Amr El-Kadi
Outline
- Introduction
- File Service Architecture
- Sun Network File System
- Recent Advances
Introduction
The sharing of stored information is among the most important forms of resource sharing. We will introduce the design of a basic distributed file system that does not maintain persistent replicas of files, nor does it support the bandwidth and timing guarantees required for multimedia data streaming. A well-designed distributed file service provides transparent access to distributed files, with performance and reliability comparable to, or better than, that of files stored on local disks.
Characteristics of File Systems
Files contain both data and attributes. Typical attributes include:
- File length
- Creation timestamp
- Read timestamp
- Write timestamp
- Attribute timestamp
- Reference count
- Owner
- File type
- Access control list
File system modules
- Directory module: relates file names to file IDs
- File module: relates file IDs to particular files
- Access control module: checks permissions for the requested operation
- File access module: reads or writes file data or attributes
- Block module: accesses and allocates disk blocks
- Device module: disk I/O and buffering
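The layering above can be sketched in a few lines of code. This is a hypothetical illustration, not any real OS implementation: the class names mirror the module names, and the "disk" is just an in-memory dictionary.

```python
# Illustrative sketch of the file-system module layering (names are
# hypothetical): each module depends only on the layer directly below it.

class BlockModule:
    """Accesses and allocates disk blocks (here, an in-memory dict)."""
    def __init__(self):
        self.blocks = {}
    def write_block(self, block_no, data):
        self.blocks[block_no] = data
    def read_block(self, block_no):
        return self.blocks.get(block_no, b"")

class FileAccessModule:
    """Reads or writes file data via the block module."""
    def __init__(self, block_module):
        self.blocks = block_module
    def write(self, file_id, data):
        self.blocks.write_block(file_id, data)
    def read(self, file_id):
        return self.blocks.read_block(file_id)

class DirectoryModule:
    """Relates file names to file IDs."""
    def __init__(self):
        self.names = {}
    def bind(self, name, file_id):
        self.names[name] = file_id
    def lookup(self, name):
        return self.names[name]

# A request flows top-down: name -> file ID -> block access.
dir_mod = DirectoryModule()
access = FileAccessModule(BlockModule())
dir_mod.bind("notes.txt", 7)
access.write(7, b"hi")
```

The point of the separation is that each layer can be replaced independently, which is exactly what a distributed file system exploits when it moves some modules to a server.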
UNIX file system operations
Distributed File System Requirements
Transparency
A balance between flexibility and scalability on one hand, and complexity and performance on the other, has to be achieved in the design. Current distributed file systems address (fully or partially) the following forms of transparency:
- Access transparency
- Location transparency
- Mobility transparency
- Performance transparency
- Scaling transparency
Concurrent File Updates
Most current systems follow UNIX standards in providing advisory or mandatory file- (or record-) level locking. Other well-known techniques for concurrency control are costly to implement in the operating system.
File Replication
Replication is done to share load, to enhance fault tolerance, and to improve scalability. Most existing systems support caching, which is a limited form of replication; very few support full replication.
Hardware and Operating System Heterogeneity
This is a very important requirement to support openness.

Fault Tolerance
The service must continue to operate in the event of client or server failures.
Consistency
Most systems follow UNIX one-copy update semantics. With multiple replicas and caches, some deviation from one-copy semantics may result.

Security
All systems provide access control mechanisms based on access control lists.
Efficiency
The service should provide performance comparable to that of conventional file systems, if not better.
File Service Architecture
(Figure: application programs on a client computer issue requests through a client module, which communicates with a flat file service and a directory service on server computers.)
The design is open in the sense that different client modules may implement different operating systems' semantics and provide different optimizations.

Flat File Service
Unique File Identifiers (UFIDs) are used to refer to files in all requests. These are long sequences of bits chosen to ensure their uniqueness.

Directory Service
Provides a mapping from text-based file names to their UFIDs.
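The split between the two services can be made concrete with a small sketch. All class and method names here are illustrative, assumed for the example; the flat file service knows nothing about names, and the directory service knows nothing about file contents.

```python
# Hypothetical sketch: a flat file service addressed only by UFIDs, with a
# separate directory service mapping text names to UFIDs.
import secrets

class FlatFileService:
    def __init__(self):
        self.files = {}                    # UFID -> file contents (bytes)
    def create(self):
        ufid = secrets.token_hex(16)       # long random bit string, unique
        self.files[ufid] = b""             # with overwhelming probability
        return ufid
    def write(self, ufid, offset, data):
        old = self.files[ufid]
        self.files[ufid] = old[:offset] + data + old[offset + len(data):]
    def read(self, ufid, offset, count):
        return self.files[ufid][offset:offset + count]

class DirectoryService:
    def __init__(self):
        self.entries = {}                  # text name -> UFID
    def add_name(self, name, ufid):
        self.entries[name] = ufid
    def lookup(self, name):
        return self.entries[name]

ffs = FlatFileService()
ds = DirectoryService()
ufid = ffs.create()
ds.add_name("report.txt", ufid)
ffs.write(ufid, 0, b"data")
```

Because the directory service is just another client of the flat file service, it could itself store its mappings in flat files, which is how the layering stays clean.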
Client Module
Provides the semantics of an operating system's file service (e.g., UNIX) and improves performance by caching at the client side.
Flat File Service Interface
Not normally used by user-level programs.
The interface provided above has no open or close operations. It differs from the UNIX file system interface mainly for fault tolerance:
- Except for the create operation, all operations are idempotent.
- The interface is suitable for implementing stateless servers.
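Idempotency is what lets a client simply retransmit a request whose reply was lost. A minimal sketch (store layout and function names are assumed for illustration) shows why passing an explicit offset, rather than keeping a server-side file position, makes repeated operations harmless:

```python
# Why explicit offsets make operations idempotent: repeating a request
# (e.g. a retry after a lost RPC reply) produces the same result, so a
# stateless server needs no per-client record of an open-file position.

store = {"fh1": bytearray(b"hello world")}   # file handle -> contents

def read(fh, offset, count):
    return bytes(store[fh][offset:offset + count])

def write(fh, offset, data):
    buf = store[fh]
    buf[offset:offset + len(data)] = data

# A duplicated write is harmless: applying it twice leaves the same bytes.
write("fh1", 0, b"HELLO")
write("fh1", 0, b"HELLO")                    # retry of the same request
assert read("fh1", 0, 5) == b"HELLO"
```

Contrast this with a server that advances a stored position on each read: a retried request would then return the *next* chunk instead of the same one.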
Access Control
A user's identity is submitted with every request, so that access checks can be performed by the server for every operation. This approach is still open to forged user identities, a problem solved with digital signatures.
Sun Network File System
(Figure: on the client computer, application programs issue UNIX system calls to the UNIX kernel, whose virtual file system routes local requests to the UNIX file system and remote requests to the NFS client; the NFS client communicates with the NFS server on the server computer via the NFS protocol, and the server's virtual file system passes requests on to its local UNIX file system. Other file systems can also plug into the VFS.)
The NFS client and server modules communicate via RPC. The NFS server module resides in the kernel of each computer that acts as an NFS server. The Virtual File System (VFS) layer achieves the integration of, and transparency between, local and remote file systems.
In NFS, a file identifier (file handle) is opaque to clients. It consists of a file system identifier, the i-node number of the file, and an i-node generation number. The NFS server is stateless, so access permissions must be checked on every request; the user identity needed for the check is supplied automatically by the RPC system. Kerberos has been integrated with Sun NFS to provide user authentication and security.
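The three fields of the handle can be sketched as a small structure (field names here are illustrative, not the wire format). The generation number is what lets the server notice that an i-node was deleted and reused after a client obtained its handle:

```python
# Sketch of the opaque NFS file handle described above. Clients must not
# interpret the contents; only the server reads the fields back.
from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int      # which exported file system
    inode_number: int       # i-node of the file within that file system
    generation: int         # incremented each time the i-node is reused

def is_stale(handle, current_generation):
    """Server-side check: does this handle still refer to the same file?"""
    return handle.generation != current_generation

h = FileHandle(filesystem_id=1, inode_number=42, generation=7)
assert not is_stale(h, 7)   # same generation: handle is still valid
assert is_stale(h, 8)       # i-node 42 was freed and reused since h was issued
```

When the generations differ, a real server returns a "stale file handle" error rather than silently serving the new file's data.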
NFS server operations (simplified), part 1:
- lookup(dirfh, name) -> fh, attr: Returns the file handle and attributes for the file name in the directory dirfh.
- create(dirfh, name, attr) -> newfh, attr: Creates a new file name in directory dirfh with attributes attr and returns the new file handle and attributes.
- remove(dirfh, name) -> status: Removes file name from directory dirfh.
- getattr(fh) -> attr: Returns the file attributes of file fh. (Similar to the UNIX stat system call.)
- setattr(fh, attr) -> attr: Sets the attributes (mode, user id, group id, size, access time, and modify time) of a file. Setting the size to 0 truncates the file.
- read(fh, offset, count) -> attr, data: Returns up to count bytes of data from a file starting at offset. Also returns the latest attributes of the file.
- write(fh, offset, count, data) -> attr: Writes count bytes of data to a file starting at offset. Returns the attributes of the file after the write has taken place.
- rename(dirfh, name, todirfh, toname) -> status: Changes the name of file name in directory dirfh to toname in directory todirfh.
- link(newdirfh, newname, dirfh, name) -> status: Creates an entry newname in the directory newdirfh which refers to the file name in the directory dirfh.
NFS server operations (simplified), part 2:
- symlink(newdirfh, newname, string) -> status: Creates an entry newname in the directory newdirfh of type symbolic link with the value string. The server does not interpret the string but makes a symbolic link file to hold it.
- readlink(fh) -> string: Returns the string that is associated with the symbolic link file identified by fh.
- mkdir(dirfh, name, attr) -> newfh, attr: Creates a new directory name with attributes attr and returns the new file handle and attributes.
- rmdir(dirfh, name) -> status: Removes the empty directory name from the parent directory dirfh. Fails if the directory is not empty.
- readdir(dirfh, cookie, count) -> entries: Returns up to count bytes of directory entries from the directory dirfh. Each entry contains a file name, a file handle, and an opaque pointer to the next directory entry, called a cookie. The cookie is used in subsequent readdir calls to start reading from the following entry. If the value of cookie is 0, reading starts from the first entry in the directory.
- statfs(fh) -> fsstats: Returns file system information (such as block size, number of free blocks, and so on) for the file system containing file fh.
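The readdir cookie mechanism is worth seeing in miniature. The sketch below is illustrative, not wire-accurate (real cookies are opaque server-chosen values, and count is in bytes rather than entries), but it shows how a stateless server pages through a directory: the client, not the server, carries the resume point between calls.

```python
# Cookie-based readdir paging, simplified: each call returns a batch of
# entries plus a cookie telling the next call where to resume; cookie 0
# starts from the first entry. The server keeps no per-client state.

def readdir(entries, cookie, count):
    batch = entries[cookie:cookie + count]
    next_cookie = cookie + len(batch)
    return batch, next_cookie

names = ["a", "b", "c", "d", "e"]
batch1, c = readdir(names, 0, 2)    # first two entries
batch2, c = readdir(names, c, 2)    # resumes where batch1 stopped
batch3, c = readdir(names, c, 2)    # final, short batch
assert batch1 + batch2 + batch3 == names
```

If a call is retried after a lost reply, the same cookie yields the same batch, so readdir stays idempotent like the other operations.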
On every NFS server, a mount service process runs to support the mounting of sub-trees by clients. The sub-trees available for remote mounting are listed in the /etc/exports file. Clients use a modified mount command, which interacts with the server's mount service via a separate mount protocol, to mount remote file systems. Files can be hard-mounted or soft-mounted.
(Figure: a client mounts sub-trees from two servers into its own name space. Note: the file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.)
Pathname Translation
The client translates pathnames iteratively, one component at a time, using lookup requests. The result of each step is cached.

Automounter
When an empty mount point is referenced by the client, the automounter mounts the corresponding remote directory dynamically. The original implementation ran as a user-level process on each client machine. Current versions (called autofs) are implemented inside the kernels of Solaris and Linux.
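Iterative translation with per-step caching can be sketched as follows. The namespace contents and handle names are invented for the example; the point is that each lookup resolves exactly one component, and a cached prefix costs no server round trips:

```python
# Iterative pathname translation: resolve one component at a time with
# lookup(dirfh, name), caching each (dirfh, name) -> fh step on the client.

namespace = {                     # the server's view: (dirfh, name) -> fh
    ("root", "usr"): "fh_usr",
    ("fh_usr", "students"): "fh_students",
    ("fh_students", "jon"): "fh_jon",
}
lookups = 0                       # counts server round trips

def server_lookup(dirfh, name):
    global lookups
    lookups += 1
    return namespace[(dirfh, name)]

cache = {}                        # client-side lookup cache

def resolve(path, rootfh="root"):
    fh = rootfh
    for component in path.strip("/").split("/"):
        key = (fh, component)
        if key not in cache:
            cache[key] = server_lookup(*key)
        fh = cache[key]
    return fh

assert resolve("/usr/students/jon") == "fh_jon"   # 3 server lookups
first = lookups
resolve("/usr/students/jon")      # fully served from the client cache
assert lookups == first           # no additional round trips
```

Translation must be iterative rather than done in one server call because a pathname may cross mount points, so different components can live on different servers.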
Server Caching
In NFS v3, the write operation offers two options: write-through, or the data is held in the server's cache until a commit operation is received for the file.
Client Caching
The client caches the results of the read, write, getattr, lookup, and readdir operations. To help solve the problem of multiple inconsistent cached copies, a timestamp method tags each data item in the cache with two timestamps: Tc, the time when the cache entry was last validated, and Tm, the time when the block was last modified at the server. A cache entry is valid at time T if

(T - Tc < t) OR (Tm_client = Tm_server)

where t is a freshness interval chosen by the client.
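The validity condition translates directly into code. This is a minimal sketch, with variable names mirroring the formula; the concrete freshness interval of 3 seconds below is just an example value:

```python
# NFS client-cache validity test: an entry can be used without contacting
# the server if it was validated recently (T - Tc < t), and otherwise it
# remains valid only if the server confirms its copy is unchanged
# (Tm_client == Tm_server).

def is_valid(T, Tc, t, Tm_client, Tm_server):
    return (T - Tc < t) or (Tm_client == Tm_server)

# Validated 2 s ago with a 3 s freshness interval: valid, no RPC needed.
assert is_valid(T=102, Tc=100, t=3, Tm_client=50, Tm_server=60)
# Validated 10 s ago, and the server copy was modified since: invalid.
assert not is_valid(T=110, Tc=100, t=3, Tm_client=50, Tm_server=60)
# Validated 10 s ago, but the server copy is unchanged: still valid.
assert is_valid(T=110, Tc=100, t=3, Tm_client=50, Tm_server=50)
```

Only the second disjunct requires a getattr round trip to the server, so the freshness interval t trades consistency (small t) against network traffic (large t).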