Stackable Layers: An Object-Oriented Approach to Distributed File System Architecture


Thomas W. Page Jr., Gerald J. Popek†, Richard G. Guy
Department of Computer Science
University of California, Los Angeles

1 Introduction

Operating systems and their filing environments are traditionally implemented as large, monolithic pieces of software. The lack of a clean structure makes it expensive to add features to an operating system, and difficult to integrate independently added functionality without extensive reimplementation.

It is increasingly recognized that there are great potential benefits to be had from structuring an operating system so that services can be easily added by independent third parties. The Mach approach [1] is a fine example of that philosophy: it provides an abstraction at the virtual memory and process layer that permits multiple, independent operating system designs and implementations to be built on top. The Unix System V streams design provides an environment and a set of interfaces by which network and device protocols may be added to an operating system at run time, stacking new layers on existing ones [8].

The filing service is a key component of most operating systems. The definition and support of well-defined internal interfaces in this area would allow a variety of services, perhaps developed by different groups or vendors, to be introduced to many systems without reimplementation of the rest of the file system. The ability to snap such components together depends on an architecture which provides effective interfaces, both in assuring that necessary characteristics are present and in doing so in a manner that allows superior quality implementations without forcing significant compromises. Clearly, these requirements call for an object-oriented approach. Our group, and independently Rosenthal [9], have proposed a stackable layers architecture for distributed file systems that we believe achieves these goals [7, 4, 3].

We have evaluated the stackable layers approach by designing, building, and using several new layers for the Unix file system. Most significantly, we have used the architecture to construct a replicated file system, a key component in a general distributed filing environment.

This work is sponsored by DARPA under contract number F29601-87-C-0072.
† This author is also associated with Locus Computing Corporation.

2 The Stackable Architecture

In most modern implementations of Unix, the rest of the operating system kernel interacts with the file system via the virtual node (vnode) interface [5, 2, 6]. In a manner analogous to object-oriented programming, a vnode is an abstract representation of a file object which has a private data area and a public interface (the vnode operations). Each vnode has a vector of pointers to the functions which implement the vnode operations, resulting in a form of operator overloading: different types of vnodes may point to different vectors of operators. A major motivation for the vnode architecture was to permit multiple implementations of the file system without modifications to the rest of the kernel. So long as a new implementation adheres to the vnode interface, it may be substituted transparently to the rest of the kernel.

Stackable layers are a generalization of vnodes. If the kernel interacts with the file system through a well-defined interface, why not slip in a layer between the file system and the rest of the kernel? If this "slipped in" layer supports the same interface both above and below, it may be inserted completely transparently to the kernel and to the underlying original file system. Further, if one such layer can be inserted transparently, so can many, creating a stack (or tree) of layers.

Each layer in the stack implements a value-added service. The base layer (in our case, the Unix file system (UFS)) provides the abstraction of a file in a hierarchical name space with operations defined by the vnode interface (read, write, open, lookup, etc.). The next layer up might provide secure storage, encrypting data as it passes down and decrypting it as it passes back up. On top of that might be the replication layer, which looks from above like a single file but is represented by multiple physical replicas at the layer below. The top layer might support caching, looking in a main memory file for a copy of a requested page and returning it if it is found, otherwise forwarding the request down to the layer below.

Thus the stack may be viewed as a specialization/generalization hierarchy: the base layer provides the abstraction of Unix files; the next layer up provides secure Unix files; the next layer provides replicated, secure Unix files; the top layer supports cached, replicated, secure Unix files. None of these layers knows or cares whether the layer below is the base file system or some other value-added layer. All each layer has is a pointer to one lower layer (or to several, in the case of replication and caching). For example, a stack could be configured without the secure file layer and all other layers would perform as before.

Each layer must support the entire vnode interface. That is, for each of the vnode operations, the layer may implement the operation directly (overriding the implementation in lower layers) or delegate it to the layers beneath (inheritance). In the course of implementing an operation, a layer may call the same or a different operation on the layers beneath it. Layers may, in addition, extend the interface by providing additional vnode calls.
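As an illustration of the operations vector and of stacking, the following is a minimal sketch in Python, chosen only for brevity; a real vnode interface is a C vector of function pointers inside the kernel. All names here (`Vnode`, `vop`, `pass_through`, `make_layer`, the three-operation interface, and the XOR "cipher") are hypothetical and are not taken from any Unix implementation. XOR is a placeholder for encryption and offers no security.

```python
VNODE_OPS = ("lookup", "read", "write")  # tiny, assumed subset of the interface


class Vnode:
    """A file object: private data plus a vector of operations."""

    def __init__(self, ops, private):
        self.ops = ops          # operations vector: name -> function
        self.private = private  # data meaningful only to the implementing layer


def vop(vnode, name, *args):
    """Call an operation through the vnode's own vector (the dispatch step)."""
    return vnode.ops[name](vnode, *args)


def pass_through(name):
    """Build an operation that forwards unchanged to the layer below."""
    return lambda vnode, *args: vop(vnode.private["lower"], name, *args)


def make_layer(lower, **overrides):
    """Stack a new vnode on `lower`; every operation in the interface is present."""
    ops = {name: pass_through(name) for name in VNODE_OPS}
    ops.update(overrides)
    return Vnode(ops, {"lower": lower})


# Base layer: an in-memory stand-in for UFS.
def mem_lookup(vnode, name):
    return name in vnode.private["files"]


def mem_read(vnode, name):
    return vnode.private["files"][name]


def mem_write(vnode, name, data):
    vnode.private["files"][name] = bytes(data)
    return len(data)


def make_base():
    ops = {"lookup": mem_lookup, "read": mem_read, "write": mem_write}
    return Vnode(ops, {"files": {}})


# "Secure" layer: transforms data on the way down and back up.
KEY = 0x5A


def scramble(data):
    return bytes(b ^ KEY for b in data)  # toy transformation, NOT encryption


def secure_read(vnode, name):
    return scramble(vop(vnode.private["lower"], "read", name))


def secure_write(vnode, name, data):
    return vop(vnode.private["lower"], "write", name, scramble(data))


def make_secure(lower):
    return make_layer(lower, read=secure_read, write=secure_write)


# Caching layer: answers reads from memory when it can.
def cache_read(vnode, name):
    cache = vnode.private["cache"]
    if name not in cache:
        cache[name] = vop(vnode.private["lower"], "read", name)
    return cache[name]


def cache_write(vnode, name, data):
    vnode.private["cache"][name] = bytes(data)
    return vop(vnode.private["lower"], "write", name, data)


def make_cache(lower):
    layer = make_layer(lower, read=cache_read, write=cache_write)
    layer.private["cache"] = {}
    return layer


base = make_base()
stack = make_cache(make_secure(base))   # cached, "secure" files
plain = make_cache(make_base())         # same stack without the secure layer
vop(stack, "write", "notes", b"hello")
vop(plain, "write", "notes", b"hello")
```

Neither the caching layer nor the secure layer inspects what lies beneath it; each holds only a reference to a lower vnode, so the two stacks differ only in how they were assembled. The `lookup` operation is overridden by neither upper layer and reaches the base through the generated pass-through entries.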

2.1 Stackable Layers in a Distributed System

In order to support distributed and multiprocessor file systems, it should be possible for adjacent layers in a stack to reside in different address spaces, even on different sites. What is needed is a mechanism which maps calls from one layer, across address space boundaries or a communications network, to the next layer down. This mapping must be completely transparent, as no layer should have to be concerned about where the layers above or below are located.

Following the stackable architecture philosophy, we propose a transport layer which accepts the same interface both above and below, mapping calls transparently across a communications channel to the next layer; it is essentially an RPC mechanism with a layer interface. A transport layer can then be inserted as desired between any pair of layers.

Sun's Network File System [10] provides an approximation of such a transport facility. An NFS vnode on a client site, when operated on, performs corresponding operations on a UFS vnode on a remote site. NFS, however, is not a transparent mechanism; it was not intended to be used in this context, but rather to implement access to stateless file servers. As such, it does not obediently forward all operations, instead altering some and dropping others altogether. Using NFS as a starting point, we have implemented a transparent transport layer and used it in the Ficus replicated file system.

2.2 Analogy to Object-Oriented Programming

There clearly is a close relationship between the stackable layers architecture and object-oriented programming with inheritance, but the use of object-oriented terminology in its description tends to obscure some differences. For example, each layer in the stack is mounted on the layers beneath; it is not quite correct to say that a layer "inherits" from the layer below, because the upper layer must explicitly pass through (delegate to the lower layer) any function which it does not override. Similarly, it may be confusing to think of these lower layers as having a parent-child relationship, as the fan-out (one-to-many relationship) is mostly in the other direction (more like multiple inheritance). However, fan-out in the conventional direction is also conceivable (for example, when a layer provides more than one view to layers above).

Still, the motivations for, and the effects of, the stackable layers architecture are very similar to those of object-oriented structuring. It provides a methodology for easy extensibility, as new layers can be written and debugged without considering the internals of other layers. Similarly, it provides a mechanism for code reuse, as older functionality does not have to be continually reimplemented in the context of new additions. It is an architecture with which software vendors potentially can supply "shrink wrapped" software modules which can be dynamically configured into existing operating system kernels.
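The two differences noted above, explicit pass-through rather than inheritance and fan-out toward the lower layers, can be made concrete with a small sketch. It is written in Python for brevity and rests on stated assumptions: the class names `MemoryLayer` and `FanOutLayer`, the three operations, and the write-all/read-first policy are all hypothetical and chosen for illustration. In particular, the policy is deliberately naive and is not the replication algorithm used by Ficus.

```python
class MemoryLayer:
    """Stand-in for a base file system; keeps file contents in a dict."""

    def __init__(self):
        self.files = {}

    def read(self, name):
        return self.files[name]

    def write(self, name, data):
        self.files[name] = bytes(data)
        return len(data)

    def remove(self, name):
        del self.files[name]


class FanOutLayer:
    """One file seen from above, several copies below.

    No language-level inheritance links this class to MemoryLayer:
    every operation, overridden or not, is forwarded by hand.
    """

    def __init__(self, lowers):
        self.lowers = list(lowers)
        if not self.lowers:
            raise ValueError("need at least one lower layer")

    def write(self, name, data):
        # Naive illustrative policy: update every copy.
        for lower in self.lowers:
            lower.write(name, data)
        return len(data)

    def read(self, name):
        # Return the contents from the first copy that has the file.
        for lower in self.lowers:
            try:
                return lower.read(name)
            except KeyError:
                continue
        raise KeyError(name)

    def remove(self, name):
        # Nothing is added here, yet the call must still be passed down
        # explicitly; omitting this method would make the operation vanish.
        for lower in self.lowers:
            lower.remove(name)


left, right = MemoryLayer(), MemoryLayer()
top = FanOutLayer([left, right])
top.write("report", b"draft 1")
```

The one-to-many relationship runs from the upper layer to the layers it is mounted on, which is the reverse of the usual picture of one parent class with many subclasses.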

3 Experience With Ficus

The Ficus replicated file system is operational at UCLA. It is constructed in two layers: a logical layer provides the abstraction of a single-copy, highly available file and stacks on top of multiple branches, each representing one file replica; a physical layer is associated with each replica and handles the extra information that must be stored with each file to provide concurrency control and recovery from network partitions.

Our experience with building the replication functionality in the layered architecture has been extremely positive. Because the approach allowed us to leverage the existing UFS and NFS, we were able to implement the replication layers in approximately 1.5 man-years.

A significant boon to the implementation has been the ability to debug the new layers outside of the operating system kernel. Because the transport layer can map between address spaces on different machines, it can equally well map to user space. By simply exposing the vnode interface outside the kernel (via a set of system calls), we are able to test new layers outside the kernel using window-based debugging tools unavailable for in-kernel debugging. Once working in this manner, a layer may be easily moved inside the kernel, as its interfaces above and below are identical in both contexts.

We are in the process of constructing a measurement layer which may be configured transparently between any pair of adjacent layers. The measurement layer forwards all operations unmodified to the layer below, but allows the selective gathering of performance statistics on the traffic between the layers. We look forward to tremendous flexibility and ease of data gathering as future layers and versions of the interface are debugged and tuned for performance.
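A measurement layer of the kind described can be sketched in a few lines. The version below is an assumption-laden illustration in Python, not the layer under construction at UCLA: the names `MeasurementLayer`, `DictLayer`, and the `watch` parameter are hypothetical, and the generic forwarding relies on Python's dynamic attribute lookup, whereas a kernel layer would fill in its operations vector entry by entry.

```python
import time
from collections import Counter


class DictLayer:
    """Stand-in for whatever layer sits below; stores contents in a dict."""

    def __init__(self):
        self.files = {}

    def read(self, name):
        return self.files[name]

    def write(self, name, data):
        self.files[name] = bytes(data)
        return len(data)


class MeasurementLayer:
    """Forwards every operation unmodified and records traffic statistics."""

    def __init__(self, lower, watch=None):
        self.lower = lower
        self.watch = watch        # None measures everything; else a set of names
        self.calls = Counter()    # operation name -> number of calls
        self.seconds = Counter()  # operation name -> cumulative elapsed time

    def __getattr__(self, name):
        # Reached only for names this object lacks, i.e. the lower layer's
        # operations and data; arguments and results are never altered.
        target = getattr(self.lower, name)
        selected = self.watch is None or name in self.watch
        if not callable(target) or not selected:
            return target

        def forward(*args, **kwargs):
            start = time.perf_counter()
            try:
                return target(*args, **kwargs)
            finally:
                self.calls[name] += 1
                self.seconds[name] += time.perf_counter() - start

        return forward


lower = DictLayer()
probe = MeasurementLayer(lower)
probe.write("a", b"xyz")
probe.read("a")
probe.read("a")
```

Because the wrapper exposes exactly the operations of the layer below it, callers above need no changes when it is inserted or removed; passing `watch={"write"}` would restrict the statistics to write traffic.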
4 Support for User Available Object-Oriented Services

It is clear that as systems become increasingly interconnected, the need to exchange data among heterogeneous sources (different machine types, operating system types, and even applications) will only continue to increase. Object-oriented services, with "self describing" type structures, have the potential to be very helpful by enabling solutions to the problems that arise in these situations.

The operating system environment presents issues that are somewhat different from those of object-oriented programming languages, however. In the distributed environment, where object techniques can be so valuable, an effective system requires that compatible solutions be available on a variety of different system types. Thus, in addition to basic functionality, a standard "wire protocol" interface will be needed. Implementations need to be provided in a highly modular manner, so that a given implementation can be easily ported to other environments with minimal rebuilding. Such ports are probably the best way to assure cross-system compatibility.

We have thus far discussed the layered architecture as a methodology for adding features to the file environment provided by the operating system kernel. The stackable filing architecture described here may also be an attractive way to supply object-oriented services to the users themselves. An object-oriented layer which could be stacked onto a filing system, perhaps onto any system that supports NFS and its internal VFS interface, could provide a useful and widely used set of facilities.

Of course, such an approach is not by itself a full solution. Changes elsewhere in the set of operating system services and utilities are needed. As just one example, a service as basic as file copy should be enhanced so that it can examine the object definitions of its arguments and take appropriate actions (such as updating companion files). Otherwise, each implementor of a new object type potentially would have to replace the copy function. In today's systems, there usually isn't even a guaranteed way to take that step. Nevertheless, it appears that much of the work of providing an object-oriented layer in a typical operating system can be modularized as a file system addition. This approach is being pursued at UCLA.

5 Summary

We have presented a layered architecture for distributed file systems which has been employed successfully in the implementation of the Ficus replicated file system at UCLA. While the architecture neither is an object-oriented operating system nor supports an object-oriented filing environment, we believe it is an important step along the way. The layered approach supports code reuse, modular kernel extensibility, typed files, information hiding, specialization/generalization hierarchies, operator overloading, and delegation in ways very much analogous to object-oriented systems.

References

[1] Mike Accetta, Robert Baron, David Golub, Richard Rashid, Avadis Tevanian, and Michael Young. Mach: A new kernel foundation for UNIX development. In USENIX Conference Proceedings, pages 93-113. USENIX, June 1986.
[2] AT&T. Design of the virtual file system features for UNIX System V Release 4. Internal memorandum, January 1990.
[3] Richard G. Guy. Ficus: A Very Large Scale Reliable Distributed File System. Ph.D. dissertation, University of California, Los Angeles, 1990. In preparation.
[4] Richard G. Guy, John S. Heidemann, Wai Mak, Thomas W. Page, Jr., Gerald J. Popek, and Dieter Rothmeier. Implementation of the Ficus replicated file system. In USENIX Conference Proceedings, pages 63-71. USENIX, June 1990.
[5] S. R. Kleiman. Vnodes: An architecture for multiple file system types in Sun UNIX. In USENIX Conference Proceedings, pages 238-247. USENIX, June 1986.

[6] Alan Langerman, Joseph Boykin, Susan LoVerso, and Sashi Mangalat. A highly-parallelized Mach-based vnode filesystem. In USENIX Conference Proceedings, pages 297-312. USENIX, January 1990.
[7] Thomas W. Page, Jr., Gerald J. Popek, Richard G. Guy, and John S. Heidemann. The Ficus distributed file system: Replication via stackable layers. Technical Report CSD-900009, University of California, Los Angeles, April 1990.
[8] D. M. Ritchie. A stream input-output system. AT&T Bell Laboratories Technical Journal, 63(8):1897-1910, October 1984.
[9] David S. H. Rosenthal. Evolving the vnode interface. In USENIX Conference Proceedings. USENIX, June 1990.
[10] Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon. Design and implementation of the Sun Network File System. In USENIX Conference Proceedings, pages 119-130. USENIX, June 1985.