
A Distributed System with a Centralized Organization

Mahmoud Mofaddel, Djamshid Tavangarian
University of Rostock, Department of Computer Science, Institut für Technische Informatik
Albert-Einstein-Straße 21, D-18059 Rostock
Tel.: ++49 (381) 498-3391, Fax: ++49 (381) 498-3440
E-Mail: mahm tav@informatik.uni-rostock.de

Abstract

One of the most important potential benefits of workstation clusters (client/server computing systems) is resource sharing. By interconnecting a number of workstations using a suitable network, a large number of hardware and software resources can be made available to users. In comparison with centralized computer systems (mainframes) and shared-memory multiprocessor systems, the availability of such resources can be achieved at a fraction of the costs entailed by mainframe solutions, since workstations allow the use of standard components. Another advantage of workstation clusters is their scalability and flexibility towards an upgrade or extension with other components.

The concept of a mainframe system is to share centralized resources among all users by switching between the users' processes (time sharing). The main advantage of such systems is that they are usually easy to administrate. In contrast to this concept, we introduce a distributed system that offers not only time sharing but also sharing of computing nodes (workstations) according to the users' application requirements (workstation sharing). Nevertheless, the organization of the system remains centralized in order to reduce the expense of management and maintenance. In this paper, we explain the architecture and mechanism of a transparent system that consists of Unix workstations running a network file system (NFS).

Keywords: Workstation Clusters, Mainframe, Resource Management Systems, Distributed Operating Systems, Load Balancing.

1 Introduction

The increasing use of workstation clusters (distributed systems) as a substitute for traditional stand-alone computer systems (mainframes) has gained widespread popularity due to the existence of inexpensive and powerful workstations. Recent advances in technology allow clusters of workstations to be built by connecting high-performance workstations and/or PCs with a suitable network. Many experimental distributed systems with different types of networks have already been built at universities and research laboratories.

Usually, a distributed system contains some idle workstations whose computing power is not used. Douglis [1], for instance, estimated that one third of all workstations in the Sprite system are usually idle. In order to exploit that unused computing power, several software tools have been developed which offer remote execution of processes. Moreover, the user community of such a distributed system is usually not homogeneous. A. Bricker et al. [2], for instance, have observed three types of users: type 1 users mostly use their workstations for sending and receiving mail or preparing papers, whereas type 2 users are frequently involved in the debugging cycle, where they alternately edit and compile software. Such users have phases where their computing capacity is more than sufficient (e.g. while editing source code) and other phases (e.g. during compilation) where they can make good use of more computing power. Finally, type 3 users frequently run large numbers of simulations. These users are usually never satisfied with the performance of a single workstation and wish to distribute their load onto several machines working in parallel.

Recent research is focusing on different solutions to exploit the unused processing power of these workstations. Resource management systems and distributed operating systems are two existing classes of solutions to this problem; some well-known examples will be introduced briefly in section 3. In contrast to these solutions, we show another attempt to exploit the overall performance of a workstation cluster. Within our concept, the cluster is regarded as a pool of workstations which jointly take over the tasks of a mainframe. Thus, the system concept is similar to that of the traditional centralized system (mainframe), with the exception that it is actually implemented as a distributed system and that workstations within the cluster are allocated depending on the requirements of the users' applications. The basic idea of our system is therefore to share a pool of UNIX workstations among all users and to allocate the most appropriate workstation in order to meet each application's requirements. The main targets of this project are:

- developing a software system that hides the system's topology from the users and eases access to the system (i.e. users should not become aware of the fact that the underlying system is a distributed system);
- designing a dynamic resource allocation for each user application according to its requirements (i.e. dynamically allocating the best suited workstation to the user depending on his application's requirements).

Section 4 will give a detailed overview of our system. In the next section we discuss both centralized and decentralized systems and introduce the design concepts as well as the advantages and disadvantages of each. Section 5 finally closes this paper with some conclusions.

2 Computation Models

This section describes the concepts of the centralized and the decentralized system model, together with the advantages and disadvantages of each.

2.1 Centralized System Model (Mainframe)

A mainframe system architecture is characterized by a centralized environment. Users of the mainframe are attached to central computer resources (e.g. the CPU or disk drives) via terminals. These terminals work as a user interface between the user and the mainframe, so that all commands received by a terminal are sent to the mainframe's CPU for execution. Since there may be hundreds of users working on the mainframe during the day, it must be able to handle multitasking. This is done by continually switching between the users' processes (time sharing), so that the mainframe appears to execute each application (user process) simultaneously. In a centralized system, all computing power may be allocated to one user when no other users are attached to the system. As a consequence, the execution time of every user's applications increases when the mainframe serves many users. The main advantages and disadvantages are listed in Table 1.

Table 1: Some advantages and disadvantages of a centralized system (incomplete)

Advantages:
- Central and simple system management.
- A relatively high degree of security, because there is only one kind of access to the system.

Disadvantages:
- The system performance for each user decreases when many users attach simultaneously.
- The system is relatively expensive (hardware and software).
- The scalability of mainframe systems is extremely low.

2.2 Decentralized (Distributed) System Model

A distributed system in our view is a collection of autonomous workstations and/or PCs connected by a suitable network, enabling client/server architectures. In contrast to centralized system architectures, the client-server model (distributed system) has gained widespread acceptance due to its advantages, such as its flexibility and superior performance/cost ratio. However, a client-server system does not appear as a single computing device like a mainframe, since users must explicitly execute special commands (e.g. rlogin, rsh) or use special software tools to exploit the processing power of another workstation. With such software tools, distributed systems provide high availability of resources at low cost; an application can thus be executed on different workstations at the same time in order to shorten its response time. Some notable advantages and disadvantages of distributed systems are listed in Table 2.

Table 2: Some advantages and disadvantages of a distributed system

Advantages:
- It is less expensive (hardware and software) than a centralized system.
- Flexibility, i.e. the possibility to add new and powerful workstations to enhance system performance in a cost-efficient way.
- Performance: one user can use all workstations of the system to finish his job faster by assigning each application to the best suited workstation.
- Fault tolerance is extremely high: when one workstation goes down, the system keeps operating, albeit with less processing power.
- Scalability of a distributed system is very high.
- Decentralized systems may contain specialized workstations for specific applications (such as applications that need a large amount of processing power).

Disadvantages:
- It needs more complex system management.
- Users must execute special commands or software to exploit the unused performance of other workstations.
- It needs complex software to recover from faults in the interconnection network.

3 Distributed System Approach

Managing a distributed system means controlling and supervising system resources in order to satisfy the user requirements by sharing resources among all users and balancing the load of the whole system. In comparison with traditional centralized systems, it is much more difficult to manage distributed computing systems, since the resources are spread over a set of separate workstations and/or different sites and can only be accessed via the network. Therefore, a resource management system or another kind of software tool is needed in order to exploit the overall performance of the distributed system and to minimize the average response time of the users' applications. Such a software tool is located either between the user applications and the operating system (like resource management systems, which are capable of exploiting the overall performance of distributed systems while hiding the distribution aspect from the user) or between the user applications and the hardware (like distributed operating systems, which hide the distribution aspect from the user and create ease of use of the underlying system).

3.1 Resource Management Systems

Resource management systems (e.g. LSF, Condor, ...) are software tools that allow users to execute their applications on the most lightly loaded workstations without the need for the users to arrange a remote execution by using special commands of the underlying operating system.
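To make the role of such a tool concrete, the following is a minimal Python sketch of the placement idea only; it is not the actual interface of LSF, Condor or any other system, and the host names, the reported load values and the use of ssh for remote execution are purely illustrative assumptions.

    # Minimal sketch: run a job on the most lightly loaded workstation.
    # A real resource management system would query its own load daemons
    # and use its own remote-execution mechanism instead of ssh.
    import subprocess

    def query_load(host):
        # Placeholder: pretend every host reports a one-minute load average.
        reported = {"ws1": 0.4, "ws2": 1.7, "ws3": 0.1}
        return reported[host]

    def submit(command, hosts):
        target = min(hosts, key=query_load)   # most lightly loaded host
        # The user never types this remote-execution command himself;
        # the tool issues it on his behalf.
        return subprocess.run(["ssh", target, command])

    # Example (not executed here): submit("cc -O2 sim.c", ["ws1", "ws2", "ws3"])

The point of the sketch is only that the choice of the target machine is made by the tool, not by the user.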

Such systems are popular due to their benefits, such as distributing the workload onto each workstation of the distributed system or the checkpointing mechanism in systems that offer process (or task) migration. With the help of resource management systems, a distributed system appears to the user as a single computing system providing high throughput and better performance. There are many resource management systems, both research and commercial. They differ in their implementations and in the mechanisms with which they address the lack of resource management in distributed systems. Examples of such systems are:

Condor: Condor [2][3][4] is a distributed batch queuing system for sharing the workload within a pool of UNIX workstations connected by a network.

Codine: Another resource management system, targeted at optimizing the utilization of software and hardware resources in a heterogeneous networked environment, similar to DQS.

Load Sharing Facility (LSF): The abbreviation "LSF" [5][6] stands for Load Sharing Facility, a general purpose distributed queuing system for heterogeneous UNIX environments from Platform Computing Corporation. LSF unites a group of computers into a single system in order to improve the utilization of the common resources.

As can be seen from the descriptions above, resource management systems offer many advantages in optimizing the use of the overall performance of workstation clusters. The most important advantage in this respect is that they help the distributed system appear as a single computing device whose resources are commonly available. Unfortunately, not all user requirements are met by the existing systems: some of these tools support process migration, load balancing, message passing, fault tolerance etc., others do not. This opens the way to new solutions, like the one introduced at the end of this paper. (For more information about resource management systems see [13][14][15].)

3.2 Distributed Operating Systems

Distributed operating systems are software environments that attempt to make the underlying system architecture act as a transparent system. Like a conventional operating system (OS), a distributed operating system consists of a uniform process space providing the basic functions of the OS, but with the difference that these functions are realized by different servers/machines which are usually part of the cluster. Each OS command is sent to the serving machine responsible for its execution, which eventually returns the results; this is done using the network between the nodes. From the user's point of view, the cluster looks like a single time-sharing system. There are many distributed operating systems. They differ in their architectures and implemented mechanisms. Some of them have been developed on the basis of traditional operating systems like UNIX, others have been designed completely from scratch. These distributed operating systems are built to support either a processor pool model or a workstation

pool model. The former consists of a pool of processors, each with its own memory, which may be accessed by the user via X-terminals. The latter consists of a number of workstations connected by a LAN; here, the user can use any workstation as a user interface to the system. In the following we introduce some well-known distributed operating systems, show their architectures and concepts and point out their advantages and disadvantages.

Amoeba: Amoeba is a distributed operating system which allows a set of computers (CPUs) to act as a single conventional time-sharing system [7][8][9][10].

Sprite: Sprite is a distributed operating system compatible with UNIX. Its basic design goal was the development of new technologies for the implementation of UNIX-like facilities in a workstation cluster [11][12], assuming a traditional model of computation.

V Distributed Operating System: The V distributed system [16] is an operating system for a cluster of workstations connected via a high speed interconnection network.

MOSIX: MOSIX [17] is a distributed operating system that enhances the utilization of a collection of workstations and makes them appear as a single-machine UNIX environment.

4 A System of Virtual Workstations (ViWo)

Over the past few years, research has focused on finding solutions that exploit the scattered workstation performance either by use of a resource management system or of a distributed operating system. The ViWo system attempts to solve this problem by allocating the spare workstation performance of type 1 and type 2 users so that it can be exploited by type 3 users [2]. In contrast to Condor, which allocates idle workstations to type 3 users, the ViWo system automatically assigns appropriate workstations to each user according to his applications' requirements. For example, if a user simply wants to edit a (program) text, a workstation with low processing power will be assigned to him. If the same user needs to compile the edited program text, the system will assign him another, better suited workstation if the workstation he is logged in on is not suited for this job. The user therefore has access to a virtual workstation that is the most useful machine for his current task.

The ViWo system is a software tool capable of integrating a transparent system from a collection of workstations running UNIX with a distributed file system (Fig. 2). It provides the user with a single computing device that is actually implemented as a distributed system. This means that ViWo does not follow the concept of special home workstations for each user, but that of a "public" workstation pool that may be accessed by all users. The basic idea is similar to that of a traditional mainframe, with the difference that the "mainframe" is now implemented in the form of a cluster of workstations. Applications are executed on different workstations according to their requirements and the underlying workstation performance. The

system transparently assigns each user application to the most appropriate workstation according to its requirements (e.g. demands on CPU time or memory space). For this, users do not need to give explicit commands in order to execute applications on another (remote) workstation. The system allows users to exploit the processing power of the cluster without really knowing where their applications are eventually executed.

As already mentioned, the mechanism of a time-sharing system is to share one powerful computing unit among all users by switching between the users' processes. The mechanism of the ViWo system, however, is to share a pool of UNIX workstations among all users to meet precisely their specific job requirements. As in other systems, the ViWo system software (Fig. 1) is located between the user (application) level and the operating system level; that means the system operates in such a way that the user does not notice that the underlying system is a collection of autonomous workstations rather than a single computing device (transparent system). Moreover, it operates without any modification of the underlying operating system kernel.

Fig. 1: System (Software) Layers (from top to bottom: user (applications) level, ViWo system level, operating system level, hardware level)

Our system, which may also be described as a resource sharing system supporting batch and interactive applications, aims to combine the advantages of both resource management systems and distributed operating systems. Furthermore, it is intended as a solution to the idle time problem in distributed systems, in which usually 30%-50% of the workstations are idle or lightly loaded. In order to achieve such a system, load balancing is needed as well. In the next subsections we introduce our load balancing policies. In addition, we discuss the problem of selecting a suitable load index that can be used for dynamic load balancing.

4.1 ViWo System Load Balancing

Load balancing is usually defined as a technique that attempts to balance the workload among all workstations in order to achieve maximal utilization of the system's potential resources and minimal system response time. Load balancing policies can be divided into two categories: static and dynamic. Static load balancing denotes the initial distribution of tasks (processes) among the workstations at their submission, while dynamic load balancing denotes the distribution and redistribution (migration) of processes at run time according to the current system (load) information.
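The difference between the two categories can be illustrated with a small Python sketch; the load values, the imbalance threshold and the decision rules below are invented for illustration and are not the policies prescribed by the ViWo system.

    # Sketch: static placement decides once at submission time,
    # dynamic balancing keeps re-checking the load at run time.

    def static_placement(loads):
        """Static policy: choose the least loaded workstation at submission."""
        return min(loads, key=loads.get)

    def dynamic_rebalance(loads, threshold=1.0):
        """Dynamic policy: at run time, propose migrating work from the most
        loaded to the least loaded workstation if the imbalance is large."""
        src = max(loads, key=loads.get)
        dst = min(loads, key=loads.get)
        if loads[src] - loads[dst] > threshold:
            return (src, dst)   # migrate one process from src to dst
        return None             # load is balanced well enough

    loads = {"ws1": 2.5, "ws2": 0.3, "ws3": 0.9}
    print(static_placement(loads))    # -> ws2
    print(dynamic_rebalance(loads))   # -> ('ws1', 'ws2')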

Fig. 2: ViWo System Topology (workstations WS1 ... WSn, a printer, a compute server and a file server connected by a LAN, accessed from X-terminals XT1 ... XTm)

Moreover, dynamic load balancing strategies can be further divided into centralized and decentralized policies. In the centralized strategy, one workstation acts as the system load balancer (manager), which must collect the load information of all workstations and migrate processes between workstations according to their load. In the decentralized strategy, each workstation acts as a load balancer, exchanging load information with the other machines in order to decide where and when to migrate processes. Dynamic load balancing depends on several factors:

- the distributed system configuration (processor type and architecture, network type and topology),
- the application requirements (CPU, memory, ...),
- the load balancing tools (load information, process migration).

There are two major components of the mechanism required for dynamic load balancing: load information, which is responsible for the collection of information about each resource in order to decide which task should be migrated to which workstation, and process migration, which is responsible for transferring tasks between workstations according to the system load information. In the next subsections we introduce these mechanisms. For more information about taxonomies and classifications of load balancing policies see [19][20][21].

This section describes the load balancing mechanism that is intended to be implemented within the ViWo system. Load balancing in our system depends strictly on the accuracy of the load information. Fig. 3 illustrates the components of ViWo system load balancing. As within LSF, the load on each workstation is measured by a Load Information Manager

(daemon) which runs on every workstation. This load information is sent to the Main Manager. The Main Manager condenses this information into a load information summary, which it then uses to make the next load balancing decision. According to the current user activities and the load of the workstations, the Main Manager decides which machine is the most appropriate workstation to satisfy the job requirements of the next submitted job.

Fig. 3: Components of the ViWo System Load Balancing (the Main Manager collects load information from the workstation pool, makes the decision and performs workstation selection and allocation)

4.1.1 Load Information

Getting load information means collecting the current load states of the machines. User activity is the usage of a workstation's resources, such as CPU performance and memory space. The load on a workstation is the usage of its resources, for example how many users are working on it and the total load on its CPU and memory.

Load Metric. The system load is measured by a Load Information Manager (LIM) daemon, which periodically collects the load of each workstation. Each workstation sends its load information to the Main Manager (MM), which relates this load to a reference machine in order to eliminate the heterogeneity and stores the load information of the whole distributed system in the form of a list. In another list, the Main Manager sorts the workstations according to their current free memory space: the workstation with the most free memory is listed first, and the workstation with the least free memory is listed last.

Load Index. In order to determine the actual load on each workstation, we must define a load metric, the so-called load index. The performance of the load balancing depends on this load index, i.e. load balancing can only be as good as the underlying load information it relies on. In distributed systems with heterogeneous workstations, there is the additional aspect of different CPU speeds and/or memory sizes. This problem arises, for example, if two workstations each offer 20% free CPU power, but one provides double the CPU speed. Therefore, we need a load index that determines the current performance of each workstation.
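One possible way to build such an index, sketched below in Python under the assumption that the relative CPU speed of each machine is known (e.g. from the benchmark-derived load factor introduced in the next paragraph), is to weight the idle CPU fraction by the machine's speed relative to the reference machine. The numbers merely replay the 20%-free example; this is an illustrative form of the index, not the definitive ViWo metric.

    # Effective available capacity of a heterogeneous workstation,
    # expressed in "reference-CPU equivalents".
    def effective_capacity(idle_fraction, speed_factor):
        # speed_factor = 1.0 for the reference machine,
        # 2.0 for a machine that runs the benchmark twice as fast, etc.
        return idle_fraction * speed_factor

    # Two workstations, each 20% idle, but B has double the CPU speed of A:
    print(effective_capacity(0.20, 1.0))   # A -> 0.2
    print(effective_capacity(0.20, 2.0))   # B -> 0.4, so B is preferred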

We consider the following quantities as load indices: CPU utilization (load average), free memory space, number of processes on each workstation (with respect to their owners), and the amount of memory space used per process. By executing a benchmark program, we can determine the performance of each workstation as follows: suppose that we have three workstations A, B and C and that we have executed the same benchmark program on each of them. Workstation A has an execution time of 10 sec, workstation B 20 sec and workstation C 30 sec. The load factors of workstations A, B and C are then:

Load Factor(WS_A) = 10/10 = 1
Load Factor(WS_B) = 10/20 = 1/2 = 1/2 * Load Factor(WS_A)
Load Factor(WS_C) = 10/30 = 1/3 = 1/3 * Load Factor(WS_A)

A similar ranking is possible for the static memory equipment of the machines.

4.2 ViWo System Software Components

As illustrated in Fig. 4, the ViWo system has five main components. The first component is the Load Information Manager (LIM daemon). The LIM runs on each workstation and is responsible for the measurement of the load; it observes and monitors all user activities. Each Load Information Manager sends its results to the second component, the Main Manager (MM daemon), which is responsible for the assignment of each user to the most appropriate workstation by comparing the user's application requirements with the available hardware. In addition, the incoming load information is used for the creation and updating of a special user database and a resource database. The third component is the user database, which contains a profile of each user with information gained by observing his activities. The fourth component is the resource database, which contains all information about each workstation concerning its current load. This information is regularly updated by measuring the workstation load periodically (polling). The last component is the user interface, which is located on each network computer or X-terminal. With this interface, users connect to the system by entering their

user name and password (login).

Fig. 4: ViWo System Organization (the user interface communicates with the Main Manager, which accesses the user database (UD), the resource database (RD) and the workstation pool)

4.3 Concepts of the ViWo System

The main task of the ViWo system is to control access to the workstation pool by assigning the most appropriate workstation to each user and by monitoring and supervising user activities. As already mentioned, our system concept is similar to that of a mainframe, but with the difference that it attempts to use each workstation at its full utilization by assigning each user to the most appropriate workstation according to his applications' requirements. This means that, due to the task assignment offered by our system, there is an upper bound on the number of essential workstations which is usually sufficient to satisfy all user requirements. In other words: within our system, not every user really needs a workstation of his own to process his jobs; the workstation pool offers that computing power while increasing the exploitation of its machines. It is therefore conceivable that some users are content with simple, cost-efficient terminals without missing any performance of the underlying system, which operates from the user's point of view as a transparent system.

4.4 ViWo System Architecture

The ViWo system consists of four main hardware components (Fig. 5). The basic and central component is a pool of workstations. Users are connected to this pool by means of X-terminals or network computers. Another workstation acts as a file server, and a further one may be used as the home of the Main Manager. All these components are connected by a LAN.

4.5 ViWo System Mechanism

Our system is based on a client-server model. The client is located on a network computer or X-terminal and is activated by a user when he logs into the system. Users can submit job requests to the ViWo system server by defining their application requirements (CPU performance, memory space). The client requests a workstation from the server (Main Manager, Fig. 6), which in turn checks the authorization of the user.

Fig. 5: ViWo System Architecture (a pool of workstations WS, a server and the system manager connected by a LAN, accessed from X-terminals XT)

If the user is not authorized, the server terminates the connection; otherwise it looks in the user database (UD) for any information about this user. If there is no information, the user must predefine some information about the resources he needs (e.g. CPU performance or memory space requirements). The server then looks into the resource database (RD), compares the user requirements with the available resources and finally allocates the best suited workstation to the user. If there is information about former activities of that user, e.g. from his last login to the system, the server assigns to him the best suited workstation that meets his requirements. Thus, each user can be simultaneously connected to more than one workstation. For example, if a user needs to write and compile a program, the ViWo system will first assign a workstation with small performance (processing power) for the writing and editing of the source code, but when the user compiles it, the ViWo system will transparently compile it on another workstation that has more performance and memory, according to the size of the program. This means that one user may be content with one workstation, while another user needs to work on more than one machine. Therefore, users do not possess special home workstations, but share a workstation pool public to all users. With this system we can reduce the number of workstations in a cluster to meet only the users' requirements, in order to avoid idle time of workstations.

Fig. 6: Connection Protocol (the client's workstation request is compared against the user database and the resource database by the Main Manager, which either refuses the connection or connects the user to a workstation of the cluster)

The following lists some features (advantages) of the proposed workstation pool system:

- The costs of the system are small, since already existing workstations are used.
- The overall system performance is maximized by assigning the most appropriate workstation depending on the kind of application.
- There is no concept of a special "home workstation", i.e. workstations are not owned by users; all users share a single "workstation pool".
- The system supports heterogeneous environments, i.e. different workstation architectures and equipment.
- Independence of the underlying system architecture (e.g. network architecture).
- Support of load balancing.
- Superior performance/cost ratio in comparison with centralized systems (mainframes).
- Scalability (e.g. new high performance workstations can be added).
- Fault tolerance, i.e. when one workstation goes down, the system may still operate, although with less processing power.
- Enhancement of the utilization of the underlying system.
- No modifications of the operating system are required.
- No modifications of applications are required.
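As an illustration of the connection and allocation flow of Fig. 6, the following Python sketch replays the steps described above; the databases, field names, authorization set and the "least loaded among sufficient machines" tie-breaking rule are assumptions made for this example, not the actual ViWo implementation.

    # Hypothetical user database (profiles from earlier sessions),
    # resource database and authorization list.
    USER_DB = {"alice": {"cpu": 2.0, "mem_mb": 512}}
    RESOURCE_DB = {"ws1": {"cpu": 1.0, "mem_mb": 256, "load": 0.2},
                   "ws2": {"cpu": 2.0, "mem_mb": 1024, "load": 0.1}}
    AUTHORIZED = {"alice", "bob"}

    def allocate(user, requested=None):
        if user not in AUTHORIZED:
            return None                          # connection terminated
        req = USER_DB.get(user) or requested     # stored profile, else explicit request
        if req is None:
            raise ValueError("user must predefine CPU/memory requirements")
        candidates = [ws for ws, r in RESOURCE_DB.items()
                      if r["cpu"] >= req["cpu"] and r["mem_mb"] >= req["mem_mb"]]
        # among the sufficient machines, prefer the least loaded one
        return min(candidates, key=lambda ws: RESOURCE_DB[ws]["load"]) if candidates else None

    print(allocate("alice"))   # -> ws2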

Some disadvantages of the workstation pool system are:

- The system needs more complex management because of the distribution of resources over different workstations.
- The security of the ViWo system is extremely low because there is more than one point of access to the system.

5 Conclusion

Compared with conventional centralized systems (mainframes), distributed computing systems offer considerable advantages (e.g. scalability, performance/cost ratio), however at the expense of a much more complicated system management that is necessary for the exploitation of unused resources. Up to now, most attempts to replace mainframes by distributed computer systems have been based on conventional time-sharing concepts.

In contrast to these attempts, we have introduced a new approach to a distributed system that operates with workstation sharing (the ViWo system). Within this system, each user works on a machine that is suitable for the execution of his current application (workstation sharing). If the user works simultaneously with different applications, he is also simultaneously attached to different machines, each chosen depending on the application's requirements, without being aware of it (transparency). Thus, the ViWo system provides a solution to reduce and exploit idle times in workstation clusters. It is a software tool which tries to make better use of the processing power of each workstation by automatically assigning the most appropriate workstation(s) to each user to meet just his application requirements. Using this system, we can reduce the number of workstations per cluster that are essential for handling all user tasks. The ViWo system, as a transparent system, is supported by a network file system (NFS) to facilitate access to any file from any workstation. Checkpointing and process migration are not intended for the first version. The system tries to combine the best aspects of resource management systems and distributed operating systems to enhance the utilization of workstation clusters, with the intention of reducing the number of essential workstations.

6 References

[1] Douglis F., Ousterhout J.: "Transparent process migration: Design alternatives and the Sprite implementation", Software Practice and Experience, 21(8), pp. 757-785, August 1991.

[2] A. Bricker, M. Litzkow, M. Livny: "Condor Technical Summary", Technical Report 1096, Computer Science Department, University of Wisconsin-Madison, January 1992.

[3] Epema, Livny, van Dantzig, Evers, Pruyne: "A worldwide flock of Condors: load sharing among workstation clusters", Journal on Future Generations of Computer Systems, Vol. 12, 1996.

[4] M. J. Litzkow, M. Livny: "Experience with the Condor Distributed Batch System", Proc. of the IEEE Workshop on Experimental Distributed Systems, Huntsville, AL, USA, 1990.

[5] S. Zhou, J. Wang, X. Zheng, P. Delisle: "UTOPIA: A Load Sharing Facility for Large, Heterogeneous Distributed Computer Systems", Computer Systems Research Institute, University of Toronto, Technical Report CSRI-257, April 1992.

[6] Platform Computing Corporation: "LSF User's Guide", Toronto, Canada, Third Edition, February 1996.

[7] Tanenbaum A.S., Kaashoek M.F., van Renesse R., Bal H.E.: "The Amoeba Distributed Operating System - A Status Report", Computer Communications, Vol. 14, July/Aug. 1991, pp. 324-335.

[8] Tanenbaum A.S.: "The Amoeba Distributed Operating System", Department of Mathematics and Computer Science, Vrije Universiteit, The Netherlands, 1994.

[9] Tanenbaum A.S.: "Modern Operating Systems", Prentice Hall, 1992.

[10] Tanenbaum A.S.: "Distributed Operating Systems", Prentice Hall, 1995.

[11] Douglis F., Ousterhout J.K., Kaashoek M.F., Tanenbaum A.S.: "A Comparison of Two Distributed Systems: Amoeba and Sprite", Computing Systems Journal, Vol. 4, pp. 353-384, Fall 1991.

[12] Douglis F.: "Transparent Process Migration in the Sprite Operating System", Ph.D. Thesis, University of California, Berkeley, Computer Science Division, September 1990.

[13] J. S. Kaplan, M. L. Nelson: "A Comparison of Queuing, Cluster and Distributed Computing Systems", NASA Langley Research Center, Technical Report, June 1994.

[14] M. A. Baker, G. C. Fox, H. W. Yau: "A Review of Commercial and Research Cluster Management Software", Northeast Parallel Architectures Center, Syracuse University, Version 1.2, June 1996.

[15] L. H. Turcotte: "A Survey of Software Environments for Exploiting Networked Computing Resources", MSSU-EIRS-ERC-93-2, Engineering Research Center for Computational Field Simulation, Mississippi State University, February 1993.

[16] Cheriton D.R.: "The V Distributed System", Communications of the ACM, 31(3), pp. 314-333, March 1988.

[17] Barak A., Guday S., Wheeler R.G.: "The MOSIX Distributed Operating System - Load Balancing for UNIX", Lecture Notes in Computer Science, Vol. 672, Springer-Verlag, 1993.

[18] Barak A., Braverman A., Gilderman L., La'adan O.: "The MOSIX Multicomputer Operating System for Scalable NOW and its Dynamic Resource Sharing Algorithms", Institute of Computer Science, The Hebrew University of Jerusalem, Report 96-11, July 1996.

[19] T. L. Casavant, J. G. Kuhl: "A taxonomy of scheduling in general-purpose distributed computing systems", IEEE Transactions on Software Engineering, SE-14:141-154, 1988.

[20] R. Lüling, B. Monien, F. Ramme: "Load Balancing in Large Networks: A Comparative Study", 3rd IEEE Symposium on Parallel and Distributed Processing, 1991, pp. 686-689.

[21] Anna Hac: "Load Balancing in Distributed Systems: A Summary", Performance Evaluation Review, Vol. 16, 2-4, February 1989.