Micro & Macro Scheduling Techniques
Ayaz Ali
Department of Computer Science, University of Houston, Houston, TX 77004
ayaz@cs.uh.edu

1. INTRODUCTION

Scheduling has historically been one of the most important research areas in operating systems. However, little could be added to the performance of existing scheduling policies while the underlying hardware architecture remained stagnant over the years. To provide the high-performance computation power demanded by large applications, one can either improve a single machine's capacity or construct a distributed system from a scalable set of machines. Recent developments in system architecture and the rebirth of cluster computing are bound to demand changes in current scheduling implementations. This project is divided into two distinct parts. The first, on micro scheduling, deals with current scheduling strategies and their analysis for two main classes of applications (interactive and CPU-bound). The second part focuses on a rather new area called macro scheduling: it discusses the changes that could be made in operating system executives/kernels to obtain the best scheduling policy (load-balanced task distribution) for multiprocessor architectures.

2. MICRO SCHEDULING

The scheduling policies employed by operating systems to allocate processor time to tasks are termed micro scheduling policies. One of the two main goals of this project was to study these micro scheduling techniques and extend their ideas into better strategies suitable for both micro and macro schedulers. Scheduling policies can be measured by the following factors:

- Turnaround time
- Latency
- Throughput

Two main strategies can be employed for task scheduling: preemptive and non-preemptive. In this project I have focused on the analysis of three scheduling policies:

- Round Robin (preemptive)
- First Come First Serve (non-preemptive)
- Shortest Job First, which requires prior knowledge of processes but is theoretically more efficient than the other two

It is important to note that no single scheduling policy is suitable for all types of processes. There is a variety of applications with a variety of response-time requirements. Before we can understand where a certain algorithm applies, we need to understand what separates one class of application from another.

2.1 Process characteristics

The performance of a scheduling policy depends greatly on the characteristics of the tasks being scheduled and their interdependencies. There is no universally best scheduling policy that gives ideal performance for all sorts of tasks. In this project I have focused on interactive and CPU-bound processes. Another main class, real-time processes, is not discussed in this report; however, the techniques outlined here can be extended to processes with temporal constraints.
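As an illustration of these criteria, the following sketch (my own, not taken from the report's simulator; the job values are hypothetical) computes turnaround time, waiting time and throughput from a completed schedule:

```python
# Illustrative sketch: computing scheduling metrics for finished jobs.
# The Job record and the sample values are hypothetical.
from dataclasses import dataclass

@dataclass
class Job:
    arrival: float     # time the job entered the ready queue
    burst: float       # total CPU time the job needed
    completion: float  # time the job finished

def turnaround(job: Job) -> float:
    """Time from arrival to completion."""
    return job.completion - job.arrival

def waiting(job: Job) -> float:
    """Time spent ready but not running."""
    return turnaround(job) - job.burst

jobs = [Job(0.0, 10.0, 25.0), Job(2.0, 5.0, 30.0)]
avg_turnaround = sum(turnaround(j) for j in jobs) / len(jobs)   # 26.5
throughput = len(jobs) / max(j.completion for j in jobs)        # jobs per time unit
```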

Micro scheduling techniques have to take task characteristics into account in order to select the best policy for a given scenario. Process inter-relationships take center stage in selecting a macro scheduling policy. Based on their interdependencies, a set of tasks can be characterized as:

- Data dependent
- Resource dependent
- Precedence dependent
- Conditionally dependent

2.2 Algorithms

2.2.1 Round Robin

Most modern computer systems use a variation on the Round Robin scheduling algorithm. The term 'preemptive multitasking' essentially means switching between tasks by giving them an equal share of resources. The operating system lets each task run for a certain amount of time, called a time quantum, then interrupts that task and passes control of the CPU to another task. For interactive systems this scheme is generally best, because it reduces latency on heavily loaded systems. The disadvantage of Round Robin is lower throughput: every time the operating system interrupts a running task and gives the CPU to another, the context switch consumes a little CPU time.

2.2.2 FIFO or FCFS (non-preemptive)

First Come First Serve is useful only for a limited number of applications where throughput is critical and latency and response time are virtually unimportant. It is generally not used on home computers, although the Linux scheduler can run this way if you tell it to. FCFS is the most basic algorithm, similar to the way older systems such as MS-DOS scheduled: the first program that comes along gets the CPU, and no other process can have it until the first process is done. When it finishes, control

passes to the next process in line. This yields the highest possible throughput, because the running process gets 100% of the CPU until it finishes and fewer context switches occur. Latency, however, suffers the most: a user has to wait until all processes ahead of his own are done before a key press or mouse drag gets a response.

2.2.3 Shortest Job First (non-preemptive)

The two scheduling policies discussed above have contrasting features: Round Robin gives probably the best latency while throughput suffers, whereas under FCFS latency is terrible while throughput is the best possible. There have been plenty of variations on these algorithms; one of the better ones, which tries to balance latency and throughput, is Shortest Job First. It is not suitable for all sorts of processes, but it works particularly well for real-time processes and high-performance computing tasks, where we can reasonably estimate the size of a process ahead of time. The trick is to give the shortest processes a chance to run to completion before allocating the CPU to others. Both preemptive and non-preemptive variants of SJF are possible.
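The contrast between the three policies can be sketched with a toy experiment (my own illustration, separate from the simulator used in the next section; job lengths are hypothetical and all jobs arrive at time 0):

```python
# Toy comparison of average waiting time under FCFS, SJF and Round Robin
# for CPU-only jobs that all arrive at time 0. Context-switch cost ignored.
from collections import deque

def fcfs_wait(bursts):
    t, waits = 0.0, []
    for b in bursts:
        waits.append(t)   # each job waits for all earlier jobs to finish
        t += b
    return sum(waits) / len(waits)

def sjf_wait(bursts):
    # SJF is just FCFS applied to jobs sorted by length
    return fcfs_wait(sorted(bursts))

def rr_wait(bursts, quantum):
    remaining = list(bursts)
    done = [0.0] * len(bursts)
    queue, t = deque(range(len(bursts))), 0.0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        t += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)   # preempted: back to the end of the line
        else:
            done[i] = t       # completion time
    waits = [done[i] - bursts[i] for i in range(len(bursts))]
    return sum(waits) / len(waits)

bursts = [10.0, 4.0, 1.0]
print(fcfs_wait(bursts))      # 8.0
print(sjf_wait(bursts))       # 2.0
print(rr_wait(bursts, 1.0))   # 4.0
```

Note that RR's average waiting time falls between FCFS and SJF here; its real advantage, short response time for the 1-unit job, shows up in when each job first runs, not in this average.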

3. SIMULATION

Two batches of tasks were run under each of the three scheduling algorithms, and their data collected as shown on the following pages.

Batch 1
- Processes: 1-15
- Duration of each process: uniformly distributed between 10.0 and 15.0
- CPU burst: 10.0
- IO burst: uniformly distributed between 10 and 20

Batch 2
- Processes: 16-30
- Duration of each process: constant 4.0
- CPU burst: 1.0
- IO burst: uniformly distributed between 10 and 20
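The two batches can be reproduced with a sketch like the following. The report does not specify the generator used by the simulator, so the field names, seed and use of `random.uniform` are my own assumptions:

```python
# Approximate recreation of the two task batches described above.
# Field names and the fixed seed are illustrative assumptions.
import random

def make_batch1(rng):
    # Processes 1-15: duration U(10.0, 15.0), CPU burst 10.0, IO burst U(10, 20)
    return [{"pid": pid,
             "duration": rng.uniform(10.0, 15.0),
             "cpu_burst": 10.0,
             "io_burst": rng.uniform(10, 20)}
            for pid in range(1, 16)]

def make_batch2(rng):
    # Processes 16-30: constant duration 4.0, CPU burst 1.0, IO burst U(10, 20)
    return [{"pid": pid,
             "duration": 4.0,
             "cpu_burst": 1.0,
             "io_burst": rng.uniform(10, 20)}
            for pid in range(16, 31)]

rng = random.Random(42)  # fixed seed so runs are reproducible
batch1, batch2 = make_batch1(rng), make_batch2(rng)
```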

[Simulation charts: Round Robin and First Come First Serve]

[Simulation charts: Shortest Job First, with summary statistics]

3.1 Analysis of Simulation Results

In the simulation shown above, Round Robin uses preemptive scheduling with a quantum of 1.1 units. As we can see, the RR algorithm makes the most context switches, resulting in increased overhead and lower throughput. Its latency, however, is better than FCFS's, because each process gets the processor's attention at regular intervals, which improves response time. The other important factor to note is the improved turnaround time, i.e. the time from a process's arrival to its completion. SJF has the best of both latency and turnaround time, and even its throughput is comparable to FCFS, which makes it the best algorithm to use when process characteristics are known ahead of time. In a realistic scenario, however, where for most processes we cannot predict execution time in advance, Round Robin has an obvious edge over the other algorithms. Round Robin can be improved even further if the quantum is kept adaptive to system load and the requirements of each process. This has given birth to multilevel queues based on process priority, with each queue having a different scheduling quantum; Linux uses such variations of Round Robin in its kernel.

4. MACRO SCHEDULING

The second part of this project focuses on macro scheduling techniques for systems capable of running multiple processes at any given time. Contrary to micro scheduling policies, macro schedulers have limited applications: only particular classes of processes can be efficiently scheduled with these techniques. Most research in this field has been carried out in the form of load balancing and task distribution on clusters.

4.1 Host Architectures

The performance and selection of a scheduling algorithm depend on the architecture it runs on. While a set of micro scheduling policies may remain consistent throughout, macro scheduling policies will vary with the underlying system architecture. Single-processor systems have been around for a long time, and most operating systems have successfully employed sets of scheduling policies that work efficiently on them. Recent advances have brought cluster and grid computing architectures into the picture, and processor vendors like Intel, HP and AMD have already made dual-processor chips available. To get the maximum out of these hyper-threading-capable architectures, macro scheduling policies need to be embedded in existing operating systems, perhaps as extensions or hooks. One of the most popular ways of extracting maximum computational power from existing resources is to arrange the processing units into clusters. However, most task distribution techniques have been implemented outside of operating systems, adding a layer between applications and operating system kernels.

[Architecture diagram: a kernel-resident macro scheduler comprising Process Selection, Load Balancing, Resource Broker and Task Migrator modules, exchanging task migrations and node information with a global macro scheduling server, the micro schedulers, and the CPUs and tasks of the cluster nodes]

4.2 Components of a Macro-scheduler

A primitive architecture diagram of the macro scheduler extension inside the kernel, and its interaction with other modules, is given above. As shown in the diagram, the proposed system should have four components:

1) Selection Policy. It determines which task should be run locally or transferred to a processing node. Not all processes are suitable for remote scheduling: processes tightly coupled with other processes need to be scheduled on the same machine, and since task migration adds latency, interactive and IO-dependent processes are better suited to run locally. Introducing this module also takes much of the functionality out of the micro scheduler; in an ideal scenario, the selection policy module could replace the micro scheduler completely.

This module decides which processes to run locally and which are suitable to run remotely. Tasks that can run effectively on the local machine are directed to the Load Balancing module; all other tasks are routed through the Resource Broker module, which decides which machine to migrate each task to. For local tasks, thread-level scheduling can be performed effectively in the Selection Policy module because of the high coupling among threads of the same process, and one of the micro scheduling policy variations can be used. For tasks that can run remotely, a Largest Job First ordering may be more suitable, as such jobs have the lowest ratio of communication time to service time.

2) Resource Broker. It determines the availability of resources for tasks to be transferred. The resource broker forms the core of the macro scheduler, as it presents micro schedulers with an abstraction of N processing units. Each time the local task queue exceeds a threshold, the selection policy module passes a task to the load balancing module, which in turn queries the resource broker for updated information about available resources. The resource broker keeps information about available nodes by querying a central server; in case of server failure, past static information (averages) is used to predict the load at registered nodes.

3) Load Balancing. It determines the best node among the available local resources, using the resource broker's availability information to find the best node for a task transfer.

4) Task Migrator. It handles the migration of tasks and the collection of results. One of the biggest handicaps of task distribution algorithms has been communication overhead, which makes up a large proportion of the real service time. The other problem, the marshalling of data and the related consistency issues, has been addressed efficiently by PVM and MPI.
As discussed above, only tasks with a considerably large service time are chosen for migration: a task's service time t_s must be considerably larger than its migration time t_m.
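That decision rule, together with the load balancing step, can be sketched as follows. The function names, threshold factor and node representation are my own illustrative assumptions, not details from the report:

```python
# Sketch of the macro scheduler's migrate-or-run-locally decision.
# The threshold factor and the node-load mapping are illustrative assumptions.

def should_migrate(service_time: float, migration_time: float,
                   factor: float = 10.0) -> bool:
    """Migrate only when the service time t_s is considerably larger
    than the migration time t_m (here: at least `factor` times larger)."""
    return service_time > factor * migration_time

def pick_node(node_loads: dict) -> str:
    """Load balancing: pick the least-loaded node from the resource
    broker's view (a mapping of node name to current load)."""
    return min(node_loads, key=node_loads.get)

# A hypothetical snapshot returned by the resource broker.
broker_view = {"node-a": 0.7, "node-b": 0.2, "node-c": 0.5}

target = None
if should_migrate(service_time=120.0, migration_time=3.0):
    target = pick_node(broker_view)   # least-loaded node: "node-b"
```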

5. LESSONS LEARNT

Nachos Experience

The planned test bed for this project was the Nachos instructional operating system. There was a steep learning curve in getting acquainted with the Nachos architecture, and most of the initial time was spent learning Nachos and porting it to Linux, itself an uphill task that involved installing a cross compiler for MIPS. Once Nachos was ported to Linux, there were plenty of other surprises due to the unavailability of a fully functional Nachos distribution: apparently all the distributions lacked thread and network module implementations. Although I could not use Nachos in the end, a lot was learnt about the Nachos architecture and cross compiler construction.

Scheduling Simulator

Instead of putting more time and effort into building thread and networking features into Nachos, I tried to develop a scheduling simulator along with a resource broker. It included a clock (to simulate machines of varying processor speeds) and a scheduler with two policies, FCFS and RR. Two classes of processes were defined: interactive and CPU-bound. Unfortunately, I did not have time to implement the macro scheduling part (resource broker and task migration), given how much of the schedule was spent trying to make the project run on Nachos.

Third-Party Simulators

The simulation results given above were generated with the scheduling simulator by Prof. Steve Robbins of UT San Antonio (http://vip.cs.utsa.edu/nsf/index.html). Another tool, Cheddar, was used to simulate multiple processor resources for a variety of tasks; however, it did not produce results relevant to explaining task distribution among processing units. Cheddar is maintained by the University of Brest and can be found at http://beru.univbrest.fr/~singhoff/cheddar.

6. REFERENCES

[1] H.D. Karatza, "A Simulation Model of Task Cluster Scheduling in Distributed Systems," The Seventh IEEE Workshop on Future Trends of Distributed Computing Systems, South Africa, December 1999.
[2] Marei S. Al-Amri and E. A. Rana, "New job selection and location policies for load-distributing algorithms," Intl. J. Network Mgmt 2002; 12: 165-178.
[3] M.A. Schaar, Kemal Efe and Weijia Shang, "Queueing performance analysis of co-scheduling in a pool of processors environment," Proceedings of the 8th International Conference on Supercomputing, 1994.
[4] Steven Robbins and Kay A. Robbins, "Empirical Exploration in Undergraduate Operating Systems."
[5] F. Singhoff, J. Legrand, L. Nana, L. Marcé, "Cheddar: an Open and Flexible Real Time Scheduling Framework."
[6] G. A. Geist, V. S. Sunderam, "The Evolution of the PVM Concurrent Computing System," 1993.