
Dynamic Scheduling in an Implicit Parallel System

Haruyasu Ueda, Institute of Social Information Science, Fujitsu Laboratories Ltd., Makuhari, Chiba 273, Japan
Johan Montelius, Swedish Institute of Computer Science, Kista, Sweden

Abstract

Penny is a system that exploits fine-grained parallelism in an AKL program. During execution, a set of workers (processes) share a set of tasks that are created dynamically. To achieve good parallel speedup, a worker must be assigned a new task as soon as it becomes idle. Since no compiler support nor user annotations are available to guide the execution, the Penny system needs a very efficient dynamic scheduler. We have developed a configurable scheduler in order to experiment with different approaches. We evaluated and analyzed the approaches by running a small set of benchmarks, for which statistics and performance are reported.

Keywords: dynamic scheduling, implicit parallelism, concurrent logic language, concurrent constraint, auto-scheduler

1 Introduction

The use of parallel computers has so far been restricted to systems written in programming languages that make explicit use of the machine resources. Therefore large regular problems, which are easily divided into independent parts, are the main applications of parallel computers. However, as parallel computers increase in popularity, parallelizing even smaller irregular problems is becoming more interesting.

Parallelizing irregular problems by hand is very difficult. The first obstacle is to code the problem in a way that allows parallel execution. This is probably the hardest part, since it often requires both redesigning the algorithms and handling the hazards of shared data and locks that can (and will) create both erroneous results and deadlocks. The second problem is to divide the program into a suitable number of large parts and to assign these to processors. If the problem is irregular, this task can be very difficult.
The work is made easier by using a concurrent language, such as AKL [1], that allows the programmer to implement communicating processes without having to use locks or barriers for synchronization. The Penny system [2] can then exploit the parallelism implicitly available in the program, i.e. there is no need for user annotations. This places high demands on the implementation of the Penny system: it must be able to handle the scheduling of tasks very efficiently.

2 The Penny System

The Penny system is a parallel implementation of the concurrent constraint language AKL [1] on a shared-memory parallel computer. The distinguishing features of AKL are deep guards and encapsulated search. Both so-called AND- and OR-parallelism are exploited by the Penny system. What follows is a very simple model of the Penny system, sufficient for understanding the scheduler [1, 2].

In an execution of Penny, a set of workers is first spawned by the system. A worker is implemented as a thread in the operating system and is dynamically scheduled to a processor by the operating system. A worker should always be assigned to a processor, so there is no need to create more workers than the number of available processors. All workers are equal, i.e., there is no master-slave relationship.

Each worker will execute a given task, and while doing so it will possibly generate new tasks. When a task is completed, a new task must be found and assigned to the worker. The problem is how a new task should be found by the scheduler. We have divided the scheduler into two parts: a local scheduler and a global scheduler. The local scheduler takes care of the scheduling of tasks generated by a worker itself. As long as a worker has enough tasks, it will only call the local scheduler. When a worker runs out of tasks, it will call the global scheduler, which will try to locate a new task either in a global pool of tasks or by taking a task from another worker. The local scheduler must exist also in a sequential implementation, since the tasks also reflect the concurrent nature of the system. The global scheduler, however, is only needed in a parallel implementation.

2.1 The Execution State

The execution state can be divided into a global structure that is shared among all workers and local structures that are owned by individual workers. The local structures can be accessed by the other workers, but they are controlled by the owner. The global structure is a representation of all available processes. Each process is either new, running or suspended.

The local structures of a worker consist of a pointer to at most one running process, a continuation task stack, and a wake task stack. A busy worker that is executing a running process can generate new processes. A pointer to such a process is a continuation task. The worker can also generate data that will allow a suspended process to continue its work. A pointer to such a process is a wake task. Each task is pushed on the corresponding task stack owned by the worker.

2.2 The Local Scheduler

A worker will continue to execute the running process until it either terminates or suspends. There is no preemption of processes in the system. When a worker has finished the execution of a process, it must select a new process: either the last one created or the last one woken. Some systems have chosen a strategy called eager wake, where a worker stops the execution of a running process as soon as a wake task is generated. This approach has some benefits, but it causes very frequent context switches at a high cost. Besides the cost, it should be avoided in an implicit parallel system, since it is not under the control of the programmer. We have therefore decided to prefer the last created process over the last woken one. This paper will not discuss the strategies of the local scheduler further, since this would require a deeper understanding of the execution mechanism and semantics of the AKL language.

2.3 The Global Scheduler

In the global scheduler, both types of tasks can be moved from a busy worker to an idle worker.
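The worker-local structures of Section 2.1 and the local scheduling policy of Section 2.2 can be sketched as follows. This is a minimal illustration, not Penny's actual code: the type and function names are invented for this sketch, and only the policy described above (prefer the most recently created process, fall back to the most recently woken one, no preemption) is taken from the text.

```c
#include <stddef.h>

/* A task is a pointer to a process: either a newly created process
 * (continuation task) or a suspended process that has been woken
 * (wake task).  The process representation itself is irrelevant here. */
typedef struct process Process;

#define STACK_MAX 1024

typedef struct {
    Process *tasks[STACK_MAX];
    size_t top;              /* number of tasks on the stack */
} TaskStack;

/* Worker-local state: at most one running process plus two task stacks. */
typedef struct {
    Process *running;        /* the process currently being executed */
    TaskStack continuations; /* pointers to newly created processes  */
    TaskStack wake;          /* pointers to processes ready to resume */
} Worker;

static void push(TaskStack *s, Process *p) {
    if (s->top < STACK_MAX)
        s->tasks[s->top++] = p;
}

static Process *pop(TaskStack *s) {
    return s->top > 0 ? s->tasks[--s->top] : NULL;
}

/* Local scheduler: there is no preemption, so this runs only when the
 * current process terminates or suspends.  The last created process is
 * preferred over the last woken one (no eager wake). */
Process *local_schedule(Worker *w) {
    Process *next = pop(&w->continuations);
    if (next == NULL)
        next = pop(&w->wake);  /* fall back to woken processes */
    w->running = next;
    return next;
}
```

When `local_schedule` returns `NULL`, the worker has run out of local tasks; this is the point at which it would call the global scheduler.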
There is no difference between these two types of tasks in terms of switching overhead. The global scheduler does, however, try to prioritize the distribution of wake tasks, since these are more expensive for a busy worker to handle.

The main difficulty with the global scheduler is that the performance of a scheduler will change drastically depending on the executed program. Different programs can have very different mixes of parallel and sequential parts. A scheduler that excels for one type of program can perform poorly on another. The goal is not to find a scheduler that outperforms other schedulers on certain programs, but to find a scheduler that has a predictable behavior for all types of programs.

There is no methodology for designing a scheduler without having a real system to experiment with. It is very hard to predict performance on a cache-based shared-memory multiprocessor, because memory usage and cache hit-ratio are often the limiting factors. The Penny system was therefore developed with a configurable global scheduler, to enable experiments with different approaches. The system is designed so that different scheduler approaches can be tested without changing the basic execution mechanism.

We have experimented with four different approaches: two using a global pool of tasks and two working directly with the local stacks. An advantage of using a global task pool is that it makes load balancing easier. The disadvantage is that it complicates the implementation, and in some cases the overhead is greater than the effect of a better balancing of the load. The schedulers also differ in who is responsible for the distribution of tasks. This responsibility can be placed on idle or busy workers. If the responsibility is placed on busy workers, it is questionable how much overhead a busy worker is allowed to pay in order to keep idle workers from waiting.

The four schedulers are as follows. In the first, driven by busy workers with a global task pool, the busy workers periodically check whether the global pool is empty and whether there are any idle workers.
If so, some of the busy worker's tasks are moved to the global pool, and an idle worker will then collect them from there. The second scheduler is driven by busy workers without the use of a global task pool. Busy workers will check whether there are any idle workers each time a new task is created; if so, an idle worker will be given the new task directly. The third scheduler is driven by idle workers without the use of a global task pool. An idle worker will look for a task directly in the stacks of the busy workers; when a task is found, the idle worker will steal it. The fourth scheduler is similar to the third, but there is only one "thief" active at any given moment.

This thief will, however, steal as many tasks as possible, take some for itself, and place the rest in the global pool. The other idle workers will simply wait for tasks to be placed in the global pool.

2.4 Implementation

There is no central scheduler process. Instead, each worker performs the necessary operations to distribute tasks. Since the workers access some shared data structures, locks have to be used. The locks are implemented as spin locks using atomic-swap instructions. The task pool is implemented as a FIFO queue and is protected by a lock. The lock overhead is very small and few collisions occur.

In the two schedulers driven by idle workers, an idle worker will access the local stacks of a busy worker directly. Since the task stacks are accessed both by the owner worker and by idle workers, the stacks are protected by a lock. In the schedulers driven by the busy workers, no lock is necessary for the local stacks. The lock could cause a large overhead, but it turns out that as long as a worker is left alone, the overhead is very small.

In the pool-based scheduler driven by busy workers, a busy worker must detect whether there are any idle workers. This test is integrated with the garbage-collection test that has to be done anyway: by setting the garbage-collection flag, an idle worker will stop the busy workers. The busy workers must then determine whether actual garbage collection was necessary or an idle worker was requesting tasks. This scheme induces very low overheads as long as all workers are busy. In the scheduler that hands tasks directly to idle workers, a worker must determine whether there is an idle worker each time a task is created. This is much more frequent than the test for garbage collection, and it does induce a noticeable overhead.

3 Evaluation

We used a Sun SparcCenter with 8 processors running the Solaris 2.4 operating system. Each experiment was done while no one else used the machine. Up to fourteen workers were allocated, in spite of the fact that only eight processors were available.
This is done in order to simulate what will happen on a loaded system. A scheduler that performs well on a lightly loaded machine can perform very badly on a loaded machine. For each scheduler and for each benchmark, we measured the total execution time for different numbers of workers. Each experiment was run forty times, and the shortest time was taken for the evaluation.

In addition to the execution time, statistics were gathered from an execution log. A specially compiled version of the system logs the time each worker becomes idle and the time it resumes execution. This information is then used to compute the total busy time, the total idle time, and the number of global scheduling operations, from which the average time for and between scheduling operations can be estimated. All times are reported in milliseconds.

3.1 The Game of Life

The benchmark is an implementation of the "game of life" where each cell is implemented as an AKL process. Each process has to communicate with all of its neighbors to determine its next state. This creates an abundance of tasks that can be executed in parallel.

[Figure 1: Execution time of the Life benchmark for each scheduler, against the number of workers.]
[Table 1: Statistics of the Life benchmark with 7 workers: execution time, total busy and idle time, average busy and idle time, and number of scheduling operations.]

Figure 1 shows the execution time with each scheduler according to the number of workers. Table 1 shows the statistics gathered from an experiment using seven workers. The execution time reported in Table 1 differs significantly from the one in Figure 1, since the overhead of generating the log file for this benchmark cannot be ignored.
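The bookkeeping behind these statistics — total busy and idle time derived from a per-worker log of idle/resume events — can be sketched as follows. The event encoding and names are assumptions for illustration only; the text specifies just that the times a worker becomes idle and resumes are logged, and that busy/idle totals and scheduling-operation counts are derived from them.

```c
#include <stddef.h>

/* One log record per state change: the elapsed time (ms) at which a
 * worker became idle or resumed execution.  A worker starts out busy
 * at t = 0, so a well-formed log alternates idle/resume. */
typedef enum { BECAME_IDLE, RESUMED } EventKind;

typedef struct {
    double    time;   /* elapsed time in milliseconds */
    EventKind kind;
} Event;

typedef struct {
    double total_busy;  /* ms spent executing tasks        */
    double total_idle;  /* ms spent waiting for a new task */
    int    n_sched;     /* number of global scheduling operations */
} Stats;

/* Accumulate busy/idle totals for one worker from its time-ordered
 * log; `end` is the total elapsed time of the run.  Each completed
 * idle/resume pair counts as one global scheduling operation. */
Stats worker_stats(const Event *log, size_t n, double end) {
    Stats s = {0.0, 0.0, 0};
    double t = 0.0;   /* time of the previous state change */
    int busy = 1;     /* a worker starts out busy          */
    for (size_t i = 0; i < n; i++) {
        if (busy && log[i].kind == BECAME_IDLE) {
            s.total_busy += log[i].time - t;
            t = log[i].time;
            busy = 0;
        } else if (!busy && log[i].kind == RESUMED) {
            s.total_idle += log[i].time - t;
            t = log[i].time;
            busy = 1;
            s.n_sched++;  /* the worker found a new task */
        }
    }
    if (busy) s.total_busy += end - t;
    else      s.total_idle += end - t;
    return s;
}
```

The averages reported in the tables would then simply be these totals divided by the number of scheduling operations.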

Up to eight workers, the differences between the schedulers are not very large. There is an initial overhead paid by two of the schedulers when one worker is used, but this is regained when more workers are used. One scheduler does not perform as well as the others when between four and eight workers are used. One reason for this is found in Table 1: that scheduler performs more than three times as many scheduling operations as the others. This can be explained by the fact that the two pool-based schedulers can use the global pool to move several tasks in a single scheduling operation.

The most interesting observation is the dramatic decrease in performance of the voluntary scheduler when more than eight workers are used. In this scheduler, the idle workers must wait for a busy worker to distribute tasks. If a busy worker is not scheduled by the operating system, it cannot distribute its tasks, and the idle workers must spend their time-slot waiting. The other busy-driven scheduler does not suffer as dramatically from this effect, probably because a busy worker will detect idle workers more quickly.

3.2 Towers of Hanoi

The benchmarks are two implementations of the "Towers of Hanoi" puzzle. The first one only generates the list of all plate movements of a solution. As can be seen in Figure 2, this benchmark does not show any differences between the schedulers, apart from a small overhead for one of them.

[Figure 2: Execution time of the Hanoi benchmark, solution only.]
[Figure 3: Execution time of the Hanoi benchmark, with counting.]
[Figure 4: Number of busy workers for the Hanoi benchmark (3 workers), against elapsed time in ms.]

In the other benchmark, a procedure that traverses the list and counts the number of moves was added. This procedure can run in parallel with the first part, since the list of solutions is produced incrementally, but it can in itself not be parallelized.
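The effect of such an unparallelizable stage can be made concrete with a simple lower bound: with p workers, a program containing `total` ms of work of which `seq` ms must run sequentially can never finish faster than max(total/p, seq). This is a generic illustration, not part of the paper; the only figure taken from the text is the 900 ms list-traversal time, and the 2700 ms total used below is hypothetical.

```c
/* Lower bound (ms) on execution time with p workers for a program
 * with `total` ms of work of which `seq` ms cannot be parallelized. */
double exec_time_bound(double total, double seq, int p) {
    double parallel = total / p;          /* perfect division of work */
    return parallel > seq ? parallel : seq;
}
```

With a hypothetical 2700 ms of total work and the 900 ms sequential traversal, three workers already hit the 900 ms floor, and adding more workers cannot help.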
The counting of the list sets an upper bound on the obtainable speedup: no strategy can execute the program in less than 900 ms, which is the time needed to traverse the list. Two of the schedulers now perform significantly better than the others when four workers are used. There is also a difference when more than eight workers are used, but then in favor of another scheduler.

Figure 4 shows the number of busy workers in each 50 ms interval during an execution using three workers. As can be seen, two of the schedulers terminate the parallel part much earlier but then spend more time completing the execution, while the others have scheduled the counting procedure at an early phase and traversed the list at almost the same speed as it was produced. To achieve good parallel performance for this benchmark, the counting procedure would have to be scheduled with a higher priority, but there is no way for the programmer to annotate an AKL program. This is the drawback of a completely implicit parallel system.

3.3 Matrix Multiplication

[Figure 5: Execution time of the Matrix benchmark.]

The benchmark is a multiplication of a matrix and a vector. A worker will start working on the first row of the matrix and create a continuation task for the remaining rows. This task has to be assigned to another worker, who in turn will, after starting on the second row, create a new continuation task, and so on. There is thus at most one task available at any time, and this creates some strange behavior. Figure 5 shows that two of the schedulers have very good parallel performance; they are not disturbed even when more than eight workers are used. The other two schedulers perform very poorly. The statistics gathered from the executions do not fully explain why, but it is clear that the single available task does not propagate quickly enough; both of these schedulers perform very few scheduling operations.

3.4 Smith-Waterman

[Figure 6: Execution time of the Smith-Waterman benchmark: Penny compared to gcc at different optimization levels.]

The Smith-Waterman algorithm is used when DNA sequences are compared. A typical application is to find the sequence in a database that best matches a given sequence. This is an obviously parallel application, since all comparisons are independent. It is more challenging to parallelize the Smith-Waterman algorithm itself. In the Penny system, this is done automatically, without any changes to the original AKL implementation of the algorithm.
In order to be competitive with a C program, an extra builtin was added to the Penny system. The builtin performs the most primitive arithmetic operation in the algorithm and increases the overall performance by a factor of three. In the development of the Penny system, very little effort has been spent on compiling arithmetic operations.

Two sequences with 600 elements each were compared. The experiments were executed on a SparcCenter with twenty processors, and the minimum execution time over 100 runs is reported. The C program was allowed to run the computation ten times, to avoid the initial cost of filling the caches and initializing memory blocks; this cost is not included in the reported Penny times either. Figure 6 shows the performance of the Penny system compared to an optimized C program compiled with gcc at different optimization levels. The Penny system with two processors outperforms the plain gcc-compiled program. With three processors, it runs almost as fast as the C program compiled with the -O4 option.

The results are very encouraging. Although a high-level language such as AKL will have a hard time competing with a C compiler, they show that an implicit parallel system can match the sequential C program with as little as two processors.

4 Related Work

4.1 KLIC

One of the best concurrent logic programming systems is the KLIC compiler [4]. It compiles KL1 programs into C and produces very fast code. The KL1 language is almost a complete subset of the AKL language, and the benchmarks presented in this paper use almost the same program constructs as KL1. For these benchmarks, the main difference between KLIC and Penny is that KLIC is an explicit parallel system: the programmer has to annotate the code to make it run in parallel, and the gained performance depends on the skill of the programmer.

To compare the two systems, a "life" benchmark of KLIC was selected. The KL1 version is a reduced game of life, where each cell has only four neighbors. The grid is divided into clusters, which are then distributed over the available processors. We ran the experiment with a 30x40 grid divided into twelve clusters of 10x10 cells each. The division allows an even distribution of clusters over the available processors. The timings are the best of ten consecutive runs, thus avoiding the extra time it takes to boot the system. In the Penny version, no annotations are necessary to parallelize the program. The program is also simplified, since there is no need to divide it into clusters; all cells are treated equally.

[Table 2: The game of life, times in milliseconds, per number of processors: KLIC, Penny, and the ratio Penny/KLIC.]

The figures in Table 2 show that, for this benchmark, the Penny system is only half as slow as the KLIC system, while the normal factor between the KLIC system and Penny is around four to six (sometimes up to ten).
The main reasons why the life benchmark shows such good relative performance are the following. First, although the KLIC system is much faster at decoding instructions, since it does not have the overhead of an emulator, a large part of the execution time is spent in other parts of the system; instruction decoding is not that important for this benchmark. Second, in the KLIC system, binding shared variables (shared between nodes) is considerably more expensive than binding non-shared variables, and in the life benchmark about 10% of the communication (through variables) is performed with shared variables. In the Penny system, all variables are potentially shared, and much effort has been spent on making the binding operation as efficient as possible.

4.2 Auto-scheduler

The combination of a global task pool and local task stacks is similar to the idea of the distributed task queue in an auto-scheduling [3] environment. In auto-scheduling, the compiler and explicit annotations in the program help determine the destination of a task. In the Penny system, all tasks are first pushed on the local stack and moved by the global scheduler only when needed; no explicit indication of the destination processor is needed in the program. Since the scheduler is configurable depending on the program, instead of fixed in the system, it can be more efficient and flexible.

5 Summary

In the development of the schedulers, the statistics gathered from the system during executions have been very important. They have explained many, but not all, strange behaviors. Only measuring the execution time or the number of scheduling events is not enough; one needs a trace of the execution in order to understand the behavior. Gathering the statistics must be performed with a minimum of interference, since the execution will otherwise behave differently.

The best all-round scheduler did not always outperform the other schedulers, but it had a predictable behaviour.
The problems with the schedulers do not show up when highly parallel benchmarks are executed. They emerge when there is little parallelism, when the obtained speedup depends on the distribution of a single task, or when the machine has a high load. A scheduler that performs well on an unloaded machine can break down when executed on a machine with high load. This is often neglected, since it is more convenient to run benchmarks on an unloaded system.

Acknowledgements

We thank the people at SICS for discussions, especially Prof. Seif Haridi and Dr. Sverker Janson.

References

[1] Sverker Janson and Seif Haridi, "Programming Paradigms of the Andorra Kernel Language," in Logic Programming: Proc. of the 1991 Int'l Logic Programming Symposium, MIT Press.

[2] Johan Montelius and Khayri Ali, "An and/or-parallel implementation of AKL," New Generation Computing, 13(4).

[3] Jose E. Moreira and Constantine D. Polychronopoulos, "Autoscheduling in a Distributed Shared-Memory Environment," in Languages and Compilers for Parallel Computing, 7th Int'l Workshop Proc., LNCS 892, Springer-Verlag.

[4] KLIC. klic-requests@icot.or.jp, ICOT.


More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS Performance Silberschatz, Galvin and

More information

Chapter 12: I/O Systems. Operating System Concepts Essentials 8 th Edition

Chapter 12: I/O Systems. Operating System Concepts Essentials 8 th Edition Chapter 12: I/O Systems Silberschatz, Galvin and Gagne 2011 Chapter 12: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS

More information

task object task queue

task object task queue Optimizations for Parallel Computing Using Data Access Information Martin C. Rinard Department of Computer Science University of California, Santa Barbara Santa Barbara, California 9316 martin@cs.ucsb.edu

More information

Processes and Threads. Processes: Review

Processes and Threads. Processes: Review Processes and Threads Processes and their scheduling Threads and scheduling Multiprocessor scheduling Distributed Scheduling/migration Lecture 3, page 1 Processes: Review Multiprogramming versus multiprocessing

More information

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition Chapter 6: CPU Scheduling Silberschatz, Galvin and Gagne 2013 Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Real-Time

More information

(Preliminary Version 2 ) Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University. College Station, TX

(Preliminary Version 2 ) Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University. College Station, TX Towards an Adaptive Distributed Shared Memory (Preliminary Version ) Jai-Hoon Kim Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3 E-mail: fjhkim,vaidyag@cs.tamu.edu

More information

Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras

Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras Week 05 Lecture 18 CPU Scheduling Hello. In this lecture, we

More information

Threads. Raju Pandey Department of Computer Sciences University of California, Davis Spring 2011

Threads. Raju Pandey Department of Computer Sciences University of California, Davis Spring 2011 Threads Raju Pandey Department of Computer Sciences University of California, Davis Spring 2011 Threads Effectiveness of parallel computing depends on the performance of the primitives used to express

More information

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli Eect of fan-out on the Performance of a Single-message cancellation scheme Atul Prakash (Contact Author) Gwo-baw Wu Seema Jetli Department of Electrical Engineering and Computer Science University of Michigan,

More information

Following are a few basic questions that cover the essentials of OS:

Following are a few basic questions that cover the essentials of OS: Operating Systems Following are a few basic questions that cover the essentials of OS: 1. Explain the concept of Reentrancy. It is a useful, memory-saving technique for multiprogrammed timesharing systems.

More information

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL Jun Sun, Yasushi Shinjo and Kozo Itano Institute of Information Sciences and Electronics University of Tsukuba Tsukuba,

More information

Hazard Pointers. Number of threads unbounded time to check hazard pointers also unbounded! difficult dynamic bookkeeping! thread B - hp1 - hp2

Hazard Pointers. Number of threads unbounded time to check hazard pointers also unbounded! difficult dynamic bookkeeping! thread B - hp1 - hp2 Hazard Pointers Store pointers of memory references about to be accessed by a thread Memory allocation checks all hazard pointers to avoid the ABA problem thread A - hp1 - hp2 thread B - hp1 - hp2 thread

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

Request Network Reply Network CPU L1 Cache L2 Cache STU Directory Memory L1 cache size unlimited L1 write buer 8 lines L2 cache size unlimited L2 outs

Request Network Reply Network CPU L1 Cache L2 Cache STU Directory Memory L1 cache size unlimited L1 write buer 8 lines L2 cache size unlimited L2 outs Evaluation of Communication Mechanisms in Invalidate-based Shared Memory Multiprocessors Gregory T. Byrd and Michael J. Flynn Computer Systems Laboratory Stanford University, Stanford, CA Abstract. Producer-initiated

More information

Processes. CS 475, Spring 2018 Concurrent & Distributed Systems

Processes. CS 475, Spring 2018 Concurrent & Distributed Systems Processes CS 475, Spring 2018 Concurrent & Distributed Systems Review: Abstractions 2 Review: Concurrency & Parallelism 4 different things: T1 T2 T3 T4 Concurrency: (1 processor) Time T1 T2 T3 T4 T1 T1

More information

CPU Scheduling. The scheduling problem: When do we make decision? - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s)

CPU Scheduling. The scheduling problem: When do we make decision? - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s) 1/32 CPU Scheduling The scheduling problem: - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s) When do we make decision? 2/32 CPU Scheduling Scheduling decisions may take

More information

CPU Scheduling. CSE 2431: Introduction to Operating Systems Reading: Chapter 6, [OSC] (except Sections )

CPU Scheduling. CSE 2431: Introduction to Operating Systems Reading: Chapter 6, [OSC] (except Sections ) CPU Scheduling CSE 2431: Introduction to Operating Systems Reading: Chapter 6, [OSC] (except Sections 6.7.2 6.8) 1 Contents Why Scheduling? Basic Concepts of Scheduling Scheduling Criteria A Basic Scheduling

More information

Scalable Algorithmic Techniques Decompositions & Mapping. Alexandre David

Scalable Algorithmic Techniques Decompositions & Mapping. Alexandre David Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05 adavid@cs.aau.dk Introduction Focus on data parallelism, scale with size. Task parallelism limited. Notion of scalability

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Seventh Edition By William Stallings Objectives of Chapter To provide a grand tour of the major computer system components:

More information

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic) I/O Systems Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) I/O Systems 1393/9/15 1 / 57 Motivation Amir H. Payberah (Tehran

More information

Chapter 5 Concurrency: Mutual Exclusion and Synchronization

Chapter 5 Concurrency: Mutual Exclusion and Synchronization Operating Systems: Internals and Design Principles Chapter 5 Concurrency: Mutual Exclusion and Synchronization Seventh Edition By William Stallings Designing correct routines for controlling concurrent

More information

1 Introduction. 2 total-store-order. Take me for a spin. 2.1 Peterson's algorithm. Johan Montelius HT2016

1 Introduction. 2 total-store-order. Take me for a spin. 2.1 Peterson's algorithm. Johan Montelius HT2016 Take me for a spin Johan Montelius HT2016 1 Introduction We will rst experience that we can not implement any synchronization primitives using regular read and write operations. Then we will implement

More information

q ii (t) =;X q ij (t) where p ij (t 1 t 2 ) is the probability thatwhen the model is in the state i in the moment t 1 the transition occurs to the sta

q ii (t) =;X q ij (t) where p ij (t 1 t 2 ) is the probability thatwhen the model is in the state i in the moment t 1 the transition occurs to the sta DISTRIBUTED GENERATION OF MARKOV CHAINS INFINITESIMAL GENERATORS WITH THE USE OF THE LOW LEVEL NETWORK INTERFACE BYLINA Jaros law, (PL), BYLINA Beata, (PL) Abstract. In this paper a distributed algorithm

More information

Job Re-Packing for Enhancing the Performance of Gang Scheduling

Job Re-Packing for Enhancing the Performance of Gang Scheduling Job Re-Packing for Enhancing the Performance of Gang Scheduling B. B. Zhou 1, R. P. Brent 2, C. W. Johnson 3, and D. Walsh 3 1 Computer Sciences Laboratory, Australian National University, Canberra, ACT

More information

Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool

Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool Håkan Sundell School of Business and Informatics University of Borås, 50 90 Borås E-mail: Hakan.Sundell@hb.se Philippas

More information

residual residual program final result

residual residual program final result C-Mix: Making Easily Maintainable C-Programs run FAST The C-Mix Group, DIKU, University of Copenhagen Abstract C-Mix is a tool based on state-of-the-art technology that solves the dilemma of whether to

More information

Concurrent Preliminaries

Concurrent Preliminaries Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures

More information

Computer Architecture Lecture 24: Memory Scheduling

Computer Architecture Lecture 24: Memory Scheduling 18-447 Computer Architecture Lecture 24: Memory Scheduling Prof. Onur Mutlu Presented by Justin Meza Carnegie Mellon University Spring 2014, 3/31/2014 Last Two Lectures Main Memory Organization and DRAM

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2018 Lecture 2 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 2 What is an Operating System? What is

More information

Chapter 3: Process Concept

Chapter 3: Process Concept Chapter 3: Process Concept Chapter 3: Process Concept Process Concept Process Scheduling Operations on Processes Inter-Process Communication (IPC) Communication in Client-Server Systems Objectives 3.2

More information

Chapter 3: Process Concept

Chapter 3: Process Concept Chapter 3: Process Concept Chapter 3: Process Concept Process Concept Process Scheduling Operations on Processes Inter-Process Communication (IPC) Communication in Client-Server Systems Objectives 3.2

More information

The former pager tasks have been replaced in 7.9 by the special savepoint tasks.

The former pager tasks have been replaced in 7.9 by the special savepoint tasks. 1 2 3 4 With version 7.7 the I/O interface to the operating system has been reimplemented. As of version 7.7 different parameters than in version 7.6 are used. The improved I/O system has the following

More information

!! How is a thread different from a process? !! Why are threads useful? !! How can POSIX threads be useful?

!! How is a thread different from a process? !! Why are threads useful? !! How can POSIX threads be useful? Chapter 2: Threads: Questions CSCI [4 6]730 Operating Systems Threads!! How is a thread different from a process?!! Why are threads useful?!! How can OSIX threads be useful?!! What are user-level and kernel-level

More information

Multiprocessor Support

Multiprocessor Support CSC 256/456: Operating Systems Multiprocessor Support John Criswell University of Rochester 1 Outline Multiprocessor hardware Types of multi-processor workloads Operating system issues Where to run the

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Operating Systems: Internals and Design Principles You re gonna need a bigger boat. Steven

More information

instruction fetch memory interface signal unit priority manager instruction decode stack register sets address PC2 PC3 PC4 instructions extern signals

instruction fetch memory interface signal unit priority manager instruction decode stack register sets address PC2 PC3 PC4 instructions extern signals Performance Evaluations of a Multithreaded Java Microcontroller J. Kreuzinger, M. Pfeer A. Schulz, Th. Ungerer Institute for Computer Design and Fault Tolerance University of Karlsruhe, Germany U. Brinkschulte,

More information

Chapter 3: Process Concept

Chapter 3: Process Concept Chapter 3: Process Concept Silberschatz, Galvin and Gagne 2013! Chapter 3: Process Concept Process Concept" Process Scheduling" Operations on Processes" Inter-Process Communication (IPC)" Communication

More information

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1 Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently

More information

Chapter 3: Processes. Operating System Concepts 8 th Edition,

Chapter 3: Processes. Operating System Concepts 8 th Edition, Chapter 3: Processes, Silberschatz, Galvin and Gagne 2009 Chapter 3: Processes Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Silberschatz, Galvin and Gagne 2009

More information

Performance Modeling of a Parallel I/O System: An. Application Driven Approach y. Abstract

Performance Modeling of a Parallel I/O System: An. Application Driven Approach y. Abstract Performance Modeling of a Parallel I/O System: An Application Driven Approach y Evgenia Smirni Christopher L. Elford Daniel A. Reed Andrew A. Chien Abstract The broadening disparity between the performance

More information

QUESTION BANK UNIT I

QUESTION BANK UNIT I QUESTION BANK Subject Name: Operating Systems UNIT I 1) Differentiate between tightly coupled systems and loosely coupled systems. 2) Define OS 3) What are the differences between Batch OS and Multiprogramming?

More information

CS 326: Operating Systems. CPU Scheduling. Lecture 6

CS 326: Operating Systems. CPU Scheduling. Lecture 6 CS 326: Operating Systems CPU Scheduling Lecture 6 Today s Schedule Agenda? Context Switches and Interrupts Basic Scheduling Algorithms Scheduling with I/O Symmetric multiprocessing 2/7/18 CS 326: Operating

More information

Chapter 8: Virtual Memory. Operating System Concepts

Chapter 8: Virtual Memory. Operating System Concepts Chapter 8: Virtual Memory Silberschatz, Galvin and Gagne 2009 Chapter 8: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating

More information

CSL373: Lecture 6 CPU Scheduling

CSL373: Lecture 6 CPU Scheduling CSL373: Lecture 6 CPU Scheduling First come first served (FCFS or FIFO) Simplest scheduling algorithm cpu cpu 0 0 Run jobs in order that they arrive Disadvantage: wait time depends on arrival order. Unfair

More information

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced?

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced? Chapter 10: Virtual Memory Questions? CSCI [4 6] 730 Operating Systems Virtual Memory!! What is virtual memory and when is it useful?!! What is demand paging?!! When should pages in memory be replaced?!!

More information

Operating Systems Lecture #9: Concurrent Processes

Operating Systems Lecture #9: Concurrent Processes : Written by based on the lecture series of Dr. Dayou Li and the book Understanding 4th ed. by I.M.Flynn and A.McIver McHoes (2006) Department of Computer Science and Technology,., 2013 15th April 2013

More information

Virtual Memory Outline

Virtual Memory Outline Virtual Memory Outline Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating Kernel Memory Other Considerations Operating-System Examples

More information

Last Class: Demand Paged Virtual Memory

Last Class: Demand Paged Virtual Memory Last Class: Demand Paged Virtual Memory Benefits of demand paging: Virtual address space can be larger than physical address space. Processes can run without being fully loaded into memory. Processes start

More information

CSE 410 Final Exam 6/09/09. Suppose we have a memory and a direct-mapped cache with the following characteristics.

CSE 410 Final Exam 6/09/09. Suppose we have a memory and a direct-mapped cache with the following characteristics. Question 1. (10 points) (Caches) Suppose we have a memory and a direct-mapped cache with the following characteristics. Memory is byte addressable Memory addresses are 16 bits (i.e., the total memory size

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 30 August 2018 Outline for Today Threaded programming

More information

minute xed time-out. In other words, the simulations indicate that battery life is extended by more than 17% when the share algorithm is used instead

minute xed time-out. In other words, the simulations indicate that battery life is extended by more than 17% when the share algorithm is used instead A Dynamic Disk Spin-down Technique for Mobile Computing David P. Helmbold, Darrell D. E. Long and Bruce Sherrod y Department of Computer Science University of California, Santa Cruz Abstract We address

More information

An Ecient Scheduling Algorithm for Multiprogramming on Parallel Computing Systems

An Ecient Scheduling Algorithm for Multiprogramming on Parallel Computing Systems An Ecient Scheduling Algorithm for Multiprogramming on Parallel Computing Systems Zhou B. B., Brent R. P. and Qu X. Computer Sciences Laboratory The Australian National University Canberra, ACT 0200, Australia

More information

Software-Controlled Multithreading Using Informing Memory Operations

Software-Controlled Multithreading Using Informing Memory Operations Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University

More information

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University Frequently asked questions from the previous class survey CS 370: SYSTEM ARCHITECTURE & SOFTWARE [CPU SCHEDULING] Shrideep Pallickara Computer Science Colorado State University OpenMP compiler directives

More information

Chapter 3: Processes

Chapter 3: Processes Chapter 3: Processes Silberschatz, Galvin and Gagne 2013 Chapter 3: Processes Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Silberschatz, Galvin and Gagne 2013

More information

2 TEST: A Tracer for Extracting Speculative Threads

2 TEST: A Tracer for Extracting Speculative Threads EE392C: Advanced Topics in Computer Architecture Lecture #11 Polymorphic Processors Stanford University Handout Date??? On-line Profiling Techniques Lecture #11: Tuesday, 6 May 2003 Lecturer: Shivnath

More information

CHAPTER 2: PROCESS MANAGEMENT

CHAPTER 2: PROCESS MANAGEMENT 1 CHAPTER 2: PROCESS MANAGEMENT Slides by: Ms. Shree Jaswal TOPICS TO BE COVERED Process description: Process, Process States, Process Control Block (PCB), Threads, Thread management. Process Scheduling:

More information

Relative Reduced Hops

Relative Reduced Hops GreedyDual-Size: A Cost-Aware WWW Proxy Caching Algorithm Pei Cao Sandy Irani y 1 Introduction As the World Wide Web has grown in popularity in recent years, the percentage of network trac due to HTTP

More information

Resource management. Real-Time Systems. Resource management. Resource management

Resource management. Real-Time Systems. Resource management. Resource management Real-Time Systems Specification Implementation Verification Mutual exclusion is a general problem that exists at several levels in a real-time system. Shared resources internal to the the run-time system:

More information

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication

More information

Algorithms Implementing Distributed Shared Memory. Michael Stumm and Songnian Zhou. University of Toronto. Toronto, Canada M5S 1A4

Algorithms Implementing Distributed Shared Memory. Michael Stumm and Songnian Zhou. University of Toronto. Toronto, Canada M5S 1A4 Algorithms Implementing Distributed Shared Memory Michael Stumm and Songnian Zhou University of Toronto Toronto, Canada M5S 1A4 Email: stumm@csri.toronto.edu Abstract A critical issue in the design of

More information

Advanced Topic: Efficient Synchronization

Advanced Topic: Efficient Synchronization Advanced Topic: Efficient Synchronization Multi-Object Programs What happens when we try to synchronize across multiple objects in a large program? Each object with its own lock, condition variables Is

More information

1. Background. 2. Demand Paging

1. Background. 2. Demand Paging COSC4740-01 Operating Systems Design, Fall 2001, Byunggu Yu Chapter 10 Virtual Memory 1. Background PROBLEM: The entire process must be loaded into the memory to execute limits the size of a process (it

More information

TABLES AND HASHING. Chapter 13

TABLES AND HASHING. Chapter 13 Data Structures Dr Ahmed Rafat Abas Computer Science Dept, Faculty of Computer and Information, Zagazig University arabas@zu.edu.eg http://www.arsaliem.faculty.zu.edu.eg/ TABLES AND HASHING Chapter 13

More information

Implementations of Dijkstra's Algorithm. Based on Multi-Level Buckets. November Abstract

Implementations of Dijkstra's Algorithm. Based on Multi-Level Buckets. November Abstract Implementations of Dijkstra's Algorithm Based on Multi-Level Buckets Andrew V. Goldberg NEC Research Institute 4 Independence Way Princeton, NJ 08540 avg@research.nj.nec.com Craig Silverstein Computer

More information

Lecture 1 Introduction (Chapter 1 of Textbook)

Lecture 1 Introduction (Chapter 1 of Textbook) Bilkent University Department of Computer Engineering CS342 Operating Systems Lecture 1 Introduction (Chapter 1 of Textbook) Dr. İbrahim Körpeoğlu http://www.cs.bilkent.edu.tr/~korpe 1 References The slides

More information

Concurrency: Deadlock and Starvation

Concurrency: Deadlock and Starvation Concurrency: Deadlock and Starvation Chapter 6 E&CE 354: Processes 1 Deadlock Deadlock = situation in which every process from a set is permanently blocked, i.e. cannot proceed with execution Common cause:

More information