Heuristic scheduling algorithms to access the critical section in Shared Memory Environment
Reda A. Ammar, Computer Science Department, University of Connecticut, Storrs, CT, USA. Ali I. El-Desouky, Computer & Control Department, Faculty of Engineering, Mansoura University, Egypt. Tahany A. Fergany, Engineering Mathematics Department, Faculty of Engineering, Cairo University, Egypt. Mohamed M. Hefeeda, Computer & Control Department, Faculty of Engineering, Mansoura University, Egypt.

Abstract

In a shared memory parallel processing environment, shared variables facilitate communication among processes. To protect shared variables from concurrent access by more than one process at a time, they are placed in a critical section. Scheduling a set of parallel processes to access this critical section so as to minimize the time spent executing these processes is a crucial problem in parallel processing. This paper presents heuristic scheduling algorithms to access this critical section.

1 Introduction

The increasing demand for faster computers has led to the availability of many parallel computers. It is hoped that currently impracticable, computationally intensive applications will become practicable through their execution on highly parallel computers. A number of factors hinder the growth of parallel computing. First, the substantial investment in sequential programming tools that aid in program testing, execution profiling, and interactive debugging. Second, the lack of a single, predominant parallel architecture. Third, the difficulty of developing efficient programs for parallel computers. This paper addresses one of the obstacles to producing efficient parallel programs: accessing the shared variables. In parallel programs, parallelism is gained through process creation.
One of the most common mechanisms proposed for the creation of processes is the FORK/JOIN mechanism [1, 9], where the FORK statement spawns several processes and the JOIN statement is used to synchronize the termination of processes. The portion of the program between the FORK and the JOIN is called the parallel structure. The semantics of the parallel structure require that exactly those processes created by the FORK operation terminate at the associated JOIN operation, and that no operations after the JOIN can start until all processes created by the FORK are completed. The cooperation of n processes to solve a problem is useful only if the partial results are efficiently exchanged between processes. Shared variables facilitate communication among the processes. But they must be protected from nondeterminism, which can result from concurrent access by more than one process at a time. In order to protect the shared variables from nondeterminism, the code that handles these variables is placed in a critical section [1, 6, 9]. The critical section is a section of code which can be executed by only one process at a time and which, once started, will be able to finish without interruption. Unfortunately, accessing the critical section by different processes creates a serial bottleneck that can seriously impair the performance of the software. Since shared memory multiprocessors are becoming more important in commercial environments, it becomes necessary to schedule shared memory access in the most efficient way. The scheduling problem [2-5, 7, 8, 10, 11] is complicated by the fact that each branch of the parallel structure resulting from the FORK operation includes the time to process the portion of the code before accessing the shared variables, the time to access the shared variables, and the time to process the portion of the code after using the shared variables, all of which may be different.
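The FORK/JOIN pattern and the lock/unlock protection of a shared variable can be illustrated with a small sketch. This is our own illustration, not code from the paper: it uses C++ threads for the FORK/JOIN and a std::mutex for the lock and unlock nodes; the function and variable names are ours.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// FORK n processes that each update a shared variable inside a
// critical section, then JOIN them. Returns the final shared value.
long run_fork_join(int n_processes, int increments_per_process) {
    long shared_sum = 0;   // the shared variable
    std::mutex cs;         // guards the critical section

    auto process_body = [&]() {
        for (int i = 0; i < increments_per_process; ++i) {
            cs.lock();     // lock node: enter the critical section
            ++shared_sum;  // access the shared variable
            cs.unlock();   // unlock node: release the critical section
        }
    };

    std::vector<std::thread> processes;            // FORK
    for (int i = 0; i < n_processes; ++i)
        processes.emplace_back(process_body);
    for (std::thread& t : processes) t.join();     // JOIN
    return shared_sum;
}
```

Without the lock/unlock pair the increments could interleave nondeterministically and updates could be lost; with them, every increment is serialized through the critical section.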
In order to make optimization possible, it is necessary to have an approach to quantify the time costs of parallel computations. The time cost of processes that require access to the critical section can then be minimized by using a suitable scheduling method. The computation structure model [9] is used to represent the detailed time cost of a parallel structure. This model assumes that the underlying computer system has a finite number of processors of the same speed which communicate with each other through a shared memory. In the computation structure model, lock nodes are used to obtain locks on shared data and unlock nodes are used for releasing these locks. These locks facilitate the protection of the shared variables.

[Fig. 1 Parallel structure model: a FORK spawns n branches; branch i consists of Bi, Locki, Si, Unlocki, and Ai; the branches meet at a JOIN.]

In Fig. 1 we have a parallel structure with n branches that are all in conflict, i.e. they all need to access the critical section simultaneously. In this parallel structure we can classify the operations into the following three categories:
1. The operation before accessing the critical section, defined as the Pre-Lock Job, PLJ.
2. The operation of accessing the critical section, which contains three sub-operations: the lock operation, which prevents other branches from accessing the critical section; the access to the shared variables; and the unlock operation, which frees the critical section for the other branches. This combined operation is defined as Lock and Access Shared Variables, LASV.
3. The operation after accessing the critical section, defined as the Remaining Job, RJ.
Algorithms were developed to schedule the access to the critical section [1, 6, 9]. A Branch and Bound algorithm [7, 8] was used to find the optimal order in which the conflicting processes access the critical section. Branch and Bound produces the optimal solution, but it may take a long time to find it, especially for a large number of processes (greater than 8). Therefore, other heuristic algorithms were suggested [2, 4] which can produce optimal or near-optimal solutions in a short time. These algorithms are called the comparison and adjustment algorithms. This paper first evaluates these algorithms by simulation and compares them. Second, it presents a new algorithm which gives better results.
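Given the three per-branch costs, the overall execution time of the structure for a particular access order can be computed by simulating the serialized critical section. The following is a minimal sketch of that computation; it is our own rendering of the model, with the Branch record as an assumption, while exec_time matches the name used later in the paper's simulation pseudo-code.

```cpp
#include <algorithm>
#include <vector>

// Time costs of one branch of the parallel structure (Fig. 1).
struct Branch { double plj, lasv, rj; };

// Overall execution time when branches access the critical section in
// the given order: every PLJ starts at the FORK (time 0), the LASVs
// are serialized by the lock, and the structure completes at the JOIN
// when the last RJ finishes.
double exec_time(const std::vector<Branch>& order) {
    double cs_free = 0.0;  // time at which the critical section becomes free
    double finish  = 0.0;  // completion time of the whole structure
    for (const Branch& b : order) {
        double enter = std::max(b.plj, cs_free);  // wait if the CS is busy
        cs_free = enter + b.lasv;                 // lock, access, unlock
        finish  = std::max(finish, cs_free + b.rj);
    }
    return finish;
}
```

For example, two branches with costs (3, 1, 1) and (1, 1, 3) finish at time 5 when the branch with the longer RJ goes first, but at time 8 in the opposite order; this is exactly the kind of difference the scheduling algorithms exploit.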
2 Previous Research Efforts

Previously [2, 4, 7, 8], algorithms were developed for accessing the critical section based on the time cost of the operations before the lock nodes, the time cost of the operations between the lock and unlock nodes, and the time cost of the operations after the unlock nodes. In the parallel structure in Fig. 1, assume that every two lock nodes are in conflict, and let:

Time cost of the Pre-Lock Job = PLJ i
Time cost of the Lock and Access Shared Variables = LASV i
Time cost of the Remaining Job = RJ i

In order to schedule the operations between the FORK/JOIN nodes (that is, the PLJs, the LASVs, and the RJs) we considered eight possible cases which may arise in the parallel structure. These cases are listed in Table 1 along with their scheduling algorithms. In Table 1, = indicates that all jobs have the same time cost, and <> indicates that at least one job has a time cost different from the others. The algorithms for cases I, II, III, IV, V, and VII were mathematically proved to give the optimal solutions [7]. For cases VI and VIII, the Branch and Bound algorithm was developed, which yields the minimum time cost for the parallel structure [7]. Although the Branch and Bound approach is a widely accepted technique [7], it is computationally expensive, especially when the problem size grows. Therefore heuristic algorithms were introduced which can produce optimal or near-optimal solutions.

Case   PLJ   LASV   RJ    Scheduling Algorithm
I      =     =      =     FCFS or LRJF
II     =     =      <>    LRJF
III    =     <>     =     FCFS or LRJF
IV     =     <>     <>    LRJF
V      <>    =      =     FCFS
VI     <>    =      <>    Branch and Bound
VII    <>    <>     =     FCFS
VIII   <>    <>     <>    Branch and Bound
FCFS: First Come First Served; LRJF: Longest Remaining Job First
Table 1 Scheduling Methods

2.1 Comparison Algorithm

A heuristic algorithm, i.e. one not mathematically proved, that finds optimal solutions in some cases and near-optimal solutions in the others. It is simple compared to the Branch and Bound algorithm and therefore takes less time. For the parallel structure in Fig. 1 with n conflicting branches, the comparison algorithm is applied as follows:
1. Use the Longest Remaining Job First, LRJF, scheduling policy to order the branches of the given parallel structure.
2. If for every i = 2, 3, ..., n, PLJ i-1 < PLJ i, then the branches already follow the First Come First Served, FCFS, policy. No additional movements are considered and the resulting order provides an optimal (or near-optimal) solution.
3. If for some i = 2, 3, ..., n, PLJ i-1 > PLJ i and PLJ i-1 - PLJ i < RJ i-1 - RJ i, reverse the order of branch i-1 and branch i.
4. Repeat step 3 until no more movements occur.

2.2 Adjustment Algorithm

The comparison algorithm is easy to apply, but another round of adjustments is needed to produce the optimal solution. The adjustment process is based upon the following two phases of movements:
1. Look for a branch that follows the current maximum branch and whose communication cost is smaller than the communication cost of a branch that precedes the current maximum branch. Swapping these two branches may reduce the execution time of the parallel structure.
2. Move the maximum branch, the branch whose execution time is the longest, to the front of the waiting queue. In this way it can access the critical section earlier and hence its execution time is reduced.
This adjustment process is iterative and continues until no more improvement is possible. The comparison algorithm is used to derive the initial solution for the adjustment algorithm.

3 The New Adjustment Algorithm

The adjustment algorithm produces optimal solutions in many cases and near-optimal solutions in the others. Yet we can add another round of enhancement, phase 3, which enhances the original adjustment algorithm and produces better results.
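Before turning to phase 3, the comparison algorithm of Sec. 2.1 can be sketched in code. This is our own C++ rendering under the timing model of Fig. 1; the Branch record and helper names are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Time costs of one branch (PLJ, LASV, RJ).
struct Branch { double plj, lasv, rj; };

// Comparison algorithm: order by Longest Remaining Job First, then
// repeatedly reverse adjacent branches i-1, i whenever
// PLJ[i-1] > PLJ[i] and PLJ[i-1] - PLJ[i] < RJ[i-1] - RJ[i].
std::vector<Branch> comparison_order(std::vector<Branch> b) {
    // Step 1: LRJF ordering (largest RJ first).
    std::sort(b.begin(), b.end(),
              [](const Branch& x, const Branch& y) { return x.rj > y.rj; });
    // Steps 3-4: local swaps until no more movements occur.
    bool moved = true;
    while (moved) {
        moved = false;
        for (std::size_t i = 1; i < b.size(); ++i) {
            bool plj_out_of_order = b[i - 1].plj > b[i].plj;
            bool small_plj_gap =
                (b[i - 1].plj - b[i].plj) < (b[i - 1].rj - b[i].rj);
            if (plj_out_of_order && small_plj_gap) {
                std::swap(b[i - 1], b[i]);
                moved = true;
            }
        }
    }
    return b;
}
```

Each accepted swap moves a branch with a larger PLJ later in the queue, so the number of PLJ inversions strictly decreases and the loop terminates.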
Phase 3 states that: moving the longest waiting branch, the branch that finishes its PLJ operation and waits the longest time to access the critical section, to the front of the waiting queue of the critical section may reduce the overall execution time. Thus, the new algorithm consists of the following three steps:
1. Look for a branch that follows the current maximum branch and whose communication cost is smaller than the communication cost of a branch that precedes the current maximum branch. Swapping these two branches may reduce the execution time of the parallel structure.
2. Move the maximum branch, the branch whose execution time is the longest, to the front of the waiting queue. In this way it can access the critical section earlier and hence its execution time is reduced.
3. Move the longest waiting branch, the branch that finishes its PLJ operation and waits the longest time to access the critical section, to the front of the waiting queue of the critical section. Thus, it can access the critical section earlier, which reduces its execution time and the overall execution time.
Simulation results (see Section 4) showed that applying the new algorithm with the order phase 1, phase 2, and finally phase 3 gave better results than the original algorithm. Moreover, when we changed the order of the phases to phase 2, phase 1, and finally phase 3, the algorithm gave much better results. Other combinations of the three phases gave worse results than the original algorithm. We tried the following combinations: (phase 2, phase 3, phase 1); (phase 2, phase 3, phase 1, phase 3); (phase 1, phase 3, phase 2, phase 3); (phase 1, phase 2, phase 3, phase 2, phase 3); all of them gave worse results.

3.1 Example

This example describes the application of the new algorithm to a parallel structure consisting of five branches; each branch has three time costs, PLJ, LASV, and RJ.
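Before the worked example, phase 2 (moving the maximum branch toward the front while it improves the overall time) can be sketched in code. This is our own interpretation of the phase, under our own names (Branch, exec_time, max_branch); a swap is kept only when it lowers the overall execution time.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Branch { double plj, lasv, rj; };

// Overall execution time for a given access order (model of Fig. 1).
double exec_time(const std::vector<Branch>& order) {
    double cs_free = 0.0, finish = 0.0;
    for (const Branch& b : order) {
        double enter = std::max(b.plj, cs_free);
        cs_free = enter + b.lasv;
        finish  = std::max(finish, cs_free + b.rj);
    }
    return finish;
}

// Index of the maximum branch: the branch whose path completes last
// (ties broken toward the back-most branch).
std::size_t max_branch(const std::vector<Branch>& order) {
    double cs_free = 0.0, worst = -1.0;
    std::size_t k = 0;
    for (std::size_t i = 0; i < order.size(); ++i) {
        double enter = std::max(order[i].plj, cs_free);
        cs_free = enter + order[i].lasv;
        double done = cs_free + order[i].rj;
        if (done >= worst) { worst = done; k = i; }
    }
    return k;
}

// Phase 2: try to move the maximum branch toward the front of the
// waiting queue, keeping only swaps that reduce the execution time.
std::vector<Branch> phase2(std::vector<Branch> b) {
    bool improved = true;
    while (improved) {
        improved = false;
        std::size_t k = max_branch(b);
        if (k == 0) break;                       // already at the front
        double current = exec_time(b);
        for (std::size_t i = 1; i <= k; ++i) {   // displacement i = 1, 2, ...
            std::vector<Branch> trial = b;
            std::swap(trial[k], trial[k - i]);   // swap branch k with branch k-i
            if (exec_time(trial) < current) {
                b = trial;
                improved = true;
                break;
            }
        }
    }
    return b;
}
```

The process terminates because every accepted swap strictly decreases the overall execution time and there are finitely many orders.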
The comparison algorithm is used to derive the initial solution. The following figure shows the application of the new algorithm.

[Figure: application of the new algorithm to the five-branch example. The initial solution lists the PLJ, LASV, and RJ values and the total time cost of each branch, with the maximum branch marked. Applying the phases swaps branch 4 with branch 3, with branch 2, and with branch 1 in turn; for phase 3, the waiting time of each branch and the longest waiting branch are shown.]
Note that in the above figure, useless steps are omitted. The new adjustment algorithm can be written in steps as follows:
1. Find the branch k of the parallel structure, after applying the comparison algorithm, whose path has the longest execution time.
2. If k = 1, then the current parallel structure has the minimum possible execution time.
3. If the execution time of the parallel structure equals the sum of the execution times of PLJ k, LASV k, and RJ k, then the scheduling order is optimal and no additional improvement is possible.
4. Apply Phase 2 as follows: a) Initialize a displacement variable i to 1. b) Swap branch k with branch k-i and evaluate the new execution times. c) If the new order has a larger overall execution time, keep the previous order, increment i, and go to step 4.b. d) Evaluate the longest path of the parallel structure with the new order. If more than one branch has the same maximum value, use the back-most one. Assume that the new maximum branch is j. e) If j = 1, or the execution time of the current parallel structure equals the sum of the execution times of PLJ j, LASV j, and RJ j, then the scheduling order is optimal and no additional improvement is possible. Otherwise, apply Phase 2 again until no more improvement is achieved.
5. Apply Phase 1 as follows: a) Set two pointers, i (the front index) and j (the back index). The front index changes from 1 to k-1 and the back index changes from k+1 to n. For every value of j, change i from 1 to k-1. If LASV i > LASV j, then swap the two branches, evaluate the execution times of the different branches, and test whether the new order is better than the previous one. b) If the new order has a larger overall execution time, keep the previous order and try another swap. c) Evaluate the longest path of the parallel structure with the new order.
Assume that the new maximum branch is branch k. d) If k = 1, or the execution time of the current parallel structure equals the sum of the execution times of PLJ k, LASV k, and RJ k, then the scheduling order is optimal and no additional improvement is possible. Otherwise, apply Phase 1 again until no more improvement is achieved.
6. Apply Phase 3 as follows: a) Find the branch w that has the maximum waiting time. The waiting time of a branch x is evaluated by subtracting the time cost of PLJ x from the time needed for the previous branch x-1 to finish the critical section. b) Initialize a displacement variable i to 1. c) Swap branch w with branch w-i and evaluate the new execution time. d) If the new order has a larger overall execution time, restore the previous order, increment i, and go to step 6.c. e) Evaluate the branch with the maximum waiting time in the new parallel structure. Assume that the new branch is branch j. f) If j = 1, or the execution time of the current parallel structure equals the sum of the execution times of PLJ j, LASV j, and RJ j, then the scheduling order is optimal and no additional improvement is possible. Otherwise, apply Phase 3 again until no more improvement is achieved.

4 Simulation Results

This section first shows the effect of scheduling the critical section on the execution of parallel programs. Second, it evaluates the scheduling algorithms and compares them.

4.1 Effect of Scheduling

To show the benefits of scheduling the access to the critical section, we developed a C++ simulation program. The program generates different numbers of branches, from 3 to 8. For each number of branches, the program generates 500 sets of random values for PLJ, LASV, and RJ. Then, for each set, it evaluates the execution time. Also, it finds the optimal order for the branches to access the critical section; this is done by trying all possible permutations, whose number equals the factorial of the number of branches.
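This exhaustive search can be sketched as follows; find_opt matches the name used in the simulation pseudo-code, while the Branch record and the timing computation are our own rendering of the model in Fig. 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

struct Branch { double plj, lasv, rj; };

// Minimum overall execution time over all access orders, found by
// trying every permutation of the branches (n! orders).
double find_opt(const std::vector<Branch>& b) {
    std::vector<std::size_t> idx(b.size());
    for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;
    double best = std::numeric_limits<double>::max();
    do {
        double cs_free = 0.0, finish = 0.0;
        for (std::size_t i : idx) {              // serialize the LASVs
            double enter = std::max(b[i].plj, cs_free);
            cs_free = enter + b[i].lasv;
            finish  = std::max(finish, cs_free + b[i].rj);
        }
        best = std::min(best, finish);
    } while (std::next_permutation(idx.begin(), idx.end()));
    return best;
}
```

This is only feasible for small n; the paper stops at 8 branches, i.e. 40320 permutations per set.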
Then, it evaluates the optimal execution time. Eventually, it aggregates and averages the execution time and the optimal execution time over the 500 sets. The following pseudo-code describes the structure of the main body of the program.

for (branches = 3; branches <= 8; branches++) {
    total_exec_t = 0;
    total_opt_t = 0;
    for (sets = 1; sets <= 500; sets++) {
        gen_rand();              /* generate random values for PLJ, LASV, and RJ */
        exec_t = exec_time();    /* evaluate the execution time */
        total_exec_t += exec_t;
        opt_exec_t = find_opt(); /* evaluate the optimal execution time */
        total_opt_t += opt_exec_t;
    }
    average_exec_t = total_exec_t / 500;
    average_opt_t = total_opt_t / 500;
    diff = average_exec_t - average_opt_t;
}

The results produced by the program show the importance of scheduling the access to the critical section. Table 2 and Fig. 2 emphasize this fact.

[Table 2 Benefits of scheduling. Columns: number of branches; average execution time with no scheduling; average execution time with optimal scheduling; time diff. %.]

[Fig. 2 Average execution time without scheduling and with optimal scheduling.]

4.2 Algorithms Evaluation

To assess the scheduling algorithms (comparison, adjustment, and new adjustment), we developed a C++ simulation program. For an accurate evaluation, the program chooses different ranges from which the values of PLJ, LASV, and RJ are selected randomly. It starts with a LASV range that is double the range of PLJ and RJ and narrows it until the LASV range reaches only 1% of the PLJ and RJ ranges; the last case is the one likely to appear in practice. For each range, it generates different numbers of branches, from 3 to 8. For each number of branches, it generates 500 sets of random values for PLJ, LASV, and RJ. Then, for each set, it orders the branches according to the scheduling algorithm (Comparison, Adjustment, or New Adjustment) and evaluates the execution time. Then, it finds the optimal execution time by exhaustive search, i.e. trying all possible permutations (the factorial of the number of branches), to compare with. If the optimal time is not equal to the time obtained after applying the algorithm, the program counts this case as a not-optimal one and evaluates the time difference between the optimal and not-optimal times. It then aggregates the time differences from the not-optimal cases out of the overall 500 cases. After that, the program evaluates the percentage of the total time difference relative to the total optimal time. The following pseudo-code describes the structure of the main body of the program.

for (different ranges) {
    for (branches = 3; branches <= 8; branches++) {
        total_exec_t = 0;
        total_opt_t = 0;
        not_opt = 0;
        for (sets = 1; sets <= 500; sets++) {
            gen_rand();              /* random values for PLJ, LASV, and RJ within the current range */
            sched_algorithm();       /* order the branches according to the algorithm */
            exec_t = exec_time();    /* evaluate the execution time */
            total_exec_t += exec_t;
            opt_exec_t = find_opt(); /* evaluate the optimal execution time */
            if (exec_t > opt_exec_t) not_opt++;
            total_opt_t += opt_exec_t;
        }
        t_diff = total_exec_t - total_opt_t;
        t_diff_percent = (t_diff / total_opt_t) * 100;
        print(branches, not_opt, t_diff_percent);
    }
}

The results are shown below, in the tables and the figures, with the following notation: Not. opt. is the number of not-optimal cases, out of 500, obtained after applying the algorithm; Time diff. % is the percentage of the total time difference relative to the total optimal time.
[Table 3 Results for time cost ranges 0 ≤ PLJ ≤ 100, 0 ≤ LASV ≤ 200, 0 ≤ RJ ≤ 100. Columns: no. of branches; Not. opt. and Time diff. % for each of the three algorithms (Comparison, Adjustment, New Adjustment).]

[Fig. 3 Comparing the scheduling algorithms, for time cost ranges 0 ≤ PLJ ≤ 100, 0 ≤ LASV ≤ 200, 0 ≤ RJ ≤ 100, w.r.t. (a) the number of not-optimal cases and (b) the time difference percentages.]

[Table 4 Results for time cost ranges 0 ≤ PLJ ≤ 200, 0 ≤ LASV ≤ 200, 0 ≤ RJ ≤ 200. Columns: no. of branches; Not. opt. and Time diff. % for each of the three algorithms.]
[Fig. 4 Comparing the scheduling algorithms, for time cost ranges 0 ≤ PLJ ≤ 200, 0 ≤ LASV ≤ 200, 0 ≤ RJ ≤ 200, w.r.t. (a) the number of not-optimal cases and (b) the time difference percentages.]

[Table 5 Results for time cost ranges 0 ≤ PLJ ≤ 1000, 0 ≤ LASV ≤ 200, 0 ≤ RJ ≤ 1000. Columns: no. of branches; Not. opt. and Time diff. % for each of the three algorithms.]

[Fig. 5 Comparing the scheduling algorithms, for time cost ranges 0 ≤ PLJ ≤ 1000, 0 ≤ LASV ≤ 200, 0 ≤ RJ ≤ 1000, w.r.t. (a) the number of not-optimal cases and (b) the time difference percentages.]
[Table 6 Results for time cost ranges 0 ≤ PLJ ≤ 2000, 0 ≤ LASV ≤ 50, 0 ≤ RJ. Columns: no. of branches; Not. opt. and Time diff. % for each of the three algorithms.]

5 Conclusion

This paper summarizes the previous work in the area of scheduling algorithms, the Comparison algorithm and the Adjustment algorithm [2, 4], to order processes that are competing to access shared variables. We added a new adjustment phase to the adjustment algorithm and found the best order in which to apply the three developed phases. Simulation results show the merits of the comparison algorithm and the adjustment algorithm. They also show that the new approach adds further improvement to them. Although the new algorithm does not give the optimal solution in some cases, the error level is very minor. These results suggest including the developed algorithms in the design of parallel compilers.

6 References

[1] Abraham Silberschatz and Peter B. Galvin, Operating System Concepts, Fourth edition, Addison-Wesley Inc.
[2] R.A. Ammar, T.A. Fergany, and E.A. Maksoud, A fast algorithm to find the optimal accessing order of a critical section by parallel processes within a Fork-Join structure.
[3] T. Casavant and J. Kuhl, A taxonomy of scheduling in general-purpose distributed computing systems, IEEE Trans. Software Engineering, SE-14, 2, February.
[4] Ehab Yehia A. Maksoud, Optimal scheduling methods for competing processes within a parallel structure, Ph.D. dissertation, Faculty of Engineering, Cairo Univ.
[5] H. El-Rewini and T.G. Lewis, Scheduling parallel program tasks onto arbitrary target machines, J. Par. & Distr. Computing 9 (1990).
[6] Kai Hwang and Faye A. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill.
[7] Mohamad R. Neilforoshan-Daradshti, On time cost optimization of parallel structure within shared memory environment, Ph.D. dissertation, Computer Science & Engineering Dept., Univ. of Connecticut.
[8] Mohamad R. Neilforoshan-Daradshti, R. Ammar, and T.A. Fergany, Optimizing the time cost of parallel structure by scheduling parallel processes to access the critical section, Proceedings of the Fourth International Conference on Computing and Information (ICCI 92), Toronto, Canada, May 28-30, 1992.
[9] B. Qin, H.A. Sholl, and R.A. Ammar, Micro time cost analysis of parallel computations, IEEE Trans. on Computers, vol. 40, no. 5, May 1991.
[10] V. Sarkar and J. Hennessy, Compile-time partitioning and scheduling of parallel programs, in Proc. of Symp. on Compiler Construction, 1986.
[11] P.L. Shaffer, Minimization of inter-processor synchronization in multiprocessors with shared and private memory, Proceedings of the International Conference on Parallel Processing, St. Charles, IL, August 8-12, 1989, vol. III.
A Novel Task Scheduling Algorithm for Heterogeneous Computing Vinay Kumar C. P.Katti P. C. Saxena SC&SS SC&SS SC&SS Jawaharlal Nehru University Jawaharlal Nehru University Jawaharlal Nehru University New
More informationMidterm Exam Amy Murphy 19 March 2003
University of Rochester Midterm Exam Amy Murphy 19 March 2003 Computer Systems (CSC2/456) Read before beginning: Please write clearly. Illegible answers cannot be graded. Be sure to identify all of your
More informationA Survey of Concurrency Control Algorithms in the Operating Systems
A Survey of Concurrency Control Algorithms in the Operating Systems Hossein Maghsoudloo Department of Computer Engineering, Shahr-e- Qods Branch, Islamic Azad University, Tehran, Iran Rahil Hosseini Department
More informationTHE LOGICAL STRUCTURE OF THE RC 4000 COMPUTER
THE LOGICAL STRUCTURE OF THE RC 4000 COMPUTER PER BRINCH HANSEN (1967) This paper describes the logical structure of the RC 4000, a 24-bit, binary computer designed for multiprogramming operation. The
More informationChapter 4: Threads. Operating System Concepts 9 th Edition
Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Operating System Examples
More informationINTRODUCTION TO ALGORITHMS
UNIT- Introduction: Algorithm: The word algorithm came from the name of a Persian mathematician Abu Jafar Mohammed Ibn Musa Al Khowarizmi (ninth century) An algorithm is simply s set of rules used to perform
More informationNew algorithm for analyzing performance of neighborhood strategies in solving job shop scheduling problems
Journal of Scientific & Industrial Research ESWARAMURTHY: NEW ALGORITHM FOR ANALYZING PERFORMANCE OF NEIGHBORHOOD STRATEGIES 579 Vol. 67, August 2008, pp. 579-588 New algorithm for analyzing performance
More informationBuilding Efficient Concurrent Graph Object through Composition of List-based Set
Building Efficient Concurrent Graph Object through Composition of List-based Set Sathya Peri Muktikanta Sa Nandini Singhal Department of Computer Science & Engineering Indian Institute of Technology Hyderabad
More informationOptimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology
Optimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology EE382C: Embedded Software Systems Final Report David Brunke Young Cho Applied Research Laboratories:
More informationLife, Death, and the Critical Transition: Finding Liveness Bugs in System Code
Life, Death, and the Critical Transition: Finding Liveness Bugs in System Code Charles Killian, James W. Anderson, Ranjit Jhala, and Amin Vahdat Presented by Nick Sumner 25 March 2008 Background We already
More informationA Partial Correctness Proof for Programs with Decided Specifications
Applied Mathematics & Information Sciences 1(2)(2007), 195-202 An International Journal c 2007 Dixie W Publishing Corporation, U. S. A. A Partial Correctness Proof for Programs with Decided Specifications
More informationChapter 4: Threads. Chapter 4: Threads
Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Operating System Examples
More informationLecture Notes: Euclidean Traveling Salesman Problem
IOE 691: Approximation Algorithms Date: 2/6/2017, 2/8/2017 ecture Notes: Euclidean Traveling Salesman Problem Instructor: Viswanath Nagarajan Scribe: Miao Yu 1 Introduction In the Euclidean Traveling Salesman
More informationChapter 4: Threads. Operating System Concepts 9 th Edition
Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Operating System Examples
More informationOperating Systems Unit 3
Unit 3 CPU Scheduling Algorithms Structure 3.1 Introduction Objectives 3.2 Basic Concepts of Scheduling. CPU-I/O Burst Cycle. CPU Scheduler. Preemptive/non preemptive scheduling. Dispatcher Scheduling
More informationChapter 13. The ISA of a simplified DLX Why use abstractions?
Chapter 13 The ISA of a simplified DLX In this chapter we describe a specification of a simple microprocessor called the simplified DLX. The specification is called an instruction set architecture (ISA).
More informationREDUCTION IN RUN TIME USING TRAP ANALYSIS
REDUCTION IN RUN TIME USING TRAP ANALYSIS 1 Prof. K.V.N.Sunitha 2 Dr V. Vijay Kumar 1 Professor & Head, CSE Dept, G.Narayanamma Inst.of Tech. & Science, Shaikpet, Hyderabad, India. 2 Dr V. Vijay Kumar
More informationOperating Systems. Lecture 09: Input/Output Management. Elvis C. Foster
Operating Systems 141 Lecture 09: Input/Output Management Despite all the considerations that have discussed so far, the work of an operating system can be summarized in two main activities input/output
More informationAdvanced Topics UNIT 2 PERFORMANCE EVALUATIONS
Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors
More informationPetri Nets ~------~ R-ES-O---N-A-N-C-E-I--se-p-te-m--be-r Applications.
Petri Nets 2. Applications Y Narahari Y Narahari is currently an Associate Professor of Computer Science and Automation at the Indian Institute of Science, Bangalore. His research interests are broadly
More informationA Fast Recursive Mapping Algorithm. Department of Computer and Information Science. New Jersey Institute of Technology.
A Fast Recursive Mapping Algorithm Song Chen and Mary M. Eshaghian Department of Computer and Information Science New Jersey Institute of Technology Newark, NJ 7 Abstract This paper presents a generic
More informationConcurrency Problems in Databases
Volume 1, Issue 1 ISSN: 2320-5288 International Journal of Engineering Technology & Management Research Journal homepage: www.ijetmr.org Concurrency Problems in Databases Preeti Sharma Parul Tiwari Abstract
More informationINDIAN INSTITUTE OF TECHNOLOGY GUWAHATI
INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI COMPUTER SCIENCE AND ENGINEERING Course: CS341 (Operating System), Model Solution Mid Semester Exam 1. [System Architecture, Structure, Service and Design: 5 Lectures
More informationGeneral Objectives: To understand the process management in operating system. Specific Objectives: At the end of the unit you should be able to:
F2007/Unit5/1 UNIT 5 OBJECTIVES General Objectives: To understand the process management in operating system Specific Objectives: At the end of the unit you should be able to: define program, process and
More informationMarco Danelutto. May 2011, Pisa
Marco Danelutto Dept. of Computer Science, University of Pisa, Italy May 2011, Pisa Contents 1 2 3 4 5 6 7 Parallel computing The problem Solve a problem using n w processing resources Obtaining a (close
More informationLecture Notes for Chapter 2: Getting Started
Instant download and all chapters Instructor's Manual Introduction To Algorithms 2nd Edition Thomas H. Cormen, Clara Lee, Erica Lin https://testbankdata.com/download/instructors-manual-introduction-algorithms-2ndedition-thomas-h-cormen-clara-lee-erica-lin/
More informationScheduling with Bus Access Optimization for Distributed Embedded Systems
472 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 5, OCTOBER 2000 Scheduling with Bus Access Optimization for Distributed Embedded Systems Petru Eles, Member, IEEE, Alex
More informationAC59/AT59/AC110/AT110 OPERATING SYSTEMS & SYSTEMS SOFTWARE DEC 2015
Q.2 a. Explain the following systems: (9) i. Batch processing systems ii. Time sharing systems iii. Real-time operating systems b. Draw the process state diagram. (3) c. What resources are used when a
More informationElementary maths for GMT. Algorithm analysis Part I
Elementary maths for GMT Algorithm analysis Part I Algorithms An algorithm is a step-by-step procedure for solving a problem in a finite amount of time Most algorithms transform input objects into output
More informationComprehensive Review of Data Prefetching Mechanisms
86 Sneha Chhabra, Raman Maini Comprehensive Review of Data Prefetching Mechanisms 1 Sneha Chhabra, 2 Raman Maini 1 University College of Engineering, Punjabi University, Patiala 2 Associate Professor,
More informationList Sort. A New Approach for Sorting List to Reduce Execution Time
List Sort A New Approach for Sorting List to Reduce Execution Time Adarsh Kumar Verma (Student) Department of Computer Science and Engineering Galgotias College of Engineering and Technology Greater Noida,
More information(Refer Slide Time: 1:27)
Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 1 Introduction to Data Structures and Algorithms Welcome to data
More informationData Flow Graph Partitioning Schemes
Data Flow Graph Partitioning Schemes Avanti Nadgir and Harshal Haridas Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802 Abstract: The
More informationSHARED MEMORY VS DISTRIBUTED MEMORY
OVERVIEW Important Processor Organizations 3 SHARED MEMORY VS DISTRIBUTED MEMORY Classical parallel algorithms were discussed using the shared memory paradigm. In shared memory parallel platform processors
More informationDecoupled Software Pipelining in LLVM
Decoupled Software Pipelining in LLVM 15-745 Final Project Fuyao Zhao, Mark Hahnenberg fuyaoz@cs.cmu.edu, mhahnenb@andrew.cmu.edu 1 Introduction 1.1 Problem Decoupled software pipelining [5] presents an
More informationMain Points of the Computer Organization and System Software Module
Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a
More informationMulti-Way Number Partitioning
Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09) Multi-Way Number Partitioning Richard E. Korf Computer Science Department University of California,
More informationA CSP Search Algorithm with Reduced Branching Factor
A CSP Search Algorithm with Reduced Branching Factor Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il
More informationNetwork Routing Protocol using Genetic Algorithms
International Journal of Electrical & Computer Sciences IJECS-IJENS Vol:0 No:02 40 Network Routing Protocol using Genetic Algorithms Gihan Nagib and Wahied G. Ali Abstract This paper aims to develop a
More informationDigital System Design Using Verilog. - Processing Unit Design
Digital System Design Using Verilog - Processing Unit Design 1.1 CPU BASICS A typical CPU has three major components: (1) Register set, (2) Arithmetic logic unit (ALU), and (3) Control unit (CU) The register
More informationIntroduction to Algorithms
Introduction to Algorithms An algorithm is any well-defined computational procedure that takes some value or set of values as input, and produces some value or set of values as output. 1 Why study algorithms?
More informationinfix expressions (review)
Outline infix, prefix, and postfix expressions queues queue interface queue applications queue implementation: array queue queue implementation: linked queue application of queues and stacks: data structure
More informationProcess Synchronization
CSC 4103 - Operating Systems Spring 2007 Lecture - VI Process Synchronization Tevfik Koşar Louisiana State University February 6 th, 2007 1 Roadmap Process Synchronization The Critical-Section Problem
More informationSearch Algorithms for Discrete Optimization Problems
Search Algorithms for Discrete Optimization Problems Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. Topic
More informationTrees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.
Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial
More informationCHAPTER 3 DEVELOPMENT OF HEURISTICS AND ALGORITHMS
CHAPTER 3 DEVELOPMENT OF HEURISTICS AND ALGORITHMS 3.1 INTRODUCTION In this chapter, two new algorithms will be developed and presented, one using the Pascal s triangle method, and the other one to find
More informationLecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory
Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section
More informationThe Cheapest Way to Obtain Solution by Graph-Search Algorithms
Acta Polytechnica Hungarica Vol. 14, No. 6, 2017 The Cheapest Way to Obtain Solution by Graph-Search Algorithms Benedek Nagy Eastern Mediterranean University, Faculty of Arts and Sciences, Department Mathematics,
More informationMultithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa
CS4460 Advanced d Algorithms Batch 08, L4S2 Lecture 11 Multithreaded Algorithms Part 1 N. H. N. D. de Silva Dept. of Computer Science & Eng University of Moratuwa Announcements Last topic discussed is
More informationIntelligent, real-time scheduling for FMS
Intelligent, real-time scheduling for FMS V. Simeunovi}, S. Vrane{ Computer System Department, Institute Mihajlo Pupin, Volgina 15, 11060 Belgrade, Yugoslavia Email: vlada@lab200.imp.bg.ac.yu, sanja@lab200.imp.bg.ac.yu
More informationChapter 4: Multi-Threaded Programming
Chapter 4: Multi-Threaded Programming Chapter 4: Threads 4.1 Overview 4.2 Multicore Programming 4.3 Multithreading Models 4.4 Thread Libraries Pthreads Win32 Threads Java Threads 4.5 Implicit Threading
More informationOPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems, 3rd edition. Uses content with permission from Assoc. Prof. Florin Fortis, PhD
OPERATING SYSTEMS #5 After A.S.Tanenbaum, Modern Operating Systems, 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD General information GENERAL INFORMATION Cooperating processes
More informationStudy of Load Balancing Schemes over a Video on Demand System
Study of Load Balancing Schemes over a Video on Demand System Priyank Singhal Ashish Chhabria Nupur Bansal Nataasha Raul Research Scholar, Computer Department Abstract: Load balancing algorithms on Video
More informationProcess size is independent of the main memory present in the system.
Hardware control structure Two characteristics are key to paging and segmentation: 1. All memory references are logical addresses within a process which are dynamically converted into physical at run time.
More information3 No-Wait Job Shops with Variable Processing Times
3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select
More informationMassively Parallel Approximation Algorithms for the Traveling Salesman Problem
Massively Parallel Approximation Algorithms for the Traveling Salesman Problem Vaibhav Gandhi May 14, 2015 Abstract This paper introduces the reader to massively parallel approximation algorithms which
More informationIndexable and Strongly Indexable Graphs
Proceedings of the Pakistan Academy of Sciences 49 (2): 139-144 (2012) Copyright Pakistan Academy of Sciences ISSN: 0377-2969 Pakistan Academy of Sciences Original Article Indexable and Strongly Indexable
More informationProceedings of the 5th WSEAS International Conference on Telecommunications and Informatics, Istanbul, Turkey, May 27-29, 2006 (pp )
A Rapid Algorithm for Topology Construction from a Set of Line Segments SEBASTIAN KRIVOGRAD, MLADEN TRLEP, BORUT ŽALIK Faculty of Electrical Engineering and Computer Science University of Maribor Smetanova
More informationFrom Task Graphs to Petri Nets
From Task Graphs to Petri Nets Anthony Spiteri Staines Department of Computer Inf. Systems, Faculty of ICT, University of Malta Abstract This paper describes the similarities between task graphs and Petri
More informationParallel Auction Algorithm for Linear Assignment Problem
Parallel Auction Algorithm for Linear Assignment Problem Xin Jin 1 Introduction The (linear) assignment problem is one of classic combinatorial optimization problems, first appearing in the studies on
More informationIMPROVED A* ALGORITHM FOR QUERY OPTIMIZATION
IMPROVED A* ALGORITHM FOR QUERY OPTIMIZATION Amit Goyal Ashish Thakral G.K. Sharma Indian Institute of Information Technology and Management, Gwalior. Morena Link Road, Gwalior, India. E-mail: amitgoyal@iiitm.ac.in
More informationOperating Systems 2 nd semester 2016/2017. Chapter 4: Threads
Operating Systems 2 nd semester 2016/2017 Chapter 4: Threads Mohamed B. Abubaker Palestine Technical College Deir El-Balah Note: Adapted from the resources of textbox Operating System Concepts, 9 th edition
More informationARITHMETIC operations based on residue number systems
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,
More information