HYCORE: A Hybrid Static-Dynamic Technique to Reduce Communication in Parallel Systems via Scheduling and Re-routing*

David R. Surma, Edwin H.-M. Sha, Peter M. Kogge
Department of Computer Science and Engineering
University of Notre Dame, Notre Dame, IN
Technical Report TR
October 1997

With the advent of massively parallel machines, considerable gains have been made in reducing task processing times. However, these gains are significantly diminished by the inherent communication overhead. As one of the point design teams sponsored by NSF to develop Petaflop supercomputers, our research group encountered this problem while implementing a parallel solution for simulating partial differential equations representing fluid dynamics problems. With the platform being a tightly-coupled architecture such as the processor-in-memory EXECUBE [1], we realized that the communication overhead impeded our efforts to obtain an optimized execution time. To reduce this overhead, we present a study of the communication incurred when nodes transfer information. Our technique combines compile-time analysis, which assigns a priority to each message, with run-time scheduling that uses those priorities. Experiments show that it reduces the average message completion time by roughly 20% relative to a first-come first-served (FCFS) approach.

The creation of a new scheduling technique was required since most existing scheduling methods do not consider the communication characteristics of the problem [2, 3] and are unable to achieve an optimal schedule. Furthermore, most techniques developed for parallel compilers do not consider this overhead [2, 4]. This research assumes that a suitable task allocation scheme has been used and deals specifically with the ordering and routing of the message transmissions. Therefore, the new scheduling technique differs markedly from traditional multiprocessor scheduling [3] because it schedules at a lower level.

Static techniques, while able to achieve an optimal or near-optimal solution, require known information about the message traffic.
Unfortunately, this a priori information may be unavailable or inaccurate. Dynamic scheduling techniques suffer from being unable to utilize information that might be known about the processing environment. Thus, this research presents a hybrid technique utilizing the appealing components of both approaches.

* This work was supported in part by NSF MIP and NSF ACS.

Figure 1: (a) Task flow DAG. (b) Tasks assigned to processing nodes.

Table 1: Example communication schedules

Schedule 1      Schedule 2          Re-routed schedule
A→E; A→M        A→K; A→M            A→K'; A→H
A→H             A→H; L→J; L→O       A→E; A→M; I→G; L→O; L→J
A→K; I→G        A→E; I→G
L→O; L→J

To exemplify this type of scheduling, consider the task directed acyclic graph (DAG) of Figure 1. Figure 1(b) shows one possible assignment of this graph to a two-dimensional mesh network of six processors. While tasks assigned to the same processor require no internode communication, this assignment scheme indicates that messages must be exchanged. For example, node 1 sends messages to nodes 2, 3, 4, and 5 corresponding to edges A→E, A→H, A→M, and A→K of the DAG. Since there is only a single bidirectional link between each pair of adjacent nodes, network collisions occur. By collisions we mean that messages compete for at least one physical link in the network.

The first two columns of Table 1 give possible orderings of the resulting message traffic when XY-routing is used. Messages on the same line may be sent in parallel without collisions. In wormhole-routed networks, the time to transmit a message is relatively insensitive to distance [5], so we can assume that equal-length messages take the same amount of time, t, to traverse the network. Thus, schedule 1 gives an ordering which completes at time 4t while schedule 2 completes at time 3t, a savings of 25% based on the communication schedule. An even greater improvement can be obtained if message A→K is re-routed to traverse in a YX direction. The third column shows this new schedule with the re-routed message denoted as A→K'. The completion time of this new ordering is 2t.
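The effect of re-routing on collisions can be illustrated with a minimal Python sketch. It is not taken from the report: the function names `route` and `collide`, the coordinates, and the choice to model each link as a directed channel (so opposite-direction traffic does not contend) are all assumptions made for illustration, and the coordinates do not correspond to the layout of Figure 1.

```python
def route(src, dst, order="XY"):
    """Return the set of directed links ((x, y), (x2, y2)) used by a
    dimension-ordered route on a 2-D mesh. Nodes are (x, y) tuples."""
    (x, y), (tx, ty) = src, dst
    links = set()
    for axis in order:              # "XY": travel in X first; "YX": Y first
        if axis == "X":
            while x != tx:
                nx = x + (1 if tx > x else -1)
                links.add(((x, y), (nx, y)))
                x = nx
        else:
            while y != ty:
                ny = y + (1 if ty > y else -1)
                links.add(((x, y), (x, ny)))
                y = ny
    return links


def collide(path_a, path_b):
    """Two messages collide if they compete for at least one link."""
    return bool(path_a & path_b)


# Hypothetical messages leaving the same node (0, 0).
m1_xy = route((0, 0), (2, 1), "XY")
m1_yx = route((0, 0), (2, 1), "YX")
m2_xy = route((0, 0), (2, 0), "XY")

print(collide(m1_xy, m2_xy))   # both use the link (0,0)->(1,0)
print(collide(m1_yx, m2_xy))   # the YX route leaves via (0,0)->(0,1) instead
```

Under these assumptions the two XY routes share their first link, while sending the first message YX removes the shared link, which is the kind of gain the re-routed column of Table 1 reflects.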
Thus, this work addresses the ordering, or scheduling, of the messages as well as the re-routing of some of them to reduce the overall completion time. The term used for this research is communication scheduling. It not only encompasses the routing aspects and path selection issues discussed in [5, 6], it also determines the order in which the messages in the system should be sent.

There have been several studies related to this problem. One effort develops a "traffic scheduling" algorithm for multiprocessor networks to balance the network links based on the fact that a large number of messages must eventually be delivered [7]. That work, however, uses a First-Come First-Served (FCFS) approach and does not perform any scheduling of the individual message transmissions. Lee and Kim perform path selection in a wormhole-routed network, but they search for unique paths for pairs of communicating nodes [6]. Kandlur and Shin [8] present a work similar to [6] in that dedicated paths are found. The problem with these techniques is that the dedicated paths can cause other messages to follow longer paths even though the dedicated links are unused. Additionally, no scheduling is done, although scheduling could improve the overall performance. Recent work by Eberhart and Li [9] does perform a type of dynamic communication scheduling. However, their work is restricted to analyzing communication patterns that are commonly used in data-parallel applications. The work presented here can apply to any type of message-passing activity.

This paper presents a hybrid technique which uses known information about the required message traffic to statically determine priorities for the individual messages. Then, at run time, when a node has several messages to transmit along the same physical link, preference is given to the message with the highest priority. The basis for the priority determination is the recently developed collision graph model [10]. The communication scheduling problem has been addressed previously in a purely static manner using fixed routing and a specific message traffic model [11]. This research greatly improves on that effort by presenting a technique for a general model of message traffic which allows re-routing of messages and operates dynamically.

The starting point is a list of N messages to be transmitted by the network nodes. The goal is to find an optimal communication schedule which reduces the overall processing time. Table 2 shows a sample message list to be executed on a 10×10 two-dimensional mesh processor network. This work considers single-packet messages composed of an arbitrary number of flits.

Table 2: Example message list

Message ID   Est. Departure Time   Source   Destination
[...]        [...]                 (3,1)    (7,7)
[...]        [...]                 (2,5)    (5,7)
[...]        [...]                 (3,4)    (5,6)
[...]        [...]                 (2,2)    (7,8)
[...]        [...]                 (3,1)    (6,4)
[...]        [...]                 (2,1)    (6,3)
[...]        [...]                 (2,1)    (5,4)
Nodes of the multiprocessor system are attached to all-port routers, and the routing scheme is XY by default or a re-routed scheme, which is discussed shortly.

Definition 1. A message is defined to be $M = (m_{edt}, m_S, m_D)$, where $m_{edt}$ is the estimated departure time of the message, $m_S$ is the source node of the message, and $m_D$ is the destination node of the message.

PRIMAR Algorithm

The first step in arriving at the communication schedule is to determine the priorities for each message. The algorithm to do this is called the Priority Mapping and Re-routing (PRIMAR) algorithm, and it begins by transforming the problem into a graph model called a collision graph, or CG.

Definition 2. A CG is defined as $G = (V, E)$, where $V$ is the set of nodes $v_1, v_2, \ldots, v_N$ representing messages $M_1, M_2, \ldots, M_N$, and $E = \{(v_i, v_j) \mid \text{the paths of } M_i \text{ and } M_j \text{ intersect}\}$.

Since the estimated departure times vary throughout the message list, two messages can traverse the same paths without colliding if these times are sufficiently far apart. Consequently, a CG is not constructed for the entire message list. Rather, the message list is first sorted by estimated departure time and then processed in sections determined by a user input parameter called a window. This window is used as the range of message departure times to be operated on as a set, S. Figure 2 shows a CG constructed for the nodes in S from Table 2 when the window parameter is 4.

Figure 2: Collision graph for S with window = 4.

To get the ordering from the undirected CG, arrows indicating message precedence must be added to the graph. An edge directed from v1 to v2 denotes that the message corresponding to v1 is to be scheduled before the message corresponding to v2. If no edge exists between two nodes, they may be scheduled in parallel. Once an edge orientation has been established, the actual priorities are determined by first finding the node(s) without any incoming edges and assigning them the highest priority. Next, these nodes and their edges are removed from the graph, and the process repeats, assigning the next highest priority, and so on for all messages. Thus, the major problem is determining the edge orientation for the CG that yields a priority scheme which produces the best performance. Central to getting the best performance is finding the maximum number of messages that can be transmitted in parallel at any one time. This corresponds to finding a maximum independent set of the CG.
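The step that turns an edge orientation into priorities (repeatedly peeling off the nodes without incoming edges) can be sketched in Python as follows. This is an illustrative sketch only: the function name `assign_priorities` and the example orientation are hypothetical and are not the edges of Figure 2, which are not given in the text.

```python
def assign_priorities(nodes, directed_edges):
    """Peel off nodes with no incoming edges; each peeled layer receives the
    next priority value (0 is the highest priority).

    directed_edges is a set of (u, v) pairs meaning "u is scheduled before v".
    """
    remaining = set(nodes)
    edges = set(directed_edges)
    priority = {}
    level = 0
    while remaining:
        has_incoming = {v for (_, v) in edges}
        layer = {n for n in remaining if n not in has_incoming}
        if not layer:
            # Every remaining node has a predecessor, so the orientation is cyclic.
            raise ValueError("edge orientation contains a cycle")
        for n in layer:
            priority[n] = level
        remaining -= layer
        # Remove the peeled nodes together with their outgoing edges.
        edges = {(u, v) for (u, v) in edges if u not in layer}
        level += 1
    return priority


# Hypothetical orientation over five messages.
example = assign_priorities([1, 2, 3, 4, 5], {(2, 1), (3, 1), (1, 4), (5, 4)})
print(example)
```

In this made-up orientation, messages 2, 3, and 5 have no predecessors and share priority 0, message 1 follows with priority 1, and message 4 receives priority 2.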
Since finding a maximum independent set is an NP-complete problem, our problem is also NP-complete, and heuristics are needed to arrive at a solution. Consider again the CG of Figure 2. The maximum independent set has size 3 and comprises nodes 2, 3, and 5. Those messages are assigned priority 0 (highest) and are said to be in S0. The other nodes in S, namely those in S − S0, have collisions with the nodes in S0. Therefore, to enlarge S0, re-routing of the messages in S − S0 is considered.

Re-routing is a process where the message routing path is changed from XY to YX. However, since deadlocks are a concern in wormhole-routed networks, some restrictions are required. Eight turns are possible in two-dimensional mesh networks, and XY routing is deadlock free because it prohibits four of them. We restrict only the two turns shown in Figure 3. Thus, our term for this type of routing is XY and restricted YX routing.
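The text does not say which heuristic is used for the independent-set step. As one common possibility, a minimum-degree greedy heuristic can be sketched in Python as follows. The function name `greedy_independent_set` and the adjacency lists are hypothetical (the edges of Figure 2 are not recoverable from the text), and the heuristic is not guaranteed to return a maximum set in general.

```python
def greedy_independent_set(adjacency):
    """Min-degree greedy heuristic: repeatedly keep the vertex with the fewest
    remaining neighbours and discard that vertex's neighbours.

    adjacency maps each vertex to the set of vertices it collides with.
    """
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    chosen = set()
    while remaining:
        # Ties are broken by vertex id so the result is deterministic.
        v = min(remaining, key=lambda n: (len(remaining[n]), n))
        chosen.add(v)
        removed = remaining[v] | {v}
        for r in removed:
            remaining.pop(r, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return chosen


# Hypothetical five-message collision graph (assumed edges: 1-2, 1-4, 3-4, 4-5).
cg = {1: {2, 4}, 2: {1}, 3: {4}, 4: {1, 3, 5}, 5: {4}}
print(sorted(greedy_independent_set(cg)))
```

For this assumed graph the heuristic keeps three mutually non-colliding messages, which could all be given the highest priority and sent in parallel.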
Figure 3: Illustration of allowable routing turns.

In the example, nodes 1 and 4 are eligible to be re-routed since they do not violate the turn restrictions. Node 1 is arbitrarily selected first for re-routing, and it can be routed in a YX direction without colliding with any message in S0. Thus, it is assigned priority 0, added to S0, and its routing flag is set to YX. This flag is part of the flit header, and each router must be able to interpret it for proper routing. Next, node 4 is considered. Since re-routing it would cause a collision with a member of S0 (the re-routed message 1), it cannot be re-routed. After the nodes with top priority have been determined, they are eliminated from the graph and the nodes in S − S0 (node 4 in this example) are aged. Aging is a process where messages have their departure times updated to a later time. The value used for aging is determined by the length of the standard message. Next, the entire list of remaining messages is re-sorted and the process repeats, assigning priority 1. The algorithm is executed with several window sizes, a metric is produced for each, and the best priority scheme is used.

Algorithm 1 PRIMAR
Input: $G = (V, E)$, and $M$
Output: $M(v)_{pri} \ \forall v \in V$
begin
    $pri = 0$;
    Input window from user;
    $I = \emptyset$;
    repeat until $V = \emptyset$;
        sort $V$ by estimated departure time, $edt$;
        $limit1$ = earliest estimated departure time of a node $v \in V$;
        $limit2 = limit1 + window$;
        Build $G_t = (V_t, E_t)$ such that $V_t = \{v \mid limit1 \le M(v)_{edt} \le limit2\}$ and $E_t = \{e \mid u \xrightarrow{e} v \text{ and } u, v \in V_t\}$;
        Determine the maximum independent set, $I \subseteq G_t$;
        $\forall v \in I$, $M(v)_{pri} = pri$;
        $\forall v \in G_t, v \notin I$: explore re-routing for each $v$;
            if re-routing can be done, $M(v)_{pri} = pri$, path direction = YX, and add $M(v)$ to $I$;
        $\forall v \in (V_t - I)$: $M(v)_{edt} = M(v)_{edt} + age$;
        $pri = pri + 1$;
        $V = V - I$;
    end loop;
end algorithm PRIMAR

HYCORE Technique and Results

The Hybrid Communication Scheduling with Re-routing, or HYCORE, technique utilizes the results of the PRIMAR algorithm.
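The windowing, re-routing, and aging loop of PRIMAR can be sketched in Python under several simplifying assumptions, all of which are mine rather than the report's: the maximum-independent-set step is replaced by a first-fit pass in departure-time order, the YX turn restrictions of Figure 3 are ignored, links are treated as directed channels, and the names `path` and `primar` and the sample messages are hypothetical.

```python
def path(src, dst, order):
    """Directed links of a dimension-ordered route ("XY" or "YX") on a mesh."""
    (x, y), (tx, ty) = src, dst
    links = set()
    for axis in order:
        if axis == "X":
            while x != tx:
                nx = x + (1 if tx > x else -1)
                links.add(((x, y), (nx, y)))
                x = nx
        else:
            while y != ty:
                ny = y + (1 if ty > y else -1)
                links.add(((x, y), (x, ny)))
                y = ny
    return links


def primar(messages, window, age):
    """messages: {id: (edt, src, dst)}. Returns {id: (priority, routing)}."""
    edt = {m: v[0] for m, v in messages.items()}
    result, pending, pri = {}, set(messages), 0
    while pending:
        limit1 = min(edt[m] for m in pending)
        in_window = sorted((m for m in pending if edt[m] <= limit1 + window),
                           key=lambda m: (edt[m], m))
        chosen = {}       # id -> link set of the route selected in this round
        deferred = []
        # First-fit stand-in for the maximum-independent-set step (XY routes).
        for m in in_window:
            p = path(messages[m][1], messages[m][2], "XY")
            if all(not (p & q) for q in chosen.values()):
                chosen[m], result[m] = p, (pri, "XY")
            else:
                deferred.append(m)
        # Try to enlarge the set by re-routing the remaining messages YX.
        for m in deferred:
            p = path(messages[m][1], messages[m][2], "YX")
            if all(not (p & q) for q in chosen.values()):
                chosen[m], result[m] = p, (pri, "YX")
            else:
                edt[m] += age    # aging: move the departure time later
        pending -= set(chosen)
        pri += 1
    return result


# Hypothetical message list: id -> (estimated departure time, source, destination).
msgs = {1: (0, (0, 0), (2, 0)), 2: (0, (0, 0), (2, 1)), 3: (1, (0, 0), (1, 0))}
print(primar(msgs, window=4, age=1))
```

In this made-up example, message 2 collides with message 1 under XY routing but fits once re-routed YX, so both receive priority 0, while message 3 cannot avoid the shared link, is aged, and receives priority 1 in the next round.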
At run time, each node selects a message to transmit based on several factors. If a node has only one message ready to transmit, it checks the routing flag and, if the appropriate link is available, the message is transmitted. However, if the node has several messages that are ready to be transmitted, the priority is used as the arbiter. A simulation program was developed to determine the time at which each message reaches its destination, and a performance metric was established. This metric is the average completion time, or ACT, for all messages transmitted. The ACT is used because our focus is on the individual message transfers. While we are interested in having the shortest final completion time, we also want as many messages as possible to be transmitted as soon as possible. Thus, by using the ACT we can distinguish between two schedules which have equivalent final completion times.

In the example message list of Table 2, the ACT for a statically determined schedule is [...]. A FCFS approach has a time of [...], while our hybrid approach without re-routing (called HYSTAD in the tables) yields a value of [...]. Utilizing re-routing, the static approach value decreases to [...] while the HYCORE technique is [...]. Thus, the improvement gained by the HYCORE technique over a FCFS approach is a significant 23.28%. That the statically determined schedule performs best makes sense because, if exact information about the message traffic is known a priori, a schedule can be optimized. However, obtaining this information with much accuracy is difficult. Consequently, a variance is introduced in the experiments to account for network uncertainties, congestion, and other performance fluctuations. This variance is distributed uniformly over the estimated departure times of all messages, and experiments were performed to study its effects.

Two models of message traffic were considered in our experiments. First, LU factorization, matrix multiplication, and bitonic sorting were analyzed to determine the message passing that occurs when they are mapped to a two-dimensional mesh architecture.

Table 3: Comparison of scheduling techniques without variance. Columns: Operation, Msgs Sent, SCORE, FCFS, HYSTAD, Re-routed FCFS, HYCORE, % HYCORE Improvement. Rows: LU Factorization, Matrix Multiply, Bitonic Sorting.
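The ACT metric described in this section, and the way it separates two schedules with the same final completion time, can be illustrated with a short Python sketch. The arrival times are hypothetical, the function names are mine, and the relative-reduction formula in `improvement_percent` is an assumption about how the improvement percentage is computed, since the report does not spell it out.

```python
def average_completion_time(arrivals):
    """Mean of the times at which messages reach their destinations."""
    return sum(arrivals) / len(arrivals)


def improvement_percent(baseline_act, candidate_act):
    """Relative ACT reduction versus a baseline (assumed formula)."""
    return 100.0 * (baseline_act - candidate_act) / baseline_act


# Hypothetical arrival times in units of t; both schedules finish at 3t.
schedule_a = [1, 1, 2, 3]
schedule_b = [1, 2, 3, 3]

act_a = average_completion_time(schedule_a)
act_b = average_completion_time(schedule_b)
print(act_a, act_b, improvement_percent(act_b, act_a))
```

Both hypothetical schedules complete at 3t, but the first delivers more messages early and therefore has the lower ACT, which is the distinction the metric is meant to capture.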
ACT values are given in Table 3 for the SCORE static scheduling algorithm utilizing re-routing [12], a FCFS approach both with and without re-routing, and the HYSTAD and HYCORE techniques. In this table the variance is 0, so the static approach again performs best. Further note that the HYCORE technique outperforms the FCFS approach by approximately 20%. Table 4 shows the results when the variance is 4. Static scheduling no longer works best, as it must compensate for worst-case times, and HYCORE still works better than FCFS, although the percentage is not as great. This is due to the deteriorating accuracy of the information used to determine the priorities. That HYCORE is still better indicates that having some knowledge, albeit not totally accurate, improves performance.

Table 4: Comparison of scheduling techniques with variance = 4. Columns: Operation, Msgs Sent, SCORE, FCFS, HYSTAD, Re-routed FCFS, HYCORE, % HYCORE Improvement. Rows: LU Factorization, Matrix Multiply, Bitonic Sorting.

Table 5 shows results obtained when applying the five scheduling techniques to randomly generated traffic patterns consisting of 30 messages. A hotspot index was used to vary the amount of collisions by causing the message destinations to fall in a certain area with a given percentage. The results are averages of 100 trials for each case. Note that the amount of improvement that can be obtained depends on the nature of the message traffic. The HYCORE technique works best on traffic with a moderate number of collisions. At low collision levels (10% hotspot index in the table), there is not much parallelism to exploit and consequently the improvement that can be obtained, while still significant, is comparatively low. At high collision levels, the CG resembles a clique, where the FCFS approach begins to work as well as other approaches. Since the comparison is with this FCFS approach, as the number of collisions increases, the improvement that can be obtained decreases. In the table, note the falloff in improvement when the hotspot index exceeds 75%. In between these extremes, however, the improvement obtained by the HYCORE technique steadily increases to a maximum of 21%.

Table 5: Experiments with 30 messages and variance = 0. Columns: Hotspot Index, SCORE, FCFS, HYSTAD, Rescheduled FCFS, HYCORE, Percent Improvement.

Two parameters are changed to study the effects of additional message transmissions and of the introduction of a variance. Table 6 shows results for experiments using a 40% hotspot index and varying the number of messages transmitted when the variance is 4. From this table it can be seen that the static SCORE technique performs poorly while the HYCORE technique is again better than the FCFS approach. Note that the improvement begins to diminish when the number of messages is greater than 40. This is because more messages result in more collisions for a fixed hotspot index. As shown in the previous analysis, once the number of collisions becomes large, the improvement begins to diminish.

This paper presents a framework for studying communication scheduling.
The HYCORE technique combines static and run-time elements along with re-routing to reduce the communication overhead by over 20% for both application-specific message traffic and randomly generated message traffic. This technique will almost always perform better than a FCFS approach because it uses re-routing and because it first acts to schedule its messages on a FCFS basis. In the presence of variances, this technique will outperform static scheduling techniques such as SCORE as well.

Table 6: Experiments with 40% hotspot index and variance = 4. Columns: Msgs Sent, SCORE, FCFS, HYCORE, Percent Improvement.

References

[1] P. M. Kogge, "EXECUBE - A New Architecture for Scalable MPPs," in 1994 International Conference on Parallel Processing, vol. I, pp. 77–84, August.
[2] H. Kasahara and S. Narita, "Practical multiprocessor scheduling algorithms for efficient parallel processing," IEEE Transactions on Computers, vol. C-33, November.
[3] H. El-Rewini, T. G. Lewis, and H. H. Ali, Task Scheduling in Parallel and Distributed Systems. Englewood Cliffs, NJ: Prentice Hall.
[4] S. Shukla, B. Little, and A. Zaky, "A compile-time technique for controlling real-time execution of task-level data-flow graphs," in 1992 International Conference on Parallel Processing, vol. II, pp. 49–56.
[5] L. M. Ni and P. McKinley, "A survey of wormhole routing techniques in direct networks," IEEE Computer, vol. 26, February.
[6] S. Lee and J. Kim, "Path selection for communicating tasks in a wormhole-routed multicomputer," in 1994 International Conference on Parallel Processing, vol. 3, pp. 172–175.
[7] R. P. Bianchini and J. P. Shen, "Interprocessor traffic scheduling algorithm for multiple-processor networks," IEEE Transactions on Computers, vol. C-36, pp. 396–409, April.
[8] D. D. Kandlur and K. G. Shin, "Traffic routing for multicomputer networks with virtual cut-through capability," IEEE Transactions on Computers, vol. C-41, pp. 1257–1270, October.
[9] A. Eberhart and J. Li, "Contention-free communication scheduling on 2D meshes," in 1996 International Conference on Parallel Processing, pp. 44–51.
[10] D. R. Surma and E. Sha, "Collision graph based communication scheduling for parallel systems," to be published in Journal of Computers and their Applications, December.
[11] D. R. Surma and E. Sha, "Efficient communication scheduling with re-routing based on collision graphs," in International Symposium on High Performance Computing Systems, July.
[12] D. R. Surma and E. Sha, "SCORE: An efficient technique to reduce congestion in parallel systems," to be presented at the Tenth International Conference on Parallel and Distributed Computing Systems, September.
More informationAchieving Distributed Buffering in Multi-path Routing using Fair Allocation
Achieving Distributed Buffering in Multi-path Routing using Fair Allocation Ali Al-Dhaher, Tricha Anjali Department of Electrical and Computer Engineering Illinois Institute of Technology Chicago, Illinois
More informationA Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,
More informationApproximating a Policy Can be Easier Than Approximating a Value Function
Computer Science Technical Report Approximating a Policy Can be Easier Than Approximating a Value Function Charles W. Anderson www.cs.colo.edu/ anderson February, 2 Technical Report CS-- Computer Science
More informationGuernsey Post 2013/14. Quality of Service Report
Guernsey Post 2013/14 Quality of Service Report The following report summarises Guernsey Post s (GPL) quality of service performance for the financial year April 2013 to March 2014. End-to-end quality