A New Adaptive Hardware Tree-Based Multicast Routing in K-Ary N-Cubes


IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 7, JULY 2001

Dianne R. Kumar, Member, IEEE, Walid A. Najjar, and Pradip K. Srimani, Fellow, IEEE

Abstract: Multicast communication is a key issue in almost all applications that run on any parallel architecture and, hence, efficient implementation of multicast is critical to the performance of multiprocessor machines. Multicast is implemented in parallel architectures either via software or via hardware. Software-based approaches for implementing multicast can result in high message latencies, while hardware-based schemes can greatly improve performance. Deadlock freedom in multicast communication is much more difficult to achieve, resulting in more involved routing algorithms and higher startup delays. Hardware tree-based algorithms do not require these high startup delays, but do suffer from high probabilities of message blocking, leading to poor performance. In this paper, we propose a new hardware tree-based routing algorithm (HTA) for multicast communication under virtual cut-through switching in k-ary n-cubes that outperforms existing software and hardware path-based multicast routing schemes. Simulation results are compared against several commonly used multicast routing algorithms and show that HTA performs extremely well under many different conditions.

Index Terms: Multicast communication, path-based routing, tree-based routing, deterministic routing, adaptive routing, virtual cut-through switching.

1 INTRODUCTION

Efficient routing of multicast messages is extremely important to the performance of multiprocessors. Since most current multiprocessors only support unicast communication, multicast is implemented as multiple unicast messages, resulting in high message latencies. Hardware-based multicast support can greatly improve performance.
Among the proposed hardware-based schemes for multicast are path-based and tree-based routing algorithms ([3], [22], [9], [14], [19], [2], [12], [5], [4]). Each of these schemes uses multidestination messages, which are messages that have more than one header flit. The main difference between these two types of multicast schemes lies in how the header flits in a multidestination message are routed. In tree-based routing, a multidestination message is routed through the network at each intermediate node along the message's path using all header flits in the message. In path-based routing, a multidestination message is routed through the network using only the first header flit in the message. Once the first header flit reaches its destination and is absorbed by the node, the next header flit is routed. In tree-based routing, no ordering of the destinations is required before the message is injected into the network and the shortest paths between the source node and all destinations are always taken. However, this type of routing suffers from a high probability of message blocking at intermediate nodes, leading to higher deadlock probability. Path-based routing does not suffer from this high probability of message blocking. However, it does require destinations to be ordered at the source and does not always provide the shortest path between the source node and each destination node.

D.R. Kumar is with the Department of Computer Science and Engineering, University of Colorado at Denver, Denver, CO; dkumar@carbon.cudenver.edu.
W.A. Najjar is with the Department of Computer Science and Engineering, University of California Riverside, Riverside, CA; najjar@cs.ucr.edu.
P.K. Srimani is with the Department of Computer Science, Clemson University, Clemson, SC; srimani@cs.clemson.edu.
Manuscript received 1 Sept. 2000; accepted 20 Mar. For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number.
In this paper, we propose a hardware tree-based routing algorithm (HTA) which attempts to reduce the probability of message blocking, resulting in low message latencies. The probability of a message blocking is kept low by using virtual cut-through switching, independent virtual channels (VCs) for unicast and multicast messages, several VCs per physical channel (PC), an efficient deadlock detection and recovery scheme, and delayed header flit routing. HTA is a fully adaptive and minimal tree-based routing scheme for multicast routing. The scheme is fully compatible with existing unicast routing schemes. Multicast routing is briefly described in Section 2. Section 3 describes our proposed HTA scheme, including its routing scheme, deadlock recovery mechanism, and router implementation. Experimental deadlock probabilities, as well as a comparison of HTA, Software Multicast, and Column-Path algorithms, are reported in Section 4. Section 5 discusses related work and concluding remarks are given in Section 6.

2 MULTICAST ROUTING ALGORITHMS

The interconnection network model considered in this study is a k-ary n-cube using input buffering and virtual cut-through switching [8]. Message advancement is similar to wormhole routing [17], except that the body of a message can continue to progress even while the message head is blocked, and the entire message can be buffered at a single node. Note that a header flit can progress to the next node only if the whole message can fit in the destination buffer. For simplicity, all message lengths are equal.

2.1 Multicast Implementation in Software

Software implementations of multicast are currently widely used since most current systems support only one-to-one communication. In software implementations of multicast, one or more unicast messages must be sent. The simplest implementation of multicast using unicast messages is to have one unicast message sent for every destination address in the multicast message. This scheme is referred to as Software Multicast in Section 4.

2.2 Multicast Implementation in Hardware

Among the proposed hardware-based schemes for multicast are path-based and tree-based routing algorithms.

2.2.1 Path-Based Routing

In path-based multicast, a multidestination message is routed through the network using only the first header flit in the message. Once this destination is reached, the first header flit is removed by the router and the next header flit is used to continue routing the message to the next destination. The data flits are simultaneously forwarded both to the destination node and to the input queue required for the next header flit. This continues until all destinations are reached and the message is completely consumed by the node. To reduce the path length of a multicast message, the destination node set is divided into disjoint subsets. Each disjoint destination subset is then composed into a separate submulticast message and sent along a separate multicast path. By appropriately ordering the destinations within each set or subset at the source node, the path taken can be reduced and messages are routed more efficiently.
In path-based routing schemes, the probability of message blocking is low since at most two channels are requested per message (one regular channel and/or one sink channel if a message has reached a destination). However, path-based routing does require an ordered list of destination addresses for each copy of a message before it is injected into the network. This destination ordering can be computed at compile time if the destinations are statically determined. The subpath between the source node and one of the destination nodes in a multicast path is often not the shortest path. Some path-based routing schemes include Dual-Path [11], Multipath [11], Column-Path [3], [2], and Hierarchical Leader-Based Scheme (HL) [18]. The Dual-Path and Multipath algorithms have the disadvantage of being incompatible with commonly used unicast routing algorithms, such as the e-cube routing algorithm, and therefore will not be used in the simulations here.

2.2.2 Tree-Based Routing

In tree-based multicast routing, a multidestination message is routed through the network at each intermediate node along the message's path using all the header flits within it. The multidestination message is routed along a common path among all header flits in the message as far as possible. The header flits are then routed and moved onto different channels headed for a unique set of destination nodes. The data flits are simultaneously forwarded to each of the different channels already allocated for each header flit. This branching continues as necessary until all destination nodes have been reached. Tree-based routing has the advantage that no ordering of the destinations is required before the message is injected into the network. The shortest path between the source node and all destination nodes is always taken. However, tree-based routing has been shown to be prone to large blocking probabilities at intermediate nodes, resulting in poor overall performance [10], [5], [4].
The probability of blocking is much greater than that for path-based routing schemes because all branch channels must be available for the whole multidestination message to continue. Because of the probability of message blocking, tree-based routing algorithms suffer from higher multicast message latency. Some tree-based algorithms include Double-Channel XY [10], Tree-Based Multicast with Branch Pruning [13], Resumable Multicast [5], [4], Restricted Branch Multicast [5], [4], and Quad-Branch Multicast (QBM) [24]. The Double-Channel XY algorithm requires double channels for deadlock freedom and the number of channels between every pair of nodes grows exponentially with the number of mesh dimensions. In addition, this algorithm has been shown to perform worse than path-based algorithms for wormhole switching. Tree-Based Multicast with Branch Pruning does perform well. However, the data size for each message must be very small. The Resumable Multicast and Restricted Branch Multicast algorithms do not perform well unless the number of fanouts at each intermediate node is reduced to two (which results in an algorithm similar to path-based routing). QBM conforms to double-xy routing and is more suitable for bulk multicasts.

3 HARDWARE TREE-BASED MULTICAST ROUTING ALGORITHM (HTA)

HTA is a routing scheme that combines two distinct routing algorithms, one for unicast communication and one for multicast communication. The well-known deadlock-free routing algorithm proposed in [6], [1] is used for unicast messages (briefly explained in Section 3.1). Multicast messages use the fully adaptive, tree-based routing algorithm explained in Sections 3.1 through 3.4. Both routing algorithms are implemented within the same network and each message is assigned to the appropriate routing algorithm when input into the network. Although the algorithm proposed in [6], [1] is used here for the unicast routing in HTA, many existing unicast routing schemes are fully compatible with HTA.
The main characteristics of HTA are:

- Virtual cut-through switching is used with distinct virtual paths for unicast and multicast messages, each path using three VCs per dimension (total of six VCs per dimension).
- Unicast messages are routed using the deadlock-free routing algorithm proposed in [6], [1] (see Section 3.1).
- Multicast messages are routed using tree-based, fully adaptive routing along with a deadlock detection and recovery scheme (see Sections 3.1 and 3.3) and delayed header flit routing (see Section 3.2).

Each message is composed of all header flits followed by all data flits. Each header flit holds one destination address and destinations do not need to be ordered within a multicast message.

3.1 The Routing Scheme

Our proposed HTA scheme consists of two separate routing algorithms, one for unicast and one for multicast messages, which are described below.

3.1.1 Unicast Communication Routing Algorithm

The deadlock-free routing algorithm proposed in [6], [1] is used for unicast communication in HTA and is an adaptive routing algorithm based on dimension-order routing. In this adaptive routing scheme, a message is routed on any adaptive channel until it is blocked. Once blocked, a message is routed using dimension-order routing if possible. A message may return to the adaptive channels in the following routing decisions if the adaptive channels are available. When a message is routed using dimension-order routing, it is routed along decreasing dimensions with a dimension decrease occurring only when zero hops remain in all higher dimensions. By assigning an order to the network dimensions, no cycle exists in the channel dependency graph and the algorithm is deadlock-free. A minimum of three VCs per dimension is required for deadlock-free routing in k-ary n-cubes. Two VCs per dimension are used for dimension-order routing and all remaining VCs are used for adaptive routing.

3.1.2 Multicast Communication Routing Algorithm

Multicast messages are routed through the network using a tree-based, delayed header flit routing algorithm along with a deadlock detection and recovery scheme.
The first header flit of a multicast message is routed to any free channel using the following priority scheme: The message first requests any free channel in the dimension in which it has the greatest distance left to travel. If more than one dimension has the same distance left to travel, a dimension is randomly selected. If there are no free channels within the selected dimension, then any free channel in the dimension with the next furthest distance left to travel is requested. This type of requesting continues until a channel has been assigned to the header flit of this message or until no free channels have been found. If no free channels are found, the header flit at the top of the queue blocks. No other header flits can be routed until this header flit has been routed. After the header flit is routed at the current node, it is then moved to the neighboring node's queue. Because delayed header flit routing is used (explained in Section 3.2), the header flit just routed remains at the neighboring queue until all remaining header flits in this multicast message have been routed at the current node. After the first header flit is routed, all remaining header flits are routed in the same manner as the first header flit, with one exception. When each of the remaining header flits reaches the top of the queue, it is first routed (if possible) to any channel already allocated to this multicast message by any of the preceding header flits that have already been routed. If this header flit cannot be routed in any of the previously routed dimensions, then it is routed using the priority scheme described above for the first flit in the message. By trying to route the remaining header flits to already allocated channels for each multicast message, extra channels are only assigned to the multicast message when necessary. 
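The channel selection rule for header flits described above can be illustrated with a minimal Python sketch. It is not the authors' pseudocode (which the paper places in its Appendix). The function names, the representation of free and allocated channels as sets of dimension indices, and the use of unidirectional wraparound distances are all assumptions made for illustration; delivery to a sink channel at a destination node is not modeled.

```python
import random
from typing import List, Optional, Sequence, Set


def remaining_hops(cur: Sequence[int], dst: Sequence[int], k: int) -> List[int]:
    """Hops left in each dimension, assuming unidirectional wraparound links."""
    return [(d - c) % k for c, d in zip(cur, dst)]


def select_dimension(
    cur: Sequence[int],
    dst: Sequence[int],
    k: int,
    free_dims: Set[int],        # dimensions with a free multicast channel (assumed model)
    allocated_dims: Set[int],   # dimensions already allocated to this message here
    is_first_flit: bool,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Return the dimension chosen for one header flit, or None if it blocks."""
    rng = rng or random.Random()
    hops = remaining_hops(cur, dst, k)
    # Minimal routing: only dimensions with distance left to travel are useful.
    useful = [dim for dim, h in enumerate(hops) if h > 0]

    # Remaining header flits first try a channel the message already holds.
    if not is_first_flit:
        for dim in useful:
            if dim in allocated_dims:
                return dim

    # Priority scheme: greatest remaining distance first, ties broken randomly.
    ranked = sorted(useful, key=lambda dim: (-hops[dim], rng.random()))
    for dim in ranked:
        if dim in free_dims:
            return dim
    return None  # no free channel: the flit at the top of the queue blocks
```

For example, with the current node at (0, 0) and a destination at (3, 1) in an 8-ary 2-cube, the first header flit requests dimension 0 when it is free and falls back to dimension 1 otherwise, while a later header flit reuses dimension 1 if the message already holds a channel there.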
Reusing already allocated channels keeps other channels available for other multicast messages in the network and reduces the probability of blocking since a smaller number of channels are assigned per node for each multicast message. Once all header flits for each multicast message are routed, the data flits for this message are moved simultaneously to all channels allocated to this multicast message. HTA allows full adaptivity for multicast messages since there is no channel routing restriction. To deal with potentially deadlocked situations, the deadlock detection and recovery scheme described in Section 3.3 is used. The schematic of the HTA routing algorithm is shown in Fig. 1; the pseudocode is provided in the Appendix.

3.2 Header Flit Routing

The scheme most commonly used for routing header flits [10], [13], [4] is referred to here as immediate header flit routing. To increase performance, a new type of scheme is proposed, called delayed header flit routing.

3.2.1 Immediate Header Flit Routing

When a header flit is routed at the current node and moved to a neighboring queue, it is routed at this neighboring node without waiting for the remaining header flits in the message to be routed. Fig. 2 shows an example of immediate header flit routing. In this figure, header flit A:2 is blocked while header flit A:1 continues to be routed, holding VCs and causing message B to block.

3.2.2 Delayed Header Flit Routing

HTA uses delayed header flit routing to lower the probability of messages blocking and to increase performance. In delayed header flit routing, a header flit at a neighboring node is prevented from being routed until all header flits at the current node have been routed. Because, in tree-based routing, the remaining header flits may not be immediately routed, it is more advantageous to keep all header flits within close proximity of one another using delayed header flit routing.
This close proximity prevents header flits from being assigned to queues at downstream nodes before all flits in the message can use them. This keeps the downstream queues free so that they are available for other messages in the network that can use them immediately. This scheme only requires a small additional amount of control logic to detect when all header flits at the current node have been routed, and one extra control line per VC is used to notify the header flits at the neighboring nodes when routing can continue. Fig. 3 shows an example of delayed header flit routing. In this figure, header flit A:2 is blocked while header flit A:1 waits at the neighbor node until the last header flit in this message (flit A:2) is routed. Message B can now be routed. As the number of destinations grows, delayed header flit routing becomes increasingly important in reducing the probability of message blocking.

Fig. 1. HTA Routing Algorithm for a multicast message.

Fig. 2. Immediate header flit routing: Header flit A:2 is blocked while header flit A:1 continues to be routed, holding VCs and causing message B to block.

Fig. 3. Delayed header flit routing: Header flit A:2 is blocked while header flit A:1 waits to be routed at the neighbor node until the last header flit in this message (flit A:2) is routed; Message B can now be routed.

3.3 Deadlock Detection and Recovery Mechanism

In HTA, each node has a dedicated holding queue, called the deadlock queue. If a header flit currently under consideration cannot be routed to a channel in a predetermined amount of time (timeout delay), the header flit is considered to be in a potential deadlock situation and is routed to the deadlock queue at the current node. This timeout delay value will be further explored in Section 4. Once one of the header flits in a message has been assigned to the deadlock queue, all remaining header flits in the message must be routed as soon as they reach the top of the queue. If any remaining header flit cannot be immediately routed, it is also considered to be in a potential deadlock and is routed to the deadlock queue at the current node. Messages in the deadlock queue are reinjected into the network after a predetermined amount of time (reinjection delay). This reinjection delay will be further explored in Section 4. When the deadlock queue is full and another message is potentially deadlocked, an interrupt is generated and the message is absorbed into the current node. When space is available in the router's deadlock queue, the message is prefetched from the local processing node and moved to the deadlock queue in the router. Messages in the deadlock queue have priority over those messages that are newly generated at the same node. By allowing the overflow of messages to be stored in the local processing node, this deadlock queue becomes essentially infinite for all practical purposes without causing any additional delay in routing and eliminates the possibility of deadlock. Fig. 4 shows an example of a potentially deadlocked situation. Flit A:1 has been routed to a neighboring queue in the X dimension. Flit A:2 has not been routed in T cycles (timeout delay) and has been routed to the deadlock queue at the current node. Since this flit is potentially deadlocked, all remaining header flits in this message (Flit A:3) must now be immediately routed.
Since Flit A:3 requests the queue already occupied by Message B, this flit cannot be immediately routed to a free channel and must also be routed to the deadlock queue. After a predetermined amount of time (reinjection delay), the message that was routed to the deadlock queue is moved to the source queue at the current node and reinjected into the network.

Fig. 4. Potentially deadlocked situation: Flit A:1 has been routed to a neighboring queue. Flit A:2 has timed out and been routed to the deadlock queue, requiring flit A:3 to be immediately routed. Since flit A:3 requests the queue occupied by Message B, it must also be routed to the deadlock queue behind flit A:2.
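The timeout and reinjection rules of Section 3.3 can be sketched in Python as follows. This is an illustrative model, not the router logic of the paper: the names `header_flit_action` and `DeadlockQueue` are hypothetical, the delay constants are the 16 and 50 cycle values that the paper reports using in its simulations, and the choice to restart the reinjection delay when an overflow message is prefetched from the local processing node is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

TIMEOUT_CYCLES = 16      # timeout delay used in the paper's simulations
REINJECTION_CYCLES = 50  # reinjection delay used in the paper's simulations


def header_flit_action(can_route: bool, cycles_waited: int, message_flagged: bool) -> str:
    """Decide what happens to the header flit at the top of the queue.

    message_flagged is True once any header flit of the same message has
    already been sent to the deadlock queue.
    """
    if can_route:
        return "route"
    # A flagged message must route its remaining header flits immediately,
    # so a blocked flit goes to the deadlock queue without waiting.
    if message_flagged or cycles_waited >= TIMEOUT_CYCLES:
        return "deadlock_queue"
    return "wait"


@dataclass
class DeadlockQueue:
    capacity: int = 1  # the simulated router queue holds one multicast message
    entries: List[Tuple[str, int]] = field(default_factory=list)  # (message, cycle parked)
    overflow: List[str] = field(default_factory=list)  # absorbed by the local node

    def park(self, message_id: str, now: int) -> None:
        """Hold a potentially deadlocked message, spilling to the local node if full."""
        if len(self.entries) < self.capacity:
            self.entries.append((message_id, now))
        else:
            self.overflow.append(message_id)

    def reinject(self, now: int) -> List[str]:
        """Return messages whose reinjection delay has elapsed, then prefetch overflow."""
        ready = [m for m, t in self.entries if now - t >= REINJECTION_CYCLES]
        self.entries = [(m, t) for m, t in self.entries if now - t < REINJECTION_CYCLES]
        while self.overflow and len(self.entries) < self.capacity:
            # Assumption: the delay restarts when a message is prefetched.
            self.entries.append((self.overflow.pop(0), now))
        return ready
```

In the situation of Fig. 4, flit A:2 has waited for the full timeout and is sent to the deadlock queue, and flit A:3 then follows it at once because it cannot be routed immediately.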

3.4 HTA Router Implementation

The HTA router implementation is shown in Fig. 5 and uses one unidirectional physical channel (PC) per dimension per node. Six VCs are multiplexed over one PC, with three of the VCs dedicated to unicast and three dedicated to multicast. Only one VC is required for the multicast communication algorithm. However, three VCs are used here in order to decrease message blocking probability and to increase performance. At least one sink channel is used for each node. Once a sink channel is assigned to a message, it is not released until the whole message has finished its transmission. Storage buffers are associated with input (IP) channels, requiring the routing decision to be made after buffering the message. A crossbar is used to connect input buffers to output buffers, allowing the transfer of data flits to multiple VCs.

Fig. 5. Schematic of 2D router for HTA.

4 PERFORMANCE EVALUATION BY SIMULATION

Extensive simulation experiments were carried out to compare the performance of our proposed HTA scheme with two representative existing multicast schemes: Software Multicast (one unicast message is sent for every destination address in the multicast message) and Column-Path [3], [2]. In Column-Path, the destinations in a multicast message are placed into submulticast messages according to the column the destination is in. For example, in a unidirectional torus, at most k submulticast messages can be sent per multicast message, one submulticast message sent per column. A discrete-time simulator was used for 8-ary 2-cube and 16-ary 2-cube networks. Message sizes varied from 16 to 64 flits and the number of destinations per multicast message was randomly chosen and varied from eight to 32. The buffer sizes used in the simulation are all equal to a single message length. All router implementations use six VCs per dimension.
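To make the difference between the two baseline schemes concrete, the following Python sketch counts the messages each one injects for a single multicast. It is only an illustration: the function names are hypothetical, and treating the first coordinate of a node address as its column is an assumption, since the text does not say which coordinate identifies the column. Destination ordering within each submulticast message is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (column, row) in a k-ary 2-cube; coordinate order is assumed


def column_submulticasts(destinations: List[Node]) -> Dict[int, List[Node]]:
    """Group destinations by column, one submulticast message per column."""
    groups: Dict[int, List[Node]] = defaultdict(list)
    for column, row in destinations:
        groups[column].append((column, row))
    return dict(groups)


def software_multicast_count(destinations: List[Node]) -> int:
    """Software Multicast sends one unicast message per destination."""
    return len(destinations)
```

With destinations (1, 2), (1, 5), and (3, 0), Software Multicast injects three unicast messages, while the column grouping yields two submulticast messages, and never more than k of them in a k-ary 2-cube.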
The Software Multicast and Column-Path algorithms both use two VCs per dimension for deterministic routing and four for adaptive routing. The timeout and reinjection delays for all message sizes and number of destinations per message simulated here for HTA are 16 and 50 clock cycles, respectively. Fifty cycles is a feasible delay because the deadlock detection and recovery scheme is not a software-based approach. Instead, the deadlock queue that holds potentially deadlocked messages in HTA is located in the router. The deadlock queue can hold one multicast message with all overflow messages being absorbed by the local processing node. The communication startup time required for ordering the messages in the Column-Path algorithm is not included in the simulations. The time required for creating and placing the messages in the source queue is also not included for any of the routing algorithms simulated here. The simulations use a stabilization threshold of a difference between traffic 1,000 clock cycles apart to determine steady state. Traffic was varied from 0.1 until saturation was reached, in increments of 0.1. Simulations were performed for traffic composed of only multicast communication, only unicast communication, and half unicast and half multicast communication. To reduce the probability of deadlock near saturation, injection limitation schemes are often used [12], [19], [14]. In the simulations performed here, message injections were limited to three unicast messages and one multicast message in the source queue simultaneously. This back-pressure mechanism sometimes results in a fairly flat curve near saturation in some of the latency versus target traffic graphs. All implementations use 12 sink channels. Although this is an unusually high number of consumption channels, the Column-Path routing algorithm requires this many channels for deadlock freedom since the adaptive routing algorithm proposed in [6], [1] is used for the base routing conformed path (as opposed to e-cube routing, where fewer sink channels are required). For fairness, Software Multicast and HTA are also simulated with 12 sink channels, although both only require one sink channel. Fig. 6 shows the probability of deadlock versus normalized applied load for HTA. Fig. 7 shows the message latencies as well as the accepted load versus offered load plots of the three multicast routing algorithms for all multicast traffic for various message sizes and number of destination nodes. The remaining figures (Figs. 8, 9, and 10) show plots only for message latencies versus offered load since all other accepted traffic versus offered traffic graphs are similar to those in Fig. 7. When both unicast and multicast communication are simulated simultaneously, message latency includes both unicast and multicast latency. A discussion of these results is found in the following sections.

Fig. 6. Deadlock probabilities.

Fig. 7. All multicast communications for an 8-ary 2-cube network with message size = 64 flits.

Fig. 8. All multicast communication for a 16-ary 2-cube network with message size = 64 flits.

4.1 Deadlock Probability

Fig. 6 shows the probability of deadlock versus normalized applied load for HTA. The probability of deadlock is the total number of potentially deadlocked messages (PDM) divided by the number of messages that have reached their destinations. The probabilities are low except near saturation. Taking into account the differences in the simulations (e.g., timeout and reinjection delays, bidirectionality, switching type, message and network size), HTA's results are comparable to those reported for k-ary n-cubes under unicast traffic in [20], [23], [9], [7], [14].
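The deadlock probability metric defined above is a simple ratio, sketched here in Python. The function name and the counts in the usage note are hypothetical and are not taken from the paper's results.

```python
def deadlock_probability(potentially_deadlocked: int, delivered: int) -> float:
    """Potentially deadlocked messages (PDM) divided by delivered messages."""
    if delivered <= 0:
        raise ValueError("at least one message must have reached its destination")
    return potentially_deadlocked / delivered
```

For instance, a run in which 5 messages were flagged as potentially deadlocked while 1,000 messages reached their destinations would give a deadlock probability of 0.005.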

Fig. 9. All unicast communication with message size = 64 flits.

In schemes such as DISHA, when a potential deadlock occurs, one of the messages in the deadlocked set is removed from the network, using additional buffers at each node to route the message directly and immediately to its destination, and, therefore, the PDM does not have another chance of deadlocking [22], [23], [20]. In HTA, the PDM is removed from the network at the current node and is reinjected into the network after a given amount of time at the same node at which it deadlocked. The reinjected PDM may potentially deadlock again along the path to its destination node. HTA does not require as much complex logic, is scalable, and does not have a single point of failure. Increasing the number of destinations per message results in a greater probability that a destination will block, increasing deadlock probability. The greater the message size, the greater the deadlock probability because more resources are occupied. Increasing network size results in a longer path between source and destination and also results in greater deadlock probability.

4.2 Multicast Latency

4.2.1 Message Latency

Fig. 7 shows that HTA performs best among all three algorithms at all utilizations and for all message sizes, even without including the time for destination ordering in the Column-Path algorithm. This is because the probability of message blocking has been kept low, the deadlock detection and recovery algorithm is efficient, and tree-based routing does not unnecessarily copy data flits when routing multicast messages.
As the number of destinations per message increases, the Column-Path algorithm performs better (although not as well as HTA) because more destinations can be grouped together, requiring a smaller number of submulticast messages to be sent per multicast message Saturation Point HTA always has the highest saturation point since channels are only used and data flits are only copied when necessary. Traffic in the network is kept low, resulting in increased saturation points Effects of Network Size HTA's performance increases as the 8-ary 2-cube network is increased in size to a 16-ary 2-cube network (Fig. 8). Its performance is always better than the other two algorithms for all message sizes. The Column-Path algorithm performance suffers slightly due to the lower probability that messages will fall in the same column and therefore uses more submulticast messages. The HTA is much more flexible with respect to topology and its latency remains low Effects of Traffic Type (Unicast vs. Multicast) When traffic is composed of all unicast messages (Fig. 9), the Column-Path and Software Multicast algorithms give similar performance because both these algorithms use the same unicast routing scheme and have the same number of

10 10 IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 7, JULY 2001 Fig. 10. Fifty percent unicast and 50 percent multicast communications for an 8-ary 2-cube network with message size = 64 flits. VCs devoted to unicast messages. The slight variances are simply due to the random generation of messages. At low utilization, HTA performs best because only three VCs per dimension are devoted to unicast communication, while the other two algorithms devote all six VCs. Having a smaller number of VCs per dimension means less message multiplexing, resulting in lower message latencies. At high utilization, the Column-Path and Software Multicast algorithms perform better because each of these algorithms has six VCs per dimension for unicast messages. Six VCs provide greater adaptivity for messages to be routed around blocked messages at high utilization, resulting in higher saturation points. When traffic is composed of half unicast and half multicast traffic (Fig. 10), unicast and multicast latency versus applied load graphs are shown. HTA performs comparably to or better than the other algorithms for all messages sizes and utilization. 5 COMPARISON OF HTA WITH EXISTING SCHEMES HTA differs in many important respects from the previously proposed tree-based routing algorithms [10], [13], [5], [4]. Below, we provide a detailed comparison of HTA with the existing schemes.. In most tree-based routing algorithms, wormhole switching is implemented. Virtual cut-through switching can greatly decrease the probability of deadlock over wormhole switching when a fixed number of virtual channels are used [23]. Although, in [5], [4], cut-through switching is used, the switching is implemented using a common buffer pool and the buffer is located in the local processing node (not in the router itself). In HTA, a buffer is implemented at every VC and every buffer is located in the router. This greatly increases performance.. 
In [5], [4], only one PC (with no VCs per PC) is used per dimension and only one sink channel is implemented. Increasing the number of VCs increases routing freedom, which in turn exponentially decreases the probability of deadlock [23], [20] and results in better performance.

Although [5], [4] use a deadlock detection and recovery scheme, the HTA scheme is more efficient. When a header flit blocks for a predetermined amount of time in HTA, it is routed to the deadlock queue. All remaining header flits are then immediately routed, first, to any previously routed channel, then to any available and applicable channel, and, finally, to the deadlock queue if no other routing option remains.

The HTA scheme improves upon the deadlock recovery method in [5], [4], in which all header flits (including those that have already been routed at the current node and at any downstream node) are aborted. In addition, their deadlock recovery scheme always copies the entire message to the local processing node when a multicast is split, whether deadlock occurs or not. When an abort does happen, the message is already stored at the local node and is ready to be reinjected into the network after a given amount of time. However, this method wastes valuable channel bandwidth and causes contention in the network if two multicast channels request the split channel.

Timeout values proposed for deadlock detection and recovery schemes include timeouts equivalent to the size of the message [9], four times the message length [14], 8-16 cycles [21], and 800-1,000 cycles [19]. The timeout requirements of tree-based multicast communication differ slightly from those of unicast communication. In tree-based multicast, more than one header flit is usually routed at each node. If the timeout for each header flit is too long, message progress through the network will be very slow, since data flits are not forwarded until all header flits are routed. If the timeout is too short, many false deadlocks will be detected. HTA uses 16 cycles for all message sizes and numbers of destinations per message.

Reinjection delays for unicast communication with wormhole switching in k-ary n-cube networks are around 200 cycles [19], [14]. Deadlock detection and recovery techniques similar to DISHA [21] do not require reinjection delays because potentially deadlocked messages are immediately routed to their destination using "floating buffers." HTA uses a reinjection delay of 50 cycles.
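The per-cycle decision for a multicast header flit described above, together with the 16-cycle timeout and the 50-cycle reinjection delay, can be sketched as follows. This is a minimal illustration that loosely follows the appendix pseudocode, not the authors' simulator: every name is hypothetical, channels are reduced to one per dimension, and the round-robin choice among previously routed channels is simplified to taking the first candidate. It assumes minimal routing over unidirectional channels, as HTA uses.

```python
from dataclasses import dataclass

TIMEOUT_THRESHOLD = 16  # cycles a header flit may block (value given in the text)
REINJECT_DELAY = 50     # cycles before a recovered message is reinjected (from the text)


@dataclass
class HeaderFlit:
    dest: tuple        # destination node coordinates, one per dimension
    timeout: int = 0   # cycles this flit has spent blocked at the current node


def remaining_hops(cur, dest, k):
    """Hops left in each dimension of a k-ary n-cube with unidirectional channels."""
    return [(d - c) % k for c, d in zip(cur, dest)]


def route_header_flit(flit, cur, k, routed_dims, free_dims, sibling_timed_out):
    """Decide what to do with one multicast header flit in the current cycle.

    routed_dims: dimensions whose channel another header flit of this message
                 already holds at this node.
    free_dims:   dimensions that currently have a free multicast channel.
    Returns ("channel", dim), ("deadlock_q", None) or ("wait", None).
    """
    hops = remaining_hops(cur, flit.dest, k)
    useful = [dim for dim, h in enumerate(hops) if h > 0]  # minimal routing only

    # 1. Share a channel already allocated to this message (the appendix uses
    #    round-robin among them; taking the first is a simplification).
    for dim in useful:
        if dim in routed_dims:
            return ("channel", dim)

    # 2. Otherwise take a free channel, trying the dimension with the greatest
    #    remaining distance first.
    for dim in sorted(useful, key=lambda d: hops[d], reverse=True):
        if dim in free_dims:
            return ("channel", dim)

    # 3. If another header flit of this message already timed out here, follow it.
    if sibling_timed_out:
        return ("deadlock_q", None)

    # 4. Otherwise keep waiting until the timeout expires.
    flit.timeout += 1
    if flit.timeout > TIMEOUT_THRESHOLD:
        return ("deadlock_q", None)
    return ("wait", None)


def reinjection_cycle(cycle_entered_deadlock_q):
    """Cycle at which a message in the deadlock queue returns to the source queue."""
    return cycle_entered_deadlock_q + REINJECT_DELAY
```

For a flit at node (1, 1) of an 8-ary 2-cube heading to (3, 6), the remaining distances are 2 and 5 hops, so a free channel in dimension 1 is preferred. A flit with no routing option waits for 16 cycles and is sent to the deadlock queue on the next one.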

It was shown in [23] that bidirectional networks have a lower deadlock probability than unidirectional networks. Deadlock recovery schemes using bidirectional channels include [9], [19], [23], [14]. Unidirectional channels are addressed in [23], [7]. HTA uses unidirectional channels.

The QBM scheme [24] is a deadlock-free algorithm, which limits its routing options to only those paths that preserve deadlock freedom. The scheme also requires a startup delay for building a QBM tree at the beginning of a user-level multicast, making it more suitable for bulk multicasts. HTA has more routing flexibility, since any available path is a valid routing option in its deadlock detection and recovery scheme, and it does not require any additional startup delay.

Several machines already have some hardware support for multicast. The nCUBE-2 is a wormhole-switched hypercube that supports broadcast within each subcube [15]. However, deadlock is possible if multiple multicasts exist [16]. The NEC Cenju-3 supports broadcast within each continuous region, but deadlock is once again possible if multiple multicasts exist. Finally, the Thinking Machines Corporation (TMC) CM-5 supports one multicast at a time via the control network.

6 CONCLUSIONS

In this paper, we introduce a new fully adaptive minimal hardware tree-based routing algorithm (HTA) for multicast communication under virtual cut-through switching in k-ary n-cubes and present an experimental evaluation of its performance under different operating conditions. HTA is compatible with existing unicast routing algorithms and uses deadlock detection and recovery. Our experimental results demonstrate that the deadlock probability of the proposed scheme is comparable to that of other existing schemes: it varies between 0 and 15 percent and rises to as much as 30 percent only near saturation.
HTA performs very well and can outperform both the Software Multicast and Column-Path algorithms. The superiority of the proposed multicast routing algorithm is due to its ability to keep the probability of message blocking low at each intermediate node along a multicast message's path.

APPENDIX
PSEUDOCODE OF THE HTA SCHEME

Note:
  i  = current node that flit is at
  i' = previous node that flit was at
  j  = current queue that flit is in
  j' = previous queue that flit was in
  (all other variable meanings should be obvious from the context of the pseudocode)

FUNCTION 1:
function route_msg_at_node ()
  delayed_header_flit_condition = FALSE;
  for (flit=0; flit<num_flits_in_msg; flit++)
    if flit = header flit
      while delayed_header_flit_condition = FALSE
        num_flits_routed = 0;
        for (k=0; k<num_header_flits_in_msg; k++)
          if node[i'].queue[j'].flit[k].routed = TRUE then
            num_flits_routed++;
        if num_flits_routed = num_header_flits_in_msg then
          delayed_header_flit_condition = TRUE;
      route_header_flit_at_top_of_queue (flit);
    else
      forward data flit to all allocated paths for this msg at this node;
  if at least one header flit in current msg has been routed to deadlock q at current node then
    while node[i].deadlock_q.reinject_delay < reinject_threshold
      node[i].deadlock_q.reinject_delay++;
    Place message from deadlock q into source q;

FUNCTION 2:
function route_header_flit_at_top_of_queue(header_flit)
  if header_flit = unicast then
    Route header_flit using unicast deadlock-free algorithm;
  else
    if header_flit can be routed to at least one channel already routed to by another flit in this msg then
      Route to a previously routed channel using a round-robin policy among all previously routed channels;
      exit;
    else if header_flit can be routed to one or more free multicast channels at this node then
      Prioritize dimensions for this current header flit so that the dimension with the greatest distance this flit has to travel has the highest priority (priority n), followed by the next greatest distance this flit has left to travel (with priority n-1), and so on;
      for (p=n; p>0; p--)
        if can route current header_flit to any free multicast channel in the dimension with priority p then
          Route header flit on a free multicast channel in dimension with priority p;
          exit;
    else if any previous flit in this message at this node timed out then
      Route header flit to deadlock q;
      exit;
    else
      header_flit.timeout++;
      if header_flit.timeout > threshold then
        Route header_flit to deadlock q;
        exit;

REFERENCES

[1] P. Berman, L. Gravano, G. Pifarre, and J. Sanz, "Adaptive Deadlock and Livelock Free Routing with All Minimal Paths in Torus Networks," Proc. Symp. Parallel Algorithms and Architectures, pp. 3-12.
[2] R. Boppana, S. Chalasani, and C. Raghavendra, "On Multicast Wormhole Routing in Multicomputer Networks," Proc. Symp. Parallel and Distributed Processing, Oct.
[3] R. Boppana, S. Chalasani, and C. Raghavendra, "Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms," IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 6, June.
[4] G. Byrd, R. Nakano, and B. Delagi, "A Dynamic Cut-Through Communication Protocol with Multicast," Technical Report STAN-CS, Stanford Univ., Aug.
[5] G. Byrd, N. Saraiya, and B. Delagi, "Multicast Communication in Multiprocessor Systems," Proc. Int'l Conf. Parallel Processing, vol. 1, Aug.
[6] J. Duato, "A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks," Proc. Symp. Parallel and Distributed Processing.
[7] A. Folkestad and C. Roche, "Deadlock Probability in Unrestricted Wormhole Routing Networks," Proc. IEEE Int'l Conf. Comm., June.
[8] P. Kermani and L. Kleinrock, "Virtual Cut-Through: A New Computer Communication Switching Technique," Computer Networks, vol. 3.
[9] J. Kim, Z. Liu, and A. Chien, "Compressionless Routing," Proc. Int'l Symp. Computer Architecture (ISCA), Apr.
[10] X. Lin, P. McKinley, and L. Ni, "Performance Evaluation of Multicast Worm-Hole Routing in 2D Mesh Multicomputers," Proc. Int'l Conf. Parallel Processing.
[11] X. Lin and L.M. Ni, "Deadlock-Free Multicast Wormhole Routing in Multicomputer Networks," Proc. Int'l Symp. Computer Architecture.
[12] P. Lopez, J. Martinez, J. Duato, and F. Petrini, "On the Reduction of Deadlock Frequency by Limiting Message Injection in Wormhole Networks," Proc. Parallel Computing, Routing, and Comm. Workshop, June.
[13] M. Malumbres, J. Duato, and J. Torrellas, "An Efficient Implementation of Tree-Based Multicast Routing for Distributed Shared-Memory Multiprocessors," Proc. Eighth IEEE Symp. Parallel and Distributed Processing, Oct.
[14] J. Martinez, P. Lopez, J. Duato, and T. Pinkston, "Software-Based Deadlock Recovery Technique for True Fully Adaptive Routing in Wormhole Networks," Proc. Int'l Conf. Parallel Processing, Aug.
[15] NCUBE Co., NCUBE 6400 Processor Manual.
[16] L. Ni, "Should Scalable Parallel Computers Support Efficient Hardware Multicast," Proc. Int'l Conf. Parallel Processing.
[17] L.M. Ni and P.K. McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks," Computer.
[18] D.K. Panda, S. Singal, and P. Prabhakaran, "Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme," Proc. First Parallel Routing and Comm. Workshop.
[19] F. Petrini, J. Duato, P. Lopez, and J. Martinez, "LIFE: A Limited Injection, Fully Adaptive, Recovery-Based Routing Algorithm," Proc. Fourth Int'l Conf. High Performance Computing, Dec.
[20] T. Pinkston and S. Warnakulasuriya, "On Deadlocks in Interconnection Networks," Proc. Int'l Symp. Computer Architecture, June.
[21] K.V. Anjan and T. Pinkston, "An Efficient, Fully Adaptive Deadlock Recovery Scheme: DISHA," Computer Architecture News, vol. 23, no. 2, May.
[22] K.V. Anjan and T. Pinkston, "An Efficient, Fully Adaptive Deadlock Recovery Scheme: DISHA," Proc. Int'l Symp. Computer Architecture.
[23] S. Warnakulasuriya and T. Pinkston, "Characterization of Deadlocks in Interconnection Networks," Proc. Int'l Parallel Processing Symp., Apr.
[24] J. Yang and C. King, "Efficient Tree-Based Multicast in Wormhole-Routed 2D Meshes," Proc. Int'l Symp. Parallel Architectures, Algorithms, and Networks.

Dianne Kumar received the BS degree in applied physics from Xavier University in 1992 and the MS degree in electrical engineering and the PhD degree in computer science from Colorado State University in 1994 and 1999, respectively. She is currently an assistant professor in the Department of Computer Science and Engineering at the University of Colorado at Denver. Her current research interests include multiprocessor systems, interconnection networks, parallel and distributed computing, and networking. She is a member of the IEEE and of the ACM.

Walid A. Najjar received the BE degree in electrical engineering from the American University of Beirut in 1979 and the MS and PhD degrees in computer engineering from the University of Southern California in 1985 and 1988, respectively. He is an associate professor in the Department of Computer Science and Engineering at the University of California Riverside. He was on the faculty of the Department of Computer Science at Colorado State University from 1989 to 2000; before that, he was with the USC Information Sciences Institute. His research interests include computer architecture, reconfigurable and embedded systems, parallel computing systems, and interconnection networks.

Pradip K. Srimani is a professor and chair of the Department of Computer Science at Clemson University. He has previously served on the faculty of the Indian Statistical Institute, Calcutta; Gesellschaft für Mathematik und Datenverarbeitung, Bonn, West Germany; the Indian Institute of Management, Calcutta, India; Southern Illinois University, Carbondale, Illinois; Colorado State University, Ft. Collins, Colorado; and the Technical University of Compiegne, France. He was the editor-in-chief of the IEEE Computer Society Press and is an associate editor of the IEEE Transactions on Data and Knowledge Engineering and a contributing member of IEEE Software. His research interests include mobile computing, distributed computing, parallel algorithms, networks, and graph theory applications. He is a co-editor of two books, on software reliability and on distributed mutual exclusion algorithms, published by IEEE CS Press. He has guest-edited special issues of Computer, IEEE Software, VLSI Design, Journal of Systems & Software, Journal of Computer & Software Engineering, IEEE Transactions on Software Engineering, Parallel Computing, and International Journal of Systems Science. He is a member of the ACM/IEEECS Steering Committee on Curricula. He is a fellow of the IEEE and a member of the ACM.

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ E. Baydal, P. López and J. Duato Depto. Informática de Sistemas y Computadores Universidad Politécnica de Valencia, Camino

More information

A Hybrid Interconnection Network for Integrated Communication Services

A Hybrid Interconnection Network for Integrated Communication Services A Hybrid Interconnection Network for Integrated Communication Services Yi-long Chen Northern Telecom, Inc. Richardson, TX 7583 kchen@nortel.com Jyh-Charn Liu Department of Computer Science, Texas A&M Univ.

More information

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 9, NO. 6, JUNE 1998 535 Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms Rajendra V. Boppana, Member, IEEE, Suresh

More information

The Odd-Even Turn Model for Adaptive Routing

The Odd-Even Turn Model for Adaptive Routing IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 11, NO. 7, JULY 2000 729 The Odd-Even Turn Model for Adaptive Routing Ge-Ming Chiu, Member, IEEE Computer Society AbstractÐThis paper presents

More information

Optimal Topology for Distributed Shared-Memory. Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres

Optimal Topology for Distributed Shared-Memory. Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres Optimal Topology for Distributed Shared-Memory Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres Facultad de Informatica, Universidad Politecnica de Valencia P.O.B. 22012, 46071 - Valencia,

More information

A New Theory of Deadlock-Free Adaptive Multicast Routing in. Wormhole Networks. J. Duato. Facultad de Informatica. Universidad Politecnica de Valencia

A New Theory of Deadlock-Free Adaptive Multicast Routing in. Wormhole Networks. J. Duato. Facultad de Informatica. Universidad Politecnica de Valencia A New Theory of Deadlock-Free Adaptive Multicast Routing in Wormhole Networks J. Duato Facultad de Informatica Universidad Politecnica de Valencia P.O.B. 22012, 46071 - Valencia, SPAIN E-mail: jduato@aii.upv.es

More information

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes EE482, Spring 1999 Research Paper Report Deadlock Recovery Schemes Jinyung Namkoong Mohammed Haque Nuwan Jayasena Manman Ren May 18, 1999 Introduction The selected papers address the problems of deadlock,

More information

Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent

Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent Anjan K. V. Timothy Mark Pinkston José Duato Pyramid Technology Corp. Electrical Engg. - Systems Dept.

More information

DUE to the increasing computing power of microprocessors

DUE to the increasing computing power of microprocessors IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 13, NO. 7, JULY 2002 693 Boosting the Performance of Myrinet Networks José Flich, Member, IEEE, Pedro López, M.P. Malumbres, Member, IEEE, and

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

MESH-CONNECTED networks have been widely used in

MESH-CONNECTED networks have been widely used in 620 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 5, MAY 2009 Practical Deadlock-Free Fault-Tolerant Routing in Meshes Based on the Planar Network Fault Model Dong Xiang, Senior Member, IEEE, Yueli Zhang,

More information

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS*

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS* SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS* Young-Joo Suh, Binh Vien Dao, Jose Duato, and Sudhakar Yalamanchili Computer Systems Research Laboratory Facultad de Informatica School

More information

Software-Based Deadlock Recovery Technique for True Fully Adaptive Routing in Wormhole Networks

Software-Based Deadlock Recovery Technique for True Fully Adaptive Routing in Wormhole Networks Software-Based Deadlock Recovery Technique for True Fully Adaptive Routing in Wormhole Networks J. M. Martínez, P. López, J. Duato T. M. Pinkston Facultad de Informática SMART Interconnects Group Universidad

More information

Wormhole Routing Techniques for Directly Connected Multicomputer Systems

Wormhole Routing Techniques for Directly Connected Multicomputer Systems Wormhole Routing Techniques for Directly Connected Multicomputer Systems PRASANT MOHAPATRA Iowa State University, Department of Electrical and Computer Engineering, 201 Coover Hall, Iowa State University,

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

Fault-Tolerant Routing Algorithm in Meshes with Solid Faults

Fault-Tolerant Routing Algorithm in Meshes with Solid Faults Fault-Tolerant Routing Algorithm in Meshes with Solid Faults Jong-Hoon Youn Bella Bose Seungjin Park Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Oregon State University

More information

Characterization of Deadlocks in Interconnection Networks

Characterization of Deadlocks in Interconnection Networks Characterization of Deadlocks in Interconnection Networks Sugath Warnakulasuriya Timothy Mark Pinkston SMART Interconnects Group EE-System Dept., University of Southern California, Los Angeles, CA 90089-56

More information

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors Govindan Ravindran Newbridge Networks Corporation Kanata, ON K2K 2E6, Canada gravindr@newbridge.com Michael

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Efficient Communication in Metacube: A New Interconnection Network

Efficient Communication in Metacube: A New Interconnection Network International Symposium on Parallel Architectures, Algorithms and Networks, Manila, Philippines, May 22, pp.165 170 Efficient Communication in Metacube: A New Interconnection Network Yamin Li and Shietung

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

EE 6900: Interconnection Networks for HPC Systems Fall 2016

EE 6900: Interconnection Networks for HPC Systems Fall 2016 EE 6900: Interconnection Networks for HPC Systems Fall 2016 Avinash Karanth Kodi School of Electrical Engineering and Computer Science Ohio University Athens, OH 45701 Email: kodi@ohio.edu 1 Acknowledgement:

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Connection-oriented Multicasting in Wormhole-switched Networks on Chip

Connection-oriented Multicasting in Wormhole-switched Networks on Chip Connection-oriented Multicasting in Wormhole-switched Networks on Chip Zhonghai Lu, Bei Yin and Axel Jantsch Laboratory of Electronics and Computer Systems Royal Institute of Technology, Sweden fzhonghai,axelg@imit.kth.se,

More information

Boosting the Performance of Myrinet Networks

Boosting the Performance of Myrinet Networks IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. Y, MONTH 22 1 Boosting the Performance of Myrinet Networks J. Flich, P. López, M. P. Malumbres, and J. Duato Abstract Networks of workstations

More information

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract.

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract. Fault-Tolerant Routing in Fault Blocks Planarly Constructed Dong Xiang, Jia-Guang Sun, Jie and Krishnaiyan Thulasiraman Abstract A few faulty nodes can an n-dimensional mesh or torus network unsafe for

More information

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing J. Flich, M. P. Malumbres, P. López and J. Duato Dpto. Informática de Sistemas y Computadores Universidad Politécnica

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N. Interconnection topologies (cont.) [ 10.4.4] In meshes and hypercubes, the average distance increases with the dth root of N. In a tree, the average distance grows only logarithmically. A simple tree structure,

More information

Adaptive Multimodule Routers

Adaptive Multimodule Routers daptive Multimodule Routers Rajendra V Boppana Computer Science Division The Univ of Texas at San ntonio San ntonio, TX 78249-0667 boppana@csutsaedu Suresh Chalasani ECE Department University of Wisconsin-Madison

More information

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns H. H. Najaf-abadi 1, H. Sarbazi-Azad 2,1 1 School of Computer Science, IPM, Tehran, Iran. 2 Computer Engineering

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? J. Flich 1,P.López 1, M. P. Malumbres 1, J. Duato 1, and T. Rokicki 2 1 Dpto. Informática

More information

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a

More information

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Deadlock and Livelock. Maurizio Palesi

Deadlock and Livelock. Maurizio Palesi Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

True fully adaptive routing employing deadlock detection and congestion control.

True fully adaptive routing employing deadlock detection and congestion control. True fully adaptive routing employing deadlock detection and congestion control. 16 May, 2001 Dimitris Papadopoulos, Arjun Singh, Kiran Goyal, Mohamed Kilani. {fdimitri, arjuns, kgoyal, makilani}@stanford.edu

More information

Generic Methodologies for Deadlock-Free Routing

Generic Methodologies for Deadlock-Free Routing Generic Methodologies for Deadlock-Free Routing Hyunmin Park Dharma P. Agrawal Department of Computer Engineering Electrical & Computer Engineering, Box 7911 Myongji University North Carolina State University

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ

Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ J. Flich, P. López, M. P. Malumbres, and J. Duato Dept. of Computer Engineering

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

3-ary 2-cube. processor. consumption channels. injection channels. router

3-ary 2-cube. processor. consumption channels. injection channels. router Multidestination Message Passing in Wormhole k-ary n-cube Networks with Base Routing Conformed Paths 1 Dhabaleswar K. Panda, Sanjay Singal, and Ram Kesavan Dept. of Computer and Information Science The

More information
