Characterization of Deadlocks in Interconnection Networks


Sugath Warnakulasuriya, Timothy Mark Pinkston
SMART Interconnects Group, EE-System Dept., University of Southern California, Los Angeles, CA

Abstract

Deadlock-free routing algorithms have been developed recently without a full understanding of the frequency and characteristics of deadlocks. Using a simulator capable of true deadlock detection, we measure a network's susceptibility to deadlock as a function of various design parameters. The effects of bidirectionality, routing adaptivity, virtual channels, buffer size and node degree on deadlock formation are studied. In the process, we provide insight into the frequency and characteristics of deadlocks and the relationship between routing flexibility, blocked messages, resource dependencies and the degree of correlation needed to form deadlock.

1 Introduction

Interconnection network routing algorithms aim to minimize message blocking by efficiently utilizing network virtual channel and physical channel resources while ensuring deadlock freedom. Routing approaches to accomplish this can be based on avoiding deadlock or on recovering from deadlock. The main distinction between these two approaches is the decision made in trading off routing freedom against deadlock formation. Avoidance-based routing algorithms enforce certain routing restrictions in order to avoid deadlocks altogether [1, 2, 3]. Recovery-based routing algorithms relax routing restrictions and recover from potential deadlock situations [4, 5].

The circumstances under which either routing approach is preferable depend critically on the frequency with which deadlocks occur and the resulting effects. For instance, deadlock may be so infrequent for a particular network configuration that avoidance-based routing uses network resources inefficiently, resulting in frequent message blocking.
On the other hand, deadlock may be so frequent and costly in some network configurations that avoidance-based routing outperforms recovery-based routing. This paper precisely quantifies the frequency and characteristics of deadlock formation in wormhole and cut-through k-ary n-cube networks and identifies network design parameters which influence deadlock formation. This enables us to better understand the nature of deadlocks and their likelihood and to determine the circumstances under which routing algorithms should be based on recovery as opposed to avoidance. In accomplishing this, we analyze the effects of different traffic patterns, bidirectionality, routing adaptivity, node degree, number of virtual channels and buffer depth on the frequency and characteristics of deadlocks. To our knowledge, no other study of router-related deadlock in interconnection networks has been performed at the level of detail presented here.

In the next section, we classify deadlocks through example. Section 3 presents the experiments we performed and the results. Section 4 presents related work, and important findings are summarized in Section 5.

(This research was supported by an NSF Research Initiation Award, grant ECS , and an NSF Career Award, grant ECS .)

2 Deadlock Formation

Deadlocks in interconnection networks can occur as a result of cyclic resource dependencies formed when messages hold onto some resources (i.e., virtual channels) while waiting to acquire others. As a message progresses through a network, it acquires exclusive ownership of a virtual channel (VC) prior to each hop. When the header flit of a message blocks, it can be thought of as requesting the exclusive use of one of possibly many alternative VCs in order to progress to the next hop. A blocked message resumes once a new VC is acquired. As the tail of a message moves through the network, it releases previously acquired VCs that are no longer needed, so they can become available for other messages.
The exclusive ownership and resource wait-for conditions, along with the condition that messages are not preempted, make cyclic dependencies and deadlock possible.

2.1 Depicting Deadlocks

We use channel wait-for graphs (CWGs) [6] to model resource dependencies within interconnection networks. Although similar to dependency graphs used in previous work [, 7, 8], these graphs depict network state reflecting resource allocations and requests existing at a particular point in time, not the resource allocations allowed by the routing algorithm. Hence, in this context, CWGs depicting the entire network state are not necessarily connected. Figures 1 through 4 show examples of messages being routed in k-ary n-cube wormhole networks, along with the corresponding CWGs. In the network illustrations (Figures 1a, 2a, 3a and 4a), the source and destination nodes of message m_i are labeled s_i and d_i, respectively. VC labeling in these figures is done only to facilitate explanation and is not

intended to convey information regarding the relative positions of VCs within the network.

Figure 1. (a) A "single-cycle deadlock" for DOR with 1 VC. (b) The CWG contains a knot.

In the CWGs (Figures 1b, 2b, 3b and 4b), vertices represent VCs. Outgoing arc(s) at each vertex are labeled with the message which currently owns that VC. A path formed by a series of solid arcs with the same label gives the temporal order in which VCs were acquired and continue to be owned by a particular message. Blocked messages are represented by connecting the ends of such paths to one or more desired VCs using dashed arcs. At any vertex, the labels of incoming dashed arcs represent the group of messages that desire to use that VC at this instant in time. Only those portions of the network's CWGs useful for illustrative purposes are shown in these figures.

Figure 1a shows five messages being routed statically in dimension order within a torus-connected network with one VC. Note that three of the messages are blocked while the other two have acquired all of the channels needed to reach their destinations. The first blocked message has acquired channels 1 and 2, and requires 3 to continue. Similarly, the second has acquired channels 3, 4, and 5, and requires 6 to continue; the third has acquired channels 6, 7, and 0, and requires 1 to continue. Thus, each of these blocked messages will wait indefinitely for one of the other messages in the group to release an owned VC. Figure 1b shows the CWG for the scenario in Figure 1a. There is a single cycle in this graph consisting of vertices 0, 1, 2, 3, 4, 5, 6 and 7. Given the set of all resources involved in this cycle, R = {0, 1, 2, 3, 4, 5, 6, 7}, observe that the set of vertices that can be reached by each and every member of R is R itself.
This type of relationship formed by vertices in one or more cycles is referred to as a knot [9]. Assuming that the routing function is connected, a knot is a necessary and sufficient condition for deadlock [6].

2.2 Classifying Deadlocks

2.2.1 Single-Cycle Deadlocks

Deadlock can be characterized by its deadlock set, resource set, and knot cycle density. The deadlock depicted in Figure 1 is what we refer to as a single-cycle deadlock. In this example, the deadlock involves 3 messages in its deadlock set, occupies 8 channels in its resource set {0, 1, 2, 3, 4, 5, 6, 7}, and has a knot cycle density of one cycle (true of all single-cycle deadlocks).

Figure 2. (a) A "single-cycle deadlock" for minimal adaptive routing with 1 VC. (b) The CWG contains a knot.

Single-cycle deadlocks are more likely to occur in networks having minimal resources and/or highly restrictive routing options on available resources. In the above example (Figure 1) of a torus network with one VC that allows only non-adaptive (static) dimension ordered routing, the routing function returns at most a single channel option. This is reflected in the CWG by a single dashed outgoing arc at any vertex in Figure 1b (maximum fan-out of one). In such a network, a single cycle is sufficient to form a knot. However, for this to occur, a correlated resource dependency among multiple messages must form.

Single-cycle deadlocks are also possible in networks which use less restrictive routing (e.g., minimal adaptive routing with only one VC) when only one routing option is available to all messages comprising the deadlock set (e.g., due to faulty links or routing in the destination's dimension). An example is illustrated in Figures 2a and 2b. Here, each of the four messages in the group has acquired VCs, exhausted its routing adaptivity, and is therefore waiting to acquire the one channel needed to reach its destination. However, the required channels are already owned by members of this group of messages.
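The knot condition stated above (every vertex of the set reaches exactly the set itself) can be checked directly with a reachability search. The following Python sketch is only an illustration under stated assumptions: the function names `build_cwg`, `reachable` and `is_knot`, the message names `a` to `d`, and the channel assignments are hypothetical and loosely modeled on the single-cycle scenario; they are not taken from the authors' simulator.

```python
from collections import deque


def build_cwg(owned, wanted):
    """Build a channel wait-for graph as {vc: set of successor vcs}.

    owned:  message -> VCs in the order they were acquired (solid arcs).
    wanted: message -> VCs its blocked header is waiting for (dashed arcs).
    Both dictionaries are hypothetical inputs chosen for this sketch.
    """
    graph = {}
    for msg, chain in owned.items():
        for vc in chain:
            graph.setdefault(vc, set())
        for earlier, later in zip(chain, chain[1:]):
            graph[earlier].add(later)          # solid arc: acquisition order
        for vc in wanted.get(msg, []):
            graph.setdefault(vc, set())
            graph[chain[-1]].add(vc)           # dashed arc: requested VC
    return graph


def reachable(graph, start):
    """Vertices reachable from start along one or more arcs."""
    seen = set()
    queue = deque(graph[start])
    while queue:
        vertex = queue.popleft()
        if vertex not in seen:
            seen.add(vertex)
            queue.extend(graph[vertex])
    return seen


def is_knot(graph, candidate):
    """True if every vertex in candidate reaches exactly candidate."""
    candidate = set(candidate)
    return bool(candidate) and all(
        reachable(graph, vertex) == candidate for vertex in candidate
    )


# Three blocked messages forming one cycle over channels 0..7, plus a
# dependent message "d" that waits on the cycle but is not part of it.
owned = {"a": [1, 2], "b": [3, 4, 5], "c": [6, 7, 0], "d": [8, 9]}
wanted = {"a": [3], "b": [6], "c": [1], "d": [0]}
cwg = build_cwg(owned, wanted)

print(is_knot(cwg, range(8)))    # True: the cycle is a knot
print(is_knot(cwg, range(10)))   # False: channels 8 and 9 are not reachable from the cycle
```

The second call shows why a message that merely waits on a deadlocked channel does not enlarge the knot: its channels can reach the cycle, but the cycle cannot reach them.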
The CWG (Figure 2b) contains a single cycle, and the vertices in this cycle form a knot, R = {1, 3, 5, 7}. Hence, with a knot cycle density of one, this too is a single-cycle deadlock; its deadlock set contains 4 messages and its resource set includes 8 channels {0, 1, 2, 3, 4, 5, 6, 7}. This single-cycle deadlock requires not only that all of the messages in the deadlock set have exhausted their adaptivity, but also that they own all of the resources needed by other messages in the deadlock set. Therefore, an even higher degree of correlation of message resource dependency is required for this type of deadlock to occur.

In this example, another message has acquired 8 and 9, and is waiting for a VC owned by a message which is involved in the deadlock. Although the message is not able to

proceed until the deadlock is resolved, it is not considered to be in the deadlock set, as its resources do not meet the condition for participation in a knot as described previously. This type of message is referred to as a dependent message and is distinguished from those messages actually in the deadlock set. The usefulness of this distinction is evident when developing deadlock detection mechanisms for recovery-based routers. The detection mechanism must be careful not to identify dependent messages as being among those properly in the deadlock set, as removing them from the network will not resolve the deadlock. Moreover, dependent messages may be transient in that they may be able to proceed using an alternate resource not owned by one of the messages in the deadlock set.

2.2.2 Multi-Cycle Deadlocks

Figure 3. (a) A "multi-cycle deadlock" for minimal adaptive routing with 2 VCs. (b) The CWG contains a knot.

Figures 3a and 3b depict the network and the CWG for a more complex example of a deadlock, one comprised of multiple resource dependency cycles. This network uses minimal adaptive routing and two VCs per physical channel. Once again, all messages (m_1 ... m_8) have exhausted their adaptivity and are blocked. Each message is waiting to acquire one of two VCs needed to continue routing, both of which are owned by other members of the group. There are multiple unique cycles in the CWG. The set of all vertices involved in this group of cycles, R = {1, 3, 5, 7, 9, 11, 13, 15}, meets the requirement for a knot.
This is an example of what is referred to as a multi-cycle deadlock; its deadlock set has 8 messages {m_1 ... m_8}, its resource set has 16 VCs {0, 1, ..., 15}, and its knot cycle density is 4 cycles. CWGs similar to Figure 3b, where there are multiple outgoing dashed arcs per blocked message (fan-out > 1), are indicative of networks which allow a greater degree of routing flexibility (e.g., provide multiple VCs per physical channel, allow adaptive routing, etc.). Given that the messages in this example have exhausted their adaptivity, the vertices with a fan-out of two in Figure 3b correspond to a routing relation that supplies two alternative resources for each of the blocked messages. Should messages have blocked prior to exhausting their adaptivity, vertices with larger fan-out (i.e., 4) would exist in the graph. As can be seen from this example, the fan-out of vertices in the CWG, which is determined by routing adaptivity and the number of VCs per physical channel, greatly influences the number of unique cycles that can form. More importantly, increasing the routing flexibility exponentially increases the degree of correlation of resource dependency required for multiple cycles to form knots.

2.2.3 Cyclic Non-Deadlocks

Figure 4. (a) A "cyclic non-deadlock" for minimal adaptive routing with 2 VCs. (b) The CWG does not contain a knot.

A scenario in which multiple cycles exist but which does not result in deadlock (referred to as cyclic non-deadlock) is depicted in Figures 4a and 4b. This is similar to the previous example except that one message's destination is changed, allowing it to acquire the required VCs on its way to its destination. There are 8 unique cycles in the CWG. Given the set of all vertices in this group of cycles {1, 3, 5, 9, 11, 13, 15}, note that vertices 7 and 16 are reachable from members of this set, but the opposite does not hold.
This set (or any subset thereof) does not meet the conditions for a knot; therefore, there is no deadlock in this network. This is because the message that owns 7 may eventually reach its destination and subsequently release 7, which will allow one of the two messages waiting for this channel to continue. Other messages will then be able to proceed in a similar fashion. This example confirms the notion that cycles are necessary but not sufficient for deadlock, as was concluded by Duato [7]. Resource dependency graphs of deadlock avoidance algorithms based on Duato's framework may have cycles but will always have an escape resource to avoid deadlock (such as 7 in Figure 4b). The elimination of these cycles as required by some avoidance-based routing schemes is therefore overly restrictive. Similarly, eliminating cycles in a packet wait-for graph [10] to avoid deadlock is also overly restrictive: the packet wait-for graph for this example clearly contains cycles, yet no deadlock exists.

In summary, single-cycle deadlocks are possible in networks which have a single channel resource and limited adaptivity defined on that resource (due to static routing or exhausted adaptivity). Multi-cycle deadlocks involving highly correlated message resource dependencies are possible in networks using multiple resources and which allow

greater routing adaptivity over those resources. It has been shown that the number of blocked messages (number of vertices which have outgoing dashed arcs) and the flexibility in routing (fan-out of these vertices) greatly influence the formation of cycles [11]. However, deadlock occurs only when a group of cycles forms a knot.

3 Deadlock Characterization

Our approach for precisely detecting deadlocks is based on a theoretical framework which defines a deadlock as a knot within a CWG [6]. We implement a deadlock detection algorithm that is able to identify knots within the CWG of an ongoing network simulation. The deadlock detection algorithm involves maintaining a CWG, detecting cycles within this graph, and identifying groups of cycles which form knots. It is implemented in a flit-level simulator called FlexSim (an extension of FlitSim 2.0). All simulations are run for normalized loads up to full network capacity or until the network saturates with respect to the number of resource dependency cycles, generally well beyond the loads at which network performance saturates (shown in the figures by a vertical dashed line). Each simulation is run for 0,000 simulation cycles beyond steady state.

Unless otherwise stated, all simulations are performed using uniform traffic, a 16-ary 2-cube with bidirectional channels, a fixed message size of 32 flits, an edge buffer depth of two flits, one injection and reception channel, and a channel selection policy which favors continuing routing in the current dimension over turning. Minimal true fully adaptive routing (TFAR) is used for adaptive routing and dimension ordered routing (DOR) is used for static routing. Since no other restrictions are enforced, deadlocks are possible for both routing schemes. The deadlock detection algorithm is invoked every 50 simulation cycles.
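One way to realize the "identify groups of cycles which form knots" step is to look for strongly connected components of the CWG that contain at least one arc and have no arc leaving them. The Python sketch below is an assumption-laden illustration of that idea, not a description of how FlexSim implements detection; the function names and the two small example graphs are hypothetical. Every vertex is assumed to appear as a key of the adjacency dictionary.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm, written iteratively.

    graph: {vertex: iterable of successors}; every vertex must be a key.
    """
    # Pass 1: record vertices in order of depth-first completion.
    order, seen = [], set()
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)   # all successors explored
                stack.pop()

    # Pass 2: search the reversed graph in reverse completion order.
    reverse = {vertex: [] for vertex in graph}
    for vertex, successors in graph.items():
        for nxt in successors:
            reverse[nxt].append(vertex)

    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = set(), [root]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def find_knots(graph):
    """Return components that contain an arc and have no outgoing arc."""
    knots = []
    for component in strongly_connected_components(graph):
        arcs = [(v, w) for v in component for w in graph[v]]
        has_internal_arc = any(w in component for _, w in arcs)
        is_closed = all(w in component for _, w in arcs)
        if has_internal_arc and is_closed:
            knots.append(component)
    return knots


# A cycle 0 -> 1 -> 2 -> 3 -> 0 with a dependent channel 4 waiting on it.
deadlocked = {0: [1], 1: [2], 2: [3], 3: [0], 4: [0]}
# The same cycle, but channel 1 also requests channel 5, whose owner is not blocked.
cyclic_non_deadlock = {0: [1], 1: [2, 5], 2: [3], 3: [0], 5: []}

print(find_knots(deadlocked))            # [{0, 1, 2, 3}]
print(find_knots(cyclic_non_deadlock))   # []
```

The second graph mirrors the cyclic non-deadlock case: a cycle exists, but an arc escapes to a channel that will eventually be released, so no knot is reported. In a simulation loop such a check would be run periodically, for example every 50 cycles as described above.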
Deadlocks are broken by removing a message in the deadlock set (flit-by-flit) from the network so as to synthesize a recovery procedure (as in the Disha scheme [5]). Deadlock frequency is presented as normalized deadlocks, which is the ratio of the number of deadlocks to the number of messages delivered. When no deadlocks exist, we instead use the total number of resource dependency cycles formed and the amount of congestion (number and percentage of blocked messages) to represent the conditions that could lead to deadlock formation. The size of deadlock and resource sets and the knot cycle density are used to describe the size and complexity of deadlocks.

3.1 Effect of Physical Links on Deadlocks

In studying the effect of network links on deadlock formation, we measure the frequency of deadlocks in tori with uni- and bidirectional channels. We assume DOR with one VC per physical channel for both networks (all other parameters set to default values). Figures 5a and 5b show normalized deadlocks vs. load rate and deadlock set size vs. load rate for the two networks under uniform traffic. Normalized load rate is calculated based on total link bandwidth and average internode distance, which differ between the two networks. The figures show that the uni-torus leads to relatively more deadlock despite having generated less overall traffic load.

Figure 5. (a) Normalized deadlocks vs. load rate. (b) Deadlock set size vs. load rate.

Below network saturation, there are 1 and 7 deadlocks for every 100 messages delivered (on average) in the bi- and unidirectional networks, respectively. For the two networks, no more than 4 (bi) and (uni) messages are involved in each deadlock below saturation loads. This indicates that unless messages experience deadlock more than once, up to % (bi) and 15% (uni) of all messages participate in deadlock.
Deep into saturation, deadlock frequency grows to 11% (bi) and 60% (uni) while the number of messages involved in deadlock converges to around 6 for both networks. From this, we can infer that at highly saturated load rates messages may be involved in multiple deadlocks prior to being delivered, particularly in the unidirectional network.

The deadlocks formed in both networks are of the single-cycle deadlock variety described in Section 2.2.1. The requirements and factors leading to deadlock for the two networks, however, are different, which helps to explain the disparity in deadlock frequency. For one, a bi-torus requires a minimum of messages per deadlock whereas only messages comprise the minimal deadlock set for a uni-torus. As confirmed by Figure 5b, the uni-torus has deadlocks involving fewer messages for all load rates up through deep saturation. Second, and more importantly, for uniform traffic in a torus with 16 nodes per dimension each bi-link is used by 1% of the messages traveling in a particular direction within a given dimension whereas each uni-link is used by 50% of the messages in the network. This suggests that the highly correlated resource dependencies resulting from all network traffic having to travel in the same direction (and turn) to reach their respective destinations are a major contributor to deadlock frequency.

Our results show that, as expected, adding routing resources (e.g., bidirectional physical links) reduces resource contention such that the correlated resource dependencies required for deadlock are less likely to form. Although bidirectionality significantly reduces deadlock frequency, it does not by itself reduce the likelihood of deadlock formation to sufficiently low levels. However, bidirectionality may be combined with other techniques (following sections) to reduce deadlock frequency to well within acceptable levels.
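This section notes that the normalized load rate depends on the average internode distance, which differs between the uni- and bidirectional torus. The text does not give the normalization formula, so the Python sketch below computes only the average minimal distance by brute force. It assumes uniform traffic, minimal routing, and that messages are never addressed to their own source; the function names are hypothetical.

```python
from itertools import product


def ring_distance(offset, k, bidirectional):
    """Hops needed to cover a given offset on a k-node ring."""
    forward = offset % k
    return min(forward, k - forward) if bidirectional else forward


def average_distance(k, n, bidirectional):
    """Average minimal hop count in a k-ary n-cube under uniform traffic.

    By symmetry the source is fixed at the origin; the origin itself is
    excluded as a destination (an assumption of this sketch).
    """
    total = 0
    count = 0
    for destination in product(range(k), repeat=n):
        if any(destination):
            total += sum(ring_distance(c, k, bidirectional) for c in destination)
            count += 1
    return total / count


bi = average_distance(16, 2, bidirectional=True)
uni = average_distance(16, 2, bidirectional=False)
print(f"16-ary 2-cube, bidirectional:  {bi:.2f} hops")
print(f"16-ary 2-cube, unidirectional: {uni:.2f} hops")
```

For the default 16-ary 2-cube the unidirectional average is close to twice the bidirectional one, so each message holds channels along a longer path while it has half as many links to choose from.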
3.2 Effect of Adaptivity on Deadlocks

In studying the effect of adaptivity on deadlock formation, we measure the frequency of deadlocks and cycles in tori using DOR and TFAR. To focus on the effects of adaptivity alone, we again use a single VC per physical channel for both algorithms. Figures 6a and 6b show the normalized

deadlocks and cycles vs. load rate and the deadlock and resource set size vs. load rate for the two algorithms under uniform traffic.

Figure 6. (a) Normalized deadlocks and cycles vs. load rate. (b) Deadlock and resource set size vs. load rate.

Figure 7. (a) Normalized deadlocks vs. load rate. (b) Number of cycles vs. percent of messages blocked.

DOR allows only single-cycle deadlocks to form (as in Figure 1), so one curve can represent both cycle and deadlock information. In contrast, TFAR allows cyclic non-deadlocks (similar to Figure 4). Since many more cycles can exist than there are deadlocks, two different curves are used to convey cycle and deadlock formation.

Our results show that TFAR suffers no deadlocks below network saturation, 1 deadlock per 100 messages delivered at saturation, and about the same number of deadlocks as messages delivered in deep saturation. The ratio of deadlocks to messages delivered for DOR is even smaller prior to saturation (less than 1 per 1000 messages delivered). This rate gradually increases to 1 deadlock for every 10 messages delivered in deep saturation. In terms of the actual number of deadlocks (not normalized to throughput), DOR suffers more than TFAR by as much as a factor of 6. Interestingly, DOR has higher sustained throughput than TFAR despite having a larger number of deadlocks. This explains the discrepancy between actual deadlock and normalized deadlock. It is also observed that the performance of TFAR is highly sensitive to just a few deadlocks while the performance of DOR remains relatively unaffected even as the number of deadlocks grows.
The size of deadlock and resource sets in DOR is inherently limited by the single-cycle deadlocks which form. Given that deadlocks are broken immediately upon detection, the effects of deadlocks in DOR are local, isolated to a given row or column within the network. The relatively simpler correlation of message dependency required for these deadlocks makes them more likely but, at the same time, less severe. In contrast, TFAR can lead to large multi-cycle deadlocks which have a more global effect upon the network. Hence, the higher degree of correlation of message resource dependency required for these deadlocks makes them less likely but more severe.

The results shown in Figure 6b confirm our hypothesis. Large multi-cycle deadlocks appear in TFAR with deadlock sets and resource sets that are 5 to 7 and 7 to 10 times larger than those of DOR, respectively. What's more, the knot cycle densities for TFAR deadlocks are greater by a factor of 10 to 0. Some of the larger deadlocks observed in TFAR involve as many as 5% of the messages within the network, occupy more than 40% of the channels, and involve hundreds of cycles, thus confirming their global nature. As a result, the residual effects of such large deadlocks are longer-term and widespread; just a few can greatly degrade performance. This is in contrast to the deadlocks in DOR, which have more localized, shorter-term effects, thereby making DOR's performance less affected by a large number of deadlocks.

The cyclic non-deadlocks in TFAR may also degrade performance. Duato [] has described situations where messages block cyclically faster than they can be drained and remain blocked for extended periods, leading to large message latencies. The large number of cycles we have observed even in the absence of deadlocks suggests that this may be occurring.
Hence, low throughput resulting from these cyclic non-deadlocks contributes to the higher normalized deadlock frequency for TFAR although fewer actual deadlocks form. Given that TFAR with a single VC makes harmful deadlocks and cyclic non-deadlocks probable, recovery-based adaptive routing would benefit from additional VCs. Next we examine the effect of additional VCs on reducing the likelihood of deadlock formation.

3.3 Effect of Virtual Channels on Deadlocks

In investigating the effects of traffic flow on deadlock formation using multiple VCs per physical channel, we measure the deadlock frequency of DOR and TFAR in tori which allow the unrestricted use of 2, 3, and 4 VCs (all other parameters default). For experiments in which deadlock did not occur, we use network congestion and resource dependency cycles formed as a measure of the likelihood of possible deadlock. Figures 7a and 7b show normalized deadlocks vs. load rate and number of cycles vs. percentage of blocked messages under uniform traffic. In Figure 7b, each curve is annotated with the load rate at which cycles first appear (first point) and the load rate at which the highest number of cycles was found (last point).

In Figure 7a, DOR with two VCs (DOR2) does not lead to deadlock prior to saturation; the 2nd VC more than doubles the load at which deadlocks begin to appear compared with using only 1 VC. At its saturation load rate, approximately 1 deadlock occurs for every 100 messages delivered. Deadlock frequency increases to 1 for every 5 messages delivered in deep saturation. Beyond saturation, the actual number of deadlocks for DOR1 and DOR2 is roughly the same. However, a larger reduction in throughput at loads after saturation makes the normalized deadlock measure slightly higher for DOR2 (as shown in the figure). With 3 or more VCs, DOR suffers no deadlocks. In contrast, 2 VCs are sufficient to discourage deadlocks in TFAR (DOR3, DOR4, TFAR2, TFAR3 and TFAR4 are not plotted as no deadlocks occurred).

A number of factors contribute to the elimination of deadlocks when additional VCs are introduced. The new VCs are resources that become available to messages which would otherwise block. The likelihood of the formation of cycles and knots decreases when fewer messages are blocked within the network. The new VCs also provide a higher number of routing options for those messages which still block within the network. As was illustrated in Section 2.2, additional routing options increase the deadlock set size, resource set size, and knot cycle density needed for deadlock, thereby requiring a higher degree of correlation of message dependency in order for deadlock to form. This greatly diminishes the likelihood of deadlocks. Note that TFAR amplifies the effects of additional VCs since adaptivity makes new routing options available in each dimension. This explains why TFAR is able to eliminate all deadlocks with a smaller number of VCs (two instead of three for DOR). The simpler correlation of message dependencies required for deadlock in DOR, combined with restrictions in the use of the new resources, makes 2 VCs insufficient to eliminate deadlock in DOR.

Figure 7b indicates that adding VCs reduces congestion and allows higher loads to be applied before a large number of cyclic non-deadlocks form. TFAR1 results in increasingly higher congestion and a larger number of cycles starting at saturation. TFAR2 eliminates the cycles encountered at low load rates in TFAR1, and substantially reduces the overall congestion (from over 70% of the messages being blocked down to as few as 1%). As TFAR2 reaches saturation, its congestion increases while the number of cycles grows rapidly. The third and the fourth VCs for TFAR and DOR show a similar effect on reducing congestion and eliminating cycles at loads prior to saturation, leading to rapid growth in cycles once saturation is reached.
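The interaction between adaptivity and VC count described above can be made concrete by counting the alternatives offered to a blocked header, which is the fan-out of its vertex in the CWG. The Python sketch below is a simplified model under stated assumptions: it assumes minimal routing, unrestricted use of all VCs on each candidate physical channel, and that static routing offers exactly one physical channel. The function name `routing_options` is hypothetical.

```python
def routing_options(productive_dims, vcs_per_channel, adaptive):
    """Number of VCs a blocked header may wait for (CWG fan-out).

    productive_dims: dimensions in which the message still has hops to make.
    vcs_per_channel: VCs multiplexed on each physical channel.
    adaptive: True for fully adaptive minimal routing, False for
              dimension ordered routing (one physical channel only).
    """
    if productive_dims < 1 or vcs_per_channel < 1:
        raise ValueError("need at least one productive dimension and one VC")
    physical_choices = productive_dims if adaptive else 1
    return physical_choices * vcs_per_channel


# Fan-out for a message in a 2D torus, by routing scheme and VC count.
for vcs in (1, 2, 3, 4):
    static = routing_options(2, vcs, adaptive=False)
    exhausted = routing_options(1, vcs, adaptive=True)
    full = routing_options(2, vcs, adaptive=True)
    print(f"{vcs} VC(s): DOR={static}, TFAR exhausted={exhausted}, TFAR={full}")
```

With two VCs this model gives a fan-out of two once adaptivity is exhausted and four otherwise, matching the fan-out values discussed for the multi-cycle example. Each added VC contributes one option under static routing but one option per productive dimension under adaptive routing.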
In summary, we observe that additional VCs are able to reduce the number of messages which block within the network, as expected. This, along with the higher degree of correlation of message dependencies required for deadlock in the presence of a larger number of routing options due to the additional VCs, greatly diminishes the likelihood of deadlock. We find the extent to which deadlocks are eliminated with as few as 2 VCs per physical channel to be surprising. However, as networks with multiple VCs reach saturation, a higher number of blocked messages along with the larger number of routing options increases the number of cycles exponentially. The fact that no knots exist even in the presence of hundreds of thousands of cycles suggests the formation of extremely large cyclic non-deadlocks at saturation loads. Operating below saturation avoids this performance degradation.

3.4 Effect of Buffer Depth on Deadlocks

We now investigate the effects of increasing the channel buffer size on deadlock formation. We measure the frequency of deadlocks in bidirectional tori with channel buffer depths of 2, 4, 6, 8, 16, and 32 flits. TFAR with one VC per physical channel is used. Using a buffer of the same depth as the message length corresponds to virtual cut-through switching [1]. Other buffer depths correspond to wormhole or buffered wormhole switching [1]. Figures 8a and 8b show normalized deadlocks vs. load rate and normalized deadlocks vs. the number of messages in the network.

Figure 8. (a) Normalized deadlocks vs. load rate. (b) Normalized deadlocks vs. messages in the network.

As shown in Figure 8a, networks with buffer depths of 2, 4 and 6 flits all saturate at a similar load rate.
After saturation, these networks lead to a large number of deadlocks (15 to 5 deadlocks for every 10 messages delivered). The network with a buffer depth of 8 flits saturates at a 5% higher load rate, and leads to a similar deadlock frequency for load rates beyond saturation. Networks with buffer depths of 16 and 32 flits saturate at a 75% higher load than the smallest buffers, reflecting the larger capacity of these networks. A buffer depth of 16 flits leads to the highest number of deadlocks (15 to 5 deadlocks per every 10 messages delivered) while the virtual cut-through network (buffer depth of 32 flits) leads to the smallest number of deadlocks (1 deadlock for every messages delivered) at load rates beyond saturation.

The increase in saturation load as the buffer depth is increased confirms that each message requires the simultaneous use of fewer channels due to the higher capacity. This allows for message compaction. Below saturation, compaction leads to less resource contention and allows more messages to be serviced by the network. The similar saturation load for buffer depths of 2, 4, and 6 flits (6%, 12%, and 18% of the message size) indicates that the amount of compaction occurring for these buffer sizes is alike, and suggests that messages have blocked close to their source nodes, thereby neutralizing the effect of compaction. Increases in saturation loads are greater for larger increments in buffer sizes, thereby suggesting effective compaction for these buffer sizes (buffer sizes of 8, 16 and 32 flits, which can accommodate 25%, 50%, and 100% of a message, respectively).

When normalized with respect to the number of messages in the network (Figure 8b), the networks with smaller capacity buffers lead to a substantially higher number of deadlocks. This is explained by the fact that in these networks, each message requires the simultaneous use of a larger number of resources, thereby leading to higher resource contention.
Although higher capacity buffers allow more messages to enter the network and, potentially, a larger number of messages to block at saturation, the degree of correlation of message dependency required for deadlocks also increases due to the message compaction, thereby making multi-cycle deadlocks with large deadlock and resource sets less probable.
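
The compaction argument above can be made concrete with a small calculation. The sketch below is illustrative only. It assumes a message length of 32 flits, a value inferred from the statement that a 16-flit buffer holds 50% of a message, and it assumes that a blocked message compacts fully into consecutive channel buffers. The function names are hypothetical. Some digits in the transcribed figures above appear to be missing, so the computed values are offered only as a consistency check.

```python
import math

# Assumption: a message is 32 flits long. This is inferred from the text,
# which states that a 16-flit buffer holds 50% of a message.
MESSAGE_FLITS = 32


def buffer_fraction(depth_flits, message_flits=MESSAGE_FLITS):
    """Fraction of one message that a single channel buffer can hold."""
    return depth_flits / message_flits


def min_buffers_spanned(depth_flits, message_flits=MESSAGE_FLITS):
    """Fewest channel buffers a blocked, fully compacted message occupies."""
    return math.ceil(message_flits / depth_flits)


# Depths 4, 6, 8 and 16 appear in the text; the last entry is the virtual
# cut-through case, where the buffer depth equals the message length.
for depth in (4, 6, 8, 16, MESSAGE_FLITS):
    print(
        f"depth={depth:2d} flits: holds {buffer_fraction(depth):.1%} of a message, "
        f"spans at least {min_buffers_spanned(depth)} channel buffers"
    )
```

Under these assumptions, a deeper buffer lets a blocked message hold fewer channels at once, which is the sense in which larger buffers reduce resource contention and raise the correlation among blocked messages that a deadlock needs.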

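The analysis in this section separates resource dependency cycles from knots: only a knot is a deadlock, while a cycle with some way out is a cyclic non-deadlock. The sketch below illustrates that distinction on a toy channel wait-for graph. It is a simplified illustration, not the simulator used for these experiments. The graph encoding (a dict mapping each channel to the channels that its occupying message holds next or waits on) and the names `reachable`, `has_cycle`, and `find_knots` are assumptions made for this example.

```python
def reachable(graph, start):
    """Vertices reachable from start by following one or more edges."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return seen


def has_cycle(graph):
    """True if some vertex can reach itself."""
    return any(v in reachable(graph, v) for v in graph)


def find_knots(graph):
    """Return every knot: a non-empty vertex set K in which the set of
    vertices reachable from each member of K is exactly K."""
    reach = {v: reachable(graph, v) for v in graph}
    knots = set()
    for v in graph:
        r = reach[v]
        # r is a knot if every vertex in it reaches exactly r again.
        if r and all(reach.get(w, set()) == r for w in r):
            knots.add(frozenset(r))
    return knots


# A cycle a -> b -> c -> a with an alternative: the message holding c may
# also use d, which has no outgoing edges (its holder is not blocked).
cyclic_non_deadlock = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}

# The same cycle with no alternative channel anywhere.
deadlock = {"a": ["b"], "b": ["c"], "c": ["a"]}
```

In the first graph the cycle can drain through `d`, so it contains a cycle but no knot. In the second, every channel reachable from the cycle leads back into it, which is the knot condition. Adding routing options adds outgoing edges, and each extra edge is another dependency that must also lead back into the set before a knot can form.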
.5 Effect of Network Node Degree on Deadlocks

To investigate the effects of node degree on the frequency of deadlocks, we measured deadlock frequency in a 16-ary - cube (D) and a 4-ary 4-cube (4D) torus-connected network, both of which use TFAR routing with one VC. Load rate was normalized based on the total link bandwidth and average internode distance of the two networks. The 4D network resulted in relatively fewer deadlocks at loads prior to saturation (less than 1% of the deadlocks which occurred for the D network). Also, the 4D network achieved higher performance well beyond the saturation load of the D network, thereby leading to an even larger gap in the normalized deadlock frequency. The two main factors contributing to this are the additional network resources (physical channels) and the increased routing freedom (dimensions). As in the other experiments, additional links serve to reduce resource contention, and the high node degree, along with adaptive routing, increases the correlation of message dependencies required for knots to form. The few deadlocks that did form in the 4D network were all single-cycle deadlocks, which suggests that the few messages in the deadlock sets were limited to restricted routing due to exhaustion of routing adaptivity towards the destination.

.6 Effect of Non-Uniform Traffic on Deadlocks

The deadlock frequencies for non-uniform traffic patterns (bit-reversal, matrix-transpose, perfect-shuffle, and hot-spot) were similar to (in most cases, within 10% of) the deadlock frequencies for the uniform traffic patterns in the experiments described above. The characteristics of the deadlocks (deadlock set size, resource set size, and knot cycle density) were similar as well. The only exception to this was for DOR. Single-cycle deadlocks in DOR (as shown in Figure 1) require circular overlap of messages.
The source and destination pairs designated by some of these non-uniform traffic patterns are such that this overlap is not possible.

4 Related Work

Deadlock approximation schemes proposed previously [4, 5] have provided little insight into the frequency of true deadlocks. In contrast, our work presents frequencies of actual deadlocks as well as their characteristics as they relate to key network parameters. CWGs and similar constructs have previously been used to statically represent connections allowed by deadlock avoidance-based algorithms [, 8]. In contrast, we use these graphs to model dynamic resource allocation in unrestricted routing, and to precisely define and detect deadlocks. A summary of work characterizing deadlocks as knots in generalized resource graphs intended to describe deadlocks in operating systems is presented in [9]. Our work is a specialized application of this framework, intended for depicting deadlocks in interconnection networks.

5 Conclusions and Future Work

We characterize the causal effects of various network parameters on blocked messages, resource dependency cycles, and deadlocks to gain a greater understanding of the viability of deadlock recovery-based routing. Through simulation and analysis, we empirically show how deadlock probability is influenced by these factors when routing restrictions are not enforced so as to avoid deadlock. Our results for k-ary n-cube networks with n confirm that deadlock probability is lower in bidirectional networks than in unidirectional networks, and that it decreases as node degree and adaptivity are increased. Localized deadlocks of limited harmful effect are more probable with dimension ordered routing whereas globally harmful deadlocks are probable with true fully adaptive routing. Deadlock probability is lower in virtual cut-through networks than in buffered wormhole and wormhole networks, as expected.
Interestingly, however, deadlocks are highly improbable (none were detected) if as few as VCs are used with dimension ordered routing and only VCs are used with true fully adaptive routing in bidirectional wormhole networks. These results lead us to conclude that recovery-based routing is viable since the unrestricted use of only a few virtual channels is sufficient to make deadlock highly improbable. Providing greater routing flexibility and buffer capacity through increased routing adaptivity, number of virtual and physical channels (bidirectional), and buffer depth greatly increases the complexity of correlated resource dependencies required for deadlock to occur. We will continue this characterization study by examining the effect of irregular network topology, hybrid message length, misrouting, etc., on deadlock. We also plan to characterize deadlock formation under hybrid non-uniform traffic loads using program-driven simulations.

References

[1] Andrew A. Chien and J. H. Kim. Planar-Adaptive Routing: Low-Cost Adaptive Networks for Multiprocessors. In Proc. of the 19th Symposium on Computer Architecture, pp 68-77, May 199.
[] L. Ni and C. Glass. The Turn Model for Adaptive Routing. In Proc. of the 19th International Symposium on Computer Architecture, IEEE Computer Society, pages 78-87, May 199.
[] J. Duato. A New Theory of Deadlock-free Adaptive Routing in Wormhole Networks. IEEE Transactions on Parallel and Distributed Systems, 4(1):10-11, December 199.
[4] J. Kim, Z. Liu, and A. Chien. Compressionless Routing: A Framework for Adaptive and Fault-tolerant Routing. In Proc. of the 1st International Symposium on Computer Architecture, pp 89-00, April
[5] Anjan K.V. and Timothy M. Pinkston. An Efficient, Fully Adaptive Deadlock Recovery Scheme: Disha. In Proc. of the nd International Symposium on Computer Architecture, pp 01-10, June
[6] Sugath Warnakulasuriya and Timothy Mark Pinkston.
Implementation of Deadlock Detection in a Simulated Interconnection Network Environment. Technical Report CENG 97-01, University of Southern California, January
[7] J. Duato. A Necessary and Sufficient Condition for Deadlock-free Adaptive Routing in Wormhole Networks. IEEE Transactions on Parallel and Distributed Systems, 6(10): , October
[8] Loren Schwiebert and D.N. Jayasimha. A Necessary and Sufficient Condition for Deadlock-Free Wormhole Routing. Journal of Parallel and Distributed Computing,, (1996).
[9] Mamoru Maekawa, Arthur E. Oldehoft, and Rodney R. Oldehoft. Operating Systems: Advanced Concepts. Benjamin Cummings,
[10] William J. Dally and Hiromichi Aoki. Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels. IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 4, April, 199.
[11] Timothy Mark Pinkston and Sugath Warnakulasuriya. On Deadlock in Interconnection Networks. To appear in Proc. of the 4th International Symposium on Computer Architecture, June
[1] Parviz Kermani and Leonard Kleinrock. Virtual cut-through: A new computer communication switching technique. Computer Networks, pages 67-86,
[1] C.B. Stunkle et al. The SP high-performance switch. IBM Systems Journal, vol. 4, no., pp , 1995.


More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ

Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ J. Flich, P. López, M. P. Malumbres, and J. Duato Dept. of Computer Engineering

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Measure of Impact of Node Misbehavior in Ad Hoc Routing: A Comparative Approach

Measure of Impact of Node Misbehavior in Ad Hoc Routing: A Comparative Approach ISSN (Print): 1694 0814 10 Measure of Impact of Node Misbehavior in Ad Hoc Routing: A Comparative Approach Manoj Kumar Mishra 1, Binod Kumar Pattanayak 2, Alok Kumar Jagadev 3, Manojranjan Nayak 4 1 Dept.

More information

The final publication is available at

The final publication is available at Document downloaded from: http://hdl.handle.net/10251/82062 This paper must be cited as: Peñaranda Cebrián, R.; Gómez Requena, C.; Gómez Requena, ME.; López Rodríguez, PJ.; Duato Marín, JF. (2016). The

More information

On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes

On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton, FL 33431 E-mail: jie@cse.fau.edu

More information

Communication in Multicomputers with Nonconvex Faults?

Communication in Multicomputers with Nonconvex Faults? In Proceedings of EUROPAR 95 Communication in Multicomputers with Nonconvex Faults? Suresh Chalasani 1 and Rajendra V. Boppana 2 1 Dept. of ECE, University of Wisconsin-Madison, Madison, WI 53706-1691,

More information

Computation of Multiple Node Disjoint Paths

Computation of Multiple Node Disjoint Paths Chapter 5 Computation of Multiple Node Disjoint Paths 5.1 Introduction In recent years, on demand routing protocols have attained more attention in mobile Ad Hoc networks as compared to other routing schemes

More information

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract.

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract. Fault-Tolerant Routing in Fault Blocks Planarly Constructed Dong Xiang, Jia-Guang Sun, Jie and Krishnaiyan Thulasiraman Abstract A few faulty nodes can an n-dimensional mesh or torus network unsafe for

More information

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger Interconnection Networks: Flow Control Prof. Natalie Enright Jerger Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control:

More information

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Interconnection Networks: Routing. Prof. Natalie Enright Jerger Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly

More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information

Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers

Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers Tsutomu YOSHINAGA, Hiroyuki HOSOGOSHI, Masahiro SOWA Graduate School of Information Systems, University of Electro-Communications,

More information

On Constructing the Minimum Orthogonal Convex Polygon for the Fault-Tolerant Routing in 2-D Faulty Meshes 1

On Constructing the Minimum Orthogonal Convex Polygon for the Fault-Tolerant Routing in 2-D Faulty Meshes 1 On Constructing the Minimum Orthogonal Convex Polygon for the Fault-Tolerant Routing in 2-D Faulty Meshes 1 Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton,

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Routing Algorithm How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Many routing algorithms exist 1) Arithmetic 2) Source-based 3) Table lookup

More information

Evaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination

Evaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination Evaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination Richard Kershaw and Bhaskar Krishnamachari Ming Hsieh Department of Electrical Engineering, Viterbi School

More information

The Cray T3E Network:

The Cray T3E Network: The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus Steven L. Scott and Gregory M. Thorson Cray Research, Inc. {sls,gmt}@cray.com Abstract This paper describes the interconnection network

More information

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU Thomas Moscibroda Microsoft Research Onur Mutlu CMU CPU+L1 CPU+L1 CPU+L1 CPU+L1 Multi-core Chip Cache -Bank Cache -Bank Cache -Bank Cache -Bank CPU+L1 CPU+L1 CPU+L1 CPU+L1 Accelerator, etc Cache -Bank

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

3. Evaluation of Selected Tree and Mesh based Routing Protocols

3. Evaluation of Selected Tree and Mesh based Routing Protocols 33 3. Evaluation of Selected Tree and Mesh based Routing Protocols 3.1 Introduction Construction of best possible multicast trees and maintaining the group connections in sequence is challenging even in

More information

n = 2 n = 2 n = 1 n = 1 λ 12 µ λ λ /2 λ /2 λ22 λ 22 λ 22 λ n = 0 n = 0 λ 11 λ /2 0,2,0,0 1,1,1, ,0,2,0 1,0,1,0 0,2,0,0 12 1,1,0,0

n = 2 n = 2 n = 1 n = 1 λ 12 µ λ λ /2 λ /2 λ22 λ 22 λ 22 λ n = 0 n = 0 λ 11 λ /2 0,2,0,0 1,1,1, ,0,2,0 1,0,1,0 0,2,0,0 12 1,1,0,0 A Comparison of Allocation Policies in Wavelength Routing Networks Yuhong Zhu a, George N. Rouskas b, Harry G. Perros b a Lucent Technologies, Acton, MA b Department of Computer Science, North Carolina

More information

Design of a System-on-Chip Switched Network and its Design Support Λ

Design of a System-on-Chip Switched Network and its Design Support Λ Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information