Routing Algorithms for 2-D Mesh Network-On-Chip Architectures

Edward Chron, Gene Kishinevsky, Brandon Nefcy, Nishant Patil
Department of Electrical Engineering, Stanford University
{echron, genek, nefcy,

Abstract

Network-On-Chip (NoC) architectures are becoming increasingly important for multiprocessors. A 2-D mesh is a popular network topology for NoCs. In this paper, we analyze three routing algorithms for 2-D meshes: Dynamically switching between Adaptive and Deterministic (DyAD), Orthogonal One-Turn (O1TURN), and Contention-aware Input Selection (CAIS). We discuss the methodology and results presented in the three papers and evaluate the algorithms. Finally, we present possible avenues for future research.

1. Introduction

Designs containing multiple cores on a single chip continue to proliferate in the market. As this takes place, efficient on-chip networks are becoming key to continued increases in computing performance. Existing designs have shown that creating a large number of processing units on a single chip is possible [Vangal 07]; however, interconnecting these units efficiently, so that the maximum computational benefit can be derived, is no small task. The design constraints of an on-chip network differ somewhat from those of traditional networks. Memory and computation resources are more expensive, while wires are more plentiful [Dally 01, Radulescu 03, Henkel 04]. This is something of a reversal compared to traditional networks, and it opens up new and interesting design problems.

Within the scope of this paper we analyze three routing algorithms aimed specifically at the network-on-chip (NoC) design space: Dynamically switching between Adaptive and Deterministic (DyAD), which attempts to combine a deterministic and an adaptive algorithm; Orthogonal One-Turn (O1TURN), which randomly selects between the two dimension-ordered minimal routes; and Contention-aware Input Selection (CAIS), which takes a different approach and examines the problem of selecting a winner from several input channels requesting the same output channel. We present the proposed benefits and drawbacks of these approaches, along with a critique of the methods used to evaluate each idea. Lastly, we discuss the three algorithms jointly and give some ideas for future research extending them.

The paper is organized as follows. Section 2 describes DyAD, while O1TURN and CAIS are discussed in Sec. 3 and Sec. 4 respectively. Section 5 contains a critique of all the algorithms considered together. Section 6 concludes the paper.

2. DyAD Smart Routing of Networks-on-Chip

2.1 Introduction

The acronym DyAD stands for Dynamically switching between Adaptive and Deterministic routing modes. The DyAD routing scheme [Hu 04] proposes a new paradigm for the design of a NoC router that allows the routing algorithm to exploit the advantages of both deterministic and adaptive routing. As such, DyAD is presented as a hybrid routing scheme that can perform either adaptive or deterministic routing to achieve the best possible throughput. With DyAD, each router continuously monitors its local network load and chooses between adaptive and deterministic routing modes based on that load. When the network is not congested, a DyAD router works in deterministic mode and thus routes with the low latency that deterministic routing affords.
When the network becomes congested, a DyAD router switches to adaptive mode to avoid congested links by exploiting other, less congested routes. The authors implemented one possible variation of the DyAD hybrid scheme that employs two flavors of the odd-even routing scheme, one deterministic and one adaptive. By measuring how full local FIFO queues are, a router may switch between deterministic and adaptive modes. Further, the proposed DyAD scheme is shown to be deadlock and livelock free in the presence of a mixture of deterministic and adaptive routing modes. Performance measurements are reported that highlight the advantages of this hybrid approach, both for several permutation traffic patterns and for a real-world multimedia traffic pattern. Evidence is presented that the additional resources required to support a hybrid routing scheme are minimal.

2.2 Design

The paper focuses on a specific design and implementation of DyAD but claims that the paradigm can be made to fit a variety of possible implementations. The design analyzed in the paper is for a 2D mesh network topology, which the authors point out is commonly used in NoC designs since it readily fits the tiled architecture common to NoCs. They also claim that the DyAD algorithm can easily be extended to topologies other than 2D meshes. For their prototype DyAD router, the authors chose adaptive and deterministic routing algorithms that do not require virtual channels. They argue against virtual channels because the silicon area needed for virtual channel buffers makes them quite expensive in NoCs. The paper states that wormhole flow control is the flow control (switching) scheme typically used for NoCs to minimize silicon area and keep latency to a minimum. Packets are divided into flits, routed in a pipelined fashion, and stored in registers on the NoC router instead of memory buffers to minimize silicon area overhead. Wormhole flow control is used by the DyAD prototype.

As a control to measure against, XY dimension-ordered (DOR) minimal path routing is selected as the representative deterministic routing scheme, since it is commonly used and is deadlock and livelock free. The two routing algorithms selected for the DyAD prototype are a deterministic algorithm and an adaptive algorithm, both based on the odd-even adaptive routing algorithm. For the deterministic algorithm, the authors created a version of the odd-even routing algorithm that they named OE-fixed. For the adaptive algorithm, the authors selected the minimal odd-even routing algorithm.

The odd-even adaptive routing algorithm was proposed by [Chiu 00] in his paper on the odd-even turn model. The model shows how selectively restricting the turns that routes are permitted to take provides the resource ordering needed to ensure that the routing algorithm remains deadlock free.
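
To make the idea of a turn restriction concrete, the following minimal sketch encodes the odd-even rule as a lookup: even columns forbid east-to-north and east-to-south turns, and odd columns forbid north-to-west and south-to-west turns. The function name `turn_allowed`, the single-letter direction encoding, and the zero-based column numbering are illustrative assumptions, not details taken from [Chiu 00] or [Hu 04].

```python
# A direction names the way a packet is travelling, so ("E", "N") means
# "a packet heading east turns to head north".
FORBIDDEN_TURNS = {
    "even": {("E", "N"), ("E", "S")},  # turns forbidden in even columns
    "odd": {("N", "W"), ("S", "W")},   # turns forbidden in odd columns
}


def turn_allowed(column, heading, new_heading):
    """Return True if the odd-even rule permits this turn in this column.

    Assumption: columns are numbered from 0. Reversals (180-degree turns)
    are not modelled, since minimal routing never needs them.
    """
    if heading == new_heading:
        return True  # going straight is not a turn
    parity = "even" if column % 2 == 0 else "odd"
    return (heading, new_heading) not in FORBIDDEN_TURNS[parity]
```

Under these assumptions, `turn_allowed(2, "E", "N")` is rejected while the same turn in column 3 is permitted.
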
The odd-even routing algorithm prohibits routers in even columns from turning east to north and east to south, while prohibiting routers in odd columns from turning north to west and south to west. Among adaptive routing algorithms without virtual channel support [Glass 98], the odd-even scheme distributes routes more evenly across the network. A minimal-route version of odd-even was selected to ensure the network doesn't livelock and to minimize energy consumption. The OE-fixed deterministic routing algorithm is a version of the odd-even algorithm with the adaptiveness removed: if more than one output is permitted for a packet, the same one is consistently selected. The minimal odd-even adaptive routing algorithm is described in [Schwiebert 93].

To monitor congestion in the local network, each input port controller monitors how full the FIFO that holds its arriving flits is. At a specified congestion threshold, a congestion signal is asserted to tell upstream routers that the downstream router is currently congested. A congestion signal prompts upstream routers to switch to adaptive routing; otherwise deterministic routing is used. Packet flits are held in the FIFO, whose occupancy can be used to determine back pressure. Dropping flits in a NoC may not be possible since these architectures may not provide an end-to-end protocol for retransmission.

2.3 Implementation

In addition to using the deterministic and adaptive flavors of odd-even turn routing, the authors' implementation of the DyAD hybrid scheme employs wormhole flow control. As such, a single virtual channel is present, with buffering done using registers arranged in a FIFO to save space, reduce latency, and allow efficient monitoring of queue depth. The FIFOs are limited to the range of 3 to 8 flits to ensure back pressure remains stiff.
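
The congestion monitoring and mode switch described above can be sketched roughly as follows. This is only an illustration of the mechanism, not the authors' hardware: the class `InputFifo`, the function `select_mode`, and the representation of congestion signals as a list of booleans are hypothetical, and the default threshold of 60% mirrors the value reported in Section 2.4.

```python
from collections import deque


class InputFifo:
    """Input-port flit queue that raises a congestion flag when it fills up."""

    def __init__(self, depth, threshold_percent=60):
        self.depth = depth                          # capacity in flits
        self.threshold_percent = threshold_percent  # assumed default: 60%
        self.flits = deque()

    def push(self, flit):
        """Accept a flit if there is room; a full FIFO stalls the sender."""
        if len(self.flits) >= self.depth:
            return False  # back pressure: the flit is refused, never dropped
        self.flits.append(flit)
        return True

    def pop(self):
        """Remove and return the oldest flit, or None if the FIFO is empty."""
        return self.flits.popleft() if self.flits else None

    @property
    def congested(self):
        """True once occupancy reaches the threshold (integer arithmetic)."""
        return len(self.flits) * 100 >= self.threshold_percent * self.depth


def select_mode(downstream_congestion_flags):
    """Route adaptively if any downstream neighbour signals congestion."""
    return "adaptive" if any(downstream_congestion_flags) else "deterministic"
```

With a 5-flit FIFO and a 60% threshold, the flag in this sketch is raised once three flits are queued, at which point an upstream router calling `select_mode` would switch to adaptive routing.
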
A 5x5 crossbar serves as the router switching fabric, switching the 4 external and 1 internal routes. The router logic is implemented at each input channel. The crossbar arbiter uses an FCFS arbitration policy. Both the phit and flit size are set to 32 bits.

The authors simulated several square mesh networks using different routing schemes and design parameters under different traffic patterns. The mesh sizes measured were 4x4, 6x6 and 8x8, though only results for the 6x6 mesh are reported in the paper for reasons of space. Three traffic patterns are used in the simulation: uniform, transpose (two types: matrix transpose (y = -x), and (y = x) from node (i,j) to node (j,i)), and hot spot. As mentioned above, the FIFO size was varied from 3 to 8 flits. Although not all of the results were included, the authors report that all of them had the same characteristics as the results presented.

Additionally, real-world traffic was measured using 9 profiled traces of an H.263 video decoder running different video clips. For this analysis a 4x4 network was used, with nine of the 16 PEs selected at random to generate packets according to the profiled traces. The traffic from these multimedia traces can be described as bursty. The remaining PEs use uniform traffic, sending packets to other PEs with equal probability. Each simulation was run with a warm-up of 2000 cycles, followed by the collection of performance data after 20,000 packets are sent. The packets were 5 flits in size.

2.4 Results

The congestion threshold was set at 60%: when a FIFO queue fills to this threshold, the router switches from deterministic routing mode to adaptive routing mode. For the two transpose traffic patterns, minimal odd-even provided substantially better sustained throughput than the deterministic routing algorithms (by as much as 53% and 61%). For uniform traffic, DOR provided better sustained throughput than minimal odd-even routing (by as much as 21%). For hot spot traffic, minimal odd-even performed about the same as odd-even and was less than 10% better than DOR. For the real-world traffic, minimal odd-even provided modestly better sustained throughput than DOR (by as much as 16%) and was just slightly better than odd-even (3%). Across all the traffic patterns studied, the deterministic OE-fixed algorithm at best performed as poorly as DOR (for traffic patterns where deterministic routing does poorly, like the transpose patterns) and otherwise never performed as well as DOR. In fact, for uniform traffic DOR was substantially better (by as much as 30%). The prototype was shown to consume an additional 7% of chip area to accommodate the expanded routing capability, which the authors considered negligible against the performance benefit of a hybrid routing algorithm.

2.5 Critique

While the DyAD paper highlights some potential advantages of a hybrid scheme, such as improved performance for some traffic patterns at only minimal cost in chip area, there are many unanswered questions and other issues we would like to see explored further. The authors put considerable effort into implementing their odd-even flavors of DyAD, measuring performance with varying mesh and FIFO sizes, and comparing their results to DOR and a traditional odd-even adaptive routing system. Unfortunately, only some of their results could be presented, and some issues could only be touched on very briefly due to space limitations.
Also, only very limited measurements of system performance were presented, primarily packet latency versus injection rate. This left us with many questions that we would like to have seen answered in order to analyze the design and implementation more carefully. Some of the issues from the paper we would like to see explained or expanded on include:

- There was no worst-case analysis reported for the algorithms, even for the deterministic (oblivious) schemes. Since defined methodologies for producing worst-case traffic patterns exist, at least for oblivious routing functions [Towles 02], some worst-case analysis should have been reported. An average-case analysis would be helpful as well, as none was provided.
- Packet sizes were fixed at 5 flits. An analysis with small, medium and large packet sizes would have been desirable. One suggestion for possible packet sizes comes from [Petrini 97], where network performance was measured for packet sizes of 5, 20 and 80 flits. It was surprising how many papers measured performance for a single fixed packet size, typically only 5 flits long, even though analyzing various packet sizes can prove valuable in evaluating system design.
- No explanation was provided as to why a congestion threshold of 60% was selected as the optimal point at which to signal back pressure. The sensitivity of this parameter is not discussed. It would be interesting to see the effect on throughput of varying the size of the FIFO (which the authors mention they did, between 3 and 8 flits) and how the variation in back pressure affected routing choices and the resulting network throughput.
- The numbers reported in the paper were of two varieties: average packet delay versus packet injection rate and, for router design evaluation, FIFO capacity versus router area. It is interesting to note that the average packet delay is measured from the input port. A more common and general approach is to measure latency from the source to the destination. We are not clear on how this choice of measurement may affect the results.
- The authors' measurements show that having both deterministic and adaptive routing algorithms together costs no more than 7% of additional silicon area for the routing function. The authors chose a scheme that uses no virtual channels to minimize buffer space. This may not be necessary for DyAD in general, since it is not limited to the implementation presented.
- For the traffic patterns reported in the paper, the deterministic OE-fixed algorithm performed worse than DOR. It would have been instructive to analyze a hybrid algorithm with deterministic DOR and minimal OE adaptive, or other combinations of algorithms. Measurements of the silicon area consumed and the performance gained with other choices of adaptive and deterministic routing would help substantiate the claim that the hybrid scheme's performance advantage outweighs the additional silicon cost and routing delays it introduces.
- The authors present DyAD as combining the low latency of deterministic routing with the high throughput of adaptive routing. However, the paper does not present an analysis of any potential savings due to the higher clock frequency or fewer pipeline stages that a simplified deterministic routing scheme offers.
- The authors assumed that virtual channels are too expensive in silicon area to use for a NoC, even though the additional overhead is becoming less prohibitive in advanced CMOS technologies.
- A very limited number of traffic patterns were analyzed. It would be helpful to have additional traffic patterns analyzed. The authors did mention adding realistic traffic patterns as a topic for future work.
- The authors present DyAD as a hybrid paradigm that is not limited to the actual design they presented. While this may be true in theory and in practice, no other functional implementations were presented to substantiate this claim.
- The authors mention that minimal odd-even routing was selected to reduce power consumption. However, the paper made no further analysis of power consumption. Power consumption is becoming a critical issue for processor cores, so it is important for NoCs to consider the power consumption of the interconnection network as well. Measurements verifying that power was conserved, and by how much, would be of value.
- It would be interesting to consider how a different topology choice would affect DyAD, for example topologies with express lanes, 2-D tori, and other non-mesh topologies.
- The authors mentioned under future work that the configuration should be carefully customized to match the given application traffic characteristics. However, very little guidance was given as to how this should be accomplished. An indication of what information is relevant to customizing the configuration, and how the customization should be done, would have been useful.

2.6 Conclusion

The DyAD paradigm highlights potential considerations for future routing algorithms.
While the notion of a hybrid router is certainly appealing, we feel the paper by [Hu 04] does not provide compelling evidence that hybrid routing algorithms offer the best of both deterministic and adaptive routing schemes. Although the additional resources required to implement a hybrid scheme were not prohibitive, the performance improvements presented did not make a strong case for the DyAD routing algorithm.

3. Near-Optimal Worst-case Throughput Routing for Two-Dimensional Mesh Networks

3.1 Introduction

An oblivious routing algorithm (O1TURN) for 2-D mesh networks is described in [Seo 06]. O1TURN performs well on the three main criteria defined in that paper: minimizing the number of hops, delivering near-optimal worst-case and good average-case throughput, and allowing a simple implementation to reduce router latency. According to the authors, existing routing algorithms optimize some of these design goals while sacrificing the others. The proposed O1TURN (Orthogonal One-TURN) algorithm addresses all three. O1TURN allows each packet to traverse one of two dimension-ordered routes (X first or Y first) by randomly selecting between the two options. It is an interesting 2-D extension of the Randomized Local Balanced (RLB) routing algorithm used in ring topologies [Singh 02].

3.2 Background and Problem

In this section we discuss the three popular oblivious network routing algorithms that O1TURN was compared against.

Dimension-ordered routing (DOR) [Ni 93] is the simplest algorithm for routing in a 2-D mesh. Packets are routed in one dimension first and then in the other. DOR guarantees a minimal hop count and has a simple implementation, making it a popular choice for NoC systems. While DOR performs well at low network load due to its short header latency, its performance degrades quickly as the load on the network increases, owing to its lack of load balancing.
Therefore the worst-case and average-case throughputs for DOR are fairly poor, as can be seen in Table 3.1. DOR performed worse than all other oblivious algorithms analyzed in the paper for the worst-case and average-case traffic patterns.

ROMM is a class of Randomized, Oblivious, Multiphase, Minimal routing algorithms [Nesson 95]. For a large range of traffic patterns ROMM is superior to DOR since it allows minimal routing with some load balancing. ROMM randomly chooses an intermediate node in the minimal rectangle between the source and destination nodes, and then routes packets through the intermediate node using DOR. The simplicity and good average-case performance of ROMM make it a desirable algorithm for systems where average-case throughput is important. However, ROMM fails to provide good worst-case throughput, since source/destination pairs can create additional congestion in channels outside the rows and columns of the source and destination nodes. Although the worst-case throughput is undesirably low, in practice the worst case does not occur very frequently. In fact, people were generally unaware of the exact worst-case traffic pattern until an analytical approach for calculating worst-case throughput was described in [Towles 02]. Therefore, ROMM is a popular choice for networks where worst-case throughput is not critical.

The VALIANT routing algorithm guarantees optimal worst-case throughput by randomizing every traffic pattern [Valiant 81]. VALIANT randomly picks an intermediate node from among all nodes in the network and routes minimally from the source to the intermediate node and then from the intermediate node to the destination. This is a non-minimal routing algorithm that destroys locality and hurts header latency but guarantees good load balancing. It can be used if worst-case throughput is the only critical measure for the network.

3.3 Proposed Solution

As demonstrated above, none of these three common oblivious routing algorithms performs well across the board. The proposed O1TURN algorithm addresses this problem by combining the low latency of DOR, the good average-case throughput of ROMM, and the optimal worst-case throughput of VALIANT. By randomly selecting either of the DOR directions for each packet (X first then Y, or Y first then X), O1TURN maintains the low latency of DOR while achieving near-optimal worst-case throughput through load balancing. Since the near-optimal worst-case throughput of O1TURN relies on distributing traffic evenly between the XY and YX routing layers, special logic is needed to balance load when packets have variable length. The authors contend that this logic is off the critical path and does not add to the delay of the router. Additionally, though not explicitly stated, even with this logic the O1TURN algorithm remains oblivious, since the decision to route XY or YX is made only at the source of the packet and not at every hop along the way.
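
The four oblivious schemes discussed in Sections 3.2 and 3.3 differ only in how the source picks a path, which the following sketch tries to illustrate. It is a simplified model rather than any paper's implementation: nodes are assumed to be (x, y) tuples on a k x k mesh, all function names are hypothetical, ROMM is shown in its two-phase form with XY order in both phases, and the load-balancing logic for variable-length packets is omitted.

```python
import random


def dor_route(src, dst, order="xy"):
    """List the nodes visited by dimension-ordered routing from src to dst."""
    x, y = src
    path = [(x, y)]
    for dim in order:  # "xy" walks X first, "yx" walks Y first
        if dim == "x":
            step = 1 if dst[0] > x else -1
            while x != dst[0]:
                x += step
                path.append((x, y))
        else:
            step = 1 if dst[1] > y else -1
            while y != dst[1]:
                y += step
                path.append((x, y))
    return path


def o1turn_route(src, dst, rng=random):
    """O1TURN: pick XY or YX with equal probability, once, at the source."""
    return dor_route(src, dst, rng.choice(("xy", "yx")))


def romm_route(src, dst, rng=random):
    """Two-phase ROMM: random intermediate node inside the minimal rectangle."""
    mid = (rng.randint(min(src[0], dst[0]), max(src[0], dst[0])),
           rng.randint(min(src[1], dst[1]), max(src[1], dst[1])))
    return dor_route(src, mid) + dor_route(mid, dst)[1:]


def valiant_route(src, dst, k, rng=random):
    """VALIANT: random intermediate node anywhere in the k x k mesh."""
    mid = (rng.randrange(k), rng.randrange(k))
    return dor_route(src, mid) + dor_route(mid, dst)[1:]
```

In this model, O1TURN and ROMM always return minimal paths, whereas a VALIANT path may be longer because the intermediate node can lie outside the minimal rectangle.
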

3.4 Analysis

The simplicity of the O1TURN algorithm allows an implementation comparable to that of DOR in terms of area and delay. O1TURN offers the minimal hop count guaranteed by DOR and thus keeps the zero-load packet header latency to a minimum. The worst-case throughput is proven to be optimal for k x k mesh networks where k is even, and to be within a 1/k^2 factor of optimal when k is odd. A graph of the worst-case throughputs for different sizes of 2-D mesh networks is shown in Fig. 3.1. It clearly shows how the worst-case throughputs of DOR and ROMM quickly degrade as network diameter increases. O1TURN, on the other hand, converges to the optimal worst-case throughput as the network size grows.

Figure 3.1 Worst-case throughputs for routing algorithms

In Table 3.1, the average-case throughput for O1TURN is shown to be similar to or better than that of the other standard oblivious routing algorithms. The average-case throughput was computed as the harmonic mean of the worst-case throughputs for a million random communication patterns [Towles 02]. O1TURN achieves an average-case throughput and latency similar to those of ROMM, which in turn behaves fairly well on a variety of traffic patterns. Additionally, O1TURN avoids ROMM's degraded worst-case throughput and outperforms VALIANT on both average-case throughput and latency with an equal worst-case throughput. This demonstrates the improvement that O1TURN offers over traditional oblivious routing algorithms.

Table 3.1 Throughputs for Various Traffic Patterns using different routing algorithms

The O1TURN algorithm loses out to ROMM in only two instances, the transpose and performance shuffle traffic patterns, and otherwise performs as well as or better than all other algorithms on all traffic patterns.

3.5 Implementation and Results

A base router implementation of O1TURN consists of a pipelined virtual channel router. This router has the following pipeline stages:

1. Routing stage, which determines the output channel of the packet.
2. Virtual channel allocation stage, which assigns free output virtual channels to the packets at the input.
3. Switch arbitration stage, in which packets compete for a switch port from input to output.
4. Switch traversal stage, which transmits the packet through the switch crossbar and the physical channel to the neighboring node.

In the case of DOR networks, a single virtual channel (VC) per physical channel (PC) is sufficient to achieve deadlock-free routing. Since O1TURN uses two dimension orders for routing (X first then Y, or Y first then X), it needs a minimum of 2 virtual channels, one for each dimension order, for deadlock-free routing. The routing and VC allocation computations for packets in the XY and YX layers are done completely independently of each other. This simplifies VC allocation for O1TURN compared to DOR, since for a given total number of virtual channels O1TURN has to arbitrate among only half as many. However, this reduced VC allocation complexity comes at the price of increased head-of-line (HOL) blocking, since the number of VCs per routing layer is halved. To compensate for the increased chance of HOL blocking relative to DOR, the number of virtual channels has to be doubled for O1TURN. The switch arbitration delay then increases, since the number of VCs contending for a physical channel is doubled. Nevertheless, the authors contend that the VC allocation stage remains the critical path and that the additional latency of switch allocation can be hidden (Table 3.2).
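
The split of virtual channels into two independent routing layers can be sketched as below. This is a hypothetical illustration of the partitioning idea only: the function names, the even split by VC index, and the lowest-index-first allocation policy are assumptions, not details from [Seo 06].

```python
def partition_vcs(num_vcs):
    """Split the VCs of one physical channel evenly between the two layers."""
    if num_vcs < 2 or num_vcs % 2:
        raise ValueError("O1TURN needs an even number of VCs, at least 2")
    half = num_vcs // 2
    return {"xy": list(range(half)), "yx": list(range(half, num_vcs))}


def allocate_vc(layer, free_vcs, partition):
    """Return the lowest-numbered free VC in the packet's own layer, or None.

    A packet never borrows a VC from the other layer, which keeps the two
    dimension orders independent of each other.
    """
    for vc in partition[layer]:
        if vc in free_vcs:
            return vc
    return None
```

With 8 VCs per physical channel, an XY packet in this sketch can only use VCs 0 to 3, so it may block even while a YX-layer VC is free, which is the head-of-line blocking cost noted above.
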
This analysis assumes that the router has only 4 pipeline stages; in reality, routers will have more pipeline stages to balance the critical paths between the various router computations.

Table 3.2 Router Pipeline Delays (FO4)

For the evaluation of the O1TURN scheme, the authors measured two 2D mesh sizes, 4 x 4 and 8 x 8, with 5-flit packets and 5-flit input buffers using 8 VCs per PC. They did not vary the packet size and provided little justification for the packet size that they chose. We believe that a small packet size enhances the performance of an oblivious algorithm with respect to an adaptive algorithm (Sec. 5). The large number of VCs per PC may have biased the results in favor of O1TURN compared to DOR.

The paper analyzes 4 traffic patterns (three permutations and one hot spot). The performance of O1TURN is shown to be better than that of DOR and ROMM. Compared to an adaptive routing algorithm like DUATO, the authors show that O1TURN performs better on all traffic patterns except the matrix transpose and hot-spot patterns. However, if latency is measured in FO4 delays (instead of number of cycles), the performance improvement afforded by DUATO is reduced, since the router latency for DUATO is higher (Fig. 3.2). We believe that the overhead of adaptive algorithms like DUATO will be reduced for larger packet sizes, as the router computation (done only for the head flit) is amortized over many body flits. Thus, a more complete analysis of O1TURN vs. DUATO should have included varying packet sizes.

Figure 3.2 Latency of Routing Algorithms in FO4

The authors compare O1TURN to a similar adaptive algorithm, positive first and negative first (PFNF) [Upadhyay 97], which load balances between two routing layers. The O1TURN paper states that the latency overhead due to an adaptive algorithm will be higher. However, there is no additional discussion of whether this latency overhead can be hidden by parallelizing the computation or amortized by using larger packets.

In general, O1TURN is shown to give close to optimal worst-case throughput with good average-case throughput and low latency for the traffic patterns and network parameters chosen in the simulations. We believe that a more complete analysis would need to include varying packet sizes and varying numbers of VCs per PC. Mesh networks have limited path diversity compared to torus networks, which restricts some of the potential of adaptive algorithms. It would be interesting to compare O1TURN on a torus against some popular adaptive algorithms such as Channel Queue Routing [Singh 04]. It is likely that, if the NoC uses a torus topology, an adaptive algorithm can significantly improve average-case throughput compared to O1TURN.

4. Improving Routing Efficiency for Network-on-Chip through Contention-aware Input Selection

In their paper [Wu 06], Wu, Al-Hashimi, and Schmitz present the idea of directing packets through nodes in the network based not only on the downstream destination of the packets and congestion information looking forward at future hops, but also on contention information available from looking backwards at previous hops.

4.1 Concept

The authors begin by recognizing the importance of NoC designs as multiprocessor on-chip designs become more complex and integrated. They point out that the performance of a NoC is strongly dependent on the routing technique employed. Wu et al. note that while a great deal of emphasis has been placed on output selection (i.e., determining the next destination node when traffic arrives at an input), very little work has been done on input selection.
The need for input selection arises when the output selection process has selected the same output path for two packets. A mechanism is needed to decide which of the two packets gets access to the output first. This mechanism is referred to as input selection. Traditionally, round-robin or first-come-first-served (FCFS) has been employed to make this decision. The authors investigate a more informed method of input selection, which they refer to as Contention-aware Input Selection (CAIS). The main idea behind CAIS is that when two or more input packets desire the same output channel, the decision as to which packet obtains the output is made based on upstream contention information. The aim of CAIS is to use contention information to alleviate congestion.

4.2 Implementation

The key element that CAIS hinges upon is the generation and use of information to make the input selection decision. In their implementation, Wu et al. represent this information as the Contention Level (CL). The CL for an input channel is determined from the number of active requests for that channel that come from upstream.

Figure 4.1: A partial 2-D mesh example for CAIS.

This can be understood with the example shown in Fig. 4.1. In the figure, all packets are destined for node (4,1). Three packets arrive at node (1,1) on three different input channels, and they all need the same output channel. The CL for the corresponding input channel at (2,1) is 3, because there are three waiting packets. Node (2,0) has only one packet to send through node (2,1), so the corresponding input channel at node (2,1) has a CL of 1. When node (2,1) decides which input channel to connect to the output leading to (3,1), it will choose the channel from (1,1), since its CL is higher. This channel is highlighted in green, while the losing channel is shown in red. Somewhat counterintuitively, the CL values between successive nodes do not add.
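
The selection step in the example above can be sketched in a few lines. This is a hypothetical software model, not the authors' VHDL: the function names and the dictionary keyed by input channel are illustrative, and breaking ties in favor of the first-listed input is an assumption, since the tie-breaking rule is not described here.

```python
def contention_level(requests):
    """CL sent downstream on an output: how many inputs are requesting it."""
    return len(requests)


def cais_select(requests):
    """Grant the output to the requesting input with the highest CL.

    `requests` maps each requesting input channel to the CL it received
    from upstream. Ties go to the first-listed input (an assumption).
    """
    return max(requests, key=requests.get)


# The example from Fig. 4.1, seen at node (2,1) for the output toward (3,1):
requests_at_2_1 = {"from (1,1)": 3, "from (2,0)": 1}
winner = cais_select(requests_at_2_1)        # "from (1,1)"
cl_to_3_1 = contention_level(requests_at_2_1)  # 2, not 3 + 1
```

Note that the CL passed on to node (3,1) counts only the two requesting inputs, reflecting that CL values do not accumulate across hops.
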
In their implementation, the authors choose not to propagate CL values beyond a single stage, so at node (3,1) the CL coming from node (2,1) is only 2 and not 4, because there are 2 input channels that want to send from (2,1) to (3,1). The information that one of the inputs at (2,1) has 3 packets waiting and the other has only 1 packet waiting is not propagated further along. The authors cite savings in hardware complexity as the reason for this.

The primary overhead of CAIS as described by Wu et al. is the need for a CAIS module on each of the output channels in the router. The CAIS module is responsible for two main tasks. The first is to generate the CL information for its particular output to pass downstream, where it will become the CL value for the input channel. The second is to examine the CL values for all inputs requesting its output and decide which should obtain the output based on their CL values. While not explicitly specified, it is likely that this module would consist of no more than a single 1's counter and a comparator tree. The other main overhead of CAIS is the wires needed to convey the CL information generated at the output of one node to the corresponding input on another node. The authors bound the number of additional wires needed and, as we have already noted, state that additional wires do not have a significant impact because the design is targeted for NoCs [Dally 01].

4.3 Evaluation

The authors created VHDL switch models in order to test the CAIS algorithm. FCFS was chosen as the representative traditional input selection model. To show results on both a deterministic and an adaptive algorithm, XY routing [Ni 93] and OE routing [Chiu 00] were chosen. A total of four models were made, combining each input and output selection algorithm: XY+FCFS, XY+CAIS, OE+FCFS, and OE+CAIS. A 6x6 mesh was chosen for analysis, with buffers being 5 flits in size and packets being 5 flits in length. Three synthetic traffic patterns (uniform, transpose, and hot spot) were used along with one pattern derived from real-life traffic (GSM voice CODEC). Simulations for the traffic patterns were done over 50,000 clock cycles, with the first 5,000 clock cycles ignored for reasons of network stability.

Results from simulation indicate that for uniform traffic the CAIS variants perform significantly better than their FCFS counterparts. Transpose traffic proved more difficult for CAIS, with the XY+FCFS and XY+CAIS setups performing almost the same and OE+CAIS performing marginally worse than OE+FCFS. The authors indicate that this is because transpose traffic rarely creates the scenario where more than two inputs are contending for the same output, and thus CAIS does not offer much benefit. Under hot spot traffic, XY+CAIS and OE+CAIS both perform better than XY+FCFS and OE+FCFS. Under the CODEC traffic, XY+CAIS and OE+CAIS again do better than both XY+FCFS and OE+FCFS.
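As a rough sketch of the experimental setup described above, the four router models and the measurement window can be written down as follows. The names (`configurations`, `average_latency`) and the exact rule for which packets count toward the average (here, packets injected at or after the end of the warm-up period and delivered before the end of the run) are assumptions made for illustration. The description above only states that the first 5,000 of 50,000 cycles are ignored.

```python
from itertools import product

ROUTING = ["XY", "OE"]              # deterministic and adaptive output selection
INPUT_SELECTION = ["FCFS", "CAIS"]  # baseline and contention-aware input selection
TOTAL_CYCLES = 50_000
WARMUP_CYCLES = 5_000               # ignored for reasons of network stability


def configurations():
    """All routing + input-selection pairs evaluated in the study."""
    return [f"{r}+{s}" for r, s in product(ROUTING, INPUT_SELECTION)]


def average_latency(packets, warmup=WARMUP_CYCLES, total=TOTAL_CYCLES):
    """Mean latency in cycles over (inject_cycle, deliver_cycle) pairs.

    Assumed rule: only packets injected at or after `warmup` and
    delivered before `total` are counted.
    """
    latencies = [d - i for i, d in packets if i >= warmup and d < total]
    return sum(latencies) / len(latencies) if latencies else None


print(configurations())
# Hypothetical trace: the first packet falls inside the warm-up window.
print(average_latency([(100, 130), (6_000, 6_020), (7_000, 7_040)]))
```

Latency here is expressed in cycles only, as in the reported results, so the sketch says nothing about absolute time.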
As a last point of interest, the authors describe the estimated areas for each of the four router implementations under a 0.12 µm technology. As would be expected because of the additional complexity of CAIS compared to FCFS, the areas for the CAIS implementations are larger. The area of XY+CAIS is 1.9% higher than XY+FCFS, and the area of OE+CAIS is 2.6% higher than OE+FCFS.

4.4 Critique

Although this paper proposes an inventive idea in an area that has not been carefully considered before, there are a number of points in the paper that needed to be fleshed out in order to solidify the concepts it presents.

No worst-case traffic pattern analysis is done, attempted, or even mentioned for CAIS. While this may have been difficult to do with the adaptive routing algorithm (OE), it should have been possible for the deterministic routing algorithm (XY). It would have been insightful to know whether CAIS was beneficial to the worst-case pattern, or whether the worst-case pattern changed when CAIS was applied instead of FCFS.

The scope of the evaluation performed is limited. Only a single mesh size (6x6), a single packet size (5 flits) and a handful of traffic patterns (uniform, transpose, hot spot, and an example based on GSM voice CODEC traffic) are examined. This severely limits the conclusions that can be drawn from the results, and the overall assessment that can be made of the algorithm. Questions concerning the scalability of the algorithm to different mesh sizes, the impact of increased or varied packet sizes, or the performance under average-case traffic are unanswered. At the least, it would have been instructive to test the initial 6x6, 5 flit-per-packet setup with some randomly chosen traffic patterns to gain some heuristic data.

There is no insight given as to how much additional delay the logic for CAIS would create within the router.
While it is possible to imagine what logic circuits would be needed to perform the CL computations and comparisons, the authors should have given this some thought and at least made a brief statement with some justification.

Some of the results reported are not easy to interpret against a baseline algorithm because various parameters are unspecified. For example, results are given for average packet latency in terms of cycles; however, there is no indication as to how long a cycle is. Relating to the previous point, this information could be important because the additional CAIS hardware may have required a longer clock cycle time.

The authors mention that CL numbers do not propagate more than a single node (see Sec. 4.2 and Fig. 4.1) in an attempt to keep hardware complexity low. It would have been interesting to see an analysis detailing how much additional hardware would be necessary to propagate CL values through more than one node, and whether this added information would have had a worthwhile impact on performance.

While Wu et al. do provide information regarding the impact on the area of the design when using CAIS vs. FCFS, they do not draw many interesting conclusions from this information. The authors show no more than what the reader was likely to already suspect: CAIS is logically a more complex algorithm than FCFS and because of this consumes more area. It is helpful to see that the area increase is not inordinately large, but with some deeper investigation it would have been possible to provide more insight to the reader. For instance, an analysis as to whether or not the performance gains garnered by using CAIS are sufficient to justify the additional area overhead would have helped to provide a basis for designers to decide whether or not to include CAIS.

The most critical flaw of the paper is that during the presentation of the GSM voice CODEC traffic results, the authors start a tangential but very important discussion to mention that packet starvation is a potential issue with CAIS (a packet with low contention may wait indefinitely on packets with higher contention). This potentially debilitating problem is quickly brushed off as not occurring in the experiments. A very brief explanation is given as to why starvation could not occur with only two packet streams, but for more streams this issue is left as future work. Wu et al. attempt to downplay the problem, but starvation is important to real designs where worst-case packet delivery time can be a key parameter. CAIS would have been much more appealing overall had the starvation issue been thoroughly addressed.

4.5 Conclusion

Overall, the authors' proposed CAIS algorithm provides good insight into the area of input selection, an area that has for a long time been relegated to simple and perhaps inefficient methods (round robin and FCFS). The authors provide enough information for the reader to adequately understand the basics behind the CAIS concept, and show that under a few select circumstances CAIS can offer a performance gain worth considering. It is unfortunate that the evaluation is so limited in scope, and that the authors didn't properly address the critical issue of packet starvation, but CAIS remains a good idea overall that, with appropriate follow-up efforts, could make a significant impact.

5. Overall Critique

The 2-D mesh routing papers often consider different network setups and loads. Comparing routing algorithms with different flow control strategies, packet sizes, traffic patterns, etc. can be difficult.
Additionally, there was not always sufficient or convincing justification for the choices that were made, which in turn affected the simulation results. It would be helpful if a standardized set of benchmarks were developed for network routing algorithm evaluation, especially for NoCs, which deal with a restricted class of networks (typically 2-D meshes or tori). Specifically, a standard set of traffic patterns and packet sizes should be tested. For example, all the papers only analyzed a packet size of five flits. This is a good choice for small packet sizes, but does not describe the network behavior for larger packet sizes that can be fairly common in today's NoC systems. Benefits of larger packets for adaptive algorithms include:

1. Higher router latency, due to increased routing computation, is amortized over a larger number of flits.
2. Longer packets can cause higher congestion for a given injection rate and hurt the performance of certain oblivious routing algorithms. On the other hand, adaptive algorithms can detect this congestion and route appropriately to keep throughput high and queue latency low. This should be the case for a significant sample of realistic traffic patterns.

It would be instructive for the authors to compare the performance of the routing algorithms with varying packet sizes, for example 5, 10 and 20 flits per packet, and varying mesh sizes, for example 4 x 4, 6 x 6 and 8 x 8, to account for different network designs. This would be useful to determine how well a design can handle varying packet sizes and how network diameter impacts the design. We would suggest having an assortment of traffic patterns such as:

1. Nearest Neighbor
2. Uniform Random
3. Bit Complement
4. Tornado
5. Worst Case (if known for that specific algorithm)
6. Average Case (over a significant number of traffic patterns)
7. Transpose
8. Realistic Traffic Patterns (Codecs, Multimedia)

A normalized latency metric such as FO4 delay and a normalized throughput metric based on channel capacity would be useful to compare the different algorithms reported. An area comparison relative to a DOR implementation would be useful to assess the overhead of these algorithms.

None of the papers considered the power impact of their schemes versus the baseline case of DOR. On-chip network power analysis will become increasingly important, as the interconnection network will contribute a significant portion of the total power with an increasing number of nodes on a chip. Single node power has been greatly optimized, and based on Amdahl's law, the power consumed by the on-chip network should also be minimized to lower the overall chip power.

Table 5.1 is a summary of the various aspects of each of the three papers, detailing the quality of analysis on several key points. Grades denote quality, depth, and rigor of analysis and are not an indicator of the performance of the algorithm with respect to that metric. It was difficult to compare the performance of the algorithms due to the lack of standardized evaluation parameters. The included comments are a summary of the results and/or the scope of the analysis.
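The synthetic patterns suggested earlier in this section are usually defined as simple functions from a source node to a destination node. The sketch below gives one possible set of definitions for a k x k mesh, together with the suggested sweep over packet and mesh sizes. The function names are hypothetical, and the precise definitions are assumptions, since individual papers vary in the details: tornado is written as a shift of ceil(k/2) - 1 positions along x with indices taken modulo k, nearest neighbor as a shift of one position along x, and bit complement as a coordinate-wise mirror. The worst-case, average-case and realistic patterns depend on the algorithm or application and are not covered.

```python
import random
from itertools import product


def transpose(src, k):
    x, y = src
    return (y, x)


def bit_complement(src, k):
    # Coordinate-wise mirror; equals the bitwise complement when k is a power of two.
    x, y = src
    return (k - 1 - x, k - 1 - y)


def tornado(src, k):
    # Shift by ceil(k/2) - 1 positions along x, indices taken modulo k (assumed variant).
    x, y = src
    return ((x + (k + 1) // 2 - 1) % k, y)


def nearest_neighbor(src, k):
    # Shift by one position along x, indices taken modulo k (assumed variant).
    x, y = src
    return ((x + 1) % k, y)


def uniform_random(src, k, rng=random):
    # Destination drawn uniformly over all k * k nodes.
    return (rng.randrange(k), rng.randrange(k))


PATTERNS = {
    "nearest_neighbor": nearest_neighbor,
    "uniform_random": uniform_random,
    "bit_complement": bit_complement,
    "tornado": tornado,
    "transpose": transpose,
}
PACKET_SIZES = [5, 10, 20]  # flits per packet
MESH_SIZES = [4, 6, 8]      # k for a k x k mesh

# Every (pattern name, packet size, mesh size) combination in the suggested sweep.
sweep = list(product(PATTERNS, PACKET_SIZES, MESH_SIZES))

print(len(sweep))
print(tornado((1, 2), 8), bit_complement((1, 2), 8), transpose((1, 2), 8))
```

Even this small sweep yields 45 simulation points per routing algorithm, which gives a sense of the cost of the standardized benchmark set proposed here.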

| Metric | DyAD (Hybrid Adaptive Deterministic) | O1TURN (Oblivious) | CAIS (Oblivious or Adaptive) | Grades |
| Worst Case Throughput Analysis | Not Reported | Matches Valiant | Not Reported | A |
| Average Case Throughput Analysis | 4 synthetic and 1 real-world traffic patterns | 1 million random traffic patterns | 3 synthetic and 1 real-world traffic patterns | B+ |
| Packet Size Comparison | 5 flits / packet | 5 flits / packet | 5 flits / packet | |
| Area Overhead vs. baseline | 7 % vs. DOR | Negligible vs. DOR | 1.9 % - 2.6 % vs. FCFS | A, B, A |
| Router Latency Overhead vs. baseline | Same slow clock cycle for DOR as DyAD | Similar or slightly better | Not Reported | A |
| Power Overhead vs. baseline | Not Reported | Not Reported | Not Reported | |
| Mesh Size Comparison | 4 x 4, 6 x 6 and 8 x 8 | 4 x 4, 8 x 8 and sub-mesh | 6 x 6 | B+, B+ |

Table 5.1: Comparison between DyAD, O1TURN and CAIS routing algorithms

6. Conclusion

A hybrid algorithm with an appropriately selected algorithm pair (one deterministic and the other adaptive), in a system designed to take advantage of that flexibility, may perform better in certain cases than purely adaptive or oblivious routing algorithms; however, the DyAD paper provides little concrete evidence to support this. If optimal worst-case throughput and low latency are critical and decent average-case performance is sufficient, then the O1TURN routing algorithm is a good choice. The CAIS technique could be beneficial to a variety of output-based routing techniques if proper measures are taken to avoid starvation. If average-case throughput is critical for a network along with optimal worst-case throughput, and slightly suboptimal latency is tolerable, then an adaptive algorithm like CQR may be the better choice.

7. References

[Chiu 00] Chiu, G.-M., The odd-even turn model for adaptive routing. IEEE Trans. on Parallel and Distributed Systems, 11(7): , July

[Dally 01] Dally, W. J. and B. Towles, Route Packets, Not Wires: On-Chip Interconnection Networks, DAC 2001, June 18-22, 2001, Las Vegas, Nevada, USA.

[Duato 93] Duato, J., A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks, IEEE Transactions on Parallel and Distributed Systems, 4(12): , Dec

[Glass 98] Glass, C. J. and L. M. Ni, The turn model for adaptive routing, 25 Years ISCA: Retrospectives and Reprints, pages ,

[Henkel 04] Henkel, J., W. Wolf, and S. Chakradhar, "On-chip networks: A scalable, communication-centric embedded system design paradigm," VLSI Design, pp , India,

[Hu 04] Hu, J. and R. Marculescu, "DyAD - Smart Routing for Networks-on-Chip," dac, pp , Design Automation Conference, 41st Conference on (DAC'04),

[Nesson 95] Nesson, T. and S. L. Johnsson, ROMM routing on mesh and torus networks, Proc. of the seventh annual ACM symposium on Parallel algorithms and architectures, pages ACM Press,

[Ni 93] Ni, L. M. and P. K. McKinley, A survey of wormhole routing techniques in direct networks, Computer, vol. 26, pp ,

[Petrini 97] Petrini, F. and M. Vanneschi, "Performance Analysis of Minimal Adaptive Wormhole Routing with Time-Dependent Deadlock Recovery," in Proc. of the 11th Int. Parallel Processing Symp., IPPS'97, pp , April

[Radulescu 03] Radulescu, A. and K. Goossens, Communication Services for Networks on Silicon, Domain-Specific Processors: Systems, Architectures, Modeling, and Simulation. Marcel Dekker,

[Schwiebert 93] Schwiebert, L. and D. N. Jayasimha, "Optimal Fully Adaptive Wormhole Routing for Meshes," Supercomputing '93, pages , November

[Seo 06] Seo, D., A. Ali, W. Lim, N. Rafique and M. Thottethodi, Near Optimal Worst-case Throughput Routing for Two-Dimensional Mesh Network, ISCA

[Singh 02] Singh, A., W. J. Dally, B. Towles and A. K. Gupta, Locality-preserving randomized oblivious routing on torus networks, Proc. fourteenth annual ACM symposium on Parallel algorithms and architectures, pages 9-13,

[Singh 04] Singh, A., W. J. Dally, A. K. Gupta and B. Towles, Adaptive channel queue routing on k-ary n-cubes, SPAA 2004:

[Towles 02] Towles, B. and W. J. Dally, "Worst-case Traffic for Oblivious Routing Functions," (preliminary version) ACM Symposium on Parallel Algorithms and Architectures (SPAA), Winnipeg, Manitoba, Canada, August,

[Upadhyay 97] Upadhyay, J., V. Varavithya, and P. Mohapatra, A Traffic Balanced Adaptive wormhole routing scheme for Two-Dimensional Meshes, IEEE Transactions on Computers, pages , May

[Valiant 81] Valiant, L. G. and G. J. Brebner, Universal schemes for parallel communication, Proc. thirteenth annual ACM symposium on Theory of computing, pages ACM Press,

[Vangal 07] Vangal, S., J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S. Venkataraman, Y. Hoskote and N. Borkar, An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS, ISSCC 2007, February 11-15, 2007, San Francisco, California, USA.

[Wu 06] Wu, D., B. M. Al-Hashimi and M. T. Schmitz, Improving Routing Efficiency for Network-on-Chip through Contention-aware Input Selection, Proceedings of 11th Asia and South Pacific Design Automation Conf. 2006, Japan.


JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS 1 JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS Shabnam Badri THESIS WORK 2011 ELECTRONICS JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

More information

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control CS 498 Hot Topics in High Performance Computing Networks and Fault Tolerance 9. Routing and Flow Control Intro What did we learn in the last lecture Topology metrics Including minimum diameter of directed

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Performance Evaluation of Probe-Send Fault-tolerant Network-on-chip Router

Performance Evaluation of Probe-Send Fault-tolerant Network-on-chip Router erformance Evaluation of robe-send Fault-tolerant Network-on-chip Router Sumit Dharampal Mediratta 1, Jeffrey Draper 2 1 NVIDIA Graphics vt Ltd, 2 SC Information Sciences Institute 1 Bangalore, India-560001,

More information

Escape Path based Irregular Network-on-chip Simulation Framework

Escape Path based Irregular Network-on-chip Simulation Framework Escape Path based Irregular Network-on-chip Simulation Framework Naveen Choudhary College of technology and Engineering MPUAT Udaipur, India M. S. Gaur Malaviya National Institute of Technology Jaipur,

More information

Boosting the Performance of Myrinet Networks

Boosting the Performance of Myrinet Networks IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. Y, MONTH 22 1 Boosting the Performance of Myrinet Networks J. Flich, P. López, M. P. Malumbres, and J. Duato Abstract Networks of workstations

More information

Adaptive Routing. Claudio Brunelli Adaptive Routing Institute of Digital and Computer Systems / TKT-9636

Adaptive Routing. Claudio Brunelli Adaptive Routing Institute of Digital and Computer Systems / TKT-9636 1 Adaptive Routing Adaptive Routing Basics Minimal Adaptive Routing Fully Adaptive Routing Load-Balanced Adaptive Routing Search-Based Routing Case Study: Adapted Routing in the Thinking Machines CM-5

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Evaluation of NOC Using Tightly Coupled Router Architecture

Evaluation of NOC Using Tightly Coupled Router Architecture IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 01-05 www.iosrjournals.org Evaluation of NOC Using Tightly Coupled Router

More information

Deadlock and Livelock. Maurizio Palesi

Deadlock and Livelock. Maurizio Palesi Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-26 1 Network/Roadway Analogy 3 1.1. Running

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

Removing the Latency Overhead of the ITB Mechanism in COWs with Source Routing Λ

Removing the Latency Overhead of the ITB Mechanism in COWs with Source Routing Λ Removing the Latency Overhead of the ITB Mechanism in COWs with Source Routing Λ J. Flich, M. P. Malumbres, P. López and J. Duato Dpto. of Computer Engineering (DISCA) Universidad Politécnica de Valencia

More information

From Routing to Traffic Engineering

From Routing to Traffic Engineering 1 From Routing to Traffic Engineering Robert Soulé Advanced Networking Fall 2016 2 In the beginning B Goal: pair-wise connectivity (get packets from A to B) Approach: configure static rules in routers

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

Basic Switch Organization

Basic Switch Organization NOC Routing 1 Basic Switch Organization 2 Basic Switch Organization Link Controller Used for coordinating the flow of messages across the physical link of two adjacent switches 3 Basic Switch Organization

More information

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Module 17: Interconnection Networks Lecture 37: Introduction to Routers Interconnection Networks. Fundamentals. Latency and bandwidth Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012

More information

OASIS Network-on-Chip Prototyping on FPGA

OASIS Network-on-Chip Prototyping on FPGA Master thesis of the University of Aizu, Feb. 20, 2012 OASIS Network-on-Chip Prototyping on FPGA m5141120, Kenichi Mori Supervised by Prof. Ben Abdallah Abderazek Adaptive Systems Laboratory, Master of

More information

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,

More information

A Deterministic Fault-Tolerant and Deadlock-Free Routing Protocol in 2-D Meshes Based on Odd-Even Turn Model

A Deterministic Fault-Tolerant and Deadlock-Free Routing Protocol in 2-D Meshes Based on Odd-Even Turn Model A Deterministic Fault-Tolerant and Deadlock-Free Routing Protocol in 2-D Meshes Based on Odd-Even Turn Model Jie Wu Dept. of Computer Science and Engineering Florida Atlantic University Boca Raton, FL

More information

Bursty Communication Performance Analysis of Network-on-Chip with Diverse Traffic Permutations

Bursty Communication Performance Analysis of Network-on-Chip with Diverse Traffic Permutations International Journal of Soft Computing and Engineering (IJSCE) Bursty Communication Performance Analysis of Network-on-Chip with Diverse Traffic Permutations Naveen Choudhary Abstract To satisfy the increasing

More information

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks HPI-DC 09 Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks Diego Lugones, Daniel Franco, and Emilio Luque Leonardo Fialho Cluster 09 August 31 New Orleans, USA Outline Scope

More information

A Literature Review of on-chip Network Design using an Agent-based Management Method

A Literature Review of on-chip Network Design using an Agent-based Management Method A Literature Review of on-chip Network Design using an Agent-based Management Method Mr. Kendaganna Swamy S Dr. Anand Jatti Dr. Uma B V Instrumentation Instrumentation Communication Bangalore, India Bangalore,

More information

Lecture 22: Router Design

Lecture 22: Router Design Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links Hoda Naghibi Jouybari College of Electrical Engineering, Iran University of Science and Technology, Tehran,

More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information

Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms

Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms Usman Mazhar Mirza Master of Science Thesis 2011 ELECTRONICS Postadress: Besöksadress: Telefon: Box 1026

More information

Performance Analysis of a Minimal Adaptive Router

Performance Analysis of a Minimal Adaptive Router Performance Analysis of a Minimal Adaptive Router Thu Duc Nguyen and Lawrence Snyder Department of Computer Science and Engineering University of Washington, Seattle, WA 98195 In Proceedings of the 1994

More information

Oblivious Routing in On-Chip Bandwidth-Adaptive Networks

Oblivious Routing in On-Chip Bandwidth-Adaptive Networks 009 8th International Conference on Parallel Architectures and Compilation Techniques Oblivious Routing in On-Chip Bandwidth-Adaptive Networks Myong Hyon Cho, Mieszko Lis, Keun Sup Shim, Michel Kinsy,

More information

Deadlock and Router Micro-Architecture

Deadlock and Router Micro-Architecture 1 EE482: Advanced Computer Organization Lecture #8 Interconnection Network Architecture and Design Stanford University 22 April 1999 Deadlock and Router Micro-Architecture Lecture #8: 22 April 1999 Lecturer:

More information

Oblivious Routing in On-Chip Bandwidth-Adaptive Networks Myong Hyon Cho, Mieszko Lis, Keun Sup Shim, Michel Kinsy, Tina Wen, and Srinivas Devadas

Oblivious Routing in On-Chip Bandwidth-Adaptive Networks Myong Hyon Cho, Mieszko Lis, Keun Sup Shim, Michel Kinsy, Tina Wen, and Srinivas Devadas Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-009-0 March 7, 009 Oblivious Routing in On-Chip Bandwidth-Adaptive Networks Myong Hyon Cho, Mieszko Lis, Keun Sup Shim,

More information

Fault-adaptive routing

Fault-adaptive routing Fault-adaptive routing Presenter: Zaheer Ahmed Supervisor: Adan Kohler Reviewers: Prof. Dr. M. Radetzki Prof. Dr. H.-J. Wunderlich Date: 30-June-2008 7/2/2009 Agenda Motivation Fundamentals of Routing

More information

Fault-Tolerant Routing Algorithm in Meshes with Solid Faults

Fault-Tolerant Routing Algorithm in Meshes with Solid Faults Fault-Tolerant Routing Algorithm in Meshes with Solid Faults Jong-Hoon Youn Bella Bose Seungjin Park Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Oregon State University

More information

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network 1 Global Adaptive Routing Algorithm Without Additional Congestion ropagation Network Shaoli Liu, Yunji Chen, Tianshi Chen, Ling Li, Chao Lu Institute of Computing Technology, Chinese Academy of Sciences

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

NOC Deadlock and Livelock

NOC Deadlock and Livelock NOC Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

Wormhole Routing Techniques for Directly Connected Multicomputer Systems

Wormhole Routing Techniques for Directly Connected Multicomputer Systems Wormhole Routing Techniques for Directly Connected Multicomputer Systems PRASANT MOHAPATRA Iowa State University, Department of Electrical and Computer Engineering, 201 Coover Hall, Iowa State University,

More information

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Young Hoon Kang, Taek-Jun Kwon, and Jeff Draper {youngkan, tjkwon, draper}@isi.edu University of Southern California

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

A Novel Semi-Adaptive Routing Algorithm for Delay Reduction in Networks on Chip

A Novel Semi-Adaptive Routing Algorithm for Delay Reduction in Networks on Chip Research Journal of Applied Sciences, Engineering and Technology 4(19): 3641-3645, 212 ISSN: 24-7467 Maxwell Scientific Organization, 212 Submitted: February 13, 212 Accepted: March 24, 212 Published:

More information

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns H. H. Najaf-abadi 1, H. Sarbazi-Azad 2,1 1 School of Computer Science, IPM, Tehran, Iran. 2 Computer Engineering

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Jose Flich 1,PedroLópez 1, Manuel. P. Malumbres 1, José Duato 1,andTomRokicki 2 1 Dpto.

More information

Analysis of Sorting as a Streaming Application

Analysis of Sorting as a Streaming Application 1 of 10 Analysis of Sorting as a Streaming Application Greg Galloway, ggalloway@wustl.edu (A class project report written under the guidance of Prof. Raj Jain) Download Abstract Expressing concurrency

More information

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS Chandrika D.N 1, Nirmala. L 2 1 M.Tech Scholar, 2 Sr. Asst. Prof, Department of electronics and communication engineering, REVA Institute

More information

EE 382C Interconnection Networks

EE 382C Interconnection Networks EE 8C Interconnection Networks Deadlock and Livelock Stanford University - EE8C - Spring 6 Deadlock and Livelock: Terminology Deadlock: A condition in which an agent waits indefinitely trying to acquire

More information

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem Reading W. Dally, C. Seitz, Deadlock-Free Message Routing on Multiprocessor Interconnection Networks,, IEEE TC, May 1987 Deadlock F. Silla, and J. Duato, Improving the Efficiency of Adaptive Routing in

More information

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems Mohammad Ali Jabraeil Jamali, Ahmad Khademzadeh Abstract The success of an electronic system in a System-on- Chip is highly

More information

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Design and Implementation of Buffer Loan Algorithm for BiNoC Router Design and Implementation of Buffer Loan Algorithm for BiNoC Router Deepa S Dev Student, Department of Electronics and Communication, Sree Buddha College of Engineering, University of Kerala, Kerala, India

More information