Crossbar Analysis for Optimal Deadlock Recovery Router Architecture


Yungho Choi and Timothy Mark Pinkston
SMART Interconnects Group, EE-Systems Dept., University of Southern California, Los Angeles, CA
{yunghoc, tpink}@charity.usc.edu

Abstract

We explore the design of optimal deadlock recovery-based fully adaptive routers by evaluating promising internal router crossbar designs. Unified and decoupled crossbar designs aimed at exploiting the full capabilities of adaptive routing are evaluated by analyzing their effect on overall network performance. We show that an enhanced hierarchical crossbar design that supports routing locality in virtual network class achieves the highest performance at relatively low cost.

1 Introduction

The importance of the interconnection network in achieving high-performance parallel computing is conspicuous. Communication efficiency is maximized if network routers are designed to fully exploit the underlying capabilities of the network and routing algorithm. Routing adaptivity and router speed critically affect overall network performance. Unfortunately, the two are competing factors, as increased adaptivity generally results in increased router delay.

Deadlock recovery routing schemes [1, 2] maximize adaptivity by completely eliminating the routing restrictions that deadlock avoidance schemes enforce to prevent deadlock. If not implemented carefully, this increase in adaptivity can compromise the cost and delay of the router, degrading overall network performance rather than improving it.

The internal router crossbar design significantly affects the cost and delay of other router components such as header selection, routing decision, and arbitration logic. The crossbar design determines the crossbar's size and routing freedom, which can influence 50% or more of the total router delay [3]. Therefore, an optimal crossbar design is necessary to implement high-performance deadlock recovery routers. This study explores the design of
optimal deadlock recovery-based fully adaptive routers that minimize the cost of adaptivity while maximizing network performance. A variety of promising internal router crossbar designs are evaluated in terms of cost and speed. Further, the performance of each design is simulated at the network level.

The next section presents relevant background and related work. Section 3 describes the crossbar designs and their unique features. Section 4 presents an extensive evaluation and performance analysis of the router crossbar designs using modeling and network-level simulation. Finally, Section 5 gives the conclusions drawn from this work.

This research was supported by an NSF Research Initiation Award, grant ECS, and an NSF Career Award, grant ECS.

2 Background

Many crossbar designs have been developed for deadlock avoidance algorithms; they reduce router cost and delay by exploiting the routing restrictions enforced by the underlying algorithm. In the Mesh Routing Chip [4], for example, static dimension-ordered routing is enforced: packets are not allowed to route in the Y dimension before reaching the destination's X coordinate. This makes it possible to partition the crossbar into smaller, faster units based on dimension. In the partially adaptive Planar Adaptive Router [5], only two dimensions at a time are available for routing, to avoid deadlock (routing in all other dimensions is prohibited). By exploiting these routing restrictions, the crossbar is partitioned into planar subcrossbar units (as opposed to the dimensional units of the Mesh Routing Chip) to improve speed. In the Hierarchical Adaptive Router [6], the crossbar is partitioned into ordered virtual network subcrossbar classes. Deadlock is avoided by enforcing routing restrictions in the lowest virtual network, viz., Duato's algorithm [7]. The less optimal alternative to these designs, applicable to all avoidance algorithms, is a unified crossbar design. Its performance is less influenced by routing restrictions but more coupled
to network characteristics such as node degree and virtual channel support.

Router crossbar designs should fully exploit the restrictions as well as the capabilities of the underlying routing algorithm and network to achieve the highest possible performance. Because of the different restrictions inherent to deadlock avoidance algorithms, not all crossbar designs are optimal or even applicable. Deadlock recovery routers enforce few if any restrictions, allowing many more crossbar designs to be used, including modified variants of designs proposed for deadlock avoidance routers. These designs should aim at exploiting the full capabilities of unrestricted routing while, at the same

time, reducing crossbar complexity. This is the motivation of our work. We explore the design of optimal deadlock recovery-based routers through careful analysis of promising crossbar designs. Although our analysis is applicable to deadlock recovery schemes in general, we focus on Disha-based [1] recovery routing.

Figure 1: Internal router crossbar designs: (a) U-CB, (b) C-CB, (c) H-CB, (d) E-CB.

3 Router Crossbar Designs

Four alternative crossbar designs for deadlock recovery routers are presented and evaluated by examining their unique features. They fall into two categories: unified crossbar designs and decoupled crossbar designs. We consider one unified-crossbar structure (U-CB) and three decoupled crossbar structures, shown in Figure 1: the cascade-crossbar (C-CB), the hierarchical-crossbar (H-CB), and the enhanced-hierarchical-crossbar (E-CB). Before describing these designs in greater detail, we state a few assumptions. A connect channel is an internal physical channel that connects two subcrossbars within the same router; increasing the number of connect channels decreases internal blocking but increases subcrossbar size. We assume a k-ary n-cube network with V virtual channels per physical link and C connect channels per router subcrossbar, where applicable. Messages are assumed to be received from processing nodes through a randomly selected injection virtual channel (see Figure 1). All mutual and external deadlocks are assumed to be recovered from by a deadlock recovery mechanism; e.g., in Disha [1], the centralized deadlock buffer is used to progressively recover from deadlock.

3.1 Unified Design

The most straightforward of the four crossbar designs is the unified crossbar, shown in Figure 1(a). Its structure consists of a single crossbar capable of connecting all router inputs to any of the router outputs across all virtual channels. This results in a
crossbar size of $P = (2n+1)V$, where $P$ is the number of crossbar input ports. The cost and speed are functions of both n and V, so delay increases as n and V increase. However, because of the U-CB's strictly nonblocking internal structure, any input port can be connected to an available output port in a single cycle regardless of other connections. Fully adaptive routing can exploit this capability, which makes U-CB worth evaluating despite its potentially long delay.

3.2 Decoupled Designs

The decoupled crossbar designs consist of smaller subcrossbars connected by connect channels. This structure reduces the size of the crossbar as well as the complexity of the routing arbitrator, making the router potentially faster. Further, this structure is intended to exploit routing locality in dimension or in virtual channel network. If, owing to routing locality, most packets tend not to change dimension or virtual channel network frequently, then the crossbar need not give packets direct access to all output channels, even with fully adaptive routing. Instead, changes in direction or virtual network can be supported by providing indirect access to all output channels through subcrossbar connect channels. This makes the decoupled crossbar designs, which differ in the type of routing locality they exploit, interesting to evaluate. Each design is briefly discussed below.

The cascade-crossbar design is derived from avoidance-based dimension-ordered routing [4] but modified to exploit recovery-based fully adaptive routing. As shown in Figure 1(b), each subcrossbar in C-CB is associated with only one dimension, which consists of V virtual channels for each direction. This results in a subcrossbar input size of $P = 2V + C + 1$ ports. Whenever packets change dimensions, the connect channel must be used to access the next subcrossbar in sequence. Unlike the subcrossbars in avoidance-based routers, the subcrossbars and virtual channels in each
subcrossbar may be used by packets in no prescribed order for recovery-based fully adaptive routing. Additionally, wraparound connections are allowed in C-CB from the lowest dimension to the highest dimension to support adaptive routing. This router design exploits routing locality in dimension and is therefore promising if packets being adaptively routed tend to continue in a given dimension the majority of the time before turning (possibly using different virtual channels within that dimension). In fact, many adaptive routers currently implement a preferred channel selection policy of straight over turn to exploit such locality in routing decisions, and this design takes advantage of that fact.
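To make the cost of turning in the cascade crossbar (C-CB) concrete, the Python sketch below counts how many subcrossbar passes a packet needs to cross one router. It is a hypothetical model, not the authors' simulator: it assumes connect channels lead from the subcrossbar of dimension d to that of dimension (d + 1) mod n (the x to y to z to x order used in the Figure 2(b) walk-through), ignores blocking, and uses an invented turn model in which a packet keeps its dimension with probability `p_straight` and otherwise turns uniformly to one of the other n - 1 dimensions. The names `cascade_passes` and `expected_passes` are illustrative only.

```python
from fractions import Fraction


def cascade_passes(in_dim: int, out_dim: int, n: int) -> int:
    """Subcrossbar passes for one trip through a C-CB router.

    Assumption (hypothetical model): connect channels lead from the
    subcrossbar of dimension d to that of dimension (d + 1) mod n,
    i.e. x -> y -> z -> x for n = 3. A packet enters the subcrossbar of
    its arrival dimension and must reach that of its output dimension.
    """
    if not (0 <= in_dim < n and 0 <= out_dim < n):
        raise ValueError("dimension index out of range")
    # Python's % returns a non-negative result for a positive modulus,
    # so the wraparound connect channel (e.g. z -> x) is handled here.
    connect_hops = (out_dim - in_dim) % n
    return 1 + connect_hops


def expected_passes(n: int, p_straight: Fraction) -> Fraction:
    """Mean passes per router under a hypothetical turn model.

    The packet keeps its dimension with probability p_straight and
    otherwise turns to one of the other n - 1 dimensions uniformly.
    Passes depend only on (out_dim - in_dim) mod n, so arriving in
    dimension 0 is representative of every arrival dimension.
    """
    p_turn_each = (1 - p_straight) / (n - 1)
    mean = p_straight * cascade_passes(0, 0, n)
    for out_dim in range(1, n):
        mean += p_turn_each * cascade_passes(0, out_dim, n)
    return mean


if __name__ == "__main__":
    n = 3
    for p in (Fraction(1), Fraction(4, 5), Fraction(1, 3)):
        print(f"p_straight={p}: {expected_passes(n, p)} passes on average")
```

For n = 3, a packet that always continues straight needs one pass, one that keeps its dimension four times out of five averages 13/10 passes, and one that picks each of the three dimensions with equal probability averages 2. Under the multi-cycle delay model of Section 3.2.3, each extra pass costs an extra clock cycle, which is why this design depends on strong locality in dimension. The worst case of n passes corresponds to n - 1 connect-channel traversals, the same bound used by the self-deadlock restriction in Section 3.2.2.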

The hierarchical-crossbar design is derived from the Hierarchical Adaptive Router [6]. Each subcrossbar in H-CB is associated with one virtual channel network (VCN), which includes all dimensions and directions, as shown in Figure 1(c). This results in a subcrossbar input size of $P = \max[(2n+V), (2n+C)]$ ports. Each of its subcrossbars supports connections to all directions, which allows packets to change dimensions within a subcrossbar but requires the use of connect channels to change VCNs. This crossbar structure exploits routing locality in virtual channel network, which is expected to support adaptive routing better than locality in dimension does, because packets can avoid faulty and congested areas without using connect channels. Unlike in deadlock avoidance routers, no routing restrictions are enforced on the lowest subcrossbar VCN for recovery-based routing. However, inherent to this hierarchical structure is the restriction that no connect channels exist between the lowest and highest VCNs. Moreover, packets are injected into the highest VCN only and trickle down to the next lower VCN on blockage, making the lowest virtual network a potential bottleneck. While this ordering between VCNs is required for avoidance-based routing, it is a restriction that can be relaxed in recovery routing.

The enhanced-hierarchical-crossbar design is proposed as an enhancement to the hierarchical-crossbar design. In addition to the connectivity supported by H-CB, the E-CB design has wraparound connect channels between the lowest and highest virtual channel networks, since VCN ordering is not necessary in recovery-based routing (see Figure 1(d)). This solves the bottleneck problem of H-CB. In addition, injection ports from the processor are distributed over all subcrossbars in E-CB to balance VCN load [8]. Again, this feature is disallowed in deadlock avoidance routers but is allowed with recovery routing. Therefore, E-CB exploits routing locality in VCN while supporting greater, more distributed access
to VCNs. The resulting subcrossbar input size is $P = 2n + C + 1$ ports.

While the decoupled crossbar structures provide many potential performance benefits, each inherently has internal complications that can degrade performance if not handled correctly, as discussed below.

3.2.1 Internal Blocking

The function of connect channels is to provide access to decoupled subcrossbars for packets that are currently blocked at the present subcrossbar or have finished routing over the resources supported by the present subcrossbar. The number of connect channels between two neighboring subcrossbars is critical to router performance, since too few of them limit the routing alternatives available to packets and cause internal blocking (e.g., packet P3 in Figure 2(a)). Internal blocking may be more frequent in some decoupled designs than in others. Increasing the number of connect channels to prevent internal blocking carries a cost and delay penalty, which sets an upper bound on their number. Hence, it is important to examine this trade-off to balance performance and cost.

Figure 2: Internal blocking and deadlock: (a) internal blocking, (b) internal self-deadlock, (c) internal mutual-deadlock. Legend: inactive packet, active packet, request for channel, subcrossbar, connect channel.

3.2.2 Internal Deadlock

Internal deadlock is defined as infinitely persisting packet blockage resulting from a cyclic dependency among connect channels. Two types can occur, internal self-deadlock and internal mutual-deadlock, illustrated in Figures 2(b) and (c), respectively, for one connect channel per subcrossbar, minimal routing, and the C-CB design. In Figure 2(b), packet P1 tries to obtain available output channels in subcrossbar y or z, as it has finished routing in the x dimension. It subsequently moves through subcrossbars y and z and back to x through the wraparound connect channel, because the output channels in y and z were not immediately available. As a result, packet P1
will remain blocked forever because it has used up all the connect channels needed to route in the remaining dimensions to its destination. Internal self-deadlock can significantly degrade performance, as it may cause other packets to block internally. In Figure 2(c), packets P1, P2, and P3 are trying to reach subcrossbars z, x, and y, respectively, because they have finished routing in all other dimensions. However, they block one another in a cyclic fashion. These internal deadlock situations must be guarded against in the decoupled crossbar structures.

Theorem 1: The necessary conditions for internal deadlock to occur are (1) a decoupled crossbar structure, (2) wraparound connect channels, and (3) the absolute necessity for a packet to traverse the next subcrossbar in sequence, i.e., the destination address in the present dimension is reached but not in at least one other dimension.

Proof: A cyclic dependency among resources is a necessary condition for deadlock [9]. To build a cyclic dependency among connect channels, conditions (1) and (2) are required. To satisfy the hold-and-wait condition of deadlock, condition (3) is required: if a packet has no need to move to other subcrossbars, the blocked packet will eventually be assigned to an output channel in the current subcrossbar.

From Theorem 1, internal deadlock can occur only in the cascade-crossbar, the only decoupled design that satisfies all three conditions. Internal deadlock can be resolved by the same recovery method used for external deadlock (i.e., the deadlock buffer resource in Disha). However, internal deadlocks cause local router congestion, which may lead to overall network performance losses. To improve router and network performance, the following restrictions may be applied to avoid internal self-deadlock, although they are not necessary: no packet is allowed to use more than $n-1$ connect channels, and no packet may use the connect channels leading to subcrossbar dimensions in which it has finished routing.

3.2.3 Multi-cycle Delay

In the decoupled crossbar structures, packets may experience different setup delays and data-through delays, depending on the number of subcrossbar traversals. If the clock cycle is bounded by the setup (or data-through) delay of one subcrossbar, packets requiring subcrossbar traversals take multiple cycles to pass through the router, which increases their average router delay. Therefore, while the router speed (clock cycle time) may be faster than that of the unified design, the average router delay can actually be higher, depending on the dynamic behavior of routes taken through the network. If routing locality exists and is appropriately exploited, the decoupled designs should outperform the unified design. This trade-off must be evaluated by simulation.

4 Performance of Crossbar Designs

An optimal router architecture for deadlock recovery-based routing should incorporate the best of the alternative internal router crossbar designs. We evaluate these designs at the router level by estimating router delay and cost using Chien's model [3], and at the network level by measuring network performance via simulation.

4.1 Router Level Performance Evaluation

We compare our router designs in terms of cost and speed using Chien's model [3], assuming n = V = 3. For the decoupled crossbar designs, we vary the number of connect channels from one to three. Table 1 gives the overall router cost and delay for the alternative crossbar designs. The decoupled crossbar routers are up to 20% faster and up to 33% less costly than the unified crossbar router. These advantages increase as the number of dimensions
and virtual channels grow but diminish as the number of connect channels grows beyond a certain point. Nevertheless, faster routers do not always yield higher network performance. Each router design must be evaluated at the network level to determine how well it exploits routing locality and the underlying flexibility of recovery-based routing.

Table 1: Cost and delay of router designs
Design        Gate count   Tsetup    Tdata-thru
U-CB          ...          ... ns    5.24 ns
C-CB (C=1)    ...          ... ns    4.41 ns
C-CB (C=2)    ...          ... ns    4.52 ns
C-CB (C=3)    ...          ... ns    4.61 ns
H-CB (C=1)    ...          ... ns    4.33 ns
H-CB (C=2)    ...          ... ns    4.41 ns
H-CB (C=3)    ...          ... ns    4.52 ns
E-CB (C=1)    ...          ... ns    4.41 ns
E-CB (C=2)    ...          ... ns    4.52 ns
E-CB (C=3)    ...          ... ns    4.61 ns

4.2 Network Level Performance Evaluation

We compare the performance of the crossbar designs through extensive simulation using FlexSim, a more flexible version of FLITSIM 2.0. All simulations are run on an 8 × 8 × 8 three-dimensional torus (n = 3) with 3 virtual channels per physical channel and full-duplex links. Messages are 32 flits long, and a buffer depth of two is assumed. All router designs use one injection and one reception channel per node. A true fully adaptive minimal deadlock recovery routing scheme (Disha) is assumed, with a default time-out of 25 cycles before deadlock is suspected. Maximum normalized throughput (in flits/cycle/sec) and average latency (in nsec) are measured, taking into account the multi-cycle and router delay penalties of the different designs. The clock cycle time is assumed to be the minimum data-through delay of a single pass through the router (sub)crossbar.

Uniform Traffic Results: Router designs using C-CB with 1 to 3 connect channels are compared against the U-CB router design in Figure 3(a). As shown, the performance of routers with C-CB improves drastically as the number of connect channels increases. This result indicates that connect channels in C-CB are critical resources and that locality in dimension is not high enough to exploit the flexibility provided by adaptive routing. To achieve throughput comparable to U-CB, C-CB requires at least 3 connect
channels. Router designs using H-CB with 1 to 3 connect channels are compared to the U-CB router design in Figure 3(b). Unlike that of the C-CB router, the performance of the H-CB router is not significantly affected by the number of connect channels. This indicates that packets use connect channels much less frequently in H-CB than in C-CB, which means that locality in virtual channel network is better able to exploit adaptive routing than locality in dimension. However, the H-CB router does not achieve performance comparable to U-CB (H-CB with 3 connect channels has 12% lower maximum throughput). This shows that the lowest virtual channel network is indeed a bottleneck. When packets internally block in the H-CB router, they move down to the next lower virtual network and must finish routing in this congested network, even though the higher networks free up once packets reach the lower virtual networks. Therefore, this crossbar design is less suitable for deadlock recovery-based adaptive routers. As shown in Figure 3(c), the performance of the E-CB

router exceeds that of U-CB and all other crossbar designs. The E-CB router with only 2 connect channels has up to 25% lower latency and slightly higher maximum throughput than the U-CB router. Moreover, there is no wide performance gap in going from one connect channel to three, which indicates that connect channels are not critical resources in this design. One reason why the E-CB router performs so well with only one connect channel is that message injection at the node is distributed uniformly over all virtual channel networks, which minimizes the need for packets to change virtual channel networks and lets them make fewer subcrossbar traversals. Another reason is that locality in virtual channel network is high, and the design can still exploit the full capabilities of adaptive routing. Moreover, the possible performance losses of the decoupled crossbar structure are negligible compared to its overall advantages.

Figure 3: Latency and throughput comparisons under uniform traffic: (a) cascade CB, (b) hierarchical CB, (c) enhanced CB. Each panel compares U-CB with the corresponding decoupled design for C = 1, 2, and 3.

Nonuniform Traffic Results: We also characterize the performance of these router designs using the bit-reversal and perfect shuffle nonuniform traffic patterns. As shown in Figure 4(a), each additional connect channel increases the maximum throughput of the C-CB router by 7 units. However, unlike the case with uniform traffic, even three connect channels are not enough to obtain a maximum throughput equal to that of the U-CB router. Not only are connect channels critically limiting resources, but locality in dimension under the bit-reversal traffic pattern is also worse than under uniform traffic. In Figure 4(b), the H-CB router shows results similar to those of the C-CB router, except that the number of connect channels does not significantly affect performance. Instead, the lowest virtual channel network limits the performance of the H-CB router. In contrast,
the E-CB router shows performance comparable to the U-CB router, which means that exploiting locality in virtual channel network is profitable under nonuniform as well as uniform traffic. Further, uniformly distributing message injection across subcrossbars helps mitigate the potential problems associated with its decoupled crossbar structure (i.e., internal blocking and multi-cycle delay).

Simulation results under perfect shuffle traffic (Figures 5(a) and (b)) further confirm that connect channels and the lowest virtual channel network are the limiting resources in the C-CB and H-CB designs, respectively. In contrast, E-CB routers outperform all other routers, including the U-CB router (Figure 5(c)). Moreover, the average latency of the E-CB router is measured to be up to 25% lower than that of the U-CB router.

Table 2 summarizes our results for the four alternative router designs presented. The cost, average message latency, and maximum throughput of all router designs are normalized to the U-CB router. The H-CB (C = 1) design is the least costly; however, its performance is comparatively low. The E-CB (C = 2) design gives the best performance (highest throughput and lowest latency under uniform and nonuniform traffic), and it is 20% cheaper than the U-CB router design. We therefore conclude that the enhanced-hierarchical-crossbar design with a moderate number of connect channels is the best design for fully adaptive deadlock recovery routers.

Table 2: Summary of results. Columns: cost (gate count, with sequential RA); maximum throughput and average message latency under random (RAN), bit-reversal (BR), and perfect shuffle (PS) traffic. Rows: U-CB; C-CB, H-CB, and E-CB, each with C = 1, 2, 3.

5 Conclusions and Future Work

This paper explores the design of optimal deadlock recovery-based routers through careful analysis of unified and decoupled internal router crossbar designs. Crossbar designs are evaluated by
examining their unique features, cost, speed, and overall effect on network performance. We find that the higher cost and delay of the unified-crossbar design outweigh the benefits of its nonblocking connectivity for networks with high dimensionality and many virtual channels. Subcrossbar connect channels in the cascade-crossbar design are limiting resources because routing locality in dimension poorly exploits fully adaptive routing, particularly for nonuniform traffic. Increasing the number of connect channels improves overall performance up to the point where the subcrossbar size becomes prohibitively large. Connect channels in the hierarchical-crossbar design are not critical resources; instead, the lowest virtual channel network can be a performance bottleneck. Although up to 20% less costly to implement than the other crossbar designs, the hierarchical-crossbar design yields 20% less network throughput. The enhanced-hierarchical-crossbar design outperforms the others in both cost and performance; compared to the unified design, it is 20% cheaper, 25% faster, and achieves slightly higher throughput. Of the decoupled crossbar designs, it requires the fewest connect channels, as it is able to exploit fully adaptive routing flexibility with routing locality in virtual channel network. Our results suggest that the increased adaptivity offered by deadlock recovery-based routing algorithms can be profitably exploited and implemented in routers with reasonable cost and speed. In future work, we will continue to explore the design of other internal router architecture components optimized for efficient deadlock recovery-based routers.

Figure 4: Latency and throughput comparisons under bit-reversal traffic: (a) cascade CB, (b) hierarchical CB, (c) enhanced CB, each compared with U-CB for C = 1, 2, and 3.

Figure 5: Latency and throughput comparisons under perfect shuffle traffic: (a) cascade CB, (b) hierarchical CB, (c) enhanced CB, each compared with U-CB for C = 1, 2, and 3.
References

[1] Anjan K. V. and Timothy Mark Pinkston. DISHA: A Deadlock Recovery Scheme for Fully Adaptive Routing. In Proceedings of the 22nd International Symposium on Computer Architecture, pages 201-210. IEEE Computer Society, June 1995.
[2] J. Kim, Z. Liu, and A. Chien. Compressionless Routing: A Framework for Adaptive and Fault-Tolerant Routing. In Proceedings of the 21st International Symposium on Computer Architecture. IEEE Computer Society, April 1994.
[3] Andrew A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. In Proceedings of the Symposium on Hot Interconnects. IEEE Computer Society, August 1993.
[4] Charles M. Flaig. VLSI Mesh Routing Systems. Master's thesis, California Institute of Technology, Department of Computer Science, May 1987.
[5] Andrew A. Chien and J. H. Kim. Planar-Adaptive Routing: Low-Cost Adaptive Networks for Multiprocessors. In Proceedings of the 19th Symposium on Computer Architecture. IEEE Computer Society, May 1992.
[6] Ziqiang Liu and Andrew A. Chien. Hierarchical Adaptive Routing. In Symposium on Parallel and Distributed Processing, October 1994.
[7] J. Duato. A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks. IEEE Transactions on Parallel and Distributed Systems, 4(12):1320-1331, December 1993.
[8] Steve Scott and Greg Thorson. Optimized Routing in the Cray T3D. In Proceedings of the Workshop on Parallel Computer Routing and Communication, pages 281-294, May 1994.
[9] J. Duato. A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks. IEEE Transactions on Parallel and Distributed Systems, 6(10), October 1995.


More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Jose Flich 1,PedroLópez 1, Manuel. P. Malumbres 1, José Duato 1,andTomRokicki 2 1 Dpto.

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract.

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract. Fault-Tolerant Routing in Fault Blocks Planarly Constructed Dong Xiang, Jia-Guang Sun, Jie and Krishnaiyan Thulasiraman Abstract A few faulty nodes can an n-dimensional mesh or torus network unsafe for

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing J. Flich, M. P. Malumbres, P. López and J. Duato Dpto. Informática de Sistemas y Computadores Universidad Politécnica

More information

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS*

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS* SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS* Young-Joo Suh, Binh Vien Dao, Jose Duato, and Sudhakar Yalamanchili Computer Systems Research Laboratory Facultad de Informatica School

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

EE 6900: Interconnection Networks for HPC Systems Fall 2016

EE 6900: Interconnection Networks for HPC Systems Fall 2016 EE 6900: Interconnection Networks for HPC Systems Fall 2016 Avinash Karanth Kodi School of Electrical Engineering and Computer Science Ohio University Athens, OH 45701 Email: kodi@ohio.edu 1 Acknowledgement:

More information

The Cray T3E Network:

The Cray T3E Network: The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus Steven L. Scott and Gregory M. Thorson Cray Research, Inc. {sls,gmt}@cray.com Abstract This paper describes the interconnection network

More information

BLAM : A High-Performance Routing Algorithm for Virtual Cut-Through Networks

BLAM : A High-Performance Routing Algorithm for Virtual Cut-Through Networks BLAM : A High-Performance Routing Algorithm for Virtual Cut-Through Networks Mithuna Thottethodi Λ Alvin R. Lebeck y Shubhendu S. Mukherjee z Λ School of Electrical and Computer Engineering Purdue University

More information

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock Reading Assignment T. M. Pinkston, Deadlock Characterization and Resolution in Interconnection Networks, Chapter 13 in Deadlock Resolution in Computer Integrated Systems, CRC Press 2004 Deadlock: Part

More information

Routing and Deadlock

Routing and Deadlock 3.5-1 3.5-1 Routing and Deadlock Routing would be easy...... were it not for possible deadlock. Topics For This Set: Routing definitions. Deadlock definitions. Resource dependencies. Acyclic deadlock free

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

Fault-Tolerant Routing Algorithm in Meshes with Solid Faults

Fault-Tolerant Routing Algorithm in Meshes with Solid Faults Fault-Tolerant Routing Algorithm in Meshes with Solid Faults Jong-Hoon Youn Bella Bose Seungjin Park Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Oregon State University

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Seungjin Park Jong-Hoon Youn Bella Bose Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science

More information

Generic Methodologies for Deadlock-Free Routing

Generic Methodologies for Deadlock-Free Routing Generic Methodologies for Deadlock-Free Routing Hyunmin Park Dharma P. Agrawal Department of Computer Engineering Electrical & Computer Engineering, Box 7911 Myongji University North Carolina State University

More information

Adaptive Multimodule Routers

Adaptive Multimodule Routers daptive Multimodule Routers Rajendra V Boppana Computer Science Division The Univ of Texas at San ntonio San ntonio, TX 78249-0667 boppana@csutsaedu Suresh Chalasani ECE Department University of Wisconsin-Madison

More information

Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ

Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead Λ J. Flich, P. López, M. P. Malumbres, and J. Duato Dept. of Computer Engineering

More information

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks Jose Duato Abstract Second generation multicomputers use wormhole routing, allowing a very low channel set-up time and drastically reducing

More information

Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ

Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ P. López, J. Flich and J. Duato Dept. of Computing Engineering (DISCA) Universidad Politécnica de Valencia, Valencia, Spain plopez@gap.upv.es

More information

OASIS Network-on-Chip Prototyping on FPGA

OASIS Network-on-Chip Prototyping on FPGA Master thesis of the University of Aizu, Feb. 20, 2012 OASIS Network-on-Chip Prototyping on FPGA m5141120, Kenichi Mori Supervised by Prof. Ben Abdallah Abderazek Adaptive Systems Laboratory, Master of

More information

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect 1 A Soft Tolerant Network-on-Chip Router Pipeline for Multi-core Systems Pavan Poluri and Ahmed Louri Department of Electrical and Computer Engineering, University of Arizona Email: pavanp@email.arizona.edu,

More information

DUE to the increasing computing power of microprocessors

DUE to the increasing computing power of microprocessors IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 13, NO. 7, JULY 2002 693 Boosting the Performance of Myrinet Networks José Flich, Member, IEEE, Pedro López, M.P. Malumbres, Member, IEEE, and

More information

Deadlock and Router Micro-Architecture

Deadlock and Router Micro-Architecture 1 EE482: Advanced Computer Organization Lecture #8 Interconnection Network Architecture and Design Stanford University 22 April 1999 Deadlock and Router Micro-Architecture Lecture #8: 22 April 1999 Lecturer:

More information

A Hybrid Interconnection Network for Integrated Communication Services

A Hybrid Interconnection Network for Integrated Communication Services A Hybrid Interconnection Network for Integrated Communication Services Yi-long Chen Northern Telecom, Inc. Richardson, TX 7583 kchen@nortel.com Jyh-Charn Liu Department of Computer Science, Texas A&M Univ.

More information

Basic Switch Organization

Basic Switch Organization NOC Routing 1 Basic Switch Organization 2 Basic Switch Organization Link Controller Used for coordinating the flow of messages across the physical link of two adjacent switches 3 Basic Switch Organization

More information

IEEE TRANSACTIONS ON COMPUTERS, VOL. 52, NO. 7, JULY Applying In-Transit Buffers to Boost the Performance of Networks with Source Routing

IEEE TRANSACTIONS ON COMPUTERS, VOL. 52, NO. 7, JULY Applying In-Transit Buffers to Boost the Performance of Networks with Source Routing IEEE TRANSACTIONS ON COMPUTERS, VOL. 52, NO. 7, JULY 2003 1 Applying In-Transit Buffers to Boost the Performance of Networks with Source Routing José Flich, Member, IEEE, Pedro López, Member, IEEE Computer

More information

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem Reading W. Dally, C. Seitz, Deadlock-Free Message Routing on Multiprocessor Interconnection Networks,, IEEE TC, May 1987 Deadlock F. Silla, and J. Duato, Improving the Efficiency of Adaptive Routing in

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

Performance Analysis of a Minimal Adaptive Router

Performance Analysis of a Minimal Adaptive Router Performance Analysis of a Minimal Adaptive Router Thu Duc Nguyen and Lawrence Snyder Department of Computer Science and Engineering University of Washington, Seattle, WA 98195 In Proceedings of the 1994

More information

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N. Interconnection topologies (cont.) [ 10.4.4] In meshes and hypercubes, the average distance increases with the dth root of N. In a tree, the average distance grows only logarithmically. A simple tree structure,

More information

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a

More information

Performance Evaluation of Probe-Send Fault-tolerant Network-on-chip Router

Performance Evaluation of Probe-Send Fault-tolerant Network-on-chip Router erformance Evaluation of robe-send Fault-tolerant Network-on-chip Router Sumit Dharampal Mediratta 1, Jeffrey Draper 2 1 NVIDIA Graphics vt Ltd, 2 SC Information Sciences Institute 1 Bangalore, India-560001,

More information

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns H. H. Najaf-abadi 1, H. Sarbazi-Azad 2,1 1 School of Computer Science, IPM, Tehran, Iran. 2 Computer Engineering

More information

True fully adaptive routing employing deadlock detection and congestion control.

True fully adaptive routing employing deadlock detection and congestion control. True fully adaptive routing employing deadlock detection and congestion control. 16 May, 2001 Dimitris Papadopoulos, Arjun Singh, Kiran Goyal, Mohamed Kilani. {fdimitri, arjuns, kgoyal, makilani}@stanford.edu

More information

The final publication is available at

The final publication is available at Document downloaded from: http://hdl.handle.net/10251/82062 This paper must be cited as: Peñaranda Cebrián, R.; Gómez Requena, C.; Gómez Requena, ME.; López Rodríguez, PJ.; Duato Marín, JF. (2016). The

More information

Optimal Topology for Distributed Shared-Memory. Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres

Optimal Topology for Distributed Shared-Memory. Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres Optimal Topology for Distributed Shared-Memory Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres Facultad de Informatica, Universidad Politecnica de Valencia P.O.B. 22012, 46071 - Valencia,

More information

Dr e v prasad Dt

Dr e v prasad Dt Dr e v prasad Dt. 12.10.17 Contents Characteristics of Multiprocessors Interconnection Structures Inter Processor Arbitration Inter Processor communication and synchronization Cache Coherence Introduction

More information

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 9, NO. 6, JUNE 1998 535 Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms Rajendra V. Boppana, Member, IEEE, Suresh

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Lecture 3: Topology - II

Lecture 3: Topology - II ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 3: Topology - II Tushar Krishna Assistant Professor School of Electrical and

More information

Wormhole Routing Techniques for Directly Connected Multicomputer Systems

Wormhole Routing Techniques for Directly Connected Multicomputer Systems Wormhole Routing Techniques for Directly Connected Multicomputer Systems PRASANT MOHAPATRA Iowa State University, Department of Electrical and Computer Engineering, 201 Coover Hall, Iowa State University,

More information

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management Marina Garcia 22 August 2013 OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management M. Garcia, E. Vallejo, R. Beivide, M. Valero and G. Rodríguez Document number OFAR-CM: Efficient Dragonfly

More information

Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing

Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing Fabrizio Petrini Oxford University Computing Laboratory Wolfson Building, Parks Road Oxford OX1 3QD, England e-mail: fabp@comlab.ox.ac.uk

More information

A Survey of Routing Techniques in Store-and-Forward and Wormhole Interconnects

A Survey of Routing Techniques in Store-and-Forward and Wormhole Interconnects SANDIA REPORT SAND2008-0068 Unlimited Release Printed January 2008 A Survey of Routing Techniques in Store-and-Forward and Wormhole Interconnects David M. Holman and David S. Lee Prepared by Sandia National

More information

Communication Performance in Network-on-Chips

Communication Performance in Network-on-Chips Communication Performance in Network-on-Chips Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Communication Performance In

More information

This chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research

This chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research CHAPTER 1 Introduction This chapter provides the background knowledge about Multistage Interconnection Networks. Metrics used for measuring the performance of various multistage interconnection networks

More information

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Interconnection Networks: Routing. Prof. Natalie Enright Jerger Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly

More information

CONGESTION CONTROL BY USING A BUFFERED OMEGA NETWORK

CONGESTION CONTROL BY USING A BUFFERED OMEGA NETWORK IADIS International Conference on Applied Computing CONGESTION CONTROL BY USING A BUFFERED OMEGA NETWORK Ahmad.H. ALqerem Dept. of Comp. Science ZPU Zarka Private University Zarka Jordan ABSTRACT Omega

More information

Part IV. Chapter 15 - Introduction to MIMD Architectures

Part IV. Chapter 15 - Introduction to MIMD Architectures D. Sima, T. J. Fountain, P. Kacsuk dvanced Computer rchitectures Part IV. Chapter 15 - Introduction to MIMD rchitectures Thread and process-level parallel architectures are typically realised by MIMD (Multiple

More information

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Bhavya K. Daya, Li-Shiuan Peh, Anantha P. Chandrakasan Dept. of Electrical Engineering and Computer

More information

A Distributed Formation of Orthogonal Convex Polygons in Mesh-Connected Multicomputers

A Distributed Formation of Orthogonal Convex Polygons in Mesh-Connected Multicomputers A Distributed Formation of Orthogonal Convex Polygons in Mesh-Connected Multicomputers Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton, FL 3343 Abstract The

More information

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, 2006 Sr. Principal Engineer Panel Questions How do we build scalable networks that balance power, reliability and performance

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks

Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks 2080 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 11, NOVEMBER 2012 Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks Rohit Sunkam

More information

PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs

PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs Jindun Dai *1,2, Renjie Li 2, Xin Jiang 3, Takahiro Watanabe 2 1 Department of Electrical Engineering, Shanghai Jiao

More information

CONNECTION-BASED ADAPTIVE ROUTING USING DYNAMIC VIRTUAL CIRCUITS

CONNECTION-BASED ADAPTIVE ROUTING USING DYNAMIC VIRTUAL CIRCUITS Proceedings of the International Conference on Parallel and Distributed Computing and Systems, Las Vegas, Nevada, pp. 379-384, October 1998. CONNECTION-BASED ADAPTIVE ROUTING USING DYNAMIC VIRTUAL CIRCUITS

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID 1 Virtual Channel Flow Control Each switch has multiple virtual channels per phys. channel Each virtual

More information

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks HPI-DC 09 Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks Diego Lugones, Daniel Franco, and Emilio Luque Leonardo Fialho Cluster 09 August 31 New Orleans, USA Outline Scope

More information

MESH-CONNECTED networks have been widely used in

MESH-CONNECTED networks have been widely used in 620 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 5, MAY 2009 Practical Deadlock-Free Fault-Tolerant Routing in Meshes Based on the Planar Network Fault Model Dong Xiang, Senior Member, IEEE, Yueli Zhang,

More information

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Deadlock- and Livelock-Free Routing Protocols for Wave Switching

Deadlock- and Livelock-Free Routing Protocols for Wave Switching Deadlock- and Livelock-Free Routing Protocols for Wave Switching José Duato,PedroLópez Facultad de Informática Universidad Politécnica de Valencia P.O.B. 22012 46071 - Valencia, SPAIN E-mail:jduato@gap.upv.es

More information

A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes

A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes N.A. Nordbotten 1, M.E. Gómez 2, J. Flich 2, P.López 2, A. Robles 2, T. Skeie 1, O. Lysne 1, and J. Duato 2 1 Simula Research

More information

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network 1 Global Adaptive Routing Algorithm Without Additional Congestion ropagation Network Shaoli Liu, Yunji Chen, Tianshi Chen, Ling Li, Chao Lu Institute of Computing Technology, Chinese Academy of Sciences

More information

Deadlock-Free Connection-Based Adaptive Routing with Dynamic Virtual Circuits

Deadlock-Free Connection-Based Adaptive Routing with Dynamic Virtual Circuits Computer Science Department Technical Report #TR050021 University of California, Los Angeles, June 2005 Deadlock-Free Connection-Based Adaptive Routing with Dynamic Virtual Circuits Yoshio Turner and Yuval

More information

Lecture 22: Router Design

Lecture 22: Router Design Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip

More information

PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE

PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE By HAIBO ZHU A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN

More information

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock

More information

INTERCONNECTION networks are used in a variety of applications,

INTERCONNECTION networks are used in a variety of applications, 1 Randomized Throughput-Optimal Oblivious Routing for Torus Networs Rohit Sunam Ramanujam, Student Member, IEEE, and Bill Lin, Member, IEEE Abstract In this paper, we study the problem of optimal oblivious

More information

Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers

Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers Tsutomu YOSHINAGA, Hiroyuki HOSOGOSHI, Masahiro SOWA Graduate School of Information Systems, University of Electro-Communications,

More information

Bandwidth Aware Routing Algorithms for Networks-on-Chip

Bandwidth Aware Routing Algorithms for Networks-on-Chip 1 Bandwidth Aware Routing Algorithms for Networks-on-Chip G. Longo a, S. Signorino a, M. Palesi a,, R. Holsmark b, S. Kumar b, and V. Catania a a Department of Computer Science and Telecommunications Engineering

More information

The Adaptive Bubble Router 1

The Adaptive Bubble Router 1 The Adaptive Bubble Router 1 V. Puente, C. Izu y, R. Beivide, J.A. Gregorio, F. Vallejo and J.M. Prellezo Universidad de Cantabria, 395 Santander, Spain y University of Adelaide, SA 55 Australia The design

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Young Hoon Kang, Taek-Jun Kwon, and Jeff Draper {youngkan, tjkwon, draper}@isi.edu University of Southern California

More information