ServerNet Deadlock Avoidance and Fractahedral Topologies


Robert Horst
Tandem Computers Incorporated
Ridgeview Court, Cupertino, CA

Abstract

This paper examines the problems of deadlock avoidance in multistage networks, and proposes a new class of scalable topologies for constructing large networks without introducing loops that could cause deadlocks. The new topologies, called fractahedrons, are deadlock-free and reduce the maximum link contention compared to other networks. The use of fractahedral topologies is illustrated by various configurations of 6-port ServerNet routers. The properties of fractahedral networks are compared with networks configured as a mesh, hypercube, or fat tree.

1.0 Introduction

Multistage networks are finding increasing use in both massively parallel computer systems and in networks of workstations and PCs. The networks that provide this connectivity must provide high bandwidth, low latency, scalability, low cost, and reliability. To provide high reliability, it is important for the network to be designed in a way that guarantees it will not deadlock. While some general techniques are known to avoid or recover from deadlocks, many of these techniques cannot be directly applied without adversely impacting cost or performance. Most traditional topologies for MPP networks have been developed and analyzed without particular regard to solving the deadlock problem. Some topologies that appear to be symmetric and ideal candidates for MPP systems may in fact be quite asymmetric and suboptimal when routing algorithms are designed to avoid deadlocks. The development of ServerNet has caused us to take a fresh look at MPP topologies, to look for better ways of constructing networks that optimize performance while avoiding the possibility of deadlock. ServerNet is a system area network for providing high-speed communications from processor to processor, processor to I/O device, or I/O device to other I/O devices [1].
The first implementation of ServerNet (formerly called TNet) has byte-serial point-to-point 50 MB/sec links. Full duplex operation is provided by pairing two unidirectional links in a cable that can reach up to 30 meters. Complex networks can be constructed using 6-port router ASICs (application-specific integrated circuits) that contain input FIFO buffers and a non-blocking crossbar switch. Full network fault tolerance can be provided by configuring pairs of router fabrics with dual-ported nodes. ServerNet is the key enabling technology for implementing systems with different requirements for performance and reliability [2]. ServerNet systems support software-based fault-tolerance through the process-pair technology of the Tandem NonStop Kernel, and support duplexed hardware-based fault-tolerance for running standard operating systems such as Unix and Windows NT. In addition, ServerNet can provide reliable communications in clusters of non-fault-tolerant workstations or PCs.

2.0 Background

Proposed topologies for MPP routing networks include the mesh, ring, torus, star, binary tree, fat tree, hypercube, cube-connected cycles, and shuffle-exchange network. Characteristics of these networks can be found in many MPP references [3,4,5]. A key design problem for many networks is that they contain loops that could give rise to deadlocks. Figure 1 illustrates the way a deadlock can occur in a wormhole-routed network. With wormhole routing, the head of a packet is routed before the tail of the packet arrives at that router. Deadlocks can occur when a set of packets cannot make further progress because of a circular dependency in which each packet must wait for another to proceed before acquiring access to an output link. This deadlock situation can occur in any network with loops in the connection graph. Previous solutions to the deadlock problem were costly in terms of router complexity or communications efficiency.
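The circular-dependency condition described above can be made concrete with a toy check: a set of fixed routes is deadlock-prone when the directed graph of "link A is held while waiting for link B" relations contains a cycle. The sketch below is purely illustrative (the helper names and the link-label encoding are assumptions, not part of ServerNet):

```python
# Toy deadlock check: with wormhole routing, a packet holds each link on
# its route while waiting for the next one, so a cycle in the resulting
# "waits-for" graph over links means a circular wait is possible.

def has_cycle(edges):
    """Detect a cycle in a directed graph given as {node: [successors]}."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}

    def visit(n):
        color[n] = GRAY
        for m in edges.get(n, []):
            if color.get(m, WHITE) == GRAY:
                return True              # back edge -> cycle found
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))

def dependency_graph(routes):
    """Each route is a list of links; link i is held while waiting for link i+1."""
    deps = {}
    for route in routes:
        for a, b in zip(route, route[1:]):
            deps.setdefault(a, []).append(b)
            deps.setdefault(b, [])
    return deps

# Four packets turning corners around a ring of links, as in Figure 1:
ring = [["AB", "BC"], ["BC", "CD"], ["CD", "DA"], ["DA", "AB"]]
print(has_cycle(dependency_graph(ring)))   # True: circular wait -> deadlock
```

Removing any one of the four routes breaks the cycle, which is exactly what the path-restriction techniques discussed below accomplish.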
Figure 1. Deadlock in a wormhole-routed network. The head of each packet is blocked by the tail of another packet. Circles are routers (packet switches).

Proceedings of IPPS '96, IEEE.

A popular solution to deadlock avoidance was described by Dally and Seitz in [6]. They propose adding virtual channels to routers, then breaking loops by allowing some messages to pass other packets. This solution requires multiple packet buffers at each router stage, and severely complicates the router design. The cost of the buffers can be quite significant because buffering space may dominate the area of a typical router.

Other solutions to the deadlock problem are software-based and can impact performance. For instance, some networks detect deadlocks with timeout counters, discard the packets in progress, and re-send the lost packets. This technique cannot be used in system area networks because the lightweight protocol implemented over these networks cannot tolerate out-of-order delivery of packets. If the entire transfer is retried to avoid out-of-order delivery, the deadlock recovery time may be unacceptable. Solutions based on retry also make it difficult to distinguish between network congestion and hardware-related intermittent failures requiring maintenance actions.

Another technique for avoiding deadlocks is to design the routing algorithm to preclude routing loops. For instance, dimension-order routing may be used in a mesh network to avoid routing loops. With dimension-order routing, packets are routed first in one direction, say the X direction, then in the Y direction. With this rule applied in Figure 1, routes A and C would be allowed, but routes B and D would be disallowed, thus preventing the deadlock situation. Figure 2 shows a 3-dimensional hypercube with certain paths disallowed in order to break cycles. By designating specific paths to be disabled, the routing algorithm is less restrictive than dimension-order routing.
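Dimension-order routing, mentioned above for meshes, can be sketched in a few lines; the function name and coordinate encoding are illustrative assumptions, not ServerNet's table-driven mechanism:

```python
def dimension_order_route(src, dst):
    """Return the list of mesh nodes visited routing X-first, then Y.

    Completing all X hops before any Y hop forbids Y-then-X turns,
    which breaks every cycle in the mesh's channel-dependency graph.
    """
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                     # travel in the X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                     # then travel in the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(dimension_order_route((0, 0), (2, 1)))
# [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Note that the route is fully determined by source and destination, so this rule also preserves in-order delivery between any pair of nodes.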
Disables are configured to break the loops on each face of the cube, as well as to break loops with six and eight links. The main problem with this technique is that most arrangements of path disables give uneven link utilization under uniform load.

Figure 2. Breaking deadlocks in a hypercube by disabling paths. Arrows show paths that are not allowed.

With the disables as shown in Figure 2, the upper links are lightly utilized because they are used only to communicate with the top node, while the bottom links are more heavily used because they carry both traffic to the bottom node and pass-through traffic between other nodes. The link utilization can be made uniform if the path disables can be unidirectional. Such an arrangement could be described with twelve single-ended arrows instead of six double-ended arrows. The disadvantage of this technique is that most traffic in the network is not reflexive; the path from A to B may be different than the path from B to A. Non-reflexive routing is allowed in ServerNet, but it increases the impact of a link failure. There may be nothing wrong with any of the hardware along the path from A to B, but that path may be unusable due to the inability to send acknowledgments back from B to A. The problem with many network topologies is that we must choose either non-reflexive routing or uneven link utilization.

The deadlock problem can be avoided through other network topologies, but these networks may not provide the required bandwidth. Bandwidth in MPP systems is often measured in terms of bisection bandwidth, the total traffic that can flow between halves of the system when cut at its weakest point. Tree networks are free of routing loops, but their bisection bandwidth is determined by the bandwidth through the router at the root node. The fat tree

can improve this situation by replicating routers at higher levels of the tree, but creates the new problem of finding a way to evenly distribute traffic over the parallel links in the fat part of the tree. This problem can cause increased link contention, a subject that is addressed in more detail in Section 3.

2.1 Fully-connected networks of routers

The basic building blocks for the new topologies are fully-connected assemblies of routers. Figure 3 shows all fully-connected configurations of 6-port routers. (The first generation of ServerNet is implemented with 6-port routers because that offers the best price-performance point given the available pins and gates on the chosen ASIC technology.)

Figure 3. Fully-connected topologies of 6-port routers. For each configuration, the figure tabulates the number of node ports and the maximum link contention.

The configurations giving the most ports are the three- and four-router options shown in Figures 3b and 3c. Of these two options, the four-router option has less potential link contention; at most three nodes may simultaneously attempt to use any one of the inter-router links. The reduced contention means that this configuration will be less prone to queuing delays than the three-router configuration. It is also attractive because routing within this assembly routes packets based on exactly two bits of the destination node identifier. This prevents sparse usage of the node address space and simplifies the routing algorithm. The topology of Figure 3c can be redrawn in three dimensions as a tetrahedron, as shown in Figure 4.

Figure 4. Tetrahedral topology with 6-port routers.

2.2 Fractahedral networks

Multiple tetrahedrons can be connected together with higher-level tetrahedrons to increase the number of connected nodes. This type of structure repeats at higher levels to form networks with similar topologies when viewed from any scale. This self-similar structure of tetrahedrons is called a fractahedron, for fractal-tetrahedron. Figure 5 shows the self-similar structure of a three-level fractahedron.
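The trade-off among the fully-connected configurations in Figure 3 can be tabulated from the topology itself: a fully connected group of M routers with P ports each spends M-1 ports per router on inter-router links, leaving P-M+1 node ports per router, and at most those P-M+1 nodes can contend for any single inter-router link. A short sketch (the function name and formulas are read off the topology, not taken from the paper's figure):

```python
def fully_connected_options(ports=6):
    """For M fully connected routers with `ports` ports each, return
    tuples of (M, total node ports, worst-case link contention)."""
    rows = []
    for m in range(2, ports + 1):
        node_ports_per_router = ports - (m - 1)   # ports left after mesh links
        rows.append((m, m * node_ports_per_router, node_ports_per_router))
    return rows

for m, node_ports, contention in fully_connected_options(6):
    print(f"{m} routers: {node_ports:2d} node ports, {contention}:1 contention")
```

Consistent with the text, the three- and four-router options both yield the most node ports (12), and the four-router option has the lower contention of the two (3:1 versus 4:1).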

Figure 5. Three-level thin fractahedron.

Each tetrahedron has routers with ports divided into three sets. One set connects to two lower-level tetrahedrons, another set connects to the other three routers in that tetrahedron, and the last set connects to the next higher-level tetrahedron. If there is only one connection between each tetrahedron and the next higher level, this is called a thin fractahedron, and if all routers of one level connect to the next level, it is called a fat fractahedron. The basic topology is not restricted to 6-port routers. Any fully connected set of routers can form a similar snowflake structure.

Depending on the system implementation, there may be additional router levels between the end nodes (CPUs or peripheral adapters) and the lowest-level tetrahedral router. One or two added router levels are typically needed to fan out to the devices associated with one CPU. With one additional router level connecting each pair of CPUs to the level 1 tetrahedron, a 16-CPU system may be constructed with a maximum delay between CPUs of four router hops -- two within the tetrahedron, and one each to get to and from the tetrahedron. When extended to 1024 CPUs through a thin fractahedron, the maximum delay is twelve. Note that all thin fractahedrons have a bisection bandwidth fixed at four links. While this is adequate for many applications, there are other applications requiring more bandwidth. Hence it is desirable to scale the bandwidth to meet those demands.

2.3 Fat fractahedrons

In Figure 5, there are unused ports at three of the four corners of each tetrahedron. If the higher-level tetrahedrons are replicated, each copy of the higher-level tetrahedron can connect to a different corner of the tetrahedron at the next level down. With all four upward ports of each tetrahedron connected to replicated routers, the structure is called a

fat fractahedron. In a 128-CPU network, at level 2 there are four independent layers of tetrahedrons, each connecting to a different corner of the level 1 tetrahedrons. In three dimensions, level 2 is conceptually four tetrahedral layers nested inside each other, but not connected to each other. In two dimensions, it can be envisioned as papers stacked up with a router on each sheet. Each corner of the 4-layer tetrahedron has a pair of four-conductor cables connected to the four routers in the stack. Each of these cables connects to the four corners of a different level 1 tetrahedron. There is also a 16-conductor cable that connects to all four routers at each of the four corners. This cable then connects to a corner of the level 3, 16-layer tetrahedron, and so on.

Routing in multilayer networks is done depth-first by examining address bits from high order to low order. At any level, if there is no match in the address bits above those controlling that level's tetrahedron, then the packet is sent to the next higher level. In networks with all layers implemented, this ascent up the tree takes only one router delay per level. In effect, packets always go straight up the tree without taking any inter-tetrahedral links. Those links are used only on the way down, to get to the correct destination. Each tetrahedron encountered matches three more bits of the address, and can take one or two router delays (one if the layer was already correct, two if a tetrahedron delay must be taken to get to the correct layer). In the case of ServerNet, these matches are actually done by looking up entries in the routing table inside each router. In a 1024-CPU system with 3 levels (and layers), the worst-case delay is 10 router delays (4 on the way up, 6 on the way down), a reduction of two compared to the thin fractahedron. Table 1 gives a summary of the characteristics of fractahedrons.
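The growth rates summarized in Table 1 can be checked numerically. The sketch below (function name and parameterization are illustrative) uses the paper's expressions: 2*8^N nodes for an N-level 2-3-1 fractahedron, a thin worst case of 4N-2 hops, a fat worst case of 3N-1 hops, plus two extra hops for the added router level between CPUs and the level 1 tetrahedrons in the worked examples:

```python
def fractahedron_params(levels, fan_out_hops=2):
    """Parameters of an N-level 2-3-1 fractahedron (per Table 1).

    `fan_out_hops` adds the two hops for the extra router level between
    the CPUs and the level 1 tetrahedrons, matching the text's examples.
    """
    n = levels
    return {
        "max_nodes": 2 * 8**n,                      # 2 nodes per leaf, 8x per level
        "thin_max_hops": (4 * n - 2) + fan_out_hops,
        "fat_max_hops": (3 * n - 1) + fan_out_hops,
        "thin_bisection_links": 4,                  # fixed for any thin fractahedron
    }

print(fractahedron_params(1))  # 16 CPUs, 4-hop worst case
print(fractahedron_params(3))  # 1024 CPUs, thin 12 hops, fat 10 hops
```

These reproduce the text's figures: a 16-CPU one-level system with a 4-hop worst case, and a 1024-CPU three-level system with worst cases of 12 hops (thin) and 10 hops (fat).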
The delay equations do not include any additional delays added between an end node and the first-level tetrahedron. In this table and throughout this document, we reserve the upward connections from the top level for future expansion, to avoid the need to remove existing connections as a system is expanded. In other words, more nodes could be supported in N levels if we knew there would never be a need for the N+1 level.

Table 1. N-level 2-3-1 fractahedral parameters

Parameter         Thin         Fat
Maximum nodes     2*8^N        2*8^N
Maximum delays    4N-2 hops    3N-1 hops
Bisection BW      4 links      4^N links

2.4 Deadlock prevention

In the fat fractahedron, the addition of multiple layers has also introduced potential routing loops. However, the preceding routing algorithm eliminates these loops and avoids possible deadlocks. Conceptually, there are multiple upward and downward paths from one node to another, and use of all possible paths would result in deadlock. But the routing algorithm always takes a local inter-level link rather than going through a neighboring inter-level link. This algorithm eliminates possible loops in a way similar to dimension-order routing in a hypercube. The ServerNet routers also have path-disable logic that can be set to enforce the elimination of the loops, even if the routing table is corrupted by a fault.

3.0 Comparison to other topologies

Given a specific router whose design has been driven by technology constraints, it is useful to examine different ways to connect systems with those routers. In the case of ServerNet, this means finding the best way to build systems with 6-port routers. In this section, we contrast ways of forming a 64-node network with different configurations of 6-port routers. In commercial applications, it is not possible to know the data access patterns a priori, making static load balancing impossible.
For instance, for a given database query, we may have an arbitrary set of four CPU nodes trying to communicate with an arbitrary set of four disk controller nodes over an extended period of time. The ability of a network to handle load imbalances is a key factor in application performance, and is discussed for each different topology. Initially, we just use the maximum link contention as a measure of the ability to handle load imbalance. Further studies will use simulation to better determine the effects of contention.

3.1 2-D Mesh

To implement a 2-D mesh with a 6-port router, four ports are devoted to the four directions, leaving the last two ports available to connect to the nodes. Connecting 64 nodes requires a 6x6 mesh. Maximum latency for this network is 11 router hops for transfers between opposite corners. The router delays scale quickly as the number of nodes grows. A 128-node network would need an 8x8 mesh with a maximum of 15 router hops, while a 1024-node network requires a 23x23 mesh and 45 hops. Another drawback of the mesh is the worst-case link contention. If we assume dimension-order routing to break deadlocks, the worst-case contention is along the

same path as the longest latency. If we label the columns A-F and the rows 1-6, the worst-case contention comes from simultaneous transfers from A1-F6, A2-E6, A3-D6, A4-C6, and A5-B6. All five of these transfers need to turn the same corner at A6. With two nodes at each router, a total of ten transfers may simultaneously try to share the A6 links, giving a 10:1 contention ratio.

3.2 Hypercubes

A 64-node (6-D) hypercube requires a 7-port router; six ports for the hypercube and one for the node connection. With 6-port routers, it would be necessary to use a lower-dimension hypercube with some other structure to increase the number of connected nodes. Even if a satisfactory structure could be found, it would be necessary to restrict the allowable paths to avoid deadlocks. This path restriction would give uneven link utilization and high contention, as described previously. Another drawback of the hypercube is that the bandwidth between nodes is fixed. There is no easy way to trade performance for cost to give a range of price-performance points.

3.3 Trees/Fat Trees

Trees and fat trees come the closest to meeting the requirements for large commercial systems. Trees are deadlock-free, can be expanded independent of the number of router ports, and can be scaled in performance by moving from a simple tree to a fat tree. Figure 6 is a diagram of a 64-node fat tree.

Figure 6. 64-node 4-2 fat tree implemented with 6-port routers.

With a 6-port router, the six ports can be partitioned into groups of 3-3 or 4-2. The 3-3 partitioning has no bandwidth reduction toward the root, but is more expensive than the 4-2 partitioning. In most networks, we anticipate some degree of locality in the data access patterns. For instance, each processor in a cluster would typically have a high degree of local access to reach its system disk, and to reach one of a collection of equivalent resources (such as communications lines).
For this reason, the 4-2 fat tree may be preferred for most systems even though there is some bandwidth reduction at each level. The bisection bandwidth scales as the network grows, but not at the same rate. For 64 nodes, the bisection bandwidth is 4 links.

In the 64-node fat tree, there are many equivalent paths through the second-level routers, and there must be a policy for deciding which path to use. For instance, in routing a packet from node 0 to node 63, any one of the four links to the top level could be traversed. The first temptation might be to dynamically select a non-busy link. However, if sequential packets can take different paths to the same destination, earlier packets might encounter more contention upstream, causing them to be delivered out of order. The guarantee of in-order delivery of packets is key to eliminating software protocol overhead in ServerNet. A typical need for in-order delivery is in the delivery of an I/O interrupt packet that must follow the data transfer from a controller. The interrupt packet cannot be allowed to pass the data on the way to the CPU. To maintain in-order delivery, there must be a fixed path between each pair of nodes.

Figure 6 shows one arbitrary partitioning of the outbound traffic from nodes 0-15 through routers A-D. Links to the highest level are labeled EIM, FJN, GKO, and HLP to show which link is used for each destination. This partitioning gives even link utilization in the case of uniform traffic, but can have very bad contention in some situations. For instance, if twelve nodes simultaneously send to destinations that are all reached through the single link HLP, all twelve transfers will contend for that link, for a 12:1 contention ratio. Other static partitionings of traffic through the high-level links can do no better than the 12:1 contention ratio.

3.4 Fat tree/fat fractahedron comparison

Figure 7 shows 64 nodes connected through a fat fractahedron. The network has been drawn in the style of a fat tree to more clearly show the comparison.
In this network, the worst-case link contention is for the links within the second-level tetrahedrons. For instance, if nodes 6, 7, 14, and 15 are all trying to send to nodes 54, 55, 62, and 63, all four transfers will attempt to use the same diagonal link in the same layer of level 2. While this network has the same bisection bandwidth as the 4-2 fat tree, it spreads traffic more evenly through the inter-level links. The worst-case contention is just 4:1, a major improvement over the 12:1 contention in the fat tree. The cost of the contention reduction is an increase in the number of routers from 28 to 48.

A 3-3 fat tree could improve bisection bandwidth, but at great cost in routers and router hops. For 64 nodes, a 3-3 fat tree would require 100 routers, and transfers would take an average of 5.9 router hops.

Figure 7. 64-node fat fractahedron drawn in the style of a fat tree.

In the fractahedron, the router delay grows in smaller increments than in the fat tree (which always has an odd number of router hops). Table 2 contrasts the two topologies for a 64-node network. The average number of hops for the fractahedron is slightly less: 4.3 versus 4.4 for the fat tree.

Table 2. 64-node comparison

Attribute                 4-2 Fat Tree    Fat Fractahedron
Maximum link contention   12:1            4:1
Average hops              4.4             4.3
Routers                   28              48

4.0 Conclusions

This paper has introduced a new family of topologies for massively parallel systems. The fractahedral topologies have been designed to eliminate loops and to reduce link contention compared to existing MPP topologies. The topology scales to any number of nodes, and allows for tradeoffs between cost and performance. The current focus is on tetrahedral ensembles of 6-port ServerNet routers, but the concepts easily generalize to other fully connected groups of N-port routers. Future work will center on simulations of large topologies in order to better understand network performance under heavy loading. As large ServerNet-based systems are deployed, we will begin to characterize the workloads and will measure network performance in real customer environments.

5.0 References

[1] R. W. Horst, "TNet: A Reliable System Area Network," IEEE Micro, Vol. 15, No. 1, Feb. 1995.
[2] W. E. Baker, R. W. Horst, D. P. Sonnier, W. J. Watson, "A Flexible ServerNet-based Fault-Tolerant Architecture," Proc. 25th Int. Symp. on Fault-Tolerant Computing, Pasadena, CA, June 1995.
[3] G. Almasi, A. Gottlieb, Highly Parallel Computing, Benjamin/Cummings Publishing Co.
[4] D. Reed, R. Fujimoto, Multicomputer Networks: Message-Based Parallel Processing, MIT Press, 1987.
[5] C. E. Leiserson, "Fat-Trees: Universal Networks for Hardware-Efficient Supercomputing," IEEE Trans. Computers, Vol. C-34, No. 10, Oct. 1985.
[6] W. J. Dally, C. L. Seitz, "Deadlock-Free Message Routing in Multiprocessor Interconnection Networks," IEEE Trans. Computers, Vol. C-36, No. 5, May 1987.

ServerNet, Tandem, and NonStop are trademarks of Tandem Computers Incorporated.


Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

A Hybrid Interconnection Network for Integrated Communication Services

A Hybrid Interconnection Network for Integrated Communication Services A Hybrid Interconnection Network for Integrated Communication Services Yi-long Chen Northern Telecom, Inc. Richardson, TX 7583 kchen@nortel.com Jyh-Charn Liu Department of Computer Science, Texas A&M Univ.

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

Interconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Interconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Topics Taxonomy Metric Topologies Characteristics Cost Performance 2 Interconnection

More information

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2 Real Machines Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99

More information

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: ABCs of Networks Starting Point: Send bits between 2 computers Queue

More information

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a

More information

Interconnection Network

Interconnection Network Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu) Topics

More information

Chapter 20: Database System Architectures

Chapter 20: Database System Architectures Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

CSC630/CSC730: Parallel Computing

CSC630/CSC730: Parallel Computing CSC630/CSC730: Parallel Computing Parallel Computing Platforms Chapter 2 (2.4.1 2.4.4) Dr. Joe Zhang PDC-4: Topology 1 Content Parallel computing platforms Logical organization (a programmer s view) Control

More information

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and

More information

EE382 Processor Design. Illinois

EE382 Processor Design. Illinois EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors Part II EE 382 Processor Design Winter 98/99 Michael Flynn 1 Illinois EE 382 Processor Design Winter 98/99 Michael Flynn 2 1 Write-invalidate

More information

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections A.SAI KUMAR MLR Group of Institutions Dundigal,INDIA B.S.PRIYANKA KUMARI CMR IT Medchal,INDIA Abstract Multiple

More information

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

Different network topologies

Different network topologies Network Topology Network topology is the arrangement of the various elements of a communication network. It is the topological structure of a network and may be depicted physically or logically. Physical

More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information

Static Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept.

Static Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Advanced Computer Architecture (0630561) Lecture 17 Static Interconnection Networks Prof. Kasim M. Al-Aubidy Computer Eng. Dept. INs Taxonomy: An IN could be either static or dynamic. Connections in a

More information

EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University

EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University Material from: The Datacenter as a Computer: An Introduction to

More information

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions Data In single-program multiple-data (SPMD) parallel programs, global data is partitioned, with a portion of the data assigned to each processing node. Issues relevant to choosing a partitioning strategy

More information

CMSC 611: Advanced. Interconnection Networks

CMSC 611: Advanced. Interconnection Networks CMSC 611: Advanced Computer Architecture Interconnection Networks Interconnection Networks Massively parallel processor networks (MPP) Thousands of nodes Short distance (

More information

CS575 Parallel Processing

CS575 Parallel Processing CS575 Parallel Processing Lecture three: Interconnection Networks Wim Bohm, CSU Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors Govindan Ravindran Newbridge Networks Corporation Kanata, ON K2K 2E6, Canada gravindr@newbridge.com Michael

More information

Fundamentals of Networking Types of Topologies

Fundamentals of Networking Types of Topologies Fundamentals of Networking Types of Topologies Kuldeep Sonar 1 Bus Topology Bus topology is a network type in which every computer and network device is connected to single cable. When it has exactly two

More information

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Module 17: Interconnection Networks Lecture 37: Introduction to Routers Interconnection Networks. Fundamentals. Latency and bandwidth Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012

More information

Chapter 3 : Topology basics

Chapter 3 : Topology basics 1 Chapter 3 : Topology basics What is the network topology Nomenclature Traffic pattern Performance Packaging cost Case study: the SGI Origin 2000 2 Network topology (1) It corresponds to the static arrangement

More information

Crossbar switch. Chapter 2: Concepts and Architectures. Traditional Computer Architecture. Computer System Architectures. Flynn Architectures (2)

Crossbar switch. Chapter 2: Concepts and Architectures. Traditional Computer Architecture. Computer System Architectures. Flynn Architectures (2) Chapter 2: Concepts and Architectures Computer System Architectures Disk(s) CPU I/O Memory Traditional Computer Architecture Flynn, 1966+1972 classification of computer systems in terms of instruction

More information

Optimal Topology for Distributed Shared-Memory. Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres

Optimal Topology for Distributed Shared-Memory. Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres Optimal Topology for Distributed Shared-Memory Multiprocessors: Hypercubes Again? Jose Duato and M.P. Malumbres Facultad de Informatica, Universidad Politecnica de Valencia P.O.B. 22012, 46071 - Valencia,

More information

Advanced Parallel Architecture. Annalisa Massini /2017

Advanced Parallel Architecture. Annalisa Massini /2017 Advanced Parallel Architecture Annalisa Massini - 2016/2017 References Advanced Computer Architecture and Parallel Processing H. El-Rewini, M. Abd-El-Barr, John Wiley and Sons, 2005 Parallel computing

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? J. Flich 1,P.López 1, M. P. Malumbres 1, J. Duato 1, and T. Rokicki 2 1 Dpto. Informática

More information

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract.

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract. Fault-Tolerant Routing in Fault Blocks Planarly Constructed Dong Xiang, Jia-Guang Sun, Jie and Krishnaiyan Thulasiraman Abstract A few faulty nodes can an n-dimensional mesh or torus network unsafe for

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Network Topologies John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 14 28 February 2017 Topics for Today Taxonomy Metrics

More information

Lecture 23 Database System Architectures

Lecture 23 Database System Architectures CMSC 461, Database Management Systems Spring 2018 Lecture 23 Database System Architectures These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used

More information

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks Jose Duato Abstract Second generation multicomputers use wormhole routing, allowing a very low channel set-up time and drastically reducing

More information

Overview. Processor organizations Types of parallel machines. Real machines

Overview. Processor organizations Types of parallel machines. Real machines Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments

More information

Parallel Architecture. Sathish Vadhiyar

Parallel Architecture. Sathish Vadhiyar Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

Multiprocessor Interconnection Networks- Part Three

Multiprocessor Interconnection Networks- Part Three Babylon University College of Information Technology Software Department Multiprocessor Interconnection Networks- Part Three By The k-ary n-cube Networks The k-ary n-cube network is a radix k cube with

More information

Network Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami

Network Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami Network Dilation: A Strategy for Building Families of Parallel Processing Architectures Behrooz Parhami Dept. Electrical & Computer Eng. Univ. of California, Santa Barbara Parallel Computer Architecture

More information

Dynamic Buffer Organization Methods for Interconnection Network Switches Amit Kumar Gupta, Francois Labonte, Paul Wang Lee, Alex Solomatnikov

Dynamic Buffer Organization Methods for Interconnection Network Switches Amit Kumar Gupta, Francois Labonte, Paul Wang Lee, Alex Solomatnikov I Dynamic Buffer Organization Methods for Interconnection Network Switches Amit Kumar Gupta, Francois Labonte, Paul Wang Lee, Alex Solomatnikov I. INTRODUCTION nterconnection networks originated from the

More information

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2)

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2) Lecture 15 Multiple Processor Systems Multiple Processor Systems Multiprocessors Multicomputers Continuous need for faster computers shared memory model message passing multiprocessor wide area distributed

More information

Deadlock and Livelock. Maurizio Palesi

Deadlock and Livelock. Maurizio Palesi Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ

Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ P. López, J. Flich and J. Duato Dept. of Computing Engineering (DISCA) Universidad Politécnica de Valencia, Valencia, Spain plopez@gap.upv.es

More information

NOC Deadlock and Livelock

NOC Deadlock and Livelock NOC Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

SHARED MEMORY VS DISTRIBUTED MEMORY

SHARED MEMORY VS DISTRIBUTED MEMORY OVERVIEW Important Processor Organizations 3 SHARED MEMORY VS DISTRIBUTED MEMORY Classical parallel algorithms were discussed using the shared memory paradigm. In shared memory parallel platform processors

More information

EE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1

EE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1 EE382C Lecture 1 Bill Dally 3/29/11 EE 382C - S11 - Lecture 1 1 Logistics Handouts Course policy sheet Course schedule Assignments Homework Research Paper Project Midterm EE 382C - S11 - Lecture 1 2 What

More information

MULTIPROCESSORS. Characteristics of Multiprocessors. Interconnection Structures. Interprocessor Arbitration

MULTIPROCESSORS. Characteristics of Multiprocessors. Interconnection Structures. Interprocessor Arbitration MULTIPROCESSORS Characteristics of Multiprocessors Interconnection Structures Interprocessor Arbitration Interprocessor Communication and Synchronization Cache Coherence 2 Characteristics of Multiprocessors

More information

Basic Switch Organization

Basic Switch Organization NOC Routing 1 Basic Switch Organization 2 Basic Switch Organization Link Controller Used for coordinating the flow of messages across the physical link of two adjacent switches 3 Basic Switch Organization

More information

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ E. Baydal, P. López and J. Duato Depto. Informática de Sistemas y Computadores Universidad Politécnica de Valencia, Camino

More information

Computing Surface. Communications Network Overview. mei<o. SlOO2-10MIOS.OS

Computing Surface. Communications Network Overview. mei<o. SlOO2-10MIOS.OS Computing Surface Communications Network Overview SlOO2-10MIOS.OS mei

More information

Switched Network Latency Problems Solved

Switched Network Latency Problems Solved 1 Switched Network Latency Problems Solved A Lightfleet Whitepaper by the Lightfleet Technical Staff Overview The biggest limiter to network performance is the control plane the array of processors and

More information

Adaptive Multimodule Routers

Adaptive Multimodule Routers daptive Multimodule Routers Rajendra V Boppana Computer Science Division The Univ of Texas at San ntonio San ntonio, TX 78249-0667 boppana@csutsaedu Suresh Chalasani ECE Department University of Wisconsin-Madison

More information

Routing Protocols in MANETs

Routing Protocols in MANETs Chapter 4 Routing Protocols in MANETs 4.1 Introduction The main aim of any Ad Hoc network routing protocol is to meet the challenges of the dynamically changing topology and establish a correct and an

More information

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes EE482, Spring 1999 Research Paper Report Deadlock Recovery Schemes Jinyung Namkoong Mohammed Haque Nuwan Jayasena Manman Ren May 18, 1999 Introduction The selected papers address the problems of deadlock,

More information

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing Dr Izadi CSE-4533 Introduction to Parallel Processing Chapter 4 Models of Parallel Processing Elaborate on the taxonomy of parallel processing from chapter Introduce abstract models of shared and distributed

More information

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011 CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 9 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

Client Server & Distributed System. A Basic Introduction

Client Server & Distributed System. A Basic Introduction Client Server & Distributed System A Basic Introduction 1 Client Server Architecture A network architecture in which each computer or process on the network is either a client or a server. Source: http://webopedia.lycos.com

More information

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Routing Algorithm How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Many routing algorithms exist 1) Arithmetic 2) Source-based 3) Table lookup

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

Performance Analysis of a Minimal Adaptive Router

Performance Analysis of a Minimal Adaptive Router Performance Analysis of a Minimal Adaptive Router Thu Duc Nguyen and Lawrence Snyder Department of Computer Science and Engineering University of Washington, Seattle, WA 98195 In Proceedings of the 1994

More information

DUE to the increasing computing power of microprocessors

DUE to the increasing computing power of microprocessors IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 13, NO. 7, JULY 2002 693 Boosting the Performance of Myrinet Networks José Flich, Member, IEEE, Pedro López, M.P. Malumbres, Member, IEEE, and

More information

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC

More information

Characteristics of Mult l ip i ro r ce c ssors r

Characteristics of Mult l ip i ro r ce c ssors r Characteristics of Multiprocessors A multiprocessor system is an interconnection of two or more CPUs with memory and input output equipment. The term processor in multiprocessor can mean either a central

More information

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Seungjin Park Jong-Hoon Youn Bella Bose Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science

More information

Lecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC

Lecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC Lecture 9: Group Communication Operations Shantanu Dutt ECE Dept. UIC Acknowledgement Adapted from Chapter 4 slides of the text, by A. Grama w/ a few changes, augmentations and corrections Topic Overview

More information