Deadlock-free Fault-tolerant Routing in the Multi-dimensional Crossbar Network and Its Implementation for the Hitachi SR2201


Yoshiko Yasuda, Hiroaki Fujii, Hideya Akashi, Yasuhiro Inagami, Teruo Tanaka*, Junji Nakagoshi*, Hideo Wada* and Tsutomu Sumimoto*

Central Research Laboratory, Hitachi, Ltd., Higashi-koigakubo, Kokubunji, Tokyo 185, Japan. {yoshikoy, fujii, akashi, inagami}@crl.hitachi.co.jp
*General Purpose Computer Division, Hitachi, Ltd., 1, Horiyamashita, Hadano, Kanagawa, Japan.

Abstract
We have developed a hardware detour path selection facility for the Hitachi SR2201 parallel computer, which uses a multi-dimensional crossbar as its inter-processor network. The facility ensures operating efficiency and high reliability when a part of the network is faulty: packets are transmitted to their destinations along alternative paths that avoid the fault. However, changing the routing may cause deadlock. This paper describes a deadlock-free fault-tolerant routing scheme that can be used by the detour path selection facility to avoid deadlock, and its implementation for the SR2201.

1. Introduction
In recent years, parallel computer systems with distributed memory [1-4] have dominated the quest for high-performance computing. Generally, these systems consist of a number of processing elements (PEs) and an inter-processor network composed of a combination of switches. In these machines, the number of switches composing the network is proportional to the scale of the system, so the rate of faults in the network also increases with scale. Thus, to maintain high reliability while the system is operating, it is very important to be able to avoid faults in the network.
The IBM SP2 parallel computer, which has a bidirectional multistage interconnection network [2] that provides redundancy, ensures high reliability by setting detour routes from a source node to any destination in the routing table of each node and by changing the switching technique when a part of the network is faulty. However, in this system, if even one switch is faulty, all data transmission must be controlled by software, which may decrease the efficiency of data transmission. The CRAY T3D parallel system has a three-dimensional torus network [3-4]. In this system, data transmission is controlled by a routing tag look-up table, which contains the routing information each node uses to create the routing tag in the header of a packet. When a part of the network is faulty, the routing information in the look-up table of each node is rewritten so that no packet passes through the faulty point; with the rewritten table, all packets still reach their destinations.

The Hitachi SR2201 parallel computer system [5] has a multi-dimensional crossbar network [6-8], built by combining common crossbar switches (i.e., switches that provide direct connections from any input port to any output port) in a multi-dimensional arrangement, and uses the cut-through routing switching technique [9-10] to transmit packets with low latency and high throughput. It supports a hardware detour path selection facility that ensures operating efficiency and high reliability when a part of the network is faulty. When developing this facility, we wanted to minimize additional hardware and maintain the data transmission efficiency of the system without affecting user programs. To minimize the additional hardware in the SR2201's fault-tolerant facility, each switch holds information only about the switches it is physically connected to. To avoid affecting user programs, detour information is added to and deleted from packets automatically as they pass along detour paths.
Furthermore, to fully utilize the system's data transmission capability, packets are transmitted around the faulty point without changing the switching technique. In this case, however, the routing must be changed, and changing the routing is problematic because deadlock may occur [11-18]. We therefore propose a deadlock-free fault-tolerant routing scheme suited to the network topology and data transmission of the SR2201.

In this paper, we begin by briefly describing the Hitachi SR2201 parallel computer in Section 2. In Section 3, we describe the inter-processor network of the SR2201, the multi-dimensional crossbar network, and how point-to-point and broadcast communication are carried out on it using the cut-through routing technique and dimension-order routing. In Section 4, we describe the hardware fault-tolerant facility and its routing scheme in the multi-dimensional crossbar network. Finally, in Section 5, we discuss the deadlock problem that arises in the hardware fault-tolerant facility when point-to-point and broadcast communication occur on the SR2201 at the same time, and present our deadlock-free fault-tolerant routing scheme.

2. Hitachi SR2201 Parallel Computer
The structure of the SR2201 is shown in Fig. 1. The SR2201 connects up to 2048 processing elements (PEs), which operate independently, through an inter-processor network. Each PE consists of a 150-MHz RISC microprocessor [22] based on the PA-RISC 1.1 architecture with a peak performance of 300 MFLOPS, up to 1 GB of DRAM local memory, a storage controller (SC), and a network interface adapter (NIA). The SC is connected to the RISC microprocessor, the NIA, and the DRAM memory, and it controls all reading from and writing to the memory. The NIA is connected to the network; it generates packets according to the instructions issued by the microprocessor and controls all data transmission between the network and the local memory. Thus, the network and the microprocessors operate independently. As its inter-processor network, the SR2201 uses a multi-dimensional crossbar network, which enables data transfer between any PEs at 300 MB/s.

Figure 1. Structure of the SR2201.

3. Multi-dimensional Crossbar Network in the SR2201

Structure
The SR2201 uses a multi-dimensional (MD) crossbar network as its inter-processor network [6-8]. The d-dimensional crossbar network is defined as follows:
(a) The number of PEs, n, can be factorized as n = n1*n2*n3*...*nd, where ni is the number of PEs along the ith dimension.
(b) Each PE corresponds to a lattice point of a d-dimensional solid, and the lattice points in a line are connected by a common crossbar switch that provides direct connections from any input port to any output port (thus, each PE is connected to d crossbars).
(c) Each PE is attached to a relay switch (router) that connects the PE with its d crossbars. This relay switch is structured as a (d+1)x(d+1) crossbar switch.
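The three-part definition above fixes the router size, the number of crossbars, and the worst-case hop count. The following is a minimal sketch under stated assumptions: the helper names (`coords`, `router_ports`, `crossbar_count`, `hops`, `diameter`) are hypothetical, and numbering PEs so that dimension 0 varies fastest is an assumption made only for illustration, not the SR2201's actual numbering.

```python
from math import prod

# Hypothetical helpers modelling the d-dimensional crossbar definition.
# dims = (n1, n2, ..., nd); PE numbering with dimension 0 varying fastest is an assumption.

def coords(pe: int, dims: tuple[int, ...]) -> tuple[int, ...]:
    """Map a PE index to its lattice point, one coordinate per dimension."""
    point = []
    for n_i in dims:
        point.append(pe % n_i)
        pe //= n_i
    return tuple(point)

def router_ports(dims: tuple[int, ...]) -> int:
    """A router is a (d+1)x(d+1) switch: one port per dimension's crossbar plus one for its PE."""
    return len(dims) + 1

def crossbar_count(dims: tuple[int, ...]) -> int:
    """Dimension i has one crossbar per line of n_i lattice points, i.e. n / n_i crossbars."""
    n = prod(dims)
    return sum(n // n_i for n_i in dims)

def hops(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """One crossbar hop per coordinate that differs, so never more than d hops."""
    return sum(x != y for x, y in zip(a, b))

def diameter(dims: tuple[int, ...]) -> int:
    """Largest hop count over all PE pairs (brute force; fine for small examples)."""
    n = prod(dims)
    return max(hops(coords(a, dims), coords(b, dims)) for a in range(n) for b in range(n))

if __name__ == "__main__":
    grid = (4, 3)  # the 4x3 2D example of Fig. 2
    print(prod(grid), router_ports(grid), crossbar_count(grid), diameter(grid))  # 12 3 7 2
    cube = (2, 2, 2, 2)  # d = log2(16): every crossbar joins just two routers (a hypercube)
    print(router_ports(cube), crossbar_count(cube), diameter(cube))  # 5 32 4
```

The hypercube case shows the trade-off discussed in this section: with every ni = 2, the router needs log2n + 1 ports, whereas a 2D network of any size keeps a 3-port router.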
For d=1, the MD crossbar network is equivalent to a conventional crossbar network, structured as an n x n crossbar switch. When d=log2n (n is a power of two, so every ni = 2), each crossbar joins just two routers, so the routers are effectively connected directly to each other and the MD crossbar network is equivalent to a hypercube network. Fig. 2 shows the structure of a two-dimensional (2D) 4x3 crossbar network. As shown in Fig. 2, the 2D crossbar network places the routers in a 4x3 2D arrangement, and each router is connected to two crossbar switches (XBs): one X-dimension crossbar and one Y-dimension crossbar.

Figure 2. A 4x3 Two-dimensional Crossbar Network.

The MD crossbar network has several notable characteristics.

Short communication distances: In the MD crossbar network, any two PEs whose routers are connected to the same crossbar switch can communicate in only one hop via their routers. Any two PEs on a d-dimensional crossbar network can communicate in at most d hops over d crossbars. For example, in the 2D crossbar network shown in Fig. 2, any two PEs communicate in at most two hops. Since an MD crossbar network with only a few dimensions can connect more than a thousand PEs, the diameter of the network remains small.

Wide communication channels: In many parallel computers, each PE has a corresponding router built into it. To make systems with a large number of PEs possible, the PEs must be very compact. In practice, therefore, the number of input-output pins of the router, which is approximately the number of ports times the physical channel bandwidth, is physically limited. Large-scale numerical applications benefit from widening the physical channel bandwidth to raise the communication throughput. A router in an MD crossbar network needs only one port more than the number of dimensions, so its physical channel bandwidth can be made as wide as that of a mesh-connected network. By comparison, the router of a hypercube network needs log2n+1 ports, which limits its physical channel bandwidth.

Few network conflicts: A conventional crossbar network, structured as an n x n crossbar switch, is free of network conflicts for almost all communication patterns. Because the MD crossbar network is based on the conventional crossbar network, far fewer network conflicts occur in it than in mesh-connected or torus networks, and it also provides shorter transmission times and higher throughput [7].

Conflict-free remapping of other topologies: The rich interconnection of an MD crossbar network allows many topologies used in large-scale numerical applications, including ring, mesh, hypercube, and tree-connected networks, to be mapped onto it efficiently. A program that generates no conflicts on these topologies will not generate conflicts when remapped onto the MD crossbar network.

Data Transmission
Each packet consists of data and a header that contains routing information, such as a receiving address and a route change (RC) bit, as shown in Fig. 3. The receiving address consists of d coordinates on the d-dimensional crossbar network; for the 2D crossbar network of Fig. 2, it consists of an X-coordinate and a Y-coordinate. The RC bit selects how the packet is routed, and its possible meanings are shown in Fig. 4. The receiving address is used for routing only when the RC bit equals 0. When the RC bit is not 0, packets are transmitted to their destinations by special routing. All packets are transmitted to their destinations using cut-through routing, which provides low latency and high throughput, as the switching technique and dimension-order routing as the routing technique.
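The header rule above (the receiving address drives the route only when the RC bit is 0) can be sketched as follows. This is an illustrative model, not the SR2201 hardware logic: the names `RC`, `Header`, and `dimension_order_path` are hypothetical, the RC values follow the meanings listed in Fig. 4, and the special routings for non-zero values are deliberately not modelled.

```python
from dataclasses import dataclass
from enum import IntEnum

class RC(IntEnum):
    """Route change (RC) values, following the meanings listed in Fig. 4."""
    NORMAL = 0
    BROADCAST_REQUEST = 1
    BROADCAST = 2
    DETOUR = 3

@dataclass(frozen=True)
class Header:
    rc: RC
    dest: tuple[int, ...]  # receiving address: one coordinate per dimension (X, Y, ...)

def dimension_order_path(src: tuple[int, ...], header: Header) -> list[tuple[int, ...]]:
    """Routers visited by a normal packet: correct X first, then Y, and so on.

    Each step between consecutive entries is one hop through a crossbar of that dimension.
    """
    if header.rc != RC.NORMAL:
        # For the special routings the receiving address alone does not decide the route.
        raise ValueError(f"{header.rc.name} packets use special routing, not modelled here")
    current = list(src)
    path = [tuple(current)]
    for dim, target in enumerate(header.dest):
        if current[dim] != target:
            current[dim] = target  # one hop through the dimension-`dim` crossbar
            path.append(tuple(current))
    return path

print(dimension_order_path((0, 0), Header(RC.NORMAL, (2, 1))))  # [(0, 0), (2, 0), (2, 1)]
```

Correcting the dimensions in a fixed order is what makes the route deterministic; the deadlock issues discussed next arise only where a packet leaves this order.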
The cut-through routing approach proposed by Dally and Seitz [10] (often called wormhole routing) has been used in many recent parallel computers. In cut-through routing, a packet is divided into a sequence of fixed-size units of data called flits. The size of a flit depends on system parameters, in particular the channel bandwidth. The header flit (or flits) of a packet governs the route. Each switch in the network starts forwarding a packet as soon as the header flit has been received and the required output port is free. If the header flit encounters a port that is already in use, it is blocked until the port becomes available, while the packet keeps holding the ports it has already acquired. The routing order is configured in the network hardware in advance. In this paper, we assume that the dimension-order (normal) routing is X-Y routing: first in the X dimension, then in the Y dimension. If a part of the network is faulty, however, the network hardware can change the routing order.

The SR2201 supports two kinds of communication: point-to-point communication and broadcast. Point-to-point communication packets are transmitted using dimension-order X-Y routing according to the receiving address. Broadcast packets, however, cannot be transmitted by the same routing used for point-to-point communication. When several broadcasts start at the same time in a network using cut-through routing, deadlock can occur, because each tries to acquire channels already held by another. Fig. 5 shows an example of deadlock involving two broadcast packets on a 2D crossbar network. In Fig. 5, the crossbar switches are represented by thin lines, and each thick line shows the flow of either broadcast packet 1 (BC 1) from PE3 or broadcast packet 2 (BC 2) from PE4. In broadcast communication, a packet is first transmitted simultaneously to all output ports of one XB in the X dimension and then simultaneously to all output ports of all XBs in the Y dimension. As shown in Fig. 5, each packet can be transmitted through its own X-dimension crossbar independently. In the Y dimension, however, both packets must reserve all Y-dimension crossbars. If each broadcast acquires some of these crossbars and then waits for the rest, which are held by the other (i.e., cyclic waiting), deadlock occurs.

Figure 3. Packet Format (header: RC bit and receiving address X, Y; followed by data).

Figure 4. Meanings of the RC bit
  RC bit   meaning
  0        normal routing
  1        broadcast request routing
  2        broadcast routing
  3        detour routing

Figure 5. An example of channel deadlock involving two broadcast communications.
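The cyclic waiting in Fig. 5 can be checked mechanically with a wait-for graph, a standard deadlock-analysis technique that is not taken from this paper: draw an edge from packet p to packet q when p still needs a crossbar that q holds, and look for a cycle. In the sketch below the crossbar names "YA", "YB", "YC" and the way they are split between the two broadcasts are hypothetical, since Fig. 5 only shows that each broadcast holds some of the Y-dimension crossbars.

```python
def wait_for_graph(holds: dict[str, set[str]], needs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Edge p -> q when packet p still needs a crossbar that packet q currently holds."""
    owner = {xb: p for p, xbs in holds.items() for xb in xbs}
    return {
        p: {owner[xb] for xb in needs.get(p, set()) if xb in owner and owner[xb] != p}
        for p in holds
    }

def has_cycle(graph: dict[str, set[str]]) -> bool:
    """Depth-first search; reaching a node still on the current path means cyclic waiting."""
    on_path: set[str] = set()
    done: set[str] = set()

    def visit(node: str) -> bool:
        on_path.add(node)
        for nxt in graph.get(node, set()):
            if nxt in on_path or (nxt not in done and visit(nxt)):
                return True
        on_path.remove(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in graph)

# Hypothetical split: each broadcast has grabbed part of the Y-dimension crossbars.
interleaved = wait_for_graph(
    holds={"BC1": {"YA"}, "BC2": {"YB", "YC"}},
    needs={"BC1": {"YB", "YC"}, "BC2": {"YA"}},
)
print(has_cycle(interleaved))  # True: each broadcast waits for the other

# If one broadcast acquires all Y-dimension crossbars before the other acquires any:
serialized = wait_for_graph(
    holds={"BC1": {"YA", "YB", "YC"}, "BC2": set()},
    needs={"BC1": set(), "BC2": {"YA", "YB", "YC"}},
)
print(has_cycle(serialized))  # False: BC2 simply waits for BC1 to finish
```

The second case shows why serializing broadcasts removes this deadlock: the waiting relation becomes a chain rather than a cycle.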

Generally, to avoid broadcast deadlock when using cut-through routing, conventional parallel computers restrict broadcast communication, either by using a separate tree-connected network [1] [19] or by performing the broadcast in software [20-21]. The SR2201, on the other hand, avoids deadlock by gathering and serializing broadcast packets at a specific crossbar switch (the serialized crossbar, S-XB), which is one of the crossbars in the MD crossbar network, and then transmitting the packets from the S-XB to all PEs. The routing for this broadcast facility is selected by the RC bit of the packets.

Fig. 6 shows the routing when broadcast packets are transmitted from PE3 and PE4 at the same time, with one particular crossbar assumed to be the S-XB. First, the broadcast packets from PE3 and PE4 are transmitted at the same time to the S-XB via routers 9 and 7 by point-to-point communication, according to the RC bit 'broadcast request' (step 1). When these broadcast requests reach the S-XB, the S-XB changes their RC bit from 'broadcast request' to 'broadcast' and then transmits the packets one by one, in order of arrival, to all routers (7, 8, and 9) connected to it (step 2). Here we assume that the packet from PE3 is transmitted first and the packet from PE4 is made to wait in the S-XB. Since all packets in the SR2201 are transmitted by cut-through routing, the output ports that the waiting packet has already acquired on its way from PE4 through router 7 remain occupied by it. Routers 7, 8, and 9, which are connected to the S-XB, transmit the packet from PE3 to all Y-XBs and to the PEs (PE7, PE8, and PE9) attached to them, in dimension-order X-Y routing according to the RC bit 'broadcast' (step 3). Finally, the Y-XBs transmit the packet to all routers except routers 7, 8, and 9, which are connected to the S-XB, and these routers deliver it to their PEs (step 4). After the broadcast packet from PE3 has been transmitted to all PEs, all output ports of the S-XB can be used again, and the packet from PE4 waiting in the S-XB can be transmitted.

Figure 6. Broadcast Routing on the 2D Crossbar Network.

Since broadcast packets are transmitted to all output ports of the S-XB (step 2) and are not transmitted back to the routers connected to the S-XB after passing through the Y-XBs (step 4), only one broadcast at a time can hold the channels beyond the S-XB, and this routing therefore prevents deadlock. Because broadcast packets on the SR2201 are transmitted via the S-XB, broadcast routing becomes Y-X-Y routing, which differs from the X-Y routing used for point-to-point communication.

4. Fault-tolerant Routing in the Multi-dimensional Crossbar Network
To ensure high reliability in a parallel processor system, it is very important to be able to continue operating the system even if a part of the network is faulty. In this section, we describe the hardware detour path selection facility of the multi-dimensional crossbar network, which can be used when there is a single faulty point in the network.

Fig. 7 shows the hardware detour path selection facility of the 2D crossbar network. In Fig. 7, the thin line shows the routing when there is no faulty switch in the network, and the thick line shows the routing when a part of the network is faulty. To implement detour path selection in hardware, when a switch is faulty, information about it is set in advance in the switches to which it is connected. This information is at most a few bits. For example, routers hold information about the XBs they are connected to, and XBs hold information about the routers they are connected to. If this information is set in a switch, the network hardware of that switch changes the route change (RC) bit of the packet from 'normal routing' to 'detour routing' and then transmits the packet to the detour point (the detour XB, D-XB), which is determined by the network hardware in advance. When the packet reaches the D-XB, the network hardware changes the RC bit from 'detour' back to 'normal' and then transmits the packet to its destination in dimension-order X-Y routing. The packet thus leaves no trace of the detour routing behind.

Figure 7. Hardware Detour Path Selection Facility of the 2D Crossbar Network.

Next, we describe the routing of the detour path selection facility.

Case 1: Normal
When there is no faulty switch in the network, no information is set in any switch. Point-to-point communication packets are transmitted in dimension-order X-Y routing according to their receiving address, since their RC bit is set to 'normal'. Broadcast packets are transmitted in Y-X-Y routing via the S-XB according to the RC bit (as described in Section 3).

Case 2: A part of the network is faulty
(a) Point-to-point communication routing
First, the source PE transmits the packet to the X-XB via its router in dimension-order X-Y routing according to the receiving address. Second, if any router connected to the X-XB is faulty, the X-XB sets the RC bit to 'detour' and transmits the packet to a router that is not faulty, according to the RC bit. Third, that router transmits the packet to the Y-XB according to the RC bit. Fourth, the Y-XB transmits the packet to the D-XB according to the RC bit. Finally, the D-XB changes the RC bit from 'detour' to 'normal' and then transmits the packet to its destination in X-Y routing according to the receiving address.
(b) Broadcast routing
If a router connected to the S-XB is faulty, another XB that is not connected to the faulty router substitutes for the S-XB, and broadcast routing then proceeds as in the fault-free case. If a router is faulty, the network hardware stops transmitting packets to the faulty router.

Fig. 8 illustrates the detour routing in detail; it shows point-to-point communication from PE1 to PE5 when router 2 is faulty. Since router 2 is faulty, information about router 2 is set in the X-XB to which it is connected, and a specific XB is designated as the D-XB used for detouring. If the network had no fault, PE1 would transmit the packet to PE5 along its normal path, because dimension-order X-Y routing is normally used. Since router 2 is faulty, however, the packet is transmitted to its destination along a detour path, as follows:
Step 1: PE1 transmits the packet to the X-XB via router 1 in X-Y routing according to the receiving address.
Step 2: Because information about router 2 is set in it, the X-XB sets the RC bit to 'detour' and transmits the packet to a specific non-faulty router (the detour router, router 3) instead of the faulty router.
Step 3: Router 3 transmits the packet to the Y-XB.
Step 4: The Y-XB transmits the packet to the D-XB via router 6 according to the RC bit.
Step 5: The D-XB changes the RC bit from 'detour' to 'normal' and then transmits the packet to its destination in X-Y routing according to the receiving address.

Figure 8. Point-to-point communication routing when a part of the network is faulty.

The main feature of this fault-tolerant routing is that every detour passes through one specific XB, the D-XB. Since this restriction does not allow cyclic waiting among detoured packets, it prevents deadlock.

5. Deadlock-free Fault-tolerant Routing Scheme Suitable for Data Transmission of the SR2201
When the detour path selection facility described in the previous section was implemented in the multi-dimensional crossbar network, we had to adapt it to the existing data transmission facilities of the SR2201, such as the hardware broadcast facility that uses the S-XB (described in Section 3). However, if broadcast communication and point-to-point communication occur at the same time while a part of the network is faulty, the detour routing described above allows deadlock to occur. This is because cyclic waiting can arise between point-to-point and broadcast communication routes, since non-dimension-order routing is used at both the S-XB and the D-XB.

Figure 9. An example of deadlock involving broadcast routing and detour routing.

We show an example of deadlock in Fig. 9. Figure 9 shows data transmission when broadcast communication from PE4 and point-to-point communication from PE1 to PE5 occur at the same time and one of the routers is faulty.
(a) PE1 transmits point-to-point data to PE5 via the D-XB and router 5.
(b) The broadcast packet is transmitted via router 9 to the S-XB by point-to-point communication and is then broadcast from the S-XB to all Y-XBs.
(c) Since the point-to-point packet occupies a crossbar's output port to router 6, the broadcast packet cannot be transmitted while the point-to-point communication holds that port. Conversely, the point-to-point packet cannot be transmitted while the output port leading to PE5 is occupied by the broadcast packet. The broadcast packet cannot reach its destinations until all output ports of the Y-XBs are free, and the point-to-point packet from PE1 cannot reach PE5 until the output port leading to PE5 is free. Again, since each packet is transmitted by cut-through routing and keeps holding the ports it has acquired, cyclic waiting occurs and deadlock results.

To resolve this deadlock problem, we propose a deadlock-free fault-tolerant routing scheme in which the D-XB is set to the same XB as the S-XB. Figure 10 shows the case where there is a broadcast communication from PE9 and a point-to-point communication from PE1 to PE6 while router 2 is faulty.
(a) PE9 transmits the broadcast packet to the S-XB by point-to-point communication, and the packet is then broadcast to the Y-XBs.
(b) PE1 transmits its packet along the detour path. Since the S-XB and the D-XB are the same XB, router 3 transmits the packet by point-to-point communication to router 9, which is connected to the S-XB. The packet from PE1 cannot be transmitted through the S-XB until the required output port becomes free, because the broadcast packet occupies all output ports of the S-XB.

Figure 10. Deadlock-free Fault-tolerant Routing.

This routing allows the packets to avoid deadlock, because both the detour transfers and the broadcast communication are serialized in the S-XB. Even when point-to-point and broadcast communication occur at the same time while a part of the network is faulty, this scheme has only one point of non-dimension-order routing, so no cyclic waiting can arise between the two kinds of communication. Thus, this routing prevents deadlock.

6. Conclusion
We have described a deadlock-free fault-tolerant routing method for the hardware detour path selection facility of the Hitachi SR2201 parallel processor. This method ensures operating efficiency and high reliability when a part of the network is faulty. In this facility, information about a faulty switch is set in advance in the switches connected to it. Since each switch holds only information about its neighboring switches, the hardware cost is lower than the cost of adding a redundant network. Each packet is transmitted to its destination according to this information and the routing information it contains. To avoid deadlock, the detour point is determined in advance according to the routing information of the packet. The detour path selection facility on the SR2201 limits the detour point to a specific crossbar switch, which is part of the multi-dimensional network and is also used for serializing broadcast communication. Thus, it avoids the deadlock that changing the routing would otherwise cause. In future research, we intend to improve this facility to further increase system reliability.

References
[1] C. E. Leiserson et al.: The Network Architecture of the Connection Machine CM-5, Proc. 4th Ann. ACM Symp. Parallel Algorithms and Architectures (SPAA), (1992).
[2] C. B. Stunkel et al.: The SP2 High-performance Switch, IBM Systems Journal, Vol. 34, No. 2, (1995).
[3] R. E. Kessler, J. L. Schwarzmeier: CRAY T3D: A New Dimension for Cray Research, Digest of Papers COMPCON 93, (1993).
[4] W. Oed: The Cray Research Massively Parallel Processor System T3D, Tech. Rep., Cray Research Inc., (1993).
[5] H. Fujii, Y. Yasuda, H. Akashi, Y. Inagami, M. Koga, O. Ishihara, M. Kashiyama, H. Wada, T. Sumimoto: Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System, Proc. of IPPS'97, (1997).
[6] N. Hamanaka, J. Nakagoshi, T. Tanaka: Reducing Network Hardware Quantity by Employing Multi-processor Cluster Structure in Distributed Memory Parallel Processors, COMPAR 92/VAPP V, (1992).
[7] Y. Yasuda, H. Fujii, T. Tanaka, Y. Inagami: Performance Evaluation of the Hyper Crossbar Network, Technical Report of IEICE, CPSY.93.25, (1993) (in Japanese).
[8] A. Murata, T. Boku, T. Harada, H. Amano: Structure and Performance of the MDX (Multi-Dimensional X'bar): A Network Class for Large Scale Multiprocessors, Proceedings of ISCA/IEEE International Conference on Parallel and Distributed Computing, (1996).
[9] P. Kermani, L. Kleinrock: Virtual Cut-through: A New Computer Communication Switching Technique, Computer Networks 3, (1979).
[10] W. J. Dally, C. L. Seitz: The Torus Routing Chip, J. Distributed Computing, Vol. 1, No. 3, (1986).
[11] D. H. Linder, J. C. Harden: An Adaptive and Fault Tolerant Wormhole Routing Strategy for k-ary n-cubes, IEEE Trans. on Computers, Vol. 40, No. 1, (1991).
[12] C. Su, K. G. Shin: Adaptive Deadlock-Free Routing in Multicomputers Using Only One Extra Virtual Channel, 1993 Int'l Conference on Parallel Processing, Vol. I, (1993).
[13] S. Chalasani, R. V. Boppana: Fault-Tolerant Wormhole Routing in Tori, Proc. International Conference on Supercomputing, (1994).
[14] J. Duato: A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks, IEEE Trans. on Parallel and Distributed Systems, Vol. 4, No. 12, (1993).
[15] C. J. Glass, L. M. Ni: The Turn Model for Adaptive Routing, Proc. 19th Int'l Symp. Computer Architecture, (1992).
[16] P. T. Gaughan, S. Yalamanchili: Adaptive Routing Protocols for Hypercube Interconnection Networks, IEEE Computer, May (1993).
[17] W. J. Dally, H. Aoki: Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels, IEEE Trans. on Parallel and Distributed Systems, Vol. 4, No. 4, (1993).
[18] R. V. Boppana, S. Chalasani: A Comparison of Adaptive Wormhole Routing Algorithms, Proc. 20th Ann. Int'l Symp. Computer Architecture, (1993).
[19] T. Shimizu, T. Horie, H. Ishihata: Low-latency Message Communication Support for the AP1000, Proc. 19th Ann. Int'l Symp. on Computer Architecture, (1992).
[20] R. Ponnusamy, R. Thakur, A. Choudhary, G. Fox: Scheduling Regular and Irregular Communication Patterns on the CM-5, Proc. of Supercomputing '92, (1992).
[21] S. L. Johnsson, C.-T. Ho: Optimum Broadcasting and Personalized Communication in Hypercubes, IEEE Trans. Computers, Vol. 38, No. 9, (1989).
[22] K. Saito, M. Hashimoto, H. Sawamoto, R. Yamagata, T. Kumagai, E. Kamada, K. Matsubara, T. Isobe, T. Hotta, T. Nakano, T. Shimizu, K. Nakazawa: A 150MHz Superscalar RISC Processor with Pseudo Vector Processing Feature, Proc. Notebook for Hot Chips VII, (1995).


More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

Deadlock- and Livelock-Free Routing Protocols for Wave Switching

Deadlock- and Livelock-Free Routing Protocols for Wave Switching Deadlock- and Livelock-Free Routing Protocols for Wave Switching José Duato,PedroLópez Facultad de Informática Universidad Politécnica de Valencia P.O.B. 22012 46071 - Valencia, SPAIN E-mail:jduato@gap.upv.es

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

Communication in Multicomputers with Nonconvex Faults

Communication in Multicomputers with Nonconvex Faults Communication in Multicomputers with Nonconvex Faults Suresh Chalasani Rajendra V. Boppana Technical Report : CS-96-12 October 1996 The University of Texas at San Antonio Division of Computer Science San

More information

Communication in Multicomputers with Nonconvex Faults?

Communication in Multicomputers with Nonconvex Faults? In Proceedings of EUROPAR 95 Communication in Multicomputers with Nonconvex Faults? Suresh Chalasani 1 and Rajendra V. Boppana 2 1 Dept. of ECE, University of Wisconsin-Madison, Madison, WI 53706-1691,

More information

Interconnection networks

Interconnection networks Interconnection networks When more than one processor needs to access a memory structure, interconnection networks are needed to route data from processors to memories (concurrent access to a shared memory

More information

Literature Review: Convey the Data in Massive Parallel Computing

Literature Review: Convey the Data in Massive Parallel Computing Literature Review: Convey the Data in Massive Parallel Computing Mohd Kalamuddin Ahamad Computer Science, Mewar University, Chhittorghara, India Mohd Husain Department of CS, APJ Tech University Lucknow,

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Seungjin Park Jong-Hoon Youn Bella Bose Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science

More information

Generic Methodologies for Deadlock-Free Routing

Generic Methodologies for Deadlock-Free Routing Generic Methodologies for Deadlock-Free Routing Hyunmin Park Dharma P. Agrawal Department of Computer Engineering Electrical & Computer Engineering, Box 7911 Myongji University North Carolina State University

More information

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: ABCs of Networks Starting Point: Send bits between 2 computers Queue

More information

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes EE482, Spring 1999 Research Paper Report Deadlock Recovery Schemes Jinyung Namkoong Mohammed Haque Nuwan Jayasena Manman Ren May 18, 1999 Introduction The selected papers address the problems of deadlock,

More information

Parallel Computer Architecture II

Parallel Computer Architecture II Parallel Computer Architecture II Stefan Lang Interdisciplinary Center for Scientific Computing (IWR) University of Heidelberg INF 368, Room 532 D-692 Heidelberg phone: 622/54-8264 email: Stefan.Lang@iwr.uni-heidelberg.de

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

INTERCONNECTION NETWORKS LECTURE 4

INTERCONNECTION NETWORKS LECTURE 4 INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1 Topology Specifies way switches are wired Affects routing, reliability, throughput, latency, building ease Routing How does a message get from source

More information

2 Keywords: fast Fourier transform, distributed-memory, vector processor, parallel algorithm, Stockham's algorithm, cyclic distribution, block cyclic

2 Keywords: fast Fourier transform, distributed-memory, vector processor, parallel algorithm, Stockham's algorithm, cyclic distribution, block cyclic Chapter 1 VECTOR-PARALLEL ALGORITHMS FOR 1-DIMENSIONAL FAST FOURIER TRANSFORM Yusaku Yamamoto Dept. of Computational Science and Engineering Nagoya University yamamoto@na.cse.nagoya-u.ac.jp Hiroki Kawamura

More information

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2 Real Machines Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99

More information

Rajendra V. Boppana. Computer Science Division. for example, [23, 25] and the references therein) exploit the

Rajendra V. Boppana. Computer Science Division. for example, [23, 25] and the references therein) exploit the Fault-Tolerance with Multimodule Routers Suresh Chalasani ECE Department University of Wisconsin Madison, WI 53706-1691 suresh@ece.wisc.edu Rajendra V. Boppana Computer Science Division The Univ. of Texas

More information

Design and Implementation of Multistage Interconnection Networks for SoC Networks

Design and Implementation of Multistage Interconnection Networks for SoC Networks International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.5, October 212 Design and Implementation of Multistage Interconnection Networks for SoC Networks Mahsa

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

The Cray T3E Network:

The Cray T3E Network: The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus Steven L. Scott and Gregory M. Thorson Cray Research, Inc. {sls,gmt}@cray.com Abstract This paper describes the interconnection network

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

Performance Analysis of a Minimal Adaptive Router

Performance Analysis of a Minimal Adaptive Router Performance Analysis of a Minimal Adaptive Router Thu Duc Nguyen and Lawrence Snyder Department of Computer Science and Engineering University of Washington, Seattle, WA 98195 In Proceedings of the 1994

More information

The Cray XT4 and Seastar 3-D Torus Interconnect

The Cray XT4 and Seastar 3-D Torus Interconnect The Cray XT4 and Seastar 3-D Torus Interconnect April 6, 2010 BYLINE Dennis Abts dabts@google.com Google Inc. Madison, WI USA dabts@google.com SYNONYMS Cray Red Storm, Cray XT3, Cray XT4, Cray XT5, Cray

More information

The Impact of Optics on HPC System Interconnects

The Impact of Optics on HPC System Interconnects The Impact of Optics on HPC System Interconnects Mike Parker and Steve Scott Hot Interconnects 2009 Manhattan, NYC Will cost-effective optics fundamentally change the landscape of networking? Yes. Changes

More information

On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes

On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton, FL 33431 E-mail: jie@cse.fau.edu

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns H. H. Najaf-abadi 1, H. Sarbazi-Azad 2,1 1 School of Computer Science, IPM, Tehran, Iran. 2 Computer Engineering

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE 3/3/205 EE 4683/5683: COMPUTER ARCHITECTURE Lecture 8: Interconnection Networks Avinash Kodi, kodi@ohio.edu Agenda 2 Interconnection Networks Performance Metrics Topology 3/3/205 IN Performance Metrics

More information

A New Theory of Deadlock-Free Adaptive Multicast Routing in. Wormhole Networks. J. Duato. Facultad de Informatica. Universidad Politecnica de Valencia

A New Theory of Deadlock-Free Adaptive Multicast Routing in. Wormhole Networks. J. Duato. Facultad de Informatica. Universidad Politecnica de Valencia A New Theory of Deadlock-Free Adaptive Multicast Routing in Wormhole Networks J. Duato Facultad de Informatica Universidad Politecnica de Valencia P.O.B. 22012, 46071 - Valencia, SPAIN E-mail: jduato@aii.upv.es

More information

Escape Path based Irregular Network-on-chip Simulation Framework

Escape Path based Irregular Network-on-chip Simulation Framework Escape Path based Irregular Network-on-chip Simulation Framework Naveen Choudhary College of technology and Engineering MPUAT Udaipur, India M. S. Gaur Malaviya National Institute of Technology Jaipur,

More information

All-port Total Exchange in Cartesian Product Networks

All-port Total Exchange in Cartesian Product Networks All-port Total Exchange in Cartesian Product Networks Vassilios V. Dimakopoulos Dept. of Computer Science, University of Ioannina P.O. Box 1186, GR-45110 Ioannina, Greece. Tel: +30-26510-98809, Fax: +30-26510-98890,

More information

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ E. Baydal, P. López and J. Duato Depto. Informática de Sistemas y Computadores Universidad Politécnica de Valencia, Camino

More information

Input Buffering (IB): Message data is received into the input buffer.

Input Buffering (IB): Message data is received into the input buffer. TITLE Switching Techniques BYLINE Sudhakar Yalamanchili School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA. 30332 sudha@ece.gatech.edu SYNONYMS Flow Control DEFITION

More information

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA)

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA) Network Properties, Scalability and Requirements For Parallel Processing Scalable Parallel Performance: Continue to achieve good parallel performance "speedup"as the sizes of the system/problem are increased.

More information

Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent

Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent Anjan K. V. Timothy Mark Pinkston José Duato Pyramid Technology Corp. Electrical Engg. - Systems Dept.

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

NoC Test-Chip Project: Working Document

NoC Test-Chip Project: Working Document NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance

More information

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA)

Network Properties, Scalability and Requirements For Parallel Processing. Communication assist (CA) Network Properties, Scalability and Requirements For Parallel Processing Scalable Parallel Performance: Continue to achieve good parallel performance "speedup"as the sizes of the system/problem are increased.

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

Overview. Processor organizations Types of parallel machines. Real machines

Overview. Processor organizations Types of parallel machines. Real machines Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments

More information

Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing

Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing Fabrizio Petrini Oxford University Computing Laboratory Wolfson Building, Parks Road Oxford OX1 3QD, England e-mail: fabp@comlab.ox.ac.uk

More information

Interconnection Networks

Interconnection Networks Lecture 18: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of these slides were created by Michael Papamichael This lecture is partially

More information

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks Jose Duato Abstract Second generation multicomputers use wormhole routing, allowing a very low channel set-up time and drastically reducing

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Routing, Network Embedding John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 14-15 4,11 October 2018 Topics for Today

More information

MESH-CONNECTED networks have been widely used in

MESH-CONNECTED networks have been widely used in 620 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 5, MAY 2009 Practical Deadlock-Free Fault-Tolerant Routing in Meshes Based on the Planar Network Fault Model Dong Xiang, Senior Member, IEEE, Yueli Zhang,

More information

Parallel Architecture. Sathish Vadhiyar

Parallel Architecture. Sathish Vadhiyar Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 21 Routing Outline Routing Switch Design Flow Control Case Studies Routing Routing algorithm determines which of the possible paths are used as routes how

More information

An Empirical Comparison of Area-Universal and Other Parallel Computing Networks

An Empirical Comparison of Area-Universal and Other Parallel Computing Networks Loyola University Chicago Loyola ecommons Computer Science: Faculty Publications and Other Works Faculty Publications 9-1996 An Empirical Comparison of Area-Universal and Other Parallel Computing Networks

More information

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Routing Algorithm How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Many routing algorithms exist 1) Arithmetic 2) Source-based 3) Table lookup

More information

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing J. Flich, M. P. Malumbres, P. López and J. Duato Dpto. Informática de Sistemas y Computadores Universidad Politécnica

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Adaptive Routing in Hexagonal Torus Interconnection Networks

Adaptive Routing in Hexagonal Torus Interconnection Networks Adaptive Routing in Hexagonal Torus Interconnection Networks Arash Shamaei and Bella Bose School of Electrical Engineering and Computer Science Oregon State University Corvallis, OR 97331 5501 Email: {shamaei,bose}@eecs.oregonstate.edu

More information

A NEW ROUTER ARCHITECTURE FOR DIFFERENT NETWORK- ON-CHIP TOPOLOGIES

A NEW ROUTER ARCHITECTURE FOR DIFFERENT NETWORK- ON-CHIP TOPOLOGIES A NEW ROUTER ARCHITECTURE FOR DIFFERENT NETWORK- ON-CHIP TOPOLOGIES 1 Jaya R. Surywanshi, 2 Dr. Dinesh V. Padole 1,2 Department of Electronics Engineering, G. H. Raisoni College of Engineering, Nagpur

More information

On Constructing the Minimum Orthogonal Convex Polygon for the Fault-Tolerant Routing in 2-D Faulty Meshes 1

On Constructing the Minimum Orthogonal Convex Polygon for the Fault-Tolerant Routing in 2-D Faulty Meshes 1 On Constructing the Minimum Orthogonal Convex Polygon for the Fault-Tolerant Routing in 2-D Faulty Meshes 1 Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton,

More information

The final publication is available at

The final publication is available at Document downloaded from: http://hdl.handle.net/10251/82062 This paper must be cited as: Peñaranda Cebrián, R.; Gómez Requena, C.; Gómez Requena, ME.; López Rodríguez, PJ.; Duato Marín, JF. (2016). The

More information

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms Outline Networks: Routing and Design Routing Switch Design Case Studies CS 5, Spring 99 David E. Culler Computer Science Division U.C. Berkeley 3/3/99 CS5 S99 Routing Recall: routing algorithm determines

More information

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011 CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 9 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management Marina Garcia 22 August 2013 OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management M. Garcia, E. Vallejo, R. Beivide, M. Valero and G. Rodríguez Document number OFAR-CM: Efficient Dragonfly

More information

Multiprocessor Interconnection Networks

Multiprocessor Interconnection Networks Multiprocessor Interconnection Networks Todd C. Mowry CS 740 November 19, 1998 Topics Network design space Contention Active messages Networks Design Options: Topology Routing Direct vs. Indirect Physical

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-26 1 Network/Roadway Analogy 3 1.1. Running

More information

EE 6900: Interconnection Networks for HPC Systems Fall 2016

EE 6900: Interconnection Networks for HPC Systems Fall 2016 EE 6900: Interconnection Networks for HPC Systems Fall 2016 Avinash Karanth Kodi School of Electrical Engineering and Computer Science Ohio University Athens, OH 45701 Email: kodi@ohio.edu 1 Acknowledgement:

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation

More information

The Recursive Dual-net and its Applications

The Recursive Dual-net and its Applications The Recursive Dual-net and its Applications Yamin Li 1, Shietung Peng 1, and Wanming Chu 2 1 Department of Computer Science Hosei University Tokyo 184-8584 Japan {yamin, speng}@k.hosei.ac.jp 2 Department

More information

Homework Assignment #1: Topology Kelly Shaw

Homework Assignment #1: Topology Kelly Shaw EE482 Advanced Computer Organization Spring 2001 Professor W. J. Dally Homework Assignment #1: Topology Kelly Shaw As we have not discussed routing or flow control yet, throughout this problem set assume

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

The Odd-Even Turn Model for Adaptive Routing

The Odd-Even Turn Model for Adaptive Routing IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 11, NO. 7, JULY 2000 729 The Odd-Even Turn Model for Adaptive Routing Ge-Ming Chiu, Member, IEEE Computer Society AbstractÐThis paper presents

More information

Understanding the Routing Requirements for FPGA Array Computing Platform. Hayden So EE228a Project Presentation Dec 2 nd, 2003

Understanding the Routing Requirements for FPGA Array Computing Platform. Hayden So EE228a Project Presentation Dec 2 nd, 2003 Understanding the Routing Requirements for FPGA Array Computing Platform Hayden So EE228a Project Presentation Dec 2 nd, 2003 What is FPGA Array Computing? Aka: Reconfigurable Computing Aka: Spatial computing,

More information