A Hybrid Interconnection Network for Integrated Communication Services
Yi-long Chen
Northern Telecom, Inc., Richardson, TX 7583

Jyh-Charn Liu
Department of Computer Science, Texas A&M Univ., College Station, TX

Abstract

This paper presents an interconnection network architecture to support integrated communication services for multicomputer-based database and multimedia systems. Our study shows that existing wormhole routing networks are inefficient in the transfer of long files. We demonstrate the feasibility of integrating different network techniques based on virtual channels and flexible routing mechanisms.

1. Introduction

Parallel computing systems based on high-performance interconnection networks are being used in on-line multimedia and database applications. In a video server, for example, a large number of disks can be connected through an interconnection network to telecommunication ports which are linked to customers. A commercial example of such systems is Oracle's Media Server on the nCUBE supercomputer [1]. Unlike conventional scientific computing applications, such systems are communication and data intensive, and require efficient transmission of messages of heterogeneous types to support integrated service types.

Packet switching, circuit switching, and virtual cut-through are the major communication switching mechanisms for interconnection networks [2, 3]. Packet switching transfers packets in a store-and-forward manner, so the network latency is proportional to the distance between the source and destination nodes. In circuit switching, a path from the source to the destination must first be established, and data need not be stored at the intermediate nodes during the transmission. Virtual cut-through improves on packet switching by not buffering messages that are able to proceed immediately on the next channel.
Among these techniques, wormhole routing, a special case of virtual cut-through, is the most widely adopted in current interconnection networks. In wormhole routing, a message is broken into small flits, and the data flits follow the header flit through the network in a pipelined fashion. Messages can be routed either deterministically or dynamically. A deterministic routing algorithm routes messages along fixed paths which are independent of current network conditions. Adaptive routing algorithms allow alternative paths to be used in message routing, but may need additional resources to preserve deadlock and livelock freedom. A common approach to avoiding deadlock and increasing the sharing of network resources is to use virtual channels [4, 5, 6]. Virtual channels are time-multiplexed over physical channels, with bandwidth allocated to each virtual channel as needed.

Self-routed, wormhole-switched networks are commonly used in number-crunching applications to handle short messages. Such networks, however, may suffer performance degradation in the transfer of long files [3]. Worse yet, when short and long messages share the network, delay-sensitive short messages may suffer from starvation and long messages may not be evenly routed over all available paths. Although adaptive routing schemes can reduce the network delay when most messages are short, the performance of these algorithms in a heterogeneous message environment has not been fully discussed. The circuit-switched mechanism is generally most effective for long messages, but may become quite inefficient in the transfer of short messages due to the overhead of establishing routing paths.

In this paper, we present an interconnection network architecture to support heterogeneous messages for integrated communication services. For simplicity, we assume that a service mainly requires either short or long messages.
Our scheme integrates self-routed wormhole routing and circuit-switched techniques based on virtual channels. We divide virtual channels into short-message and long-message channels to serve messages of different types. The short-message channels are used to support interactive data users and system management functions, while the long-message channels are designed especially for the transfer of bulk files such as video and image files. Each directed physical transmission link can have one or more short-message and long-message channels, which are operated on a time-sharing basis. The short messages are routed deterministically for low overhead, while the routing paths of long messages are determined globally through the exchange of control messages to optimize the distribution of network traffic.

We compare our scheme with a representative deterministic routing scheme, e-cube [7], and an adaptive routing scheme called star-channel [5] in a hypercube network. The e-cube scheme is widely used in commercial systems, and star-channel is shown to have the best performance among the existing adaptive routing schemes. Simulation results indicate that the existing wormhole routing mechanism is not best suited to networks with heterogeneous messages, and that the proposed architecture can effectively and efficiently transmit both short and long messages.

2. The System Architecture

We assume that each node has a local processor, a router for communication, and a dedicated physical communication link in each direction. Each physical link is time-multiplexed between a short-message channel and a long-message channel. Messages shorter than a threshold L are routed based on a deterministic, self-routed mechanism such as e-cube, while messages longer than L are routed based on a pipelined circuit switching mechanism. (The idea of using pipelined circuit switching can also be found in [8] for fault-tolerant routing.) Messages are assumed to be divided into fixed-length flits for transmission, as defined in wormhole routing. A short message may consist of a header flit, additional address flits if necessary, and data flits, and its routing is controlled by the header flit. A long message consists of only data flits but needs one or more control messages (short messages) to establish the routing path. A global channel allocation algorithm, which is implemented by a control message exchange protocol, is used to optimize the routing paths for long messages.
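The split between the two routing modes can be illustrated with a minimal Python sketch. It assumes a hypercube with integer node addresses; the function names, the threshold value used in the usage note, the treatment of a message of exactly threshold length, and the lowest-dimension-first order are illustrative assumptions rather than details taken from the paper.

```python
def channel_for(length_flits, threshold):
    """Pick the channel type for a message of the given length.

    Messages shorter than the threshold use the short-message channel;
    all others use the long-message channel. Treating a message of
    exactly threshold length as long is an assumption of this sketch.
    """
    return "short" if length_flits < threshold else "long"


def ecube_path(src, dst):
    """Nodes visited by dimension-ordered (e-cube) routing in a hypercube.

    Differing address bits are corrected from the lowest dimension
    upward (an assumed order; a decreasing order is equally valid).
    """
    path = []
    node = src
    diff = src ^ dst          # bits that still have to be corrected
    dim = 0
    while diff:
        if diff & 1:
            node ^= 1 << dim  # cross the link on dimension `dim`
            path.append(node)
        diff >>= 1
        dim += 1
    return path
```

With a hypothetical threshold of 50 flits, `channel_for(20, 50)` selects the short-message channel, and a short message from node 000 to node 101 visits `ecube_path(0b000, 0b101)`, that is, node 001 followed by node 101.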
We use the pipelined routing mechanism to illustrate our model, where pipelining of the flits of a message is done asynchronously using low-level handshaking signals [3]. In this architecture, each short-message and long-message channel has its own flit buffer, routing control mechanism, and data paths to the physical links. The bandwidth of a physical link is dynamically shared by its short-message and long-message channels, such that one channel can use the full bandwidth when the other is idle. At each router, time slots are switched between the short-message and long-message transmission types. The sending side multiplexes data from the short-message and long-message buffers over the physical link. Only a channel whose output buffer is not empty and whose input buffer (on the receiving side) is not full may use the physical link. The receiving side is responsible for placing each received message in the corresponding message buffer based on the message type. The decision on which message is transmitted in the current time slot is based on the current transmission type and the buffer states on both the sending and receiving sides.

3. Global Message Routing

In this section, we discuss a global routing scheme that establishes the shortest paths for long messages based on the depth-first search method. To avoid deadlock and livelock of the control messages, a partial ordering relation on nodes has to be defined, which serves as the basis for the routing of control messages and the determination of routing paths [7, 9]. In this relation, some neighbors of a node are called its ancestors and others its descendants, such that the control messages for probing routing paths can only be routed from ancestor to descendant. We define the partial ordering relation based on the notion of broadcast addresses, which are relative addresses with respect to a source-destination pair.
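For a hypercube, one concrete form of this relation can be sketched in Python. The sketch takes the broadcast address of a node to be the XOR of its address with the source address, and treats a node as eligible only if it differs from the source solely in dimensions where source and destination differ. Function and parameter names are hypothetical and chosen for illustration only.

```python
def broadcast_address(b_s, b_x):
    """Relative address of node x with respect to source s (bitwise XOR)."""
    return b_s ^ b_x


def in_relationship(b_s, b_d, b_x):
    """True if x agrees with s on every bit where s and d agree,
    i.e. x lies on some shortest path from s to d."""
    return (b_s ^ b_x) & ~(b_s ^ b_d) == 0


def is_ancestor(b_s, b_d, b_x, b_y):
    """True if x is an ancestor of y with respect to the pair (s, d)."""
    diff = b_x ^ b_y
    # x and y must be adjacent: their addresses differ in exactly one bit.
    if diff == 0 or diff & (diff - 1):
        return False
    if not (in_relationship(b_s, b_d, b_x) and in_relationship(b_s, b_d, b_y)):
        return False
    # Compare the differing bit of the two broadcast addresses:
    # it must be 0 for the ancestor and 1 for the descendant.
    return (broadcast_address(b_s, b_x) & diff) < (broadcast_address(b_s, b_y) & diff)
```

For a source 000 and destination 101, node 000 is an ancestor of node 001 but not of node 010, because dimension 1 does not lie on any shortest path between the two endpoints.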
In an n-dimensional hypercube network, for example, a node $N_x$ is represented by its n-bit binary address $B_x$. Let $N_s$ and $N_d$ denote the source and the destination nodes of a long message, and assume that $B_s$ and $B_d$ differ at k bit positions (dimensions), $\{l_1, l_2, \ldots, l_k\}$. Since a shortest path from $N_s$ to $N_d$ consists of channels only on these dimensions, $N_x$ is in the ancestor-descendant relationship related to $N_s$ and $N_d$ if and only if $B_x$ and $B_s$ (or $B_d$) are the same at all bit positions except for those in $\{l_1, l_2, \ldots, l_k\}$. Let $B^{sd}_x$ denote the broadcast address of $N_x$; then $B^{sd}_x = B_s \oplus B_x$, where $\oplus$ is the XOR operation. Let $N_x$ and $N_y$ be two adjacent nodes whose addresses differ at bit position l. Then $N_x$ is an ancestor of $N_y$ with respect to $N_s$ and $N_d$ if (1) both $N_x$ and $N_y$ are in the ancestor-descendant relationship related to $N_s$ and $N_d$, and (2) the lth bit value of $B^{sd}_x$ is less than that of $B^{sd}_y$.

To send a long message, the source node first initiates a probing control message to establish a routing path. Probing messages are routed based on the ancestor-descendant relation, which can be identified from the source-destination information stored in the messages. When a node receives a probing message from one of its ancestors, it can only send the message to one of its descendants based on the depth-first rule. To keep track of the paths that have been probed, each intermediate node maintains two variables, $d_{in}$ and $d_{out}$, which store the dimensions the probing message is received from and sent to, respectively. $d_{out}$ is updated whenever a probing message is sent out, and the idle long-message channel at the smallest dimension which is greater than $d_{out}$ is chosen as the next channel to route the probing message. If no such long-message channel is available, there is no path available through this node. Backtracking then occurs after the node sends an unsuccessful acknowledging control message along $d_{in}$. This simple depth-first rule guarantees deadlock freedom.

A long-message path is found when the destination receives a probing control message. The destination then sends a success acknowledging message (a short message) back to the source along the path traversed by the probing control message, to indicate that a routing path is found. Each intermediate node allocates the long-message channel after it receives the acknowledging message, and the path is completely allocated when the source receives the acknowledging message. For simplicity, concurrent requests competing for the same long-message channels are assumed to be resolved by an FCFS discipline, and the allocation is aborted if the acknowledging message fails in the competition. That is, if a node finds that the long-message channel requested by a message has already been allocated to another message, it stops relaying the acknowledging message and instead generates an aborting message to abort the channel allocation related to the message. The back-off strategies used in the CSMA/CD protocol may be applicable here. To avoid the starvation problem, messages can be assigned higher priority if they have been blocked for a long time, so that an acknowledging message with lower priority cannot allocate a long-message channel which is also requested by a higher priority message. To avoid excessive contention between these messages, we can restrict the maximum number of pending long messages in a node.

Neither deadlock nor livelock can happen in the proposed scheme, because short messages are routed deterministically and a waiting long message does not hold any channels until all channels on the routing path are allocated. Livelock of control messages cannot happen since dimensions are tried at each intermediate node in a fixed order. Since only one control message is in transmission for each long message at a time, only a small number of control messages are needed if network contention is moderate.

4. Performance Evaluation and Discussion

In this section, we compare the proposed scheme with e-cube routing and the star-channel algorithm through a simulation study. The e-cube is a dimension-ordered deterministic algorithm in which the dimensions that a message needs to correct to reach its destination have to be chosen in an increasing or decreasing order. The star-channel scheme needs four virtual channels per bidirectional link for hypercubes. In this scheme, the two virtual channels in one directed link are assigned to be the star and nonstar channels. The header of a message can use nonstar channels arbitrarily, but can use only the star channel whose dimension is the most significant of the dimensions that the message has to correct. To be fair in comparison, we implement all three schemes using four virtual channels. The classic e-cube implementation uses only two virtual channels, but it has been shown that using an extra pair of virtual channels can greatly increase the throughput of e-cube [4].

We simulate the time-step operations at the unit (flit) level in a 1-dimensional hypercube. The network performance is evaluated by the average communication latency of messages, which is defined as the average elapsed time from when messages are injected into the network at their source nodes until the entire messages reach their destinations. Message latency is measured in terms of link cycles, where during each link cycle a unit of a message can be sent over a unidirectional link. We assume that long and short messages can be dynamically generated at any node, following a Poisson distribution with average generation rates of $\lambda_l$ and $\lambda_s$, respectively. The lengths of these two types of messages are normally distributed with averages of $L_l$ and $L_s$ units, respectively, and control messages are assumed to be 5 units long. We use the offered link utilization, U, to describe the system workload.
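The workload model described above can be sketched in Python as follows. This is only an illustration under stated assumptions: arrivals at a node are modeled as a Poisson process with exponential inter-arrival times, the standard deviation of the message length is a free parameter (the text gives only the means), lengths are rounded to at least one unit, and all names and numeric values are hypothetical.

```python
import random


def generate_arrivals(rate, horizon, mean_len, sd_len, rng):
    """Return (arrival_time, length) pairs for one node and one message type.

    rate     : average number of messages generated per link cycle
    horizon  : simulated duration in link cycles
    mean_len : average message length in units (flits)
    sd_len   : assumed standard deviation of the length
    rng      : a random.Random instance, so runs are reproducible
    """
    arrivals = []
    t = rng.expovariate(rate)              # time of the first arrival
    while t < horizon:
        # Normally distributed length, clamped to at least one unit.
        length = max(1, round(rng.gauss(mean_len, sd_len)))
        arrivals.append((t, length))
        t += rng.expovariate(rate)         # exponential inter-arrival gap
    return arrivals
```

Calling the function once with the long-message parameters and once with the short-message parameters, for every node, yields the two independent message streams that are injected into the simulated network.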
Since the total network traffic is $2^n \lambda_l h_l L_l + 2^n \lambda_s h_s L_s$, where n is the hypercube dimension and $h_l$ and $h_s$ are the average transmission distances of long and short messages, respectively, and the total number of links is $n 2^n$, we compute U as

$U = \dfrac{2^n \lambda_l h_l L_l + 2^n \lambda_s h_s L_s}{n 2^n}$.

Let $\lambda_s = K \lambda_l$; then $\lambda_s$ can be described as

$\lambda_s = \dfrac{K U n 2^n}{2^n h_l L_l + K 2^n h_s L_s}$.

The effect of the message length on network performance is the major concern of this simulation study. Two traffic patterns are used to describe different traffic characteristics. For the random pattern, we choose the uniform distribution, under which each node has an equal probability of becoming the destination of a source. A commonly used nonuniform traffic pattern is the fixed permutation, in which a permutation is defined in advance and applied to generate the destination address based on the source address. We simulate the following permutations:

Complement: source $x_{n-1} x_{n-2} \cdots x_1 x_0$ → destination $\bar{x}_{n-1} \bar{x}_{n-2} \cdots \bar{x}_1 \bar{x}_0$;
Transpose: source $x_{n-1} x_{n-2} \cdots x_1 x_0$ → destination $x_{n/2-1} \cdots x_0 x_{n-1} \cdots x_{n/2}$,

where $\bar{x}_i$ is the complement of $x_i$, and n is assumed to be even.

We first compare the different schemes under various traffic patterns and message length distributions with a buffer size of 2. Figure 1 plots the communication latency versus link utilization under the uniform traffic pattern, where the average lengths of long and short messages are 1 and 2 ($\lambda_s / \lambda_l = 5$). In the proposed scheme, the back-off time is 2 cycles and the maximum number of pending long messages is 1. It is shown that e-cube can only sustain about 2% of link utilization. The first observation on the star-channel routing scheme is that the short message latency increases sharply when the link utilization reaches 35%, while the long message latency remains stable. This is because under light traffic, short messages can effectively detour around long messages due to the adaptive routing capability. However, the possibility of short messages being blocked by long messages increases rapidly as network traffic increases, and the performance of short messages suffers if they are blocked for a long time. The proposed scheme can support effective transmission of short messages even under high network loads. It is noted that the proposed scheme is slightly worse than the star-channel method on long messages when the system is lightly loaded. This is because in the proposed scheme long messages may be affected by the short messages that use the same physical links, due to the deterministic nature of short message routing. When the network traffic is moderate, the interference between long messages becomes the dominant factor in long message performance. The proposed scheme can distribute long messages more evenly over the network and thus has better performance.

The performance of the different routing schemes under the complement traffic pattern is depicted in Figure 2, assuming the same system environment as above. Similar performance trends are observed, in which the network soon saturates for short messages with e-cube and the star-channel routing when the network load is increased. The proposed routing performs stably and better than the others in this case.

For the transpose traffic pattern, all the schemes can only sustain less than 1% link utilization. Our results conform with those obtained in [5] that the star-channel method outperforms the others in this pattern. The proposed scheme performs as poorly as e-cube because it uses e-cube for short messages.
However, the adaptive routing method also fails to reach higher utilization.

The performance effect of the message length is further illustrated in Figure 3, where the average length of long messages is increased to 2 units. The performance of e-cube is not shown in this figure since it saturates even under the 5% load. It can be seen that the performance of the e-cube and the star-channel routing schemes is more sensitive to the lengths of long messages. The proposed scheme also performs better than the others for long messages in this case because it reduces the contention among long messages more effectively.

We also study the performance impact of various system parameters for the proposed scheme. When using different back-off times of up to 1, only a minor performance difference is observed due to the low control message overhead, so a small back-off time is suggested for a lightly or moderately loaded network. It is also noticed that control messages only consume about 0.6% to 1% of the bandwidth used by data messages. The buffer size affects the performance significantly in all the schemes compared. The network performance is upper bounded at 4% of link utilization in the proposed scheme when a single buffer is used. It is also noticed that when the buffer size is larger than the average length of short messages, further increasing the buffer size yields only a minor performance improvement. This might be because a short message can reside completely at a single node, so its residence time depends only on the status of the next node. Since the comparison results of the different schemes under various buffer sizes are consistent with what we have demonstrated above, we omit the results in this article.

Figure 1. Communication latency versus link utilization under uniform traffic pattern.

Figure 2. Communication latency versus link utilization under complement traffic pattern.

Figure 3. Communication latency versus link utilization under the uniform traffic pattern.

5. Conclusion

In this paper, we discussed an interconnection network architecture for communication-intensive applications. We proposed an alternative approach to using virtual channels that is suited to applications with integrated communication service types, and demonstrated the necessity and the feasibility of integrating different network technologies.

References

[1] R. Buck, "The Oracle media server for nCUBE massively parallel systems," Proc. of the 8th Int'l Parallel Processing Symp., April.
[2] S.A. Felperin, L. Gravano, G.D. Pifarre, and J.L.C. Sanz, "Routing techniques for massively parallel communications," Proceedings of the IEEE, vol. 79, April.
[3] L. Ni and P. McKinley, "A survey of wormhole routing techniques in direct networks," IEEE Computer, vol. 26, no. 2, Feb.
[4] P.T. Gaughan and S. Yalamanchili, "Adaptive routing protocols for hypercube interconnection networks," IEEE Computer, vol. 26, no. 5, May.
[5] G. Pifarre, L. Gravano, S. Felperin, and J. Sanz, "Fully adaptive minimal deadlock-free packet routing in hypercubes, meshes, and other networks: Algorithms and simulations," IEEE Trans. on Parallel and Distributed Systems, vol. 5, no. 3.
[6] C. Glass and L. Ni, "The turn model for adaptive routing," Proc. of the 19th Annual Int'l Symposium on Computer Architecture, May.
[7] W.J. Dally and C.L. Seitz, "Deadlock-free message routing in multiprocessor interconnection networks," IEEE Trans. on Computers, vol. 36, May.
[8] P.T. Gaughan and S. Yalamanchili, "A family of fault-tolerant routing protocols for direct multiprocessor networks," IEEE Trans. on Parallel and Distributed Systems, vol. 6, no. 5, July.
[9] Y.-L. Chen and J.-C. Liu, "A fault-tolerant distributed subcube management scheme for hypercube multicomputers," IEEE Trans. on Parallel and Distributed Systems, vol. 6, no. 7, July.
International Symposium on Parallel Architectures, Algorithms and Networks, Manila, Philippines, May 22, pp.165 170 Efficient Communication in Metacube: A New Interconnection Network Yamin Li and Shietung
More informationInterconnection Network
Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network
More informationLecture: Interconnection Networks
Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet
More informationThis chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research
CHAPTER 1 Introduction This chapter provides the background knowledge about Multistage Interconnection Networks. Metrics used for measuring the performance of various multistage interconnection networks
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #8 2/7/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationA New Adaptive Hardware Tree-Based Multicast Routing in K-Ary N-Cubes
IEEE TRANSACTIONS ON COMPUTERS, VOL. 50, NO. 7, JULY 2001 1 A New Adaptive Hardware Tree-Based Multicast Routing in K-Ary N-Cubes Dianne R. Kumar, Member, IEEE, Walid A. Najjar, and Pradip K. Srimani,
More informationA New Theory of Deadlock-Free Adaptive Multicast Routing in. Wormhole Networks. J. Duato. Facultad de Informatica. Universidad Politecnica de Valencia
A New Theory of Deadlock-Free Adaptive Multicast Routing in Wormhole Networks J. Duato Facultad de Informatica Universidad Politecnica de Valencia P.O.B. 22012, 46071 - Valencia, SPAIN E-mail: jduato@aii.upv.es
More informationCombining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?
Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? J. Flich 1,P.López 1, M. P. Malumbres 1, J. Duato 1, and T. Rokicki 2 1 Dpto. Informática
More informationLecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels
Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a
More informationThe Odd-Even Turn Model for Adaptive Routing
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 11, NO. 7, JULY 2000 729 The Odd-Even Turn Model for Adaptive Routing Ge-Ming Chiu, Member, IEEE Computer Society AbstractÐThis paper presents
More informationLecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance
Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,
More informationInterconnection Networks: Flow Control. Prof. Natalie Enright Jerger
Interconnection Networks: Flow Control Prof. Natalie Enright Jerger Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control:
More informationFault-Tolerant Routing Algorithm in Meshes with Solid Faults
Fault-Tolerant Routing Algorithm in Meshes with Solid Faults Jong-Hoon Youn Bella Bose Seungjin Park Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Oregon State University
More informationRouting Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip
Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract
More informationFault-Tolerant and Deadlock-Free Routing in 2-D Meshes Using Rectilinear-Monotone Polygonal Fault Blocks
Fault-Tolerant and Deadlock-Free Routing in -D Meshes Using Rectilinear-Monotone Polygonal Fault Blocks Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton, FL
More informationLecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background
Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation
More informationLecture 3: Flow-Control
High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor
More informationVIII. Communication costs, routing mechanism, mapping techniques, cost-performance tradeoffs. April 6 th, 2009
VIII. Communication costs, routing mechanism, mapping techniques, cost-performance tradeoffs April 6 th, 2009 Message Passing Costs Major overheads in the execution of parallel programs: from communication
More informationInterprocessor Communication. Basics of Network Routing
Interprocessor Communication There are two main differences between sequential computers and parallel computers -- multiple processors and the hardware to connect them together. That hardware is the most
More informationNetwork-on-chip (NOC) Topologies
Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance
More informationInterconnection Networks: Topology. Prof. Natalie Enright Jerger
Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design
More informationINTERCONNECTION NETWORKS LECTURE 4
INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1 Topology Specifies way switches are wired Affects routing, reliability, throughput, latency, building ease Routing How does a message get from source
More informationDLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip
DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip Anh T. Tran and Bevan M. Baas Department of Electrical and Computer Engineering University of California - Davis, USA {anhtr,
More informationThomas Moscibroda Microsoft Research. Onur Mutlu CMU
Thomas Moscibroda Microsoft Research Onur Mutlu CMU CPU+L1 CPU+L1 CPU+L1 CPU+L1 Multi-core Chip Cache -Bank Cache -Bank Cache -Bank Cache -Bank CPU+L1 CPU+L1 CPU+L1 CPU+L1 Accelerator, etc Cache -Bank
More informationDesign of a System-on-Chip Switched Network and its Design Support Λ
Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of
More informationRemoving the Latency Overhead of the ITB Mechanism in COWs with Source Routing Λ
Removing the Latency Overhead of the ITB Mechanism in COWs with Source Routing Λ J. Flich, M. P. Malumbres, P. López and J. Duato Dpto. of Computer Engineering (DISCA) Universidad Politécnica de Valencia
More informationFT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links
FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links Hoda Naghibi Jouybari College of Electrical Engineering, Iran University of Science and Technology, Tehran,
More informationPower and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip
2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip Nasibeh Teimouri
More informationAdaptive Routing. Claudio Brunelli Adaptive Routing Institute of Digital and Computer Systems / TKT-9636
1 Adaptive Routing Adaptive Routing Basics Minimal Adaptive Routing Fully Adaptive Routing Load-Balanced Adaptive Routing Search-Based Routing Case Study: Adapted Routing in the Thinking Machines CM-5
More informationTDT Appendix E Interconnection Networks
TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages
More informationChapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationNetwork on Chip Architecture: An Overview
Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology
More informationOn Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors
On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors Govindan Ravindran Newbridge Networks Corporation Kanata, ON K2K 2E6, Canada gravindr@newbridge.com Michael
More informationCharacteristics of Mult l ip i ro r ce c ssors r
Characteristics of Multiprocessors A multiprocessor system is an interconnection of two or more CPUs with memory and input output equipment. The term processor in multiprocessor can mean either a central
More informationJoint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals
Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University
More informationCAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC
QoS Aware BiNoC Architecture Shih-Hsin Lo, Ying-Cherng Lan, Hsin-Hsien Hsien Yeh, Wen-Chung Tsai, Yu-Hen Hu, and Sao-Jie Chen Ying-Cherng Lan CAD System Lab Graduate Institute of Electronics Engineering
More informationDesign of a router for network-on-chip. Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh
98 Int. J. High Performance Systems Architecture, Vol. 1, No. 2, 27 Design of a router for network-on-chip Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh Department of Electrical Engineering and Computer
More informationTotal-Exchange on Wormhole k-ary n-cubes with Adaptive Routing
Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing Fabrizio Petrini Oxford University Computing Laboratory Wolfson Building, Parks Road Oxford OX1 3QD, England e-mail: fabp@comlab.ox.ac.uk
More informationA MULTI-PATH ROUTING SCHEME FOR TORUS-BASED NOCS 1. Abstract: In Networks-on-Chip (NoC) designs, crosstalk noise has become a serious issue
A MULTI-PATH ROUTING SCHEME FOR TORUS-BASED NOCS 1 Y. Jiao 1, Y. Yang 1, M. Yang 2, and Y. Jiang 2 1 College of Information Technology and Science, Nankai University, China 2 Dept. of Electrical and Computer
More informationFault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections
Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections A.SAI KUMAR MLR Group of Institutions Dundigal,INDIA B.S.PRIYANKA KUMARI CMR IT Medchal,INDIA Abstract Multiple
More informationLecture 7: Flow Control - I
ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 7: Flow Control - I Tushar Krishna Assistant Professor School of Electrical
More informationMulticomputer distributed system LECTURE 8
Multicomputer distributed system LECTURE 8 DR. SAMMAN H. AMEEN 1 Wide area network (WAN); A WAN connects a large number of computers that are spread over large geographic distances. It can span sites in
More informationEECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal
Lecture 19 Interconnects: Flow Control Winter 2018 Subhankar Pal http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,
More informationLecture 16: On-Chip Networks. Topics: Cache networks, NoC basics
Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality
More informationEarly Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks
Technical Report #2012-2-1, Department of Computer Science and Engineering, Texas A&M University Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks Minseon Ahn,
More informationRouting Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)
Routing Algorithm How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Many routing algorithms exist 1) Arithmetic 2) Source-based 3) Table lookup
More informationGeneralized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent
Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent Anjan K. V. Timothy Mark Pinkston José Duato Pyramid Technology Corp. Electrical Engg. - Systems Dept.
More informationSERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS
SERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS 1 SARAVANAN.K, 2 R.M.SURESH 1 Asst.Professor,Department of Information Technology, Velammal Engineering College, Chennai, Tamilnadu,
More informationJUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS
1 JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS Shabnam Badri THESIS WORK 2011 ELECTRONICS JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS
More informationDeadlock-Free Adaptive Routing in Meshes Based on Cost-Effective Deadlock Avoidance Schemes
Deadlock-Free Adaptive Routing in Meshes Based on Cost-Effective Deadlock Avoidance Schemes Dong Xiang Yueli Zhang Yi Pan Jie Wu School of Software Tsinghua Universit Beijing 184, China School of Software
More informationRecall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms
CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252
More informationCommunication in Multicomputers with Nonconvex Faults?
In Proceedings of EUROPAR 95 Communication in Multicomputers with Nonconvex Faults? Suresh Chalasani 1 and Rajendra V. Boppana 2 1 Dept. of ECE, University of Wisconsin-Madison, Madison, WI 53706-1691,
More informationNetworks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin
Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized
More informationA Literature Review of on-chip Network Design using an Agent-based Management Method
A Literature Review of on-chip Network Design using an Agent-based Management Method Mr. Kendaganna Swamy S Dr. Anand Jatti Dr. Uma B V Instrumentation Instrumentation Communication Bangalore, India Bangalore,
More informationDynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution
Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution Nishant Satya Lakshmikanth sailtosatya@gmail.com Krishna Kumaar N.I. nikrishnaa@gmail.com Sudha S
More informationBackup segments. Path after failure recovery. Fault. Primary channel. Initial path D1 D2. Primary channel 1. Backup channel 1.
A Segmented Backup Scheme for Dependable Real Time Communication in Multihop Networks Gummadi P. Krishna M. Jnana Pradeep and C. Siva Ram Murthy Department of Computer Science and Engineering Indian Institute
More informationModule 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth
Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012
More information