Homework Assignment #1: Topology Kelly Shaw
EE482 Advanced Computer Organization
Spring 2001
Professor W. J. Dally
Homework Assignment #1: Topology
Kelly Shaw

As we have not discussed routing or flow control yet, throughout this problem set assume (1) that a network can support throughput up to the level that saturates the bottleneck channel, i.e. speedup = 1, and (2) that traffic between two nodes is uniformly distributed over all minimum-length paths between those two nodes.

Problem 1: Path Diversity

Consider the following topologies, which all have 1 GByte/s channels:

1.) a 2-ary 4-fly
2.) a 2-ary 4-fly extended by one stage at the beginning
3.) a Benes network created from two 2-ary 4-flies

Evaluate each of the following traffic patterns on the 2-ary 4-fly. If the sustainable bandwidth for a traffic pattern on the 2-ary 4-fly is less than 1 GByte/s, you must evaluate the traffic pattern on the extended butterfly. If the sustainable bandwidth on the extended butterfly is still less than 1 GByte/s, you must then evaluate the traffic pattern on the Benes network.

1.) What is the sustainable bandwidth per port of this network on traffic that routes from {b3, b2, b1, b0} to {b0, b3, b2, b1}? This means you'll route to {0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15}.
2.) What is the sustainable bandwidth per port of this network on traffic that routes from {b3, b2, b1, b0} to {b1, b3, b2, b0}?
3.) What is the sustainable bandwidth per port of this network on traffic that routes from {b3, b2, b1, b0} to {b2, b1, b0, b3}? (Note: You may have to calculate channel load in a manner similar to when multiple minimal paths are possible, due to the randomized routing in the extended stages.)

We'll use the following notation for this problem:
1. -> denotes going from an input port to an output port of a switch
2. => denotes the butterfly permutation between the switches of adjacent stages

1.)
{b3, b2, b1, b0} to {b0, b3, b2, b1}

On a 2-ary 4-fly:

The route taken in binary is:

Source: {b3,b2,b1,b0} -> {b3,b2,b1,b0} => {b0,b2,b1,b3} -> {b0,b2,b1,b3} => {b0,b3,b1,b2} -> {b0,b3,b1,b2} => {b0,b3,b2,b1} -> {b0,b3,b2,b1}: Destination

The easiest way to think about routing in a butterfly is to realize that each of the N/k switches per stage is uniquely numbered, using log_k(N/k) base-k digits. In an ordinary butterfly, all of the inputs to a switch have the same log_k(N/k) digits as the digits in the switch's number; the butterfly permutations between switches of adjacent stages guarantee this.
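The address transformations in the route listings can be generated mechanically. The sketch below traces a symbolic packet through the fly; the digit-swap model of the inter-stage permutation is inferred from the route listings in this solution, not given in the assignment:

```python
def trace(dest, n=4):
    """Trace a symbolic source address through a 2-ary n-fly.

    At stage i the switch replaces the last digit with the i-th
    destination digit (output-port selection, '->'); the inter-stage
    permutation ('=>') then swaps digit i with the last digit.
    """
    addr = [f"b{n - 1 - i}" for i in range(n)]       # source: b3,b2,b1,b0
    steps = [list(addr)]
    for i in range(n):
        addr[-1] = dest[i]                           # traverse switch i
        steps.append(list(addr))
        if i < n - 1:
            addr[i], addr[-1] = addr[-1], addr[i]    # butterfly permutation
            steps.append(list(addr))
    return steps

# traffic pattern 1: destination {b0, b3, b2, b1}
for step in trace(["b0", "b3", "b2", "b1"]):
    print("{" + ",".join(step) + "}")
```

Running this for pattern 1 reproduces the address sequence listed above, ending at {b0,b3,b2,b1}.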
If the output port to be taken out of a switch is a function of the digits used in the switch's number, there will be conflicts, because the inputs contend for the same output ports. The number of conflicts depends on the function determining which output port is desired.

In the 2-ary 4-fly, each switch is numbered by 3 binary digits. At the first stage, switches are numbered b3b2b1. Since the first digit of the destination is b0 for this permutation, the two inputs to a switch never conflict; they have different values for b0. The channel load is 1.

Between the first and second stages, the address is permuted to b0b2b1, which is the switch number for the second stage. The inputs to such a switch share these 3 digits but differ in b3 (one input has b3 = 1 and one has b3 = 0, by the construction of the butterfly). Since the inputs differ in b3, which selects the desired output port, there are no conflicts in the second stage, and the channel load is 1.

At stage 3, the switch number is b0b3b1 and the inputs are routed to the output port selected by b2. Again there are no conflicts, so the channel load is 1. Finally, the switch number in the 4th stage is b0b3b2, and the desired output port for each input is selected by b1, so there are no conflicts; the channel load is 1.

The maximum channel load is therefore 1 (at every stage) and the sustainable bandwidth is 1 GByte/s.

2.) {b3, b2, b1, b0} to {b1, b3, b2, b0}

On a 2-ary 4-fly:

The route taken in binary is:

Source: {b3,b2,b1,b0} -> {b3,b2,b1,b1} => {b1,b2,b1,b3} -> {b1,b2,b1,b3} => {b1,b3,b1,b2} -> {b1,b3,b1,b2} => {b1,b3,b2,b1} -> {b1,b3,b2,b0}: Destination

In the first stage, the switch number is b3b2b1. Both inputs to each switch want to take the output port selected by b1, and both inputs have the same value for b1 (otherwise they wouldn't be inputs to the same switch). Therefore there is a conflict, and the channel load is 2.

At the second stage, every other switch has two inputs on each input port. The switch number is b1b2b1 and each pair of inputs wants the output port selected by b3. The inputs from different input ports have different values for b3, although they share the same values for b1 and b2. So there is no conflict between the pairs of inputs, but each pair gives its output channel a load of 2.

Switches at the third stage also have two inputs on each input port. The switch number is b1b3b1 and each pair of inputs wants the output port selected by b2. Because the input pairs have different values for b2, there are no additional conflicts and the channel load remains 2. Finally, the fourth-stage switch has number b1b3b2 and two inputs on a single input port. These inputs want different output ports (selected by b0), so the channel load is 1.

The maximum channel load over all stages is 2, so the sustainable bandwidth is 1 GByte/s / 2 = 0.5 GByte/s.

On an extended butterfly:

The first stage of the extended butterfly is a repeat of the final stage of the 2-ary 4-fly. For every two switches, this new stage exchanges one input from each switch randomly. As a result, each switch in the first stage of the original butterfly has one input with b1 = 1 and one with b1 = 0; the values of b2 and b3 remain as in the original butterfly configuration. (The variance in b1 means the switch numbers are no longer completely synonymous with the first 3 bits of the input addresses; only bits b3 and b2 match the switch number.) A first-stage switch of the original butterfly may now have two inputs with the same b0 value.

Having explained how the network inputs are exchanged in the new stage, we now examine the first stage of the original butterfly.
At the first stage of the original butterfly, the inputs want to leave on the output port selected by b1. Because of the rearrangement performed by the new stage, each switch has exactly one input with b1 = 1 and one with b1 = 0, so there is no conflict; the channel load is 1.

At the second stage of the original butterfly, the inputs have the same values for b1 and b2, and the desired output is selected by b3. The added stage did not alter the usual butterfly pattern for b3 or b2, so the two inputs to each switch have different b3 values. Thus there is no conflict and the channel load is 1. At the third stage of the original butterfly, the inputs share the same values for b1 and b3, and the desired output port is selected by b2. Again there is no contention for the output port, so the channel load is 1. At the fourth stage, the switch number is b1b3b2 and the output port is selected by b0. There is no conflict and the channel load is 1.

The maximum channel load is 1 and the sustainable bandwidth is 1 GByte/s.

3.) {b3, b2, b1, b0} to {b2, b1, b0, b3}

On a 2-ary 4-fly:

The route taken in binary is:

Source: {b3,b2,b1,b0} -> {b3,b2,b1,b2} => {b2,b2,b1,b3} -> {b2,b2,b1,b1} => {b2,b1,b1,b2} -> {b2,b1,b1,b0} => {b2,b1,b0,b1} -> {b2,b1,b0,b3}: Destination

In the first stage of the butterfly, each switch is numbered b3b2b1 and the desired output for each input is selected by b2. Because b2 is part of the switch number, both inputs to a switch have the same value for b2. This is a conflict, and the channel load is 2.

Four switches in the next stage have two inputs on each input port. The second-stage switch is numbered b2b2b1 and the desired output port for both pairs of inputs is selected by b1. Because b1 is also part of the switch number, both pairs of inputs want to exit the switch on the same output port. This results in a channel load of 4.

At the third stage, 4 inputs enter one input port and there are no inputs on the other port (those are all overloading the input port of a different switch). The switch at this stage is numbered b2b1b1. The key observation is that the 4 inputs to this switch have the same values for b2 and b1, so they must vary in digits b3 and b0: two inputs have b3 = 0 and two have b3 = 1 (because the inputs come from different halves of the butterfly), and two have b0 = 0 and two have b0 = 1 (because the pairs arose from inputs to first-stage switches, which differ in b0).
The two inputs with b0 = 0 therefore contend for one output port, and the two inputs with b0 = 1 contend for the other. Thus the channel load is 2. At stage four, the switch is numbered b2b1b0 and the desired output port is selected by b3. The switch has a pair of inputs on one input port, and these inputs have different values for b3, so there is no contention.

The maximum channel load is 4, so the sustainable bandwidth is 1 GByte/s / 4 = 0.25 GByte/s.

On an extended butterfly:

As stated in part 2, the new stage randomly routes inputs from one pair of switches in one stage to a pair of switches in the next stage. The switches at the first stage of the original butterfly are still identified by b3b2, but the third digit is no longer the same for both inputs: each switch has inputs with the same values for b3 and b2, different values for b1, and possibly the same or different values for b0.

In this traffic pattern, the inputs at the first stage of the original butterfly want to exit on the output port selected by b2, and each switch has two inputs with the same value for b2. The channel load is therefore 2.

Four switches in the second stage of the original butterfly have two inputs on each input port, all wanting to leave on the output port selected by b1. One input of each input-port pair has b1 = 0 and the other has b1 = 1, because of the randomization done by the new stage. So two inputs want to leave on the 0 output port and two on the 1 output port, and the channel load is 2.

At the third stage of the original butterfly, one input port at each switch has two inputs, which want to use the output port selected by b0. At the added stage we exchanged inputs randomly, so each switch routed traffic with b0 = 0 to the top switch of the next stage 50% of the time and to the bottom switch 50% of the time. There is therefore a 50% probability that the two inputs have the same value for b0 and a 50% probability that they differ. A given channel out of the switch thus has a load of 0 with probability 0.25, a load of 1 with probability 0.5, and a load of 2 with probability 0.25. The expected channel load is therefore 0.25*0 + 0.5*1 + 0.25*2 = 1.

Each final-stage switch has two inputs, on different input ports, that want to exit on the output port selected by b3. The two inputs at each final-stage switch have different b3 values, so the channel load is 1.
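The expected-load arithmetic for the randomized stage can be checked by enumerating the equally likely b0 assignments of the two inputs. A small sketch, taking the b0 = 0 output channel as representative:

```python
from itertools import product
from fractions import Fraction

# Each of the two inputs carries b0 chosen independently and uniformly
# at random. The load on the b0 = 0 output channel is the number of
# inputs whose b0 is 0.
loads = [sum(1 for b in pair if b == 0) for pair in product([0, 1], repeat=2)]

prob = {load: Fraction(loads.count(load), len(loads)) for load in set(loads)}
expected = sum(load * p for load, p in prob.items())

print(prob)       # load 0 and 2 each with probability 1/4, load 1 with 1/2
print(expected)   # 1
```

The enumeration reproduces the 0.25 / 0.5 / 0.25 distribution and the expected channel load of 1.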
The maximum channel load is 2, so the sustainable bandwidth is 1 GByte/s / 2 = 0.5 GByte/s.

In the Benes network:

Each new stage added to create a Benes network routes traffic randomly; at each new stage, one bit is randomized. The first new stage of the Benes network delivers two inputs to each switch of the next stage with identical values for b3 and b2, different values for b1, and a value for b0 that is 0 half of the time and 1 the other half. The second new stage delivers two inputs with identical values for b3, different values for b2, and values for b0 and b1 that are each 0 half of the time and 1 the other half. The third new stage delivers two inputs with different values for b3 and values for b0, b1, and b2 that are each 0 half of the time and 1 the other half.

Thus, at the first stage of the original butterfly, the two inputs have different b3 values, and values for b0, b1, and b2 that are each 0 half of the time and 1 the other half. Suppose you determine the channel load when b0, b1, or b2 selects a given output channel. There is a 0.25 probability that neither input is routed to the channel, a 0.25 probability that both inputs are, and a 0.5 probability that exactly one input is. The expected channel load is therefore 0.25*0 + 0.25*2 + 0.5*1 = 1.

On the traffic pattern b2b1b0b3, the channel load at each original butterfly stage is thus 1 when b2, b1, or b0 selects the output port, because of the randomization introduced by the additional stages. At the last stage, the two inputs at a switch have different values for b3, and the channel load is 1. Therefore the maximum channel load is 1 and the sustainable bandwidth is 1 GByte/s.

Problem 2: k vs. n Comparison

You need to create a k-ary n-cube with 256 nodes given the following constraints. At most 384 signals can go off of a chip. Similarly, 512 signals may go off of a board. Finally, you are constrained by the backplane: 6000 signals may cross the midsection of the backplane.

a.) Determine k and n.
b.) Explain how you would package this network on chips and boards.
c.) Using the same number of chips and boards, what is the largest square crossbar you can construct?
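One way to attack part (a) is to enumerate the candidate (k, n) pairs with k^n = 256 and size the channel width w against each packaging constraint. The sketch below assumes bidirectional channels, one node per chip and one node per board, uniform random traffic on a torus, and a 512-bit packet; the sizing formulas are textbook k-ary n-cube expressions, not anything given in the assignment:

```python
N, L = 256, 512                      # nodes, packet length in bits
Ws, Wn, Wb = 6000, 384, 512          # bisection, per-chip, per-board signal limits

def design(k, n):
    """Channel width, bisection signals, and latency for a k-ary n-cube."""
    w_chip = Wn // (4 * n)           # 2n bidirectional ports, in + out signals
    w_board = Wb // (4 * n)          # one node per board, same form
    w_bis = Ws * k // (4 * N)        # 4*N/k unidirectional channels cross the midsection
    w = min(w_chip, w_board, w_bis)
    Bs = (4 * N // k) * w            # signals crossing the bisection
    H = n * k / 4                    # average hop count, uniform traffic on a torus
    T = H + L / w                    # hops plus serialization, in cycles
    return w, Bs, round(T)

for n, k in [(1, 256), (2, 16), (4, 4), (8, 2)]:
    w, Bs, T = design(k, n)
    print(f"n={n} k={k}: w={w} bisection={Bs} latency={T}")
```

With these formulas, n = 4, k = 4 gives w = 23 and a bisection of 5888 signals, while n = 2, k = 16 gives the lowest total latency, matching the trade-off discussed in the solution.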
N = 256 nodes
Ws = 6000 (wire bisection)
Wn = 384 signals per chip
Wb = 512 signals per board
L = 512-bit packet length (assumed)

[The solution's comparison table, listing n, k, the channel widths w-chip, w-board, w-s, and w, the bisection bandwidth Bs, the hop count H, the serialization latency Ts, the total latency T = H + Ts, and the nodes per board for each candidate design, did not survive transcription.]

If bandwidth is the most important criterion, the optimum choice is n = 4, k = 4, packaged 1 node to a board. This gives a bisection bandwidth of 5888 (nearly the 6000-signal limit) and a latency of 26 cycles. If minimum latency is more important, the n = 2, k = 16 design offers a latency of 19 cycles, but at the penalty of reducing bandwidth by roughly half.

The network is packaged one node per chip and one chip per board. Placing more than one chip per board would reduce the width of the channels and hence the bandwidth of the network.

A square crossbar with 256 chips can be constructed as a 16 x 16 array of chips. Each chip implements a 192 x 192-port crossbar, for a 3072 x 3072-port, 1-bit-wide crossbar. Of course, if you wanted to implement a network with w = 23-bit-wide channels, like the 4-ary 4-cube designed in part (a), then each chip could only hold an 8 x 8 crossbar, and the 16 x 16 array would realize a 128 x 128 crossbar (not counting chips for fanout and fanin). Note that this 128 x 128, w = 23-bit crossbar has only about half the bisection bandwidth (128 x 23 = 2944) of the 4-ary 4-cube.

Problem 3: Concentration and Sharing (note: 7 questions)

You are given the task of creating a 16-ary 2-cube network with 1 GByte/s channels. Each port has an average bandwidth of 225 MByte/s. When a node presents a request to the network, it does so at a peak bandwidth of 1 GByte/s for 200 ns.

a.) Estimate the cost of this network in terms of total pin bandwidth of the routers.
b.) You decide to use concentrators. What degree of concentration should you use?
c.) What are the average and peak bandwidths of the concentrated node?
d.) What will be the topology of your new network?
e.) Estimate the cost of your new network with concentrators, still assuming 1 GByte/s channels.
f.) What is the serialization latency for an access, still assuming 1 GByte/s channels?
g.) If you size the network to handle average bandwidth, what is the serialization latency?

Soln: This solution assumes bidirectional channels.
a.) Each node has 4*2 = 8 channels. There are 256 nodes, and the bandwidth of each channel is 1 GByte/s. The total pin bandwidth is therefore (8 channels/node) * (256 nodes) * (1 GByte/s per channel) = 2048 GByte/s, or 2 TByte/s.

b.) The channels have a bandwidth of 1 GByte/s. Given that the average bandwidth of a port is 225 MByte/s, you can use a concentration of 4. This actually decreases the channel load from 2 to 1, because it cuts the radix of the network from 16 to 8.

c.) The average bandwidth of the concentrated node is 4 * 225 MByte/s = 900 MByte/s, and the peak bandwidth is 4 GByte/s.

d.) We are combining every four nodes, by combining every two nodes in each dimension. Each ring in the network then consists of 8 concentrated nodes, so the final network is an 8-ary 2-cube.

e.) The total number of concentrated nodes is 8^2 = 64. Since each concentrated node still has 8 channels, the total pin bandwidth is 8 * 64 * 1 GByte/s = 512 GByte/s.

f.) The serialization latency of the network is 200 ns with 1 GByte/s channels.

g.) By sizing channels for the average bandwidth of 900 MByte/s, we've reduced the channel bandwidth by a factor of 1.11. The serialization latency therefore increases by the same factor: 200 ns * 1.11 = 222 ns.
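The arithmetic above can be collected into a short script. This is a sketch of this solution's numbers; the variable names are my own:

```python
nodes = 16 * 16                    # 16-ary 2-cube
channels_per_node = 4 * 2          # 2 dimensions x 2 directions, bidirectional
ch_bw = 1.0                        # GByte/s per channel
avg_port_bw = 0.225                # GByte/s average per port
burst = 200e-9                     # seconds at peak bandwidth per request

# a.) total pin bandwidth of the routers, in GByte/s
pin_bw = channels_per_node * nodes * ch_bw            # 2048

# b.)/c.) degree of concentration and concentrated-node bandwidths
conc = int(ch_bw // avg_port_bw)                      # 4
avg_bw, peak_bw = conc * avg_port_bw, conc * ch_bw    # 0.9 and 4.0 GByte/s

# e.) cost of the concentrated 8-ary 2-cube
pin_bw_conc = channels_per_node * (nodes // conc) * ch_bw   # 512

# f.)/g.) serialization latency of one 200 ns burst at 1 GByte/s
access_bytes = ch_bw * 1e9 * burst                    # 200 bytes per request
t_peak = access_bytes / (ch_bw * 1e9)                 # 200 ns
t_avg = access_bytes / (avg_bw * 1e9)                 # ~222 ns

print(pin_bw, conc, pin_bw_conc, t_avg)
```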
More informationInterface The exit interface a packet will take when destined for a specific network.
The Network Layer The Network layer (also called layer 3) manages device addressing, tracks the location of devices on the network, and determines the best way to move data, which means that the Network
More informationChapter 9 Multiprocessors
ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University
More informationRACEway Interlink Modules
RACE++ Series RACEway Interlink Modules 66-MHz RACE++ Switched Interconnect Adaptive Routing More than 2.1 GB/s Bisection Bandwidth Deterministically Low Latency Low Power, Field-Tested Design Flexible
More informationEE 4683/5683: COMPUTER ARCHITECTURE
3/3/205 EE 4683/5683: COMPUTER ARCHITECTURE Lecture 8: Interconnection Networks Avinash Kodi, kodi@ohio.edu Agenda 2 Interconnection Networks Performance Metrics Topology 3/3/205 IN Performance Metrics
More informationPacket Switch Architecture
Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.
More informationPacket Switch Architecture
Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.
More informationEE/CSCI 451 Spring 2018 Homework 2 Assigned: February 7, 2018 Due: February 14, 2018, before 11:59 pm Total Points: 100
EE/CSCI 45 Spring 08 Homework Assigned: February 7, 08 Due: February 4, 08, before :59 pm Total Points: 00 [0 points] Explain the following terms:. Diameter of a network. Bisection width of a network.
More informationINTERCONNECTION NETWORKS LECTURE 4
INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1 Topology Specifies way switches are wired Affects routing, reliability, throughput, latency, building ease Routing How does a message get from source
More informationInterconnection Networks: Routing. Prof. Natalie Enright Jerger
Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly
More informationInterconnection Networks
Lecture 15: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Credit: some slides created by Michael Papamichael, others based on slides from Onur Mutlu
More informationLecture 16: On-Chip Networks. Topics: Cache networks, NoC basics
Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality
More informationNetworks, Routers and Transputers:
This is Chapter 1 from the second edition of : Networks, Routers and Transputers: Function, Performance and applications Edited M.D. by: May, P.W. Thompson, and P.H. Welch INMOS Limited 1993 This edition
More informationInterconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.
Interconnection topologies (cont.) [ 10.4.4] In meshes and hypercubes, the average distance increases with the dth root of N. In a tree, the average distance grows only logarithmically. A simple tree structure,
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationLecture 7: Flow Control - I
ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 7: Flow Control - I Tushar Krishna Assistant Professor School of Electrical
More informationPhastlane: A Rapid Transit Optical Routing Network
Phastlane: A Rapid Transit Optical Routing Network Mark Cianchetti, Joseph Kerekes, and David Albonesi Computer Systems Laboratory Cornell University The Interconnect Bottleneck Future processors: tens
More informationPOLYMORPHIC ON-CHIP NETWORKS
POLYMORPHIC ON-CHIP NETWORKS Martha Mercaldi Kim, John D. Davis*, Mark Oskin, Todd Austin** University of Washington *Microsoft Research, Silicon Valley ** University of Michigan On-Chip Network Selection
More informationHybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University
Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects
More informationNetwork-on-Chip Micro-Benchmarks
Network-on-Chip Micro-Benchmarks Zhonghai Lu *, Axel Jantsch *, Erno Salminen and Cristian Grecu * Royal Institute of Technology, Sweden Tampere University of Technology, Finland Abstract University of
More informationTCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks
TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks Gwangsun Kim Arm Research Hayoung Choi, John Kim KAIST High-radix Networks Dragonfly network in Cray XC30 system 1D Flattened butterfly
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD
More informationCS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 7 th, Review: VLIW: Very Large Instruction Word
CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 7 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252
More informationAPPLICATION OF GRAPH THEORY IN COMMUNICATION NETWORKS
APPLICATION OF GRAPH THEORY IN COMMUNICATION NETWORKS Suman Deswal 1 and Anita Singhrova 2 1,2 Deenbandhu Chottu Ram University of Sc. & Tech., Murthal, Sonipat. Haryana, India. ABSTRACT The use of mathematics
More informationCommunication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD
More informationINTERCONNECTION networks are used in a variety of applications,
1 Randomized Throughput-Optimal Oblivious Routing for Torus Networs Rohit Sunam Ramanujam, Student Member, IEEE, and Bill Lin, Member, IEEE Abstract In this paper, we study the problem of optimal oblivious
More informationMIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer
MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware
More informationLecture 22: Router Design
Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip
More informationLecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background
Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation
More informationData Communication. Introduction of Communication. Data Communication. Elements of Data Communication (Communication Model)
Data Communication Introduction of Communication The need to communicate is part of man s inherent being. Since the beginning of time the human race has communicated using different techniques and methods.
More informationFDDI-M: A SCHEME TO DOUBLE FDDI S ABILITY OF SUPPORTING SYNCHRONOUS TRAFFIC
FDDI-M: A SCHEME TO DOUBLE FDDI S ABILITY OF SUPPORTING SYNCHRONOUS TRAFFIC Kang G. Shin Real-time Computing Laboratory EECS Department The University of Michigan Ann Arbor, Michigan 48109 &in Zheng Mitsubishi
More information[ ] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering.
Switch Design [ 10.3.2] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering. Here is a basic diagram of a switch. Receiver
More informationUnderlying Technologies -Continued-
S465 omputer Networks Spring 2004 hapter 3 (Part B) Underlying Technologies -ontinued- Dr. J. Harrison These slides were produced from material by Behrouz Forouzan for the text TP/IP Protocol Suite (2
More informationInterconnect Technology and Computational Speed
Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented
More informationCSC 220: Computer Organization Unit 12 CPU programming
College of Computer and Information Sciences Department of Computer Science CSC 220: Computer Organization Unit 12 CPU programming 1 Instruction set architectures Last time we built a simple, but complete,
More information18-740/640 Computer Architecture Lecture 16: Interconnection Networks. Prof. Onur Mutlu Carnegie Mellon University Fall 2015, 11/4/2015
18-740/640 Computer Architecture Lecture 16: Interconnection Networks Prof. Onur Mutlu Carnegie Mellon University Fall 2015, 11/4/2015 Required Readings Required Reading Assignment: Dubois, Annavaram,
More informationEE/CSCI 451 Midterm 1
EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming
More informationParallel Computing Interconnection Networks
Parallel Computing Interconnection Networks Readings: Hager s book (4.5) Pacheco s book (chapter 2.3.3) http://pages.cs.wisc.edu/~tvrdik/5/html/section5.html#aaaaatre e-based topologies Slides credit:
More informationGIAN Course on Distributed Network Algorithms. Network Topologies and Local Routing
GIAN Course on Distributed Network Algorithms Network Topologies and Local Routing Stefan Schmid @ T-Labs, 2011 GIAN Course on Distributed Network Algorithms Network Topologies and Local Routing If you
More informationInitial studies of SCI LAN topologies for local area clustering
Prepared for the First International Workshop on SCI-Based Low-Cost/High-Performance Computing, Santa Clara University Initial studies of SCI LAN topologies for local area clustering Haakon Bryhni * and
More informationMultiprocessor Interconnection Networks- Part Three
Babylon University College of Information Technology Software Department Multiprocessor Interconnection Networks- Part Three By The k-ary n-cube Networks The k-ary n-cube network is a radix k cube with
More informationNOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms
Outline Networks: Routing and Design Routing Switch Design Case Studies CS 5, Spring 99 David E. Culler Computer Science Division U.C. Berkeley 3/3/99 CS5 S99 Routing Recall: routing algorithm determines
More information