Interconnection Network
Jinkyu Jeong (jinkyu@skku.edu)
Computer Systems Laboratory, Sungkyunkwan University
http://csl.skku.edu
Topics
- Taxonomy
- Metrics
- Topologies
- Characteristics
  - Cost
  - Performance
Interconnection Networks
- Carry data between processors and between processors and memory
- Components
  - Switches
  - Links (wires, fiber)
- Classifications
  - Static networks (a.k.a. direct networks)
    - Point-to-point communication links among processing nodes
  - Dynamic networks (a.k.a. indirect networks)
    - Built from switches and communication links
Static and Dynamic Networks
[Figure: a static (direct) network and a dynamic (indirect) network]
Dynamic Network Switch
- Maps a fixed number of inputs to outputs
- Degree of the switch: the number of ports
- Switch cost
  - Grows as the square of the switch degree
  - Packaging cost grows linearly with the number of pins
Network Interfaces
- Link processors (or nodes) to the interconnect
- Functions
  - Packetizing communication data
  - Computing routing information
  - Buffering incoming/outgoing data
- Network interface connection
  - I/O bus: Peripheral Component Interconnect Express (PCIe)
  - Memory bus: e.g., AMD HyperTransport, Intel QuickPath
- Network performance
  - Depends on the relative speeds of the I/O and memory buses
Example: Intel QuickPath Interconnect
Network Topologies
- A variety of network topologies exist
- Topologies trade off performance for cost
- Commercial machines often implement hybrids of multiple topologies
  - Due to packaging, cost, and available components
Metrics for Interconnection Networks
- Degree
  - Number of links per node
- Diameter
  - Longest shortest-path distance between any two nodes in the network
  - Determines the worst-case communication latency
- Bisection width
  - Minimum number of links that must be cut to divide the network into two equal halves
- Cost
  - Number of links and switches
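The metrics above can be checked on a small network. The following is a minimal sketch, assuming the network is connected and given as a dictionary of adjacency sets; the helper names (`ring`, `degree`, `diameter`, `bisection_width`) are illustrative and not part of the original slides. The bisection width is found by brute force, so it is only practical for a handful of nodes.

```python
from collections import deque
from itertools import combinations

def ring(p):
    """Adjacency sets for a ring of p nodes (hypothetical helper)."""
    return {i: {(i - 1) % p, (i + 1) % p} for i in range(p)}

def degree(adj):
    """Maximum number of links attached to any node."""
    return max(len(nbrs) for nbrs in adj.values())

def diameter(adj):
    """Longest shortest-path distance, found by BFS from every node."""
    longest = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        longest = max(longest, max(dist.values()))
    return longest

def bisection_width(adj):
    """Fewest links crossing any split into two equal halves (brute force)."""
    nodes = sorted(adj)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        half = set(half)
        cut = sum(1 for u in half for v in adj[u] if v not in half)
        best = cut if best is None else min(best, cut)
    return best

net = ring(8)
print(degree(net), diameter(net), bisection_width(net))  # prints: 2 4 2
```

For the 8-node ring, every node has two links, the farthest node is four hops away, and any split into two halves of four nodes cuts at least two links.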
Network Topologies: Buses
- All processors access a common bus for exchanging data
- Used in the simplest and earliest parallel machines
  - E.g., Sun Enterprise servers, Intel Pentium
- Advantages
  - Distance between any two nodes is O(1)
  - Provides a convenient broadcast medium
- Disadvantage
  - Bus bandwidth is a performance bottleneck
[Figure: processors (P) attached to a shared bus]
Network Topologies: Buses
[Figure: bus-based interconnects without local caches and with local caches]
[Figure: interconnects in the Intel Pentium Pro Quad. P-Pro modules (CPU, interrupt controller, 256-KB L2 cache, bus interface) share the P-Pro bus (64-bit data, 36-bit address, 66 MHz), which also connects PCI bridges (PCI buses, PCI I/O cards) and a memory controller with MIU to 1-, 2-, or 4-way interleaved DRAM]
Network Topologies: Crossbars
- A crossbar network uses a p × m grid of switches to connect p inputs to m outputs in a non-blocking manner
Network Topologies: Crossbars
- Cost: O(p^2) for p processors (and memory banks)
- Difficult to scale to large values of p
- E.g., Sun Ultra HPC 10000 and Fujitsu VPP500
Multistage Networks
- Buses
  - Excellent cost scalability
  - Poor performance scalability
- Crossbars
  - Excellent performance scalability
  - Poor cost scalability
- Multistage interconnects
  - A compromise between these extremes
Multistage Networks
[Figure: schematic of a typical multistage interconnection network]
Multistage Omega Network
- Organization
  - log p stages
  - p inputs and p outputs
- At each stage, input i is connected to output j if:
  - j = 2i, for 0 <= i <= p/2 - 1
  - j = 2i + 1 - p, for p/2 <= i <= p - 1
Multistage Omega Network
- Each stage of the Omega network implements a perfect shuffle
[Figure: a perfect shuffle interconnection for eight inputs and outputs]
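A perfect shuffle is equivalent to rotating the binary label of each input left by one bit. The sketch below assumes p is a power of two; the function name `perfect_shuffle` is illustrative and does not come from the slides.

```python
def perfect_shuffle(i, p):
    """Output that input i is wired to in a perfect shuffle of p lines.

    Assumes p is a power of two. The log2(p)-bit label of i is rotated
    left by one position.
    """
    n = p.bit_length() - 1          # number of address bits, log2(p)
    msb = (i >> (n - 1)) & 1        # bit that wraps around to the low end
    return ((i << 1) & (p - 1)) | msb

print([perfect_shuffle(i, 8) for i in range(8)])
# prints: [0, 2, 4, 6, 1, 3, 5, 7]
```

For eight lines, the first half of the inputs spreads over the even outputs and the second half over the odd outputs, which is the interleaving a card shuffle produces.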
Multistage Omega Network
- The perfect shuffle patterns are connected using 2 × 2 switches
- The switches operate in two modes
  - Pass-through
  - Cross-over
[Figure: a 2 × 2 switch in pass-through mode and in cross-over mode]
Multistage Omega Network
[Figure: a complete Omega network connecting eight inputs and eight outputs]
- Cost: (p/2) log p switches → O(p log p)
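To get a feel for the gap between O(p^2) and O(p log p), the sketch below tabulates both counts for a few sizes. It assumes p is a power of two and that a p × p crossbar needs p^2 switching elements; note that the two counts are in different units (crossbar crosspoints versus 2 × 2 switches), so this compares growth rates rather than exact hardware cost. The function names are illustrative.

```python
def crossbar_switches(p):
    """Switching elements in a p x p crossbar."""
    return p * p

def omega_switches(p):
    """2 x 2 switches in an Omega network: p/2 per stage, log2(p) stages."""
    stages = p.bit_length() - 1     # log2(p) for a power of two
    return (p // 2) * stages

for p in (8, 64, 1024):
    print(p, crossbar_switches(p), omega_switches(p))
# prints:
# 8 64 12
# 64 4096 192
# 1024 1048576 5120
```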
Multistage Omega Network Routing
- s is the source processor in binary representation
- d is the destination processor in binary representation
- In each stage
  - If the most significant bits of s and d are the same: pass-through
  - Otherwise: cross-over
  - Strip the most significant bits of s and d
- Repeat for each of the log p switching stages
Multistage Omega Network Routing
- Example: 001 → 100
  1. Stage 1: 0 != 1 → cross-over
  2. Stage 2: 0 == 0 → pass-through
  3. Stage 3: 1 != 0 → cross-over
[Figure: the route from 001 to 100 with each switch setting labeled]
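The routing rule can be written as a short loop over the address bits, most significant first. This is a minimal sketch; the function name `omega_route` and its parameter `n` (the number of stages, log p) are illustrative.

```python
def omega_route(s, d, n):
    """Switch settings for routing source s to destination d.

    n is the number of stages (log2 of the number of inputs). Stage k
    compares the k-th most significant bits of s and d.
    """
    settings = []
    for stage in range(n):
        bit = n - 1 - stage                       # current most significant bit
        same = ((s >> bit) & 1) == ((d >> bit) & 1)
        settings.append("pass-through" if same else "cross-over")
    return settings

print(omega_route(0b001, 0b100, 3))
# prints: ['cross-over', 'pass-through', 'cross-over']
```

Indexing the bit directly has the same effect as stripping the most significant bit after each stage.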
Blocking in Omega Network
- Both messages need link AB at the same time, so one of them (010 to 111 or 110 to 100) is blocked
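Blocking can be detected by tracing which line each message occupies after every stage. The sketch below assumes the usual Omega construction (a perfect shuffle followed by a 2 × 2 switch in each stage) and numbers the lines 0 to p - 1; the names `omega_links` and `conflicts` are illustrative. The shared line it reports is presumably the one the slide labels AB, but the figure labels are not recoverable from the text.

```python
def omega_links(s, d, n):
    """Line occupied after each stage when routing s to d.

    Each stage is a perfect shuffle (one-bit left rotation of the n-bit
    line number) followed by a 2 x 2 switch that sets the lowest bit to
    the destination bit for that stage.
    """
    mask = (1 << n) - 1
    line = s
    used = []
    for stage in range(n):
        line = ((line << 1) & mask) | (line >> (n - 1))   # perfect shuffle
        d_bit = (d >> (n - 1 - stage)) & 1
        line = (line & ~1) | d_bit                        # switch output
        used.append((stage, line))
    return used

def conflicts(msg_a, msg_b, n):
    """(stage, line) pairs that both messages would need."""
    return sorted(set(omega_links(*msg_a, n)) & set(omega_links(*msg_b, n)))

print(conflicts((0b010, 0b111), (0b110, 0b100), 3))
# prints: [(0, 5)]
```

Both messages need output line 5 (binary 101) of the first stage, so only one of them can proceed.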
Completely Connected Network
- Each processor is connected to every other processor
- Cost: the number of links is O(p^2)
- Performance scales very well
- Hardware complexity makes it unrealizable for large values of p
- Static counterpart of crossbars
Star Connected Network
- Every node is connected only to a common node at the center
- Distance between any pair of nodes is O(1)
- But the central node becomes a bottleneck
- Static counterpart of buses
Linear Array & Ring
- Linear array
  - Each node has two neighbors, one to its left and one to its right (the two end nodes have only one)
- Ring (or 1-D torus)
  - A linear array whose two end nodes are connected by a wrap-around link
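The effect of the wrap-around link shows up in the hop count between two nodes. A minimal sketch, assuming nodes are numbered 0 to p - 1 along the array; the function names are illustrative.

```python
def linear_array_distance(a, b):
    """Hops between nodes a and b in a linear array."""
    return abs(a - b)

def ring_distance(a, b, p):
    """Hops between nodes a and b in a ring of p nodes."""
    direct = abs(a - b)
    return min(direct, p - direct)   # go around whichever way is shorter

print(linear_array_distance(0, 7), ring_distance(0, 7, 8))
# prints: 7 1
```

With eight nodes, the two ends are seven hops apart in the linear array but adjacent in the ring.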
Meshes and k-dimensional Meshes
- Mesh
  - Generalization of the linear array to 2D
  - 4 neighbors (north, south, east, and west)
- k-dimensional mesh
  - 2k neighbors
[Figure: 2D mesh, 2D torus, 3D mesh]
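The neighbor counts can be reproduced by stepping one position in each direction along each axis. The sketch below assumes nodes are addressed by coordinate tuples and that every dimension has more than two nodes; `mesh_neighbors` and its `wraparound` flag are illustrative names.

```python
def mesh_neighbors(coord, dims, wraparound=False):
    """Neighbors of a node in a k-dimensional mesh.

    coord and dims are tuples of equal length k. With wraparound=True the
    mesh is treated as a torus.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            pos = coord[axis] + step
            if wraparound:
                pos %= size
            elif not 0 <= pos < size:
                continue          # edge of the mesh: no link in this direction
            neighbors.append(coord[:axis] + (pos,) + coord[axis + 1:])
    return neighbors

print(len(mesh_neighbors((1, 1), (4, 4))))          # interior node: 4
print(len(mesh_neighbors((0, 0), (4, 4))))          # corner node: 2
print(len(mesh_neighbors((0, 0), (4, 4), True)))    # torus: 4
print(len(mesh_neighbors((1, 1, 1), (3, 3, 3))))    # 3D interior node: 6
```

The 2k figure holds for interior nodes of a mesh and for every node of a torus; nodes on the boundary of a mesh without wrap-around have fewer links.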
Hypercubes
[Figure: hypercubes of dimension 0, 1, 2, 3, and 4]
Hypercubes
- Distance between any two nodes is at most log p
- Each node has log p neighbors
- Distance between two nodes
  - Number of bit positions at which the two node labels differ
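Both properties follow from labeling the nodes with log p-bit binary numbers: neighbors differ in exactly one bit, and the distance is the Hamming distance between labels. A minimal sketch with illustrative function names, where `n` stands for the dimension log p:

```python
def hypercube_neighbors(node, n):
    """Labels of the n neighbors of a node in an n-dimensional hypercube."""
    return [node ^ (1 << k) for k in range(n)]   # flip one bit at a time

def hypercube_distance(a, b):
    """Number of bit positions at which labels a and b differ."""
    return bin(a ^ b).count("1")

print(hypercube_neighbors(0b000, 3))          # prints: [1, 2, 4]
print(hypercube_distance(0b001, 0b110))       # prints: 3
```

In a 3D hypercube, nodes 001 and 110 differ in all three bits, so they are log p = 3 hops apart, the maximum possible.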
Tree-Based Networks
[Figure: a static tree network and a dynamic tree network]
Tree-Based Networks
- Distance between two nodes is at most 2 log p
- Easy to lay out as planar graphs
  - E.g., H-trees
- The root can become a bottleneck
  - Links closer to the root carry more traffic than those at lower levels
- Solution: fat tree
  - Fattens the links going up the tree
[Figure: H-tree]
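The 2 log p bound can be seen by computing the distance between two leaves: a message climbs to the lowest common ancestor and descends again. The sketch below assumes a complete binary tree with the p processing nodes at the leaves, numbered left to right from 0; `leaf_distance` is an illustrative name.

```python
def leaf_distance(a, b):
    """Hops between leaves a and b of a complete binary tree.

    The lowest common ancestor sits (a ^ b).bit_length() levels above the
    leaves, so the route climbs and then descends that many links.
    """
    return 2 * (a ^ b).bit_length()

print(leaf_distance(0, 1), leaf_distance(0, 7))
# prints: 2 6
```

With eight leaves, sibling leaves are two hops apart, while leaves in opposite halves of the tree must pass through the root and are 2 log 8 = 6 hops apart. Every such pair uses the root's links, which is why the root becomes the bottleneck.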
Fat Tree
[Figure: a fat tree network of 16 processing nodes]
Evaluating Interconnection Networks
- Diameter
  - Longest distance between two nodes
  - Measures the longest latency of any possible communication
- Bisection width
  - Minimum number of links that must be cut to divide the network into two equal parts
  - Measures the number of concurrent communications between the two halves
  [Figure: two concurrent communications vs. four concurrent communications]
- Cost
  - Number of links or switches
  - Ability to lay out the network
  - Length of wires
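Static Interconnection Networks
(p nodes; values are the standard results, as in Grama et al.)
Network | Diameter | Bisection width | Cost (# of links)
Completely-connected | 1 | p^2/4 | p(p-1)/2
Star | 2 | 1 | p-1
Complete binary tree | 2 log((p+1)/2) | 1 | p-1
Linear array | p-1 | 1 | p-1
2-D mesh, no wraparound | 2(sqrt(p)-1) | sqrt(p) | 2(p-sqrt(p))
2-D wraparound mesh | 2 floor(sqrt(p)/2) | 2 sqrt(p) | 2p
Hypercube | log p | p/2 | (p log p)/2
Wraparound k-ary d-cube | d floor(k/2) | 2k^(d-1) | dp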
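Dynamic Interconnection Networks
(p processing nodes; standard values, with the Omega cost matching the (p/2) log p switch count given for the Omega network)
Network | Diameter | Bisection width | Cost (# of switches)
Crossbar | 1 | p | p^2
Omega network | log p | p/2 | (p/2) log p
Dynamic tree | 2 log p | 1 | p-1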
Summary
- Interconnection network
  - Performance (latency, bandwidth), cost (# of links, # of switches)
  - Used to be important, then became less important
  - Likely to be important again for multi-core processors
- Topologies
  - Low-dimension networks
    - Bus, ring, mesh, torus: embed into 2D/3D
    - Direct networks (nodes are connected directly)
  - Logarithmic networks (multistage networks)
    - More switches between nodes (nodes are connected indirectly)
  - High-dimension networks
    - Hypercube (binary n-cube): theoretically good characteristics
    - But the node degree grows with the network size (log p links per node), which makes large hypercubes impractical in the real world
References
- Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Introduction to Parallel Computing, Chapters 2.4.2-2.4.4, Addison Wesley, 2003
- John Mellor-Crummey, COMP422: Parallel Computing, Rice University