CS575 Parallel Processing



1 CS575 Parallel Processing
Lecture three: Interconnection Networks
Wim Bohm, CSU
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.

2 Interconnection networks
- Connect processors, memories, I/O devices
- Dynamic interconnection networks
  - Connect any to any using switches or buses
  - Two types of switches
    - On / off: 1 input, 1 output
    - Pass through / cross over: 2 inputs, 2 outputs
- Static interconnection networks
  - Connect point to point using wires

3 Dynamic Interconnection Network: Crossbar
- Connects e.g. p processors to b memories
- p * b matrix: p horizontal lines, b vertical lines
- Cross points: on/off switches
- Only one switch is on per row and per column
- Non-blocking: P_i to M_j does not block P_l to M_k
- Very costly, does not scale well
  - p * b switches, complex timing and checking

4 Dynamic Interconnection Network: Bus
- Connects processors, memories, I/O devices
- Master: can issue a request to get the bus
- Slave: can respond to a request, once the bus is granted
- If there are multiple masters, we need an arbiter
- Sequential
  - Only one communication at a time
  - Bottleneck
- But simple and cheap

5 Crossbar vs bus
- Crossbar
  - Scalable in performance
  - Not scalable in hardware complexity
- Bus
  - Not scalable in performance
  - Scalable in hardware complexity
- Compromise: multistage network

6 Multi-stage network
- Connects n components to each other
- Usually built from O(n log(n)) 2x2 switches
- Cheaper than crossbar
- Faster than bus
- Many topologies, e.g. Omega (book fig 2.12), Butterfly, ...
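The cost gap between the two extremes and the compromise can be sketched numerically. The snippet below is a rough illustration only: it assumes an Omega network with n inputs (n a power of two) built from (n/2) * log2(n) 2x2 switches, which is one instance of the O(n log(n)) bound above, and the function names are made up for this example.

```python
def crossbar_switches(p, b):
    """On/off switches in a p x b crossbar: one per cross point."""
    return p * b


def omega_switches(n):
    """2x2 switches in an n-input Omega network (assumes n is a power of two)."""
    stages = n.bit_length() - 1          # log2(n) for a power of two
    if n < 2 or 2 ** stages != n:
        raise ValueError("n must be a power of two, at least 2")
    return (n // 2) * stages             # n/2 switches per stage


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, crossbar_switches(n, n), omega_switches(n))
```

The crossbar count grows quadratically while the multistage count grows as n log(n), which is why the multistage network sits between crossbar and bus in hardware cost.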

7 Static Interconnection Networks
- Fixed wires (channels) between devices
- Many topologies
  - Completely connected
    - n(n-1)/2 channels
    - Static counterpart of crossbar
  - Star
    - One central PE for message passing
    - Static counterpart of bus
  - Multistage network with PE at each switch

8 More topologies
- Necklace or ring
- Mesh / Torus
  - 2D, 3D
- Trees
  - Fat tree
- Hypercube
  - 2^n nodes in an n-D hypercube
  - n links per node in an n-D hypercube
  - Addressing: 1 bit per dimension

9 Hypercube
- Two connected nodes differ in one bit
- An n-D hypercube can be divided into
  - 2 (n-1)-D cubes, in n ways
  - 4 (n-2)-D cubes
  - 8 (n-3)-D cubes
- To get from node s to node t
  - Follow the path determined by the differing bits
  - E.g. ... → 11000: ... → ... → ...
- Question: how many (simple) paths from one node to another?
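The addressing properties on this slide are easy to check in code. The following is a small sketch, assuming nodes are numbered 0 .. 2^n - 1 and bit d of the address corresponds to dimension d; the helper names and the example nodes are illustrative, not taken from the slides.

```python
def neighbors(node, n):
    """Neighbors of a node in an n-D hypercube: flip one address bit at a time."""
    return [node ^ (1 << d) for d in range(n)]


def hamming(s, t):
    """Number of differing address bits, i.e. the shortest path length from s to t."""
    return bin(s ^ t).count("1")


def split_on_bit(n, d):
    """Divide an n-D cube into two (n-1)-D cubes by fixing address bit d."""
    zero = [v for v in range(2 ** n) if not (v >> d) & 1]
    one = [v for v in range(2 ** n) if (v >> d) & 1]
    return zero, one


if __name__ == "__main__":
    print(neighbors(0b000, 3))          # one neighbor per dimension
    print(hamming(0b10110, 0b11000))    # hops needed between these two nodes
    print(split_on_bit(3, 0))           # one of the n possible ways to halve the cube
```

Since any of the n bits can be chosen as the fixed bit, there are n ways to split the cube into two halves, matching the second bullet.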

10 Measures of static networks
- Diameter
  - Maximal shortest path between two nodes
  - Ring: p/2, hypercube: log(p), 2D wraparound mesh: 2 * floor(sqrt(p)/2)
- Connectivity
  - Measure of multiplicity of paths between nodes
- Arc connectivity
  - Minimum #arcs to be removed to create two disconnected networks
  - Ring: 2, hypercube: log(p), mesh: 2, wraparound mesh: 4
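The diameter figures can be checked by brute force on small networks. The sketch below builds adjacency lists for a ring, a hypercube, and a square wraparound mesh, then takes the largest breadth-first-search distance over all start nodes. The builder names and the sizes used are assumptions made for this example.

```python
from collections import deque


def ring(p):
    return {v: [(v - 1) % p, (v + 1) % p] for v in range(p)}


def hypercube(n):
    return {v: [v ^ (1 << d) for d in range(n)] for v in range(2 ** n)}


def wraparound_mesh(k):
    """k x k 2D mesh with wraparound links (p = k * k nodes)."""
    return {(i, j): [((i + 1) % k, j), ((i - 1) % k, j),
                     (i, (j + 1) % k), (i, (j - 1) % k)]
            for i in range(k) for j in range(k)}


def eccentricity(adj, src):
    """Largest shortest-path distance from src, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def diameter(adj):
    return max(eccentricity(adj, v) for v in adj)


if __name__ == "__main__":
    print(diameter(ring(8)), diameter(hypercube(4)), diameter(wraparound_mesh(4)))
```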

11 More measures
- Bisection width
  - Minimal #arcs to be removed to partition the network into two equal halves (off by at most one node)
  - Ring: 2, complete binary tree: 1, 2D mesh: sqrt(p)
  - Question: bisection width of a hypercube?
- Channel width
  - #bits communicated simultaneously over a channel
- Channel rate / bandwidth
  - Peak communication rate (#bits/second)
- Bisection bandwidth
  - Bisection width * channel bandwidth
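For very small networks the bisection width can be found by exhaustive search, which is a handy way to test an answer to the question above. This is only a sketch: it tries every split into halves of size floor(p/2) and ceil(p/2), so it is exponential in p and meant for a handful of nodes. The function name and the example graphs are assumptions for illustration.

```python
from itertools import combinations


def bisection_width(nodes, edges):
    """Minimum number of edges crossing any balanced split (sizes differ by at most one)."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for part in combinations(nodes, half):
        side = set(part)
        # An edge is cut when its endpoints land on different sides.
        cut = sum((u in side) != (v in side) for u, v in edges)
        if best is None or cut < best:
            best = cut
    return best


# Example graphs (hypothetical sizes chosen to keep the search tiny).
ring_edges = [(i, (i + 1) % 8) for i in range(8)]
tree_edges = [(i, c) for i in range(3) for c in (2 * i + 1, 2 * i + 2)]  # 7-node complete binary tree
cube_edges = [(v, v ^ (1 << d)) for v in range(8) for d in range(3) if v < v ^ (1 << d)]

if __name__ == "__main__":
    print("ring:", bisection_width(range(8), ring_edges))
    print("tree:", bisection_width(range(7), tree_edges))
    print("3-D hypercube:", bisection_width(range(8), cube_edges))
```

Multiplying the result by the channel bandwidth gives the bisection bandwidth defined in the last bullet.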

12 Summary of measures: p nodes

Network               Diameter         Bisection width  Arc connectivity  #links
Completely connected  1                p^2/4            p-1               p(p-1)/2
Star                  2                p/2 *            1                 p-1
Ring                  p/2              2                2                 p
Complete binary tree  2 log((p+1)/2)   1                1                 p-1
Hypercube             log(p)           p/2              log(p)            p log(p)/2

* The textbook mentions bisection width of a star as 1, but the only way to split a star into (almost) equal halves is by cutting half of its links.

13 Meshes and Hypercubes
- Mesh
  - Buildable, scalable, cheaper than hypercubes
  - Many (e.g. grid) applications map naturally
  - Cut-through works well in meshes
  - Commercial systems are based on it
- Hypercube
  - Recursive structure nice for algorithm design
  - Often same O complexity as PRAMs
  - Often a hypercube algorithm is also good for other topologies, so a good starting point

14 Embedding
- Relationship between two networks
  - Studied by mapping one into the other
  - Why?
- G(V,E) → G'(V',E')
  - Graphs G, G'; vertices V, V'; edges E, E'
  - Map E → E', V → V'
- Congestion of k: k (>1) e's map to one e'
- Dilation of k: 1 e maps to k e's
- Expansion: |V'| / |V|
- Often we want congestion = dilation = expansion = 1
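Dilation and expansion can be measured directly from a vertex mapping. The sketch below assumes the host network is a hypercube, so the host distance between two nodes is the Hamming distance of their addresses, and it uses a deliberately naive mapping (ring node i onto cube node i) chosen for illustration. Congestion is left out because it depends on which host path is picked for each guest edge.

```python
def hamming(a, b):
    """Shortest path length between hypercube nodes a and b."""
    return bin(a ^ b).count("1")


def dilation(guest_edges, mapping, host_distance):
    """Longest host path that any single guest edge is stretched over."""
    return max(host_distance(mapping[u], mapping[v]) for u, v in guest_edges)


def expansion(num_guest_nodes, num_host_nodes):
    """|V'| / |V|: host nodes per guest node."""
    return num_host_nodes / num_guest_nodes


if __name__ == "__main__":
    p = 8
    ring_edges = [(i, (i + 1) % p) for i in range(p)]
    naive = {i: i for i in range(p)}                 # ring node i -> cube node i
    print(dilation(ring_edges, naive, hamming))      # worse than 1 for this numbering
    print(expansion(p, 2 ** 3))
```

With this numbering some ring neighbors, such as 3 (011) and 4 (100), end up several hops apart in the cube, which is the problem a better numbering has to solve.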

15 Ring into hypercube
- Number the nodes of the ring s.t. the Hamming distance between two adjacent nodes = 1
- Gray code provides such a numbering
  - Can be built recursively: binary reflected Gray code
  - 2 nodes: 0 1 OK
  - 2^k nodes:
    - Take the Gray code for 2^(k-1) nodes
    - Concatenate it with the reflected Gray code for 2^(k-1) nodes
    - Put 0 in front of the first batch, 1 in front of the second
- Mesh can be embedded into a hypercube
  - (Toroidal) mesh = rings of rings

16 Ring to hypercube, continued
  G(0,1) = 0
  G(1,1) = 1
  G(i,x+1) = 0 || G(i,x)                   if i < 2^x
           = 1 || G(2^(x+1) - i - 1, x)    if i >= 2^x
  (|| is concatenation)
Table: i against G(i,dim)
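The recurrence translates almost line for line into a recursive function. The sketch below returns G(i, x) as a bit string and then checks the property the embedding relies on: ring neighbors, including the wraparound pair, differ in exactly one bit. The function name `gray` and the choice of a string result are assumptions made for readability.

```python
def gray(i, x):
    """Binary reflected Gray code G(i, x) as a bit string of length x."""
    if x == 1:
        return "0" if i == 0 else "1"
    half = 2 ** (x - 1)
    if i < half:
        return "0" + gray(i, x - 1)                 # first batch: prefix 0
    return "1" + gray(2 ** x - i - 1, x - 1)        # second batch: reflected, prefix 1


def differ_in_one_bit(a, b):
    return sum(ca != cb for ca, cb in zip(a, b)) == 1


if __name__ == "__main__":
    dim = 3
    codes = [gray(i, dim) for i in range(2 ** dim)]
    print(codes)
    # Every ring edge (i, i+1 mod 2^dim) lands on a hypercube edge.
    print(all(differ_in_one_bit(codes[i], codes[(i + 1) % len(codes)])
              for i in range(len(codes))))
```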

17 2D Mesh into hypercube
- Note 2D mesh
  - Rows: rings
  - Cols: rings
- 2^r * 2^s wraparound mesh into a 2^(r+s)-node cube
- Map node (i,j) onto node G(i,r) || G(j,s)
  - Row coincides with a sub cube
  - Column coincides with a sub cube
  - S.t. if adjacent in mesh then adjacent in cube
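A short check of this mapping, as a sketch: it uses the closed form i XOR (i >> 1), which yields the binary reflected Gray code as an integer, and places the row code in the high r bits and the column code in the low s bits. The function names and the 4 x 8 example size are assumptions for illustration.

```python
def gray(i):
    """Binary reflected Gray code of i, as an integer."""
    return i ^ (i >> 1)


def mesh_to_cube(i, j, s):
    """Cube address for mesh node (i, j): row code in the high bits, column code in the low s bits."""
    return (gray(i) << s) | gray(j)


def hamming(a, b):
    return bin(a ^ b).count("1")


def mesh_neighbors_are_cube_neighbors(r, s):
    rows, cols = 2 ** r, 2 ** s
    for i in range(rows):
        for j in range(cols):
            here = mesh_to_cube(i, j, s)
            down = mesh_to_cube((i + 1) % rows, j, s)    # wraparound in the column direction
            right = mesh_to_cube(i, (j + 1) % cols, s)   # wraparound in the row direction
            if hamming(here, down) != 1 or hamming(here, right) != 1:
                return False
    return True


if __name__ == "__main__":
    print(mesh_neighbors_are_cube_neighbors(2, 3))   # 4 x 8 mesh into a 32-node cube
```

Fixing i keeps the high r bits constant, so each mesh row occupies an s-dimensional sub cube, and likewise for columns.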

18 Complete binary tree into hypercube
- Map the tree root to any cube node
- Left child: map to the same node as its parent
- Right child at level j: invert bit j of the parent node
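The rule can be tried out on a small tree. In the sketch below, tree nodes are named by their path from the root (a string of 'L' and 'R' moves), the root's children are at level 1, and "bit j" is taken to mean bit index j - 1 counting from the least significant bit. Those naming and numbering choices are assumptions of this example, not something the slide specifies.

```python
def embed_tree(depth, root=0):
    """Map every node of a complete binary tree of the given depth to a hypercube node."""
    mapping = {"": root}
    frontier = [""]
    for level in range(1, depth + 1):
        next_frontier = []
        for path in frontier:
            parent = mapping[path]
            mapping[path + "L"] = parent                        # left child: same cube node
            mapping[path + "R"] = parent ^ (1 << (level - 1))   # right child: invert one bit
            next_frontier += [path + "L", path + "R"]
        frontier = next_frontier
    return mapping


if __name__ == "__main__":
    depth = 3
    m = embed_tree(depth)
    leaves = sorted(v for path, v in m.items() if len(path) == depth)
    print(leaves)   # the leaves of the tree, as cube nodes
    # Each tree edge maps to either the same node or one hypercube link.
    print(all(bin(m[p] ^ m[p[:-1]]).count("1") <= 1 for p in m if p))
```

Under these assumptions several tree nodes share a cube node (every left child sits on its parent), while the 2^depth leaves land on distinct nodes of a depth-dimensional cube.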

19 Routing Mechanisms
- Determine all source → destination paths
- Minimal: a shortest path
- Deterministic: one path per (src,dst) pair
  - Mesh: dimension ordered (XY routing)
  - Cube: E-routing
    - Send along the least significant 1 bit in src XOR dst
- Adaptive: many paths per (src,dst) pair
  - Minimal: only shortest
- Why adaptive? Discuss.
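Both deterministic schemes fit in a few lines. The sketch below assumes hypercube nodes are integers and mesh nodes are (x, y) pairs on a mesh without wraparound; the function names and sample endpoints are illustrative.

```python
def e_cube_route(src, dst):
    """Hypercube route: repeatedly correct the least significant differing bit."""
    path = [src]
    cur = src
    while cur != dst:
        diff = cur ^ dst
        lowest = diff & -diff       # isolates the least significant 1 bit
        cur ^= lowest
        path.append(cur)
    return path


def xy_route(src, dst):
    """Dimension-ordered mesh route: finish the X dimension first, then Y."""
    (x, y), (tx, ty) = src, dst
    path = [(x, y)]
    while x != tx:
        x += 1 if tx > x else -1
        path.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print([format(v, "03b") for v in e_cube_route(0b010, 0b111)])
    print(xy_route((0, 0), (2, 1)))
```

Each scheme yields exactly one path per (src,dst) pair, and that path is minimal: its length equals the Hamming distance in the cube and the Manhattan distance in the mesh.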

20 Routing (communication) Costs
- Three factors
  - Start up at source (t_s)
    - OS, buffers, error correction info, routing algorithm
  - Hop time (t_h)
    - The time it takes to get from one PE to the next
    - Also called node latency
  - Word transfer time (t_w)
    - Inverse of channel bandwidth

21 Two routing (switching) techniques
- Store and forward: O(m.l)
  - Strict: whole message travels from PE to PE
  - m words, l links
  - t_comm = t_s + (m.t_w + t_h).l
  - Often, t_h is much less than m.t_w: t_comm ≈ t_s + m.l.t_w
- Cut-through: O(m+l)
  - Non-strict: message broken into flits (packets)
  - Flits are pipelined through the network
  - t_comm = t_s + l.t_h + m.t_w
  - Circular path + finite flit buffer can give rise to deadlock
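Plugging numbers into the two cost formulas shows how far apart they are. The sketch below implements the formulas as stated; the parameter values in the example are made up for illustration and carry no particular unit.

```python
def store_and_forward(t_s, t_h, t_w, m, l):
    """t_comm = t_s + (m*t_w + t_h) * l: the whole message is received at every hop."""
    return t_s + (m * t_w + t_h) * l


def cut_through(t_s, t_h, t_w, m, l):
    """t_comm = t_s + l*t_h + m*t_w: flits are pipelined across the l links."""
    return t_s + l * t_h + m * t_w


if __name__ == "__main__":
    # Hypothetical values: startup 100, hop time 2, word time 1, 1000 words, 10 links.
    params = dict(t_s=100, t_h=2, t_w=1, m=1000, l=10)
    print(store_and_forward(**params))
    print(cut_through(**params))
```

With a single link (l = 1) the two formulas agree; the gap grows with path length because store and forward pays m.t_w on every link.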
