EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation Lecture #5 1/29/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1

From last class Outline Crossbar network Shuffle exchange network Multistage network Today CLOS network Butterfly network Hypercube network Tree-based network Performance metrics 2

Announcements HW #2 out Today Programming HW #2 out last Friday (27 th Jan) 3

Large software project Scientific computing Graph analytics Big data Sample course projects Course Project (1) Multi-core implementation of data plane kernels for software defined networking (e.g., traffic classification, packet classification, etc.) Accelerating convolutional neural networks (CNN) using GPU Big data analytics using Spark (e.g., graph analysis, log processing, etc.) Students can work in teams of 3 4

Course Project (2) Project timeline Week 5: Project Discussion Week 5-8: Identify team members and project topic Week 9: Project proposal due (March 10 th ) Week 14-15: Presentation Week 15: Project report due (April 28 th ) Grading breakdown for the course project Proposal: 25% Final presentation: 25% Final report: 50% 5

CLOS network (1) Multistage network n input crossbar Cost: α n 2 Cost: α n n input crossbar 6

CLOS network (2) Structure of CLOS network Control n n 7

CLOS network (3) Stage i Stage i + 1 connections Any Box all boxes in the next stage 3 stage network Number of switches (boxes) = 3 n Cost of n n crossbar = O n - Total Cost = O n n = O(n / 0) Note: CLOS network can realize all n! permutations 8

CLOS network (4) Realizing a permutation Choose the control setting for each of the 3 n boxes so the desired connection is realized # of control bits/box = log( n!) A permutation can be specified by n log n bits 9

CLOS network (5) Example: 9 inputs / outputs 0 1 2 3 i i + 3 mod 9 1 0 1 2 3 4 5 2 3 4 5 6 7 8 3 6 7 8 10

CLOS network (6) Example: 4 input / output CLOS network Example permutation: Example Permutation: 0 2 1 3 2 1 3 0 Note: All 4! permutations can be realized 11

CLOS network (7) Another example: 4 input / output CLOS network Example permutation: Permutation: 0 2 1 3 2 1 3 0 12

Routing problem Given a permutation p CLOS network (8) Specify the switch settings such that all connections in p are realized Each 3 n box permutation on n inputs 13

CLOS network (9) Non blocking network Any connection request from input to output can be routed at any time without rearranging the existing set of connections at that time 14

CLOS network (10) Rearrangeable network (1) To route a connection, we may have to rearrange existing connections, i.e., change the control settings of some switches that are routing an existing connection Stage 0 Stage k 1 0 n 1... Permutation ; Switches -... Permutation ; Switches -... 15

CLOS network (11) Rearrangeable network (2) A (multistage) network is rearrangeable if any connection (i j) can be realized by (possibly) rearranging some existing connections. 16

n = 2 k for some k Butterfly network (1) 8-input butterfly network Stage 0 Stage 3 log - n + 1 stages Stage l, 0 l < log - n i i in Stage l + 1 i i complement bit l in Stage l + 1 l = 0 l = 1 l = 2 17

Butterfly network (2) n input butterfly network Total number of nodes = n E log - n + 1 Total number of edges = n E 2 E log - n # of edges/node # of stages 18

Mesh-connected Network 1-D mesh 2-D mesh (without wraparound) Without wraparound 1-D torus With wraparound ring 0 1 p-1 0 1 p-1 p p Number of connections per node 2k 19

Hypercube Network (1) 100 110 Existing edge Added edge 0 1 00 10 01 11 000 010 101 001 011 111 0-D hypercube 1-D hypercube 2-D hypercube 3-D hypercube 0100 0110 0000 0010 0101 0111 0001 0011 1100 1110 1000 1010 1101 1111 4-D hypercube 1001 1011 Construction of hypercube from hypercubes of lower dimension 20

Hypercube Network (2) In general, for k dimension hypercube: p = total number of nodes = 2 F Node i = i FGH i FG- i J k connections/node complement a bit of i Example: 001 000 010 100 000 100 101 010 110 111 001 011 21

Tree-based Network (1) Processing node Switching node Height: log p + 1 1 Static tree network Dynamic tree network p = total number of nodes 22

Tree-based Network (2) Fat tree (16 processing nodes) 23

Performance metrics (1) Diameter Diameter Maximum distance between any two processing nodes in the network diameter max{distance(i, j)} (T,V) distance(i, j) min path length between i and j length of the shortest path between i and j length of a path = # of edges in the path 24

Example: 0 1 2 Performance metrics (2) 3 Distance 0 1 1 0 2 1 0 3 1 1 0 1 1 2 2 1 3 1 2 0 1 2 1 2 2 3 1 3 0 1 3 1 1 3 2 1 Diameter = 2 25

Performance metrics (3) Diameter Example: 1-D mesh (no wraparound): p 1 { 0 p 1 } 2-D mesh (no wraparound): 2( p 1) { (0,0) ( p 1, p 1)} Hypercube : log - p { 0 p 1 } 26

Performance metrics (4) Diameter = d Routing can be performed in at most d steps (hops). 27

Performance metrics (5) Bisection width (1) Bisection width Minimum number of links to be removed to partition the network into two equal-sized (in number of nodes) subnetworks partition links g - nodes g - nodes Minimum over all possible partitions 28

Performance metrics (6) Example Bisection width (2) partition p p p for 2-D mesh (no wraparound) 29

Performance metrics (7) Bisection bandwidth (3) Bandwidth number of bits/sec Example: 8 bits Bandwidth = 8 clock rate Bisection bandwidth Minimum bandwidth between any two equal halves (number of bits/sec that can be exchanged between 2 equal halves of the network) g nodes g nodes - - i bandwidth of each link 30

Performance metrics (8) Cost of a static network Cost number of communication links in the network Example: Tree: p 1 1-D mesh (no wraparound): p 1 d-dimensional wraparound mesh: d E p Hypercube: gelmn 0g - k-ary d-cube A d-dimensional array with k elements in each dimension Number of nodes p = k o Cost: dp 31

Summary Network Diameter Bisection width Cost (No. of links) Completely connected 1 p - 4 p(p 1) 2 Star 2 1 p 1 Complete binary tree 2 log p + 1 2 1 p 1 1-D mesh, no wraparound p 1 1 p 1 2-D mesh, no wraparound 2( p 1) p 2(p p) 2-D wraparound mesh 2 p/2 2 p 2p Hypercube log p p/2 (p E log p)/2 Wraparound k-ary d-cube p = k o d k/2 2k ogh dp 32

Summary Static network Multistage network Cost Performance Routing Diameter Bisection bandwidth 33