Topologies. Maurizio Palesi. Maurizio Palesi 1

Size: px

Start display at page:

Download "Topologies. Maurizio Palesi. Maurizio Palesi 1"

Caroline Jefferson
6 years ago
Views:

1 Topologies Maurizio Palesi Maurizio Palesi 1

2 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance Cost and performance determided by many factors (flow control, routing, traffic) Measures to evaluate just the topology Bisection bandwidth Channel load Path delay Maurizio Palesi 2

3 Units of Resource Allocation A message is a contiguous group of bits that is delivered from source terminal to destination terminal. A message consists of packets A packet is the basic unit for routing and sequencing Packets maybe divided into flits A flit (flow control digit) is the basic unit of bandwidth and storage allocation. Flits do not have any routing or sequence information and have to follow the route for the whole packet A phit (physical transfer digits) is the unit that is transfered across a channel in a single clock cycle Maurizio Palesi 3

4 Units of Resource Allocation Maurizio Palesi 4

5 Network-on-Chip Factors that influence the performance of a NoC are Topology (static arrangement of channels and nodes) Routing Technique (selection of a path through the network) Flow Control (how are network resources allocated, if packets traverse the network) Router Architecture (buffers and switches) Traffic Pattern Maurizio Palesi 5

6 Topologies Direct network (Every node is both a terminal and a switch) Indirect network Switch Terminal node Maurizio Palesi 6

7 Direct and Indirect Networks Direct Network Every Node in the network is both a terminal and a switch Indirect Network Nodes are either switches or terminal Maurizio Palesi 7

8 Direct Network Topologies Most popular direct network topologies fall into n dimensional meshes Nodes have from n to 2n neighbors depending on their location k ary n cubes All nodes have the same number of neighbors (2n) 2 ary 4 cube 3 ary 2 cube 3D mesh Maurizio Palesi 8

9 Direct to Indirect Maurizio Palesi 9

10 Cuts A cut of a network, C(N 1,N 2 ), is a set of channels that partitions the set of all nodes into two disjoint sets, N 1 and N 2 Each element in C(N 1,N 2 ) is a channel with a source in N 1 and destination in N 2 or vice versa Total bandwidth of the cut C(N 1,N 2 ) B N 1,N 2 = c C N 1, N 2 b c Maurizio Palesi 10

11 Bisection The bisection is a cut that partitions the entire network nearly in half The channel bisection of a network, B C, is the minimum channel count over all bisections B C = min bisections The bisection bandwidth of a network, B B, is the minimum bandwidth over all bisections B B = min bisections C N 1, N 2 B N 1, N 2 Maurizio Palesi 11

12 Diameter The diameter of a network, H max, is the largest, minimal hop count over all pairs of terminal nodes H max = max x,y N H x, y For a fully connected network with N terminals built from switches with out degree O, H max is bounded by H max log δo N (1) Each terminal can reach at most O other terminals after one hop At most O 2 after two hops, and at most O H after H hops If we set O H = N and solve for H, we get (1) Maurizio Palesi 12

13 Average Minimum Hop count The average minimum hop count of a network, H min, is defined as the average hop count over all sources and destinations H min = 1 N 2 x, y N H x, y Maurizio Palesi 13

14 Physical Distance and Delay The physical distance of a path is D P = c P l c The delay of a path is t P =D P /v Maurizio Palesi 14

15 Traffic Patterns We represents message distributions with a traffic matrix where each matrix element s,d gives the fraction of traffic sent from node s to node d Hisorically based on communication patterns that arise in particular applications Matrix transpose or corner turn operations FFT, sorting shuffle Fluid dynamics simulations neighbor patterns Maurizio Palesi 15

16 Traffic Patterns Uniform Transpose 1 dst.x W src.x, dst.y H src.y Transpose 2 dst.x src.x, dst.y src.y Bit reversal Butterfly Shuffle Maurizio Palesi 16

17 Performance Throughput Data rate in bits/s that the network accepts per input port It is a property of the entire network It depends on Routing Flow control Topology Maurizio Palesi 17

18 Ideal Throughput Ideal throughput of a topology Throughput that the network could carry with perfect flow control (no contention) and routing (load balanced over alternative paths) Maximum throughput It occurs when some channel in the network becomes saturated Maurizio Palesi 18

19 Channel Load We define the load of a channel c, c, as γ c = bandwidth demanded from channel c bandwidth of the input ports Equivalently Amount of traffic that must cross c if each input injects one unit of traffic Of course, it depends on the traffic pattern considered We will assume uniform traffic Maurizio Palesi 19

20 Maximum Channel Load Under a particular traffic pattern, the channel that carries the largest fraction of traffic determines the maximum channel load max of the topology γ max =max c C γ c Maurizio Palesi 20

21 Ideal Throughput When the offered traffic reaches the throughput of the network, the load on the bottleneck channel will be equal to the channel bandwidth b Any additional traffic would overload this channel The ideal throughput Θ ideal is the input bandwidth that saturates the bottleneck channel bandwidth demanded from channel c γ c = bandwidth of the input ports γ c =γ max = b Θ ideal Θ ideal = b γ max Maurizio Palesi 21

22 Bounds for max max is very hard to compute for the general case (arbitrary topology and arbitrary traffic pattern) For uniform traffic some upper and lower bounds can be computed with much less effort Maurizio Palesi 22

23 Lower Bound on max The load on the bisection channels gives a lower bound on γ max Let us assume uniform traffic On average, half of the traffic (N/2 packets) must cross the B C bisection channels The best throughput occurs when these packets are distributed evenly across the bisection channels Thus, the load on each bisection channel γ B is at least γ max γ B = N 2B C ½(N/2) ½(N/2) ½(N/2) ½(N/2) Maurizio Palesi 23

24 Upper Bound on ideal We found that Θ ideal = b γ max and γ max γ B = N 2B C Combining the above equations we have Θ ideal 2 bb C N = 2B B N Maurizio Palesi 24

25 Another Lower Bound on max The average number of channel traversals required to deliver a packet for a given traffic is H min N Let us assume the best case in which all channels are loaded equally The load on every channel in the network is γ c, LB =γ max,lb = H min N C Maurizio Palesi 25

26 Simple Upper Bound on max Let us suppose a routing function that balances load across all minimal paths equally E.g., If there are R xy paths from x to y, 1/ R xy is credited to each channel of each path The maximum load over a channel c is γ c,ub = 1 N x N y N { P xy 1 / R xy if c P 0 otherwise So, the maximum load max,ub is the largest c,up over all channels γ max,ub =max c C γ c, UB Maurizio Palesi 26

27 Example (1/3) Let us consider a eight node ring network γ c,ub = 1 N x N y N { P xy 1 / R xy if c P 0 otherwise γ max,ub =max c C γ c, UB Maurizio Palesi 27

28 Example (2/3) Let us focus on channel (3,4) γ 3,4, UB = 1 N x N y N P xy { 1 / R xy if c P 0 otherwise γ 3,4, UB = 1 8 [ ] =1 Maurizio Palesi 28

29 Example (3/3) It is simple to observe that the load on all channels is the same max,ub = c,ub = 1 Now, let us compute the lower bound γ c, LB =γ max,lb = H min N C H min = 2 hops max,lb = 2 8/16 = 1 max,lb max max,ub max = 1 Maurizio Palesi 29

30 Latency The latency of a network is the time required for a packet to traverse the network From the time the head of the packet arrives at the input port to the time the tail of the packet departs the output port Maurizio Palesi 30

31 Components of the Latency We separate latency, T, into two components Head latency (T h ): time required for the head to traverse the network Serialization latency (T s ): time for a packet of length L to cross a channel with bandwidth b T =T h T s =T h L b Maurizio Palesi 31

32 Contributions Like throughput, latency depends on Routing Flow control Design of the router Topology Maurizio Palesi 32

33 Latency at Zero Load We consider latency at zero load, T 0 Latency when no contention occurs T h : sum of two factors determined by the topology Router delay (T r ): time spent in the routers Time of flight (T w ): time spent on the wires T h =T r T w =H min t r D min v T 0 =H min t r D min v L b Maurizio Palesi 33

34 Latency at Zero Load Topology Technology T 0 =H min t r D min v L b Router design Node degree Maurizio Palesi 34

35 Packet Propagation time Arrive at x Leave x t r x y z Arrive at y t xy Leave y Arrive at z t r t yz Routing delay Link latency Serialization latency L/b Maurizio Palesi 35

36 Case Study A good topology exploits characteristics of the available packaging technology to meet bandwidth and latency requirements of the application To maximize bandwidth a topology should saturate the bisection bandwidth Maurizio Palesi 36

37 Bandwidth Analysis (Torus) Asume: 256 1Gbits/s Bisection bandwidth 256 Gbits/s Maurizio Palesi 37

38 Bandwidth Analysis (Torus) 16 unidirectional channels cross the midpoint of the topology To saturate the bisection of 256 signals Each channel crossing the bisection should be 256/16 = 16 signals wide Constraints Each node packaged on a IC Limited number of I/O pins (e.g., 128) 8 channels per node 8x16=128 pins OK Maurizio Palesi 38

39 Bandwidth Analysis (Ring) unidirectional channels cross the mid point of the topology To saturate the bisection of 256 signals Each channel crossing the bisection should be 256/4 = 64 signals wide Constraints Each node packaged on a IC Limited number of I/O pins (e.g., 128) 4 channels per node 4x64=256 pins INVALID With identical technology constraints, the ring provides only half the bandwidth of the torus Maurizio Palesi 39

40 Delay Analysis The application requires only 16Gbits/s but also minimum latency The application uses long 4,096 bit packets Suppose random traffic Average hop count Torus = 2 Ring = 4 Channel size Torus = 16 bits Ring = 32 bits Maurizio Palesi 40

41 Delay Analysis Serialization latency (channel speed 1GHz) Torus = 4,096/16 * 1ns = 256 ns Ring = 4,096/32 * 1ns = 128 ns Latency assuming 20ns hop delay Torus = *2 = 296 ns Ring = *4 = 208 ns No one topology is optimal for all applications Different topologies are appropriate for different constraints and requirements Maurizio Palesi 41

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and