Homework Assignment #1: Topology Kelly Shaw
EE482 Advanced Computer Organization
Spring 2001
Professor W. J. Dally
Homework Assignment #1: Topology
Kelly Shaw

As we have not discussed routing or flow control yet, throughout this problem set assume (1) that a network can support throughput up to the level that saturates the bottleneck channel, i.e. speedup = 1, and (2) that traffic between two nodes is uniformly distributed over all minimum-length paths between those two nodes.

Problem 1: Path Diversity

Consider the following topologies, which all have 1 GByte/s channels:

1.) a 2-ary 4-fly
2.) a 2-ary 4-fly extended by one stage at the beginning
3.) a Benes network created from two 2-ary 4-flies

Evaluate each of the following traffic patterns on the 2-ary 4-fly. If the sustainable bandwidth for a traffic pattern on the 2-ary 4-fly is less than 1 GByte/s, you must evaluate the traffic pattern on the extended butterfly. If the sustainable bandwidth on the extended butterfly is still less than 1 GByte/s, you must then evaluate the traffic pattern on the Benes network.

1.) What is the sustainable bandwidth per port of this network on traffic that routes from {b3, b2, b1, b0} to {b0, b3, b2, b1}? This means you'll route to {0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15}.
2.) What is the sustainable bandwidth per port of this network on traffic that routes from {b3, b2, b1, b0} to {b1, b3, b2, b0}?
3.) What is the sustainable bandwidth per port of this network on traffic that routes from {b3, b2, b1, b0} to {b2, b1, b0, b3}? (Note: You may have to calculate channel load in a manner similar to when multiple minimal paths are possible, due to the randomized routing in the extended stages.)

We'll use the following notation for this problem:
1. -> denotes going from an input port to an output port of a switch
2. => denotes the butterfly permutation between the switches of adjacent stages

1.)
{b3, b2, b1, b0} to {b0, b3, b2, b1}

On a 2-ary 4-fly:

The route taken in binary is:

Source: {b3,b2,b1,b0} -> {b3,b2,b1,b0} => {b0,b2,b1,b3} -> {b0,b2,b1,b3} => {b0,b3,b1,b2} -> {b0,b3,b1,b2} => {b0,b3,b2,b1} -> {b0,b3,b2,b1}: Destination

The easiest way to think about routing in a butterfly is to realize that each of the N/k switches per stage is uniquely numbered, using log_k(N/k) base-k digits. In an ordinary butterfly, all of the inputs to a switch have the same log_k(N/k) digits as the digits in the switch's number; the butterfly permutations between switches of adjacent stages guarantee this.
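The address transformations in the route listings can be generated mechanically. The sketch below traces a symbolic packet through the fly; the digit-swap model of the inter-stage permutation is inferred from the route listings in this solution, not given in the assignment:

```python
def trace(dest, n=4):
    """Trace a symbolic source address through a 2-ary n-fly.

    At stage i the switch replaces the last digit with the i-th
    destination digit (output-port selection, '->'); the inter-stage
    permutation ('=>') then swaps digit i with the last digit.
    """
    addr = [f"b{n - 1 - i}" for i in range(n)]       # source: b3,b2,b1,b0
    steps = [list(addr)]
    for i in range(n):
        addr[-1] = dest[i]                           # traverse switch i
        steps.append(list(addr))
        if i < n - 1:
            addr[i], addr[-1] = addr[-1], addr[i]    # butterfly permutation
            steps.append(list(addr))
    return steps

# traffic pattern 1: destination {b0, b3, b2, b1}
for step in trace(["b0", "b3", "b2", "b1"]):
    print("{" + ",".join(step) + "}")
```

Running this for pattern 1 reproduces the address sequence listed above, ending at {b0,b3,b2,b1}.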
If the output port to be taken out of a switch is a function of the digits used in the switch's number, there will be conflicts, because the inputs contend for the same output ports. The number of conflicts depends on the function determining which output port is desired.

In the 2-ary 4-fly, each switch is numbered by 3 binary digits. At the first stage, switches are numbered b3b2b1. Since the first digit of the destination is b0 for this permutation, the two inputs to a switch never conflict; they have different values for b0. The channel load is 1.

Between the first and second stages, the address is permuted to b0b2b1, which is the switch number for the second stage. The inputs to such a switch share these 3 digits but differ in b3 (one input has b3 = 1 and one has b3 = 0, by the construction of the butterfly). Since the inputs differ in b3, which selects the desired output port, there are no conflicts in the second stage, and the channel load is 1.

At stage 3, the switch number is b0b3b1 and the inputs are routed to the output port selected by b2. Again there are no conflicts, so the channel load is 1. Finally, the switch number in the 4th stage is b0b3b2, and the desired output port for each input is selected by b1, so there are no conflicts; the channel load is 1.

The maximum channel load is therefore 1 (at every stage) and the sustainable bandwidth is 1 GByte/s.

2.) {b3, b2, b1, b0} to {b1, b3, b2, b0}

On a 2-ary 4-fly:

The route taken in binary is:

Source: {b3,b2,b1,b0} -> {b3,b2,b1,b1} => {b1,b2,b1,b3} -> {b1,b2,b1,b3} => {b1,b3,b1,b2} -> {b1,b3,b1,b2} => {b1,b3,b2,b1} -> {b1,b3,b2,b0}: Destination

In the first stage, the switch number is b3b2b1. Both inputs to each switch want to take the output port selected by b1, and both inputs have the same value for b1 (otherwise they wouldn't be inputs to the same switch). Therefore there is a conflict, and the channel load is 2.

At the second stage, every other switch has two inputs on each input port. The switch number is b1b2b1 and each pair of inputs wants the output port selected by b3. The inputs from different input ports have different values for b3, although they share the same values for b1 and b2. So there is no conflict between the pairs of inputs, but each pair gives its output channel a load of 2.

Switches at the third stage also have two inputs on each input port. The switch number is b1b3b1 and each pair of inputs wants the output port selected by b2. Because the input pairs have different values for b2, there are no additional conflicts and the channel load remains 2. Finally, the fourth-stage switch has number b1b3b2 and two inputs on a single input port. These inputs want different output ports (selected by b0), so the channel load is 1.

The maximum channel load over all stages is 2, so the sustainable bandwidth is 1 GByte/s / 2 = 0.5 GByte/s.

On an extended butterfly:

The first stage of the extended butterfly is a repeat of the final stage of the 2-ary 4-fly. For every two switches, this new stage exchanges one input from each switch randomly. As a result, each switch in the first stage of the original butterfly has one input with b1 = 1 and one with b1 = 0; the values of b2 and b3 remain as in the original butterfly configuration. (The variance in b1 means the switch numbers are no longer completely synonymous with the first 3 bits of the input addresses; only bits b3 and b2 match the switch number.) A first-stage switch of the original butterfly may now have two inputs with the same b0 value.

Having explained how the network inputs are exchanged in the new stage, we now examine the first stage of the original butterfly.
At the first stage of the original butterfly, the inputs want to leave on the output port selected by b1. Because of the rearrangement performed by the new stage, each switch has exactly one input with b1 = 1 and one with b1 = 0, so there is no conflict; the channel load is 1.

At the second stage of the original butterfly, the inputs have the same values for b1 and b2, and the desired output is selected by b3. The added stage did not alter the usual butterfly pattern for b3 or b2, so the two inputs to each switch have different b3 values. Thus there is no conflict and the channel load is 1. At the third stage of the original butterfly, the inputs share the same values for b1 and b3, and the desired output port is selected by b2. Again there is no contention for the output port, so the channel load is 1. At the fourth stage, the switch number is b1b3b2 and the output port is selected by b0. There is no conflict and the channel load is 1.

The maximum channel load is 1 and the sustainable bandwidth is 1 GByte/s.

3.) {b3, b2, b1, b0} to {b2, b1, b0, b3}

On a 2-ary 4-fly:

The route taken in binary is:

Source: {b3,b2,b1,b0} -> {b3,b2,b1,b2} => {b2,b2,b1,b3} -> {b2,b2,b1,b1} => {b2,b1,b1,b2} -> {b2,b1,b1,b0} => {b2,b1,b0,b1} -> {b2,b1,b0,b3}: Destination

In the first stage of the butterfly, each switch is numbered b3b2b1 and the desired output for each input is selected by b2. Because b2 is part of the switch number, both inputs to a switch have the same value for b2. This is a conflict, and the channel load is 2.

Four switches in the next stage have two inputs on each input port. The second-stage switch is numbered b2b2b1 and the desired output port for both pairs of inputs is selected by b1. Because b1 is also part of the switch number, both pairs of inputs want to exit the switch on the same output port. This results in a channel load of 4.

At the third stage, 4 inputs enter one input port and there are no inputs on the other port (those are all overloading the input port of a different switch). The switch at this stage is numbered b2b1b1. The key observation is that the 4 inputs to this switch have the same values for b2 and b1, so they must vary in digits b3 and b0: two inputs have b3 = 0 and two have b3 = 1 (because the inputs come from different halves of the butterfly), and two have b0 = 0 and two have b0 = 1 (because the pairs arose from inputs to first-stage switches, which differ in b0).
The two inputs with b0 = 0 therefore contend for one output port, and the two inputs with b0 = 1 contend for the other. Thus the channel load is 2. At stage four, the switch is numbered b2b1b0 and the desired output port is selected by b3. The switch has a pair of inputs on one input port, and these inputs have different values for b3, so there is no contention.

The maximum channel load is 4, so the sustainable bandwidth is 1 GByte/s / 4 = 0.25 GByte/s.

On an extended butterfly:

As stated in part 2, the new stage randomly routes inputs from one pair of switches in one stage to a pair of switches in the next stage. The switches at the first stage of the original butterfly are still identified by b3b2, but the third digit is no longer the same for both inputs: each switch has inputs with the same values for b3 and b2, different values for b1, and possibly the same or different values for b0.

In this traffic pattern, the inputs at the first stage of the original butterfly want to exit on the output port selected by b2, and each switch has two inputs with the same value for b2. The channel load is therefore 2.

Four switches in the second stage of the original butterfly have two inputs on each input port, all wanting to leave on the output port selected by b1. One input of each input-port pair has b1 = 0 and the other has b1 = 1, because of the randomization done by the new stage. So two inputs want to leave on the 0 output port and two on the 1 output port, and the channel load is 2.

At the third stage of the original butterfly, one input port at each switch has two inputs, which want to use the output port selected by b0. At the added stage we exchanged inputs randomly, so each switch routed traffic with b0 = 0 to the top switch of the next stage 50% of the time and to the bottom switch 50% of the time. There is therefore a 50% probability that the two inputs have the same value for b0 and a 50% probability that they differ. A given channel out of the switch thus has a load of 0 with probability 0.25, a load of 1 with probability 0.5, and a load of 2 with probability 0.25. The expected channel load is therefore 0.25*0 + 0.5*1 + 0.25*2 = 1.

Each final-stage switch has two inputs, on different input ports, that want to exit on the output port selected by b3. The two inputs at each final-stage switch have different b3 values, so the channel load is 1.
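The expected-load arithmetic for the randomized stage can be checked by enumerating the equally likely b0 assignments of the two inputs. A small sketch, taking the b0 = 0 output channel as representative:

```python
from itertools import product
from fractions import Fraction

# Each of the two inputs carries b0 chosen independently and uniformly
# at random. The load on the b0 = 0 output channel is the number of
# inputs whose b0 is 0.
loads = [sum(1 for b in pair if b == 0) for pair in product([0, 1], repeat=2)]

prob = {load: Fraction(loads.count(load), len(loads)) for load in set(loads)}
expected = sum(load * p for load, p in prob.items())

print(prob)       # load 0 and 2 each with probability 1/4, load 1 with 1/2
print(expected)   # 1
```

The enumeration reproduces the 0.25 / 0.5 / 0.25 distribution and the expected channel load of 1.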
The maximum channel load is 2, so the sustainable bandwidth is 1 GByte/s / 2 = 0.5 GByte/s.

In the Benes network:

Each new stage added to create a Benes network routes traffic randomly; at each new stage, one bit is randomized. The first new stage of the Benes network delivers two inputs to each switch of the next stage with identical values for b3 and b2, different values for b1, and a value for b0 that is 0 half of the time and 1 the other half. The second new stage delivers two inputs with identical values for b3, different values for b2, and values for b0 and b1 that are each 0 half of the time and 1 the other half. The third new stage delivers two inputs with different values for b3 and values for b0, b1, and b2 that are each 0 half of the time and 1 the other half.

Thus, at the first stage of the original butterfly, the two inputs have different b3 values, and values for b0, b1, and b2 that are each 0 half of the time and 1 the other half. Suppose you determine the channel load when b0, b1, or b2 selects a given output channel. There is a 0.25 probability that neither input is routed to the channel, a 0.25 probability that both inputs are, and a 0.5 probability that exactly one input is. The expected channel load is therefore 0.25*0 + 0.25*2 + 0.5*1 = 1.

On the traffic pattern b2b1b0b3, the channel load at each original butterfly stage is thus 1 when b2, b1, or b0 selects the output port, because of the randomization introduced by the additional stages. At the last stage, the two inputs at a switch have different values for b3, and the channel load is 1. Therefore the maximum channel load is 1 and the sustainable bandwidth is 1 GByte/s.

Problem 2: k vs. n Comparison

You need to create a k-ary n-cube with 256 nodes given the following constraints. At most 384 signals can go off of a chip. Similarly, 512 signals may go off of a board. Finally, you are constrained by the backplane: 6000 signals may cross the midsection of the backplane.

a.) Determine k and n.
b.) Explain how you would package this network on chips and boards.
c.) Using the same number of chips and boards, what is the largest square crossbar you can construct?
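One way to attack part (a) is to enumerate the candidate (k, n) pairs with k^n = 256 and size the channel width w against each packaging constraint. The sketch below assumes bidirectional channels, one node per chip and one node per board, uniform random traffic on a torus, and a 512-bit packet; the sizing formulas are textbook k-ary n-cube expressions, not anything given in the assignment:

```python
N, L = 256, 512                      # nodes, packet length in bits
Ws, Wn, Wb = 6000, 384, 512          # bisection, per-chip, per-board signal limits

def design(k, n):
    """Channel width, bisection signals, and latency for a k-ary n-cube."""
    w_chip = Wn // (4 * n)           # 2n bidirectional ports, in + out signals
    w_board = Wb // (4 * n)          # one node per board, same form
    w_bis = Ws * k // (4 * N)        # 4*N/k unidirectional channels cross the midsection
    w = min(w_chip, w_board, w_bis)
    Bs = (4 * N // k) * w            # signals crossing the bisection
    H = n * k / 4                    # average hop count, uniform traffic on a torus
    T = H + L / w                    # hops plus serialization, in cycles
    return w, Bs, round(T)

for n, k in [(1, 256), (2, 16), (4, 4), (8, 2)]:
    w, Bs, T = design(k, n)
    print(f"n={n} k={k}: w={w} bisection={Bs} latency={T}")
```

With these formulas, n = 4, k = 4 gives w = 23 and a bisection of 5888 signals, while n = 2, k = 16 gives the lowest total latency, matching the trade-off discussed in the solution.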
N = 256 nodes
Ws = 6000 (wire bisection)
Wn = 384 signals per chip
Wb = 512 signals per board
L = 512-bit packet length (assumed)

[The solution's comparison table, listing n, k, the channel widths w-chip, w-board, w-s, and w, the bisection bandwidth Bs, the hop count H, the serialization latency Ts, the total latency T = H + Ts, and the nodes per board for each candidate design, did not survive transcription.]

If bandwidth is the most important criterion, the optimum choice is n = 4, k = 4, packaged 1 node to a board. This gives a bisection bandwidth of 5888 (nearly the 6000-signal limit) and a latency of 26 cycles. If minimum latency is more important, the n = 2, k = 16 design offers a latency of 19 cycles, but at the penalty of reducing bandwidth by roughly half.

The network is packaged one node per chip and one chip per board. Placing more than one chip per board would reduce the width of the channels and hence the bandwidth of the network.

A square crossbar with 256 chips can be constructed as a 16 x 16 array of chips. Each chip implements a 192 x 192-port crossbar, for a 3072 x 3072-port, 1-bit-wide crossbar. Of course, if you wanted to implement a network with w = 23-bit-wide channels, like the 4-ary 4-cube designed in part (a), then each chip could only hold an 8 x 8 crossbar, and the 16 x 16 array would realize a 128 x 128 crossbar (not counting chips for fanout and fanin). Note that this 128 x 128, w = 23-bit crossbar has only about half the bisection bandwidth (128 x 23 = 2944) of the 4-ary 4-cube.

Problem 3: Concentration and Sharing (note: 7 questions)

You are given the task of creating a 16-ary 2-cube network with 1 GByte/s channels. Each port has an average bandwidth of 225 MByte/s. When a node presents a request to the network, it does so at a peak bandwidth of 1 GByte/s for 200 ns.

a.) Estimate the cost of this network in terms of total pin bandwidth of the routers.
b.) You decide to use concentrators. What degree of concentration should you use?
c.) What are the average and peak bandwidths of the concentrated node?
d.) What will be the topology of your new network?
e.) Estimate the cost of your new network with concentrators, still assuming 1 GByte/s channels.
f.) What is the serialization latency for an access, still assuming 1 GByte/s channels?
g.) If you size the network to handle average bandwidth, what is the serialization latency?

Soln: This solution assumes bidirectional channels.
a.) Each node has 4*2 = 8 channels. There are 256 nodes, and the bandwidth of each channel is 1 GByte/s. The total pin bandwidth is therefore (8 channels/node) * (256 nodes) * (1 GByte/s per channel) = 2048 GByte/s, or 2 TByte/s.

b.) The channels have a bandwidth of 1 GByte/s. Given that the average bandwidth of a port is 225 MByte/s, you can use a concentration of 4. This actually decreases the channel load from 2 to 1, because it cuts the radix of the network from 16 to 8.

c.) The average bandwidth of the concentrated node is 4 * 225 MByte/s = 900 MByte/s, and the peak bandwidth is 4 GByte/s.

d.) We are combining every four nodes, by combining every two nodes in each dimension. Each ring in the network then consists of 8 concentrated nodes, so the final network is an 8-ary 2-cube.

e.) The total number of concentrated nodes is 8^2 = 64. Since each concentrated node still has 8 channels, the total pin bandwidth is 8 * 64 * 1 GByte/s = 512 GByte/s.

f.) The serialization latency of the network is 200 ns with 1 GByte/s channels.

g.) By sizing channels for the average bandwidth of 900 MByte/s, we've reduced the channel bandwidth by a factor of 1.11. The serialization latency therefore increases by the same factor: 200 ns * 1.11 = 222 ns.
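The arithmetic above can be collected into a short script. This is a sketch of this solution's numbers; the variable names are my own:

```python
nodes = 16 * 16                    # 16-ary 2-cube
channels_per_node = 4 * 2          # 2 dimensions x 2 directions, bidirectional
ch_bw = 1.0                        # GByte/s per channel
avg_port_bw = 0.225                # GByte/s average per port
burst = 200e-9                     # seconds at peak bandwidth per request

# a.) total pin bandwidth of the routers, in GByte/s
pin_bw = channels_per_node * nodes * ch_bw            # 2048

# b.)/c.) degree of concentration and concentrated-node bandwidths
conc = int(ch_bw // avg_port_bw)                      # 4
avg_bw, peak_bw = conc * avg_port_bw, conc * ch_bw    # 0.9 and 4.0 GByte/s

# e.) cost of the concentrated 8-ary 2-cube
pin_bw_conc = channels_per_node * (nodes // conc) * ch_bw   # 512

# f.)/g.) serialization latency of one 200 ns burst at 1 GByte/s
access_bytes = ch_bw * 1e9 * burst                    # 200 bytes per request
t_peak = access_bytes / (ch_bw * 1e9)                 # 200 ns
t_avg = access_bytes / (avg_bw * 1e9)                 # ~222 ns

print(pin_bw, conc, pin_bw_conc, t_avg)
```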
More informationInterface The exit interface a packet will take when destined for a specific network.
The Network Layer The Network layer (also called layer 3) manages device addressing, tracks the location of devices on the network, and determines the best way to move data, which means that the Network
More informationChapter 9 Multiprocessors
ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University
More informationRACEway Interlink Modules
RACE++ Series RACEway Interlink Modules 66-MHz RACE++ Switched Interconnect Adaptive Routing More than 2.1 GB/s Bisection Bandwidth Deterministically Low Latency Low Power, Field-Tested Design Flexible
More informationEE 4683/5683: COMPUTER ARCHITECTURE
3/3/205 EE 4683/5683: COMPUTER ARCHITECTURE Lecture 8: Interconnection Networks Avinash Kodi, kodi@ohio.edu Agenda 2 Interconnection Networks Performance Metrics Topology 3/3/205 IN Performance Metrics
More informationPacket Switch Architecture
Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.
More informationPacket Switch Architecture
Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.
More informationEE/CSCI 451 Spring 2018 Homework 2 Assigned: February 7, 2018 Due: February 14, 2018, before 11:59 pm Total Points: 100
EE/CSCI 45 Spring 08 Homework Assigned: February 7, 08 Due: February 4, 08, before :59 pm Total Points: 00 [0 points] Explain the following terms:. Diameter of a network. Bisection width of a network.
More informationINTERCONNECTION NETWORKS LECTURE 4
INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1 Topology Specifies way switches are wired Affects routing, reliability, throughput, latency, building ease Routing How does a message get from source
More informationInterconnection Networks: Routing. Prof. Natalie Enright Jerger
Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly
More informationInterconnection Networks
Lecture 15: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Credit: some slides created by Michael Papamichael, others based on slides from Onur Mutlu
More informationLecture 16: On-Chip Networks. Topics: Cache networks, NoC basics
Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality
More informationNetworks, Routers and Transputers:
This is Chapter 1 from the second edition of : Networks, Routers and Transputers: Function, Performance and applications Edited M.D. by: May, P.W. Thompson, and P.H. Welch INMOS Limited 1993 This edition
More informationInterconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.
Interconnection topologies (cont.) [ 10.4.4] In meshes and hypercubes, the average distance increases with the dth root of N. In a tree, the average distance grows only logarithmically. A simple tree structure,
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationLecture 7: Flow Control - I
ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 7: Flow Control - I Tushar Krishna Assistant Professor School of Electrical
More informationPhastlane: A Rapid Transit Optical Routing Network
Phastlane: A Rapid Transit Optical Routing Network Mark Cianchetti, Joseph Kerekes, and David Albonesi Computer Systems Laboratory Cornell University The Interconnect Bottleneck Future processors: tens
More informationPOLYMORPHIC ON-CHIP NETWORKS
POLYMORPHIC ON-CHIP NETWORKS Martha Mercaldi Kim, John D. Davis*, Mark Oskin, Todd Austin** University of Washington *Microsoft Research, Silicon Valley ** University of Michigan On-Chip Network Selection
More informationHybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University
Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects
More informationNetwork-on-Chip Micro-Benchmarks
Network-on-Chip Micro-Benchmarks Zhonghai Lu *, Axel Jantsch *, Erno Salminen and Cristian Grecu * Royal Institute of Technology, Sweden Tampere University of Technology, Finland Abstract University of
More informationTCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks
TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks Gwangsun Kim Arm Research Hayoung Choi, John Kim KAIST High-radix Networks Dragonfly network in Cray XC30 system 1D Flattened butterfly
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD
More informationCS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 7 th, Review: VLIW: Very Large Instruction Word
CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 7 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252
More informationAPPLICATION OF GRAPH THEORY IN COMMUNICATION NETWORKS
APPLICATION OF GRAPH THEORY IN COMMUNICATION NETWORKS Suman Deswal 1 and Anita Singhrova 2 1,2 Deenbandhu Chottu Ram University of Sc. & Tech., Murthal, Sonipat. Haryana, India. ABSTRACT The use of mathematics
More informationCommunication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD
More informationINTERCONNECTION networks are used in a variety of applications,
1 Randomized Throughput-Optimal Oblivious Routing for Torus Networs Rohit Sunam Ramanujam, Student Member, IEEE, and Bill Lin, Member, IEEE Abstract In this paper, we study the problem of optimal oblivious
More informationMIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer
MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware
More informationLecture 22: Router Design
Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip
More informationLecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background
Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation
More informationData Communication. Introduction of Communication. Data Communication. Elements of Data Communication (Communication Model)
Data Communication Introduction of Communication The need to communicate is part of man s inherent being. Since the beginning of time the human race has communicated using different techniques and methods.
More informationFDDI-M: A SCHEME TO DOUBLE FDDI S ABILITY OF SUPPORTING SYNCHRONOUS TRAFFIC
FDDI-M: A SCHEME TO DOUBLE FDDI S ABILITY OF SUPPORTING SYNCHRONOUS TRAFFIC Kang G. Shin Real-time Computing Laboratory EECS Department The University of Michigan Ann Arbor, Michigan 48109 &in Zheng Mitsubishi
More information[ ] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering.
Switch Design [ 10.3.2] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering. Here is a basic diagram of a switch. Receiver
More informationUnderlying Technologies -Continued-
S465 omputer Networks Spring 2004 hapter 3 (Part B) Underlying Technologies -ontinued- Dr. J. Harrison These slides were produced from material by Behrouz Forouzan for the text TP/IP Protocol Suite (2
More informationInterconnect Technology and Computational Speed
Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented
More informationCSC 220: Computer Organization Unit 12 CPU programming
College of Computer and Information Sciences Department of Computer Science CSC 220: Computer Organization Unit 12 CPU programming 1 Instruction set architectures Last time we built a simple, but complete,
More information18-740/640 Computer Architecture Lecture 16: Interconnection Networks. Prof. Onur Mutlu Carnegie Mellon University Fall 2015, 11/4/2015
18-740/640 Computer Architecture Lecture 16: Interconnection Networks Prof. Onur Mutlu Carnegie Mellon University Fall 2015, 11/4/2015 Required Readings Required Reading Assignment: Dubois, Annavaram,
More informationEE/CSCI 451 Midterm 1
EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming
More informationParallel Computing Interconnection Networks
Parallel Computing Interconnection Networks Readings: Hager s book (4.5) Pacheco s book (chapter 2.3.3) http://pages.cs.wisc.edu/~tvrdik/5/html/section5.html#aaaaatre e-based topologies Slides credit:
More informationGIAN Course on Distributed Network Algorithms. Network Topologies and Local Routing
GIAN Course on Distributed Network Algorithms Network Topologies and Local Routing Stefan Schmid @ T-Labs, 2011 GIAN Course on Distributed Network Algorithms Network Topologies and Local Routing If you
More informationInitial studies of SCI LAN topologies for local area clustering
Prepared for the First International Workshop on SCI-Based Low-Cost/High-Performance Computing, Santa Clara University Initial studies of SCI LAN topologies for local area clustering Haakon Bryhni * and
More informationMultiprocessor Interconnection Networks- Part Three
Babylon University College of Information Technology Software Department Multiprocessor Interconnection Networks- Part Three By The k-ary n-cube Networks The k-ary n-cube network is a radix k cube with
More informationNOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms
Outline Networks: Routing and Design Routing Switch Design Case Studies CS 5, Spring 99 David E. Culler Computer Science Division U.C. Berkeley 3/3/99 CS5 S99 Routing Recall: routing algorithm determines
More information