EN2910A: Advanced Computer Architecture
Topic 06: Supercomputers & Data Centers
Prof. Sherief Reda
School of Engineering, Brown University


1 Material from:
- The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition
- Computer Architecture: A Quantitative Approach, by Hennessy and Patterson

2 Warehouse-scale computers
The data center as a computer (L. Barroso).

3 Performance metrics for clusters
Supercomputers:
- Execution time.
- Threads communicate using message passing.
- FLOPS (FLOP/s): theoretical peak, or measured with a standard benchmark (e.g., LINPACK is used for the Top-500 supercomputer ranking).
Data center scale:
- Abundant parallelism through request-level parallelism.
- Latency is an important metric because it is seen by users. Bing study: users use search less as response time increases.
- Service Level Objectives (SLOs) / Service Level Agreements (SLAs), e.g., 99% of requests must complete in under 100 ms.
S. Reda EN2910A FALL 13
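The SLO example above (99% of requests under 100 ms) can be checked against a list of measured latencies. This is a minimal sketch; the function name `meets_slo` and the sample latencies are hypothetical and not taken from the slides.

```python
def meets_slo(latencies_ms, limit_ms=100.0, target_pct=99):
    """Return True if at least target_pct percent of requests finished under limit_ms."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    below = sum(1 for latency in latencies_ms if latency < limit_ms)
    # Integer comparison avoids floating-point rounding at the threshold.
    return below * 100 >= target_pct * len(latencies_ms)


# Hypothetical samples: 99 fast requests and one slow outlier.
sample_ok = [20.0] * 99 + [250.0]
# Hypothetical samples: two slow outliers out of 100 requests.
sample_bad = [20.0] * 98 + [250.0] * 2

print(meets_slo(sample_ok), meets_slo(sample_bad))
```

Note that the average latency of both samples is well under 100 ms; the SLO is stated on the tail of the distribution, not on the mean.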

4 Anatomy of a data center
- A 7-ft rack (42U) holds 40 1U servers and a rack-level Ethernet switch.
- Rack switches have uplinks connecting to cluster-level switches.


5 Servers
Data centers:
- 1U servers; each has two quad-core processors and 16 GB of memory.
- Disk attached to each node; 1-Gbps Ethernet.
Supercomputers:
- Use more expensive high-density blade servers and GPGPUs.
- A blade chassis holds a number of blades (e.g., a Cray XE6 blade with four 16-core AMD Opteron processors and 64 GB).
- The chassis provides shared power supply and cooling.

6 Power usage of servers
Assumes two-socket x86 servers, 16 DIMMs and 8 disk drives per server, and an average utilization of 80%.

7 Storage hierarchy of a data center
Data centers use global distributed storage (disks attached to each compute node), whereas supercomputers use Network Attached Storage (NAS) devices connected directly to the cluster network.

8 Data center network
Datacenters often reduce communication costs by oversubscribing the network at the top of rack. In an oversubscribed network, the ratio between server ports and fabric ports is raised above 1:1. For example, with 2:1 oversubscription the uplinks are built with only half the bandwidth. Each server can still peak at 10 Gbps of traffic, but if all servers send traffic simultaneously they will only be able to average 5 Gbps. In practice, oversubscription ratios of 4 to 10 are common.
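The arithmetic behind oversubscription is a single division. The sketch below reproduces the 2:1 example and the 4 to 10 range quoted above; the function name is an assumption made for illustration.

```python
def average_gbps_under_full_load(server_port_gbps, oversubscription):
    """Average per-server bandwidth through the uplinks when every server sends at once."""
    if oversubscription < 1:
        raise ValueError("oversubscription ratio must be at least 1")
    return server_port_gbps / oversubscription


# 10 Gbps server ports, as in the example in the text.
for ratio in (1, 2, 4, 10):
    print(f"{ratio}:1 -> {average_gbps_under_full_load(10, ratio)} Gbps per server")
```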

9 Quantifying latency, bandwidth and capacity
Data center assumption: 2,400 servers, each with 16 GB of DRAM and four 2 TB disk drives. Each group of 80 servers is connected through 1-Gbps links to a rack-level switch that has an additional eight 1-Gbps ports used for connecting the rack to the cluster-level switch.
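The capacity and bandwidth totals implied by these assumptions can be worked out directly. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB); the variable names are illustrative.

```python
SERVERS = 2400
DRAM_GB_PER_SERVER = 16
DISKS_PER_SERVER = 4
DISK_TB = 2
SERVERS_PER_RACK = 80
UPLINK_PORTS = 8          # 1-Gbps ports from the rack switch to the cluster switch
PORT_GBPS = 1

racks = SERVERS // SERVERS_PER_RACK

# Capacity visible at rack level and at cluster level.
rack_dram_tb = SERVERS_PER_RACK * DRAM_GB_PER_SERVER / 1000
rack_disk_tb = SERVERS_PER_RACK * DISKS_PER_SERVER * DISK_TB
cluster_dram_tb = SERVERS * DRAM_GB_PER_SERVER / 1000
cluster_disk_pb = SERVERS * DISKS_PER_SERVER * DISK_TB / 1000

# Bandwidth leaving the rack, shared by all servers in the rack.
rack_uplink_gbps = UPLINK_PORTS * PORT_GBPS
oversubscription = SERVERS_PER_RACK * PORT_GBPS / rack_uplink_gbps
off_rack_gbps_per_server = rack_uplink_gbps / SERVERS_PER_RACK

print(f"racks: {racks}")
print(f"rack: {rack_dram_tb} TB DRAM, {rack_disk_tb} TB disk")
print(f"cluster: {cluster_dram_tb} TB DRAM, {cluster_disk_pb} PB disk")
print(f"uplink oversubscription: {oversubscription}:1, "
      f"{off_rack_gbps_per_server} Gbps per server off-rack under full load")
```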

10 Support equipment

11 Power distribution and losses
- More losses occur inside servers when converting from AC to DC.
- Total losses are about 11%.

12 Heat flow inside a facility
Datacenters employ raised floors. The underfloor area often contains power cables to racks, but its primary purpose is to distribute cool air to the server racks through perforated tiles. Heat recirculation contaminates the cold-air aisles, which is a main source of inefficiency.

13 Heat exchange systems


14 Data center energy efficiency
- Power usage effectiveness (PUE) reflects the quality of the datacenter building infrastructure itself; it is the ratio of total building power to IT power.
- Figure: survey results of over 1,100 datacenters in 2012.
- Check Google's green web
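Since PUE is defined as a ratio, it can be computed from two power readings. The sketch below uses made-up numbers (1,500 kW for the building, 1,000 kW for IT equipment) purely for illustration; they are not from the survey mentioned above.

```python
def pue(total_building_kw, it_kw):
    """Power usage effectiveness: total building power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    if total_building_kw < it_kw:
        raise ValueError("total building power cannot be less than IT power")
    return total_building_kw / it_kw


# Hypothetical facility: 1,500 kW total, of which 1,000 kW reaches IT equipment.
example = pue(1500.0, 1000.0)
overhead_fraction = (example - 1.0) / example  # share of total power not used by IT
print(example, round(overhead_fraction, 3))
```

A PUE of 1.0 would mean every watt entering the building reaches IT equipment; the excess over 1.0 is cooling, power distribution, and other facility overhead.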


15 Need for energy-proportional computing
- Figure: average activity distribution of a sample of two Google clusters, each containing over 20,000 servers, over a period of 3 months (January-March 2013).
- Variable utilization creates server idleness → wasted power consumption → need for a mechanism that adjusts power consumption to the load of the servers.
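To see why idleness wastes power, one can compare a server that draws substantial power when idle against an ideal energy-proportional server. This sketch assumes a simple linear power model and hypothetical figures (100 W idle, 200 W peak); neither the model nor the numbers come from the slides.

```python
def server_power_w(utilization, idle_w=100.0, peak_w=200.0):
    """Assumed linear model: idle power plus a load-dependent part."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_w + (peak_w - idle_w) * utilization


def proportional_power_w(utilization, peak_w=200.0):
    """Ideal energy-proportional server: power scales with load, zero when idle."""
    return peak_w * utilization


for u in (0.1, 0.3, 0.5, 1.0):
    actual = server_power_w(u)
    ideal = proportional_power_w(u)
    print(f"utilization {u:.0%}: {actual:.0f} W actual vs {ideal:.0f} W ideal")
```

Under these assumptions the gap is largest at low utilization, which is where the activity distribution above suggests servers spend much of their time.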

16 Software
- Platform-level software: operating system, virtual machines.
- Cluster-level infrastructure software: resource management, hardware abstraction (e.g., global distributed file systems and message passing), programming frameworks.
- Applications: transactional → data centers; batch → supercomputers.


17 Supercomputer and data center networks
- Data centers use Ethernet, which is good enough for request-level parallelism with limited inter-thread communication.
- Supercomputers use InfiniBand and proprietary networks, which enable a variety of network topologies that facilitate quick communication and synchronization among threads on different servers.

18 Indirect and direct networks
- Indirect network: the network is a box with no servers inside it.
- Direct network: each node combines computing and a switch.


19 Network metrics
- Network size: number of nodes.
- Node degree: number of ports for each switch (switch complexity).
- Network diameter: the longest shortest path between any two nodes in the network.
- Network bandwidth: best-case total bandwidth.
- Bisection bandwidth: worst-case total bandwidth when half of the nodes communicates with the other half.
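Size, degree, and diameter can be computed for any topology given as an adjacency list. The following is a minimal sketch using breadth-first search; the six-node ring is a hypothetical example, and the degree counts only network links (a direct-network switch would need one more port for its own compute node).

```python
from collections import deque


def hop_counts(adjacency, source):
    """Breadth-first search: shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(adjacency):
    """Longest shortest path over all pairs of nodes (assumes a connected network)."""
    return max(max(hop_counts(adjacency, s).values()) for s in adjacency)


def max_degree(adjacency):
    """Largest number of network links attached to any node."""
    return max(len(neighbors) for neighbors in adjacency.values())


# Hypothetical example: six nodes connected in a ring.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print("size:", len(ring), "degree:", max_degree(ring), "diameter:", diameter(ring))
```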


20 Indirect networks: crossbar switch
- Offers the best latency, as the diameter is just 1.
- Point-to-point connections between N pairs of nodes can be established as long as the pairs are distinct.
- Bandwidth scales linearly with the number of nodes.
- Size is not scalable, as the number of switches grows quadratically with the number of nodes.

21 Indirect networks: Omega and butterfly
- Figures: Omega network; butterfly network.
- Number of switches = (N/2) log2 N → addresses the size scalability problem of the crossbar.
- Diameter grows as log N.
- The Omega network uses perfect-shuffle exchange.
- More prone to contention. Consider node 0 communicating with node 5 while node 4 wants to communicate with node 6 in an Omega network.
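The contention example can be traced in code. This sketch assumes the usual destination-tag routing for an Omega network built from 2x2 switches (perfect shuffle before each stage, then the switch sets the low address bit to the next destination bit, most significant first); the function names are illustrative.

```python
def omega_switch_count(n_nodes):
    """(N/2) * log2(N) two-by-two switches, for N a power of two."""
    stages = n_nodes.bit_length() - 1
    return (n_nodes // 2) * stages


def omega_path(src, dst, n_nodes):
    """Wire position occupied after each stage under destination-tag routing."""
    bits = n_nodes.bit_length() - 1
    mask = n_nodes - 1
    pos = src
    path = []
    for stage in range(bits):
        # Perfect shuffle: rotate the address left by one bit.
        pos = ((pos << 1) | (pos >> (bits - 1))) & mask
        # The switch output is chosen by the next destination bit (MSB first).
        dst_bit = (dst >> (bits - 1 - stage)) & 1
        pos = (pos & ~1) | dst_bit
        path.append(pos)
    return path


def conflicts(path_a, path_b):
    """Two routes conflict if they need the same switch output at the same stage."""
    return any(a == b for a, b in zip(path_a, path_b))


route_0_to_5 = omega_path(0, 5, 8)
route_4_to_6 = omega_path(4, 6, 8)
print(route_0_to_5, route_4_to_6, conflicts(route_0_to_5, route_4_to_6))
print("switches:", omega_switch_count(8), "vs crossbar crosspoints:", 8 * 8)
```

Both routes need the same output of the same first-stage switch, so only one of the two messages can proceed at a time, whereas a crossbar could carry both.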

22 Indirect networks: trees
Binary tree:
- Number of switches = N - 1.
- Simple structure.
- Bisection BW = 1.
Fat tree:
- Links do not all have the same BW.
- Links get thicker (i.e., more BW) as we get closer to the root.
- How to build a fat tree with skinny switches?

23 Direct networks: linear arrays and rings
- Great aggregate BW → can exchange N-1 messages at a time.
- Bisection BW = 1 for the array and 2 for the ring.
- The ring can be laid out so that all links are short.
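The bisection figures for the array and the ring can be confirmed by counting the links that cross a cut through the middle. This sketch splits the nodes into the contiguous halves 0..N/2-1 and N/2..N-1 and reports the count in units of one link's bandwidth; the helper names are illustrative.

```python
def array_links(n):
    """Linear array: node i is linked to node i + 1."""
    return [(i, i + 1) for i in range(n - 1)]


def ring_links(n):
    """Ring: a linear array plus one wraparound link."""
    return array_links(n) + [(n - 1, 0)]


def links_across_middle(links, n):
    """Count links with one endpoint in each contiguous half of the nodes."""
    half = n // 2
    return sum(1 for a, b in links if (a < half) != (b < half))


N = 8
print("array:", links_across_middle(array_links(N), N))
print("ring:", links_across_middle(ring_links(N), N))
```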

24 Direct networks: meshes and tori
- Figures: mesh, 2D torus, 3D torus.
- Improve bisection BW and diameter.
- Increase the number of links; require more ports per switch (i.e., the degree increases).
- What is the diameter?
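One way to explore the diameter question is to compute it by brute force for small cases. The sketch below assumes a grid with k nodes per dimension, where the shortest path is the sum of per-dimension distances and a torus allows wrapping around in each dimension; the function name and the sizes tried are illustrative.

```python
from itertools import product


def grid_diameter(k, dims, wraparound):
    """Largest hop count between any two nodes of a k-ary mesh (or torus if wraparound)."""
    def axis_distance(a, b):
        d = abs(a - b)
        # In a torus the shorter way around the ring is taken.
        return min(d, k - d) if wraparound else d

    nodes = list(product(range(k), repeat=dims))
    return max(
        sum(axis_distance(a, b) for a, b in zip(p, q))
        for p in nodes
        for q in nodes
    )


print("4x4 mesh:", grid_diameter(4, 2, wraparound=False))
print("4x4 torus:", grid_diameter(4, 2, wraparound=True))
print("4x4x4 torus:", grid_diameter(4, 3, wraparound=True))
```

For these cases the results match the closed forms dims * (k - 1) for the mesh and dims * floor(k / 2) for the torus.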


25 Designing a torus topology for an HPC cluster
- Cluster of 192 blades. Each 10U chassis holds 16 blades and a switch with 32 ports.
- Racks: 12 10U chassis → 4 chassis per rack and a total of 3 racks.
- 2D torus: 4x3 torus.
- Each switch has 32 ports → 16 ports to the blades and 16 ports to the four neighbors (4 cables each).
[example from clusterdesign.org]
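The sizing steps of this design are plain integer arithmetic. The sketch below reproduces them; the 42U rack height is an assumption carried over from the rack described earlier in the deck, and the variable names are illustrative.

```python
import math

BLADES = 192
BLADES_PER_CHASSIS = 16
CHASSIS_HEIGHT_U = 10
RACK_HEIGHT_U = 42        # assumption: standard 42U rack
SWITCH_PORTS = 32
TORUS_NEIGHBORS = 4       # each node of a 2D torus has four neighbors

chassis = BLADES // BLADES_PER_CHASSIS
chassis_per_rack = RACK_HEIGHT_U // CHASSIS_HEIGHT_U
racks = math.ceil(chassis / chassis_per_rack)

# Ports left on each chassis switch after connecting its own blades.
ports_to_neighbors = SWITCH_PORTS - BLADES_PER_CHASSIS
cables_per_neighbor = ports_to_neighbors // TORUS_NEIGHBORS

print(f"{chassis} chassis, {chassis_per_rack} per rack, {racks} racks")
print(f"torus: {chassis_per_rack}x{racks}, {cables_per_neighbor} cables to each neighbor")
```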

26 Direct networks: hypercubes
- n-dimensional hypercubes.
- Number of nodes = 2^n.
- One hop to n nodes → n+1 ports per switch.
- Diameter = n.
- What is the bisection bandwidth?
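The hypercube properties can be checked by labeling nodes with n-bit addresses, where neighbors differ in exactly one bit. This sketch also counts the links crossing a cut along the highest dimension, which gives 2^(n-1) = N/2 links as one answer to the bisection question; the function names are illustrative.

```python
def hypercube_neighbors(node, n):
    """Neighbors differ from the node in exactly one of the n address bits."""
    return [node ^ (1 << bit) for bit in range(n)]


def hypercube_diameter(n):
    """Hop count equals Hamming distance, so take the maximum over all pairs."""
    size = 1 << n
    return max(bin(a ^ b).count("1") for a in range(size) for b in range(size))


def hypercube_bisection_links(n):
    """Links between the half with top address bit 0 and the half with top bit 1."""
    half = 1 << (n - 1)
    return sum(
        1
        for node in range(half)
        for neighbor in hypercube_neighbors(node, n)
        if neighbor >= half
    )


for n in (3, 4):
    print(f"n={n}: nodes={1 << n}, diameter={hypercube_diameter(n)}, "
          f"bisection links={hypercube_bisection_links(n)}")
```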

27 Summary
Supercomputer and data center issues:
- Performance evaluation
- Server makeup
- Storage hierarchy
- Network topologies
- Support infrastructure for cooling and power delivery
- Power consumption concerns
