Real Parallel Computers

Naomi Montgomery
1 Real Parallel Computers

2 Modular data centers

3 Overview
- Short history of parallel machines
- Cluster computing
- Blue Gene supercomputer
- Performance development, top-500
- DAS: Distributed supercomputing

4 Short history of parallel machines
- 1970s: vector computers
- 1990s: Massively Parallel Processors (MPPs)
  - Standard microprocessors, special network and I/O
- 2000s:
  - Cluster computers (using standard PCs); Grid computing
  - Advanced architectures (BlueGene)
  - Comeback of vector computer (Japanese Earth Simulator)
  - IBM Cell/BE
- 2010s:
  - Multi-cores, GPUs, Intel Phi
  - Cloud data centers

5 Clusters
- Cluster computing
  - Standard PCs/workstations connected by fast network
  - Good price/performance ratio
  - Exploit existing (idle) machines or use (new) dedicated machines
- Cluster computers vs. supercomputers (MPPs)
  - Processing power similar: based on microprocessors
  - Communication performance was the key difference
  - Modern networks have bridged this gap: Infiniband, 10G Ethernet, Myrinet

6 Literature Thomas Sterling, Clusters, in Encyclopedia of Parallel Computing (2011),

7 History of Cluster Computing
- Exploratory Period: Before 1980
- Enabling Period:
- Classical Period:
- Advanced Period: 2005-now

8 History of Cluster Computing
- Exploratory Period: Before 1980
  - Intel X86, Ethernet (Xerox PARC), TCP, Unix, CSP model
- Enabling Period:
  - and 32 bit 100 MHz
  - 10 Mbit Ethernet
  - BSD Unix (Berkeley), with virtual memory and networking
  - PVM (message passing library)
  - Condor: match-making scheduler

9 Classical Period:
- Two large research projects:
  - UCB: NOW (Network of Workstations)
    - High-end workstations & networks, proprietary software
  - NASA: Beowulf
    - Low-cost PCs, commodity networks, open source, <$50K
- 1994: MPI message passing standard
- 1995: Myrinet network (Myricom): expensive high-speed network that can be plugged into PCs
- 1997: First cluster in top-500
  - Supercomputer world was skeptical about the whole concept
- 2004: 50% of top-500 were clusters

10 Advanced Period: 2005-now
- Clusters with multi-core & GPU nodes
- Variety of new programming systems: CUDA, OpenCL, OpenMP, OpenACC, Cilk, TBB, ...
  - Sometimes called "MPI + X": MPI for message passing, something unknown (X) for accelerators
- Infiniband network: low latency, inexpensive
  - 2011: 36% of top-500 was Infiniband, 50%: Ethernet
- Clusters have >80% of HPC market
- Supercomputers (IBM Blue Gene, Cray XT5) for high end of the market

11 Blue Gene/L Supercomputer

12 Blue Gene/L System

Level         Contents                                           Peak           Memory
Chip          2 processors                                       2.8/5.6 GF/s   4 MB
Compute Card  2 chips, 1x2x1                                     5.6/11.2 GF/s  1.0 GB
Node Card     32 chips (4x4x2): 16 compute cards, 0-2 I/O cards  90/180 GF/s    16 GB
Rack          32 Node Cards                                      2.8/5.6 TF/s   512 GB
System        64 Racks, 64x32x32                                 180/360 TF/s   32 TB
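
The packaging figures on this slide can be cross-checked with a little arithmetic. The sketch below multiplies the per-level counts from the slide and scales the per-chip peak up to the full system. The constant and function names are illustrative, not from the slides, and the code makes no claim about what distinguishes the two peak figures.

```python
# Counts and per-chip peak figures as listed on the Blue Gene/L slide.
CHIPS_PER_COMPUTE_CARD = 2
COMPUTE_CARDS_PER_NODE_CARD = 16
NODE_CARDS_PER_RACK = 32
RACKS = 64
PEAK_GFLOPS_PER_CHIP = (2.8, 5.6)  # the two peak figures quoted per chip


def peak_tflops(chips):
    """Scale both per-chip peak figures to `chips` chips, in TF/s."""
    return tuple(chips * gflops / 1000.0 for gflops in PEAK_GFLOPS_PER_CHIP)


chips_per_node_card = CHIPS_PER_COMPUTE_CARD * COMPUTE_CARDS_PER_NODE_CARD
chips_per_rack = chips_per_node_card * NODE_CARDS_PER_RACK
chips_total = chips_per_rack * RACKS

print("chips per node card:", chips_per_node_card)
print("chips per rack:     ", chips_per_rack)
print("chips in the system:", chips_total)
print("system peak (TF/s): ", peak_tflops(chips_total))
```

The total of 65,536 chips equals 64 x 32 x 32, the system dimensions quoted on the slide. The computed system peak comes out slightly above 180/360 TF/s, which suggests the slide rounds those figures down.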

13 Blue Gene/L Networks
- 3 Dimensional Torus
  - Interconnects all compute nodes (65,536)
  - Virtual cut-through hardware routing
  - 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
  - 1 µs latency between nearest neighbors, 5 µs to the farthest
  - Communications backbone for computations
  - 0.7/1.4 TB/s bisection bandwidth, 68 TB/s total bandwidth
- Global Collective
  - One-to-all broadcast functionality
  - Reduction operations functionality
  - 2.8 Gb/s of bandwidth per link
  - Latency of one way traversal 2.5 µs
  - Interconnects all compute and I/O nodes (1024)
- Low Latency Global Barrier and Interrupt
  - Latency of round trip 1.3 µs
- Ethernet
  - Incorporated into every node ASIC
  - Active in the I/O nodes (1:8-64)
  - All external comm. (file I/O, control, user interaction, etc.)
- Control Network
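
To make the torus topology concrete, here is a minimal sketch of neighbor lookup and hop counting in a 3D torus. It assumes the 64x32x32 dimensions from the system slide, and it assumes the 12 node links are 6 neighbors with one inbound and one outbound link each. The function names are illustrative, and the hop count is plain shortest-path distance, not a model of the hardware routing.

```python
DIMS = (64, 32, 32)      # torus dimensions, taken from the system slide
LINK_GBIT_PER_S = 1.4    # per-link bandwidth from the slide
LINKS_PER_NODE = 12      # assumption: 6 neighbors x (in + out)


def torus_neighbors(node, dims=DIMS):
    """Return the 6 nearest neighbors of `node`, wrapping around each axis."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % dims[axis]
            neighbors.append(tuple(coords))
    return neighbors


def torus_hops(a, b, dims=DIMS):
    """Minimum hops between a and b: per axis, take the shorter way round."""
    return sum(min((p - q) % d, (q - p) % d) for p, q, d in zip(a, b, dims))


def node_bandwidth_gbyte_per_s():
    """Aggregate link bandwidth per node, converted from Gb/s to GB/s."""
    return LINKS_PER_NODE * LINK_GBIT_PER_S / 8.0


print(torus_neighbors((0, 0, 0)))
print("farthest node is", torus_hops((0, 0, 0), (32, 16, 16)), "hops away")
print("per-node bandwidth (GB/s):", node_bandwidth_gbyte_per_s())
```

Because of the wrap-around links, the farthest node is only half of each dimension away, which keeps the worst-case distance at 32 + 16 + 16 hops. The bandwidth helper reproduces the 2.1 GB/s per node quoted on the slide.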

14 Top 500
http://www.top500.org/
Literature: Erich Strohmaier, Hans W. Meuer, Jack Dongarra, Horst D. Simon: The TOP500 List and Progress in High-Performance Computing, IEEE Computer, Issue No.11, Vol. 48 (Nov. 2015), pp

15 TOP 500
- Yardstick for supercomputing performance since 1993
- Updated twice per year
  - Allows analysis over time
- Linpack benchmark: solves dense linear equations
  - Actual measured values (FLOPs)
  - Simple algorithm, but uses all system components
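
The idea behind the benchmark can be sketched in a few lines: solve a dense system Ax = b, time it, and divide an operation count by the elapsed time. The toy below is not the actual Linpack or HPL code. It uses plain Gaussian elimination with partial pivoting in pure Python, and it approximates the operation count by the leading term (2/3)n^3, ignoring lower-order terms. All function names are illustrative.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]   # work on copies, leave the inputs untouched
    b = b[:]
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot row.
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[pivot] = a[pivot], a[k]
        b[k], b[pivot] = b[pivot], b[k]
        # Eliminate column k from all rows below the pivot row.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - tail) / a[i][i]
    return x


def estimated_flop_rate(n, seed=0):
    """Time one random n x n solve; return approximate flop/s."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)
    return (2.0 / 3.0) * n ** 3 / elapsed


print("approximate flop/s for n=50:", estimated_flop_rate(50))
```

The reported number is a measured rate for one concrete run, not a theoretical peak, which is the property the slide highlights.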

17 Trends
- High replacement rate
  - Turnover rate (per 6 months): 190 systems out of 500 (until 2012)
  - Average age of systems since installation: 1.26 years
- Performance grew faster than Moore's law:
  - Faster nodes & larger machines with more nodes
- Until 2008:
  - TOP500: factor 1.91 per year
  - Moore's law: factor 1.59 per year
  - Difference: increasing number of processor sockets
- Since 2008: decline in growth rate of sockets, but more cores per socket
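
The two annual growth factors can be compared directly. The sketch below converts each factor into a doubling time and computes the residual factor left once Moore's law is divided out. It assumes the two effects combine multiplicatively, which the slide does not state explicitly. The names are illustrative.

```python
import math

TOP500_FACTOR_PER_YEAR = 1.91   # from the slide, until 2008
MOORE_FACTOR_PER_YEAR = 1.59    # from the slide


def doubling_time_years(factor_per_year):
    """Years needed to double at a constant annual growth factor."""
    return math.log(2.0) / math.log(factor_per_year)


# Assumption: TOP500 growth = Moore's-law growth x growth in socket count.
residual_factor = TOP500_FACTOR_PER_YEAR / MOORE_FACTOR_PER_YEAR

print("TOP500 doubling time (years):", doubling_time_years(TOP500_FACTOR_PER_YEAR))
print("Moore doubling time (years): ", doubling_time_years(MOORE_FACTOR_PER_YEAR))
print("residual factor per year:    ", residual_factor)
```

Under that assumption the residual is about 1.2 per year, which would correspond to roughly 20% more processor sockets per year in the period up to 2008.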

20 Performance predictions
- Until 2008: overall TOP500 performance increased by ~1,000x per 11 years
- After 2008: increase of ~100x (extrapolated to 11 years)
  - Mostly due to reduced growth in system size
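
The 11-year figures and annual growth factors are two views of the same compounding. As a rough sketch, the helpers below convert between them, assuming constant exponential growth over the whole period. The function names are illustrative.

```python
def annual_factor(total_factor, years):
    """Constant per-year factor that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def compounded(factor_per_year, years):
    """Total growth after `years` at a constant per-year factor."""
    return factor_per_year ** years


YEARS = 11

print("~1,000x per 11 years, per year:", annual_factor(1000.0, YEARS))
print("~100x per 11 years, per year:  ", annual_factor(100.0, YEARS))
print("1.91 per year over 11 years:   ", compounded(1.91, YEARS))
```

Compounding the factor of 1.91 per year over 11 years gives a bit more than 1,000x, consistent with the approximate pre-2008 figure. An increase of about 100x per 11 years corresponds to roughly 1.5x per year.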

21 Other TOP 500's
- Linpack has been much criticized
- Green 500:
- Graph 500:
- Is TOP 500 representative for application performance?
  - Comparison against Gordon Bell Award winner
